To work upstream, Linaro tries to follow the rules of each community. Those rules differ widely, but they all share the same principles.
This document describes some of those principles and how Linaro works with the communities to provide value to its members while remaining relevant in those communities.
The Basics
Linaro's motto is "upstream first". This means that *anything* we
distribute must first have been accepted upstream by the relevant
community.
We may keep local changes for a limited period when one or more of these apply:
- We have an NDA with a company and the change concerns hardware covered by that agreement.
- The community hasn't accepted the change yet, but is in the process of doing so.
- It's an experimental feature that we're trying to convince communities to take in.
- It's related to Linaro's own infrastructure and will never go upstream.
Apart from the last case, all such changes will have a short life
inside Linaro, and we should push as hard as we can to get them
upstream. Experiments that don't get accepted upstream should be
thrown away.
We do not have the resources to sustain internal projects for long,
and we cannot release features that haven't gone upstream.
What is "public"
Everything that is visible to anyone outside Linaro is public. But not
everything that is public is upstream. Moreover, not everything that
is in an upstream repository is considered "upstream" for our
purposes.
We consider a patch to be upstream once it has successfully landed in
an upstream repository and runs little risk of being reverted or of
having its behaviour severely altered. The change must also be either
in an existing public release or certain to be included in a future
release.
Everything else is public, but not upstream.
Examples:
- Upstream: changes committed to the GCC/LLVM/Linux trunks that have been stable for at least a few days.
- Upstream: changes merged into official GitHub/GitLab repositories, with the same stability requirement.
- Public: branches in upstream repositories that will not (necessarily) make it into a future release. This includes all "linaro-local", "linaro-dev" and feature branches.
- Public: changes merged in GitHub/GitLab forked repositories (your own or someone else's).
- Public: all changes in Linaro's Git server / GitHub repositories (unless they're the official repos).
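The rule above can be read as a simple decision procedure. The Python sketch below is purely illustrative: the `Change` fields, the `classify` helper and the three-day `MIN_STABLE_DAYS` threshold are hypothetical (the policy only says "a few days"), and it assumes the change is already public.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical threshold: the policy only says "a few days".
MIN_STABLE_DAYS = 3


@dataclass
class Change:
    repo_is_official: bool        # official upstream repo, not a fork or a Linaro mirror
    branch_is_release_line: bool  # trunk/main or a branch that feeds a release
    landed_on: Optional[date]     # None if the change has not been merged
    reverted: bool = False


def classify(change: Change, today: date) -> str:
    """Return "upstream" or "public" for a change that is already public."""
    if (change.repo_is_official
            and change.branch_is_release_line
            and change.landed_on is not None
            and not change.reverted
            and today - change.landed_on >= timedelta(days=MIN_STABLE_DAYS)):
        return "upstream"
    # Forks, feature branches, fresh or reverted commits are only public.
    return "public"


today = date(2024, 1, 10)
print(classify(Change(True, True, date(2024, 1, 2)), today))   # upstream
print(classify(Change(True, False, date(2024, 1, 2)), today))  # public (feature branch)
print(classify(Change(False, True, date(2024, 1, 2)), today))  # public (fork)
```

Everything that fails any single check falls back to "public", which mirrors the point that reaching a fork or Linaro's own server is not the end of the road.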
This distinction matters because we must not stop once our changes
reach Linaro's Git server or GitHub. If we stop there, the changes are
not upstream, they will never reach a public release of any tool, and
Linaro will *not* maintain them locally for any period of time.
In short, any work that does not go upstream will be lost.
Linaro Releases
In contrast to the upstream policy, Linaro releases a lot of tools
with changes that are not in upstream releases yet. But there is no
paradox here. All those changes are in some upstream repository
(trunk or next release), and will reach an upstream release soon.
All the patches we write for the kernel, toolchain and Android go
upstream first (usually into trunk), and are then backported into
older or internal releases. In the same way, we also backport other
people's patches into our internal releases.
This means that when a Linaro kernel works with a board today, an
upstream kernel is guaranteed to work with it in the same way. Nothing
is kept exclusively inside Linaro.
Steps to upstream
There are two main ways to work upstream at Linaro, depending on what
you are doing.
If you are fixing bugs, working on trunk features or refactoring
upstream code, you *have* to work directly with the upstream
community. All patches go there first, nothing stays in Linaro for any
period of time. This is important because upstream trees change very
rapidly, and we don't have validation infrastructure to cope with all
those changes.
If you are developing a new tool, a major change or an experimental
feature, you may keep your changes local (on Linaro's Git server, on
GitHub, or in a branch of the upstream repository). Where they live
depends on what you're trying to do, but they *have* to be public
(unless they're under NDA).
In the second case, once it's time to send upstream, you will
invariably have to rework the patches to make them acceptable to the
project you're aiming at. Normally, that means: rebasing, splitting or
merging patches into logical change-sets, and adding detailed,
meaningful commit messages, comments, documentation, etc.
However, you *must* be aware of the "public but not upstream" perils:
- Just because it's public doesn't mean people are looking at it
- Just because it works doesn't mean the upstream community will want it
- Just because it's a good solution doesn't mean it will fit the upstream plans/goals
Upstream communities are full of stories about excellent code that
was rejected because no one wanted it, because it wouldn't work on
other targets, or because the author overlooked one tiny problem that
invalidated his/her whole approach.
The important takeaway is: the longer it takes for you to go
upstream, the harder it *can* be.
Developing locally
If you're developing a local tool or a large new feature, you'll want
to keep things in our repository. But that also means you should push
your changes frequently rather than keeping them on your machine or
your company's server. In other words, when working at Linaro you
should *not* keep any company-local repository; work directly on
Linaro's.
This has a number of benefits:
1. It backs up your code in multiple locations (we have lots of mirrors).
2. Other people in your group can see your work, use it, comment on
it, propose changes to it.
3. Your tech-lead / manager can follow your progress, identify
bottlenecks, and more easily help you.
4. You can point upstream users to it, to make them aware of your proposal.
5. A tree that changes constantly is a good tree. No one upstream
likes a code dump (a tree with a single huge commit).
These practices are the first step towards a healthy team. If we are
all open about our work, we can see each other's progress, solutions
and mistakes, and we can all learn from the experience.
For example, this simple repository contains typos, errors and silly
changes, but also a good record of progress:
https://git.linaro.org/leg/hpc/ohpc-scripts.git/log/
This is perfectly fine: it's better to see every change as it comes
than to see one dump every month.
Collaboration
As we evolve as a team and start working on cross-project efforts
(OpenHPC, LLVM, GCC), we'll need to start using the tools we have at
our disposal. For now we're mainly working on separate things, and
that's fine, but soon enough there will be a time when we'll have to
collaborate, and having a thriving local community will make that
possible.
A few tools that we'll start using soon:
Gerrit: https://review.linaro.org/
Code review tool for proposing Linaro-local patches. It can also be
integrated with Jenkins for pre-commit testing.
Jenkins: https://ci.linaro.org/
Validation automation that can build much of the software we have at
Linaro. For example, for LLVM or GCC changes
(https://ci.linaro.org/view/tcwg-ci/), we have jobs that will build,
test and even package a tarball for you to run further tests.
Bugzilla: https://bugs.linaro.org/
For now, we're only using Jira to track our progress, but as soon as
we start shipping tools and distributions (for example, ERP CentOS for
HPC), we'll start receiving bug reports there.
Benchmarks: TBD
We're in talks with the toolchain team to collaborate on running
benchmarks on both their infrastructure and ours, to have a single,
reproducible way to track performance.
Mailing Lists
Finally, the last point is about where to post your comments and questions.
A good rule of thumb is to ask *all* questions on public mailing
lists, unless it doesn't make sense.
For example, questions about Linaro's infrastructure, roadmap
planning or conference sessions are only meaningful to us internally,
so posting them to hpc-sig-devel is enough.
But questions about broken tools, configurations, or difficulties
using or changing source code should all go upstream, to the
respective communities' mailing lists.
Most importantly, *every* upstream effort (sending code, changes,
documentation) *must* go through the upstream mailing lists or pull
requests. This is a requirement of all open source projects we work with.
Feel free to copy specific people from the group (for example, many
people at Linaro copy me on emails to the LLVM list).
But *do not* copy a Linaro *list* into an upstream mailing list. This
is known as "cross-posting" and is usually very bad form, because not
everyone on one list has access to the other, and every reply ends up
producing errors on both sides.
So, if you're going to copy people, copy them directly.
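As an illustration of the difference between copying people and copying lists, here is a small Python sketch that flags Linaro lists among the recipients of an upstream post. It assumes Linaro lists are hosted under `lists.linaro.org`; the helper names and all example addresses are hypothetical.

```python
from typing import Iterable, List

# Assumption: Linaro mailing lists live under this domain; adjust if needed.
LINARO_LIST_DOMAIN = "lists.linaro.org"


def is_linaro_list(address: str) -> bool:
    """True if the address looks like a Linaro mailing list, not a person."""
    return address.strip().lower().endswith("@" + LINARO_LIST_DOMAIN)


def cross_posted_lists(recipients: Iterable[str]) -> List[str]:
    """Return the Linaro lists among the recipients of an upstream post.

    Individual people (e.g. someone@linaro.org) are fine to copy; lists are not.
    """
    return [addr for addr in recipients if is_linaro_list(addr)]


recipients = [
    "dev@upstream.example.org",        # hypothetical upstream list
    "jane.doe@linaro.org",             # hypothetical colleague: fine to copy
    "hpc-sig-devel@lists.linaro.org",  # Linaro list: do not copy upstream
]
print(cross_posted_lists(recipients))
# ['hpc-sig-devel@lists.linaro.org']
```

Matching on the list domain rather than on `linaro.org` keeps individual colleagues allowed while still catching the lists.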