Upstream Work

To work upstream, Linaro tries to follow the rules of each community. Those rules differ widely, but they all share the same principles.

This document is an attempt to describe some of those principles, and how Linaro works with the communities to provide value to its members while remaining relevant in those communities.

The Basics

Linaro's motto is "upstream first". This means that *anything* we distribute must first have been accepted upstream by the relevant community.

We may keep local changes for a limited period when one or more of these apply:

  • We have an NDA with a company and the change concerns hardware covered by that agreement.
  • The community hasn't accepted the change yet, but is in the process of doing so.
  • It's an experimental feature that we're trying to convince communities to take in.
  • It's related to Linaro's own infrastructure and will never go upstream.

Apart from the last case, all such changes have a short life inside Linaro, and we should push as hard as we can to get them upstream. Experiments that don't get accepted upstream should be thrown away.

We do not have the resources to sustain internal projects for too long, and we cannot release features that haven't gone upstream.

What is "public"

Everything that is visible to non-Linaro members is public. But not everything that is public is upstream. Moreover, not everything that is in an upstream repository is considered "upstream" for our purposes.

We consider a patch to be upstream once it has successfully landed in an upstream repository and runs little risk of being reverted or having its behaviour severely altered. The change must also be either in an existing public release or certain to be included in a future release.

Everything else is public, but not upstream.

Examples:

  • Upstream: changes committed to the GCC/LLVM/Linux trunks that have been stable for at least a few days.
  • Upstream: changes merged into official GitHub/GitLab repositories, same as above.
  • Public: branches in upstream repositories that will not (necessarily) make it into a future release. This includes all "linaro-local", "linaro-dev" and feature branches.
  • Public: changes merged in GitHub/GitLab forked repositories (your own or someone else's).
  • Public: all changes in Linaro's Git server / GitHub repositories (unless they're the official repos).

This distinction is important so that we don't stop once we reach Linaro's Git server or GitHub. If we stop there, the changes are not upstream, they will never reach a public release of any tool, and Linaro will *not* maintain them locally for any period of time.

Essentially, all work will be lost if it's not upstream.

Linaro Releases

In contrast to the upstream policy, Linaro releases many tools with changes that are not yet in upstream releases. But there is no paradox here. All those changes are in some upstream repository (trunk or the next release), and will reach an upstream release soon.

All the patches we submit to the kernel, the toolchain and Android go upstream first (usually into trunk), and we then backport them into older or internal releases. Likewise, we backport other people's patches into our internal releases.

This means a Linaro kernel that works with a board today is guaranteed to lead to an upstream kernel that works in the same way. Nothing is kept exclusively inside Linaro.

Steps to upstream

There are two main ways to work upstream while at Linaro, depending on what you are doing.

If you are fixing bugs, working on trunk features or refactoring upstream code, you *have* to work directly with the upstream community. All patches go there first, nothing stays in Linaro for any period of time. This is important because upstream trees change very rapidly, and we don't have validation infrastructure to cope with all those changes.

If you are developing a new tool, a major change or an experimental feature, you may keep your changes local (in Linaro's Git server, on GitHub, or in a branch of the upstream repository). Where they live depends on what you're trying to do, but they *have* to be public (unless they're under NDA).

In the second case, once it's time to send the work upstream, you will invariably have to rework the patches to make them acceptable to the project you're targeting. Normally, that means rebasing, splitting or merging patches into logical change sets, and adding long, meaningful commit messages, comments, documentation, etc.

However, you *must* be aware of the "public but not upstream" perils:

  • Just because it's public doesn't mean people are looking at it.
  • Just because it works doesn't mean the upstream community will want it.
  • Just because it's a good solution doesn't mean it will fit well with the upstream plans/goals.

Upstream communities are filled with stories about excellent code that got rejected because no one wanted it, because it wouldn't work on other targets, or because the author didn't think about one tiny problem that invalidated the whole approach.

The important takeaway is this: the longer it takes you to go upstream, the harder it *can* be.

Developing locally

If you're developing a local tool or a large new feature, you'll want to keep things in our repository. But that also means you should push your changes constantly rather than keep them on your machine or your company's server. In other words, when working at Linaro, you should *not* keep a company-local repository; work directly on Linaro's.

There are a number of benefits of doing so:

  1. It backs up your code in multiple locations (we have lots of mirrors).
  2. Other people in your group can see your work, use it, comment on it, propose changes to it.
  3. Your tech-lead / manager can follow your progress, identify bottlenecks, and more easily help you.
  4. You can point upstream users to it, to make them aware of your proposal.
  5. A tree that changes constantly is a good tree. No one upstream likes a code dump (a tree with a single commit).

These practices are the first step to a healthy team. If we are all open about our work, we can see each other's progress, solutions and mistakes, and we can all learn from the experience.

For example, this simple repository contains typos, errors and silly changes, but also a good record of progress:

https://git.linaro.org/leg/hpc/ohpc-scripts.git/log/

This is perfectly fine: it's better to see all changes as they come than to see one dump every month.

Collaboration

As we evolve as a team and start working across projects (OpenHPC, LLVM, GCC), we'll need to start using the tools at our disposal. For now, we're mainly doing separate things, and that's fine, but soon enough we'll have to collaborate, and a thriving local community will make that possible.

A few tools that we'll start using soon...

Gerrit: https://review.linaro.org/

A code review tool for proposing Linaro-local patches. It can also be integrated with Jenkins for pre-commit testing.

Jenkins: https://ci.linaro.org/

Validation automation that can build much of the software we have at Linaro. For example, for LLVM or GCC changes (https://ci.linaro.org/view/tcwg-ci/), we have jobs that will build, test and even package a tarball for you to run further tests.

Bugzilla: https://bugs.linaro.org/

For now, we're only using Jira to track our progress, but as soon as we start shipping tools and distributions (for example, ERP CentOS for HPC), we'll start receiving bug reports in Bugzilla.

Benchmarks: TBD

We're in talks with the toolchain team about running benchmarks on both their infrastructure and ours, to have a single, reproducible way to track performance.

Mailing Lists

Finally, the last point is about where to post your comments and questions.

A good rule of thumb is to ask *all* questions on public mailing lists, unless it doesn't make sense.

For example, questions about Linaro's infrastructure, roadmap planning or conference sessions are only meaningful to us, so posting them to hpc-sig-devel should be enough.

But questions about broken tools, configurations, or difficulties in using or changing source code should all go upstream, to the respective communities' mailing lists.

Most importantly, *every* upstream effort (sending code, changes or documentation) *must* go through the upstream mailing lists or pull requests. This is a requirement of all the open source projects we work with.

Feel free to copy specific people from the group (for example, many people at Linaro copy me on emails to the LLVM list).

But *do not* copy a Linaro *list* on an email to an upstream mailing list. This is known as "cross-posting" and is usually very bad form: not everyone on one list has access to the other, so every reply ends up generating errors on all sides.

So, if you're going to copy people, copy them directly.