- ==============================
- Moving LLVM Projects to GitHub
- ==============================
- Current Status
- ==============
- We are planning to complete the transition to GitHub by Oct 21, 2019. See
- the GitHub migration `status page <https://llvm.org/GitHubMigrationStatus.html>`_
- for the latest updates and instructions for how to migrate your workflows.
- .. contents:: Table of Contents
- :depth: 4
- :local:
- Introduction
- ============
- This is a proposal to move our current revision control system from our own
- hosted Subversion to GitHub. Below are the financial and technical arguments as
- to why we are proposing such a move and how people (and validation
- infrastructure) will continue to work with a Git-based LLVM.
- What This Proposal is *Not* About
- =================================
- Changing the development policy.
- This proposal relates only to moving the hosting of our source-code repository
- from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
- using GitHub's issue tracker, pull-requests, or code-review.
- Contributors will continue to earn commit access on demand under the Developer
- Policy, except that a GitHub account will be required instead of an SVN
- username/password-hash.
- Why Git, and Why GitHub?
- ========================
- Why Move At All?
- ----------------
- This discussion began because we currently host our own Subversion server
- and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
- provides limited support, but there is only so much it can do.
- Volunteers are not sysadmins themselves, but compiler engineers that happen
- to know a thing or two about hosting servers. We also don't have 24/7 support,
- and we sometimes wake up to see that continuous integration is broken because
- the SVN server is either down or unresponsive.
- We should take advantage of one of the services out there (GitHub, GitLab,
- and BitBucket, among others) that offer better service (24/7 stability, disk
- space, Git server, code browsing, forking facilities, etc.) for free.
- Why Git?
- --------
- Many new coders nowadays start with Git, and a lot of people have never used
- SVN, CVS, or anything else. Websites like GitHub have changed the landscape
- of open source contributions, reducing the cost of first contribution and
- fostering collaboration.
- Git is also the version control system many LLVM developers use. Despite the
- sources being stored in an SVN server, these developers are already using Git
- through the Git-SVN integration.
- Git allows you to:
- * Commit, squash, merge, and fork locally without touching the remote server.
- * Maintain local branches, enabling multiple threads of development.
- * Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
- * Inspect the repository history (blame, log, bisect) without Internet access.
- * Maintain remote forks and branches on Git hosting services and
- integrate back to the main repository.
- In addition, because Git seems to be replacing many OSS projects' version
- control systems, there are many tools that are built over Git.
- Future tooling may support Git first (if not only).
- Why GitHub?
- -----------
- GitHub, like GitLab and BitBucket, provides free code hosting for open source
- projects. Any of these could replace the code-hosting infrastructure that we
- have today.
- These services also have a dedicated team to monitor, migrate, improve and
- distribute the contents of the repositories depending on region and load.
- GitHub has one important advantage over GitLab and
- BitBucket: it offers read-write **SVN** access to the repository
- (https://github.com/blog/626-announcing-svn-support).
- This would enable people to continue working post-migration as though our code
- were still canonically in an SVN repository.
- In addition, there are already multiple LLVM mirrors on GitHub, indicating that
- part of our community has already settled there.
- On Managing Revision Numbers with Git
- -------------------------------------
- The current SVN repository hosts all the LLVM sub-projects alongside each other.
- A single revision number (e.g. r123456) thus identifies a consistent version of
- all LLVM sub-projects.
- Git does not use a sequential integer revision number but instead uses a hash to
- identify each commit.
- The loss of a sequential integer revision number has been a sticking point in
- past discussions about Git:
- - "The 'branch' I most care about is mainline, and losing the ability to say
- 'fixed in r1234' (with some sort of monotonically increasing number) would
- be a tragic loss." [LattnerRevNum]_
- - "I like those results sorted by time and the chronology should be obvious, but
- timestamps are incredibly cumbersome and make it difficult to verify that a
- given checkout matches a given set of results." [TrickRevNum]_
- - "There is still the major regression with unreadable version numbers.
- Given the amount of Bugzilla traffic with 'Fixed in...', that's a
- non-trivial issue." [JSonnRevNum]_
- - "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.
- However, Git can emulate this increasing revision number:
- ``git rev-list --count <commit-hash>``. This identifier is unique only
- within a single branch, but this means the tuple `(num, branch-name)` uniquely
- identifies a commit.
- We can thus use this revision number to ensure that e.g. `clang -v` reports a
- user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
- the objections raised above with respect to this aspect of Git.
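As a minimal sketch of the emulation described above, the shell function below prints a ``<branch>-<count>`` identifier for a commit. The helper name ``rev_id`` is hypothetical, and the output format is an assumption modelled on the ``master-12345`` example; it is not an existing LLVM tool.

```shell
#!/bin/sh
# Sketch only: rev_id is a hypothetical helper, not part of Git or LLVM.
# Prints "<branch>-<count>", where <count> is the number of commits
# reachable from the given commit (default: HEAD).
rev_id() {
    commit=${1:-HEAD}
    # Assumption: the branch name is taken from the currently checked-out
    # branch, so the commit is expected to lie on that branch.
    branch=$(git rev-parse --abbrev-ref HEAD) || return 1
    count=$(git rev-list --count "$commit") || return 1
    printf '%s-%s\n' "$branch" "$count"
}
```

Run inside a checkout, ``rev_id`` describes the current tip and ``rev_id <commit-hash>`` describes an older commit on the same branch. Because merge commits are to be disallowed, the count grows by exactly one with each commit on a branch.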
- What About Branches and Merges?
- -------------------------------
- In contrast to SVN, Git makes branching easy. Git's commit history is
- represented as a DAG, a departure from SVN's linear history. However, we propose
- to forbid merge commits in our canonical Git repository.
- Unfortunately, GitHub does not support server-side hooks to enforce such a
- policy. We must rely on the community to avoid pushing merge commits.
- GitHub offers a feature called `Status Checks`: a branch protected by
- `status checks` requires commits to be whitelisted before the push can happen.
- We could supply a pre-push hook on the client side that would run and check the
- history, before whitelisting the commit being pushed [statuschecks]_.
- However, this solution would be somewhat fragile (how do you update a script
- installed on every developer machine?) and would prevent SVN access to the
- repository.
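The history check that such a client-side hook would run can be sketched as follows. The helper name ``has_merge_commits`` is hypothetical, and the whitelisting step through the status-check API is deliberately left out; this only shows how merge commits in a range could be detected.

```shell
#!/bin/sh
# Sketch only: has_merge_commits is a hypothetical helper.
# Succeeds (exit 0) if the revision range in $1 contains a merge commit,
# fails with 1 if it does not, and with 2 if the range cannot be read.
has_merge_commits() {
    merges=$(git rev-list --merges --count "$1") || return 2
    [ "$merges" -gt 0 ]
}
```

A pre-push hook could call ``has_merge_commits origin/master..HEAD`` and refuse the push when it succeeds. The remote and branch names in that range are assumptions about the local setup.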
- What About Commit Emails?
- -------------------------
- We will need a new bot to send emails for each commit. This proposal leaves the
- email format unchanged besides the commit URL.
- Straw Man Migration Plan
- ========================
- Step #1 : Before The Move
- -------------------------
- 1. Update docs to mention the move, so people are aware of what is going on.
- 2. Set up a read-only version of the GitHub project, mirroring our current SVN
- repository.
- 3. Add the required bots to implement the commit emails, as well as the
- umbrella repository update (if the multirepo is selected) or the read-only
- Git views for the sub-projects (if the monorepo is selected).
- Step #2 : Git Move
- ------------------
- 4. Update the buildbots to pick up updates and commits from the GitHub
- repository. Not all bots have to migrate at this point, but it'll help
- provide infrastructure testing.
- 5. Update Phabricator to pick up commits from the GitHub repository.
- 6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
- increasing integer across branches [MatthewsRevNum]_.
- 7. Instruct downstream integrators to pick up commits from the GitHub
- repository.
- 8. Review and prepare an update for the LLVM documentation.
- Until this point nothing has changed for developers; the work falls mostly on
- buildbot and other infrastructure owners.
- The migration will pause here until all dependencies have cleared, and all
- problems have been solved.
- Step #3: Write Access Move
- --------------------------
- 9. Collect developers' GitHub account information, and add them to the project.
- 10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
- 11. Update the documentation.
- 12. Mirror Git to SVN.
- Step #4 : Post Move
- -------------------
- 13. Archive the SVN repository.
- 14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
- point to GitHub instead.
- GitHub Repository Description
- =============================
- Monorepo
- ----------------
- The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all
- sub-projects in a single source tree. It is often referred to as a monorepo and
- mimics an export of the current SVN repository, with each sub-project having its
- own top-level directory. Not all sub-projects are used for building toolchains.
- For example, www/ and test-suite/ are not part of the monorepo.
- Putting all sub-projects in a single checkout makes cross-project refactoring
- naturally simple:
- * New sub-projects can be trivially split out for better reuse and/or layering
- (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
- dependency on LLVM).
- * Changing an API in LLVM and upgrading the sub-projects will always be done in
- a single commit, designing away a common source of temporary build breakage.
- * Moving code across sub-projects (during refactoring for instance) in a single
- commit enables accurate `git blame` when tracking code change history.
- * Tooling based on `git grep` works natively across sub-projects, making it
- easier to find refactoring opportunities across projects (for example reusing a
- data structure initially in LLDB by moving it into libSupport).
- * Having all the sources present encourages maintaining the other sub-projects
- when changing an API.
- Finally, the monorepo maintains the property of the existing SVN repository that
- the sub-projects move synchronously, and a single revision number (or commit
- hash) identifies the state of the development across all projects.
- .. _build_single_project:
- Building a single sub-project
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Even though there is a single source tree, you are not required to build
- all sub-projects together. It is trivial to configure builds for a single
- sub-project.
- For example::
- mkdir build && cd build
- # Configure only LLVM (default)
- cmake path/to/monorepo
- # Configure LLVM and lld
- cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
- # Configure LLVM and clang
- cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang
- .. _git-svn-mirror:
- Outstanding Questions
- ---------------------
- Read-only sub-project mirrors
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- With the Monorepo, it is undecided whether the existing single-subproject
- mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to
- be maintained.
- Read/write SVN bridge
- ^^^^^^^^^^^^^^^^^^^^^
- GitHub supports a read/write SVN bridge for its repositories. However,
- there have been issues with this bridge working correctly in the past,
- so it's not clear if this is something that will be supported going forward.
- Monorepo Drawbacks
- ------------------
- * Using the monolithic repository may add overhead for those contributing to a
- standalone sub-project, particularly on runtimes like libcxx and compiler-rt
- that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
- 1GB for the monorepo), and the commit rate of LLVM may cause more frequent
- `git push` collisions when upstreaming. Affected contributors may be able to
- use the SVN bridge or the single-subproject Git mirrors. However, it's
- undecided if these projects will continue to be maintained.
- * Using the monolithic repository may add overhead for those *integrating* a
- standalone sub-project, even if they aren't contributing to it, due to the
- same disk space concern as the point above. The availability of the
- sub-project Git mirrors would address this.
- * Preservation of the existing read/write SVN-based workflows relies on the
- GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
- into GitHub and could restrict future workflow changes.
- Workflows
- ^^^^^^^^^
- * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-checkout-commit>`.
- * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
- * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
- * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
- * :ref:`Bisecting <workflow-mono-bisecting>`.
- Workflow Before/After
- =====================
- This section goes through a few examples of workflows, intended to illustrate
- how end-users or developers would interact with the repository for
- various use-cases.
- .. _workflow-checkout-commit:
- Checkout/Clone a Single Project, with Commit Access
- ---------------------------------------------------
- Currently
- ^^^^^^^^^
- ::
- # direct SVN checkout
- svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
- # or using the read-only Git view, with git-svn
- git clone http://llvm.org/git/llvm.git
- cd llvm
- git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
- git config svn-remote.svn.fetch :refs/remotes/origin/master
- git svn rebase -l # -l avoids fetching ahead of the git mirror.
- Commits are performed using `svn commit` or with the sequence `git commit` and
- `git svn dcommit`.
- .. _workflow-multicheckout-nocommit:
- Monorepo Variant
- ^^^^^^^^^^^^^^^^
- With the monorepo variant, there are a few options, depending on your
- constraints. First, you could just clone the full repository::
- git clone https://github.com/llvm/llvm-project.git
- At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
- :ref:`doesn't imply you have to build all of them <build_single_project>`. You
- can still build only compiler-rt for instance. In this way it's not different
- from someone who would check out all the projects with SVN today.
- If you want to avoid checking out all the sources, you can hide the other
- directories using a Git sparse checkout::
- git config core.sparseCheckout true
- echo /compiler-rt > .git/info/sparse-checkout
- git read-tree -mu HEAD
- The data for all sub-projects is still in your `.git` directory, but in your
- checkout, you only see `compiler-rt`.
- Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
- usual.
- Note that when you fetch you'll likely pull in changes to sub-projects you don't
- care about. If you are using sparse checkout, the files from other projects
- won't appear on your disk. The only effect is that your commit hash changes.
- You can check whether the changes in the last fetch are relevant to your commit
- by running::
- git log origin/master@{1}..origin/master -- libcxx
- This command can be hidden in a script so that `git llvmpush` would perform all
- these steps, fail only if such a dependent change exists, and show immediately
- the change that prevented the push. An immediate repeat of the command would
- (almost) certainly result in a successful push.
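A sketch of what such a ``git llvmpush`` wrapper might look like is given below. The function names are hypothetical, and the remote ``origin`` and branch ``master`` are assumptions. Instead of relying on the ``origin/master@{1}`` reflog entry, this variant records the upstream tip before fetching, so a fetch that brings nothing new is not mistaken for a relevant change.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.

# Succeeds if any commit in the range $1..$2 touches the directory $3.
subproject_touched() {
    n=$(git rev-list --count "$1..$2" -- "$3") || return 2
    [ "$n" -gt 0 ]
}

# Fetch, stop if the fetched commits touch sub-project $1, otherwise
# rebase and push. Assumes a remote "origin" with a "master" branch.
llvmpush() {
    old=$(git rev-parse origin/master) || return 1
    git fetch origin || return 1
    if subproject_touched "$old" origin/master "$1"; then
        # Show the upstream commits that blocked the push.
        git log --oneline "$old..origin/master" -- "$1"
        echo "upstream changed $1; review, then run again" >&2
        return 1
    fi
    git rebase origin/master && git push origin HEAD:master
}
```

For example, ``llvmpush libcxx`` stops and lists the new upstream libcxx commits if there are any. A second run then sees no new upstream commits, so it rebases and pushes.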
- Note that today with SVN or git-svn, this step is not possible since the
- "rebase" implicitly happens while committing (unless a conflict occurs).
- Checkout/Clone Multiple Projects, with Commit Access
- ----------------------------------------------------
- Let's look at how to assemble llvm+clang+libcxx at a given revision.
- Currently
- ^^^^^^^^^
- ::
- svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
- cd llvm/tools
- svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
- cd ../projects
- svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
- Or using git-svn::
- git clone http://llvm.org/git/llvm.git
- cd llvm/
- git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
- git config svn-remote.svn.fetch :refs/remotes/origin/master
- git svn rebase -l
- git checkout `git svn find-rev -B r258109`
- cd tools
- git clone http://llvm.org/git/clang.git
- cd clang/
- git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
- git config svn-remote.svn.fetch :refs/remotes/origin/master
- git svn rebase -l
- git checkout `git svn find-rev -B r258109`
- cd ../../projects/
- git clone http://llvm.org/git/libcxx.git
- cd libcxx
- git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
- git config svn-remote.svn.fetch :refs/remotes/origin/master
- git svn rebase -l
- git checkout `git svn find-rev -B r258109`
- Note that the list would be longer with more sub-projects.
- .. _workflow-monocheckout-multicommit:
- Monorepo Variant
- ^^^^^^^^^^^^^^^^
- The repository natively contains the source for every sub-project at the right
- revision, which makes this straightforward::
- git clone https://github.com/llvm/llvm-project.git
- cd llvm-project
- git checkout $REVISION
- As before, at this point clang, llvm, and libcxx are stored in directories
- alongside each other.
- .. _workflow-cross-repo-commit:
- Commit an API Change in LLVM and Update the Sub-projects
- --------------------------------------------------------
- Today this is possible, though uncommon (or at least undocumented), for
- Subversion users and for git-svn users. For example, few Git users try to update
- LLD or Clang in the same commit as they change an LLVM API.
- The multirepo variant does not address this: one would have to commit and push
- separately in every individual repository. It would be possible to establish a
- protocol whereby users add a special token to their commit messages that causes
- the umbrella repo's updater bot to group all of them into a single revision.
- The monorepo variant handles this natively.
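To illustrate, in a monorepo checkout one commit can carry both the LLVM API change and the matching updates in other sub-projects. The sketch below lists the top-level directories a commit touches, which is one way to confirm that a change and its fix-ups landed together. The helper name ``touched_subprojects`` is hypothetical.

```shell
#!/bin/sh
# Sketch only: touched_subprojects is a hypothetical helper.
# Prints, one per line and sorted, the first path component of every
# file changed by commit $1 (default: HEAD).
touched_subprojects() {
    git diff-tree --no-commit-id --name-only -r --root "${1:-HEAD}" |
        cut -d/ -f1 |
        sort -u
}
```

For a commit that changes an LLVM header and updates its users in clang, this prints ``clang`` and ``llvm``.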
- Branching/Stashing/Updating for Local Development or Experiments
- ----------------------------------------------------------------
- Currently
- ^^^^^^^^^
- SVN does not allow this use case, but developers who are currently using
- git-svn can do it. Let's look at what this means in practice when dealing with
- multiple sub-projects.
- To update the repository to tip of trunk::
- git pull
- cd tools/clang
- git pull
- cd ../../projects/libcxx
- git pull
- To create a new branch::
- git checkout -b MyBranch
- cd tools/clang
- git checkout -b MyBranch
- cd ../../projects/libcxx
- git checkout -b MyBranch
- To switch branches::
- git checkout AnotherBranch
- cd tools/clang
- git checkout AnotherBranch
- cd ../../projects/libcxx
- git checkout AnotherBranch
- .. _workflow-mono-branching:
- Monorepo Variant
- ^^^^^^^^^^^^^^^^
- Regular Git commands are sufficient, because everything is in a single
- repository:
- To update the repository to tip of trunk::
- git pull
- To create a new branch::
- git checkout -b MyBranch
- To switch branches::
- git checkout AnotherBranch
- Bisecting
- ---------
- Assume a developer is looking for a bug in clang (or lld, or lldb, ...).
- Currently
- ^^^^^^^^^
- SVN does not have built-in bisection support, but the single revision across
- sub-projects makes it possible to script one.
- Using the existing Git read-only view of the repositories, it is possible to use
- the native Git bisection script over the llvm repository, and use some scripting
- to synchronize the clang repository to match the llvm revision.
- .. _workflow-mono-bisecting:
- Monorepo Variant
- ^^^^^^^^^^^^^^^^
- Bisecting on the monorepo is straightforward, and very similar to the above,
- except that the bisection script does not need to include the
- `git submodule update` step.
- The same example, finding which commit introduces a regression where clang-3.9
- crashes but clang-3.8 passes, will look like::
- git bisect start releases/3.9.x releases/3.8.x
- git bisect run ./bisect_script.sh
- With the `bisect_script.sh` script being::
- #!/bin/sh
- cd $BUILD_DIR
- ninja clang || exit 125 # an exit code of 125 asks "git bisect"
- # to "skip" the current commit
- ./bin/clang some_crash_test.cpp
- Also, since the monorepo handles commits that update multiple projects, you're
- less likely to encounter a build failure where one commit changes an API in LLVM
- and a later one "fixes" the build in clang.
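One detail worth handling in a bisection script is the exit status of the test command. ``git bisect run`` treats 0 as good, 125 as skip, and 1 to 127 as bad, and it aborts on anything else. A process killed by a signal is typically reported by the shell with a status above 128. The wrapper below is a hedged sketch of one way to handle that, and the name ``run_for_bisect`` is hypothetical.

```shell
#!/bin/sh
# Sketch only: run_for_bisect is a hypothetical helper.
# Runs the given command and clamps its exit status into the range that
# "git bisect run" interprets as good (0), skip (125) or bad (1-127).
run_for_bisect() {
    "$@"
    status=$?
    # Statuses above 127 (for example death by signal) would abort the
    # bisection, so report them as "bad" instead.
    if [ "$status" -gt 127 ]; then
        return 1
    fi
    return "$status"
}
```

In a script like the one above, the last line could then read ``run_for_bisect ./bin/clang some_crash_test.cpp``.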
- Moving Local Branches to the Monorepo
- =====================================
- Suppose you have been developing against the existing LLVM git
- mirrors. You have one or more git branches that you want to migrate
- to the "final monorepo".
- The simplest way to migrate such branches is with the
- ``migrate-downstream-fork.py`` tool at
- https://github.com/jyknight/llvm-git-migration.
- Basic migration
- ---------------
- Basic instructions for ``migrate-downstream-fork.py`` are in the
- Python script and are expanded on below to a more general recipe::
- # Make a repository which will become your final local mirror of the
- # monorepo.
- mkdir my-monorepo
- git -C my-monorepo init
- # Add a remote to the monorepo.
- git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git
- # Add remotes for each git mirror you use, from upstream as well as
- # your local mirror. All projects are listed here but you need only
- # import those for which you have local branches.
- my_projects=( clang
- clang-tools-extra
- compiler-rt
- debuginfo-tests
- libcxx
- libcxxabi
- libunwind
- lld
- lldb
- llvm
- openmp
- polly )
- for p in ${my_projects[@]}; do
- git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
- git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
- done
- # Pull in all the commits.
- git -C my-monorepo fetch --all
- # Run migrate-downstream-fork to rewrite local branches on top of
- # the upstream monorepo.
- (
- cd my-monorepo
- migrate-downstream-fork.py \
- refs/remotes/local \
- refs/tags \
- --new-repo-prefix=refs/remotes/upstream/monorepo \
- --old-repo-prefix=refs/remotes/upstream/split \
- --source-kind=split \
- --revmap-out=monorepo-map.txt
- )
- # Octopus-merge the resulting local split histories to unify them.
- # Assumes local work on local split mirrors is on master (and
- # upstream is presumably represented by some other branch like
- # upstream/master).
- my_local_branch="master"
- git -C my-monorepo branch --no-track local/octopus/${my_local_branch} \
- $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \
- refs/remotes/local/split/llvm/${my_local_branch})
- git -C my-monorepo checkout local/octopus/${my_local_branch}
- subproject_branches=()
- for p in ${my_projects[@]}; do
- subproject_branch=${p}/local/monorepo/${my_local_branch}
- git -C my-monorepo branch ${subproject_branch} \
- refs/remotes/local/split/${p}/${my_local_branch}
- if [[ "${p}" != "llvm" ]]; then
- subproject_branches+=( ${subproject_branch} )
- fi
- done
- git -C my-monorepo merge ${subproject_branches[@]}
- for p in ${my_projects[@]}; do
- subproject_branch=${p}/local/monorepo/${my_local_branch}
- git -C my-monorepo branch -d ${subproject_branch}
- done
- # Create local branches for upstream monorepo branches.
- for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
- refs/remotes/upstream/monorepo); do
- upstream_branch=${ref#refs/remotes/upstream/monorepo/}
- git -C my-monorepo branch upstream/${upstream_branch} ${ref}
- done
- The above gets you to a state like the following::
-   U1 - U2 - U3 <- upstream/master
-     \   \    \
-      \   \    - Llld1 - Llld2 -
-       \   \                    \
-        \   - Lclang1 - Lclang2-- Lmerge <- local/octopus/master
-         \                      /
-          - Lllvm1 - Lllvm2-----
- Each branched component has its branch rewritten on top of the
- monorepo and all components are unified by a giant octopus merge.
- If additional active local branches need to be preserved, the above
- operations following the assignment to ``my_local_branch`` should be
- done for each branch. Ref paths will need to be updated to map the
- local branch to the corresponding upstream branch. If local branches
- have no corresponding upstream branch, then the creation of
- ``local/octopus/<local branch>`` need not use ``git-merge-base`` to
- pinpoint its root commit; it may simply be branched from the
- appropriate component branch (say, ``llvm/local_release_X``).
- Zipping local history
- ---------------------
- The octopus merge is suboptimal for many cases, because walking back
- through the history of one component leaves the other components fixed
- at a history that likely makes things unbuildable.
- Some downstream users track the order commits were made to subprojects
- with some kind of "umbrella" project that imports the project git
- mirrors as submodules, similar to the multirepo umbrella proposed
- above. Such an umbrella repository looks something like this::
-   UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master
-   |        |             |        |        |        |        |
-   Lllvm1   Llld1         Lclang1  Lclang2  Lllvm2   Llld2    Lmyproj1
- The vertical bars represent submodule updates to a particular local
- commit in the project mirror. ``UM3`` in this case is a commit of
- some local umbrella repository state that is not a submodule update,
- perhaps a ``README`` or project build script update. Commit ``UM8``
- updates a submodule of local project ``myproj``.
- The tool ``zip-downstream-fork.py`` at
- https://github.com/greened/llvm-git-migration/tree/zip can be used to
- convert the umbrella history into a monorepo-based history with
- commits in the order implied by submodule updates::
-   U1 - U2 - U3 <- upstream/master
-    \    \    \
-     \    -----\---------------                                      local/zip--.
-      \         \              \                                                |
-       - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'
- The ``U*`` commits represent upstream commits to the monorepo master
- branch. Each submodule update in the local ``UM*`` commits brought in
- a subproject tree at some local commit. The trees in the ``L*1``
- commits represent merges from upstream. These result in edges from
- the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
- The ``L*2`` commits did not do any merges from upstream.
- Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
- if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
- commit appearing after the ``Llld1`` commit would actually represent a
- clang tree *earlier* in the upstream clang history. We want the
- ``local/zip`` branch to accurately represent the state of our umbrella
- history and so the edge ``U2 -> Lclang1`` is a visual reminder of what
- clang's tree actually looks like in ``Lclang1``.
- Even so, the edge ``U3 -> Llld1`` could be problematic for future
- merges from upstream. git will think that we've already merged from
- ``U3``, and we have, except for the state of the clang tree. One
- possible mitigation strategy is to manually diff clang between ``U2``
- and ``U3`` and apply those updates to ``local/zip``. Another,
- possibly simpler strategy is to freeze local work on downstream
- branches and merge all submodules from the latest upstream before
- running ``zip-downstream-fork.py``. If downstream merged each project
- from upstream in lockstep without any intervening local commits, then
- things should be fine without any special action. We anticipate this
- to be the common case.
- The tree for ``Lclang1`` outside of clang will represent the state of
- things at ``U3`` since all of the upstream projects not participating
- in the umbrella history should be in a state respecting the commit
- ``U3``. The trees for llvm and lld should correctly represent commits
- ``Lllvm1`` and ``Llld1``, respectively.
- Commit ``UM3`` changed files not related to submodules and we need
- somewhere to put them. It is not safe in general to put them in the
- monorepo root directory because they may conflict with files in the
- monorepo. Let's assume we want them in a directory ``local`` in the
- monorepo.
- **Example 1: Umbrella looks like the monorepo**
- For this example, we'll assume that each subproject appears in its own
- top-level directory in the umbrella, just as they do in the monorepo.
- Let's also assume that we want the files in directory ``myproj`` to
- appear in ``local/myproj``.
- Given the above run of ``migrate-downstream-fork.py``, a recipe to
- create the zipped history is below::
- # Import any non-LLVM repositories the umbrella references.
- git -C my-monorepo remote add localrepo \
- https://my.local.mirror.org/localrepo.git
- git -C my-monorepo fetch localrepo
- subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
- libcxx libcxxabi libunwind lld lldb llgo llvm openmp
- parallel-libs polly pstl )
- # Import histories for upstream split projects (this was probably
- # already done for the ``migrate-downstream-fork.py`` run).
- for project in "${subprojects[@]}"; do
- git -C my-monorepo remote add upstream/split/${project} \
- https://github.com/llvm-mirror/${project}.git
- git -C my-monorepo fetch upstream/split/${project}
- done
- # Import histories for downstream split projects (this was probably
- # already done for the ``migrate-downstream-fork.py`` run).
- for project in "${subprojects[@]}"; do
- git -C my-monorepo remote add local/split/${project} \
- https://my.local.mirror.org/${project}.git
- git -C my-monorepo fetch local/split/${project}
- done
- # Import umbrella history.
- git -C my-monorepo remote add umbrella \
- https://my.local.mirror.org/umbrella.git
- git -C my-monorepo fetch umbrella
- # Put myproj in local/myproj
- echo "myproj local/myproj" > my-monorepo/submodule-map.txt
- # Rewrite history
- (
- cd my-monorepo
- zip-downstream-fork.py \
- refs/remotes/umbrella \
- --new-repo-prefix=refs/remotes/upstream/monorepo \
- --old-repo-prefix=refs/remotes/upstream/split \
- --revmap-in=monorepo-map.txt \
- --revmap-out=zip-map.txt \
- --subdir=local \
- --submodule-map=submodule-map.txt \
- --update-tags
- )
- # Create the zip branch (assuming umbrella master is wanted).
- git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
- Note that if the umbrella has submodules to non-LLVM repositories,
- ``zip-downstream-fork.py`` needs to know about them to be able to
- rewrite commits. That is why the first step above is to fetch commits
- from such repositories.
- With ``--update-tags`` the tool will migrate annotated tags pointing
- to submodule commits that were inlined into the zipped history. If
- the umbrella pulled in an upstream commit that happened to have a tag
- pointing to it, that tag will be migrated, which is almost certainly
- not what is wanted. The tag can always be moved back to its original
- commit after rewriting, or the ``--update-tags`` option may be
- omitted and any local tags then migrated manually.
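To make it easier to spot tags that moved unexpectedly, one option is to record where each annotated tag points before running the tool. The sketch below is an illustration only: the function name ``list_annotated_tags`` and the file name ``tags-before.txt`` are hypothetical and not part of the migration tools.

```shell
# Hypothetical helper: print "<tag-name> <target-object>" for every
# annotated tag in a repository. Lightweight tags are skipped because
# their object type is "commit" rather than "tag".
# Usage: list_annotated_tags <repo>
list_annotated_tags() {
  local repo="$1"
  git -C "$repo" for-each-ref \
      --format='%(objecttype) %(refname:strip=2) %(*objectname)' refs/tags |
    awk '$1 == "tag" { print $2, $3 }'
}

# Example: save the mapping before rewriting history.
#   list_annotated_tags my-monorepo > tags-before.txt
```

Running the same function after the rewrite and comparing the two listings shows which tags were migrated.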
- **Example 2: Nested sources layout**
- The tool handles nested submodules (e.g. llvm is a submodule in
- umbrella and clang is a submodule in llvm). The file
- ``submodule-map.txt`` is a list of pairs, one per line. The first
- pair item describes the path to a submodule in the umbrella
- repository. The second pair item describes the path where trees for
- that submodule should be written in the zipped history.
- Let's say your umbrella repository is actually the llvm repository and
- it has submodules in the "nested sources" layout (clang in
- tools/clang, etc.). Let's also say ``projects/myproj`` is a submodule
- pointing to some downstream repository. The submodule map file should
- look like this (we still want myproj mapped the same way as
- previously)::
- tools/clang clang
- tools/clang/tools/extra clang-tools-extra
- projects/compiler-rt compiler-rt
- projects/debuginfo-tests debuginfo-tests
- projects/libclc libclc
- projects/libcxx libcxx
- projects/libcxxabi libcxxabi
- projects/libunwind libunwind
- tools/lld lld
- tools/lldb lldb
- projects/openmp openmp
- tools/polly polly
- projects/myproj local/myproj
- If a submodule path does not appear in the map, the tool assumes it
- should be placed in the same place in the monorepo. That means if you
- use the "nested sources" layout in your umbrella, you *must* provide
- map entries for all of the projects in your umbrella (except llvm).
- Otherwise trees from submodule updates will appear underneath llvm in
- the zipped history.
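A quick way to catch missing map entries is to compare the submodule paths recorded in a ``.gitmodules`` file against the first column of the map. The following is a minimal sketch, assuming whitespace-separated map lines and submodule paths without spaces; the function name ``unmapped_submodules`` is hypothetical.

```shell
# Hypothetical helper: print every submodule path listed in a .gitmodules
# file that has no entry in the submodule map.
# Usage: unmapped_submodules <path/to/.gitmodules> <path/to/submodule-map.txt>
unmapped_submodules() {
  local gitmodules="$1" map="$2"
  git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' |
    awk '{ print $2 }' |
    while read -r path; do
      # The first field of each map line is the submodule path in the umbrella.
      awk -v p="$path" '$1 == p { found = 1 } END { exit !found }' "$map" ||
        echo "$path"
    done
}

# Example:
#   unmapped_submodules umbrella/.gitmodules my-monorepo/submodule-map.txt
```

This only inspects one ``.gitmodules`` file at one commit. Nested submodules (such as ``tools/clang/tools/extra``) are recorded in the nested repository's own ``.gitmodules`` and would need to be checked separately.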
- Because llvm is itself the umbrella, we use ``--subdir`` to write its
- content into ``llvm`` in the zipped history::
- # Import any non-LLVM repositories the umbrella references.
- git -C my-monorepo remote add localrepo \
- https://my.local.mirror.org/localrepo.git
- git -C my-monorepo fetch localrepo
- subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
- libcxx libcxxabi libunwind lld lldb llgo llvm openmp
- parallel-libs polly pstl )
- # Import histories for upstream split projects (this was probably
- # already done for the ``migrate-downstream-fork.py`` run).
- for project in "${subprojects[@]}"; do
- git -C my-monorepo remote add upstream/split/${project} \
- https://github.com/llvm-mirror/${project}.git
- git -C my-monorepo fetch upstream/split/${project}
- done
- # Import histories for downstream split projects (this was probably
- # already done for the ``migrate-downstream-fork.py`` run).
- for project in "${subprojects[@]}"; do
- git -C my-monorepo remote add local/split/${project} \
- https://my.local.mirror.org/${project}.git
- git -C my-monorepo fetch local/split/${project}
- done
- # Import umbrella history. We want this under a different refspec
- # so zip-downstream-fork.py knows what it is.
- git -C my-monorepo remote add umbrella \
- https://my.local.mirror.org/llvm.git
- git -C my-monorepo fetch umbrella
- # Create the submodule map.
- echo "tools/clang clang" > my-monorepo/submodule-map.txt
- echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
- echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
- echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
- echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
- echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
- echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
- echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
- echo "tools/lld lld" >> my-monorepo/submodule-map.txt
- echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
- echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
- echo "tools/polly polly" >> my-monorepo/submodule-map.txt
- echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt
- # Rewrite history
- (
- cd my-monorepo
- zip-downstream-fork.py \
- refs/remotes/umbrella \
- --new-repo-prefix=refs/remotes/upstream/monorepo \
- --old-repo-prefix=refs/remotes/upstream/split \
- --revmap-in=monorepo-map.txt \
- --revmap-out=zip-map.txt \
- --subdir=llvm \
- --submodule-map=submodule-map.txt \
- --update-tags
- )
- # Create the zip branch (assuming umbrella master is wanted).
- git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
- Comments at the top of ``zip-downstream-fork.py`` describe in more
- detail how the tool works and various implications of its operation.
- Importing local repositories
- ----------------------------
- You may have additional repositories that integrate with the LLVM
- ecosystem, essentially extending it with new tools. If such
- repositories are tightly coupled with LLVM, it may make sense to
- import them into your local mirror of the monorepo.
- If such repositories participated in the umbrella repository used
- during the zipping process above, they will automatically be added to
- the monorepo. For downstream repositories that don't participate in
- an umbrella setup, the ``import-downstream-repo.py`` tool at
- https://github.com/greened/llvm-git-migration/tree/import can help with
- getting them into the monorepo. A recipe follows::
- # Import downstream repo history into the monorepo.
- git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
- git -C my-monorepo fetch myrepo
- my_local_tags=( refs/tags/release
- refs/tags/hotfix )
- (
- cd my-monorepo
- import-downstream-repo.py \
- refs/remotes/myrepo \
- ${my_local_tags[@]} \
- --new-repo-prefix=refs/remotes/upstream/monorepo \
- --subdir=myrepo \
- --tag-prefix="myrepo-"
- )
- # Preserve release branches.
- for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
- refs/remotes/myrepo/release); do
- branch=${ref#refs/remotes/myrepo/}
- git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
- done
- # Preserve master.
- git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master
- # Merge master.
- git -C my-monorepo checkout local/zip/master # Or local/octopus/master
- git -C my-monorepo merge myrepo/master
- You may want to merge other corresponding branches, for example
- ``myrepo`` release branches if they were in lockstep with LLVM project
- releases.
- ``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
- annotated tags with the given prefix. Due to limitations with
- ``fast_filter_branch.py``, unannotated tags cannot be renamed
- (``fast_filter_branch.py`` considers them branches, not tags). Since
- the upstream monorepo had its tags rewritten with an "llvmorg-"
- prefix, name conflicts should not be an issue. ``--tag-prefix`` can
- be used to more clearly indicate which tags correspond to various
- imported repositories.
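Since unannotated tags are not renamed by the tool, they could be renamed by hand after the import. The sketch below is one possible approach, not part of the migration tools; the function name ``prefix_lightweight_tags`` is hypothetical, and it assumes any tag whose object type is ``commit`` is a lightweight tag that should receive the prefix.

```shell
# Hypothetical helper: give every lightweight (unannotated) tag a prefix.
# Usage: prefix_lightweight_tags <repo> <prefix>
prefix_lightweight_tags() {
  local repo="$1" prefix="$2" tags
  # Snapshot the tag list first so tags created below are not revisited.
  tags=$(git -C "$repo" for-each-ref \
           --format='%(objecttype) %(refname:strip=2)' refs/tags)
  printf '%s\n' "$tags" |
    while read -r type name; do
      # Annotated tags have object type "tag"; leave those alone.
      [ "$type" = "commit" ] || continue
      # Skip tags that already carry the prefix.
      case "$name" in "$prefix"*) continue ;; esac
      git -C "$repo" tag "${prefix}${name}" "refs/tags/${name}" &&
        git -C "$repo" tag -d "$name" >/dev/null
    done
}

# Example:
#   prefix_lightweight_tags my-monorepo myrepo-
```

The old tag is deleted only if the new one was created successfully, so a name collision leaves the original tag in place.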
- Given this repository history::
- R1 - R2 - R3 <- master
- ^
- |
- release/1
- The above recipe results in a history like this::
- U1 - U2 - U3 <- upstream/master
- \ \ \
- \ -----\--------------- local/zip--.
- \ \ \ |
- - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
- /
- R1 - R2 - R3 <-.
- ^ |
- | |
- myrepo-release/1 |
- |
- myrepo/master--'
- Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
- from ``myrepo``. If you require commits from ``myrepo`` to be
- interleaved with commits on local project branches (for example,
- interleaved with ``Lllvm1``, ``Lllvm2``, etc. above) and myrepo doesn't
- appear in an umbrella repository, a new tool will need to be
- developed. Creating such a tool would involve:
- 1. Modifying ``fast_filter_branch.py`` to optionally take a
- revlist directly rather than generating it itself
- 2. Creating a tool to generate an interleaved ordering of local
- commits based on some criteria (``zip-downstream-fork.py`` uses the
- umbrella history as its criterion)
- 3. Generating such an ordering and feeding it to
- ``fast_filter_branch.py`` as a revlist
- Some care will also likely need to be taken to handle merge commits,
- to ensure the parents of such commits migrate correctly.
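As an illustration of step 2 only, the sketch below orders the commits reachable from several refs by committer timestamp. Committer date is just one possible criterion and is an assumption made here for the example; the function name ``interleaved_revlist`` is hypothetical, and nothing below feeds the result to ``fast_filter_branch.py``, which would still need the modification described in step 1.

```shell
# Hypothetical helper: print the commits reachable from the given refs,
# oldest committer date first, one hash per line.
# Usage: interleaved_revlist <repo> <ref>...
interleaved_revlist() {
  local repo="$1"
  shift
  # --topo-order --reverse lists parents before children; the stable sort
  # (-s) keeps that relative order for commits with equal timestamps.
  git -C "$repo" log --topo-order --reverse --format='%ct %H' "$@" |
    sort -s -n -k1,1 |
    awk '{ print $2 }'
}

# Example:
#   interleaved_revlist my-monorepo local/zip/master myrepo/master
```

If a child commit carries an earlier committer date than its parent, this ordering places the child first, so a real tool would need an additional check to keep parents ahead of their children.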
- Scrubbing the Local Monorepo
- ----------------------------
- Once all of the migrating, zipping and importing is done, it's time to
- clean up. The Python tools use ``git-fast-import``, which leaves a lot
- of cruft around, and we want to shrink our new monorepo mirror as much
- as possible. Here is one way to do it::
- git -C my-monorepo checkout master
- # Delete branches we no longer need. Do this for any other branches
- # you merged above.
- git -C my-monorepo branch -D local/zip/master || true
- git -C my-monorepo branch -D local/octopus/master || true
- # Remove remotes.
- git -C my-monorepo remote remove upstream/monorepo
- for p in ${my_projects[@]}; do
- git -C my-monorepo remote remove upstream/split/${p}
- git -C my-monorepo remote remove local/split/${p}
- done
- git -C my-monorepo remote remove localrepo
- git -C my-monorepo remote remove umbrella
- git -C my-monorepo remote remove myrepo
- # Add anything else here you don't need. refs/tags/release is
- # listed below assuming tags have been rewritten with a local prefix.
- # If not, remove it from this list.
- refs_to_clean=(
- refs/original
- refs/remotes
- refs/tags/backups
- refs/tags/release
- )
- git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} |
- xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d
- git -C my-monorepo reflog expire --all --expire=now
- # fast_filter_branch.py might have gc running in the background.
- while ! git -C my-monorepo \
- -c gc.reflogExpire=0 \
- -c gc.reflogExpireUnreachable=0 \
- -c gc.rerereresolved=0 \
- -c gc.rerereunresolved=0 \
- -c gc.pruneExpire=now \
- gc --prune=now; do
- continue
- done
- # Takes a LOOOONG time!
- git -C my-monorepo repack -A -d -f --depth=250 --window=250
- git -C my-monorepo prune-packed
- git -C my-monorepo prune
- You should now have a trim monorepo. Upload it to your git server and
- happy hacking!
- References
- ==========
- .. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
- .. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
- .. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
- .. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
- .. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/