==============================
Moving LLVM Projects to GitHub
==============================

Current Status
==============

We are planning to complete the transition to GitHub by Oct 21, 2019. See
the GitHub migration `status page <https://llvm.org/GitHubMigrationStatus.html>`_
for the latest updates and instructions for how to migrate your workflows.

.. contents:: Table of Contents
  :depth: 4
  :local:

Introduction
============

This is a proposal to move our current revision control system from our own
hosted Subversion to GitHub. Below are the financial and technical arguments as
to why we are proposing such a move and how people (and validation
infrastructure) will continue to work with a Git-based LLVM.

What This Proposal is *Not* About
=================================

Changing the development policy.

This proposal relates only to moving the hosting of our source-code repository
from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
using GitHub's issue tracker, pull-requests, or code-review.

Contributors will continue to earn commit access on demand under the Developer
Policy, except that a GitHub account will be required instead of an SVN
username/password-hash.

Why Git, and Why GitHub?
========================

Why Move At All?
----------------

This discussion began because we currently host our own Subversion server
and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
provides limited support, but there is only so much it can do.

Volunteers are not sysadmins themselves, but compiler engineers that happen
to know a thing or two about hosting servers. We also don't have 24/7 support,
and we sometimes wake up to see that continuous integration is broken because
the SVN server is either down or unresponsive.

We should take advantage of one of the services out there (GitHub, GitLab,
and BitBucket, among others) that offer better service (24/7 stability, disk
space, Git server, code browsing, forking facilities, etc.) for free.

Why Git?
--------

Many new coders nowadays start with Git, and a lot of people have never used
SVN, CVS, or anything else. Websites like GitHub have changed the landscape
of open source contributions, reducing the cost of first contribution and
fostering collaboration.

Git is also the version control system many LLVM developers use. Despite the
sources being stored in an SVN server, these developers are already using Git
through the Git-SVN integration.

Git allows you to:

* Commit, squash, merge, and fork locally without touching the remote server.
* Maintain local branches, enabling multiple threads of development.
* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
* Inspect the repository history (blame, log, bisect) without Internet access.
* Maintain remote forks and branches on Git hosting services and
  integrate back to the main repository.

In addition, because Git seems to be replacing many OSS projects' version
control systems, there are many tools that are built over Git.
Future tooling may support Git first (if not only).

Why GitHub?
-----------

GitHub, like GitLab and BitBucket, provides free code hosting for open source
projects. Any of these could replace the code-hosting infrastructure that we
have today.

These services also have a dedicated team to monitor, migrate, improve and
distribute the contents of the repositories depending on region and load.

GitHub has one important advantage over GitLab and
BitBucket: it offers read-write **SVN** access to the repository
(https://github.com/blog/626-announcing-svn-support).
This would enable people to continue working post-migration as though our code
were still canonically in an SVN repository.

In addition, there are already multiple LLVM mirrors on GitHub, indicating that
part of our community has already settled there.

On Managing Revision Numbers with Git
-------------------------------------

The current SVN repository hosts all the LLVM sub-projects alongside each other.
A single revision number (e.g. r123456) thus identifies a consistent version of
all LLVM sub-projects.

Git does not use sequential integer revision numbers but instead uses a hash to
identify each commit.

The loss of a sequential integer revision number has been a sticking point in
past discussions about Git:

- "The 'branch' I most care about is mainline, and losing the ability to say
  'fixed in r1234' (with some sort of monotonically increasing number) would
  be a tragic loss." [LattnerRevNum]_
- "I like those results sorted by time and the chronology should be obvious, but
  timestamps are incredibly cumbersome and make it difficult to verify that a
  given checkout matches a given set of results." [TrickRevNum]_
- "There is still the major regression with unreadable version numbers.
  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
  non-trivial issue." [JSonnRevNum]_
- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.

However, Git can emulate this increasing revision number:
``git rev-list --count <commit-hash>``. This identifier is unique only
within a single branch, but this means the tuple `(num, branch-name)` uniquely
identifies a commit.

We can thus use this revision number to ensure that e.g. `clang -v` reports a
user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
the objections raised above with respect to this aspect of Git.

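As a rough illustration of the ``(num, branch-name)`` idea, the sketch below
builds a label of the form ``master-12345`` from ``git rev-list --count``. The
function names ``format_revision`` and ``revision_label`` are hypothetical and
not part of any LLVM tooling, and the sketch assumes a linear history, so that
the count grows by one with every commit on the branch.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of LLVM's tooling.

# format_revision BRANCH COUNT
# Join a branch name and a commit count into a label such as "master-12345".
format_revision() {
    printf '%s-%s\n' "$1" "$2"
}

# revision_label [COMMIT] [BRANCH]
# Count the commits reachable from COMMIT (default: HEAD) and pair the count
# with BRANCH (default: the abbreviated ref name of COMMIT).
# Must be run inside a Git checkout.
revision_label() {
    commit=${1:-HEAD}
    branch=${2:-$(git rev-parse --abbrev-ref "$commit")}
    count=$(git rev-list --count "$commit") || return 1
    format_revision "$branch" "$count"
}
```

Inside a checkout, ``revision_label HEAD master`` would print the label for the
current commit. The count is only meaningful relative to the named branch, which
is why the branch name is part of the label.
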
What About Branches and Merges?
-------------------------------

In contrast to SVN, Git makes branching easy. Git's commit history is
represented as a DAG, a departure from SVN's linear history. However, we propose
to prohibit merge commits in our canonical Git repository.

Unfortunately, GitHub does not support server-side hooks to enforce such a
policy. We must rely on the community to avoid pushing merge commits.

GitHub offers a feature called `Status Checks`: a branch protected by
`status checks` requires commits to be whitelisted before the push can happen.
We could supply a pre-push hook on the client side that would run and check the
history, before whitelisting the commit being pushed [statuschecks]_.
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and prevents SVN access to the
repository.

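To make the client-side check more concrete, here is a minimal sketch of the
history check such a pre-push hook could perform. It covers only the "no merge
commits" test, not the whitelisting step. The function names are hypothetical,
and the all-zero object name assumes a SHA-1 repository. Git feeds a pre-push
hook one line per ref on standard input, in the form
``<local ref> <local sha> <remote ref> <remote sha>``.

```shell
#!/bin/sh
# Sketch of the body of .git/hooks/pre-push; helper names are hypothetical.

# Object name Git uses for "no commit" (assumes a SHA-1 repository).
zero=0000000000000000000000000000000000000000

# push_range LOCAL_SHA REMOTE_SHA
# Print the rev-list argument selecting the commits about to be pushed.
push_range() {
    if [ "$2" = "$zero" ]; then
        # The ref is new on the remote: examine everything reachable from it.
        printf '%s\n' "$1"
    else
        printf '%s..%s\n' "$2" "$1"
    fi
}

# check_push
# Read the hook's standard input and fail if any pushed range has a merge.
check_push() {
    while read -r local_ref local_sha remote_ref remote_sha; do
        # A zero local sha means the ref is being deleted; nothing to check.
        if [ "$local_sha" = "$zero" ]; then
            continue
        fi
        range=$(push_range "$local_sha" "$remote_sha")
        merge=$(git rev-list --merges --max-count=1 "$range") || return 1
        if [ -n "$merge" ]; then
            echo "error: $local_ref contains merge commit $merge" >&2
            return 1
        fi
    done
    return 0
}
```

Installed as a hook, the file would end with a call to ``check_push``; a
non-zero exit status makes Git abort the push.
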
What About Commit Emails?
-------------------------

We will need a new bot to send emails for each commit. This proposal leaves the
email format unchanged besides the commit URL.

Straw Man Migration Plan
========================

Step #1 : Before The Move
-------------------------

1. Update docs to mention the move, so people are aware of what is going on.
2. Set up a read-only version of the GitHub project, mirroring our current SVN
   repository.
3. Add the required bots to implement the commit emails, as well as the
   umbrella repository update (if the multirepo is selected) or the read-only
   Git views for the sub-projects (if the monorepo is selected).

Step #2 : Git Move
------------------

4. Update the buildbots to pick up updates and commits from the GitHub
   repository. Not all bots have to migrate at this point, but it'll help
   provide infrastructure testing.
5. Update Phabricator to pick up commits from the GitHub repository.
6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
   increasing integer across branches [MatthewsRevNum]_.
7. Instruct downstream integrators to pick up commits from the GitHub
   repository.
8. Review and prepare an update for the LLVM documentation.

Until this point nothing has changed for developers; it will just
boil down to a lot of work for buildbot and other infrastructure
owners.

The migration will pause here until all dependencies have cleared, and all
problems have been solved.

Step #3: Write Access Move
--------------------------

9. Collect developers' GitHub account information, and add them to the project.
10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
11. Update the documentation.
12. Mirror Git to SVN.

Step #4 : Post Move
-------------------

13. Archive the SVN repository.
14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
    point to GitHub instead.

GitHub Repository Description
=============================

Monorepo
----------------

The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all
sub-projects in a single source tree. It is often referred to as a monorepo and
mimics an export of the current SVN repository, with each sub-project having its
own top-level directory. Not all sub-projects are used for building toolchains.
For example, www/ and test-suite/ are not part of the monorepo.

Putting all sub-projects in a single checkout makes cross-project refactoring
naturally simple:

* New sub-projects can be trivially split out for better reuse and/or layering
  (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
  dependency on LLVM).
* Changing an API in LLVM and upgrading the sub-projects will always be done in
  a single commit, designing away a common source of temporary build breakage.
* Moving code across sub-projects (during refactoring for instance) in a single
  commit enables accurate `git blame` when tracking code change history.
* Tooling based on `git grep` works natively across sub-projects, making it
  easier to find refactoring opportunities across projects (for example reusing
  a data structure initially in LLDB by moving it into libSupport).
* Having all the sources present encourages maintaining the other sub-projects
  when changing an API.

Finally, the monorepo maintains the property of the existing SVN repository that
the sub-projects move synchronously, and a single revision number (or commit
hash) identifies the state of the development across all projects.

.. _build_single_project:

Building a single sub-project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Even though there is a single source tree, you are not required to build
all sub-projects together. It is trivial to configure builds for a single
sub-project.

For example::

  mkdir build && cd build
  # Configure only LLVM (default)
  cmake path/to/monorepo
  # Configure LLVM and lld
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
  # Configure LLVM and clang
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang

.. _git-svn-mirror:

Outstanding Questions
---------------------

Read-only sub-project mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the Monorepo, it is undecided whether the existing single-subproject
mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to
be maintained.

Read/write SVN bridge
^^^^^^^^^^^^^^^^^^^^^

GitHub supports a read/write SVN bridge for its repositories. However,
there have been issues with this bridge working correctly in the past,
so it's not clear if this is something that will be supported going forward.

Monorepo Drawbacks
------------------

* Using the monolithic repository may add overhead for those contributing to a
  standalone sub-project, particularly on runtimes like libcxx and compiler-rt
  that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
  1GB for the monorepo), and the commit rate of LLVM may cause more frequent
  `git push` collisions when upstreaming. Affected contributors may be able to
  use the SVN bridge or the single-subproject Git mirrors. However, it's
  undecided if these projects will continue to be maintained.
* Using the monolithic repository may add overhead for those *integrating* a
  standalone sub-project, even if they aren't contributing to it, due to the
  same disk space concern as the point above. The availability of the
  sub-project Git mirrors would address this.
* Preservation of the existing read/write SVN-based workflows relies on the
  GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
  into GitHub and could restrict future workflow changes.

Workflows
^^^^^^^^^

* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-checkout-commit>`.
* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
* :ref:`Bisecting <workflow-mono-bisecting>`.

Workflow Before/After
=====================

This section goes through a few examples of workflows, intended to illustrate
how end-users or developers would interact with the repository for
various use-cases.

.. _workflow-checkout-commit:

Checkout/Clone a Single Project, with Commit Access
---------------------------------------------------

Currently
^^^^^^^^^

::

  # direct SVN checkout
  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
  # or using the read-only Git view, with git-svn
  git clone http://llvm.org/git/llvm.git
  cd llvm
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l  # -l avoids fetching ahead of the git mirror.

Commits are performed using `svn commit` or with the sequence `git commit` and
`git svn dcommit`.

.. _workflow-multicheckout-nocommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

With the monorepo variant, there are a few options, depending on your
constraints. First, you could just clone the full repository::

  git clone https://github.com/llvm/llvm-project.git

At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
:ref:`doesn't imply you have to build all of them <build_single_project>`. You
can still build only compiler-rt for instance. In this way it's not different
from someone who would check out all the projects with SVN today.

If you want to avoid checking out all the sources, you can hide the other
directories using a Git sparse checkout::

  git config core.sparseCheckout true
  echo /compiler-rt > .git/info/sparse-checkout
  git read-tree -mu HEAD

The data for all sub-projects is still in your `.git` directory, but in your
checkout, you only see `compiler-rt`.
Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
usual.

Note that when you fetch you'll likely pull in changes to sub-projects you don't
care about. If you are using sparse checkout, the files from other projects
won't appear on your disk. The only effect is that your commit hash changes.

You can check whether the changes in the last fetch are relevant to your commit
by running::

  git log origin/master@{1}..origin/master -- libcxx

This command can be hidden in a script so that `git llvmpush` would perform all
these steps, fail only if such a dependent change exists, and show immediately
the change that prevented the push. An immediate repeat of the command would
(almost) certainly result in a successful push.
Note that today with SVN or git-svn, this step is not possible since the
"rebase" implicitly happens while committing (unless a conflict occurs).

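One way such a `git llvmpush` wrapper could look is sketched below. This is an
illustration under stated assumptions, not an existing LLVM script: the function
names are hypothetical, the remote is assumed to be called ``origin`` and the
branch ``master``. Instead of relying on the ``@{1}`` reflog entry, the sketch
records ``origin/master`` before fetching, so a fetch that brings nothing new
reports no changes.

```shell
#!/bin/sh
# Hypothetical sketch of a "git llvmpush" helper; not an existing LLVM tool.

# upstream_changes OLD PATH
# List commits between OLD and the freshly fetched origin/master that touch
# PATH (for example "libcxx").
upstream_changes() {
    git log --oneline "$1..origin/master" -- "$2"
}

# llvmpush PATH [BRANCH]
# Fetch, rebase local work, and push, unless the fetch brought in upstream
# changes under PATH that should be reviewed and retested first.
llvmpush() {
    path=$1
    branch=${2:-master}
    old=$(git rev-parse origin/master) || return 1
    git fetch origin || return 1
    git rebase origin/master || return 1
    changes=$(upstream_changes "$old" "$path") || return 1
    if [ -n "$changes" ]; then
        echo "Upstream changed $path since your last fetch:" >&2
        echo "$changes" >&2
        return 1
    fi
    git push origin "HEAD:$branch"
}
```

After a refused push, running ``llvmpush libcxx`` again starts from the updated
``origin/master``, so it goes through unless yet another relevant commit has
landed in the meantime.
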
Checkout/Clone Multiple Projects, with Commit Access
----------------------------------------------------

Let's look at how to assemble llvm+clang+libcxx at a given revision.

Currently
^^^^^^^^^

::

  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
  cd llvm/tools
  svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
  cd ../projects
  svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION

Or using git-svn::

  git clone http://llvm.org/git/llvm.git
  cd llvm/
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd tools
  git clone http://llvm.org/git/clang.git
  cd clang/
  git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd ../../projects/
  git clone http://llvm.org/git/libcxx.git
  cd libcxx
  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`

Note that the list would be longer with more sub-projects.

.. _workflow-monocheckout-multicommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

The repository natively contains the source for every sub-project at the right
revision, which makes this straightforward::

  git clone https://github.com/llvm/llvm-project.git
  cd llvm-project
  git checkout $REVISION

As before, at this point clang, llvm, and libcxx are stored in directories
alongside each other.

.. _workflow-cross-repo-commit:

Commit an API Change in LLVM and Update the Sub-projects
--------------------------------------------------------

Today this is possible, even though not common (at least not documented) for
Subversion users and for git-svn users. For example, few Git users try to update
LLD or Clang in the same commit as they change an LLVM API.

The multirepo variant does not address this: one would have to commit and push
separately in every individual repository. It would be possible to establish a
protocol whereby users add a special token to their commit messages that causes
the umbrella repo's updater bot to group all of them into a single revision.

The monorepo variant handles this natively.

Branching/Stashing/Updating for Local Development or Experiments
----------------------------------------------------------------

Currently
^^^^^^^^^

SVN does not allow this use case, but developers that are currently using
git-svn can do it. Let's look at what this means in practice when dealing with
multiple sub-projects.

To update the repository to tip of trunk::

  git pull
  cd tools/clang
  git pull
  cd ../../projects/libcxx
  git pull

To create a new branch::

  git checkout -b MyBranch
  cd tools/clang
  git checkout -b MyBranch
  cd ../../projects/libcxx
  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch
  cd tools/clang
  git checkout AnotherBranch
  cd ../../projects/libcxx
  git checkout AnotherBranch

.. _workflow-mono-branching:

Monorepo Variant
^^^^^^^^^^^^^^^^

Regular Git commands are sufficient, because everything is in a single
repository:

To update the repository to tip of trunk::

  git pull

To create a new branch::

  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch

Bisecting
---------

Assume a developer is looking for a bug in clang (or lld, or lldb, ...).

Currently
^^^^^^^^^

SVN does not have builtin bisection support, but the single revision across
sub-projects makes it possible to script a bisection.

Using the existing Git read-only view of the repositories, it is possible to use
the native Git bisection script over the llvm repository, and use some scripting
to synchronize the clang repository to match the llvm revision.

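The synchronization step could be scripted roughly as follows. This is a sketch
under assumptions: it presumes that commits in the read-only Git mirrors carry a
``git-svn-id:`` line of the form ``git-svn-id: <url>@<revision> <uuid>`` in
their messages and that the mirror's main branch is ``origin/master``. The
function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; they assume git-svn-id trailers in commit messages.

# svn_revision
# Read a commit message on stdin and print the SVN revision found in its
# git-svn-id line (the digits after '@').
svn_revision() {
    sed -n 's/^git-svn-id: [^@]*@\([0-9][0-9]*\) .*$/\1/p' | tail -n 1
}

# pick_commit TARGET
# Read "commit <hash>" lines followed by commit messages (newest first) on
# stdin and print the first hash whose SVN revision is <= TARGET.
pick_commit() {
    awk -v target="$1" '
        /^commit [0-9a-f]+$/ { hash = $2; next }
        /^git-svn-id: / {
            rev = $2
            sub(/^.*@/, "", rev)
            if (rev + 0 <= target + 0) { print hash; exit }
        }'
}

# sync_to_llvm LLVM_DIR OTHER_DIR
# Check out, in OTHER_DIR, the newest commit that is not newer than the SVN
# revision currently checked out in LLVM_DIR.
sync_to_llvm() {
    target=$(git -C "$1" log -1 --format=%B | svn_revision)
    [ -n "$target" ] || return 1
    commit=$(git -C "$2" log --format='commit %H%n%B' origin/master |
             pick_commit "$target")
    [ -n "$commit" ] || return 1
    git -C "$2" checkout --quiet "$commit"
}
```

A bisection script run from the llvm checkout would call something like
``sync_to_llvm . tools/clang`` before building, so that clang matches the llvm
commit under test.
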
.. _workflow-mono-bisecting:

Monorepo Variant
^^^^^^^^^^^^^^^^

Bisecting on the monorepo is straightforward, and very similar to the above,
except that the bisection script does not need to include the
`git submodule update` step.

The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::

  git bisect start releases/3.9.x releases/3.8.x
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd $BUILD_DIR
  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit
  ./bin/clang some_crash_test.cpp

Also, since the monorepo handles commits that update multiple projects, you're
less likely to encounter a build failure where a commit changes an API in LLVM
and another later one "fixes" the build in clang.

Moving Local Branches to the Monorepo
=====================================

Suppose you have been developing against the existing LLVM git
mirrors. You have one or more git branches that you want to migrate
to the "final monorepo".

The simplest way to migrate such branches is with the
``migrate-downstream-fork.py`` tool at
https://github.com/jyknight/llvm-git-migration.

Basic migration
---------------

Basic instructions for ``migrate-downstream-fork.py`` are in the
Python script and are expanded on below to a more general recipe::

  # Make a repository which will become your final local mirror of the
  # monorepo.
  mkdir my-monorepo
  git -C my-monorepo init

  # Add a remote to the monorepo.
  git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git

  # Add remotes for each git mirror you use, from upstream as well as
  # your local mirror. All projects are listed here but you need only
  # import those for which you have local branches.
  my_projects=( clang
                clang-tools-extra
                compiler-rt
                debuginfo-tests
                libcxx
                libcxxabi
                libunwind
                lld
                lldb
                llvm
                openmp
                polly )
  for p in ${my_projects[@]}; do
    git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
    git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
  done

  # Pull in all the commits.
  git -C my-monorepo fetch --all

  # Run migrate-downstream-fork to rewrite local branches on top of
  # the upstream monorepo.
  (
     cd my-monorepo
     migrate-downstream-fork.py \
       refs/remotes/local \
       refs/tags \
       --new-repo-prefix=refs/remotes/upstream/monorepo \
       --old-repo-prefix=refs/remotes/upstream/split \
       --source-kind=split \
       --revmap-out=monorepo-map.txt
  )

  # Octopus-merge the resulting local split histories to unify them.

  # Assumes local work on local split mirrors is on master (and
  # upstream is presumably represented by some other branch like
  # upstream/master).
  my_local_branch="master"

  git -C my-monorepo branch --no-track local/octopus/${my_local_branch} \
    $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \
                                    refs/remotes/local/split/llvm/${my_local_branch})
  git -C my-monorepo checkout local/octopus/${my_local_branch}

  subproject_branches=()
  for p in ${my_projects[@]}; do
    subproject_branch=${p}/local/monorepo/${my_local_branch}
    git -C my-monorepo branch ${subproject_branch} \
      refs/remotes/local/split/${p}/${my_local_branch}
    if [[ "${p}" != "llvm" ]]; then
      subproject_branches+=( ${subproject_branch} )
    fi
  done

  git -C my-monorepo merge ${subproject_branches[@]}

  for p in ${my_projects[@]}; do
    subproject_branch=${p}/local/monorepo/${my_local_branch}
    git -C my-monorepo branch -d ${subproject_branch}
  done

  # Create local branches for upstream monorepo branches.
  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
                   refs/remotes/upstream/monorepo); do
    upstream_branch=${ref#refs/remotes/upstream/monorepo/}
    git -C my-monorepo branch upstream/${upstream_branch} ${ref}
  done

The above gets you to a state like the following::

  U1 - U2 - U3 <- upstream/master
    \   \    \
     \   \    - Llld1 - Llld2 -
      \   \                    \
       \   - Lclang1 - Lclang2-- Lmerge <- local/octopus/master
        \                      /
         - Lllvm1 - Lllvm2-----

Each branched component has its branch rewritten on top of the
monorepo and all components are unified by a giant octopus merge.

If additional active local branches need to be preserved, the above
operations following the assignment to ``my_local_branch`` should be
done for each branch. Ref paths will need to be updated to map the
local branch to the corresponding upstream branch. If local branches
have no corresponding upstream branch, then the creation of
``local/octopus/<local branch>`` need not use ``git merge-base`` to
pinpoint its root commit; it may simply be branched from the
appropriate component branch (say, ``llvm/local_release_X``).

Zipping local history
---------------------

The octopus merge is suboptimal for many cases, because walking back
through the history of one component leaves the other components fixed
at a history that likely makes things unbuildable.

Some downstream users track the order commits were made to subprojects
with some kind of "umbrella" project that imports the project git
mirrors as submodules, similar to the multirepo umbrella proposed
above. Such an umbrella repository looks something like this::

   UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master
   |        |             |        |        |        |        |
  Lllvm1   Llld1         Lclang1  Lclang2  Lllvm2   Llld2    Lmyproj1

The vertical bars represent submodule updates to a particular local
commit in the project mirror. ``UM3`` in this case is a commit of
some local umbrella repository state that is not a submodule update,
perhaps a ``README`` or project build script update. Commit ``UM8``
updates a submodule of local project ``myproj``.

The tool ``zip-downstream-fork.py`` at
https://github.com/greened/llvm-git-migration/tree/zip can be used to
convert the umbrella history into a monorepo-based history with
commits in the order implied by submodule updates::

  U1 - U2 - U3 <- upstream/master
   \    \    \
    \    -----\---------------                                   local/zip--.
     \         \              \                                             |
  - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'

The ``U*`` commits represent upstream commits to the monorepo master
branch. Each submodule update in the local ``UM*`` commits brought in
a subproject tree at some local commit. The trees in the ``L*1``
commits represent merges from upstream. These result in edges from
the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
The ``L*2`` commits did not do any merges from upstream.

  532. Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
  533. if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
  534. commit appearing after the ``Llld1`` commit would actually represent a
  535. clang tree *earlier* in the upstream clang history. We want the
  536. ``local/zip`` branch to accurately represent the state of our umbrella
  537. history and so the edge ``U2 -> Lclang1`` is a visual reminder of what
  538. clang's tree actually looks like in ``Lclang1``.
Even so, the edge ``U3 -> Llld1`` could be problematic for future
merges from upstream. git will think that we've already merged from
``U3``, and we have, except for the state of the clang tree. One
possible mitigation strategy is to manually diff clang between ``U2``
and ``U3`` and apply those updates to ``local/zip``. Another,
possibly simpler strategy is to freeze local work on downstream
branches and merge all submodules from the latest upstream before
running ``zip-downstream-fork.py``. If downstream merged each project
from upstream in lockstep without any intervening local commits, then
things should be fine without any special action. We anticipate this
to be the common case.

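
The first mitigation (carrying the clang changes between ``U2`` and
``U3`` over to ``local/zip``) might be sketched as below. The helper
name ``apply_subproject_delta`` and its argument order are
hypothetical; only ``git diff`` and ``git apply`` are real commands.
The sketch assumes the upstream delta applies cleanly to the local
clang tree and that the delta is not empty.

```shell
# Hypothetical helper: apply the changes made to one subproject
# directory between two upstream commits onto the branch that is
# currently checked out in <repo>.
apply_subproject_delta() {
    local repo=$1 old=$2 new=$3 subdir=$4
    # --binary keeps binary files intact; --index stages what was applied.
    git -C "$repo" diff --binary "$old" "$new" -- "$subdir" |
        git -C "$repo" apply --index
}

# Example (U2/U3 stand for real commit ids from the diagram above):
#   git -C my-monorepo checkout local/zip/master
#   apply_subproject_delta my-monorepo U2 U3 clang
#   git -C my-monorepo commit -m "Apply upstream clang changes from U2..U3"
```

If local commits touched the same lines, ``git apply`` will refuse the
patch and the conflicting hunks have to be resolved by hand.
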
The tree for ``Lclang1`` outside of clang will represent the state of
things at ``U3`` since all of the upstream projects not participating
in the umbrella history should be in a state respecting the commit
``U3``. The trees for llvm and lld should correctly represent commits
``Lllvm1`` and ``Llld1``, respectively.

Commit ``UM3`` changed files not related to submodules and we need
somewhere to put them. It is not safe in general to put them in the
monorepo root directory because they may conflict with files in the
monorepo. Let's assume we want them in a directory ``local`` in the
monorepo.

**Example 1: Umbrella looks like the monorepo**

For this example, we'll assume that each subproject appears in its own
top-level directory in the umbrella, just as they do in the monorepo.
Let's also assume that we want the files in directory ``myproj`` to
appear in ``local/myproj``.

Given the above run of ``migrate-downstream-fork.py``, a recipe to
create the zipped history is below::


  # Import any non-LLVM repositories the umbrella references.
  git -C my-monorepo remote add localrepo \
                                https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo

  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
                parallel-libs polly pstl )

  # Import histories for upstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add upstream/split/${project} \
                                  https://github.com/llvm-mirror/${project}.git
    git -C my-monorepo fetch upstream/split/${project}
  done

  # Import histories for downstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add local/split/${project} \
                                  https://my.local.mirror.org/${project}.git
    git -C my-monorepo fetch local/split/${project}
  done

  # Import umbrella history.
  git -C my-monorepo remote add umbrella \
                                https://my.local.mirror.org/umbrella.git
  git -C my-monorepo fetch umbrella

  # Put myproj in local/myproj
  echo "myproj local/myproj" > my-monorepo/submodule-map.txt

  # Rewrite history
  (
    cd my-monorepo
    zip-downstream-fork.py \
      refs/remotes/umbrella \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --old-repo-prefix=refs/remotes/upstream/split \
      --revmap-in=monorepo-map.txt \
      --revmap-out=zip-map.txt \
      --subdir=local \
      --submodule-map=submodule-map.txt \
      --update-tags
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master

Note that if the umbrella has submodules to non-LLVM repositories,
``zip-downstream-fork.py`` needs to know about them to be able to
rewrite commits. That is why the first step above is to fetch commits
from such repositories.

With ``--update-tags`` the tool will migrate annotated tags pointing
to submodule commits that were inlined into the zipped history. If
the umbrella pulled in an upstream commit that happened to have a tag
pointing to it, that tag will be migrated, which is almost certainly
not what is wanted. The tag can always be moved back to its original
commit after rewriting, or the ``--update-tags`` option may be
discarded and any local tags would then be migrated manually.

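
If moving a tag back is the chosen route, recording where every tag
points before the rewrite makes the later restore mechanical. The
following is a minimal sketch; the helper names ``snapshot_tags`` and
``restore_tag`` and the snapshot file are hypothetical, while
``git for-each-ref`` and ``git update-ref`` are real commands.

```shell
# Hypothetical helpers; only the git subcommands are real.

# Record "<refname> <object id>" for every tag in the repository.
snapshot_tags() {
    local repo=$1 out=$2
    git -C "$repo" for-each-ref \
        --format='%(refname) %(objectname)' refs/tags > "$out"
}

# Point one tag back at the object recorded in the snapshot. For an
# annotated tag the recorded object is the tag object itself, so the
# original annotation comes back along with the target commit.
restore_tag() {
    local repo=$1 snapshot=$2 tag=$3 ref obj
    while read -r ref obj; do
        if [ "$ref" = "refs/tags/$tag" ]; then
            git -C "$repo" update-ref "$ref" "$obj"
            return 0
        fi
    done < "$snapshot"
    echo "tag '$tag' not found in $snapshot" >&2
    return 1
}

# Example:
#   snapshot_tags my-monorepo tags-before-zip.txt   # before rewriting
#   restore_tag my-monorepo tags-before-zip.txt some-upstream-tag
```

The restore needs the original objects to still exist, so it should
happen before any pruning or garbage collection of the repository.
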
**Example 2: Nested sources layout**

The tool handles nested submodules (e.g. llvm is a submodule in
umbrella and clang is a submodule in llvm). The file
``submodule-map.txt`` is a list of pairs, one per line. The first
pair item describes the path to a submodule in the umbrella
repository. The second pair item describes the path where trees for
that submodule should be written in the zipped history.

Let's say your umbrella repository is actually the llvm repository and
it has submodules in the "nested sources" layout (clang in
tools/clang, etc.). Let's also say ``projects/myproj`` is a submodule
pointing to some downstream repository. The submodule map file should
look like this (we still want myproj mapped the same way as
previously)::


  tools/clang clang
  tools/clang/tools/extra clang-tools-extra
  projects/compiler-rt compiler-rt
  projects/debuginfo-tests debuginfo-tests
  projects/libclc libclc
  projects/libcxx libcxx
  projects/libcxxabi libcxxabi
  projects/libunwind libunwind
  tools/lld lld
  tools/lldb lldb
  projects/openmp openmp
  tools/polly polly
  projects/myproj local/myproj

If a submodule path does not appear in the map, the tool assumes it
should be placed in the same place in the monorepo. That means if you
use the "nested sources" layout in your umbrella, you *must* provide
map entries for all of the projects in your umbrella (except llvm).
Otherwise trees from submodule updates will appear underneath llvm in
the zipped history.

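
Because a missing entry silently leaves a project underneath llvm, it
may help to compare the umbrella's ``.gitmodules`` against the map
before rewriting. The helper below is a hypothetical sketch, not part
of ``zip-downstream-fork.py``. It assumes submodule names and paths
contain no spaces, and it inspects only the single ``.gitmodules``
file it is given, so nested submodules (such as
``tools/clang/tools/extra``) need a separate run against the nested
repository's file.

```shell
# Hypothetical helper: print every submodule path listed in a
# .gitmodules file that has no entry in the submodule map.
unmapped_submodules() {
    local gitmodules=$1 map=$2 key path
    git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
        # A map line is "<path in umbrella> <path in zipped history>".
        if ! awk -v p="$path" '$1 == p { found = 1 } END { exit !found }' "$map"; then
            echo "$path"
        fi
    done
}

# Example:
#   unmapped_submodules umbrella-checkout/.gitmodules \
#                       my-monorepo/submodule-map.txt
```

An empty result means every submodule in that file has an explicit
destination.
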
Because llvm is itself the umbrella, we use ``--subdir`` to write its
content into ``llvm`` in the zipped history::


  # Import any non-LLVM repositories the umbrella references.
  git -C my-monorepo remote add localrepo \
                                https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo

  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
                parallel-libs polly pstl )

  # Import histories for upstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add upstream/split/${project} \
                                  https://github.com/llvm-mirror/${project}.git
    git -C my-monorepo fetch upstream/split/${project}
  done

  # Import histories for downstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add local/split/${project} \
                                  https://my.local.mirror.org/${project}.git
    git -C my-monorepo fetch local/split/${project}
  done

  # Import umbrella history.  We want this under a different refspec
  # so zip-downstream-fork.py knows what it is.
  git -C my-monorepo remote add umbrella \
                                https://my.local.mirror.org/llvm.git
  git -C my-monorepo fetch umbrella

  # Create the submodule map.
  echo "tools/clang clang" > my-monorepo/submodule-map.txt
  echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
  echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
  echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
  echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
  echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
  echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
  echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
  echo "tools/lld lld" >> my-monorepo/submodule-map.txt
  echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
  echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
  echo "tools/polly polly" >> my-monorepo/submodule-map.txt
  echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt

  # Rewrite history
  (
    cd my-monorepo
    zip-downstream-fork.py \
      refs/remotes/umbrella \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --old-repo-prefix=refs/remotes/upstream/split \
      --revmap-in=monorepo-map.txt \
      --revmap-out=zip-map.txt \
      --subdir=llvm \
      --submodule-map=submodule-map.txt \
      --update-tags
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master

Comments at the top of ``zip-downstream-fork.py`` describe in more
detail how the tool works and various implications of its operation.

Importing local repositories
----------------------------

You may have additional repositories that integrate with the LLVM
ecosystem, essentially extending it with new tools. If such
repositories are tightly coupled with LLVM, it may make sense to
import them into your local mirror of the monorepo.

If such repositories participated in the umbrella repository used
during the zipping process above, they will automatically be added to
the monorepo. For downstream repositories that don't participate in
an umbrella setup, the ``import-downstream-repo.py`` tool at
https://github.com/greened/llvm-git-migration/tree/import can help with
getting them into the monorepo. A recipe follows::


  # Import downstream repo history into the monorepo.
  git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
  git -C my-monorepo fetch myrepo

  my_local_tags=( refs/tags/release
                  refs/tags/hotfix )

  (
    cd my-monorepo
    import-downstream-repo.py \
      refs/remotes/myrepo \
      "${my_local_tags[@]}" \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --subdir=myrepo \
      --tag-prefix="myrepo-"
  )

  # Preserve release branches.
  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
                 refs/remotes/myrepo/release); do
    branch=${ref#refs/remotes/myrepo/}
    git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
  done

  # Preserve master.
  git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master

  # Merge master.
  git -C my-monorepo checkout local/zip/master  # Or local/octopus/master
  git -C my-monorepo merge myrepo/master

You may want to merge other corresponding branches, for example
``myrepo`` release branches if they were in lockstep with LLVM project
releases.

``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
annotated tags with the given prefix. Due to limitations with
``fast_filter_branch.py``, unannotated tags cannot be renamed
(``fast_filter_branch.py`` considers them branches, not tags). Since
the upstream monorepo had its tags rewritten with an "llvmorg-"
prefix, name conflicts should not be an issue. ``--tag-prefix`` can
be used to more clearly indicate which tags correspond to various
imported repositories.

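
To see in advance which tags fall under that limitation, the
unannotated (lightweight) tags of a repository can be listed with
plain git. The helper name ``lightweight_tags`` below is hypothetical;
it relies on ``git for-each-ref`` reporting the object type ``tag``
for annotated tags and the type of the tagged object (normally
``commit``) for lightweight ones.

```shell
# Hypothetical helper: list the tags in <repo> that are not annotated
# and therefore would keep their names under --tag-prefix.
lightweight_tags() {
    local repo=$1 type name
    git -C "$repo" for-each-ref \
        --format='%(objecttype) %(refname:short)' refs/tags |
    while read -r type name; do
        if [ "$type" != "tag" ]; then
            echo "$name"
        fi
    done
}

# Example:
#   lightweight_tags my-monorepo
```
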
Given this repository history::

  R1 - R2 - R3 <- master
       ^
       |
    release/1

The above recipe results in a history like this::

   U1 - U2 - U3 <- upstream/master
   \    \    \
    \    -----\---------------                                          local/zip--.
     \         \              \                                                    |
    - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
                                                                             /
                                                                 R1 - R2 - R3  <-.
                                                                      ^          |
                                                                      |          |
                                                              myrepo-release/1   |
                                                                                 |
                                                                  myrepo/master--'

Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
from ``myrepo``. If you require commits from ``myrepo`` to be
interleaved with commits on local project branches (for example,
interleaved with ``Lllvm1``, ``Lllvm2``, etc. above) and myrepo doesn't
appear in an umbrella repository, a new tool will need to be
developed. Creating such a tool would involve:

1. Modifying ``fast_filter_branch.py`` to optionally take a
   revlist directly rather than generating it itself

2. Creating a tool to generate an interleaved ordering of local
   commits based on some criteria (``zip-downstream-fork.py`` uses the
   umbrella history as its criterion)

3. Generating such an ordering and feeding it to
   ``fast_filter_branch.py`` as a revlist

Some care will also likely need to be taken to handle merge commits,
to ensure the parents of such commits migrate correctly.

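
As a rough illustration of step 2, one possible criterion is committer
date. The sketch below is an assumption, not an existing tool: the
helper name ``interleaved_revlist`` is hypothetical, and it simply
asks ``git rev-list`` for the commits reachable from several refs,
oldest first, while keeping every parent ahead of its children.

```shell
# Hypothetical sketch: print commit ids reachable from the given refs,
# ordered by committer date (oldest first) with parents always listed
# before their children.
interleaved_revlist() {
    local repo=$1
    shift
    git -C "$repo" rev-list --date-order --reverse "$@"
}

# Example (ref names are illustrative):
#   interleaved_revlist my-monorepo \
#       refs/remotes/myrepo/master local/zip/master \
#       ^refs/remotes/upstream/monorepo/master > revlist.txt
```

The trailing ``^ref`` argument is ordinary ``rev-list`` syntax for
excluding commits reachable from that ref, which keeps upstream
history out of the list. Feeding the result to
``fast_filter_branch.py`` would still require the modification
described in step 1.
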
Scrubbing the Local Monorepo
----------------------------

Once all of the migrating, zipping and importing is done, it's time to
clean up. The python tools use ``git-fast-import`` which leaves a lot
of cruft around and we want to shrink our new monorepo mirror as much
as possible. Here is one way to do it::


  git -C my-monorepo checkout master

  # Delete branches we no longer need.  Do this for any other branches
  # you merged above.
  git -C my-monorepo branch -D local/zip/master || true
  git -C my-monorepo branch -D local/octopus/master || true

  # Remove remotes.
  git -C my-monorepo remote remove upstream/monorepo

  for p in ${my_projects[@]}; do
    git -C my-monorepo remote remove upstream/split/${p}
    git -C my-monorepo remote remove local/split/${p}
  done

  git -C my-monorepo remote remove localrepo
  git -C my-monorepo remote remove umbrella
  git -C my-monorepo remote remove myrepo

  # Add anything else here you don't need.  refs/tags/release is
  # listed below assuming tags have been rewritten with a local prefix.
  # If not, remove it from this list.
  refs_to_clean=(
    refs/original
    refs/remotes
    refs/tags/backups
    refs/tags/release
  )

  git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} |
    xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d

  git -C my-monorepo reflog expire --all --expire=now

  # fast_filter_branch.py might have gc running in the background.
  while ! git -C my-monorepo \
    -c gc.reflogExpire=0 \
    -c gc.reflogExpireUnreachable=0 \
    -c gc.rerereresolved=0 \
    -c gc.rerereunresolved=0 \
    -c gc.pruneExpire=now \
    gc --prune=now; do
    continue
  done

  # Takes a LOOOONG time!
  git -C my-monorepo repack -A -d -f --depth=250 --window=250

  git -C my-monorepo prune-packed
  git -C my-monorepo prune

You should now have a trim monorepo. Upload it to your git server and
happy hacking!

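
Before uploading, it may be worth confirming that the scrub left
nothing behind. The check below is a hypothetical sketch (the helper
name ``check_scrubbed`` is not part of any of the tools above) built
from ordinary git commands: it fails if any remote or any ref under
``refs/original``, ``refs/remotes`` or ``refs/tags/backups`` remains,
then verifies object integrity and prints the repository size.

```shell
# Hypothetical post-scrub check. Returns non-zero if remotes or
# leftover refs remain, or if git fsck reports a problem.
check_scrubbed() {
    local repo=$1 leftover
    leftover=$(git -C "$repo" remote
               git -C "$repo" for-each-ref --format='%(refname)' \
                   refs/original refs/remotes refs/tags/backups)
    if [ -n "$leftover" ]; then
        printf 'left behind:\n%s\n' "$leftover" >&2
        return 1
    fi
    git -C "$repo" fsck --no-dangling >/dev/null &&
        git -C "$repo" count-objects -vH
}

# Example:
#   check_scrubbed my-monorepo
```
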
References
==========

.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/