Switching from submodules to subtrees
Everyone hates submodules. I don't mind them that much, because I know how to use them and how to fix them when they break. But I understand that they are difficult and awkward for people, they have real drawbacks, and there have been some calls to replace them.
I happened to stumble on the git subtree system recently, and decided to look into it to see if it could be used to replace submodules.
It looks relatively easy to do. I've prepared a branch (called subtree) that has no submodules, only subtrees, so we can see whether we want to switch to it. It has been pushed to the base repo.
First let's talk about why subtrees are an improvement, and then about where they are not much of an improvement over submodules.
There are several reasons why you might find subtree better to use:
* Management of a simple workflow is easy.
* Older versions of git are supported (even before v1.5.2).
* The sub-project's code is available as soon as the clone of the super project is done. This means that in LEAP we don't need a separate submodule check-out phase that could break on any one of the individual submodules that needs to be checked out.
* subtree does not require users of your repository to learn anything new; they can ignore the fact that you are using subtree to manage dependencies.
* subtree does not add new metadata files like submodules do (i.e. .gitmodules).
* Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.
The drawbacks:
* You have to learn something new, and the commands are slightly awkward.
* Contributing code back upstream for the sub-projects is slightly more complicated.
* The responsibility of not mixing super and sub-project code in commits lies with you.
These drawbacks seem relatively minor. The main one is that it discourages sharing changes with upstream modules, and encourages local changes that don't get merged upstream. But I think if we keep our policy to push things upstream whenever possible, we will be ok.
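The "don't mix super and sub-project code in one commit" rule can be checked mechanically. The following is only a sketch: classify_paths is a hypothetical helper (not part of git), and the file paths used to try it out are made up for illustration.

```shell
# classify_paths PREFIX  (hypothetical helper, not a git command)
# Reads file paths on stdin, one per line, and prints:
#   sub    if every path lies under PREFIX/
#   super  if no path lies under PREFIX/
#   mixed  if both kinds are present
#   empty  if no paths were given
classify_paths() {
  prefix=${1%/}          # tolerate a trailing slash on the prefix
  in_sub=0
  in_super=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    case "$path" in
      "$prefix"/*) in_sub=1 ;;
      *)           in_super=1 ;;
    esac
  done
  if [ "$in_sub" -eq 1 ] && [ "$in_super" -eq 1 ]; then
    echo mixed
  elif [ "$in_sub" -eq 1 ]; then
    echo sub
  elif [ "$in_super" -eq 1 ]; then
    echo super
  else
    echo empty
  fi
}
```

One possible use is in a pre-commit hook: feed it the staged paths with "git diff --cached --name-only | classify_paths puppet/modules/ruby" and refuse the commit when it prints mixed.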
The git subtree command has shipped with git since version 1.7.11 (May 2012), so it should be available to everyone.
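If someone wants to confirm this on their own machine, a rough check could look like the sketch below. The function names are made up for illustration. It assumes a sort that supports -V (GNU coreutils and recent BSD sorts do) and that git-subtree, when installed, sits in the directory printed by git --exec-path; some distributions package it separately, so the version number alone does not settle it.

```shell
# version_at_least HAVE WANT  (hypothetical helper)
# Succeeds when version HAVE is greater than or equal to WANT.
version_at_least() {
  # The smaller of the two versions sorts first; HAVE >= WANT
  # exactly when that smallest one is WANT.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_git_subtree  (hypothetical helper)
# Succeeds when git is 1.7.11 or newer AND the git-subtree script is
# present in git's exec path (assumption: that is where it is installed).
has_git_subtree() {
  have=$(git --version 2>/dev/null | awk '{print $3}')
  [ -n "$have" ] &&
    version_at_least "$have" 1.7.11 &&
    [ -x "$(git --exec-path)/git-subtree" ]
}
```

Running "has_git_subtree && echo ok" before trying the subtree branch should be enough to tell whether anything needs installing.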
Let me show you how we would track a "module" using git subtree:
First add the subtree at a specified prefix folder:
$ git subtree add --prefix puppet/modules/ruby https://leap.se/git/puppet_ruby master --squash
This produces output like this:
git fetch https://leap.se/git/puppet_ruby master
warning: no common commits
remote: Counting objects: 85, done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 85 (delta 24), reused 0 (delta 0)
Unpacking objects: 100% (85/85), done.
From https://leap.se/git/puppet_ruby
 * branch            master     -> FETCH_HEAD
Added dir 'puppet/modules/ruby'
The --squash parameter means the entire history of the sub-project is not stored in the main repo. If we stored it, the platform repo would get really polluted with each sub-project's history. Even with squash we still get quite a large commit, because every file is added to the repository. With squash, we get two commits for each module: one that squashes the whole history of the sub-project into a single commit, and a merge commit that brings it into the platform:

commit 470dd7883ae207226e22a8da0a710fbf7eaabf07
Merge: 50e68aa c33d9df
Author: Micah
Date:   Mon May 23 11:15:45 2016 -0400

    Merge commit 'c33d9df38f765d4e84c0acb886263a0ca6238f36' as 'puppet/modules/sysctl'

commit c33d9df38f765d4e84c0acb886263a0ca6238f36
Author: Micah
Date:   Mon May 23 11:15:45 2016 -0400

    Squashed 'puppet/modules/sysctl/' content from commit 975852b

    git-subtree-dir: puppet/modules/sysctl
    git-subtree-split: 975852b7acc1125b4cd9d4d490b9abd8d31217e6
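The git-subtree-dir and git-subtree-split lines in the squash commit are the only record of which upstream commit each module is at, so it can be handy to pull them out of the log. A minimal sketch, assuming prefixes contain no spaces; subtree_versions is a made-up name, not a git command.

```shell
# subtree_versions  (hypothetical helper)
# Reads commit messages on stdin and prints one "DIR COMMIT" line for
# each git-subtree-dir / git-subtree-split pair it finds.
# Assumption: the prefix directory contains no whitespace.
subtree_versions() {
  awk '
    $1 == "git-subtree-dir:"   { dir = $2 }
    $1 == "git-subtree-split:" { if (dir != "") print dir, $2; dir = "" }
  '
}
```

Fed with "git log --grep='^git-subtree-dir:' --format=%B | subtree_versions", it lists the squash commits newest first, so the first line for a given directory is the upstream commit that module currently sits at.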
Once these have been added to the repository, they do not need to be pulled separately when you clone the repository for the first time; the code is embedded in the platform.
For updating a module from the upstream, you do a subtree pull (this isn't a super interesting example, because the latest code is already checked out):
$ git subtree pull --prefix puppet/modules/ruby https://leap.se/git/puppet_ruby master --squash
From https://leap.se/git/puppet_ruby
 * branch            master     -> FETCH_HEAD
Subtree is already at commit 9ccd853c49af7d0b57ebd9c2ea7673b193fce24b.
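When there are upstream changes, the pull records them itself as a squash commit plus a merge commit, so there is nothing further to commit by hand unless there are conflicts to resolve. Quick and painless, but the commands are slightly lengthy and hard to remember.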
We can make the commands shorter by adding the sub-project as a remote, but these remotes are local to the person who creates them. They are still worth setting up:
git remote add -f puppet_ruby https://leap.se/git/puppet_ruby
Now we can add the subtree as before, referring to the remote in short form:
git subtree add --prefix puppet/modules/ruby puppet_ruby master --squash
The command to update the sub-project at a later date becomes:
git fetch puppet_ruby master
git subtree pull --prefix puppet/modules/ruby puppet_ruby master --squash
Not a huge gain, but it helps.
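To make the commands easier to remember, a small wrapper could build them from just the module name. This is only a sketch: subtree_cmd is a hypothetical helper, and it assumes every module lives under puppet/modules/NAME with a local remote called puppet_NAME, which so far has only been shown for the ruby module. It prints the command rather than running it.

```shell
# subtree_cmd ACTION MODULE [BRANCH]  (hypothetical helper)
# Prints, without running it, the git subtree command for one module.
# Assumptions: the module lives at puppet/modules/MODULE and a remote
# named puppet_MODULE has already been added with "git remote add".
subtree_cmd() {
  action=$1             # add or pull
  module=$2             # e.g. ruby
  branch=${3:-master}   # upstream branch, master by default
  case "$action" in
    add|pull) ;;
    *) echo "usage: subtree_cmd add|pull MODULE [BRANCH]" >&2; return 2 ;;
  esac
  if [ -z "$module" ]; then
    echo "usage: subtree_cmd add|pull MODULE [BRANCH]" >&2
    return 2
  fi
  echo "git subtree $action --prefix puppet/modules/$module puppet_$module $branch --squash"
}
```

For example, "subtree_cmd pull ruby" prints the pull command shown above; once the output looks right it can be piped to sh.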
For contributing back to upstream, we can freely commit our fixes to the sub-project in our local working directory now.
When it’s time to contribute back to the upstream project we need to fork the project and add it as another remote (that we can push to):
git remote add puppet_ruby_fixbug4242 ssh://gitolite@leap.se/puppet_ruby
Now we can use the subtree push command like the following:
git subtree push --prefix=puppet/modules/ruby puppet_ruby_fixbug4242 master
git push using: puppet_ruby_fixbug4242 master
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 308 bytes, done.
Total 3 (delta 2), reused 0 (delta 0)
To ssh://gitolite@leap.se/puppet_ruby
   02199ea..dcacd4b  dcacd4b21fe51c9b5824370b3b224c440b3470cb -> master
After this we can open a pull-request to the maintainer of the package.
(from redmine: created on 2016-05-23, closed on 2016-08-02, relates #8248 (closed))