Grauw’s blog
Mercurial shallow clones and repository sizes
A colleague of mine was asking whether Mercurial supported shallow clones. The short answer is no. The slightly longer answer is, it’s under development.
But if you ask me, shallow clones aren’t really needed. Aside from going somewhat against the point of having a DVCS, they don’t actually gain you that much. I did some measurements (on Windows):
- Mozilla-central repository:
  - Files: 41926
  - Repository size: 241 MB
  - Working copy size: 287 MB (87.4 MB zipped)
  - Clone time: 276 seconds (0.8 MB/sec)
- Python trunk repository (going back to 1990!):
  - Files: 4199
  - Repository size: 97.1 MB
  - Working copy size: 55.7 MB (14.8 MB zipped)
  - Clone time: 82 seconds (1.2 MB/sec)
- Mercurial repository:
  - Files: 1144
  - Repository size: 15.4 MB
  - Working copy size: 6.86 MB (2.08 MB zipped)
  - Clone time: 25 seconds (0.6 MB/sec)
- Backbase 4 repository (converted from SVN):
  - Files: 4691
  - Repository size: 77.9 MB
  - Working copy size: 74.6 MB (33.6 MB zipped)
- Backbase cobrowse repository (converted from SVN):
  - Files: 1703
  - Repository size: 79.6 MB
  - Working copy size: 39.3 MB (24.0 MB zipped) (was 87.2 MB 400 revisions ago!)
The repository size is roughly what is transferred over the wire when cloning. The zipped size indicates the theoretical best case, where you would retrieve just the files and no history. Note that this does not include the metadata that a shallow clone would also have to transfer; Python’s changelog alone is ~10 MB, for example. Finally, clone time is the time it takes to do hg clone -U, which skips updating the working directory and so is a reasonable approximation of the time spent downloading and creating the repository.
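To reproduce this kind of measurement, a minimal sketch in Python could look like the following. The helper names time_command and throughput_mb_per_s are made up for illustration, and the throughput figure simply assumes that the repository size equals the amount of data transferred.

```python
import subprocess
import time


def time_command(argv):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def throughput_mb_per_s(repo_size_mb, seconds):
    """Average rate, assuming the repository size is what went over the wire."""
    return repo_size_mb / seconds


# Mozilla-central figures from the list above: 241 MB cloned in 276 seconds.
print(round(throughput_mb_per_s(241, 276), 2))
```

To time an actual clone you would call something like time_command(["hg", "clone", "-U", source, destination]), where source and destination are placeholders for a repository URL and a local directory.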
Also of note are some further measurements I did with the Mozilla repository. Just copying it from one disk to another on my machine takes a lot of time: 174 seconds! (The Python repository took 23 seconds.) Creating a working copy from the repository also takes a long time, 241 seconds. My guess is that this is because of the large number of small files.
So in other words, even if you cut the download time of the Mozilla repository by making the shallowest of clones, much of the cloning time is still spent creating the large number of small files. A little math suggests that for this repository, you would only be able to bring down the clone time by some 20%.
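The math behind that figure isn’t spelled out, so here is a rough sketch of one way to land in the same ballpark. It is not necessarily the original calculation. It assumes that the local copy time (174 seconds) approximates the disk-bound part of the clone, that the remainder is network-bound, and that network time scales linearly with the megabytes transferred. The function name estimated_saving is made up for this example.

```python
def estimated_saving(clone_s, local_copy_s, repo_mb, zipped_mb, checkout_s=0.0):
    """Fraction of the total time a shallowest-possible clone might save.

    Assumptions (not measured): the local copy time stands in for the
    disk-bound part of the clone, the rest is network-bound, and network
    time is proportional to the amount of data transferred.
    """
    network_s = clone_s - local_copy_s
    saved_s = network_s * (1 - zipped_mb / repo_mb)
    return saved_s / (clone_s + checkout_s)


# Mozilla-central figures from this post.
without_checkout = estimated_saving(276, 174, 241, 87.4)
with_checkout = estimated_saving(276, 174, 241, 87.4, checkout_s=241)
print(f"{without_checkout:.0%} of the clone -U time")
print(f"{with_checkout:.0%} of clone plus working copy creation")
```

Under these assumptions the saving comes out at about 24% of the clone -U time, or about 13% once the 241 seconds of working copy creation are counted as well.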
Conclusion
Looking at these numbers, first of all, in most cases just thinking about how ‘shallow’ you want your clone to be will probably take more time than simply cloning the whole repository ;p. And what would you gain? A slightly shorter clone time perhaps, but you lose the ability to look at the full history. And how often do you do a complete clone? Only the first time.
For local clones, Mercurial and Git actually create hard links, so making a local clone is much faster (74 seconds for Mozilla, 11 seconds for Python) and takes hardly any disk space. Clones over a local network will of course be faster as well. And as for slow connections: because the repository is in the end hosted completely locally and few things require interaction with a central server, a DVCS is already very friendly towards those.
Finally, consider also that internet connections get faster every year, so even though repository size grows steadily over time, this does not necessarily have to become a problem. And hey, if it does, by that time Mercurial will have shallow cloning too :).

Bundles to the rescue by C2H5OH at 2010-04-09 11:26
Using downloadable bundles (dailies or weeklies) can speed up clone time. For example, you can download a bundle, clone from it locally, and then pull from the remote repository to get any new changesets that were not included in the bundle.
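As a rough sketch, that workflow could be scripted like this. It assumes the bundle has already been downloaded, and it uses hg init followed by hg unbundle rather than cloning from the bundle file directly. The function names and parameters are made up for illustration.

```python
import subprocess


def bundle_clone_commands(bundle_path, dest, remote_url):
    """Build the hg commands that seed a repository from a bundle file."""
    return [
        ["hg", "init", dest],                         # create an empty repository
        ["hg", "-R", dest, "unbundle", bundle_path],  # import the bundled changesets
        ["hg", "-R", dest, "pull", remote_url],       # fetch anything newer than the bundle
        ["hg", "-R", dest, "update"],                 # create the working copy
    ]


def clone_via_bundle(bundle_path, dest, remote_url):
    """Run the commands in order, stopping at the first failure."""
    for argv in bundle_clone_commands(bundle_path, dest, remote_url):
        subprocess.run(argv, check=True)
```

Only the pull step talks to the remote server, so the amount transferred over the slow link is limited to the changesets made since the bundle was created.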