Revision Control
From LentzWiki
Concept
A revision control system allows you to keep track of changes to a set of files. That could be the source code, build scripts and documentation for a program, or web site pages and scripts, or your system's configuration files... anything, really. Each of those sets is called a repository.
Every time you make a few changes, you "check in" or "commit" so the system knows there's a newer version of those files. You can then always go back to some point in time and get that revision back, or just compare that revision with what you have now. Many people make backups of stuff, archive things... of course backups are important, but back up the repository instead: for the way you work, this is so much easier!
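The commit, go-back and compare cycle described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real system stores its data: the `TinyRepo` class, its method names and the sample file are all made up for the example, and it keeps a full copy of every file per revision, which real systems avoid.

```python
import difflib


class TinyRepo:
    """Toy in-memory repository: each commit stores a full snapshot (hypothetical design)."""

    def __init__(self):
        self.revisions = []  # list of (message, {filename: text}) tuples

    def commit(self, files, message):
        # Store a copy so later edits to `files` do not alter history.
        self.revisions.append((message, dict(files)))
        return len(self.revisions) - 1  # revision number

    def checkout(self, rev):
        # Return a copy of the snapshot recorded at revision `rev`.
        return dict(self.revisions[rev][1])

    def diff(self, rev, files, name):
        # Compare one file at revision `rev` with the current working text.
        old = self.revisions[rev][1].get(name, "").splitlines()
        new = files.get(name, "").splitlines()
        return list(difflib.unified_diff(
            old, new, f"{name}@{rev}", f"{name}@working", lineterm=""))


repo = TinyRepo()
work = {"index.html": "<h1>Hello</h1>\n"}
first = repo.commit(work, "initial page")

work["index.html"] = "<h1>Hello, world</h1>\n"
second = repo.commit(work, "friendlier heading")

work["index.html"] = "<h1>Oops</h1>\n"                    # an uncommitted mistake
changes = repo.diff(second, work, "index.html")           # compare with the last commit
print("\n".join(changes))
work = repo.checkout(first)                               # go back to the first revision
```

Because every commit is kept, going back is just a lookup; the small, frequent commits are what make the comparison output short enough to read.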
When developing software (and that includes modern web applications) a revision control system should be part of your standard toolkit. It's essential. Say you build a web site. You can have a live (production) site, and also staging or testing servers. These are simply branches in your revision control system. You can easily update the live server to a new release version, and even shift back if there are still some problems.
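As a rough sketch of that idea, a branch can be thought of as a name pointing at a revision. The branch names, the revision ids (`r10`, `r12`) and both helper functions below are hypothetical; real systems track this for you, each in its own way.

```python
# Hypothetical revision ids and branch names, for illustration only.
branches = {"staging": "r12", "live": "r10"}
live_history = ["r10"]  # revisions the live site has been at, oldest first


def release(branches, live_history):
    """Point the live branch at whatever staging currently holds."""
    branches["live"] = branches["staging"]
    live_history.append(branches["live"])


def roll_back(branches, live_history):
    """Return the live branch to the previously released revision."""
    if len(live_history) < 2:
        raise RuntimeError("no earlier release to return to")
    live_history.pop()
    branches["live"] = live_history[-1]


release(branches, live_history)    # live now serves r12
released = branches["live"]
roll_back(branches, live_history)  # problems found: live is back at r10
```

Nothing is lost by rolling back: the newer revision is still in the repository and still on the staging branch, ready to be fixed and released again.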
Teams
Revision control becomes particularly important when more than one person needs to work on the same set of files. You may wish to trace back who did what and when, and a revision control system can often automatically merge the changes when two people have edited the same file.
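How such automatic resolution can work is easiest to see with a three-way merge: compare both edited copies against the version they started from. The sketch below is deliberately simplified and is not the algorithm of any particular system. It assumes people only change lines in place, so all three versions have the same number of lines; the function name `merge3` and the sample data are invented for the example.

```python
def merge3(base, mine, theirs):
    """Merge two edited copies of `base`, line by line.

    Simplifying assumption: all three versions have the same number of
    lines, i.e. lines were only changed in place, never added or removed.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this toy merge only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "mine" changed this line
            merged.append(m)
        else:             # both changed the same line differently
            merged.append(m)      # keep "mine" for now...
            conflicts.append(i)   # ...and report the line for a human to resolve
    return merged, conflicts


base = ["title", "body", "footer"]
alice = ["Title", "body", "footer"]        # Alice edits the first line
bob = ["title", "body", "footer v2"]       # Bob edits the last line
merged, conflicts = merge3(base, alice, bob)
```

Here the two edits touch different lines, so they combine without anyone having to intervene. Only when both people change the same line does the merge need a human decision.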
Development Process
One of the smart tricks in development is to commit very frequently, so that each change to your project is small and incremental. Having big changes lying around for weeks is just not a good idea. If you commit regularly and then make some nasty mistake, you can go back and sort things out; it may still take some effort, but it's not disastrous.
Text and Binary files
Revision control systems work best with text files (that includes RTF, XML, HTML and PHP scripts). Binary formats such as those used by PowerPoint (see Presentation_Systems) are not so handy. The system might be able to manage them, but it won't be space-efficient and it's very difficult to compare different revisions. If you keep a web site in a revision control repository, it does make sense to keep the site's images (PNGs, JPGs) there also. Those files are generally fairly small anyway.
Distributed Systems
In a distributed version control system, each repository is fully self-contained - it does not need to talk to some computer elsewhere for checking out, checking in, or any other basic operation. This means that it's all locally available on your computer, wherever you are... you could be on an airplane, still do some stuff, and do your frequent commits.
To share your changes or publish a version, you synchronise with another repository. This could be some central server, but it could also be just a local branch repository for your little team. That local branch can later be merged back into the main line of a project. Merging becomes extra important with a distributed system, since repositories are likely to have more changes between each other before they get synchronised. Merging branches turns a tree into a graph, and for tracking history accurately you need to have that information visible. The distributed systems listed here do this properly.
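The "tree becomes a graph" point can be made concrete by recording, for each revision, the list of its parents: an ordinary commit has one parent, a merge has two. The sketch below uses made-up revision names and helper functions. It also shows why that information matters, since finding the point where two lines of work diverged depends on it.

```python
# Hypothetical history: each revision maps to its list of parent revisions.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],        # main line continues
    "d": ["b"],        # team branch starts from b
    "e": ["d"],
    "m": ["c", "e"],   # merge revision: two parents, so history is a graph
}


def ancestors(rev, parents):
    """Return rev plus every revision reachable by following parent links."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def merge_bases(x, y, parents):
    """Common ancestors of x and y that are not ancestors of another common ancestor."""
    common = ancestors(x, parents) & ancestors(y, parents)
    return {r for r in common
            if not any(r != o and r in ancestors(o, parents) for o in common)}


base = merge_bases("c", "e", parents)   # where main line and team branch diverged
everything = ancestors("m", parents)    # the merge revision sees both lines of work
```

If the merge revision only recorded one parent, the work from the other branch would appear to come out of nowhere, which is exactly the history warping that a system without proper merge tracking suffers from.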
In my opinion, once you've started using a distributed system and become used to the way it works, you'll never want anything else again.
Mercurial and Bazaar
Mercurial (hg) and Bazaar (bzr) appear to have the most traction: active development, major projects using them, increasing support in development environments, etc. Both are being chosen by some interesting (& big) projects. Right now Bazaar (with multiple developers from Canonical working on it) appears to be moving the fastest, with Mercurial's latest release being about half a year ago.
- Mercurial - by Matt Mackall (Selenic).
- Eclipse plugins also available!
- Bazaar - by Martin Pool (Canonical)
Monotone
Monotone is also interesting, but perhaps over-engineered; it keeps its repositories in an SQLite database with each attribute of each item individually signed. From the single repository you use different working copies, so that's nice and efficient. Mercurial can have repositories with multiple heads also, but this is not yet well documented. BitKeeper would have been architecturally able to do it, but I believe it was tried and stuffed up (LODs - lines of development) so the concept was ditched in favour of keeping separate repositories for each branch.
- Monotone - by Graydon Hoare
Git
Git was written by Linus when he moved away from BitKeeper, and it's very much geared to the way he personally works. It's also useful for some others, but not everybody.
- Git - by Linus Torvalds
BitKeeper
I've used BitKeeper professionally, but with its licensing it's no longer an option in many cases; my favourite right now is Mercurial, partly because it is in a way very similar to BitKeeper. That's not just a level of comfort to me; I think the system simply makes sense. Bazaar also appears very good though and I am reviewing it now.
- BitKeeper - by Larry McVoy (BitMover); the only free client available now is read-only.
Centralised Systems
As you've probably figured from the above, I'm not a fan of centralised systems. Distributed systems work fine online too, but centralised systems are a pest when it comes to offline situations. You can still work, but you can't commit in little bits... that can prove to be very harmful to your work and process.
CVS and SVN
CVS does not handle branch merges decently, so in essence it tracks history by warping it. Not nice. Subversion (SVN) was meant to resolve the various issues with old CVS, but it doesn't; one key example is the merging. The one thing that CVS and SVN have going for them is excellent tools support. Any IDE (like Eclipse) has hooks built in. The distributed systems are catching up to this now.
People also mention SVK, which allows SVN to be used offline. But this is architecturally the wrong angle... you want your whole infrastructure and history to be geared towards distributed/offline work, not only the ability to commit offline. I see it as a hack, not a neat solution. Sorry.
- Subversion - SVN, the non-perfect successor to CVS. Good tools support though.
- CVS - the old ugly critter, still used in lots of places.
Perforce
Perforce is a proprietary system; from what I've seen it's very good, and it has some support for distributed repository servers. It's not a truly distributed architecture though.
Others (historical)
- SCCS - in the old proprietary Unixes; architectural ancestor of BitKeeper.
- RCS - also ancient.
