A Gentle Introduction to Version Control – Part II

Using SVN

Subversion, or svn, is a very popular version control tool, used by some of the best software teams in the world. The fact that it is free and open source, in addition to being really, really good may have something to do with that. Like most good version control tools, Subversion too is split into two components – a server and a client.

You don’t need to worry about the server usually because it is set up on a system far out of your reach. If you are located in North America or Europe, it means that your company has outsourced the Subversion hosting somewhere in South America or India. Funnily enough, good companies in Bangalore (or Pune) usually outsource their Subversion hosting to a really, really good outfit in LA (read, Dreamhost). Silly. But that’s how the global economy works, and which is why it is more fun to be a software developer than to be an economist.

Coming back to our topic…

What you will use on a day-to-day basis is the Subversion command-line client (or TortoiseSVN, a GUI replacement).

Disclaimer: This article touches briefly upon how to get started with the client, without worrying about details. If you are a svn alpha-geek who dreams about hosting your own repository someday or whatever else you svn alpha-geeks dream about, and find my guide to be incomplete, don’t email me to write about the 10,542 variations in syntax that the svn client allows. I know you adore your svn client. But this article is targeted at people who don’t as much as know about svn, much less adore it.

Once the client is installed, you can launch a command prompt session by typing cmd into the Run dialog box in Windows and hitting enter. Once on the command prompt, type the following and hit enter.

C:\svn help

Detailed help is also available for some of the more oft-used commands by modifying the help command like so –

C:\svn help import

Import

You can use the import command to add files from your computer into the repository for the first time, and hence, begin versioning them.

C:\svn import trunk/ file:///F:/projects/notadesigner/trunk/web/index.htm -m "Initial import"

Checkout

If you are a new developer on a team that has been using Subversion for a while, you can use the checkout command to retrieve the source code from the repository onto your hard drive.

C:\cd projects\notadesigner\trunk
C:\projects\notadesigner\trunk\web>svn checkout https://notadesigner.com/svn/trunk

Note how the active directory is first changed to the path where the files have to be stored, and then the checkout command is called.

Alternatively, you can also specify the path to the working folder as a parameter to the checkout command.

C:\svn checkout https://notadesigner.com/svn/trunk C:\projects\notadesigner\trunk

Update

This is probably the next most used command in the Subversion client. Millions of developers across the globe run this command every morning to retrieve the latest version of files from the repository into their working folder. Running this command makes sure that you are running abreast of everyone else on your team, by bringing the latest changes they have committed to the repository onto your machine (and likewise, your latest changes get updated on their machines).

C:\projects\notadesigner\trunk\web>svn update

Like true Zen, it is a deceptively simple command. Just two words that do you a whole lot of good.

Add / Delete

These commands do just what they say. They make changes to your working copy, and schedule the same change to the repository when you commit your changes later.

C:\projects\notadesigner\trunk\web>svn add locations.htm
C:\projects\notadesigner\trunk\web>svn delete directions.htm
C:\projects\notadesigner\trunk\web>svn commit -m "Added new locations. Deleted directions because we're spread all over the city and I didn't know where to send them."

Copy / Move

Copy and move work exactly as they do on your operating system, except that they are targeted at the repository.

C:\projects\notadesigner\trunk\web>svn copy jobs.htm https://notadesigner.com/svn/trunk -m "Adding new jobs page"

Move requires a separate commit to implement your changes into the repository.

C:\projects\notadesigner\trunk\web>svn move jobs.htm jobs/index.htm
C:\projects\notadesigner\trunk\web>svn commit -m "Moved jobs page into separate folder"

Revert

This command rolls back any changes made to a file in your working folder, and restores it to the version on the repository.

C:\projects\notadesigner\trunk\web>svn revert contact.htm

Using TortoiseSVN

TortoiseSVN lets you perform all actions that the svn command-line client lets you, without messing with the command-line. If you are new to Subversion, it’s a compelling replacement to the regular svn client.

Finally, Don’t Do This

I have found a shockingly large number of people doing silly things while using a VCS. I am tired of telling them not to do it and so if you abuse a VCS in any of the following ways and I catch you at it, I’ll make you write this list on a blackboard that screeches, for at least a week…without ear muffs.

It’s not that difficult to avoid them. So listen up.

Don’t make a copy of your working folder to make edits

If you are checking out files from the repository in a working folder and then making a copy of that folder to make edits because you don’t want to overwrite your old files, you haven’t understood the most basic premise of version control.

The repository already has a copy of your files which compile correctly. If you make a mistake, revert your files and you’re back where you began from.

Don’t make a copy of your folder in the repository before editing your files

A more severe case of the previous symptom is when a developer copies the entire trunk in the repository into trunk-2 (effectively creating a branch) and making edits to that.

Don’t delete the current version of the file from the repository to add the edited working copy in its place

The whole point of version control is to have a history of your file over time. If you delete the file from your repository, you destroy that history. If you do it every time, somebody who looks at your file six months down the line will have no idea about the pedigree of your file, the bugs that have been fixed or the new code that has been added to it.

Don’t forget to add comments when committing changes

You’ve got to be really lazy to not add comments at the time of committing your changes. Vague comments such as “updated file” are just as unhelpful.

It helps to be descriptive in the changelog because that’s the first place people go to find out what’s been cooking in your files. If you don’t tell them up front what you’ve been up to, they’ll just think that you’ve been messing up and saddle you with all the blame.

Don’t forget to add new files to the repository

This is a common mistake and can happen by even the most seasoned developers. But I’m writing it here to simply reinforce it that this should be avoided.

References

If you find this article doesn’t quite quench your thirst about version control systems or Subversion, you can learn more stuff by visiting the following websites.

http://svnbook.red-bean.com/
Colloquially referred to as the svn-book, this is the granddaddy of all Subversion books. Contains detailed explanations about the history of svn, repository and server setup and administration, detailed explanations about the acrobatic feats that the client lets you do, and if you’re really interested, a description of the C APIs that let you hook up svn with your own applications.

http://www.joelonsoftware.com
A delightful collection of extremely well-written and funny articles about computers, software and management by Joel Spolsky. This should be a must-read for every developer.

http://www.ericsink.com/scm/source_control.html
Eric Sink is the founder of SourceGear, a company that specializes in selling version control software. You can’t get anybody better to write about version control.

http://betterexplained.com/articles/a-visual-guide-to-version-control/
Kalid Azad maintains this brilliant blog about all things mathematical and computer-sciencey. It’s a refreshing view to everyday complexities that you take for granted.

A Gentle Introduction to Version Control – Part I

In our rush to Web 2.0 our lives, we seem to have forgotten to imbibe the essentials. Everyone is guilty of that – from fruit farmers using artificial fertilizer, to politicians driving nations to war, to developers not using version control.

Shocking, really.

I do not have much pull with the fruit farmers or international diplomats. But as a fellow developer, it’s my duty to bring back the lost software teams of today into the fold. Forget “enterprisey architectures” for a while and look at the basics. In this article I will introduce you to the wholesome goodness of Subversion, a fantastic version control tool.

Essentially Missing

Version control is an essential tool in any software team’s kit. In spite of that, I regularly run into senior developers with years of experience who have never used it. Which is, to be honest, quite frustrating because then their working directory tends to look like this –

Product-latest

Prodcut-new-12-march (yes, it is misspelled)

Product-new-mar-15

and the disaster repeats till eternity.

If you notice, this directory structure is already functions as a basic version control repository, in that it separates successive updates to the product, but without all the extra goodies that a true version control tool would let you have such as comments, labels and developer history.

Towards the Light

Whenever I have had the good fortune of introducing developers to version control – even with rogue tools like Visual SourceSafe – they knew this was the elixir they were missing for so long. It takes some work to actually drive it into their system. But once there, it stays on forever.

The single biggest advantage all of them cite is how easy it becomes to synchronize files between team members. Because everybody is updating their files every morning from the same location, it becomes easier to keep abreast with everyone’s changes.

The next biggest advantage is the ability to roll back mistakes. If a file has not been checked in, simply revert it back and the VCS replaces it with the latest one in the repository. Even after the file has been checked in, a previous version of the file can be retrieved from the repository and used to restore the changes made.

A side-effect of this feature is the ability to sandbox major changes. Rewriting core algorithms of your accounting product using that new design patterns book? (Hah!) Do it in a local working copy, test it and then throw it away when you discover you suck at patterns check it in after it works fine.

As a project’s requirements evolve, files mutate into completely different beasts from their initial incarnation. By logging a note about each change made to the file in the VCS, developers are able to track the project history in the long term.

Another nice feature of the tracking tools is that they help assign ownership by logging the person who has made a change. This proves to be quite helpful when giving credit, or more frequently, blamestorming.

And the greatest relief that a VCS provides to all stakeholders is the daily backup that occurs automatically when developers check-out the latest changes every morning. It’s rare for a team using VCS to lose a lot of data due to hard drive failures.

The VCS Dictionary

Let’s begin with learning the terminology used when dealing with version control.

Parts of a VCS

  • Repository: The database storing the files. The repository is usually expected to be on a central location such as a network server, although it can exist on locally stored directories.
  • Server: The computer storing the repository. If the repository is stored on a directory on your own computer, then your computer is called the server although there is no network access involved.
  • Client: The computer connecting to the repository. Your computer.
  • Working Set/Working Copy: Your local directory of files, where you make changes.
  • Trunk/Main: The primary location for code in the repository. This is the in-progress version of code, with untested or partially implemented features. Feature-complete snapshots are stored in a branch folder.

Common Terms

  • Revision: What version a file is on (v1, v2, v3, etc.).
  • Head: The latest revision in the repository.
  • Commit Message: A short message entered at the time of committing a file, describing what was changed.
  • Changelog/History: A list of changes made to a file since it was created.

Basic Actions

  • Add: Put a file into the repository for the first time.
  • Check-out: Download a file from the repository.
  • Commit/Check-in: Upload a file to the repository. The file gets a new revision number, and people can check out the latest one.
  • Update/Sync: Synchronize your files with the latest from the repository. This lets you grab the latest revisions of all files. Do this at least once a day.
  • Revert: Throw away your local changes and return to the latest version from the repository.
  • Diff/Change/Delta: Finding the differences between two files. Useful for seeing what changed between revisions. This only works on text files (e.g. .as, .htm, .cs). Binary files (e.g. .psd, .fla, .doc) cannot be diffed.

Advanced Actions

Branch: Create a separate copy of a file/folder for private use (bug fixing, testing, etc). Branch is both a verb (“branch the code”) and a noun (“Which branch is it in?”).

Merge: Apply the changes from one file to another, to bring it up-to-date. For example, you can merge features from one branch into another.

Conflict: Occurs when two people edit the same file simultaneously. The first person to edit the file does not face any error. However, the file on the server is now out of sync with the file on the second person’s working folder. When he attempts to commit the file to the repository he gets a conflict.

Conflicts can be resolved for text files by manually selecting which lines of code to keep and which ones to discard.

There is no way to resolve binary files. If there is a conflict in binary files, then the second person has to begin again by retrieving the latest file, and re-creating the changes made to it before the conflict occurred.

To avoid conflicts on binary files, a user can lock the file before editing.

  • Resolve: Fixing the changes that contradict each other and checking in the correct version.
  • Locking: Flagging a file for exclusive use until it is committed again.
  • Breaking the lock: Forcibly unlocking a file so you can edit it. It may be needed if someone locks a file and goes on vacation (or “calls in sick” the day Halo 3 comes out).

An Illustrated Example

Pop Candy is a Java developer who has just joined the team at Timeless Software. Her task is to add an email client to their product. She creates a folder on her hard drive for the project, and proceeds to check out the files from the repository.

Once she has the entire product code check out onto her hard drive, she compiles it and familiarizes herself with its features.

Once she’s ready, and has read the spec, she begins editing the first file. She adds a few more files to the project, compiles and tests. Makes a few changes, goes back and compiles and tests, and in no time at all, she has reached her first milestone. Her email client connects to the server and successfully handles the server’s response to HELO.

She feels she’s achieved quite a bit for the day and proceeds to add her changes to the repository. The first step is to select the new files she’s created and adding them to the repository. Then, she selects the new files, and the ones she’s modified and commits them all in a single operation. She remembers to enter a log comment that describes the change clearly. When the other developers come in the next day and update their working copies, they’ll see that Candy has begun working on the email client.

The next day Candy is working on adding a rich text editor into the email client. She uses an off-the-shelf JavaScript library to do the job to save time. Smart!

But when her team lead reviews it, he points out to her that the terms of license of this library conflict with those of their commercial product. Candy is grumpy because it took an entire day to integrate the editor. It was her fault, though. The spec clearly stated that no off-the-shelf library is to be used. So she has to go back and undo her changes. She deletes all the unversioned files in the project folder and proceeds to revert the edited files to their head version.

A few days into development, as the email client begins adding bulk, the testers complain that the product itself seems to be very sluggish. Everybody scrambles to get their hands on the changelog on the date since when the testers noticed the slowdown.

All fingers are pointed at Candy’s code for the moment, but the team lead defers any decision until he’s actually reviewed her commits on that particular day. He retrieves the files of the previous revision and runs them through the tests. On a whim, he decides to replace their in-house SMTP test server with the live email server.

Luckily, it turns out that it was their testing SMTP server which was misbehaving. Instead of closing the connections on receiving QUIT, it continued to hold them. And the testers were the only ones to notice this because they sent the client through a gruelling 1,000,000 rounds of sending and receiving email.

You now see how easy it is to manipulate readers into seeing from your viewpoint with a contrived example veiling your actual agenda.

Oops!

No, that’s not what I meant. The moral of this story is that it is always a good idea to use version control. Version control is mother’s love and apple pie. It is the cat’s whiskers and the bee’s knees. And you just gotta have it!

Move on to part two here.