A Gentle Introduction to Version Control – Part II

Using SVN

Subversion, or svn, is a very popular version control tool, used by some of the best software teams in the world. The fact that it is free and open source, in addition to being really, really good, may have something to do with that. Like most good version control tools, Subversion is split into two components – a server and a client.

You usually don’t need to worry about the server because it is set up on a system far out of your reach. If you are located in North America or Europe, it means that your company has outsourced the Subversion hosting somewhere in South America or India. Funnily enough, good companies in Bangalore (or Pune) usually outsource their Subversion hosting to a really, really good outfit in LA (read, Dreamhost). Silly. But that’s how the global economy works, which is why it is more fun to be a software developer than to be an economist.

Coming back to our topic…

What you will use on a day-to-day basis is the Subversion command-line client (or TortoiseSVN, a GUI replacement).

Disclaimer: This article touches briefly upon how to get started with the client, without worrying about details. If you are an svn alpha-geek who dreams about hosting your own repository someday or whatever else you svn alpha-geeks dream about, and find my guide to be incomplete, don’t email me to write about the 10,542 variations in syntax that the svn client allows. I know you adore your svn client. But this article is targeted at people who don’t so much as know about svn, much less adore it.

Once the client is installed, you can launch a command prompt session by typing cmd into the Run dialog box in Windows and hitting enter. Once on the command prompt, type the following and hit enter.

C:\>svn help

Detailed help is also available for each command by adding its name to the help command like so –

C:\>svn help import

Import

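You can use the import command to add files from your computer into the repository for the first time, and hence, begin versioning them. The first argument is the local path to import and the second is the repository URL to import it into. Importing does not turn the local folder into a working copy; check out a fresh copy afterwards to start working.

C:\>svn import trunk/ file:///F:/projects/notadesigner/trunk/web/index.htm -m "Initial import"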

Checkout

If you are a new developer on a team that has been using Subversion for a while, you can use the checkout command to retrieve the source code from the repository onto your hard drive.

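C:\>cd projects\notadesigner\trunk
C:\projects\notadesigner\trunk>svn checkout https://notadesigner.com/svn/trunk .

Note how the active directory is first changed to the path where the files have to be stored, and then the checkout command is called. The trailing dot tells svn to check out into the current directory; without it, svn creates a new subfolder named after the last part of the URL (here, trunk).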

Alternatively, you can also specify the path to the working folder as a parameter to the checkout command.

C:\>svn checkout https://notadesigner.com/svn/trunk C:\projects\notadesigner\trunk

Update

This is probably the next most used command in the Subversion client. Millions of developers across the globe run this command every morning to retrieve the latest version of files from the repository into their working folder. Running this command makes sure that you are running abreast of everyone else on your team, by bringing the latest changes they have committed to the repository onto your machine (and likewise, once you commit your own changes, they reach your teammates’ machines the next time they update).

C:\projects\notadesigner\trunk\web>svn update

Like true Zen, it is a deceptively simple command. Just two words that do you a whole lot of good.

Add / Delete

These commands do just what they say. They make changes to your working copy, and schedule the same change to the repository when you commit your changes later.

C:\projects\notadesigner\trunk\web>svn add locations.htm
C:\projects\notadesigner\trunk\web>svn delete directions.htm
C:\projects\notadesigner\trunk\web>svn commit -m "Added new locations. Deleted directions because we're spread all over the city and I didn't know where to send them."

Copy / Move

Copy and move work exactly as they do on your operating system, except that they are targeted at the repository.

C:\projects\notadesigner\trunk\web>svn copy jobs.htm https://notadesigner.com/svn/trunk -m "Adding new jobs page"

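The copy above targets a repository URL, so it is committed immediately (hence the -m message). A move inside your working folder is only scheduled, and requires a separate commit to apply your changes to the repository.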

C:\projects\notadesigner\trunk\web>svn move jobs.htm jobs/index.htm
C:\projects\notadesigner\trunk\web>svn commit -m "Moved jobs page into separate folder"

Revert

This command rolls back any uncommitted changes made to a file in your working folder, and restores it to the version you last retrieved from the repository.

C:\projects\notadesigner\trunk\web>svn revert contact.htm

Using TortoiseSVN

TortoiseSVN lets you perform every action that the svn command-line client does, without messing with the command line. If you are new to Subversion, it’s a compelling replacement for the regular svn client.

Finally, Don’t Do This

I have found a shockingly large number of people doing silly things while using a VCS. I am tired of telling them not to do it and so if you abuse a VCS in any of the following ways and I catch you at it, I’ll make you write this list on a blackboard that screeches, for at least a week…without ear muffs.

It’s not that difficult to avoid them. So listen up.

Don’t make a copy of your working folder to make edits

If you are checking out files from the repository in a working folder and then making a copy of that folder to make edits because you don’t want to overwrite your old files, you haven’t understood the most basic premise of version control.

The repository already has a copy of your files which compile correctly. If you make a mistake, revert your files and you’re back where you began.

Don’t make a copy of your folder in the repository before editing your files

A more severe case of the previous symptom is when a developer copies the entire trunk in the repository into trunk-2 (effectively creating a branch) and makes edits to that.

Don’t delete the current version of the file from the repository to add the edited working copy in its place

The whole point of version control is to have a history of your file over time. If you delete the file from your repository and add a new one in its place, you sever that history: the old revisions still exist in the repository, but the new file is not connected to them. If you do it every time, somebody who looks at your file six months down the line will have no idea about the pedigree of your file, the bugs that have been fixed or the new code that has been added to it.

Don’t forget to add comments when committing changes

You’ve got to be really lazy to not add comments at the time of committing your changes. Vague comments such as “updated file” are just as unhelpful.

It helps to be descriptive in the changelog because that’s the first place people go to find out what’s been cooking in your files. If you don’t tell them up front what you’ve been up to, they’ll just think that you’ve been messing up and saddle you with all the blame.

Don’t forget to add new files to the repository

This is a common mistake and can happen to even the most seasoned developers. But I’m writing it here simply to reinforce that it should be avoided.

References

If you find this article doesn’t quite quench your thirst about version control systems or Subversion, you can learn more stuff by visiting the following websites.

http://svnbook.red-bean.com/
Colloquially referred to as the svn-book, this is the granddaddy of all Subversion books. Contains detailed explanations about the history of svn, repository and server setup and administration, detailed explanations about the acrobatic feats that the client lets you do, and if you’re really interested, a description of the C APIs that let you hook up svn with your own applications.

http://www.joelonsoftware.com
A delightful collection of extremely well-written and funny articles about computers, software and management by Joel Spolsky. This should be a must-read for every developer.

http://www.ericsink.com/scm/source_control.html
Eric Sink is the founder of SourceGear, a company that specializes in selling version control software. You can’t get anybody better to write about version control.

http://betterexplained.com/articles/a-visual-guide-to-version-control/
Kalid Azad maintains this brilliant blog about all things mathematical and computer-sciencey. It’s a refreshing view of everyday complexities that you take for granted.

A Gentle Introduction to Version Control – Part I

In our rush to Web 2.0 our lives, we seem to have forgotten to imbibe the essentials. Everyone is guilty of that – from fruit farmers using artificial fertilizer, to politicians driving nations to war, to developers not using version control.

Shocking, really.

I do not have much pull with the fruit farmers or international diplomats. But as a fellow developer, it’s my duty to bring back the lost software teams of today into the fold. Forget “enterprisey architectures” for a while and look at the basics. In this article I will introduce you to the wholesome goodness of Subversion, a fantastic version control tool.

Essentially Missing

Version control is an essential tool in any software team’s kit. In spite of that, I regularly run into senior developers with years of experience who have never used it. Which is, to be honest, quite frustrating because then their working directory tends to look like this –

Product-latest

Prodcut-new-12-march (yes, it is misspelled)

Product-new-mar-15

and the disaster repeats till eternity.

If you notice, this directory structure already functions as a basic version control repository, in that it separates successive updates to the product, but without all the extra goodies that a true version control tool would let you have, such as comments, labels and developer history.

Towards the Light

Whenever I have had the good fortune of introducing developers to version control – even with rogue tools like Visual SourceSafe – they knew this was the elixir they were missing for so long. It takes some work to actually drive it into their system. But once there, it stays on forever.

The single biggest advantage all of them cite is how easy it becomes to synchronize files between team members. Because everybody is updating their files every morning from the same location, it becomes easier to keep abreast with everyone’s changes.

The next biggest advantage is the ability to roll back mistakes. If a file has not been checked in, simply revert it and the VCS replaces it with the latest one in the repository. Even after the file has been checked in, a previous version of the file can be retrieved from the repository and used to undo the changes made.

A side-effect of this feature is the ability to sandbox major changes. Rewriting core algorithms of your accounting product using that new design patterns book? (Hah!) Do it in a local working copy, test it, and then throw it away when you discover you suck at patterns... er, check it in after it works fine.

As a project’s requirements evolve, files mutate into completely different beasts from their initial incarnation. By logging a note about each change made to the file in the VCS, developers are able to track the project history in the long term.

Another nice feature of the tracking tools is that they help assign ownership by logging the person who has made a change. This proves to be quite helpful when giving credit, or more frequently, blamestorming.

And the greatest relief that a VCS provides to all stakeholders is the daily backup that occurs automatically when developers check out the latest changes every morning. It’s rare for a team using a VCS to lose a lot of data due to hard drive failures.

The VCS Dictionary

Let’s begin with learning the terminology used when dealing with version control.

Parts of a VCS

  • Repository: The database storing the files. The repository is usually expected to be in a central location such as a network server, although it can also live in a local directory.
  • Server: The computer storing the repository. If the repository is stored on a directory on your own computer, then your computer is called the server although there is no network access involved.
  • Client: The computer connecting to the repository. Your computer.
  • Working Set/Working Copy: Your local directory of files, where you make changes.
  • Trunk/Main: The primary location for code in the repository. This is the in-progress version of code, with untested or partially implemented features. Feature-complete snapshots are stored in a branch folder.

Common Terms

  • Revision: What version a file is on (v1, v2, v3, etc.).
  • Head: The latest revision in the repository.
  • Commit Message: A short message entered at the time of committing a file, describing what was changed.
  • Changelog/History: A list of changes made to a file since it was created.

Basic Actions

  • Add: Put a file into the repository for the first time.
  • Check-out: Download a file from the repository.
  • Commit/Check-in: Upload a file to the repository. The file gets a new revision number, and people can check out the latest one.
  • Update/Sync: Synchronize your files with the latest from the repository. This lets you grab the latest revisions of all files. Do this at least once a day.
  • Revert: Throw away your local changes and return to the latest version from the repository.
  • Diff/Change/Delta: Finding the differences between two files. Useful for seeing what changed between revisions. This only works on text files (e.g. .as, .htm, .cs). Binary files (e.g. .psd, .fla, .doc) cannot be diffed.

Advanced Actions

  • Branch: Create a separate copy of a file/folder for private use (bug fixing, testing, etc). Branch is both a verb (“branch the code”) and a noun (“Which branch is it in?”).
  • Merge: Apply the changes from one file to another, to bring it up-to-date. For example, you can merge features from one branch into another.
  • Conflict: Occurs when two people edit the same file simultaneously. The first person to commit the file does not face any error. However, the file on the server is now out of sync with the file in the second person’s working folder. When he attempts to commit the file to the repository he gets a conflict.

Conflicts can be resolved for text files by manually selecting which lines of code to keep and which ones to discard.

There is no way to merge conflicting binary files. If there is a conflict in binary files, then the second person has to begin again by retrieving the latest file, and re-creating the changes made to it before the conflict occurred.

To avoid conflicts on binary files, a user can lock the file before editing.

  • Resolve: Fixing the changes that contradict each other and checking in the correct version.
  • Locking: Flagging a file for exclusive use until it is committed again.
  • Breaking the lock: Forcibly unlocking a file so you can edit it. It may be needed if someone locks a file and goes on vacation (or “calls in sick” the day Halo 3 comes out).
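The conflict scenario described above can be sketched with a toy model. This is only an illustration in TypeScript, not how Subversion itself is implemented; the ToyRepository class and the two developers are hypothetical, and the "repository" here holds a single file in memory.

```typescript
// Toy model for illustration only; real Subversion is far more involved.
class ToyRepository {
    private content = "";
    private headRevision = 0;

    // Head: the latest revision number in the repository.
    head(): number {
        return this.headRevision;
    }

    // Check-out: hand back the latest content and the revision it came from.
    checkOut(): { content: string; baseRevision: number } {
        return { content: this.content, baseRevision: this.headRevision };
    }

    // Commit: accepted only if the edit was based on the head revision.
    commit(content: string, baseRevision: number): number {
        if (baseRevision !== this.headRevision) {
            throw new Error(
                "Conflict: edit is based on revision " + baseRevision +
                " but head is " + this.headRevision);
        }
        this.content = content;
        this.headRevision += 1;
        return this.headRevision;
    }
}

const repo = new ToyRepository();
repo.commit("Hello", 0);

// Two hypothetical developers check out the same revision.
const first = repo.checkOut();
const second = repo.checkOut();

// The first commit goes through and moves head forward.
repo.commit(first.content + ", world", first.baseRevision);

// The second commit is now based on an old revision and is rejected.
let secondRejected = false;
try {
    repo.commit(second.content + ", everyone", second.baseRevision);
} catch {
    secondRejected = true;
}
console.log(secondRejected ? "Second commit rejected" : "Second commit accepted");
```

In this sketch the second developer would have to check out again and redo or merge the edit before committing, which is the "resolve" step from the list above.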

An Illustrated Example

Pop Candy is a Java developer who has just joined the team at Timeless Software. Her task is to add an email client to their product. She creates a folder on her hard drive for the project, and proceeds to check out the files from the repository.

Once she has the entire product code checked out onto her hard drive, she compiles it and familiarizes herself with its features.

Once she’s ready, and has read the spec, she begins editing the first file. She adds a few more files to the project, compiles and tests. Makes a few changes, goes back and compiles and tests, and in no time at all, she has reached her first milestone. Her email client connects to the server and successfully handles the server’s response to HELO.

She feels she’s achieved quite a bit for the day and proceeds to add her changes to the repository. The first step is to select the new files she’s created and add them to the repository. Then, she selects the new files and the ones she’s modified, and commits them all in a single operation. She remembers to enter a log comment that describes the change clearly. When the other developers come in the next day and update their working copies, they’ll see that Candy has begun working on the email client.

The next day Candy is working on adding a rich text editor into the email client. She uses an off-the-shelf JavaScript library to do the job to save time. Smart!

But when her team lead reviews it, he points out to her that the terms of license of this library conflict with those of their commercial product. Candy is grumpy because it took an entire day to integrate the editor. It was her fault, though. The spec clearly stated that no off-the-shelf library is to be used. So she has to go back and undo her changes. She deletes all the unversioned files in the project folder and proceeds to revert the edited files to their head version.

A few days into development, as the email client begins adding bulk, the testers complain that the product itself seems to be very sluggish. Everybody scrambles to get their hands on the changelog for the date on which the testers first noticed the slowdown.

All fingers are pointed at Candy’s code for the moment, but the team lead defers any decision until he’s actually reviewed her commits on that particular day. He retrieves the files of the previous revision and runs them through the tests. On a whim, he decides to replace their in-house SMTP test server with the live email server.

Luckily, it turns out that it was their testing SMTP server which was misbehaving. Instead of closing the connections on receiving QUIT, it continued to hold them. And the testers were the only ones to notice this because they sent the client through a gruelling 1,000,000 rounds of sending and receiving email.

You now see how easy it is to manipulate readers into seeing from your viewpoint with a contrived example veiling your actual agenda.

Oops!

No, that’s not what I meant. The moral of this story is that it is always a good idea to use version control. Version control is mother’s love and apple pie. It is the cat’s whiskers and the bee’s knees. And you just gotta have it!

Move on to part two here.

Keeping Count II – A Tale of Many Stores

In my previous post I described Daryl’s experience as a programmer writing an inventory control system for a candy store. Over the next few weeks Betty, the store owner, spread the word about Daryl’s fantastic inventory and billing management software amongst her friends. Daryl was flooded with requests for a computer application “just like Betty’s”.

So he got down to work again. But Daryl now felt that writing inventory control wasn’t as much fun any more. So he wanted to get away with as little code, and in as short a time, as possible. He looked at what he had written for Betty and realised that most of the core inventory code was supposed to work exactly the same. All that was needed was to detail out some business specific stuff such as sale-units. This code could go into a thin layer on top of the core inventory module.

This is what his base Inventory class looks like.

class Inventory
{
    // Running total of stock, stored as a plain unit count
    private var m_units:Number = 0;

    function AddUnits(Units:Number)
    {
        m_units += Units;
    }

    function RemoveUnits(Units:Number)
    {
        m_units -= Units;
    }

    function PrintUnits()
    {
        print("Remaining units: " + m_units);
    }
}

Daryl adds a member variable called m_unitConvertor and a method called SetUnitConvertor() to this class, which accepts an object reference and a function reference as parameters.

// Delegate is Flash's mx.utils.Delegate class; import it at the top of the class file
function SetUnitConvertor(ObjectReference:Object, FunctionReference:Function)
{
    m_unitConvertor = Delegate.create(ObjectReference, FunctionReference);
}

Then he modifies the PrintUnits() method to use this function reference.

function PrintUnits()
{
    print("Remaining units: " + m_unitConvertor(m_units));
}

He copies the Inventory class into the project for Mr. Coton’s cloth store and adds a class specific to that store inheriting from the Inventory class.

class ClothStoreInventory extends Inventory
{
    function ClothStoreInventory()
    {
        super.SetUnitConvertor(this, ConvertUnitsToLength);
    }

    function ConvertUnitsToLength(Units:Number)
    {
        // Find the total length of cloth sold
        return Units / 1000;
    }
}

And another class is added for Mr. Chiseller’s consultancy firm.

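class ConsultancyFirmInventory extends Inventory
{
    function ConsultancyFirmInventory()
    {
        // Register the convertor, just as the cloth store class does
        super.SetUnitConvertor(this, ConvertUnitsToTime);
    }

    function ConvertUnitsToTime(Units:Number)
    {
        // Show the number of hours worked
        var Seconds:Number = Math.floor(Units / 1000);
        var Minutes:Number = Math.floor(Seconds / 60);
        Seconds = Seconds % 60;
        var Hours:Number = Math.floor(Minutes / 60);
        Minutes = Minutes % 60;
        return Hours + ":" + Minutes + ":" + Seconds;
    }
}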

The result of this hoopla is that the base class will always call the formatting function from the inheriting class, which would be quite impossible otherwise without using a messy callback system, or hard-coding the object reference into the base class, or (gasp!) event generation.

“But why use a delegate, you silly goose! The derived class can override the PrintUnits() method with whatever it wants to use in its place.”

Well, that’s true. But when you start overriding methods from the base class, it means that the base class has not been designed to anticipate future use. This is a very basic example. But if your PrintUnits() method becomes more elaborate, such as drawing a dialog box with icons and buttons, or maybe even handling multiple output devices such as printers and LCD tickers, replicating all that code in every inheriting class is really bad design. By using a delegate, you also allow yourself room to use static utility classes for mundane functions such as this.
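For readers who don’t have Flash handy, here is a rough sketch of the same idea in TypeScript, where a plain function value plays the role of the delegate. It is not Daryl’s actual code: the method names are adapted, describeUnits() returns the text instead of printing it, and the assumption that cloth units are millimetres shown as metres is mine. The free-standing formatAsClock() function stands in for the static utility class mentioned above.

```typescript
// A convertor turns a raw unit count into display text.
type UnitConvertor = (units: number) => string;

class Inventory {
    private units = 0;
    // By default, show the raw count.
    private unitConvertor: UnitConvertor = (units) => String(units);

    setUnitConvertor(convertor: UnitConvertor): void {
        this.unitConvertor = convertor;
    }

    addUnits(units: number): void {
        this.units += units;
    }

    removeUnits(units: number): void {
        this.units -= units;
    }

    // The surrounding output logic is written once, here.
    describeUnits(): string {
        return "Remaining units: " + this.unitConvertor(this.units);
    }
}

// A free-standing utility; no subclass is needed to reuse it.
function formatAsClock(units: number): string {
    let seconds = Math.floor(units / 1000);
    let minutes = Math.floor(seconds / 60);
    seconds = seconds % 60;
    const hours = Math.floor(minutes / 60);
    minutes = minutes % 60;
    return hours + ":" + minutes + ":" + seconds;
}

class ClothStoreInventory extends Inventory {
    constructor() {
        super();
        // Assumption: units are millimetres, shown as metres.
        this.setUnitConvertor((units) => (units / 1000) + " m");
    }
}

const clothStore = new ClothStoreInventory();
clothStore.addUnits(2500);
console.log(clothStore.describeUnits());

const consultancy = new Inventory();
consultancy.setUnitConvertor(formatAsClock);
consultancy.addUnits(3725000);
console.log(consultancy.describeUnits());
```

The two calls print "Remaining units: 2.5 m" and "Remaining units: 1:2:5". Note that the consultancy inventory needs no subclass at all here, because the convertor is handed in from outside.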

Keeping Count I – The Candy Shop

It’s been quite a while since I wrote about a serious programming-specific problem. In this two part series I will show you a popular database normalization method, and an alternative use for delegates. All code is written in Flash ActionScript and is purely illustrative.

Betty Green runs a candy shop that is wildly popular with the neighbourhood kids. The sweets she sells are sold by weight or by piece, depending upon the type. For example, peppermints are sold by weight, while chocolate bars are sold by the number of bars purchased. Being a good manager, Betty also keeps a register to track the amount of items of each type that she’s sold. At the end of the day she totals up the register and updates her inventory for the next day.

The system itself is quite good, but Betty would prefer that she didn’t have to wait till the end of the day to check which items she’s running low on, because then it means that her supplier can be notified only the next day. If she could let him know sooner, then she could stock up again on the same day and not lose customers.

Betty has received a new computer on her birthday from her aunt, which she feels can be used to good effect in her store. Daryl, a friend of hers, is a computer geek of sorts who has offered to write a software application for her billing and inventory management. He offers her many snazzy features such as automatic SMS order placement to her supplier when inventory falls low, a digital gallery of candies that she can display on an LCD ticker outside her store and of course, email. But what really gets her attention is a boring feature called inventory management. That is, the computer keeps track of her inventory and can give her updates after every sale, which allows her to place orders immediately if stock runs low.

So Daryl gets down to work. One thing that keeps nagging him is that inventory is to be maintained in two different units – grams and number of items. In her book-register, Betty used to draw two columns – one for weight and one for pieces. Whenever a sale was made, she’d fill in the appropriate column based upon the type of sweet she sold. Now, why should the database care how she sells her sweets? That is something that only Betty needs to know when she takes inventory. Daryl designs his database with a single UnitsSold column, in which he stores the number of units of each sale. His application interprets the sale units depending upon the type of sweet and displays the value with the appropriate unit symbols.
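A minimal sketch of that design, written in TypeScript rather than Daryl’s ActionScript, might look like the following. Only the UnitsSold idea comes from the story; the type names, field names and unit symbols are assumptions made for illustration.

```typescript
// How a type of sweet is sold; the stored number means grams or pieces accordingly.
type SaleUnit = "grams" | "pieces";

interface SweetType {
    label: string;
    unit: SaleUnit;
}

interface SaleRecord {
    sweet: SweetType;
    unitsSold: number; // the single UnitsSold column
}

// The unit symbol is chosen at display time, from the type of sweet.
function formatSale(sale: SaleRecord): string {
    const symbol = sale.sweet.unit === "grams" ? "g" : "pcs";
    return sale.sweet.label + ": " + sale.unitsSold + " " + symbol;
}

const peppermints: SweetType = { label: "Peppermints", unit: "grams" };
const chocolateBars: SweetType = { label: "Chocolate bars", unit: "pieces" };

console.log(formatSale({ sweet: peppermints, unitsSold: 250 }));
console.log(formatSale({ sweet: chocolateBars, unitsSold: 3 }));
```

The stored value is just a number in both cases; only the display code knows that one is a weight and the other a count.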

This is a simple illustration of an extremely powerful concept in data processing. All data are eventually converted to integers for the processor to work upon. By understanding how those numbers are encoded for abstract data types, you not only understand what’s going on behind the scenes, you can also drop down into the primitive level to perform operations that are not supported on abstract data types. For example, bitwise operators will balk at strings, but will gladly accept integers.
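As a small illustration of dropping down to the integer level, here is a TypeScript sketch (the function name is made up for this example). TypeScript’s compiler rejects a bitwise operator applied directly to a string, but it happily accepts the character’s numeric code. The trick relies on the ASCII layout, where a lower-case letter differs from its upper-case form by a single bit.

```typescript
// Lower-case an ASCII letter by setting bit 5 (0x20) of its character code.
function toLowerAscii(letter: string): string {
    const code = letter.charCodeAt(0);          // "A" is 65
    const isUpper = code >= 65 && code <= 90;   // "A" to "Z"
    return isUpper ? String.fromCharCode(code | 0x20) : letter;
}

console.log(toLowerAscii("A"));
console.log(toLowerAscii("z"));
```

This only handles single ASCII letters; it is meant to show the integer underneath the character, not to replace a proper lower-casing routine.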

You can easily tell when a person doesn’t understand this, because their tracking database for fifty different activities will contain fifty columns, some of which contain numeric values, some floating point, some boolean and some time. If new activities have to be added, they will add another column to the table and replace the database files. Everything looks okay until someone logs in the next time after the update and finds that their previous scores are all gone. Whoops!

In my next article I’ll explain how multiple data types can be parsed efficiently at runtime, without rewriting too much code. Stay tuned.

Where Things Go Wrong

Daryl is a programmer. He works at a small ISV, writing web based applications and online games. Daryl is also a very good communicator and has a fair amount of graphic design skills in Photoshop. Because of his communication skills Daryl’s boss entrusts him with writing the specs of any new project. He’s doing a fairly good job at it too, and is looking forward to a good review at the end of the year.

But there’s a glitch.

Because it’s a wee little company, Daryl’s boss has very little leverage before his much larger clients and strategic partners. Often, they have to take up boring maintenance or debugging tasks on a codebase that only the most insane minds would be capable of writing. Life sucks. But they hang on with a brave face. Daryl keeps at it because he knows his boss is really nice at heart and understands that he doesn’t have much say in the matter. His boss doesn’t mind because these projects keep the cash inflow coming, without which he’d never be able to hand out that hefty paycheck to Daryl at the end of the month.

Now one fine day his boss lands up with a really exciting project to develop a game for a large publisher. Daryl is really excited about the entire affair and dives into it with gusto. Being a sane, level headed developer, he begins with the specifications. Whirrrrrr…buzzzzz…clank…clank…bling! Dust flies, sparks shoot and the racket is just about to get on everyone’s nerves when Daryl stops and hands out a pristine 20 page document in Adobe PDF format. It’s a gem of a specification and everyone’s really excited about it.

His boss negotiates the terms, the schedule is fixed (of course Daryl has a say in what will be developed when), and work begins.

Daryl is a little worried because he’s using a new technology called Gleam, that’s supposed to be the hottest thing in online gaming since GUIs. But because it’s a fairly popular technology, he’s sure to find a lot of help from forums and websites. And he begins coding.

On day one things are fine. On day two, Daryl’s boss comes in looking very harried. They’ve just been arm-twisted into working on a tiny maintenance project by one of their partners, Extravagance Software. Okkkkk…Daryl says to himself. I think I’ll put this game that I’m working on aside for a bit and quickly finish off this maintenance task. It’ll mean a day lost, but I’ll catch up with it by putting in a weekend. Or so he thinks.

The day ends and he somehow manages to patch the code from hell that he’s had to maintain. Some of the fixes work but there are others where things are not so good. And more bugs come up during further QA. Daryl, blissfully unaware of this, comes in to work the next day and is just about to get back to his game when the manager from Extravagance shows up outside his office. He’s tearing his hair out, screaming obscenities at the client and his ‘stupid’ coders, and pleading before Daryl on bended knee. Daryl looks at his boss, who shrugs back at him. So Daryl puts his work aside again and spends another day fixing someone else’s code instead of writing his own.

…and the next day, the story repeats.

Daryl is about to lose his patience, but feels he owes his company enough commitment to do what is required for the moment rather than crib about how boring a maintenance task is. But how far should he go? Does he have to continue slaving over this at the cost of his own project? Does his boss have to continue to let him do that?

I don’t know. I’ve spent enough time at companies, both small and large, to have been on both sides of this very real issue. Larger organisations often have enough financial and strategic muscle to control the smaller service vendors. Smaller outfits are often in dire enough straits that they constantly worry about keeping up the cash inflow. A level headed developer will usually come through the entire match unscathed, and maybe even learn to accept it as a way of life. But if somebody can find a solution, it’ll mean a breath of fresh air for ISVs and encourage more people to start one.