Version Control
Steven J Zeil
Abstract
Version control (a.k.a. version management) is concerned with the management of change in the software artifacts being developed.
In this lesson we look at the kinds of practical problems that arise during software development and that can be addressed by proper version control:
- maintaining the history of changes to the code base
- exploring possible changes without breaking the code base
- allowing collaboration among developers needing simultaneous access to the code base
- Version control is sometimes considered a sub-area of configuration management
- a.k.a., Software Configuration Management (SCM)
- Oddly enough, many tools labeled and marketed as SCM tools only address version control
1 Issues in Version Control
The issues addressed by version control are:
- History
  - How has the software changed since date-or-version-number? Who made those changes? Why were they made? Can we go back?
- Exploration
  - Can we try out a set of plausible changes without affecting the “main” software build? Even if exploration of the effects of those changes may take a long time?
- Collaboration
  - Can we have multiple developers working on the code without interfering with one another’s work?
2 Approaches and Tools
- Local version control systems manage history by setting aside directories on the same file system where the software under control is housed.
  - sccs, rcs
- Centralized version control systems keep the system history at a centralized location accessible via the network. Developers check out a copy of the current (or a desired older) version of the software onto their own machines.
  - CVS, Subversion
- Distributed version control systems allow developers to keep the full system history on their own machines.
  - Most often, a central location holds a base copy for management/distribution purposes, but this is not required.
  - git, Mercurial
2.1 History
VC (Version Control) systems keep code in a repository.
The primary operations of a VC repository are
- check out a copy of (a selected version of) the code into a working directory, and
- check in or commit a set of changes to that code
Each commit results in a new revision (version) of the code.
2.1.1 Diffs
Early VC systems made a big deal about keeping only “diffs” of the changes instead of entire copies of the different file versions.
That probably matters less now, but modern VC systems still like to describe each version in terms of “diffs” from the prior one.
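To make the idea of a “diff” concrete, here is a minimal sketch using Python's standard `difflib` module. The function name `make_diff`, the file name `area.py`, and the two revisions are invented for illustration; real VC systems use their own diff engines and storage formats.

```python
import difflib


def make_diff(old_text: str, new_text: str, name: str = "file.txt") -> str:
    """Return a unified diff describing how to turn old_text into new_text."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{name} (rev 1)",
        tofile=f"{name} (rev 2)",
    )
    return "".join(diff_lines)


# Two hypothetical revisions of the same file.
rev1 = "def area(w, h):\n    return w + h\n"
rev2 = "def area(w, h):\n    return w * h\n"

diff_text = make_diff(rev1, rev2, "area.py")
print(diff_text)
```

Lines in the output that start with `-` were removed from the older revision and lines that start with `+` were added in the newer one. Two identical revisions produce an empty diff.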
2.2 Exploration
2.2.1 Branches
We can start a branch to explore our idea while others continue work on the main trunk.
2.2.2 Merging a Branch
- If the idea in the branch does not pay off, the branch can simply be abandoned.
- If you decide to adopt the changes in the branch, you can elect to merge it back into the trunk.
  - You need to resolve any conflicts introduced by continued development along the trunk.
  - The resulting combined file is then checked in with a trunk revision number.
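The merge decision can be sketched as a three-way comparison against the revision where the branch started. This is a simplified illustration, not how any particular tool is implemented: it works on whole files (real systems usually merge line by line within a file), and the function name `merge_trees` and the sample file names are assumptions made for the example.

```python
def merge_trees(base, trunk, branch):
    """Three-way merge of {path: content} dictionaries at whole-file granularity.

    base   -- files as they were when the branch was started
    trunk  -- files at the current tip of the trunk
    branch -- files at the current tip of the branch
    Returns (merged_files, conflicting_paths).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(trunk) | set(branch)):
        b, t, r = base.get(path), trunk.get(path), branch.get(path)
        if t == r:
            result = t            # both sides agree (or neither changed)
        elif t == b:
            result = r            # only the branch changed this file
        elif r == b:
            result = t            # only the trunk changed this file
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if result is not None:    # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
trunk = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
branch = {"a.txt": "1", "b.txt": "3", "c.txt": "3"}

merged, conflicts = merge_trees(base, trunk, branch)
print(merged)     # a.txt comes from the trunk, b.txt from the branch
print(conflicts)  # c.txt was changed on both sides and needs a human
```

Only the files listed as conflicts need manual attention; everything else can be combined automatically because at most one side touched it.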
2.2.3 Combating Drift
Over time, a long-running branch can get so far out of sync with changes being made to the trunk that the final merge becomes difficult or even impossible.
- An effective strategy for combating this is to periodically merge the trunk into the branch
- the reverse of the “normal” merge direction
2.3 Collaboration
2.3.1 Locking
- Early VC systems managed collaboration via locking
  - Before editing a file, a programmer had to obtain an exclusive lock on that file
  - A commit would release that lock.
- This approach was cumbersome at best.
  - often abused & circumvented.
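The lock-then-commit discipline can be illustrated with a small in-memory sketch. The class `LockingRepository`, the exception `LockError`, and the user and file names are all hypothetical; this shows the protocol only and does not reproduce the behavior of any specific tool.

```python
class LockError(Exception):
    """Raised when a lock cannot be obtained or is not held by the committer."""


class LockingRepository:
    def __init__(self):
        self.files = {}       # path -> latest committed content
        self.lock_owner = {}  # path -> user currently holding the lock

    def lock(self, path, user):
        """Obtain an exclusive lock on path, or fail if someone else holds it."""
        owner = self.lock_owner.get(path)
        if owner is not None and owner != user:
            raise LockError(f"{path} is locked by {owner}")
        self.lock_owner[path] = user

    def commit(self, path, user, content):
        """Store new content and release the lock held by user."""
        if self.lock_owner.get(path) != user:
            raise LockError(f"{user} does not hold the lock on {path}")
        self.files[path] = content
        del self.lock_owner[path]


repo = LockingRepository()
repo.lock("main.c", "alice")

try:
    repo.lock("main.c", "bob")   # Bob must wait: Alice holds the lock
    bob_blocked = False
except LockError:
    bob_blocked = True

repo.commit("main.c", "alice", "int main(void) { return 0; }\n")
repo.lock("main.c", "bob")       # succeeds now that the commit released the lock
```

The sketch also hints at why the approach was cumbersome: Bob can do nothing with `main.c` until Alice commits, even if their changes would never have overlapped.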
2.3.2 Conflicts & Resolution
Modern VC systems inspect a proposed commit from a checked out revision rev to see if it alters any files that have changed in the repository since rev was checked out.
- If any such files exist, the commit is refused and the files in question are marked as conflicting.
- The developer must resolve the conflict by (presumably) inspecting the two conflicting versions of the file and saving/committing a new version of that file.
- VC-aware IDEs typically provide difference-based editors to aid this process.
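The check described above can be sketched with a toy in-memory repository. The names `ToyRepository` and `CommitRefused` are invented for this example, and the model is deliberately simplified: each revision is stored as a full `{path: content}` snapshot, and a conflict is reported per file rather than per line.

```python
class CommitRefused(Exception):
    """Raised when a commit touches files that changed since the base revision."""

    def __init__(self, conflicting):
        super().__init__("conflicting files: " + ", ".join(conflicting))
        self.conflicting = conflicting


class ToyRepository:
    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    def head(self):
        return len(self.revisions) - 1

    def checkout(self, rev=None):
        """Return (revision number, copy of the files at that revision)."""
        rev = self.head() if rev is None else rev
        return rev, dict(self.revisions[rev])

    def commit(self, base_rev, changes):
        """Apply changes made against base_rev, or refuse if they conflict."""
        base, latest = self.revisions[base_rev], self.revisions[-1]
        # A file conflicts if the repository copy changed since base_rev.
        conflicting = sorted(p for p in changes if latest.get(p) != base.get(p))
        if conflicting:
            raise CommitRefused(conflicting)
        new_tree = dict(latest)
        new_tree.update(changes)
        self.revisions.append(new_tree)
        return self.head()


repo = ToyRepository()
first = repo.commit(0, {"a.txt": "one\n", "b.txt": "two\n"})

alice_rev, _ = repo.checkout()   # Alice and Bob both start from revision 1
bob_rev, _ = repo.checkout()

second = repo.commit(alice_rev, {"a.txt": "one (Alice)\n"})

try:
    repo.commit(bob_rev, {"a.txt": "one (Bob)\n"})  # a.txt changed since rev 1
    refused = []
except CommitRefused as err:
    refused = err.conflicting

third = repo.commit(bob_rev, {"b.txt": "two (Bob)\n"})  # b.txt is untouched: accepted
```

Bob's edit to `a.txt` is refused because Alice committed a change to that file after Bob checked out, while his edit to `b.txt` goes through because nobody else has touched it.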
2.3.3 Conflicts and Branching
Because different branches are often edited in parallel, conflicts are most often detected when attempting to merge branches.
- If both branches have seen lots of work, the number of conflicts can be large and the resolution of them quite complicated.
- Popularly known as merge hell
- In early VC systems, this was a powerful disincentive to using branches.
- In modern distributed VC systems, branches are unavoidable – we’re swimming in branches
3 Forges & Repository Hosts
Software forges provide a collection of project management tools for software development.
- Examples include GitHub, GitLab
- Usually built around a service of hosting version control repositories.
  - …leading to a “return” to a centralized model
- Added features typically include:
  - project membership management
  - Wikis
  - Issue tracking
  - and, increasingly, support for CI/CD (internal and/or external)
Forges differ from artifact repositories (e.g., Maven Central & JCenter) in their emphasis on
- source code rather than binaries
- full version control rather than distinct copies by version number