git -- Distributed Version Control

Steven J Zeil

Last modified: Sep 22, 2020

Contents:

1 Two Levels of Committing

1.1 Push and Pull

2 git

2.1 Where do files live?

Abstract

Distributed version control models relax the dependence on a central repository as the keeper of the one true project.

Every developer has a snapshot of an entire development history.
- In essence, you check out the entire past history of a project.
- And every checked out copy becomes an independent branch.
Developers may decide for themselves which of these branches should be merged.
- Merging and conflict resolution, which are treated as exceptional operations in centralized systems, are regarded as the norm in this model.

We will look at git, a popular distributed version control system.

Sounds Like Anarchy

In practice, projects often do have a central repository for “official” releases.
But splinter projects are easier to form
- and can continue to share some changes until the code base diverges too much.

1 Two Levels of Committing

A Synthesis of Local and Remote

In a distributed model, a developer maintains

a local repository
- into which changes can be committed (as in local models like rcs)
and periodically may synchronize with a remote repository
- which might be centralized or just another developer’s

1.1 Push and Pull

We still have the familiar operations
- check out a copy of a revision into a working directory
- commit changes in the working directory into the repository
In a distributed VC, these operate on the local repository.
And we add new operations
- pull, to fetch commits from a remote repository and merge those changes into a branch in our local repository
- push, to send commits from our local repository and merge them into a branch in the remote repository

1.1.1 Branches are Everywhere

The use of the term “merge” to describe push and pull is not an accident.

In the distributed model, branches are ubiquitous.
But pairs of (local,remote) branches are generally synchronized (and usually given identical names to reflect that).

1.1.2 A Fear of Commitment

The local/remote, two-level commit approach helps resolve a common dilemma from centralized VC systems:

When or how often should we commit changes?
- In a centralized system, we have conflicting goals
  - Safeguard against losing work: argues for committing frequently
  - Avoid interfering with other developers by not checking in incomplete work
In a two-level system, we can commit frequently to the local repository and only push to the remote repository when a “unit” of work is completed.
- Wait too long, though, and we may still face merge hell.

2 git

2.1 Where do files live?

Edit files in your work area
- Your ordinary directories/folders of files
Stage the files that you want to commit.
- The stage is also sometimes called the index.
A commit updates the local repository with the files on the stage.
Push sends commits from your local repo to a remote one.
Pull fetches commits from the remote repo into your local one.
- If safe, merges changes into your work area as well.
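
The path from work area to stage to local repository can be traced in a throwaway repository. This is only a sketch under stated assumptions: the scratch directory, the file name notes.txt, and the demo identity are made up for illustration, and git symbolic-ref is used merely to make sure the branch is called master whatever the local configuration says.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)                          # scratch directory
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes

echo "first draft" > notes.txt             # 1. edit a file in the work area
untracked=$(git status --porcelain)        # the file exists but is not yet tracked

git add notes.txt                          # 2. put the current contents on the stage (index)
staged=$(git diff --cached --name-only)    # names of files waiting on the stage

git commit -q -m "Add notes"               # 3. record the staged files in the local repository
committed=$(git ls-tree --name-only HEAD)  # files in the snapshot just committed

echo "work area: $untracked"
echo "stage:     $staged"
echo "repo:      $committed"
```

Nothing here touches a remote repository; push and pull are separate steps.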

2.2 Revisions

Unlike earlier VC systems, a git revision is a state of the entire project rather than of a single file/directory.
- After committing a change, the entire system, even unchanged files, advances to a new revision ID
  - Of course, “behind the curtain” you are still going to have incremental diffs, but that does not affect our visible interactions
Because of the distributed model,
- revision numbers cannot simply be incremented in any meaningful fashion
- there is a need to easily determine when two revisions in two different repositories are, in fact, copies of the same system state
Revision numbers are therefore replaced by hash codes. The hash covers the file set that constitutes the entire project, together with the commit's metadata (parent snapshots, author, date, and message).
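
A small experiment illustrates content-based naming. Two unrelated repositories (for the hypothetical developers alice and bob) commit the same file. The tree ID, which is git's hash over the file set alone, comes out identical in both. The full commit IDs differ in this sketch because a commit ID also covers metadata, here the differing messages. The directory names, file contents, and identity are assumptions made for the example.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
for name in alice bob; do                  # two repositories with no shared history
    git init -q "$base/$name"
    cd "$base/$name"
    echo "int main() { return 0; }" > main.c
    git add main.c
    git commit -q -m "Initial commit by $name"
done

# HEAD^{tree} names the file-set (tree) object of the latest commit.
tree_alice=$(git -C "$base/alice" rev-parse 'HEAD^{tree}')
tree_bob=$(git -C "$base/bob" rev-parse 'HEAD^{tree}')
commit_alice=$(git -C "$base/alice" rev-parse HEAD)
commit_bob=$(git -C "$base/bob" rev-parse HEAD)

echo "tree   alice: $tree_alice"
echo "tree   bob:   $tree_bob"
echo "commit alice: $commit_alice"
echo "commit bob:   $commit_bob"
```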

git Snapshots

A git repository contains, conceptually, a collection of snapshots (a.k.a. commit objects, revisions, or versions).

Each snapshot contains

The set of files for the project
The name of this snapshot (hash code)
References to the parent snapshots
- Most have one parent
- Initial commit would have zero
- Merges can result in a snapshot with multiple parents

Heads

A git repository also contains a collection of heads.

These are human-assigned names for selected snapshots.

Heads refer to the most recent snapshot in a chain of commits
- Hence heads actually identify branches
By convention, every repository has a head “master”.
At any given time, one head is considered active. This one is aliased to the head “HEAD”.

How shall I name thee?

A snapshot in a repository may be identified by giving

its SHA1 hash code
a long enough prefix of that hash code to be unique
a head that refers to it
a position relative to one of these: ^ means “parent-of”
- e.g., HEAD^ would be the state before our most recent commit
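
These naming schemes can be tried out with git rev-parse, which prints the full hash that a name resolves to. The sketch below assumes a scratch repository with two commits; the file name and identity are made up for illustration.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes

echo one > file.txt
git add file.txt
git commit -q -m "First"
first=$(git rev-parse HEAD)                # remember the first snapshot

echo two > file.txt
git commit -q -a -m "Second"

full=$(git rev-parse HEAD)                 # full hash of the current snapshot
short=$(git rev-parse --short HEAD)        # an abbreviated, still unique, prefix
by_prefix=$(git rev-parse "$short")        # the prefix resolves to the full hash
by_head=$(git rev-parse master)            # a head names the same snapshot
parent=$(git rev-parse 'HEAD^')            # parent-of: the state before the last commit

echo "full:   $full"
echo "prefix: $short"
echo "parent: $parent"
```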

2.3 History

Common Local History Commands

git add files stages modified files, scheduling the current version to be included in the next commit (recursing through directories)
- An intermediate step not needed in earlier VC systems
git commit -m message commits all staged changes to the local repository
- Add -a to stage all modified (or deleted) tracked files automatically before committing; new, untracked files must still be staged with git add
git status lists modified files
git diff file displays what was changed
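
A short session, offered as a sketch, strings these commands together. The file names tracked.txt and new.txt and the demo identity are assumptions. Note that commit -a picks up changes to files git already tracks but leaves brand-new files alone.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes

echo "line 1" > tracked.txt
git add tracked.txt                        # stage the new file
git commit -q -m "Add tracked.txt"         # commit what is staged

echo "line 2" >> tracked.txt               # modify a tracked file
echo "scratch" > new.txt                   # create a file git has never seen

git status --short                         # lists the modified and the untracked file
git --no-pager diff tracked.txt            # shows the added line

git commit -q -a -m "Extend tracked.txt"   # -a stages tracked.txt, but not new.txt
remaining=$(git status --short)            # what is still uncommitted
commits=$(git rev-list --count HEAD)       # number of snapshots on this branch
echo "still uncommitted: $remaining"
```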

2.4 Exploration

Every Local Repository is its own collection of branches

So one way to “branch” in git is to simply check out a new copy.

But sometimes we want to branch within a local repository

Branching Within a Local Repository

git branch newHeadName desiredParentSnapshot

creates a new branch starting at the given snapshot (the current HEAD if none is given)
git checkout branchHead

switches to that branch
- Replaces the files in the working directory with a copy of the state for that branch.
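
As a sketch of branching inside one local repository: the branch name enhanced and the file names below are assumptions chosen for the example. After switching back to master, the file committed on enhanced disappears from the working directory, because the working files always mirror the checked-out branch.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes

echo "stable" > main.txt
git add main.txt
git commit -q -m "Base"

git branch enhanced HEAD                   # new head at the current snapshot
git checkout -q enhanced                   # make it the active head
on_branch=$(git rev-parse --abbrev-ref HEAD)

echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "Start feature"           # advances enhanced only

git checkout -q master                     # working files revert to master's state
ls                                         # feature.txt is no longer listed
```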

When Should I Commit? (Another perspective)

git users consider branches to be cheap.

So some advocate

Always work in branches
Keep the master branch in a releasable state

Remember, every local copy of the repository is a branch in its own right. So one way to achieve the same effect is to commit frequently in your local repository but only push to the central repository when you have something in a releasable state.

This approach delays making your unfinished code available to other members of your team. Whether this is a viable approach depends on

Whether your local repository is backed up.
The chances of other team members making conflicting changes.
- The longer you go between pushes and pulls, the more likely you are to encounter merge conflicts and the harder they will be to resolve.

Merging Local Branches

git merge head

produces a new snapshot representing the merge of the current one (HEAD) with the named head.

The merged revision will have both HEAD and head as parents.
- git identifies the most recent common ancestor of the two branches and performs a 3-way merge
  - If a change (compared to the common ancestor) does not conflict (overlap) with any changes from the other branch, the change is copied automatically into the merged state.
  - If conflicts are detected, markers are inserted into the working copy of the file and the user is alerted.
- If the merge completes without conflict, the resulting merged state is committed.
  - If conflicts were found, the working copy is updated but no commit takes place.
Branches not needed after a merge can be deleted

git branch -d head removes the head name from the repository (but does not actually delete the history of changes along the branch).
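
A conflict-free merge can be sketched as follows. The two branches change different files, so git can combine them automatically. The names enhanced, a.txt, and b.txt are assumptions, --no-edit simply accepts the default merge message, and HEAD^2 is git's notation for the second parent of a merge.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes

echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Common ancestor"

git checkout -q -b enhanced                # create and switch to a second branch
echo "enhanced change" >> a.txt
git commit -q -a -m "Change a.txt on enhanced"
enhanced_tip=$(git rev-parse enhanced)

git checkout -q master
echo "master change" >> b.txt
git commit -q -a -m "Change b.txt on master"
master_tip=$(git rev-parse master)

git merge -q --no-edit enhanced            # 3-way merge; no overlap, so it commits
first_parent=$(git rev-parse 'HEAD^1')     # the old master tip
second_parent=$(git rev-parse 'HEAD^2')    # the tip of enhanced

git branch -d enhanced                     # drop the name; the commits stay reachable
```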

3 Collaboration

Collaboration in git takes the form of interaction between your local repository and a remote repository.

Concepts (and, sometimes, commands) are much the same as in the local mode

Starting from a Remote Repository

If you are working with an existing remote repository

git clone remoteSpec

creates a new local repository as a copy of the remote one.

The remoteSpec names the remote repository
- Could be a simple file path if on the same machine
- Could be an http:// URL (generally for anonymous access)
- Could be an ssh address

Cloning

Suppose that we have a remote repository with two branches and a few commit objects on each.

Our local cloned repository will remember its remote origin repository.
All heads from the remote repository will be cloned as origin/head
We will get a local master head
You can request local heads for non-master branches by tracking, e.g.

git branch --track enhanced origin/enhanced
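
The cloning steps can be sketched with a "remote" that is just another directory on the same machine. The directory names remote and local, the file name, and the identity are assumptions for the example.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/remote"
cd "$base/remote"
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git branch enhanced                        # the remote now has two heads

git clone -q "$base/remote" "$base/local"  # remoteSpec is a simple file path here
cd "$base/local"

origin_url=$(git config --get remote.origin.url)   # the clone remembers its origin
git --no-pager branch -r                   # remote heads appear as origin/<head>
local_heads=$(git branch --list)           # only master exists locally so far

git branch --track enhanced origin/enhanced
upstream=$(git rev-parse --abbrev-ref 'enhanced@{upstream}')
echo "enhanced tracks $upstream"
```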

Life after Cloning

Starting from this local repository, …

… suppose each repository adds a commit along the trunk:

Our local heads separate from the remembered positions of the remote ones.

Fetching Remote Changes

The basic command to get changes from the remote repository is git fetch

Remember, each repository is, in essence, a new (set of) branch(es)

If states are not identical, they are fetched as new branches
Local heads are unaffected
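
The effect of a fetch can be observed in a sketch like the following, again using a directory as the "remote". All names are assumptions. After the fetch, origin/master has moved to the remote's new commit while the local master head stays where it was.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/remote"
cd "$base/remote"
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"

git clone -q "$base/remote" "$base/local"

cd "$base/remote"                          # someone commits to the remote after the clone
echo "more" >> file.txt
git commit -q -a -m "Remote work"
remote_tip=$(git rev-parse master)

cd "$base/local"
before=$(git rev-parse master)
git fetch -q origin                        # bring the new commits over, merge nothing
after=$(git rev-parse master)              # local head: unchanged
fetched=$(git rev-parse origin/master)     # remembered remote head: advanced

echo "local master:  $after"
echo "origin/master: $fetched"
```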

Pulling Remote Changes

More commonly used than fetching is pulling, which combines a fetch and a merge

Starting with this remote repository…

…and this local one.

(Note that commits (F, G) have been made to both repositories since the clone was created.)

Then

git pull origin master

yields this new version of the local:
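
A comparable scenario can be reproduced in scratch repositories. This is a sketch, not the exact history pictured: both sides add one commit after the clone, and the pull then fetches the remote commit and merges it. The flags --no-rebase and --no-edit are added so that the example behaves the same way across git versions and does not open an editor; the file and directory names are assumptions.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/remote"
cd "$base/remote"
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes
echo "base" > shared.txt
git add shared.txt
git commit -q -m "Base"

git clone -q "$base/remote" "$base/local"

cd "$base/remote"                          # a commit on the remote side
echo "remote" > remote.txt
git add remote.txt
git commit -q -m "Remote commit"
remote_tip=$(git rev-parse master)

cd "$base/local"                           # a commit on the local side
echo "local" > local.txt
git add local.txt
git commit -q -m "Local commit"
local_tip=$(git rev-parse master)

git pull -q --no-rebase --no-edit origin master   # fetch, then merge into master

git --no-pager log --oneline --graph       # the merge commit joins both lines of work
```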

Pushing to the Remote Repository

The push command

sends local commits to a remote repository
advances the remote head marker to the end of the list of changes.

If the remote repository looks like this

and our local repository looks like this

git push origin master

yields this remote repository.

Push is NOT the Opposite of Pull

It’s actually the opposite of fetch

No merge is done when pushing

This leads to an important restriction

The remote head must point, before a push, to an ancestor of the commit that it would point to after the push.

This Push Will Fail

If the remote repository looks like this

and our local repository looks like this,

the push will fail

because if it went through, we would lose access to a state already committed in the remote repository.
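
The rejected push can be reproduced with a bare repository standing in for the remote and two clones for two developers. The names seed, remote.git, alice, and bob are assumptions for this sketch. Alice pushes first; Bob's push is then refused because the remote head is no longer an ancestor of his commit.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/seed"                   # only used to create the first commit
cd "$base/seed"
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"

git clone -q --bare "$base/seed" "$base/remote.git"   # the shared remote
git clone -q "$base/remote.git" "$base/alice"
git clone -q "$base/remote.git" "$base/bob"

cd "$base/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's work"
git push -q origin master                  # remote head is an ancestor: accepted

cd "$base/bob"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's work"
if git push -q origin master 2>/dev/null; then
    push_result="accepted"
else
    push_result="rejected"                 # accepting it would orphan Alice's commit
fi

git fetch -q origin                        # update Bob's view of the remote head
if git merge-base --is-ancestor origin/master master; then
    ancestor="yes"
else
    ancestor="no"                          # the restriction stated above is violated
fi
echo "push $push_result; remote head is an ancestor of Bob's master: $ancestor"
```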

Avoiding Bad Pushes

The easiest fix is to pull into the local repository first, then push.
- And hope no one sneaks in ahead of you
An alternative is rebasing
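
The pull-then-push remedy can be sketched with the same kind of setup: a bare remote and two clones, all with assumed names. Bob merges Alice's commit into his master first, so the remote head becomes an ancestor of his new merge commit and the push goes through.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/seed"                   # only used to create the first commit
cd "$base/seed"
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"

git clone -q --bare "$base/seed" "$base/remote.git"   # the shared remote
git clone -q "$base/remote.git" "$base/alice"
git clone -q "$base/remote.git" "$base/bob"

cd "$base/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's work"
git push -q origin master
alice_tip=$(git rev-parse master)

cd "$base/bob"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's work"

git pull -q --no-rebase --no-edit origin master   # merge Alice's commit locally first
git push -q origin master                         # now a simple advance of the remote head

remote_tip=$(git -C "$base/remote.git" rev-parse master)
echo "remote master is now $remote_tip"
```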

Rebasing

Rebasing changes the parent relationship of the current head so that it appears to have been derived directly from some other selected head.

If we start with this and do

git rebase master

we get this.

It now looks as though enhanced was derived directly from the master head.
Despite all the talk of rebasing in the git literature, rebasing is a pretty rare operation and usually only required when things have gone terribly wrong.
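
For completeness, a rebase can be sketched in a scratch repository. The branch enhanced forks from an older master commit, master then moves on, and the rebase replays the enhanced commit on top of the current master head. Branch and file names are assumptions. The replayed commit has a new parent and therefore a new hash.

```shell
set -e
# Hypothetical identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # force the branch name used in these notes
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b enhanced                # enhanced forks from the base commit
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
old_tip=$(git rev-parse enhanced)

git checkout -q master                     # master moves on independently
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix on master"

git checkout -q enhanced
git rebase -q master                       # replay enhanced on top of master's head

new_tip=$(git rev-parse enhanced)
new_parent=$(git rev-parse 'enhanced^')
echo "enhanced now sits on $(git rev-parse --short master)"
```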
