During a meeting yesterday we were talking about our internal repo library and reading some file out of all of the repositories for the purposes of generating some type of overview page. It was mentioned that this would be a little tricky since all of the Git repositories would be “bare repositories.” I want to explain what that means.
A bare repository is one with no working tree. That is, there is nothing checked out. At the top level of every ordinary (non-bare) repository you will find a ‘.git’ directory. If you kept only the contents of that directory, then that repository would be bare. This is why, by convention, bare repository directories are named ‘foo.git’: they hold what a regular clone keeps in ‘foo/.git’.
For example, here is a bare version of a project's database interface repository.
    $ git clone --bare ~/Projects/3DC/DBInterface/
    Initialized empty Git repository in /home/eric/Temp/DBInterface.git/
    $ cd DBInterface.git/ && ls
    branches/ config description HEAD hooks/ info/ objects/ packed-refs refs/
Notice the directory has none of the files checked out. Because of this, you cannot do work in a bare repository. You cannot even try to check out a working tree.
    $ git checkout HEAD^
    fatal: This operation must be run in a work tree
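If you want to try this without touching a real project, here is a minimal sketch that builds a throwaway repository, clones it bare, and asks Git what it thinks. The directory names (‘source’, ‘source.git’), the file, and the commit identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch: create a scratch repository, clone it bare, and inspect the clone.
# Every path, file name, and identity here is hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q source
cd source
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of local defaults
echo "hello" > README
git add README
git -c user.name=Example -c user.email=example@example.com commit -q -m "Initial commit"
cd ..

git clone -q --bare source source.git
cd source.git

# Git can report whether the current repository is bare.
is_bare=$(git rev-parse --is-bare-repository)
echo "bare: $is_bare"

# Commands that need a work tree, such as checkout, are refused.
if git checkout HEAD 2>/dev/null; then
    checkout_result=succeeded
else
    checkout_result=refused
fi
echo "checkout: $checkout_result"
```

The ‘rev-parse --is-bare-repository’ check is handy in scripts that have to behave differently for bare and non-bare repositories.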
So what’s the point of bare repositories then?
- They make useful conduits for pushing and pulling work. Git will fight like Hell to stop you from pushing into the checked-out branch of a repository with a work tree, because the work tree would fall out of sync with the branch and work could be lost. Since bare repos have no work tree, this is a non-issue for them.
- They save space. All of the files for the repository are stored, one way or another, in the ‘objects’ directory in the example above. Checking out files creates a second copy of the content of many of those objects. If we need a repository only for pushing and pulling, and do not intend to do work there, then we save space by having nothing checked out.
- They are safer from accidental destruction. Because they have no working tree, many commands will fail automatically. You cannot screw up a bare repository via mistaken merge or rebase or whatever. They will all fail.
Note that all three of these benefits come from the fact that bare repositories have no working tree.
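The first point is easy to see for yourself. The sketch below assumes Git's default settings (in particular that ‘receive.denyCurrentBranch’ has not been changed) and uses invented repository names: ‘work’ is where we commit, ‘hub.git’ is a bare clone, and ‘other’ is a regular clone with ‘master’ checked out.

```shell
#!/bin/sh
# Sketch: pushing to a bare clone versus a clone with a work tree.
# Repository names and the commit identity are hypothetical; default Git config is assumed.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

commit() { git -c user.name=Example -c user.email=example@example.com commit -q "$@"; }

# A regular repository with one commit on master.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
echo "one" > file.txt
git add file.txt
commit -m "First commit"
cd ..

# Two clones of it: one bare, one with a work tree.
git clone -q --bare work hub.git
git clone -q work other

# Make a second commit that we can push around.
cd work
echo "two" >> file.txt
commit -am "Second commit"

# Pushing to the bare repository is accepted.
if git push -q ../hub.git master 2>/dev/null; then
    bare_push=accepted
else
    bare_push=refused
fi

# Pushing to the checked-out branch of a non-bare repository is refused by default.
if git push -q ../other master 2>/dev/null; then
    nonbare_push=accepted
else
    nonbare_push=refused
fi

echo "push to hub.git: $bare_push"
echo "push to other:   $nonbare_push"
```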
But while that is all nice, it complicates the task we were talking about, namely reading the contents of some given file from a bare repository. A moment's consideration tells us that this must be possible somehow. If it were impossible to get at the file information, then how could we ever meaningfully clone a bare repository to do actual programming? So that file info is somewhere.
To figure out how to get it, we have to understand the low-level Git objects. There are four of them:
- Blobs
- Trees
- Commits
- Tags
In our case we only care about the first two. All objects have two components:
- SHA-1 Hash for a Name
- Content
The name is always a hash of the content. The nature of that content depends on the object in question. Blobs are nothing more than content and a hash of said content. When you add a file in a repository, Git creates a blob for the contents of that file. Blobs store nothing but the content; they do not even store things like file names. When people say that Git tracks content and not files, they are referring to this. Because blobs store only content, if you add duplicates of a file with different names, Git stores them only once, in one blob.
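A quick way to convince yourself of the last point is to stage two identical files and look at what Git recorded. This is only a sketch in a scratch repository; the file names and content are made up.

```shell
#!/bin/sh
# Sketch: two files with identical content share a single blob.
# File names and content are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q demo
cd demo
printf 'same content\n' > first.txt
cp first.txt second.txt
git add first.txt second.txt

# ls-files -s prints: mode, blob hash, stage number, file name.
git ls-files -s

hash_first=$(git ls-files -s first.txt | awk '{print $2}')
hash_second=$(git ls-files -s second.txt | awk '{print $2}')

# Hashing the same bytes from stdin gives the same name; no file name is involved.
hash_stdin=$(printf 'same content\n' | git hash-object --stdin)

# Only one loose object exists so far, even though two files were added.
object_count=$(git count-objects | awk '{print $1}')
echo "objects stored: $object_count"
```

Both index entries point at the same hash, and that hash depends only on the bytes, which is why the file names have to live somewhere else.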
Git stores those file names in trees. The content of a tree is a list of file names and permissions associated with blobs or other trees. They are like the ‘directories’ of low-level Git objects. We can get at trees by using the so-called ‘plumbing’ commands. Let’s say we are still in the bare repository from the previous example. We want to see the files on the ‘master’ branch. We can do that with the ls-tree command, which takes a ‘tree-ish’ argument. In Git terminology, a ‘tree-ish’ is anything that can name a single tree, even if it is not the name of a tree itself.
What I mean is this: we already know that a branch is nothing more than a pointer to a certain commit. A commit object is a snapshot of a single tree at a given point in history, along with some other information like message, commit time, author name, and so on. Because a commit is always a snapshot of one tree, a commit is an acceptable tree-ish.
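To make the ‘tree-ish’ idea concrete, the sketch below names the same tree three ways: by branch, by commit hash, and by the tree's own hash (using the ‘master^{tree}’ revision syntax). The repository and its single file are stand-ins invented for the example.

```shell
#!/bin/sh
# Sketch: a branch, a commit, and a tree hash are all acceptable tree-ish arguments.
# The repository layout and file content are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
mkdir Schemata
printf 'CREATE TABLE users (id INT);\n' > Schemata/SharedUsers.MySQL
git add Schemata
git -c user.name=Example -c user.email=example@example.com commit -q -m "Add schema"

commit_hash=$(git rev-parse master)           # the commit the branch points to
tree_hash=$(git rev-parse 'master^{tree}')    # the tree that commit snapshots

type_commit=$(git cat-file -t "$commit_hash")
type_tree=$(git cat-file -t "$tree_hash")
echo "$commit_hash is a $type_commit"
echo "$tree_hash is a $type_tree"

# All three spellings resolve to the same tree, so the listings are identical.
listing_branch=$(git ls-tree master)
listing_commit=$(git ls-tree "$commit_hash")
listing_tree=$(git ls-tree "$tree_hash")
echo "$listing_branch"
```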
This all means we can do this in our bare repository:
    $ git ls-tree master
    040000 tree 59913b97f0f1ce48b2a8c6f5d77fa986418f3292 Schemata
This output is telling us the top-level tree of the ‘master’ branch has one entry, a tree named ‘Schemata’. If we cloned and checked out the branch, we would have the ‘Schemata’ directory. Since we have the name of the tree here (the hash), we can look inside that directory without cloning anything or checking anything out.
    $ git ls-tree 59913b97f0f1ce
    100644 blob ecc73640542ac00ee0dbfb5e781fa219ea9f2abd SharedUsers.MySQL
The tree has one blob inside, with the name ‘SharedUsers.MySQL’. We can look at that blob if we want.
    $ git show ecc7364
    [ Long SQL file here… ]
Using these two commands we can navigate the Git trees and blobs as if they were the directories and files of a non-bare repository. At this point I think the solution of pulling out info from a fixed file should be obvious. We run ls-tree on the ‘master’ branch (or whatever is our convention), grep for the hash of the blob for our file, and then print out its contents, probably into some other script or program.
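Here is one way that recipe might look as a script. It is a sketch, not the tool we ended up with: the ‘read_file’ helper, the sample repository, and the file content are all invented for illustration, and I use awk rather than grep to pick the hash out of the ls-tree output. The last command shows the ‘branch:path’ shorthand that ‘git show’ accepts, which gets the same result in one step.

```shell
#!/bin/sh
# Sketch: read one file out of a bare repository without checking anything out.
# The read_file helper, paths, and content are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Build a small repository and a bare clone of it to work against.
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master
mkdir Schemata
printf 'CREATE TABLE users (id INT);\n' > Schemata/SharedUsers.MySQL
git add Schemata
git -c user.name=Example -c user.email=example@example.com commit -q -m "Add schema"
cd ..
git clone -q --bare source source.git
cd source.git

# read_file TREE-ISH PATH: print the file at PATH as recorded in TREE-ISH.
read_file() {
    # ls-tree prints "<mode> <type> <hash><TAB><path>"; keep the hash of the blob entry.
    blob=$(git ls-tree "$1" "$2" | awk '$2 == "blob" {print $3}')
    if [ -z "$blob" ]; then
        echo "no such file: $2" >&2
        return 1
    fi
    # cat-file -p prints the raw content of the blob.
    git cat-file -p "$blob"
}

contents=$(read_file master Schemata/SharedUsers.MySQL)
echo "$contents"

# Shorthand: name the blob as <tree-ish>:<path> and let Git walk the trees.
shorthand=$(git show master:Schemata/SharedUsers.MySQL)
```

An overview-page generator could loop over every ‘*.git’ directory and call something like ‘read_file’ on whatever file we agree to keep in each repository.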