Git Notes

I am trying to like git. The code community around github is rather nice. Easy forking of projects you want to adapt is cool - and most of the major Rails plugins are now distributed via github so that is very nice. However, I am still having some issues really making git sit up and dance the way I can make CVS and svn. Here are some notes on things that I have needed and were not readily apparent to me.

Working with a distributed repository

gitk shows 'branches' that correspond to my various clones of my master repository - even though I have not done anything that I had considered explicit branching. Interesting. So does that mean github tracks who clones things? It certainly tracks forking of projects - but I sort of thought that 'fork' was just a synonym for 'make a branch'.

Ahhhh well I finally made an hour in my day for watching Linus Torvalds' talk at Google about git. In it he answers the question I just had - any time you go distributed, you have created branches. They may all have the same default name 'master' but they are branches from the perspective of the other repositories.

The index

The addition of the index as a construct between my working copy and my next commit has also given me trouble. The index is a good thing. Its existence allows some of git's cooler features - like being able to easily do partial commits of the stuff you have been working on - including committing just some of the edits within a file!!! To do that, use "git add -i", add some files, then use the 'patch' subcommand to go through the diffs in the files you have added and include or exclude specific sections. But it takes some getting used to.

OK I am not the only one who has found that the index takes some getting used to. From Oliver Steele's blog post My Git Workflow:

Git’s problem is its complexity. Half of that is because it’s actually more powerful than the other systems: it’s got features that make it look scary but that you can ignore. Another half is that Git uses nonstandard names for about half its most common operations. (The rest of the VCS world has more or less settled on a basic command set, with names such as “checkout” and “revert”. Not Git!) And the third half is the index. The index is a mechanism for preventing what you commit from matching what you tested in your working directory. Huh?

A couple of big hints about the index that I could have used a few weeks before I finally found them came from http://smalltalk.gnu.org/blog/bonzinip/using-git-without-feeling-stupid-part-2

Now, what does the index have to do with conflicts? The answer is simple. If a merge has conflicts, git takes care of adding unconflicted files in the index, and leaves conflicted files out of the index. Read it again. Slowly. Once more. Then, go ahead.
Also, git diff, without any arguments, does not show changes that are staged in the index. In fact it diffs the working tree against the index, not against the repository. Read it again. Slowly. Once more. Then, go ahead.

So to diff between your working copy, index, last local commit, and remote repository:

  git diff 
    Gives you everything that has changed in your working copy that is
    not yet staged for the next commit.

  git diff --cached
    Gives you the changes staged to the index. This is what you want
    to use to figure out what will be included when you next commit.

  git diff HEAD
    All changes in the working tree since your last commit - staged
    and unstaged. 

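  git diff master origin/master
    The differences between my local branch (master) and what I last
    fetched from the upstream repository (origin/master). The diff reads
    from master to origin/master, so lines that exist only upstream show
    up as additions and lines that exist only locally show up as removals.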
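
To see the first three forms side by side, here is a small sketch in a throwaway repository. The file name (notes.txt) and its contents are invented; the only point is which edit each command reports.

```shell
# Throwaway repository; names are invented for this demo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "first commit"

echo "two" >> notes.txt      # edit 1
git add notes.txt            # edit 1 is now staged in the index
echo "three" >> notes.txt    # edit 2 stays unstaged

# Keep only the added lines from each diff (skip the "+++" header).
unstaged=$(git diff | grep '^+[^+]')          # working copy vs. index
staged=$(git diff --cached | grep '^+[^+]')   # index vs. last commit
both=$(git diff HEAD | grep '^+[^+]')         # working copy vs. last commit

echo "git diff:          $unstaged"
echo "git diff --cached: $staged"
echo "git diff HEAD:"
echo "$both"
```

Plain "git diff" reports only the unstaged edit, "git diff --cached" only the staged one, and "git diff HEAD" both.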

 

One thing I am still having trouble with is figuring out what has been done on the other repositories relative to the one I am in. For example I have my local working repository. And it has my github repository listed as a remote. I basically treat my github repository as if it were the one true repository - like it was my CVS or svn repository. So how do I figure out what has changed on github since I last pulled code from there - essentially the equivalent of "cvs -n up | grep U"? I was hoping to find the correct parameters to feed to "git log" or "git diff", but those commands only look at what is already in my local repository; the information I need lives on github until I make it local. 'git fetch' does that: it brings down the new commits and updates origin/master without affecting my working copy of the code. After a fetch, "git log master..origin/master" lists the commits that are upstream but not yet in my local master.
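
A minimal sketch of that fetch-then-compare workflow. A local bare repository (hub.git) stands in for github, and "work" and "other" are two clones; all of these names and paths are made up for the demonstration.

```shell
tmp=$(mktemp -d)

# Stand-in for the github repository.
git init -q --bare "$tmp/hub.git"
git --git-dir="$tmp/hub.git" symbolic-ref HEAD refs/heads/master

# My working repository: create some history and publish it.
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "v1" > file.txt
git add file.txt
git commit -q -m "initial"
git remote add origin "$tmp/hub.git"
git push -q origin master

# Another clone makes a change and pushes it.
git clone -q "$tmp/hub.git" "$tmp/other"
cd "$tmp/other"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "v2" >> file.txt
git commit -q -a -m "change made elsewhere"
git push -q origin master

# Back in my repository: fetch, then ask what is new upstream.
cd "$tmp/work"
git fetch -q origin
incoming=$(git log --format=%s master..origin/master)
changed=$(git diff --name-only master origin/master)
working_copy=$(cat file.txt)

echo "incoming commits: $incoming"
echo "files affected:   $changed"
echo "my file still says: $working_copy"
```

The fetch updates only origin/master; the local master branch and the working copy stay as they were until a merge, rebase, or pull.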

Rebasing

Good diagrams of how git objects relate to each other, including what you would get if you used rebase rather than merge, are at http://eagain.net/articles/git-for-computer-scientists/ Rebasing is a way to hide the unimportant details of how a couple of lines of development were going on in parallel before they came together. However, it is one form of rewriting history - so (from the article above) don't rebase branches that other repositories have created new commits on top of. A useful and probably fairly safe workflow is to rebase rather than merge only if you have not shared your commits with anyone else yet.
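
A small sketch of what a rebase does to an unshared branch, in a throwaway repository. The branch name (topic) and the file names are invented for the example.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# Work on a topic branch...
git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "topic work"

# ...while master moves on in parallel.
git checkout -q master
echo "master" > master.txt
git add master.txt
git commit -q -m "master work"

# Replay the topic commit on top of the current master.
git checkout -q topic
before=$(git rev-parse topic)
git rebase -q master
after=$(git rev-parse topic)

git log --graph --format=%s topic
merges=$(git rev-list --merges topic)
```

The log comes out as a straight line with no merge commit, and the topic commit has a new SHA1 because its parent changed. That new SHA1 is the "rewriting history" part, and the reason to do this only with commits nobody else has.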

git filter-branch

Warning, this can create a mess - use with caution. But that said, it is the way to rewrite history to pretend something never happened. In CVS, if I commit a file that shouldn't be there, I just go to the repository and remove the ,v file that contains its revision information from the file system. In svn ?? I am pretty sure there is a command that lets you remove a file from all versions - even ones where it was originally included. In git you do:

  git filter-branch --index-filter 'git update-index --remove filename' HEAD
and that rewrites the history in place, so there is nothing further to commit. The trouble is that the command recalculates every commit and tree object since that file was originally created. So if, for example, you committed your database.yml file at the time you created the project, you need to recalculate basically all the SHA1s since project inception. Not a huge deal locally - but if you have remote repositories you need to keep in sync with? Ugh. After removing the database.yml file from my local history, I couldn't push to github. It rejected my push (CNK recreate what message I got). I had to do 'git push -f origin master'. When I looked, the database.yml file was gone. However, since then I have tried to sync up some of my other local repositories to github. And, I am not sure if it was a sin of commission or a sin of omission, but my github repository now has that pesky database.yml file again. Most likely one of the other clones still carried the old history and brought it back. AND I can't fetch, pull or push between my repository on holden and github.
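
A sketch of the SHA1 problem in a throwaway repository. Two things here are not from my original notes: the filter uses 'git rm --cached --ignore-unmatch', which is the form the current git-filter-branch manual shows for this job, and FILTER_BRANCH_SQUELCH_WARNING is set because recent git versions otherwise print a deprecation notice and pause. File names and contents are invented.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# The unwanted file goes in with the very first commit.
echo "readme" > README
echo "password: secret" > database.yml
git add README database.yml
git commit -q -m "project start"
echo "more" >> README
git commit -q -a -m "later work"

old_head=$(git rev-parse HEAD)
old_root=$(git rev-list --max-parents=0 HEAD)

# Rewrite every commit reachable from HEAD without database.yml.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch database.yml' HEAD > /dev/null 2>&1

new_head=$(git rev-parse HEAD)
new_root=$(git rev-list --max-parents=0 HEAD)
files=$(git ls-tree -r --name-only HEAD)

echo "HEAD before: $old_head"
echo "HEAD after:  $new_head"
echo "files now:   $files"
```

Every commit from the first one onward gets a new SHA1, which is why any other repository still holding the old commits no longer lines up with this one.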

gitk

One of the best things about git is its visualization tools (see the screenshot at the top of the page for an example). Mostly I use

gitk --all
because mostly what I am trying to see is what has changed between different branches (or local and remote branches). But I recently wanted to see just my local branches - the one I was working with and master - without the other branches that I don't care about. To do this I restricted gitk to the refs under refs/heads:
$ gitk --argscmd="git for-each-ref --format='%(refname)' refs/heads" 

Misc.

Heroku uses git to deploy your code (aka, build and start dynos). The heroku ruby gem automates a lot of this - including adding heroku as a remote to your git repository. One side effect of this is that you can always grab a copy of what Heroku has for your app by cloning from that repository:

git clone -o heroku git@heroku.com:<APP_NAME>.git

Non-Fastforward pushes to Github

What if you have changed your mind about which commits should be part of a given branch - after you already pushed that branch to GitHub (or your remote repository of choice)? For example, our Rails class wanted to back up and redo one of the chapters we worked on the previous class. So we branched at the commit for the previous class and started there. After working on the branch for a while, we decided we just wanted to continue with that code and didn't care about the previous master branch. On our local copy, we just retagged the current commit as "master" and all was well. Until we tried to push up to GitHub so everyone could see it.

$ git push origin master
To git@github.com:LA-Rubyists/depot.git
 ! [rejected]        master -> master (non-fast-forward)
error: failed to push some refs to 'git@github.com:LA-Rubyists/depot.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes (e.g. 'git pull') before pushing again.  See the
'Note about fast-forwards' section of 'git push --help' for details. 

Fortunately, the docs for git push cover just the use case I have - complete with an ascii diagram:

git push origin +dev:master

Update the origin repository’s master branch with the dev branch, allowing non-fast-forward updates. This can leave unreferenced commits dangling in the origin repository. Consider the following situation, where a fast-forward is not possible:

            o---o---o---A---B  origin/master
                     \
                      X---Y---Z  dev

The above command would change the origin repository to:

                      A---B  (unnamed branch)
                     /
            o---o---o---X---Y---Z  master

Commits A and B would no longer belong to a branch with a symbolic name, and so would be unreachable. As such, these commits would be removed by a git gc command on the origin repository.

So for my case, I simply did:

$ git push origin +master
Counting objects: 130, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (90/90), done.
Writing objects: 100% (93/93), 12.24 KiB, done.
Total 93 (delta 53), reused 0 (delta 0)
To git@github.com:LA-Rubyists/depot.git
 + a274fc3...b4da34b master -> master (forced update)
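
The same situation can be reproduced without touching GitHub. In this sketch a local bare repository (hub.git) plays the remote; the paths, file name, and commit messages are all made up.

```shell
tmp=$(mktemp -d)

# Stand-in for the remote repository.
git init -q --bare "$tmp/hub.git"

git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
git remote add origin "$tmp/hub.git"

# Publish two commits.
echo "chapter 1" > depot.txt
git add depot.txt
git commit -q -m "chapter 1"
echo "chapter 2, first attempt" >> depot.txt
git commit -q -a -m "chapter 2, first attempt"
git push -q origin master

# Back up one commit and redo the chapter, so local master
# no longer contains the remote's latest commit.
git reset -q --hard HEAD~1
echo "chapter 2, second attempt" >> depot.txt
git commit -q -a -m "chapter 2, second attempt"

# A plain push is refused as a non-fast-forward...
if git push -q origin master 2> /dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# ...while the leading "+" allows the remote branch to be overwritten.
git push -q origin +master
remote_tip=$(git --git-dir="$tmp/hub.git" rev-parse master)
local_tip=$(git rev-parse master)

echo "plain push: $plain_push"
echo "remote master now matches local master: $remote_tip"
```

After the forced push, the "first attempt" commit is no longer reachable from any branch on the remote, just like commits A and B in the diagram from the git push docs.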


cnk@ugcs.caltech.edu