Using Git and GitHub for collaboration¶
Overview¶
This lesson show you how to submit and work on homework assignments using git version control and the GitHub collaboration platform. Assignments are to be worked on as part of the GitHub FutureSystems organization.
Upon completion of this lesson you will be able to use GitHub for submitting assignments for your course.
In order to use GitHub you will need (detailed below):
- a GitHub account
- to form a group with other students (by Week 10)
- a FutureSystems account (see Getting a FutureSystems account)
- an SSH key
Getting a GitHub account¶
Go to GitHub and sign up for an account.
This prerequisite is satisfied if you are able to
- go to https://github.com
- sign in
Form a Group and Identify a Project (By Week 10)¶
You need to form a group with other students and determine a project to work on together by the 10th week of the class. If you have difficulty on deciding a project please contact the instructors. The instructors need to approve your project so please send them a report detailing:
- the names and email address of group members
- a project title
- the goal of the project
- a list of all deliverable
- assignments of tasks to group members
This prerequisite is satisfied if you and other students have formed a group. .. and your instructor has approved the project.
Obtain a FutureSystems account¶
As all your work will be completed on FutureSystems, you will need a FutureSystems account in order to access and user resources. Go to the FutureSystems portal and request an account if you do not yet have one. Then, you must request to be added to the course FutureSystems project. Finally, you must upload an SSH key. Please see the FutureSystems documentation for details on requesting an account. If you have trouble uploading an SSH Key please first consult documentation on how to upload an SSH Key before contact support.
This prerequisite is satisfied if you are able to accomplish the following:
- log into https://portal.futuresystems.org
- go to the
Portal Account
tab - the
status
row is all green in the account status section
Have an SSH key¶
You will need an SSH key to use both GitHub and FutureSystems. If you have followed the documentation for creating a FutureSystems account you should have created an SSH key in the process.
The prerequisite is satisfied if:
- You can log onto
india.futuresystems.org
If you have created a key with a default name:
- The
~/.ssh/id_rsa
and~/.ssh/id_rsa.pub
files exist
If you have not satisfied this section, please see the documentation for details on how to do so.
Using git on india.futuresystems.org¶
To use git on india.futuresystems.org, Try:
module load git
A sample output looks like:
[albert@i136 ~]$ module load git
git version 2.2.1 loaded
[albert@i136 ~]$ git
usage: git [--version] [--help] [-C <path>] [-c name=value]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
<command> [<args>]
...
Add your SSH Public key to GitHub¶
Now that you have an SSH keypair, you need to upload your public key to GitHub in order to access your repository with your SSH key.
To add this key to GitHub, first copy your public SSH key string. This is a same key string that you registered on portal.futuresystems.org. For example, you can view the key by executing the following command:
$ cat ~/.ssh/id_rsa.pub
You may see something like the following:
ssh-rsa AAA....... albert@gmail.com
Copy this public key by selecting it and right-click -> Copy
.
Important
This must be your public key.
Make sure you get the contents of id_rsa.pub
and not
id_rsa
.
Next go to your GitHub account SSH keys and click Add SSH key
on the top right.
You will need to provide a title and the key.
It is a good idea to use your name and course number in the title,
for example, Albert has Albert I590 12388
.
Next paste the key into the Key
field and click Add key
at the bottom.
This section is successfully completed if your GitHub account SSH keys lists the key you provided with the title and a fingerprint such as:
d8:c3:dd:c8:2f:98:11:ca:[...]
The fingerprint should be same with one on portal.futuresystems.org if you used a same public key.
New Personal Repository¶
https://github.com/new allows you create a new repository on github.com.
Configuring your Git Identity (git config
)¶
You will need to configure git in order to use it properly. The following are required:
- your name
- your email address
- your SSH keys (
id_rsa
andid_rsa.pub
)
Note
We will use id_rsa and id_rsa.pub filenames to indicate a private and a public key in this lesson. You may have different filenames.
Note
In order for git to function properly you will need to repeat the configuration steps for each machine you use git on.
Ada would configure your name and email like so:
$ git config --global user.name "Ada Lovelace"
$ git config --global user.email lovelace@gmail.com
Additionally, you can configure an editor such as nano
,
emacs
, or vi
.
Ada will use nano
:
$ git config --global core.editor nano
Once you have done so you should have a ~/.gitconfig
file.
You can check that this file exists and that it contains the correct
information:
$ cat ~/.gitconfig
[core]
editor = nano
[user]
name = Ada Lovelace
email = lovelace@gmail.com
Initializing the Repository with git clone
¶
Once you have access to a repository you should use it to work on assignments.
You may use your local machine or FutureSystems account via
india.futuresystems.org
. For instance, if your account name on
FutureSystems is albert
:
ssh albert@india.futuresystems.org
Once you have your repository URL
(for example: git@github.com:futuresystems/class-bigdata-technology-spring-2016-ABCDE.git
)
you can download the repository like so:
git clone git@github.com:futuresystems/class-bigdata-technology-spring-2016-ABCDE.git
cd class-bigdata-technology-spring-2016-ABCDE
Using the Repository¶
Now that you have an initialized repository you may use it for your assignments.
This section describes how to create and modify documents using git to track and share the changes among collaborators. Upon completion you will know how to do the following:
add
-ing files to gitcommit
-ing changespush
-ing changespull
-ing changes- resolving conflicts
Adding content to git (git add
, git commit
, git status
)¶
Now that you have a repository in your account on india
let us
create some content and notify git that changes to this content needs
to be tracked.
Tracking content makes it easy to share changes among collaborators,
track precisely who made a change, what was changed, when something
changed, and why a change was made.
The commands we are using in this section are:
git add
git commit
git status
The concepts are:
- untracked content
- staging area
- tracked content
- what a change means in git terminology
First let us create a file called fist.txt
and write some lines:
$ nano fish.txt # open the file in the "nano" editor
$ cat fish.txt # after saving, show the contents of the file
One fish
Two fish
Red fish
Blue fish
At this stage the file exists but git is not tracking changes made. If it were to be deleted then it is gone for good.
We can inspect the status of git using the git status
command:
$ git status
On branch master
Initial commit
Untracked files:
(use "git add <file>..." to include in what will be committed)
fish.txt
nothing added to commit but untracked files present (use "git add" to track)
There is a lot of information here but the key pertinent point is the
Untracked files
heading which lists all files that git sees but
whose changes are not being tracked.
There is also the helpful hint use "git add <file>..."
indicating
a possible next step.
Let us do so:
$ git add fish.txt
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: fish.txt
In order to understand what git add
does, we need to know the
difference between each of the three states that content may be in:
- untracked
- staging
- tracked
When the fish.txt
file was created the content was untracked.
That is, any modifications to fish.txt
will not be logged.
If it is deleted it cannot be recovered, it cannot be shared using
git, and we cannot extract the who, what, when, and why
metadata associated with a change.
By using git add
content can be added to the staging area.
Multiple files can be staged.
Hypothetically, if two other files hello.txt
and world.txt
were to be created they could be staged:
$ git status
On branch master
Initial commit
Untracked files:
(use "git add <file>..." to include in what will be committed)
fish.txt
hello.txt
world.txt
nothing added to commit but untracked files present (use "git add" to track)
$ git add hello.txt
$ git add hello.txt
$ git status
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: fish.txt
new file: hello.txt
new file: world.txt
By using the staging area multiple files can be committed to git as a single change. Meaning: a change is the addition, deletion, of modification of content of one or more files.
At this point, ignoring the hypothetical hello.txt
and world.txt
files, we can now commit this change:
$ git commit -m "added counting fish"
The git commit
command recording everything in the staging area
as a single change.
When committing a change it is necessary to add a message describing
the change.
The change itself stores the what (what content changed), and
when (time and date of a change), but you must provide a
message that describes why a change was made.
This message is then stored with the change and can be viewed by
looking at the history of the repository.
You can now see for yourself that git no longer sees any untracked content:
$ git status
On branch master
nothing to commit, working directory clean
At this point you have used the git add
, git commit
, and
git status
commands and should know the difference between the
untracked
, staging area
, and tracked
states that content
may be in, and understand what is meant by a “change.”
Viewing Repository History (git show
, git log
)¶
Recall that a git “change” refers to who made a change, what what changed, when a change was made, and why a change was made. Each change is added to the others so that you can view the entire history, each change on top of its parent, of a repository.
Try it out using git show
to view the contents of a commit:
$ git show
commit 05b162b8e7ffe5eb8dda8822a691244a26ff2c0e
Author: Ada Lovelace <lovelace@gmail.com>
Date: Wed Feb 25 12:40:20 2016 -0500
added counting fish
diff --git a/fish.txt b/fish.txt
new file mode 100644
index 0000000..77a5fea
--- /dev/null
+++ b/fish.txt
@@ -0,0 +1,4 @@
+One fish
+Two fish
+Red fish
+Blue fish
As you can see there is a lot of information here. The pertinent points are:
- who: the author name and email address is provided
- what: you can see the exact change at the bottom
- when: the date of the commit is given
- why: the commit message you provide is given
Additionally, you can see an overview containing the commit author,
date, and message using git log
to show the history.
In this case there has only been one commit so that is all that will
be shown.
However, please try this out again later after making further commits.
$ git log
commit 05b162b8e7ffe5eb8dda8822a691244a26ff2c0e
Author: Ada Lovelace <lovelace@gmail.com>
Date: Wed Feb 25 12:40:20 2016 -0500
added counting fish
Sharing your changes via GitHub (git push
, git pull
)¶
This section describes how to share you changes using git and GitHub. The commands covered are:
git push
git pull
By the end of this section you will understand the difference between a local and remote repository and how to share changes made locally via a remote repository.
Recall that earlier you initialized a repository using the git clone
command.
Let us look in further detail at what this command does.
First, you logged into india@futuresystems.org
.
At this point, your git repository was not on india
.
By executing the git clone
command you created a local copy on
india
of the remote repository hosted on the GitHub server.
At this point there are two repositories: local and remote
(also known as origin
).
You can inspect this for yourself.:
$ cd class-bigdata-technology-spring-2016-ABCDE
$ git remote -v
origin git@github.com:futresystems/class-bigdata-technology-spring-2016-ABCDE.git (fetch)
origin git@github.com:futresystems/class-bigdata-technology-spring-2016-ABCDE.git (push)
Here, origin
is the shorthand name referring the the location
of the remote repository that this local one was created
from.
Important
This means that ANY changes added via git commit
are only
committed to the local repository.
These changes are NOT YET present at the remote (origin
).
In order to share your commits with the remote repository, you
must push
them.
Like so:
$ git push origin master
Let’s break this down a bit.
The first part is git push
, meaning that we are telling git
to share our local changes with a remote repository.
Now let us examine the origin
and master
parts of the command.
Recall the output of git remote -v
and git status
after our
commit earlier.
The git remote
command provides us with the name associated
with the remote repository, namely origin
.
From git status
, we get On branch master
.
A repository can have multiple branches with different names
such as (release-2.0
, dev1.3
, etc).
This is beyond the scope of this lesson, but it suffices to say
that all our commits so far have been to the default branch which is
called master
.
Let us look at the command again:
$ git push origin master
Translated into English, this says: “push the changes made to the current
branch to the master branch of the repository called origin
”.
In other words, git push
updates the remote repository with all
local changes.
At this point, the remote repository reflects the changes made by Ada.
Now, Albert had previously cloned the repository at the same time as
Ada, since they are working together.
Since he cloned it before Ada push
-ed her commits, his repository
is out of date.
However, Ada can now tell Albert that she made some change:
Ada: Hi Albert. I pushed some changes to the repo.
Albert: Thanks Ada. I’ll pull them right away.
Albert can then do the following:
$ cd class-bigdata-technology-spring-2016-ABCDE
$ git pull origin master
Albert now has all the changes Ada made.
Important
Only by using git push
will your GitHub repository be updated.
If you are trying to share your changes but your team-members cannot
see them, make sure to git push origin remote
.
Concurrent Changes¶
One feature of git is that is allows multiple people to work on the same repository concurrently.
For instance, while Ada was adding the fish.txt
file, Albert may
have been writing about eggs.
He would have cloned the repository, just like Ada, but added
eggs.txt
instead:
$ nano eggs.txt
$ cat eggs.txt
Do you like green eggs and ham?
I do not like them, Sam-I-am.
As Ada did, Albert would add
and commit
the change:
$ git add eggs.txt
$ git commit -m "added green eggs and ham"
Now, when he pulls the changes that Ada made he sees that both
eggs.txt
and fish.txt
are present in his local repository:
$ ls
eggs.txt fish.txt
He can share his changes with Ada in the same fashion:
Albert: Hi Ada. I pushed my changes.
Ada: Great. I’ll pull them now.
Now, Ada does the pull
-ing and sees Albert’s changes:
$ git pull origin master
$ ls
eggs.txt fish.txt
Exercise¶
The goal of this exercise is for you and your team to become familiar
with push
-ing and pull
-ing to and from your repository.
Each person should log into india@futuresystems.org
and clone the
repository from GitHub.
Next, each person should create a file <portalname>.txt
in which
they explain what the following commands do:
git clone
git add
git commit
git push
git pull
Additionally, this file should describe the difference between a remote and local repository.
For example, Ada would create lovelace.txt
and Albert emc2.txt
.
Finally, each person should synchronize their changes with everyone else so that each team member has the other team member’s file. This synchronization should be done with git such that the GitHub repository also has these changes.
Be aware that individual participation counts. Each team member must contribute their file and this file must be unique. Please recall that git tracks who made a contribution and exactly what that change was.