Version Control

Version control is a huge topic and I won’t even attempt to do the rationale for it justice. Suffice it to say, a state-of-the-art source control repository is an essential piece of partner software that keeps information, configuration, code, etc. properly versioned and, if so desired, open to collaboration.

The version control system of choice here is git. Why? It is incredibly efficient, heavily used throughout the Open Source Software community, and it’s architected at its core to function while detached from remote repositories.

The Pro Git book is published online and freely available; it is, in my mind, the definitive git resource, and I rarely go elsewhere for git information.

So, what are we trying to do here? I have many VMware images, typically running various versions of Linux, and keeping purely local git repositories on my trusty laptop is not sufficiently redundant to suit my tastes, so I need a server that hosts my git remotes. For this purpose, I’m using my Synology 713+ NAS. Not a spectacular piece of hardware, but it’s got redundant drives and performs well enough for this purpose.

The net result I’m after is a remote repository that I can clone into any of my ‘worker’ images, giving me reliable access to configuration commands, code, etc. without having to resort to shared-folder access.

As I have VPN access to my own network when outside of it, strictly speaking I won’t need to forward any ports to work with the remote repository from outside my firewall. But because I enable SSH with public-key authentication only for accessing said repositories, it presents a lightweight option for working truly remotely with the repository.
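
For reference, restricting the NAS’s SSH daemon to key-based logins is a server-side setting. A minimal sketch of the relevant /etc/ssh/sshd_config directives follows, assuming your firmware lets you edit the file directly; apply it only after key-based login is confirmed working (see below), and restart the SSH service afterwards:

PubkeyAuthentication yes
PasswordAuthentication no
ChallengeResponseAuthentication no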

Basic Configuration

First off, the NAS needs to be configured with a share, optionally a dedicated account, and an appropriate security definition. To keep things simple, and since my client account name matches the one on my NAS, I’ve opted not to set up a distinct account for accessing the remote git repository. Again, this is the trivial alternative; people looking for configurations more conducive to robust collaboration will find plenty of resources outlining their options in detail.
In the steps below, I’ve artificially altered the shell prompt to indicate where the commands are executed, client or NAS: client$ or nas$.

Authorize your client account to log on to the NAS via SSH without having to specify a password. You do this by adding your client account’s public key to the NAS account’s authorized_keys file. Generate a key pair with ssh-keygen only if you don’t have one already. The NAS’s hostname is set to ‘syno’ in /etc/hosts.

client$ ssh-keygen
client$ scp ~/.ssh/id_rsa.pub syno:
client$ ssh syno
nas$ mkdir -p ~/.ssh
nas$ cat id_rsa.pub >> ~/.ssh/authorized_keys
nas$ rm id_rsa.pub
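
One caveat: OpenSSH silently ignores authorized_keys when the file or the directories above it are too permissive (the default StrictModes behavior), which in my experience is a common stumbling block on NAS boxes. If password-less login doesn’t kick in, tighten the permissions:

nas$ chmod 700 ~/.ssh
nas$ chmod 600 ~/.ssh/authorized_keys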

While you should have been prompted for your NAS account’s password that first time, with the key now ‘authorized’, a subsequent ssh syno should provide password-less access.

Once we’re set up with password-less access to remote storage, we’re ready to actually push an empty repository. To do that, you first need to create the repository locally. I’ll create a repository called ‘own’:

client$ mkdir -p git/own && cd git/own
client$ git init

Now that we have an empty repository, we need to clone a ‘bare’ representation of it; a bare repository has no working tree and contains only what would normally live under .git, which is exactly what a push target should look like:

client$ cd ..
client$ git clone --bare own own.git

Before pushing this repository to the NAS, to save on typing when cloning and pushing, I set up a server-side symbolic link pointing to the ‘storage root’ where all the git remote repositories will be added. This is a trivial operation that affects only the account for which SSH access has been set up. E.g. in my case:

client$ ssh syno
nas$ ln -s /volume1/depot git

Now we can copy this repository to the server using standard SSH vernacular:

client$ scp -r own.git syno:git/

That’s it. Now you can remove the local own and own.git and fetch the server-side one. Presuming you have git installed and configured in your VM images, cloning your repository is as simple as:

client$ git clone syno:git/own.git
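
From there, the usual workflow applies. A quick round trip, with a file name that is purely illustrative, might look like this:

client$ cd own
client$ echo 'NAS git setup notes' > notes.txt
client$ git add notes.txt
client$ git commit -m "Add setup notes"
client$ git push origin master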

When working with virtual machine images in a development or lab capacity, I recommend snapshotting base images with SSH and git not only installed but also configured. It is then really efficient to spin up new images, or revert to something last known to be working, without having to go through the setup steps every time. With a base configuration captured as a snapshot, you also avoid polluting your NAS-side authorized_keys file with a high number of distinct public keys, some of which may belong to long-ago discarded VM images.
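
If you script your VMware workflow, the bundled vmrun utility can take that snapshot from the command line; a sketch, where the .vmx path and snapshot name are illustrative (you may need vmrun’s -T flag to name your VMware product):

client$ vmrun snapshot ~/vmware/base/base.vmx git-ssh-configured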

Reboot

This site has been dormant for years while I’ve been pursuing traditional, corporate ventures. Its previous incarnation was, as was relevant at the time, clearly slanted towards business intelligence consulting.

Consulting itself was a means to an end, but that end did not signify any termination of the pursuit of making sense of data. Instead, you could say that with everyday distractions having moved off to the side, the focus on creating designs that are ‘right’, as opposed to the bare minimum, is all of a sudden within the realm of the possible and eminently achievable.

‘Right’ in this sense is right from a business-strategic point of view; it’s right presuming the majority of other decisions in an organization are made the right way, too. Just like ‘good enough’ is open to interpretation and has as a result often received a bad rap, ‘right’, when spoken through the mouth of an unabashed software engineer, is easily and often construed as overly elaborate, complex and, what’s worse, out of touch with business objectives.

Getting back to actually doing what’s right: this site will be a collection of observations and interactions, detailing the thought, rationale, and very real obstacles and decision points one faces when building analytical systems from the ground up.

While not everybody needs to, or indeed should, know everything, it’s my experience that projects, and in some cases entire companies, fail because people’s information isolation levels are too high; the silos are too entrenched for a decision-maker to get the unbiased perspective needed to effectively manage the entity in question.

To combat this modern-day malady, I’m advocating an approach that requires at least a few key people to have a seemingly impressive insight into a substantial portion of the domain. In the realm of analytical systems, that means a hypothetical ‘star resource’ (please, let’s find a better moniker for this person) would keenly understand the business drivers behind why people are even talking about putting ‘something’ in place to drive operations, provide insight, or whatever it is that needs to be done. In addition, said person is capable of translating these business problems (if that’s what they are) into the technical domain and can articulate the problems and potential solutions there in the relevant vernacular.

This means that the business-savvy individual understands analytical software systems from the infrastructure level up to the applications. E.g. they are deeply familiar with Linux as a deployment platform; they are fluent in database technologies and query constructs, both in traditional RDBMSs and in the NoSQL space; they understand messaging infrastructure, can integrate data from disparate sources, and can tell when it works properly and when it doesn’t. They understand mathematical and statistical modeling and know the relevant languages, such as R and Python, well enough not only to prototype model implementations but to contribute directly to production deployment.

To round it all off, a solid understanding of, and rather sincere respect for, project management as a discipline is a significant asset that will keep progress up.

Subsequent posts will show how well (or not) this skill set is articulated using real analytical landscape patterns.