Graph Database Cloud Deployment

Overview

In this post, we'll cover the first few steps towards deploying a graph database in the cloud to facilitate centralized persistence and collaboration. To keep costs down (zero, in fact, if you haven't yet exhausted your AWS Free Tier), we'll deploy a rather scaled-down version of the Titan Graph DB that can run on a t2.micro instance. However, just because it can run on such an instance does not mean that running a production Titan server this way is recommended. Proper production sizing should prevail, and this post should be seen as a rough template that can be scaled to suit bigger needs.

The key takeaways for the deployment are:

  1. Scalable storage and computation
  2. Secure access, with end-user processes that are meaningful and can be enforced

Architecture

The server deployment outlined here is a classic one in the sense that it relies on a full-fledged Linux installation on an EC2 node, and as such it does not leverage containers, Lambdas or other lightweight mechanisms to achieve elastic scalability.

Titan Deployment

Storage

Currently, AWS does not allow a launch profile to attach an instance directly to an existing block device. That capability would be desirable: with a detached EBS volume backing our database, we could easily take advantage of Spot instances with an AMI-based boot snapshot and a standalone EBS volume, where upon instance termination our database backing store would remain, ready to be mounted the next time we launched a Spot instance. Even so, to prepare for this eventuality, we'll create an EBS volume that we'll attach to the instance.
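If you prefer the command line over the AWS console, creating and attaching such a volume with the AWS CLI could be sketched as follows (the availability zone, size, volume ID and instance ID are placeholders; the volume must live in the same availability zone as the instance, and it is attached as /dev/sdf here to match what follows):

$ aws ec2 create-volume --availability-zone us-east-1a --size 4 --volume-type gp2
$ aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf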

In our case, on a RHEL 7 image, an EBS device named /dev/sdf showed up in Linux as /dev/xvdf. While completely optional, we'll create a single LVM partition here, which will give us much flexibility later on when deciding how to divvy up the storage space.

Use fdisk to create a single primary partition that spans the device and has partition type 8e (Linux LVM). Optionally, you can also create a swap partition (type 82), as your image by default won't have one. If you do, that partition needs to be formatted with mkswap, activated with swapon and added to /etc/fstab.

# fdisk /dev/xvdf
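If you did opt for the swap partition, a sketch of putting it to use, assuming it ended up as /dev/xvdf2:

# mkswap /dev/xvdf2
# swapon /dev/xvdf2
# echo '/dev/xvdf2 swap swap defaults 0 0' >> /etc/fstab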

In our LVM setup, we want one really small (e.g. 64MB) Logical Volume (LV) for the Titan data directory, as that's where we'll keep our credentials graph. The second LV is the bigger one (e.g. 2GB) where the database files will be stored. The reason these are so small, even though the Free Tier covers up to 30GB, is that the Free Tier eventually runs out and you don't want to allocate more than you need; LVM makes it easy to grow these as needed. The LV creation follows these steps, presuming /dev/xvdf1 is your LVM partition:

# pvcreate /dev/xvdf1
# vgcreate vg /dev/xvdf1
# lvcreate -L 64M -n tdata vg
# lvcreate -L 2G -n dbdata vg
# mkfs.xfs /dev/mapper/vg-tdata
# mkfs.xfs /dev/mapper/vg-dbdata
# lsblk

The last command lists the device hierarchy.
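Should either volume fill up later, growing it is straightforward, e.g. for the database LV (xfs_growfs operates on the mount point, which we set up further down):

# lvextend -L +2G /dev/mapper/vg-dbdata
# xfs_growfs /var/lib/titan/db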

OpenVPN

This post is not an in-depth elaboration of how to set up an OpenVPN server; there are well-maintained references for that. Instead, we'll cover the specific choices made when setting it up.

Specifically, we installed OpenVPN from the official RPM and put the keys under /etc/openvpn/keys, i.e. the server's key and certificate, the CA's certificate and the DH .pem file. The configuration file is /etc/openvpn/server.conf, in which we kept the default 1194/udp port and protocol. For its simplicity and performance, we use a routed configuration (as opposed to a bridged one).

Choose a particular subnet and netmask for the server directive, e.g.:

server 10.33.44.0 255.255.255.0

Your OpenVPN server will take the address 10.33.44.1 on the tunnel, which is the address your client applications, i.e. those connecting to Titan once the VPN tunnel is up and running, will be using to reach the Titan server.

From a security point of view, TLS-Auth and running the daemon under an unprivileged account are highly recommended, as we need to allow world-wide (0.0.0.0/0) access to the VPN port in our AWS Security Group and therefore must arm ourselves against DoS attacks.
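For reference, a minimal server.conf reflecting the choices above might look roughly like the following (the file names under /etc/openvpn/keys are our own naming, and nobody/nobody is the unprivileged account assumed here; adjust to your setup):

port 1194
proto udp
dev tun
ca /etc/openvpn/keys/ca.crt
cert /etc/openvpn/keys/server.crt
key /etc/openvpn/keys/server.key
dh /etc/openvpn/keys/dh2048.pem
tls-auth /etc/openvpn/keys/ta.key 0
server 10.33.44.0 255.255.255.0
keepalive 10 120
user nobody
group nobody
persist-key
persist-tun
verb 3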

Open the requisite firewall port:

# firewall-cmd --zone=public --add-port=1194/udp
# firewall-cmd --zone=public --add-port=1194/udp --permanent

Generate client certificates either directly (including the .key file, which then must be transferred over a secure channel) or from certificate requests (.csr files).
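As an illustration, signing a client's certificate request with plain openssl against your CA could look like this (the file names are placeholders; if you manage your PKI with easy-rsa, its sign-req command accomplishes the same thing):

# openssl x509 -req -in client1.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out client1.crt -days 365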

Once you've confirmed that your setup is working, the final piece is to enable the systemd service file that the OpenVPN RPM provides. The provided .service file is a template unit, so when you enable it, make sure the instance name matches the configuration file, in our case server.conf. Hence:

# systemctl enable openvpn@server.service
# systemctl start openvpn@server.service

Titan

We have covered Titan deployments in previous posts and won't be going into the same level of detail here. In short, we want Titan installed under /var/lib/titan, running under a dedicated account (also named titan). We will also be using the titan account to run the console when necessary, so it's important that it's given a login shell (bash) and a home directory; the Gremlin Console insists on writing user preferences and will throw an exception if the home directory isn't there. Alternatively, create the titan user without a shell and home directory and add the titan group to the user account you'll be running the Gremlin Console with.
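Creating such an account on RHEL 7 is a one-liner; a sketch of the login-shell-and-home-directory variant:

# useradd -m -d /home/titan -s /bin/bash titan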

Next, we’ll make titan use the LVs we have created. Basically:

# cd /var/lib/titan
# mkdir db
# mount /dev/mapper/vg-dbdata db
# mkdir db/berkeley
# mkdir db/es
# mv data data.old
# mkdir data
# mount /dev/mapper/vg-tdata data
# mv data.old/* data/
# rmdir data.old

Now the key Titan directories are using the EBS partitions. Update /etc/fstab to mount these on boot as e.g.

/dev/mapper/vg-dbdata /var/lib/titan/db xfs defaults,noexec 0 2
/dev/mapper/vg-tdata /var/lib/titan/data xfs defaults,noexec 0 2

Next, we'll copy and modify the following configuration files so that Titan uses the BerkeleyDB and Elasticsearch locations we specified.

  1. Copy conf/remote.yaml and change the IP to the one Titan listens on.
  2. Copy conf/titan-berkeleyje-es.properties and change the paths to /var/lib/titan/db/berkeley and /var/lib/titan/db/es (see the sketch after this list). Also add the following entry at the bottom:
    gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
  3. Copy conf/gremlin-server/gremlin-server.yaml and change the IP address and the path to the .properties file from the previous step.
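For orientation, the relevant lines in our copy of the .properties file ended up looking roughly like the following (the property names are those shipped in the stock titan-berkeleyje-es.properties; verify them against your copy):

storage.backend=berkeleyje
storage.directory=/var/lib/titan/db/berkeley
index.search.backend=elasticsearch
index.search.directory=/var/lib/titan/db/es
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

Similarly, the hosts entry in the copied remote.yaml should point at the VPN address, e.g. hosts: [10.33.44.1].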

Since we’re done with the file config, make the titan account own the install.

# cd /var/lib
# chown -R titan:titan titan

At this point it makes sense to ensure that Titan starts without errors and that as user titan, you can launch the Gremlin Console and connect to the Titan server.
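A quick smoke test, assuming the copied server configuration is named my-server.yaml as referenced in the service file below: start the server in the foreground as user titan, then from a second shell open the Gremlin Console and issue a trivial remote script:

$ bin/gremlin-server.sh conf/gremlin-server/my-server.yaml
$ bin/gremlin.sh
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :> 1+1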

Once that's complete, it's time for the final piece: creating a .service file so that systemd launches Titan on system boot. Start by changing to the system directory for systemd:

# cd /usr/lib/systemd/system

In there, create a file named e.g. titan.service and tailor its contents to match the name of the server configuration file created above, e.g.:

# /usr/lib/systemd/system/titan.service

[Unit]
Description=Titan
After=openvpn@server.service

[Service]
User=titan
Group=titan
WorkingDirectory=/var/lib/titan
ExecStart=/var/lib/titan/bin/gremlin-server.sh conf/gremlin-server/my-server.yaml
Restart=always

[Install]
WantedBy=multi-user.target

Note the dependency on the OpenVPN service launched earlier. This is required because the IP address the Titan server listens on lives on the tun0 network device created by OpenVPN, and it would not do to start Titan before the OpenVPN service has completely started. Enable and start the Titan service:

# systemctl enable titan.service
# systemctl start titan.service

As we’re also running firewalld, open up the port to allow our clients access to the Titan server:

# firewall-cmd --zone=public --add-port=8182/tcp
# firewall-cmd --zone=public --add-port=8182/tcp --permanent

Process

Our server is now self-sustaining in the sense that its storage is organized and its services are set up. You can reboot it, stop and start it, and even create AMIs from it that can be launched, and it will come up properly.

In order to manage new users we need a few simple processes:

  1. Sign certificates from certificate requests. This is simple to do securely as it does not involve the transfer of private keys. However, if you have opted for TLS-Auth (and it's recommended that you do), the pre-shared key needs to be distributed to each client in a secure fashion.
  2. Distribute a pre-configured client.conf (or client.ovpn) configuration file to facilitate a hassle-free connection experience (a sketch follows after this list).
  3. Once a client's VPN tunnel is running, it should connect to Titan via the OpenVPN-assigned address, which in our case is 10.33.44.1.
  4. If your Titan server requires authentication, define these credentials in the credentials graph and restart Titan.
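A pre-configured client file as mentioned in step 2 could be sketched as follows (the public hostname and the certificate and key file names are placeholders, and the tls-auth line applies only if you enabled TLS-Auth on the server):

client
dev tun
proto udp
remote vpn.example.com 1194
resolv-retry infinite
nobind
persist-key
persist-tun
remote-cert-tls server
ca ca.crt
cert client1.crt
key client1.key
tls-auth ta.key 1
verb 3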