As builders of analytical landscapes, we have to pay attention to a myriad of details. This post carves out a small portion of a huge subject area: networking. Specifically, we'll focus on setting up the system builder's machine so that virtualized nodes running on a developer-type machine can be reached. To illustrate this setup, we'll use a very specific configuration. Other configurations have different strengths and weaknesses, but the example below shows the basic concepts.
The overall objective is to run the graph database Neo4j inside a virtual machine on the host machine. The database should be accessible to anyone who can reach the host machine, whether directly or over the network.
The host machine in this case is a 13.3″ MacBook Pro running OS X Yosemite with VMware Fusion 7 Professional. Neo4j will be installed on a 64-bit CentOS 7 guest (the firewall configuration on earlier versions is completely different). To keep this post focused, we won't cover installing any software but will stay purely on the network configuration side of things.
Virtualization solutions such as VMware provide network access to guest images through a software switch or, in VMware's case, a set of them. VMware's options are:
- Host-only network. As the name implies, only the host itself can see and communicate with the guests. This is useful, e.g., to limit the impact of untrusted images. This network is provided through a virtual switch named vmnet1.
- NAT, or Network Address Translation, the dominant configuration when dealing with firewalls: images behind the firewall appear as a single host to those outside it. The firewall in turn makes sure that response traffic is routed back to the correct guest and that new inbound traffic is forwarded where a forwarding rule exists. VMware's switch for this is called vmnet8.
- Bridged network turns the guests into 'topological siblings' of sorts of the host. Guests route like the host, are on the same subnet, and consequently get their IP address (in the case of DHCP) from the same server as the host. This isn't always exactly the case, but for the purposes of this argument, it is.
Which option to choose? Our requirements disqualify the host-only option, so the choice is between NAT and Bridged. Unless there are very specific requirements that call for Bridged, err on the side of NAT. This keeps you in control, as you have one more layer between your guests and the network you're on. This works both ways. If you're on an otherwise untrusted network, you don't have to harden the firewalls of development guest images that need to communicate, which spares you from chasing down bugs that turn out to be an overzealous firewall configuration. Going the other way, it's one more safeguard against your dev images embarrassing you by sending your client's network administrator running and screaming that they're under attack. Lastly, on the topic of performance, why force host-to-guest and guest-to-guest traffic to route outside your host? Unless you need to sniff that traffic on, e.g., your router, it's just a drag on performance for you and everyone else using your network.
We can now restate the problem: given a NAT'd guest, how do we access it from the outside, i.e., from the host and beyond?
The answer was already alluded to above: port forwarding. The concept is familiar to anybody who has played around in their router's administration app. Most, if not all, modern routers let you open a particular port (or range of ports) on the WAN side and connect it to, i.e. forward it to, a host and port (range) combination on the LAN side. Essentially the same principle applies to the 'router' that is vmnet8. The forwarding rule is written, in a minimalist syntax, in the following file:
Really old versions of VMware Fusion use a slightly different path, but the idea remains the same. VMware is kind enough to include examples of such forwarding rules; you just have to be mindful of whether a rule sits under the TCP or the UDP section header. For TCP, for example, the following (commented-out) example is provided:
[incomingtcp]
# Use these with care - anyone can enter into your VM through these...
# The format and example are as follows:
#<external port number> = <VM's IP address>:<VM's port number>
#8080 = 172.16.3.128:80
If that example were uncommented and VMware restarted (not just the guest image, but Fusion proper), opening http://localhost:8080 in a browser on the host would be routed to port 80 on the NAT'd guest at 172.16.3.128. In our case, with Neo4j, we're interested in accessing our guest's port 7474/tcp, so we add the following line below the commented-out example:
7474 = 172.16.126.143:7474
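If you'd rather not hand-edit the file, the rule can also be added with a small shell function. This is a minimal sketch under a few assumptions: the section header appears exactly as `[incomingtcp]` on its own line, and any position inside that section is acceptable, so the function places the rule directly under the header rather than under the commented-out example. The name `add_tcp_forward` is made up for illustration; it writes the edited file to stdout and leaves the original untouched.

```shell
# add_tcp_forward FILE HOST_PORT GUEST_IP GUEST_PORT  (hypothetical helper)
# Prints FILE with "HOST_PORT = GUEST_IP:GUEST_PORT" inserted right after the
# [incomingtcp] header. Returns non-zero if that header is missing.
add_tcp_forward() {
    file=$1
    rule="$2 = $3:$4"
    awk -v rule="$rule" '
        { print }
        # Right after the TCP section header, emit the new rule once.
        $0 == "[incomingtcp]" && !done { print rule; done = 1 }
        END { if (!done) exit 1 }
    ' "$file"
}
```

For example, `add_tcp_forward "$NAT_CONF" 7474 172.16.126.143 7474 > /tmp/nat-forward.new`, where `$NAT_CONF` is a placeholder for the file's path. Review the output, copy it into place (this will likely need sudo), and restart Fusion as described above.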
A few things to note. First, we specified our guest by IP address. In VMware's case this doesn't require a static IP for the guest: VMware's internal DHCP servers for vmnet1 and vmnet8 have address affinity and will give your guest a consistent address. Second, we didn't have to use port 7474 on the host; it could in principle be any available port above 1024, e.g. 17474. A different host port is in many cases actually recommended, especially when forwarding to a guest's web server, which typically listens on ports 80 and 443, ports that are very likely already taken on your host (or by another guest on your host).
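Once several guests share the host, it's easy to pick a host port that is already claimed. As a hedged sketch, the hypothetical `host_port_free` function below accepts a port only if it is above 1024 and is not already used by an uncommented rule in the `[incomingtcp]` section. It assumes rules follow the `<external port number> = <VM's IP address>:<VM's port number>` format shown above, and it does not check whether some other program on the host is already listening on that port.

```shell
# host_port_free FILE PORT  (hypothetical helper)
# Succeeds if PORT is above 1024, at most 65535, and not already the external
# (host-side) port of an uncommented rule in the [incomingtcp] section of FILE.
host_port_free() {
    file=$1 port=$2
    # Reject empty or non-numeric input, then enforce the port range.
    case $port in ''|*[!0-9]*) return 1 ;; esac
    [ "$port" -gt 1024 ] && [ "$port" -le 65535 ] || return 1
    awk -v port="$port" '
        # Track whether we are inside the TCP section.
        /^\[/ { in_tcp = ($0 == "[incomingtcp]"); next }
        # Rule lines look like "HOST_PORT = GUEST_IP:GUEST_PORT";
        # commented-out lines start with "#" and are ignored.
        in_tcp && $0 !~ /^#/ {
            split($0, parts, "=")
            gsub(/[ \t]/, "", parts[1])
            if (parts[1] == port) taken = 1
        }
        END { exit taken }
    ' "$file"
}
```

On OS X, `lsof -nP -iTCP:17474 -sTCP:LISTEN` prints nothing when no local process is listening on port 17474, which covers the part the function leaves out.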
Even so, we're not necessarily done. Why not? Well, all we've really done is put up a sign and laid a path, so to speak. The guest may still run an uncompromising firewall that blocks our access. Since we're using CentOS 7 (again, it matters that it's 7, as 6.x uses iptables rather than firewalld), we'll use the rather pleasant firewall-cmd tool. Specifically, we have to do the following:
- Determine the zone for the NIC in question (see below)
sudo firewall-cmd --get-active-zones
- Poke a hole in the firewall for the port in the appropriate zone (in this case, public).
sudo firewall-cmd --zone=public --add-port=7474/tcp
- Make the change permanent once you've confirmed it works
sudo firewall-cmd --zone=public --add-port=7474/tcp --permanent
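For the first step, `--get-active-zones` prints each active zone name at the start of a line, followed by indented details such as `interfaces: eth0`. The sketch below (the function name `zone_of` is hypothetical) picks out the zone for a given interface from that output. It assumes the layout just described, which may vary between firewalld versions; firewall-cmd can also answer the question directly with `--get-zone-of-interface=<interface>`.

```shell
# zone_of IFACE  (hypothetical helper)
# Reads "firewall-cmd --get-active-zones" output on stdin and prints the zone
# whose "interfaces:" line lists IFACE. Returns non-zero if none does.
zone_of() {
    awk -v iface="$1" '
        # Zone names start in column 1; their details are indented below them.
        /^[^ \t]/ { zone = $1; next }
        $1 == "interfaces:" {
            for (i = 2; i <= NF; i++)
                if ($i == iface) { print zone; found = 1; exit }
        }
        END { exit !found }
    '
}
```

Usage: `sudo firewall-cmd --get-active-zones | zone_of eth0`, with `eth0` standing in for whatever name the guest's NIC actually has (`ip addr` lists them); the printed zone is what goes into `--zone=`.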
While this follows the conceptual flow of information, it is still somewhat backwards. To establish the zone, we need to know the NIC, and once we know which NIC it is, we quickly realize that Neo4j out of the box does not listen on it. Instead, very sensibly, Neo4j, unless told otherwise, listens only on the loopback interface, eliminating the risk that external marauders get access to your freshly minted graph database. So even though we've set up the port forward and opened the firewall, we've merely configured a route to a NIC where there's no Neo4j.
Enter the third and final configuration step: make Neo4j listen on the NIC you've opened 7474/tcp access to. Thankfully, we don't have to say exactly which IP this is. By telling Neo4j to listen on 0.0.0.0, we tell it to listen on all (IPv4) interfaces, including the 'real' (as real as it gets in the realm of virtual servers) NIC, rather than on loopback alone. To convince yourself that Neo4j currently listens only on loopback, run:
sudo netstat -tulpn
It will show that 7474/tcp is listening on 127.0.0.1, i.e. loopback. Again, your objective is to tell Neo4j to listen on 0.0.0.0. This listen setting is defined in Neo4j’s neo4j-server.properties file. Locate and uncomment the following line:
Save the file and restart Neo4j. Now we're ready to access our guest's port 7474/tcp, which in Neo4j's case we do in the browser at http://localhost:7474 (or at whichever other host port you chose for the forwarding rule).
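The edit to neo4j-server.properties and the netstat check above can both be scripted. This sketch rests on two assumptions. First, the setting is a single commented-out `#key=value` line; the key used in the usage example is a hypothetical placeholder, not Neo4j's actual property name. Second, netstat uses its default column layout, where the fourth column is the local `address:port`. netstat comes from the net-tools package, which a minimal CentOS 7 install may not include; `ss -tlnp` from iproute2 shows the same information in a different layout. On a dual-stack JVM the listener may appear on a `tcp6` line, e.g. as `::` or `::ffff:127.0.0.1`; `listen_addr` prints whatever address precedes the port.

```shell
# uncomment_key FILE KEY  (hypothetical helper)
# Prints FILE with the first line that starts with "#KEY=" uncommented.
uncomment_key() {
    awk -v key="$2" '
        # index() does a literal match, so dots in KEY are not regex wildcards.
        !done && index($0, "#" key "=") == 1 { $0 = substr($0, 2); done = 1 }
        { print }
    ' "$1"
}

# listen_addr PORT  (hypothetical helper)
# Reads "netstat -tln" or "netstat -tulpn" output on stdin and prints the
# local address of each TCP listener on PORT, e.g. 127.0.0.1 or 0.0.0.0.
listen_addr() {
    awk -v port="$1" '
        # Only TCP listener lines; column 4 holds the local address as addr:port.
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, a, ":")
            if (a[n] == port) { sub(":" port "$", "", $4); print $4 }
        }
    '
}
```

For example, `uncomment_key neo4j-server.properties example.listen.address > props.new` (placeholder key), then, after swapping the file in and restarting Neo4j, `sudo netstat -tulpn | listen_addr 7474` ought to report 0.0.0.0 (or `::`) rather than 127.0.0.1.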