Graph DB Undo stack

Overview

This post outlines a relatively simple means of implementing an undo/restore stack to handle accidental deletions of vertices and edges from a graph database. While the ability to restore incorrectly deleted elements give peace of mind even in single-user contexts, it’s when multiple users collaborate on the same graph where this feature becomes really valuable.

Scope

In order to manage the complexity inherent in such operations, we’ll start with a maximum of restrictions which we will loosen incrementally in subsequent posts.

In sum:

  • Only permit deletion of one vertex or edge at a time.
  • Only permit restoration of one element at a time
  • Always restore the top-level (most recently pushed) element off the stack. I.e. all more recently deleted elements must be restored prior to restoring an element.
  • Only permit deletion of orphaned vertices, i.e. those with no in- or outbound edges; zero degrees. This has the effect that edges will always be deleted prior to vertices they are bound do and we’re thus spared from the complexity otherwise stemming from restoring an edge to a deleted vertex.

Environment

Our graph will be hosted in a Neo4j-backed version of 3.1.1 Gremlin server and will thus be accessed through Gremlin queries.

While we only have one ‘physical’ graph (named ‘graph’) with one traverser (named ‘g’), we will for the purposes of showing contextual partitioning introduce the concept of model identifier, which serves as a logical grouping or context designation for each vertex. In our example this designation is manifest through a property named ‘model’, but could just as well (and perhaps better) be implemented as a dedicated edge between the context vertex and each constituent vertex.

We’ll be using the words node and vertex interchangeably.

Conceptual design

The following schematic outlines the undo graph concept. Elements deleted are  highlighted in green and with a 5pt line weight.Undo Concept graph1

 

In this design there is one and only one Undo root vertex linked with one Undo model vertex for each distinct model instance in the graph.

The Undo stack itself is defined by the blue vertices. These vertices are meta-nodes and link the ‘payload’; i.e. element that has been deleted either directly as is the case with deleted vertices or as a label and property copy as is the case with deleted edges.

Operations

Setup

Prior to being able to push deleted elements onto the undo stack, the model instance-centric heads need to be setup. This can be accomplished with a query similar to the following (line-breaks added for readability):

root = g.V().hasLabel('undoRoot').tryNext()
.orElseGet({g.addV('undoRoot').next()}); 

g.V(root).out('model').and(hasLabel('undoModel'), has('name', 'ModelB'))
.tryNext()
.orElseGet({g.V(root).as('r').addV('undoModel')
.property('name', 'ModelB').as('n')
.addE('model').from('r').inV()
.addV('undoNull').addE('next').from('n').outV().next()})

In order to simplify subsequent queries, we’ll be assuming that the ‘undoModel’ labelled vertex has ID 4678 and will reference it directly.

Vertex deletion

In the case with a strict zero-degree constraint on vertex deletion, deletion is effectively a linkage operation, where we create a meta-node that links to the deleted object and push said meta-node onto the restore stack.

If the orphaned vertex we wished to delete had the ID of 94, our Gremlin query could look something like this:

g.V(4678).as('r').addV('undoVertex').as('uv')
.addE('deleted').from(V(94))
.select('uv').addE('next').from('r')
.outV().outE('next')
.match(__.as('e').inV().as('t')).where('t', neq('uv'))
.select('t').addE('next').from('uv')
.select('e').drop()

Let’s break this query down to understand what is going on.

  1. Declare our model root at ‘r’ form which we add an undoVertex node that we call ‘uv’
  2. The ‘uv’ vertex links to the node that we wanted to delete, i.e. with ID 94.
  3. Add a ‘next’ edge from said root to our undoVertex.
  4. For all outbound ‘next’ edges from the newly created edge’s source vertex, i.e. root’s ‘next’ edges
  5. Match all nodes that that are different from the one we just added. There should only ever be one here, which is the previous vertex that we’re pushing one level deeper into the stack.
  6. Add a new ‘next’ edge from our still new ‘uv’ vertex to this previous node.
  7. Drop the edge that linked the root ‘r’ to the previous vertex that ‘uv’ is now responsible for keeping track of.

The following image illustrates our resultant graph assuming no vertices had previously been deleted:

Edge deletion

Edge deletion differs from vertex deletion in that edges are immutable, i.e. Gremlin won’t let you reassign which vertices it connects. As a result, our approach is to copy the edge label and properties into a new edge that connects dummies, or proxy vertices, that serve no other purpose than to assist the restoration process with identifying the true vertices that the logical edge was originally connected to.

In the example below, the Undo Model root ID is 4678 and the ID of the edge to delete is 8341 illustrated by the following sub-graph. These two IDs are the only two IDs that the delete operation needs in order to complete.

Single Edge Deletion

ue = g.addV('undoEdge').as('ue').addE('next').from(V(4678))
.select('ue').next();

oe = g.E().hasId(8341).next();

ne = g.V(ue).as('ue')
.addV('undoProxy').property('proxyID', g.E(oe).outV().id()).as('pt')
.select('ue').addE('tail').from('pt')
.addV('undoProxy').property('proxyID', g.E(oe).inV().id()).as('ph')
.select('ue').addE('head').from('ph')
.select('ph').addE(oe.label()).from('pt').next();

ElementHelper.propertyValueMap(oe).each(ne.&property);

g.V(ue).as('ue').V(4678).as('r').outE('next').union(__.match(__.as('e')
.inV().as('t')).where('t', neq('ue')).select('t')
.addE('next').from('ue').select('e'),g.E(oe)).drop()

The query is split into the following five sub-sections:

  1. Create the undo edge meta node and link it to the root.
  2. Get a reference to the old (to be deleted) edge.
  3. Add proxies linked to the undo edge meta node and add an edge between the two proxies that carries the label of the edge to delete and return said edge.
  4. Copy the property values from the old edge to the new
  5. Complete the insertion of the meta node and drop any old next edges plus the old edge itself.

An empty Undo stack would look like this once the compound operation had completed:

Vertex restoration

When the vertex was deleted, it was an orphan. After it has been restored it will be an orphan anew and subsequent operations beyond the scope of restoration are required to lift its orphan status.

Abiding by the constraint to always pop at the top, we only need the ID of the undoModel vertex (4678 in our examples) to execute the restore.

g.V().hasId(4678).as('r').out('next').as('ou').out('next')
.addE('next').from('r').select('ou').drop()

Edge restoration

The complexity of edge deletion is also reflected in their restoration.

oe = g.V().hasId(4678).out('next').in('head').inE().next();
tid = g.E(oe).inV().values('proxyID').next();
hid = g.E(oe).outV().values('proxyID').next(); 

ne = g.V().hasId(hid).as('h').V().hasId(tid)
.addE(oe.label()).from('h').next(); 

ElementHelper.propertyValueMap(oe).each(ne.&property);

g.V().hasId(4678).as('r').out('next').as('ou').out('next')
.addE('next').from('r').select('ou')
.union(__.as('s'), __.in('tail'), __.in('head')).drop()

The following operations are performed:

  1. Find the old edge, i.e. the clone that resulted from the delete operation
  2. Extract the vertex IDs from the two proxy nodes; t means tail and h means head.
  3. Create a new edge between the nodes in question and set their label from the clone
  4. Copy all edge properties
  5. Remove the undo edge from the stack and delete the constituent objects.

In future posts, we’ll look at more complex dependency control that will afford us better restoration flexibility and also the ability to expire deleted objects from the undo stack.

Published by

Henrik

Data Architect and Integrator Linux Evangelist Developer