Core concepts#

In the following, we present a few core concepts that should help in understanding the BSFS operations and codebase.

Graph storage#

RDF describes a network or graph like the file graph as a set of (subject, predicate, object) triples. Subject is the identifier of the source node, object is the identifier of the target node (or a literal value), and predicate is the type of relation between the source node and the target. As suggested by RDF, we use URIs to identify nodes and predicates. For example, a triple that assigns me as the author of a file could look like this:

<http://example.com/file#1234> <https://bsfs.io/schema/Entity#author> <http://example.com/me>

Note that alternatively, the object could also be a literal value (“me”):

<http://example.com/file#1234> <https://bsfs.io/schema/Entity#author> "me"

There are a number of graph databases that support this or an analoguous paradigm, such as RDFLib, Blazegraph, TypeDB, Titan, and many more. BSFS uses such a third-party graph database to store its file graph.

As usual in database systems, we have to distinguish schema data (that coverns the structure of the storage) from instance data (the actual database content). Similar to relational database systems, both kinds of data can be represented as triples, and subsequently stored within the same graph storage (although one might need to separate them logically). In BSFS, we employ an explicit schema (see next section) that is managed alongside the data.

Schema#

BSFS ensures consistency across multiple distributed client applications by maintaining an explicit schema that governs node types and predicates. Furthermore, exposing the schema allows client to run a number of compatibility and validity checks locally, and a graph database may use the schema to optimize its storage or operations.

In BSFS, the schema is initially provided by the system administrator (usually in the Turtle format) and subsequently stored by the backend. The default schema defines three root types (bsfs:Node, bsfs:Predicate, and bsfs:Literal), and BSFS expects any node, literal, or predicate to be derived from these roots.

For example, a new predicate can be defined like so:

# define some abbreviations
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bsfs: <http://schema.bsfs.io/>
prefix bse: <http://schema.bsfs.io/Entity#>

# define a node type
bsfs:Entity rdfs:subClassOf bsfs:Node .

# define a literal type
xsd:string rdfs:subClassOf bsfs:Literal .

# define a predicate ("author of a node")
bse:author rdfs:subClassOf bsfs:Predicate ;
    rdfs:domain bsfs:Entity ;
    rdfs:range xsd:string .

BSFS checks all requests and rejects queries or operations that violate the schema.

Querying#

BSFS at its core is not much more than a translator from a user query into a graph database query. It operates directly on three abstract syntax trees (AST), to run fetch, search, or sort, queries respectively. By not using an existing query language, we avoid an unnecessary and possibly expensive parsing step. Some routines create an AST internally (e.g., bsfs.graph.graph.Graph.all()), others accept an user-defined AST (e.g., bsfs.graph.graph.Graph.get()). One way or another, the AST is validated against the schema, and access control conditions are added.