Pillar 1: Semantic storage
RDF storage in PostgreSQL. Turtle goes in; dictionary-encoded, hexastore-indexed, LIST-partitioned quads come out — addressable from any Postgres client.
v0.6 headline — the native staged bulk loader
v0.6 adds a native, multi-backend staged bulk loader — a background-worker pipeline that commits per phase (STAGE → DICT → RESOLVE → INDEX) and is resumable on failure. It loads the full 8.2-billion-triple Wikidata truthy graph into a single PostgreSQL instance, dictionary-encoded with a complete SPO/POS/OSP hexastore, and self-tunes down to ordinary hardware so the same load works out-of-the-box on stock PostgreSQL.
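The commit-per-phase, resume-on-failure idea can be illustrated with a small stand-alone sketch. This is not pgrdf's implementation: only the four phase names come from the text above, while `run_staged`, the JSON checkpoint file, and the simulated crash are hypothetical stand-ins for whatever the background workers actually persist.

```python
import json
import os
import tempfile

# Phase names are taken from the text; everything else is hypothetical.
PHASES = ["STAGE", "DICT", "RESOLVE", "INDEX"]


def run_staged(checkpoint_path, phase_fns):
    """Run each phase in order, persisting progress after every phase."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)  # phases committed by an earlier attempt
    for phase in PHASES:
        if phase in done:
            continue  # already committed, skip on resume
        phase_fns[phase]()  # may raise; nothing is recorded if it does
        done.append(phase)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)  # the "commit" for this phase
    return done


def demo():
    """Crash once in RESOLVE, then resume; return what each attempt ran."""
    calls = []
    state = {"fail_resolve": True}

    def make(name):
        def fn():
            if name == "RESOLVE" and state["fail_resolve"]:
                state["fail_resolve"] = False
                raise RuntimeError("simulated crash in RESOLVE")
            calls.append(name)
        return fn

    fns = {name: make(name) for name in PHASES}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        try:
            run_staged(path, fns)
        except RuntimeError:
            pass  # first attempt dies part-way through
        first = list(calls)
        calls.clear()
        run_staged(path, fns)  # second attempt resumes from the checkpoint
        second = list(calls)
    return first, second


if __name__ == "__main__":
    print(demo())
```

In this sketch the second attempt skips the phases that were already recorded, which is the property that makes a long load cheap to restart after a failure.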
Features in this pillar
- Load Turtle from disk: one UDF reads any `.ttl`/`.nt` file off the server filesystem and ingests it; on preloaded servers it auto-selects the staged path for N-Triples.
- Native staged bulk loader: `load_turtle_staged_run`, a commit-per-phase, resumable, background-worker pipeline that loads the full 8.2-billion-triple graph.
- Inline Turtle / TriG / N-Quads ingest: same parser family, no filesystem dependency; `parse_trig`/`parse_nquads` for quad-bearing serialisations.
- Verbose ingest statistics: JSONB report of timing, cache hits, batch counts.
- Per-graph LIST partitions: cheap whole-graph drops, isolated namespaces.
- Named graphs (IRI ↔ id mapping): symmetric IRI lookup for graph-scoped SPARQL.
- Hexastore + dictionary: three covering indexes (SPO/POS/OSP), interned terms, proven at billion scale.
- Term types: typed literals, language tags, blank nodes, RDF collections.
- Bulk ingest: the loader family (prepared `INSERT`, parallel `bulk_load`, streaming windows, staged pipeline).
- Shared-memory dictionary cache: cross-backend hot path for repeated IRIs.
- Graph lifecycle UDFs: `drop_graph`, `clear_graph`, `copy_graph`, `move_graph` as partition-level primitives.
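To make "interned terms" and "three covering indexes" concrete, here is a minimal in-memory model. It is an illustration only: pgrdf keeps this data in PostgreSQL tables and indexes, and the `TinyStore` class, its method names, and the sample terms are all hypothetical.

```python
from collections import defaultdict


class TinyStore:
    """Toy dictionary-encoded triple store with SPO, POS and OSP orderings."""

    def __init__(self):
        self.term_to_id = {}  # dictionary: term string -> integer id
        self.id_to_term = []  # reverse mapping: integer id -> term string
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def intern(self, term):
        """Return the id for a term, assigning a new one on first sight."""
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def add(self, s, p, o):
        """Store one triple as ids under all three orderings."""
        si, pi, oi = self.intern(s), self.intern(p), self.intern(o)
        self.spo[si][pi].add(oi)
        self.pos[pi][oi].add(si)
        self.osp[oi][si].add(pi)

    def subjects(self, p, o):
        """Answer the pattern (?s, p, o) from the POS ordering."""
        pi = self.term_to_id.get(p)
        oi = self.term_to_id.get(o)
        if pi is None or oi is None:
            return []  # an unknown term cannot match anything
        ids = self.pos.get(pi, {}).get(oi, set())
        return sorted(self.id_to_term[i] for i in ids)


store = TinyStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:carol", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", '"Alice"')

if __name__ == "__main__":
    print(store.subjects("foaf:knows", "ex:bob"))
    print(len(store.id_to_term))
```

Each repeated term is stored once and referenced by its integer id, and a lookup with a bound predicate and object goes straight to the ordering that leads with those positions instead of scanning every triple.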
At a glance
```sql
-- One-time setup
CREATE EXTENSION pgrdf;
SELECT pgrdf.add_graph(100);

-- Load a Turtle file from the server filesystem into graph 100
SELECT pgrdf.load_turtle('/fixtures/foaf.ttl', 100);
-- → 631

-- Inspect: count the graph's quads, then sample the dictionary
SELECT pgrdf.count_quads(100);
SELECT * FROM pgrdf._pgrdf_dictionary
WHERE term_type = 1 LIMIT 5;
```
Training
A recommended path through Pillar 1 — read in this order and each page builds on the previous one:
- Start with Load Turtle from disk: the single UDF call that takes you from "file on disk" to "queryable quads". The simplest end-to-end win.
- Then Per-graph LIST partitions: the partition-per-graph model is the structural choice everything else hangs off. Understand it before going further.
- Then Named graphs (IRI ↔ id) → Hexastore + dictionary: how graphs are addressed and how terms are stored. The two pillars of cheap storage and fast lookup.
- Then Term types: datatypes, language tags, blank nodes, RDF lists. The detail you'll need before any FILTER or aggregate.
- Then Bulk ingest → Native staged bulk loader + Shared-memory dictionary cache: performance characteristics from ontology-scale loads up to the full 8.2-billion-triple graph.
- Finish with Graph lifecycle UDFs: once you understand the storage model, the lifecycle operations are partition-level primitives.
Learn more
- Refresh on RDF foundations with the RDF 1.1 Primer.
- The RDF 1.1 Turtle spec: what `parse_turtle` implements.
- Postgres internals: the partitioning chapter of the PG manual.