Scale & benchmarks

The native staged bulk loader takes the complete 8.2-billion-triple Wikidata truthy graph into a single PostgreSQL instance: dictionary-encoded, full SPO/POS/OSP hexastore, correctness-gated, 0 triples dropped. That run is a ceiling on big hardware; the same loader self-tunes down to ordinary hardware out of the box.

The 8.2-billion-triple load

The staged loader ingests the complete Wikidata truthy N-Triples dump. Every figure below is measured, not modelled, and the load is correctness-gated (quads == triples):

| Measured | Result |
| --- | --- |
| Triples loaded | 8,199,708,346 (0 dropped) |
| Distinct dictionary terms | 1,801,847,593 |
| On-disk size | ~2.0 TB (heap 729 GB + indexes 1448 GB) |
| Indexes | full SPO / POS / OSP hexastore |
| Exact literal dedup | "Berlin" preserved across 268 distinct languages; keyed on full (value, datatype, language) identity, never collapsed |

The exact-dedup row is the correctness keystone at scale: "Berlin"@en, "Berlin"@de, "Berlin"@fr and 265 more are distinct dictionary terms. Deduplication is on the full (value, datatype, language) key, so the dictionary never silently collapses a language-tagged literal into a bare string.
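The full-key rule can be illustrated with a toy in-memory dictionary. This is a minimal sketch, not the loader's implementation: the `Literal` and `TermDictionary` names are hypothetical, and only the idea of keying on (value, datatype, language) comes from the text above. The two datatype IRIs are the standard RDF and XSD ones.

```python
from typing import Dict, NamedTuple, Optional

# Standard datatype IRIs for language-tagged and plain string literals.
LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical literal record: the whole tuple is the dedup key."""
    value: str
    datatype: str
    language: Optional[str]  # None when the literal has no language tag


class TermDictionary:
    """Toy dictionary that assigns one integer id per distinct full key."""

    def __init__(self) -> None:
        self._ids: Dict[Literal, int] = {}

    def encode(self, term: Literal) -> int:
        # A new id is assigned only when the full key has not been seen.
        return self._ids.setdefault(term, len(self._ids) + 1)

    def __len__(self) -> int:
        return len(self._ids)


d = TermDictionary()
en = d.encode(Literal("Berlin", LANG_STRING, "en"))
de = d.encode(Literal("Berlin", LANG_STRING, "de"))
plain = d.encode(Literal("Berlin", XSD_STRING, None))
again = d.encode(Literal("Berlin", LANG_STRING, "en"))  # same key, same id
```

Keying on the value alone would have produced a single id for all three terms; keying on the tuple keeps `en`, `de`, and `plain` distinct while `again` reuses the id of `en`.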

Hosts and rates

| Host | Cores / RAM | Engine | Ingest | Rate |
| --- | --- | --- | --- | --- |
| Azure E128ads_v7 | 128 vCPU / 1 TiB | v0.6.14 | 4 h 53 m | 466 K triples/s |
| Azure E64ads_v7 | 64 vCPU / 503 GiB · 3.4 TB disk | v0.6.14 | ~10.3 h | ~221 K triples/s |

The 128-core run is the flagship. The 64-core run shows that the same full 8.2-billion-triple load completes out of the box on half the cores and a 3.4 TB disk: the loader self-tunes to the host, using the index resolve strategy and temp-spill routing to finish without running out of disk, where an all-hash resolve would have spilled multiple terabytes.
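The quoted rates follow directly from the triple count and the ingest times. A short check, using only figures stated on this page (the helper name `rate_k_per_s` is just for illustration):

```python
TRIPLES = 8_199_708_346  # triples loaded, from the results above


def rate_k_per_s(triples: int, seconds: float) -> float:
    """Average ingest rate in thousands of triples per second."""
    return triples / seconds / 1_000


e128_seconds = 4 * 3600 + 53 * 60  # 4 h 53 m
e64_seconds = 10.3 * 3600          # ~10.3 h

print(f"E128: {rate_k_per_s(TRIPLES, e128_seconds):.0f} K triples/s")
print(f"E64:  {rate_k_per_s(TRIPLES, e64_seconds):.0f} K triples/s")
```

These are whole-run averages; individual phases run at very different speeds.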

Phase breakdown — E128, v0.6.14

The pipeline commits per phase (STAGE → DICT → RESOLVE → INDEX), so each phase is independently timed:

| Phase | Time | What it does |
| --- | --- | --- |
| STAGE | 14 min | Parse the N-Triples dump into an UNLOGGED staging table; the COPY fans across the worker pool. |
| DICT | 1 h 51 m | Parallel hash-aggregate dedup of the 1.8 B distinct terms; spills to disk so dictionary RAM stays bounded. |
| RESOLVE | 2 h 00 m | Join staged triples against the dictionary into encoded quads (index strategy). |
| INDEX | 32 min | Rebuild the SPO/POS/OSP hexastore, one worker per DDL statement, so the index builds run concurrently. |
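To make the four phases concrete, here is a toy, single-process, in-memory sketch of the same sequence. It is an illustration under heavy assumptions: the function names are hypothetical, the parser handles only simplified `S P O .` lines, and nothing here reflects how the loader uses PostgreSQL, parallelism, or disk spill.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Encoded = Tuple[int, int, int]


def stage(lines: List[str]) -> List[Triple]:
    """STAGE: parse simplified 'S P O .' lines into raw string triples."""
    staged = []
    for line in lines:
        # Drop the trailing ' .' then split into subject, predicate, object.
        s, p, o = line.rstrip(" .").split(" ", 2)
        staged.append((s, p, o))
    return staged


def build_dict(staged: List[Triple]) -> Dict[str, int]:
    """DICT: assign one integer id per distinct term."""
    terms = sorted({term for triple in staged for term in triple})
    return {term: i for i, term in enumerate(terms, start=1)}


def resolve(staged: List[Triple], ids: Dict[str, int]) -> List[Encoded]:
    """RESOLVE: replace every term with its dictionary id."""
    return [(ids[s], ids[p], ids[o]) for s, p, o in staged]


def index(encoded: List[Encoded]) -> Dict[str, List[Encoded]]:
    """INDEX: build the three sort orders named on this page."""
    return {
        "spo": sorted(encoded),
        "pos": sorted((p, o, s) for s, p, o in encoded),
        "osp": sorted((o, s, p) for s, p, o in encoded),
    }


lines = [
    '<http://ex/Q64> <http://ex/label> "Berlin"@en .',
    '<http://ex/Q64> <http://ex/label> "Berlin"@de .',
    '<http://ex/Q64> <http://ex/country> <http://ex/Q183> .',
]
staged = stage(lines)
ids = build_dict(staged)
encoded = resolve(staged, ids)
indexes = index(encoded)

# Toy analogue of the correctness gate: encoded rows must equal staged rows.
assert len(encoded) == len(staged)
```

Because each function consumes only the previous phase's output, the phases can be timed, and in the real pipeline committed, one at a time.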

37 % faster than the v0.6.13 baseline

The v0.6.14 flagship run delivers 37 % higher throughput than the v0.6.13 hash baseline of 6 h 41 m / 340.7 K triples/s, which works out to about 27 % less wall-clock time:

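| | v0.6.13 (hash baseline) | v0.6.14 | Delta |
| --- | --- | --- | --- |
| Total ingest | 6 h 41 m / 340.7 K tps | 4 h 53 m / 466 K tps | +37 % throughput (−27 % time) |
| STAGE COPY | 1 h 41 m | 14 min | parallel COPY |
| INDEX rebuild | 1 h 43 m | 32 min | concurrent index build |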

The gain comes from the parallel STAGE COPY (fanning the COPY across the worker pool instead of a single stream) and the concurrent index build (one worker per DDL rather than serial CREATE INDEX).
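The headline percentage is a throughput ratio, and the same two durations give a smaller number when expressed as time saved. A short check using only the totals quoted above:

```python
def to_seconds(hours: int, minutes: int) -> int:
    return hours * 3600 + minutes * 60


baseline = to_seconds(6, 41)  # v0.6.13 hash baseline
current = to_seconds(4, 53)   # v0.6.14 flagship run

# Same triple count in both runs, so throughput scales with 1 / duration.
throughput_gain = baseline / current - 1
time_saved = 1 - current / baseline

print(f"throughput gain: {throughput_gain:.0%}")  # 37%
print(f"time saved:      {time_saved:.0%}")       # 27%
```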

A ceiling, not a requirement

Treat the 8.2 B graph as a ceiling on big hardware — proof of what a single PostgreSQL box can hold, not a hardware requirement. The loader self-tunes down to ordinary hardware: pass n_workers => 0 (the default) and it sizes the worker pool, the resolve strategy, and the temp-spill routing to the host it finds. The same load_turtle / load_turtle_staged_run call that fills a 128-core box also ingests a laptop-sized graph on stock PostgreSQL.
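The "0 means size it to the host" convention can be sketched as follows. This is an assumption-laden illustration: the function name `effective_workers` is hypothetical, and using the CPU count alone is a simplification, since this page does not document the loader's actual sizing rules for workers, resolve strategy, or temp-spill routing.

```python
import os


def effective_workers(n_workers: int = 0) -> int:
    """Return the worker count to use; 0 asks for a host-derived value."""
    if n_workers < 0:
        raise ValueError("n_workers must be >= 0")
    if n_workers == 0:
        # os.cpu_count() may return None, so fall back to a single worker.
        return max(1, os.cpu_count() or 1)
    return n_workers


print(effective_workers())   # host-dependent
print(effective_workers(4))  # explicit value is passed through
```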

Raw ingest — not reasoning

The 8.2 B figure is raw ingest at scale, not reasoning. Truthy statements are already-asserted direct claims, so there is nothing to infer. Reasoning is single-threaded and runs over a right-sized graph. See Processes & flows for the operating model and Pillar 3 · Inference for the full load → reason → query LUBM pipeline.

See also

  • Native staged bulk loader: the STAGE → DICT → RESOLVE → INDEX pipeline behind these numbers.
  • Processes & flows: the operating model, with parallel ingest and single-threaded reasoning over a carved slice.
  • Roadmap: graph carving and per-graph index reorganization toward v0.7.0.

MIT licensed. Documentation for pgRDF — built with VitePress, served via GitHub Pages.