Skip to content

Scale of reasoning

The two halves of pgRDF scale differently, and the docs say so plainly:

Ingest is parallel and scales to billions of triples. Reasoning is single-threaded — so you reason over a graph sized to your hardware.

Why reasoning is single-threaded

pgrdf.materialize runs the reasonable reasoner in-process, on a single core. Forward-chaining the OWL 2 RL / RDFS closure to a fixpoint is CPU-bound, and the upstream reasoner exposes a single-threaded fixpoint — there is no multi-core path to hand it. This is an upstream limit, not a pgRDF defect: it is tracked openly in #1, with a parallelism proposal filed upstream as gtfierro/reasonable#57.

The consequence for the operating model is direct. The native staged bulk loader ingests the full 8.2-billion-triple Wikidata truthy graph into a single PostgreSQL instance — parallel, all-cores, resumable. But materialize is not for that graph. Reasoning runs on a graph your hardware can close in your batch window, and going forward over a carved slice of a larger graph rather than the whole of it (roadmap).

The proven reasoning regime

Two reference points bound what "right-sized" means in practice. Both are correctness-gated against the known LUBM answer counts.

LUBM-100 on a laptop — zero tuning

The full LUBM-100 benchmark (100 universities, 14 reference queries) runs on ordinary hardware with no database tuning at all:

MeasuredResult
Load 13,879,970 triples (Turtle)3 min 29 s
OWL 2 RL reasoning → 22.5M facts, statistics refreshed automatically4 min 54 s
All 14 queries on the loaded grapheach ≤ 3 s
All 14 queries after reasoningeach ≤ 5 s

Environment: a laptop — Apple-silicon VM (8 vCPU / 32 GiB), stock postgres:17.4-bookworm in Docker, default PostgreSQL configuration. No manual indexes, no ANALYZE, no planner hints, no extension settings.

LUBM-500 on a single box — a 112-million-quad closure

The same load → index → OWL-RL materialize → SPARQL cycle runs across the full LUBM ladder on one 32-vCPU / 256 GiB box (native PostgreSQL 17, no sharding):

LUBM-Nbase triplesmaterialize (OWL-RL)total quads (closure)
101.32M15s2.13M
10013.9M4m 37s22.46M
25034.5M10m 9s55.88M
50069.1M~43m111.83M

LUBM-500 builds a full materialized closure of 111.8 million quads on a single box (peak 146 / 256 GiB RAM) — load, reason, and query in one PostgreSQL instance. The materialize column is the dominant cost at scale precisely because it is the single-threaded step; ingest and index scale with cores, reasoning does not.

How to right-size your reasoning

  • Reason over a slice, not the firehose. Materialize the graph (or the carved slice) your batch window can close, then SPARQL-query the inferred + base rows flat.
  • Bound the rule set. The 'rdfs' profile computes only the schema closures and is cheaper than full OWL 2 RL when your workload doesn't need the equivalence / inverse / transitive / sameAs rules.
  • Run off-hours, idempotently. materialize is idempotent and refreshes planner statistics on write-back, so it slots into a nightly cron or an ingest-tail trigger with no special-casing.

See also

MIT licensed. Documentation for pgRDF — built with VitePress, served via GitHub Pages.