Shared-memory dictionary cache
The dictionary
id ↔ termmapping is mirrored in PostgreSQL shared memory across backends. Repeated ingest of the same IRIs becomes a memory lookup, not a SQL roundtrip.
What it does
Every ingest call must resolve each parsed term to its _pgrdf_dictionary id (or insert it). A naïve implementation issues a SELECT-then-INSERT per term. The shared-memory cache:
- Mirrors committed dictionary rows in a fixed-size LRU table allocated in Postgres shared memory at extension init.
- Serves hits without touching the underlying SQL table.
- Stages new terms in a per-transaction buffer; on COMMIT they're promoted to the shared cache for other backends.
The same IRIs (rdf:type, rdfs:subClassOf, FOAF predicates) appear thousands of times in any real ontology — cache hit rates of 95 %+ are typical after the first load.
Why you'd use it
- Project managers — throughput scales with cache hit rate. The verbose-ingest report shows the rate empirically; capacity planning is straightforward.
- Data scientists — loading multiple ontologies that share vocabulary terms (FOAF, RDFS, OWL) is dramatically faster after the first one warms the cache.
- Operators —
pgrdf.shmem_reset()provides explicit invalidation around schema/extension upgrades.
Example
-- Inspect cache size and hit/miss counters.
SELECT pgrdf.stats() -> 'shmem_dict_cache';
-- → {
-- "size_entries": 12480,
-- "capacity": 65536,
-- "lookups_total": 189342,
-- "hits": 181004,
-- "misses": 8338,
-- "hit_rate": 0.9560
-- }
-- Force invalidation (rarely needed; used by migration scripts).
SELECT pgrdf.shmem_reset();How it works
The cache lives in a LWLockable hash table sized at extension startup. Insert path:
- Parser produces a term.
- Cache lookup. Hit → use the cached id.
- Miss → consult
_pgrdf_dictionary; if absent, allocate a new row in the transaction-local stage buffer. - On
COMMIT, the staged buffer is inserted into_pgrdf_dictionaryand promoted into the shared cache.
This makes the dictionary update visible to other backends only once the inserting transaction commits — preserving Postgres' read-committed semantics for graph workloads.
See SPEC.pgRDF.LLD.v0.4 §4.1 for the full design.
Tests
tests/regression/sql/50-shmem-dict-cache.sql— hit-rate growth across repeated loads.tests/regression/sql/63-shmem-reset-invalidation.sql—shmem_resetclears counters and entries.