What Happens When You Give an LLM Geometric Truth to Reason About
You have a database with 22,000 relationships. Some of them are wrong. How do you find the bad ones?
The standard toolkit gives you schema validation, foreign key constraints, maybe some ML-based anomaly detection if you're feeling ambitious. These work. They catch the obvious stuff — null values, orphaned records, type mismatches.
But what about the facts that are structurally valid yet semantically wrong? The edge that satisfies every constraint but doesn't actually belong in your graph? That's harder. That's the problem we set out to solve.
The idea
We built a system that encodes every relationship in a database as a geometric structure in 4D space. Not a vector embedding in the usual sense — not "turn everything into a 768-dimensional blob and measure cosine similarity." This is relational geometry. Each relationship type gets its own transformation, and every fact in your database becomes a composition of those transformations.
The key insight: if your data is internally consistent, these transformations should compose cleanly. If you follow a path from A to B to C and back to A, you should end up where you started. When you don't — when there's a measurable gap — that gap tells you something. It tells you which facts have strong geometric support from the rest of the graph, and which ones are structurally isolated.
We call that gap a trust score. Every edge in your database gets one.
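The loop-closure idea can be sketched in a few lines. Everything specific below is an illustrative assumption rather than the actual ramish encoding: the orthogonal 4x4 transforms, the Frobenius distance, and the exp(-gap) trust mapping are stand-ins chosen to make the geometry concrete.

```python
import numpy as np

# Illustrative sketch of loop closure in 4D. The orthogonal transforms,
# the Frobenius gap, and the exp(-gap) trust mapping are assumptions for
# this example, not the actual ramish encoding.

rng = np.random.default_rng(0)

def random_transform(rng):
    # QR of a random matrix yields an orthogonal (distance-preserving) 4x4 map.
    q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
    return q

track_on_album = random_transform(rng)
album_by_artist = random_transform(rng)
# In a consistent graph the cycle closes: define the return leg so that
# artist_to_track @ album_by_artist @ track_on_album == identity.
artist_to_track = np.linalg.inv(album_by_artist @ track_on_album)

def loop_gap(*transforms):
    """Compose transforms around a cycle and measure distance from identity."""
    m = np.eye(4)
    for t in transforms:
        m = t @ m  # apply each transform in path order
    return np.linalg.norm(m - np.eye(4))

gap = loop_gap(track_on_album, album_by_artist, artist_to_track)
trust = float(np.exp(-gap))  # small gap -> trust near 1.0
```

A fact with strong support closes its loops almost exactly, so its trust approaches 1; an edge that forces the composition away from the identity gets penalized.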
Testing it
We ran this against the Chinook database — a well-known sample dataset modeling a digital music store. Artists, albums, tracks, playlists, customers, invoices. About 4,600 entities and 22,000 relationships.
The encoder compressed the entire relational structure into a single .ramish file under 1MB. Every relationship encoded, every edge scored. The structural audit flagged 75 disconnected entities and identified a single mega-hub node — both real characteristics of the Chinook schema, not false positives.
We also tested against a database with no declared foreign keys — the kind of messy schema you actually find in production. The encoder identified entities and inferred relational structure from the data itself, no schema hints required. If the geometric relationships were there, it found them.
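One way to picture that inference is a value-inclusion heuristic: a column is a likely foreign key if most of its distinct values appear in another table's key column. This is a minimal sketch under our own assumptions (the table names, sample data, and 0.7 threshold are made up for illustration, not the encoder's actual logic):

```python
# Hypothetical sketch: foreign-key inference by value inclusion.
# Table names, data, and the 0.7 threshold are illustrative assumptions.

def inclusion(child_values, parent_values):
    """Fraction of distinct child values that appear in the parent column."""
    child, parent = set(child_values), set(parent_values)
    return len(child & parent) / len(child) if child else 0.0

tracks = {"album_id": [1, 1, 2, 3, 3, 99]}  # 99 references no album
albums = {"id": [1, 2, 3, 4]}

score = inclusion(tracks["album_id"], albums["id"])  # 3 of 4 distinct -> 0.75
if score >= 0.7:
    print("candidate relationship: tracks.album_id -> albums.id")
```

Note that the dangling value (99) lowers the score without killing the candidate, which is exactly the kind of edge a trust score should then flag.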
The accident
We'd published a free open-source reader for .ramish files — a command-line tool that lets you query the encoded structure, check trust scores, validate individual edges, inspect the geometry. It's on PyPI. Anyone can install it.
We gave the reader to a large language model along with the Chinook .ramish file and asked it a question: "Who plays Smoke On The Water?"
Now, the Chinook database doesn't have a direct track-to-artist relationship. Tracks belong to albums. Albums belong to artists. To answer the question, you need to follow a path: find the track, find its album ("Machine Head"), find the album's artist (Deep Purple).
The LLM didn't guess. It didn't search the internet. It opened the .ramish file, looked up the edges for "Smoke On The Water," followed the relational path through the encoded geometry — track to album to artist — and returned the answer in plain English.
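The hop-by-hop lookup the model performed can be sketched with a plain adjacency map standing in for the reader. We deliberately don't reproduce the ramish-explorer API here; the edge data and relationship names below are illustrative.

```python
# Minimal sketch of multi-hop navigation, with a dict standing in for
# the .ramish reader. Edge data and relationship names are illustrative.

edges = {
    ("track", "Smoke On The Water"): [("on_album", ("album", "Machine Head"))],
    ("album", "Machine Head"): [("by_artist", ("artist", "Deep Purple"))],
}

def follow(start, path):
    """Follow a sequence of relationship names from a starting entity."""
    node = start
    for rel in path:
        matches = [dst for (r, dst) in edges.get(node, []) if r == rel]
        if not matches:
            return None  # path breaks: no grounded answer exists
        node = matches[0]
    return node

answer = follow(("track", "Smoke On The Water"), ["on_album", "by_artist"])
# -> ("artist", "Deep Purple")
```

The point is that every hop either resolves to an encoded edge or fails loudly; there is no step where the answer can be invented.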
It navigated the structure. And because every step traced through the encoded geometry, the answer was provably grounded in the data. No hallucination possible. Every hop verifiable.
Why this matters
This wasn't the plan. We built an encoding engine for database integrity analysis — find the bad edges, score the trust, ship an audit report. That's useful and that's the product.
But what we accidentally created was something different: a format that lets any LLM reason about relational structure without hallucinating. The .ramish file is the ground truth. The LLM is the interpreter. The human gets conversational access to structural relationships without needing to understand the encoding, the schema, or the query language.
Think about what that means for a non-technical user. Your database engineer runs an audit, gets a .ramish file. They use the CLI tool to dig into the geometry, check trust scores, find the weak edges. That's valuable.
But then the CFO asks a question about the data. Instead of writing a SQL query, someone gives the .ramish file to an LLM. The CFO asks in plain English. The LLM navigates the geometry and answers. Every answer grounded in verifiable structure. No hallucination. No "the AI made something up." The math is right there.
What we haven't solved
Let's be honest about the edges.
Scale is the obvious one. Encoding 22,000 relationships is fast. We've tested up to 500,000 entities and it works, but we haven't pushed into the millions yet. That's coming.
The encoding is a snapshot. If your data changes, you re-encode. We have frozen key arithmetic that lets you score new edges against an existing encoding without retraining from scratch, but true streaming updates to the geometry aren't there yet. Right now it's periodic audits, not continuous monitoring — though each new audit becomes more valuable when compared against the geometric baseline from the previous one.
And the LLM navigation — while grounded in the geometry — is still limited by the LLM's ability to interpret the reader tool's output. More complex queries with longer relational paths occasionally need nudging. We're improving the reader's output format to make multi-hop navigation more natural.
Try it yourself
The reader is free and open-source.
pip install ramish-explorer
The encoding engine is at ramish.io. Free tier gives you three audits a month with a risk snapshot and your top suspicious edges. Pro users get the full .ramish file — the one you can hand to an LLM.
We're two people in Colorado who think databases should be able to tell you which of their own facts to trust.