Sibyl Memory · #2 on LongMemEval Oracle (95.6%)

01 Benchmark

We score 95.6% on LongMemEval. Without vectors.

LongMemEval Oracle (ICLR 2025, University of Michigan) is the standard benchmark for long-horizon agent memory. 500 questions across six categories. Our score is public, reproducible, and live on the leaderboard.

95.6%

LongMemEval Oracle · 500 questions · Claude Opus 4.6

Sibyl Memory placed second on the public leaderboard, tied with Chronos (PwC). The only file-based system in the top tier.

4 vCPU · 16GB EC2 · zero vectors · zero embeddings · zero external retrieval

Rank	System	Score	Architecture
1	agentmemory V4	96.2%	embedding-based
2	Sibyl Memory	95.6%	file-based · zero vectors
2	Chronos (PwC)	95.6%	embedding-based
4	Mastra Observational Memory	94.9%	embedding-based
5	MemMachine	93.0%	embedding-based
6	Hindsight (Vectorize)	91.4%	vector DB
Mem0 · Zep · Supermemory · Emergence AI · Oracle baseline — all below the top tier

Category	Score
single-session-user	100%
single-session-assistant	100%
temporal-reasoning	96.2%
single-session-pref	93.3%
multi-session	93.2%
knowledge-update	92.3%
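As a sanity check on how the per-category scores roll up into the headline number, here is a minimal sketch of the question-weighted average. The per-category question counts (70, 56, 133, 30, 133, 78) are an assumption: they are not stated in this report and should be checked against the public dataset. The accuracies are the per-category scores listed above.

```python
# Question counts per category are an ASSUMPTION (not stated in this report).
# Accuracies are the per-category scores listed above, in percent.
CATEGORIES = {
    "single-session-user": (70, 100.0),
    "single-session-assistant": (56, 100.0),
    "temporal-reasoning": (133, 96.2),
    "single-session-pref": (30, 93.3),
    "multi-session": (133, 93.2),
    "knowledge-update": (78, 92.3),
}


def overall_accuracy(categories):
    """Return (percent, correct, total) for a question-weighted average."""
    total = sum(count for count, _ in categories.values())
    # Recover whole-question counts from the rounded percentages.
    correct = sum(round(count * pct / 100) for count, pct in categories.values())
    return 100 * correct / total, correct, total


score, correct, total = overall_accuracy(CATEGORIES)
print(f"{correct}/{total} = {score:.1f}%")
```

Under those assumed counts, the category scores are consistent with the 95.6% headline figure over 500 questions.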

We did not optimize for the benchmark. We optimized for production efficiency. The benchmark improvement was a side effect.

Sibyl Labs · LongMemEval Report · April 2026

01.5 Methodology

How we ran the benchmark.

Benchmark: LongMemEval Oracle (ICLR 2025, University of Michigan)
Dataset: 500 questions across 6 categories, public leaderboard
Judge model: Claude Opus 4.6
Hardware: 4 vCPU / 16GB EC2
Architecture: File-based, zero vectors, zero embeddings, zero external retrieval
Run date: April 2026
Result: 95.6%, ranked #2 (tied with Chronos)

The full report (per-category breakdown, runtime cost, ablation on the file-based architecture) is at blog.sibylcap.com/longmemeval-v2.


02 Product

Sibyl Memory.

Postgres-backed, schema-first, multi-tenant. One SDK, two transports: Sibyl Cloud or your own database. Persistent state belongs in a schema, not a chatbot's context window.

Persistent memory for AI agents.

The substrate that scored #2 on LongMemEval, packaged for production.

Free · 100 MAU · cloud
Starter · $99 / mo · 1K MAU
Pro · $499 / mo · 10K MAU
Scale · $2,500+ / mo · 100K+ MAU
Self-host · $25,000 / yr · BYOC

From Free to enterprise self-host


Sibyl Memory is one product in the lab's surface. For the full catalog including SIBYL Framework, Hermes plugin, Ping protocol, and x402 endpoints, see /products.

03 Applications

One memory system. Five operating shapes.

Same architecture across every deployment. File-based at the substrate, Postgres-backed in production, multi-tenant by namespace. Different access patterns optimized for different team problems. From a single operator to a million users.

01

Operator Memory

For solo operators running AI

Full-stack memory for one operator. Priorities, journal, entities, scars, relationships, arc. The same shape Sibyl uses to operate herself, packaged for delivery to anyone running long-horizon work through an AI.

Buyers: indie founders, solo researchers, AI-first builders, autonomous agent operators · single-tenant per operator, multi-month operational continuity

Live

02

User Profile Memory

For consumer platforms with active users

Per-user persistent memory at platform scale. Tracks history, extracts patterns, surfaces back via your UI. Each user gets an isolated namespace that grows with their behavior.

Buyers: prediction markets, social platforms, agent platforms · 1K to 1M+ active users per platform

Pilot

Three more shapes — Conversational Continuity, Agent Reputation, and Org Memory — are in the development pipeline. See Products → In development for the full surface.

04 Architecture

Memory is the state. Agents are stateless.

The multi-tenant pattern: per-tenant schema namespace, ephemeral agent runtime, the same hierarchical memory shape across every use case. Each request loads exactly the requesting tenant's slice, processes the turn, writes back, exits. Memory persists in Postgres. Agents do not persist at all.
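The load, process, write back, exit cycle can be sketched in a few lines. This is an illustration under stated assumptions, not the Sibyl Memory SDK: `MemoryStore` and `handle_turn` are hypothetical names, and an in-memory dict stands in for the Postgres-backed store.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical stand-in for the persistent store: one isolated slice per tenant."""
    _slices: dict = field(default_factory=dict)

    def load(self, tenant: str) -> dict:
        # Return a copy so the runtime never holds a live reference to stored state.
        return dict(self._slices.get(tenant, {}))

    def save(self, tenant: str, state: dict) -> None:
        self._slices[tenant] = dict(state)


def handle_turn(store: MemoryStore, tenant: str, message: str) -> str:
    """Stateless turn: load only this tenant's slice, process, write back, exit."""
    state = store.load(tenant)
    state["turns"] = state.get("turns", 0) + 1
    state["last_message"] = message
    store.save(tenant, state)
    return f"{tenant}: turn {state['turns']}"


store = MemoryStore()
handle_turn(store, "alice", "hello")
handle_turn(store, "bob", "hi")
reply = handle_turn(store, "alice", "again")
print(reply)
```

Nothing survives between calls except what was written to the store, so any worker can serve any tenant's next turn.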

Schema · Five Tiers

Tier	Tables	Notes
HOT	state_documents	treasury · priorities · session
WARM	entities · entity_relations	UNIQUE(tenant, category, name)
COLD	journal · revenue · errors · metrics	append-only · indexed by ts
REFERENCE	reference_documents	runbooks · rules · constants
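To make the tier constraints concrete, here is a rough sketch of three of the tables with the constraints named above. It uses Python's built-in sqlite3 as a stand-in for Postgres. The table names and the UNIQUE(tenant, category, name) constraint come from the schema above; every column name, the index name, and the trigger used to enforce append-only writes are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; production is Postgres
conn.executescript("""
CREATE TABLE state_documents (          -- HOT tier
    tenant TEXT NOT NULL,
    name   TEXT NOT NULL,
    body   TEXT NOT NULL,
    PRIMARY KEY (tenant, name)
);
CREATE TABLE entities (                 -- WARM tier
    id       INTEGER PRIMARY KEY,
    tenant   TEXT NOT NULL,
    category TEXT NOT NULL,
    name     TEXT NOT NULL,
    UNIQUE (tenant, category, name)
);
CREATE TABLE journal (                  -- COLD tier
    id     INTEGER PRIMARY KEY,
    tenant TEXT NOT NULL,
    ts     TEXT NOT NULL,
    entry  TEXT NOT NULL
);
CREATE INDEX journal_by_ts ON journal (tenant, ts);
CREATE TRIGGER journal_append_only BEFORE UPDATE ON journal
BEGIN
    SELECT RAISE(ABORT, 'journal is append-only');
END;
""")

# The same entity name is allowed across tenants, but not twice within one.
conn.execute("INSERT INTO entities (tenant, category, name) VALUES ('t1', 'person', 'Ada')")
conn.execute("INSERT INTO entities (tenant, category, name) VALUES ('t2', 'person', 'Ada')")
try:
    conn.execute("INSERT INTO entities (tenant, category, name) VALUES ('t1', 'person', 'Ada')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Journal rows can be inserted but not rewritten.
conn.execute("INSERT INTO journal (tenant, ts, entry) VALUES ('t1', '2026-04-01T00:00:00Z', 'started')")
try:
    conn.execute("UPDATE journal SET entry = 'edited' WHERE tenant = 't1'")
    update_blocked = False
except sqlite3.DatabaseError:
    update_blocked = True

print(duplicate_rejected, update_blocked)
```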

Six questions, answered.

What buyers and integrators actually ask before committing.

Does Sibyl Memory require a vector database?

No. Sibyl Memory is file-based at the substrate and Postgres-backed in production. There is no vector database, no embedding pipeline, and no external retrieval service. Schema is imposed at write time, not inferred at read time. That is the architectural choice that produced the 95.6% result on a 4 vCPU / 16GB EC2 instance.
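One way to picture "schema imposed at write time" is a store that rejects malformed records when they are written and answers reads by exact key, with no similarity search. The sketch below is illustrative only: `EntityRecord`, `EntityTable`, and the allowed category list are hypothetical, not part of the Sibyl Memory API.

```python
from dataclasses import dataclass

# Hypothetical category whitelist, for illustration only.
ALLOWED_CATEGORIES = {"person", "project", "preference"}


@dataclass(frozen=True)
class EntityRecord:
    tenant: str
    category: str
    name: str

    def __post_init__(self):
        # Validation runs at construction, i.e. at write time.
        if not self.tenant or not self.name:
            raise ValueError("tenant and name are required")
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


class EntityTable:
    """In-memory stand-in for a table keyed by (tenant, category, name)."""

    def __init__(self):
        self._rows = {}

    def write(self, tenant, category, name, **attrs):
        record = EntityRecord(tenant, category, name)  # raises on bad input
        self._rows[(record.tenant, record.category, record.name)] = attrs

    def read(self, tenant, category, name):
        # Exact key lookup: no embeddings, no ranking, no threshold.
        return self._rows.get((tenant, category, name))


table = EntityTable()
table.write("t1", "preference", "tea", strength="strong")
try:
    table.write("t1", "mood", "happy")
    rejected = False
except ValueError:
    rejected = True

print(table.read("t1", "preference", "tea"), rejected)
```

The trade-off is the one described in this answer: reads are precise and cheap, but the writer has to decide where a fact belongs before storing it.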

What is the latency overhead per memory call?

A single Postgres query against an indexed namespace. In practice, p50 sits in the low single-digit milliseconds for hot-tier reads, with no embedding round-trip and no vector-DB hop. Cold-tier reads incur the cost of decompression. Production-grade benchmarks are in the full report at blog.sibylcap.com/longmemeval-v2.

Can it run on RDS, Neon, Aurora, or other managed Postgres?

Yes. Sibyl Memory uses standard Postgres 14+ features (JSONB, partial indexes, triggers, materialized views). RDS, Aurora, Neon, Supabase, and self-hosted Postgres all work. The self-host tier is BYOC by design.

Does it support GDPR cascade delete?

Yes. Tenant-scoped cascade delete is a first-class operation. Removing a user namespace removes their messages, memory entries, audit log rows, and any derived state in a single transaction. A tamper-proof audit log on every write keeps the deletion itself recorded for compliance. EU AI Act export is supported.
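A tenant-scoped cascade delete in one transaction can be sketched with foreign keys declared ON DELETE CASCADE. The example uses Python's built-in sqlite3 as a stand-in for Postgres. All table and column names, including the separate `deletion_log` table that records the deletion, are assumptions for illustration and not the actual Sibyl Memory schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE tenants (id TEXT PRIMARY KEY);
CREATE TABLE messages (
    id     INTEGER PRIMARY KEY,
    tenant TEXT NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    body   TEXT NOT NULL
);
CREATE TABLE memory_entries (
    id     INTEGER PRIMARY KEY,
    tenant TEXT NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    body   TEXT NOT NULL
);
CREATE TABLE deletion_log (             -- hypothetical compliance record
    tenant     TEXT NOT NULL,
    deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")


def delete_tenant(conn, tenant):
    """Remove a tenant and all dependent rows, and log the deletion, atomically."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM tenants WHERE id = ?", (tenant,))
        conn.execute("INSERT INTO deletion_log (tenant) VALUES (?)", (tenant,))


with conn:
    conn.execute("INSERT INTO tenants VALUES ('u1')")
    conn.execute("INSERT INTO tenants VALUES ('u2')")
    conn.execute("INSERT INTO messages (tenant, body) VALUES ('u1', 'hi')")
    conn.execute("INSERT INTO memory_entries (tenant, body) VALUES ('u1', 'likes tea')")
    conn.execute("INSERT INTO messages (tenant, body) VALUES ('u2', 'hello')")

delete_tenant(conn, "u1")
remaining = conn.execute("SELECT tenant FROM messages").fetchall()
print(remaining)
```

Only the deleting statement names the tenant; the dependent tables are cleaned up by the database, so a new table that references `tenants` with the same clause is covered without changing `delete_tenant`.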

How is Sibyl Memory different from vector-DB approaches like Mem0 or Zep?

Vector approaches infer structure at read time via embedding similarity. That is good for fuzzy semantic recall and bad for everything that needs a precise answer. Sibyl Memory imposes structure at write time via schema, which is good for everything that needs a precise answer and slightly less elegant for unstructured ambient recall. The 95.6% LongMemEval score is the proof point that schema-first does not lose to embedding-first on the questions that matter to autonomous agents.

Is the benchmark code public?

The LongMemEval Oracle benchmark itself is public (ICLR 2025, University of Michigan). The Sibyl Memory implementation is licensed; the architecture, methodology, and per-category results are documented in the full report at blog.sibylcap.com/longmemeval-v2. For deeper architecture review under NDA, contact [email protected].

05 The Lab

Sibyl Labs is a research and infrastructure lab.

Sibyl Labs, LLC was formed in April 2026 to wrap the agentic infrastructure work in a real legal entity. The lab builds memory systems, agentic frameworks, and the supporting tooling that makes long-running autonomous agents possible. Every product we sell is the same architecture our own agent operates on.

The thesis is not complicated. Most agents forget. The ones that remember are built on architecture that scales. We publish the work in public, benchmark it in the open, and ship the substrate so others can build the next generation of agents on something that has already survived production.

Memory is one shape of the work. Frameworks are another. Custom builds for partners are a third. The output is the same: infrastructure for agents that operate, not demo.

Lab notes & benchmarks: blog ↗
LongMemEval report: April 2026 ↗
Sibyl, the agent: sibylcap.com ↗
@sibylcap on X ↗

06 Contact

Enterprise. Research. Custom builds.

For self-host, BYOC, or air-gapped deployments. For research collaboration on memory or agent benchmarks. For bespoke memory or framework integrations sized to a partner's specific architecture.

Email [email protected] · DM @sibylcap · Read blog.sibylcap.com

Sibyl Memory. #2 on LongMemEval.