Codag
45.6× compression · 0% hallucination · <1s warm latency · $0.10 per million lines

Three axes, defined precisely. No vibes-based comparisons.

Axis · 01

Compression

ratio = raw_bytes / output_bytes

Every line you don't have to send to an LLM is money saved and context preserved. Higher is better. gzip ≈ 6×, severity filtering ≈ 3×, codag > 40× (lossy).
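
A minimal sketch of the measurement, assuming the raw and compacted outputs are just files on disk (hypothetical paths, not the harness's API):

raw_bytes = len(open("incident.log", "rb").read())       # raw incident log
output_bytes = len(open("incident.codag", "rb").read())  # compacted output
print(f"compression: {raw_bytes / output_bytes:.1f}x")   # higher is better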

Axis · 02

Recall accuracy

recall = |kept ∩ truth| / |truth|

On labeled incidents: did the trigger and root-cause lines actually survive? Compression without recall is just deletion. Measured per role: trigger, root_cause, evidence.
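
A minimal sketch of the per-role scoring, assuming labels and kept output are plain sets of line IDs (illustrative names and data, not the harness's API):

def recall(kept: set[int], truth: set[int]) -> float:
    # recall = |kept ∩ truth| / |truth|
    return len(kept & truth) / len(truth) if truth else 1.0

kept = {12, 40, 87, 203}  # line IDs that survived compression
labels = {"trigger": {12}, "root_cause": {40, 41}, "evidence": {12, 40, 41, 87, 95}}
for role, truth in labels.items():
    print(role, round(recall(kept, truth), 3))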

Axis · 03

Speed

latency = p50, p95 ms · throughput

Wall-clock per incident, end to end. Includes network round-trip for hosted baselines, raw compute for local ones. Reported as p50 and p95.
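
The percentiles are taken over per-incident wall-clock samples; a minimal sketch with illustrative numbers:

import statistics

latencies_ms = [420, 510, 630, 880, 950]  # wall-clock per incident, illustrative
p50 = statistics.median(latencies_ms)
p95 = statistics.quantiles(latencies_ms, n=100)[94]  # 95th-percentile cut point
print(f"p50={p50:.0f}ms  p95={p95:.0f}ms")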

Codag vs. LibreLog (Llama-3-8B)

5 wins · 1 tie · 2 losses · 5× smaller model, trained in 5h on a Mac

Dataset Metric LibreLog Codag Δ
Hadoop FTA 0.702 0.753 +5.1pp
Hadoop FGA 0.901 0.938 +3.7pp
HDFS PA 0.918 0.988 +7.0pp
HDFS FTA 0.777 0.800 +2.3pp
HDFS GA 1.000 1.000 tie
Spark FGA 0.936 0.978 +4.2pp
PA = parsing accuracy · GA = grouping accuracy · FTA = F1 template accuracy · FGA = F1 group accuracy

Evidence recall vs. prior art

On labeled incident windows. Higher = more diagnostic lines preserved.

Approach Evidence recall Trigger recall Compression
Severity filter 0.30 0.12 3.4×
Drain3 templating 0.29 0.18 9.9×
TF-IDF anomaly 0.15 0.08 14.2×
Codag 0.619 0.667 7.4× · 45.6× compact

Reproducible from the open repo: Drain3, gzip / lz4, severity filter, Claude Opus 4.6, and codag (with an API key). The LibreLog comparison requires their published model weights and is not bundled — see codag/codag-log-bench for the harness.

Run the benchmarks yourself.

One repo, four open-source baselines + the codag API. LogHub-2.0 fetched on demand, ~30 hand-labeled incidents bundled. Results land in results/latest.json.

$ git clone https://github.com/codag/codag-log-bench
$ cd codag-log-bench && bash scripts/download_loghub.sh
$ CODAG_API_KEY=cdk_… python -m codag_log_bench.run --baselines all
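
Once a run finishes, the summary can be inspected directly; a minimal sketch that assumes nothing about the schema beyond results/latest.json being ordinary JSON:

import json

with open("results/latest.json") as f:
    results = json.load(f)
print(json.dumps(results, indent=2))  # full report; slice or filter as needed
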
View on GitHub · MIT-licensed · CI-tested · drop-in

Want to run the benchmarks against your own log corpus?

Get in touch