Benchmark evidence

Verified Performance Evidence

Raw benchmark output across execution environments: a single-worker baseline, concurrent saturation tests, and live cloud deployments. All results are end-to-end, from source file through destination write, with full row-count and checksum validation.

2,516,818 Rows/sec

One worker. One file. One machine. The cleanest measurement of the engine itself, uncomplicated by concurrency dynamics.

  • Rows ingested: 75.8M
  • Elapsed: 30.1s
  • Rows / sec: 2,516,818
  • Write throughput: 86.0 MB/s
bench — single worker run
DataForge Bench v0.4.1
Config: bench.yaml · Workers: 1 · Source: courtlistener_opinions_2024.csv

Stage 1  Source validation       PASS  75,814,101 rows detected
Stage 2  Schema inference        PASS  42 columns mapped
Stage 3  Ingestion               PASS  elapsed 30.123s

  Rows inserted   75,814,101
  Rows/sec        2,516,818
  Throughput      86.0 MB/sec
  Malformed       0
  Dropped         0
  Delta           0

Stage 4  Validation              PASS  row count, schema, checksum
Run complete.
  • Hardware: AMD Ryzen 9 7950X · 32 logical cores
  • RAM: 63.2 GB
  • Target DB: PostgreSQL 18.0 · local socket
  • Execution path: API (warmed system conditions)
  • Dataset: CourtListener opinions corpus · 75,814,101 rows
  • Validation: Row count · schema · checksum · PASS
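
The headline rate follows directly from the two figures in the bench output: rows inserted divided by elapsed time. The sketch below reproduces that arithmetic and checks that a reported rate is consistent with an elapsed time printed to three decimals. The function names and the rounding tolerance are illustrative assumptions, not part of the DataForge harness.

```python
def rows_per_second(rows: int, elapsed_s: float) -> float:
    """Average ingestion rate over the whole run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return rows / elapsed_s


def matches_report(rows: int, elapsed_s: float, reported_rate: int,
                   elapsed_precision_s: float = 0.0005) -> bool:
    """True if reported_rate is reachable once rounding of elapsed_s is allowed for.

    Assumption: the elapsed time was rounded to the nearest millisecond,
    so the true value lies within +/- 0.5 ms of the printed one.
    """
    fastest = rows_per_second(rows, elapsed_s - elapsed_precision_s)
    slowest = rows_per_second(rows, elapsed_s + elapsed_precision_s)
    return slowest - 1 <= reported_rate <= fastest + 1


# Figures taken from the single-worker bench output above.
rate = rows_per_second(75_814_101, 30.123)
print(f"{rate:,.0f} rows/sec")
print(matches_report(75_814_101, 30.123, 2_516_818))
```

The same check can be pointed at the output of a run on your own hardware to confirm that the reported rate and the raw counters agree.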

5,426,774 Aggregate Rows/sec

10 parallel workers. 758M total rows. Disk was not yet the bottleneck.

  • Total rows (10 × 75.8M): 758.1M
  • Wall time: 139.7s
  • Aggregate rows / sec: 5,426,774
  • Launch skew (all 10 workers): 163ms
// System utilization
  • CPU avg: 39.3%
  • CPU peak: 86.1%
  • Disk utilization: 41.5%
  • RAM (peak): ~18.4 GB
// Validation
  • Jobs completed: 10 / 10
  • Failed jobs: 0
  • Malformed rows: 0
  • Row-count delta: 0
At 39.3% average CPU and 41.5% disk utilization, neither compute nor storage was saturated. The engine maintained near-simultaneous parallel execution (163ms launch skew across 10 workers) and delivered 5.4M aggregate rows/sec with zero errors.
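
For readers reproducing the concurrent run, the two summary figures can be derived from per-worker timings. The sketch below assumes launch skew means the gap between the first and last worker start, and that wall time runs from the first start to the last finish; the harness may define either differently. The `WorkerRun` record and the per-worker timings are hypothetical, chosen only so the totals match the published 163 ms and 139.7 s.

```python
from dataclasses import dataclass


@dataclass
class WorkerRun:
    start_s: float  # hypothetical: seconds since the harness was launched
    end_s: float
    rows: int


def summarize(runs: list[WorkerRun]) -> dict:
    """Derive launch skew, wall time, and aggregate rate from per-worker runs."""
    first_start = min(r.start_s for r in runs)
    last_start = max(r.start_s for r in runs)
    wall_s = max(r.end_s for r in runs) - first_start
    total_rows = sum(r.rows for r in runs)
    return {
        "launch_skew_s": last_start - first_start,
        "wall_s": wall_s,
        "total_rows": total_rows,
        "aggregate_rows_per_s": total_rows / wall_s,
    }


# Illustrative timings only: 10 workers starting evenly across 163 ms,
# all finishing at 139.7 s, each ingesting the full 75,814,101-row file.
runs = [WorkerRun(start_s=i * 0.163 / 9, end_s=139.7, rows=75_814_101)
        for i in range(10)]
summary = summarize(runs)
print(f"skew {summary['launch_skew_s'] * 1000:.0f} ms, "
      f"{summary['aggregate_rows_per_s']:,.0f} aggregate rows/sec")
```

Because the published wall time is rounded to one decimal, the rate computed this way lands within a few hundred rows/sec of the published 5,426,774 rather than matching it exactly.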

Storage Saturation — Lossless

20 parallel workers pushed the test device to its physical write ceiling. The engine held. No data loss. No job failures.

  • Total rows (20 × 75.8M): 1,516M
  • Wall time: 291.4s
  • Aggregate rows / sec: 5,202,719
  • Peak write throughput: 1,954 MB/s
// Storage metrics
  • Avg write: 896.9 MB/s
  • Peak write: 1,954 MB/s
  • Avg queue depth: 10.1
  • Disk utilization: 54.5%
// Compute metrics
  • Peak CPU: 98.3%
  • RAM (peak): ~38.1 GB
  • Workers: 20
  • Source data staged: ~26 GB
// Integrity
  • Jobs completed: 20 / 20
  • Failed jobs: 0
  • Malformed rows: 0
  • Row-count delta: 0

At 20 workers, DataForge reached the physical write ceiling of the test storage device, with writes peaking at 1,954 MB/s and averaging 896.9 MB/s. Aggregate throughput held at 5.2M rows/sec, about 4% below the 10-worker run, with no data loss and no job failures. The architecture degraded gracefully against hardware limits rather than introducing errors or partial writes. This is the key integrity result: storage saturation surfaced as reduced throughput, not as data corruption.
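
To put "reduced throughput" in numbers, the two concurrent runs can be compared directly. The sketch below uses only the published aggregate rates for 10 and 20 workers; the helper name is illustrative.

```python
def per_worker(aggregate_rate: float, workers: int) -> float:
    """Average rows/sec contributed by each worker."""
    return aggregate_rate / workers


# Published aggregate rows/sec, keyed by worker count.
runs = {10: 5_426_774, 20: 5_202_719}

change = runs[20] / runs[10] - 1
for workers, rate in runs.items():
    print(f"{workers} workers: {rate:,} aggregate, "
          f"{per_worker(rate, workers):,.0f} per worker")
print(f"aggregate change from 10 to 20 workers: {change:+.1%}")
```

Doubling the worker count roughly halves the per-worker rate while the aggregate falls by about 4%, which is the pattern expected once a shared resource, rather than the engine, sets the limit.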

  • Failed jobs: 0
  • Malformed rows: 0
  • Dropped rows: 0
  • Row-count delta: 0
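
The page does not describe how the harness computes its row-count delta or checksum. As one possible approach, the sketch below hashes each row with SHA-256 and sums the digests modulo 2^256, which gives a fingerprint that does not depend on row order. Every name here is hypothetical, and the field encoding is an assumption made for illustration.

```python
import hashlib
from typing import Iterable

MODULUS = 1 << 256  # digests are 256 bits wide


def row_digest(row: tuple) -> int:
    """Hash one row; fields are joined with a unit separator, NULL becomes NUL."""
    encoded = "\x1f".join("\x00" if v is None else str(v) for v in row)
    return int.from_bytes(hashlib.sha256(encoded.encode("utf-8")).digest(), "big")


def table_fingerprint(rows: Iterable[tuple]) -> tuple[int, int]:
    """Return (row count, order-independent checksum) in one streaming pass."""
    count, checksum = 0, 0
    for row in rows:
        checksum = (checksum + row_digest(row)) % MODULUS
        count += 1
    return count, checksum


def validate(source_rows: Iterable[tuple], dest_rows: Iterable[tuple]) -> dict:
    """Compare source and destination by row count and checksum."""
    s_count, s_sum = table_fingerprint(source_rows)
    d_count, d_sum = table_fingerprint(dest_rows)
    return {"row_count_delta": s_count - d_count, "checksum_match": s_sum == d_sum}


source = [(1, "alpha", None), (2, "beta", "x")]
print(validate(source, list(reversed(source))))  # same rows, different order
print(validate(source, source[:1]))              # one row missing
```

An additive checksum is chosen here because parallel workers do not write rows in source order; a hash over the concatenated stream would report a mismatch even when every row arrived intact.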

Cross-Cloud Deployment

Cloud numbers carry honest overhead — cold start, network transport, database writes over the network. These are real production numbers, not compute-only figures.

Google Cloud Platform · Live
GCS → Cloud Run Jobs → Cloud SQL
Enterprise Plus · 8 vCPU / 64 GB
895,259 rows / sec · 1m 25s wall time
  • Rows ingested: 75,814,101
  • Malformed: 0
  • Dropped: 0
  • Validation: PASS
Amazon Web Services · Deployed
ECS Fargate → RDS PostgreSQL
5-region cluster · end-to-end validated
444,000 rows / sec · 75.8M rows documented
Full bench-report: coming soon
Infrastructure deployed and validated. Formal timed benchmark run in progress.
Microsoft Azure · Deployed
Deployed via azure-setup.ps1
Azure Database for PostgreSQL
Full bench-report: coming soon
Infrastructure deployed. Benchmark run scheduled. Results will be published here.
Cloud throughput figures reflect network-transport overhead to managed database services — not the engine's compute ceiling. The GCP result (895K rows/sec into Cloud SQL Enterprise Plus) is a meaningful production reference for managed cloud PostgreSQL ingestion at scale.
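
The GCP wall time and its relation to the local baseline can be cross-checked from the published figures alone. The sketch below derives the wall time from the row count and rate, then expresses the cloud rate as a fraction of the single-worker local result; the helper names are illustrative.

```python
def wall_time_s(rows: int, rows_per_s: float) -> float:
    """Seconds needed to ingest `rows` at a steady `rows_per_s`."""
    return rows / rows_per_s


def fmt_duration(seconds: float) -> str:
    """Format seconds as 'Xm YYs', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"


ROWS = 75_814_101        # rows in the dataset
LOCAL_RATE = 2_516_818   # single-worker local result, rows/sec
GCP_RATE = 895_259       # GCP result, rows/sec

gcp_wall = wall_time_s(ROWS, GCP_RATE)
print(fmt_duration(gcp_wall))
print(f"GCP rate as share of local baseline: {GCP_RATE / LOCAL_RATE:.1%}")
```

The derived wall time agrees with the published 1m 25s, and the GCP rate comes to a little over a third of the local single-worker rate, which is the transport and managed-database overhead described above.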

Run It on Your Infrastructure

The benchmark harness is available for download. Register to receive the binary, configuration, and runbook. No sales call required. Results unlock immediately after email verification.

Ready to talk throughput?

Pilot discussions, investor conversations, enterprise architecture review, or technical deep-dives.