Summary
This report documents performance metrics from incrementally seeding the database in order to stress-test the LangChain migration (Feature 014).
System Specifications
- RAM: 120 GB
- Storage: 916 GB NVMe SSD (588 GB available before test)
- Database: PostgreSQL 15 (Docker container)
- Partition: /dev/nvme1n1p2
Test Stages
Stage 1: Small Test (1K docs, ~50K chunks)
| Metric | Value |
|---|---|
| Documents | 1,000 |
| Chunks | 50,326 |
| Mega doc chunks | 10,000 |
| DB Size | 429 MB |
| Time | 20.0 seconds |
| Rate | ~2,511 chunks/s |
| Size per chunk | ~8.5 KB |
Stage 2: Medium Test (10K docs, ~460K chunks)
| Metric | Value |
|---|---|
| Documents | 10,000 |
| Chunks | 459,342 |
| Mega doc chunks | 50,000 |
| DB Size | 3.75 GB |
| Time | 177.4 seconds (~3 min) |
| Rate | ~2,589 chunks/s |
| Size per chunk | ~8.35 KB |
Stage 3: Large Test (100K docs, ~4.2M chunks) - COMPLETED
| Metric | Value |
|---|---|
| Documents | 100,000 |
| Chunks | 4,202,829 |
| Mega doc chunks | 100,000 |
| Vector stores | 10 |
| Specialists | 20 |
| DB Size | 34 GB |
| Time | 1,747.3 seconds (~29 min) |
| Rate | ~2,405 chunks/s |
| Size per chunk | ~8.1 KB |
Stage 4: Full Test (1M docs, ~25.5M chunks) - PLANNED
| Metric | Estimated Value |
|---|---|
| Documents | 1,000,000 |
| Expected chunks | ~25,500,000 |
| Mega doc chunks | 500,000 |
| Vector stores | 10 |
| Specialists | 20 |
| Estimated DB size | ~210-250 GB |
| Estimated time | ~3 hours |
Chunk Insert Rate
- Average: ~2,400-2,500 chunks/second
- Consistent: The rate stayed between ~2,405 and ~2,589 chunks/s from 50K to 4.2M chunks, with no sign of degradation as the table grew
- Batch size: 2,000 chunks per INSERT
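
The rates above can be recomputed from the chunk counts and wall-clock times in the Stage 1-3 tables. The following is a minimal sketch of that arithmetic; the `StageResult` type and the `chunksPerSecond` helper are illustrative names, not part of the seeding script. Stage 1 comes out a few chunks/s higher than the reported ~2,511 because its time is only given to one decimal place.

```typescript
// Measured values copied from the Stage 1-3 tables in this report.
interface StageResult {
  name: string;
  chunks: number;
  seconds: number;
}

const stages: StageResult[] = [
  { name: "Stage 1", chunks: 50_326, seconds: 20.0 },
  { name: "Stage 2", chunks: 459_342, seconds: 177.4 },
  { name: "Stage 3", chunks: 4_202_829, seconds: 1_747.3 },
];

// Throughput in chunks per second for a single stage.
function chunksPerSecond(stage: StageResult): number {
  return stage.chunks / stage.seconds;
}

// Unweighted mean of the per-stage rates.
function meanRate(results: StageResult[]): number {
  const total = results.reduce((sum, s) => sum + chunksPerSecond(s), 0);
  return total / results.length;
}

stages.forEach((stage) => {
  console.log(`${stage.name}: ${Math.round(chunksPerSecond(stage))} chunks/s`);
});
console.log(`Mean: ${Math.round(meanRate(stages))} chunks/s`);
```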
Storage Calculations
- Per chunk (avg): ~8.35 KB
  - Embedding (1536 × 4 bytes): 6.1 KB
  - Text content: ~500 bytes
  - Metadata + indexes: ~1.7 KB
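
As a quick check, the per-chunk figure can be rebuilt from its components. This sketch assumes the embedding is 1536 values of 4 bytes each (for example 32-bit floats) and that 1 KB means 1,000 bytes, which is the convention that reproduces the 6.1 KB figure. The constant names are illustrative.

```typescript
// Assumption: 1536 embedding values stored as 4-byte floats.
const EMBEDDING_DIMENSIONS = 1536;
const BYTES_PER_VALUE = 4;

const embeddingBytes = EMBEDDING_DIMENSIONS * BYTES_PER_VALUE; // 6,144 bytes
const textBytes = 500; // approximate text content, from the list above
const overheadBytes = 1_700; // approximate metadata + indexes, from the list above

const totalBytes = embeddingBytes + textBytes + overheadBytes;

// Format a byte count as decimal kilobytes (1 KB = 1,000 bytes).
function toKB(bytes: number): string {
  return (bytes / 1000).toFixed(2);
}

console.log(`Embedding: ${toKB(embeddingBytes)} KB`);
console.log(`Total per chunk: ${toKB(totalBytes)} KB`);
console.log(`Embedding share: ${((embeddingBytes / totalBytes) * 100).toFixed(1)}%`);
```

Under these assumptions the embedding accounts for roughly three quarters of each chunk's footprint.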
Time Estimates
| Chunks | Estimated Time |
|---|---|
| 50K | 20 seconds |
| 500K | 3.5 minutes |
| 4M | 28 minutes |
| 25M | ~3 hours |
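
The estimates in this table follow from dividing the chunk count by the insert rate. A minimal sketch, assuming a constant rate of 2,450 chunks/s (the midpoint of the measured range, chosen here for illustration); `estimateSeconds` and `formatDuration` are hypothetical helper names.

```typescript
// Assumed constant insert rate: midpoint of the ~2,400-2,500 chunks/s range.
const ASSUMED_RATE = 2_450;

// Seconds needed to insert `chunks` rows at a constant rate.
function estimateSeconds(chunks: number, chunksPerSecond: number): number {
  if (chunksPerSecond <= 0) {
    throw new RangeError("chunksPerSecond must be positive");
  }
  return chunks / chunksPerSecond;
}

// Render a duration in the most readable unit.
function formatDuration(seconds: number): string {
  if (seconds < 60) return `${Math.round(seconds)} s`;
  if (seconds < 3600) return `${(seconds / 60).toFixed(1)} min`;
  return `${(seconds / 3600).toFixed(1)} h`;
}

[50_000, 500_000, 4_000_000, 25_000_000].forEach((chunks) => {
  const seconds = estimateSeconds(chunks, ASSUMED_RATE);
  console.log(`${chunks} chunks: ${formatDuration(seconds)}`);
});
```

At this assumed rate the results land slightly below the rounded figures in the table, which suggests the table rounds up.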
Chunk Distribution (Variable Chunks Mode)
The --variable-chunks flag creates realistic document size distribution:
| Category | Chunks per doc | % of docs | Example |
|---|---|---|---|
| Small | 1-5 | 50% | Notes, single pages |
| Medium | 6-30 | 25% | Reports, articles |
| Large | 31-100 | 15% | Manuals, guides |
| Very Large | 101-300 | 8% | Comprehensive PDFs |
| Huge | 301-1000 | 1.9% | Technical specs |
| Extreme | 1001+ | 0.1% | Large datasets |
| Mega doc | Configurable | 1 doc | 500K chunks for stress testing |
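
One way to draw chunk counts from a distribution like this is weighted category selection followed by a uniform pick inside the category's range. The sketch below illustrates that technique only; it is not taken from `seed-stress-test-data.ts`, whose implementation is not shown in this report. The upper bound for the open-ended "Extreme" category (`EXTREME_UPPER_BOUND`) is an assumed value, and the mega doc is left out because it is a single configured document.

```typescript
interface ChunkCategory {
  name: string;
  minChunks: number;
  maxChunks: number;
  weight: number; // share of documents, in percent
}

// Hypothetical cap for the open-ended "1001+" category.
const EXTREME_UPPER_BOUND = 5_000;

const CATEGORIES: ChunkCategory[] = [
  { name: "Small", minChunks: 1, maxChunks: 5, weight: 50 },
  { name: "Medium", minChunks: 6, maxChunks: 30, weight: 25 },
  { name: "Large", minChunks: 31, maxChunks: 100, weight: 15 },
  { name: "Very Large", minChunks: 101, maxChunks: 300, weight: 8 },
  { name: "Huge", minChunks: 301, maxChunks: 1000, weight: 1.9 },
  { name: "Extreme", minChunks: 1001, maxChunks: EXTREME_UPPER_BOUND, weight: 0.1 },
];

// Pick a category from a uniform random number u in [0, 1).
function pickCategory(u: number): ChunkCategory {
  const totalWeight = CATEGORIES.reduce((sum, c) => sum + c.weight, 0);
  let cumulative = 0;
  for (const category of CATEGORIES) {
    cumulative += category.weight / totalWeight;
    if (u < cumulative) return category;
  }
  // Guard against floating-point rounding at the top of the range.
  return CATEGORIES[CATEGORIES.length - 1];
}

// Pick a chunk count: category from uCategory, size within it from uSize.
function pickChunkCount(uCategory: number, uSize: number): number {
  const category = pickCategory(uCategory);
  const span = category.maxChunks - category.minChunks + 1;
  return category.minChunks + Math.floor(uSize * span);
}

console.log(pickChunkCount(Math.random(), Math.random()));
```

Passing the random numbers in as arguments keeps the functions deterministic, which makes the category boundaries easy to test.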
Commands Reference
# Clean existing stress test data
pnpm ts-node src/scripts/seed-stress-test-data.ts --clean
# Stage 1: Small test
pnpm ts-node src/scripts/seed-stress-test-data.ts \
--documents=1000 \
--chunk-sampling=100 \
--variable-chunks \
--mega-doc-chunks=10000 \
--vector-stores=5 \
--specialists=10
# Stage 2: Medium test
pnpm ts-node src/scripts/seed-stress-test-data.ts \
--documents=10000 \
--chunk-sampling=100 \
--variable-chunks \
--mega-doc-chunks=50000 \
--vector-stores=5 \
--specialists=10
# Stage 3: Large test
pnpm ts-node src/scripts/seed-stress-test-data.ts \
--documents=100000 \
--chunk-sampling=100 \
--variable-chunks \
--mega-doc-chunks=100000 \
--vector-stores=10 \
--specialists=20 \
--batch-size=2000
# Stage 4: Full test (500K mega doc)
pnpm ts-node src/scripts/seed-stress-test-data.ts \
--documents=1000000 \
--chunk-sampling=100 \
--variable-chunks \
--mega-doc-chunks=500000 \
--max-chunks=1000 \
--vector-stores=10 \
--specialists=20 \
--batch-size=2000
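
To illustrate what `--batch-size=2000` implies for the number of INSERT statements, the sketch below splits a list of rows into fixed-size batches and counts the batches for a given stage. It is a generic illustration, assuming one INSERT per batch; `toBatches` and `batchCount` are hypothetical names and do not come from the seeding script.

```typescript
// Split items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (batchSize <= 0 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Number of batches needed for totalRows rows, without materializing them.
function batchCount(totalRows: number, batchSize: number): number {
  return Math.ceil(totalRows / batchSize);
}

// Stage 3 inserted 4,202,829 chunks; at 2,000 per INSERT:
console.log(batchCount(4_202_829, 2_000)); // 2102

// Small demonstration of the splitting itself.
console.log(toBatches([1, 2, 3, 4, 5], 2)); // [[1, 2], [3, 4], [5]]
```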
Disk Space Requirements
| Stage | Chunks | DB Size | Cumulative | Time |
|---|---|---|---|---|
| Base | 0 | 23 MB | 23 MB | - |
| Stage 1 | 50K | 429 MB | 429 MB | 20s |
| Stage 2 | 460K | 3.8 GB | 3.8 GB | 3 min |
| Stage 3 | 4.2M | 34 GB | 34 GB | 29 min |
| Stage 4 | 25.5M | ~210 GB | ~210 GB | ~3 hrs (est.) |
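
The Stage 4 size can be projected from the expected chunk count and the measured per-chunk size, then compared with the free disk space. A minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and that the 588 GB reported as available before the test is still free; the function and constant names are illustrative.

```typescript
const KB_PER_GB = 1_000_000; // decimal units, assumed

// Projected database size in GB for a given chunk count and per-chunk size.
function estimateDbSizeGB(chunks: number, kbPerChunk: number): number {
  return (chunks * kbPerChunk) / KB_PER_GB;
}

const STAGE_4_CHUNKS = 25_500_000;
const AVAILABLE_GB = 588; // free space before the test, from System Specifications

// Per-chunk sizes observed in this report: ~8.1 KB (Stage 3) to ~8.35 KB (average).
const lowGB = estimateDbSizeGB(STAGE_4_CHUNKS, 8.1);
const highGB = estimateDbSizeGB(STAGE_4_CHUNKS, 8.35);
const fitsOnDisk = highGB < AVAILABLE_GB;

console.log(`Projected size: ${lowGB.toFixed(0)}-${highGB.toFixed(0)} GB`);
console.log(`Headroom at the high estimate: ${(AVAILABLE_GB - highGB).toFixed(0)} GB`);
console.log(`Fits on disk: ${fitsOnDisk}`);
```

This projection covers table and index data only in the sense of the measured per-chunk average; temporary files and WAL growth during the run are not modeled.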
Notes
- Storage is predictable: Each chunk with its embedding takes ~8.1-8.5 KB across stages (~8.35 KB on average)
- Rate is consistent: INSERT throughput showed no degradation up to 4.2M chunks (Stage 3)
- Variable chunks: The --variable-chunks flag creates realistic edge cases for testing
- Mega document: A single document with up to 500K chunks (Stage 4) tests extreme scenarios
Report generated during stress test seeding for LangChain Migration (Feature 014)