CODE REVIEW GRAPH — Pumpfun Replay

Complete project dependency map · data flow · component status
Data Layer · Training · Models · Simulation · Dashboard · Live Bot
73 source files · 6 ML models · 5 rule systems · 9 API routes · 8 dashboard pages · 12 live bot modules
DATA LAYER — Parquet files, CSV, PostgreSQL · 3.75 GB+ on disk
unified-features.parquet
7.3M rows · 73 cols · Beat's 30K tokens
Pre-joined features for all Beat-entered tokens
ACTIVE
full-universe-features.parquet
56M rows · 38 cols · 640K tokens
Beat + random tokens (shared features only)
32 FEATS
negative-replays (23 batches)
48.7M rows · 610K mints
Random pump.fun tokens for validation
ACTIVE
neg-computed/ (23 batches)
48.7M rows · 119 cols
Negatives with computed features
NEW
beat-trades/
beat-jan2026-trades.csv (94K trades)
beat-context.csv (portfolio state)
ACTIVE
PostgreSQL (debunker)
trades: 3.9M rows · wallet_trades: 94K
model_scores: 44K · tokens: 33K
ACTIVE
backtest.json
45K trades with PnL, sizing, exit types
Dashboard reads this for performance/trades
+3,015 SOL
scored-mints.jsonl.gz
30K mints × all frames with model scores
Used by live-tuning SSE
157 MB
▼ feeds into ▼
TRAINING — Python scripts · training/
build-unified-features.py
Builds pre-joined parquet from raw replays
→ unified-features.parquet
ACTIVE
build-full-universe.py
Combines Beat + negatives (shared cols)
→ full-universe-features.parquet
32 FEATS
compute-features-negatives-batched.py
Computes 40 Beat-only features for negatives
→ neg-computed/ (119 cols each)
NEEDS COMBINE
train-all-models.py
Original 9-model training pipeline
Filter, Entry, Sizing, Runner, Exit, Post-mig
ACTIVE
retrain-all-full-universe.py
Retrain all models on full universe
32 shared features only
KILLED
simulate-fast.py
40-second vectorized simulation
→ backtest.json (dashboard data)
+3,015 SOL
retrain-exit-frame-match.py
Exit model experiments (4 variants)
±3f, ±1f, regression, exact
DONE
retrain-sizing-9class.py
Sizing expanded to 9 classes
81% accuracy on Beat's grid
DONE
retrain-filter-honest.py
Validated filter on 610K random tokens
AUC 1.0 confirmed legit
DONE
sweep-exit-models.py
10 variants: v1/v2 × 5 thresholds
Best: v1 at 0.45 (+3,015 SOL)
DONE
sweep-regime.py
7 regime re-entry variants
Fixed WARM outperforms regime
DONE
export-models-json.py
Export .pkl → .json for JS inference
5 models exported to live-bot/models/
ACTIVE
precompute-scores.py
Score all 7M frames through all models
→ scored-frames.parquet
ACTIVE
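
build-full-universe.py is described above as combining Beat and negative tokens on their shared columns. A minimal sketch of that alignment idea, in plain Python, is shown here. It is not the script's actual code: the real step works on parquet batches and would more likely use pandas or pyarrow, and the function names, column names, and the is_beat label column are all hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def shared_columns(beat_cols: Sequence[str], neg_cols: Sequence[str]) -> List[str]:
    """Columns present in both schemas, kept in the Beat file's order."""
    neg_set = set(neg_cols)
    return [c for c in beat_cols if c in neg_set]


def dropped_columns(beat_cols: Sequence[str], neg_cols: Sequence[str]) -> Dict[str, List[str]]:
    """Columns that alignment would discard, grouped by the side they come from."""
    shared = set(shared_columns(beat_cols, neg_cols))
    return {
        "beat_only": [c for c in beat_cols if c not in shared],
        "neg_only": [c for c in neg_cols if c not in shared],
    }


def align_rows(rows: Sequence[Row], columns: Sequence[str], is_beat: bool) -> List[Row]:
    """Project each row onto the shared columns and tag where it came from."""
    aligned = []
    for row in rows:
        out: Row = {c: row[c] for c in columns}
        out["is_beat"] = 1 if is_beat else 0  # hypothetical label column
        aligned.append(out)
    return aligned


# Hypothetical column names, for illustration only.
beat_cols = ["mint", "frame", "buy_count", "beat_position_sol"]
neg_cols = ["mint", "frame", "buy_count", "raw_slot", "raw_sig_count"]

cols = shared_columns(beat_cols, neg_cols)
beat_rows = [{"mint": "mintA", "frame": 8, "buy_count": 12, "beat_position_sol": 0.5}]
neg_rows = [{"mint": "mintB", "frame": 8, "buy_count": 3, "raw_slot": 9, "raw_sig_count": 4}]
combined = align_rows(beat_rows, cols, True) + align_rows(neg_rows, cols, False)
```

Running dropped_columns before combining makes it visible which features each side would lose, which is useful when the target is a wider shared schema than the current 32 features.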
▼ produces models ▼
MODELS — .pkl (Python) + .json (JavaScript) · data/ + live-bot/models/
Filter (Model 0)
AUC 1.0 · 29 feats · 300 trees
Beat vs random at frame 8. Grade: B
v1
Entry (Model 1)
AUC 0.826 · 27 feats · 400 trees
"Buy now?" 68% of Beat entries. Grade: B
v1
Sizing (Model 2)
9-class · 70 feats · 4500 trees
81% train, 24% exact match. Grade: D
v2
Runner
AUC 0.799 · 58 feats · 400 trees
Feeds exit model + gates re-entries
v1
Exit (Model 5)
AUC 0.860 · 64 feats · 400 trees
8.9% ±5f → 23.9% (autoresearch). Grade: F
v1
Exit (autoresearch best)
Asymmetric -12/+1 labels · depth 16
23.9% ±5f · not yet in production
EXPERIMENTAL
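
The exit figures above are quoted as "±5f". One possible reading, assumed here and not confirmed by the source, is the share of trades whose predicted exit frame falls within 5 frames of Beat's actual exit frame. The sketch below computes that rate, and also shows one guess at what "asymmetric -12/+1 labels" could mean: marking frames from 12 before to 1 after the actual exit as positive. Both function names are hypothetical.

```python
from typing import List, Optional, Sequence


def exit_frame_match_rate(
    predicted: Sequence[Optional[int]],
    actual: Sequence[int],
    tolerance: int = 5,
) -> float:
    """Fraction of trades whose predicted exit is within `tolerance` frames.

    A prediction of None (the model never fired) counts as a miss.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(
        1
        for p, a in zip(predicted, actual)
        if p is not None and abs(p - a) <= tolerance
    )
    return hits / len(actual)


def asymmetric_exit_labels(
    n_frames: int, exit_frame: int, before: int = 12, after: int = 1
) -> List[int]:
    """Label frames in [exit_frame - before, exit_frame + after] as 1, else 0.

    This window interpretation is an assumption, not the documented scheme.
    """
    lo, hi = exit_frame - before, exit_frame + after
    return [1 if lo <= f <= hi else 0 for f in range(n_frames)]


rate = exit_frame_match_rate([10, 20, None, 40], [12, 30, 5, 40], tolerance=5)
labels = asymmetric_exit_labels(n_frames=20, exit_frame=15)
```

Under this reading, tightening the tolerance to 3 or 1 frames would give the ±3f and ±1f variants listed for retrain-exit-frame-match.py.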
▼ used by simulation + bot ▼
SIMULATION & ANALYSIS · training/ + autoresearch/
simulate-fast.py
Full backtest: filter→entry→sizing→exit→re-entry
+3,015 SOL · 45K trades · 47.9% WR · 40s runtime
PRODUCTION
autoresearch/
Autonomous experiment loop (Karpathy pattern)
prepare.py (fixed eval) + train.py (agent edits)
20 EXPERIMENTS
Audit findings
+2,163 SOL early-entry advantage
Filter legit · 0% false positives · Exit weakest
DOCUMENTED
▼ feeds dashboard + bot ▼
DASHBOARD — Next.js 16 + React 19 · dashboard/
/live
Real-time bot monitor (mock + WebSocket)
Positions, PnL curve, alerts, kill switch
NEW
/performance
Backtest results from backtest.json
PnL, WR, exit/tier/buy breakdowns, daily
ACTIVE
/trades
Sortable trade browser with Beat comparison
Click → Beat's full history + sizing analysis
ACTIVE
/replay
Trade-by-trade replay (1,600+ lines)
Positions, charts, accuracy, pump.fun feed
ACTIVE
/live-tuning
Client-side simulation with sliders
Drag params, watch PnL change in real time
ACTIVE
/strategy
All models + rules documented
Scorecard, audit findings, vulnerabilities
UPDATED
/tuning
Current config (thresholds, params)
ACTIVE
9 API routes
stats, summary, trades, equity, all-trades
all-mints, mint-trades, replay (SSE), live-tuning (SSE)
ACTIVE
▼ live bot connects to ▼
LIVE BOT — Node.js · live-bot/
geyser-client.js
BlockRazor gRPC connection
Subscribe to pump.fun transactions
NEEDS PROTO
accumulator.js
O(1) per-trade feature engine
Mirrors backtest features exactly
READY
xgboost-scorer.js
Compiled tree inference (0.6ms pipeline)
TypedArray flat arrays, NaN handling
OPTIMIZED
model-pipeline.js
All 5 models with feature enrichment
Position-relative features for exit
READY
risk-manager.js
P0 kill switches + P1 circuit breakers
Daily loss, drawdown, position caps, stale burst
READY
tx-builder.js
Pump.fun buy/sell with PDA cache
Jito anti-MEV + 0slot tips
READY
sender.js
0slot HTTP with keep-alive
sendFast() fire-and-forget for buys
READY
executor.js
Execution manager
Blockhash pre-fetch, paper/live toggle
READY
ws-server.js
WebSocket → dashboard /live page
Trades, positions, stats, commands
READY
config.js
Hot-reloadable config
Dashboard can write config.json
READY
index.js
Main pipeline wiring
Filter→Entry→Sizing→Exit→Re-entry→Execute
READY
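
xgboost-scorer.js is described above as compiled tree inference over flat typed arrays with NaN handling. The sketch below illustrates that general technique in Python, for consistency with the training scripts; it is not the live bot's JavaScript code. It assumes the usual XGBoost split rule (go left when the feature value is below the threshold, follow a per-node default direction when the value is missing) and a binary logistic objective. The FlatTree layout and all names are hypothetical.

```python
import math
from array import array
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FlatTree:
    """One tree stored as parallel arrays indexed by node id (hypothetical layout)."""
    feature: array       # 'i': split feature index, -1 marks a leaf
    threshold: array     # 'd': split threshold
    left: array          # 'i': child taken when value < threshold
    right: array         # 'i': child taken otherwise
    missing_left: array  # 'b': 1 if a NaN value should go left
    value: array         # 'd': leaf output (ignored on split nodes)


def score_tree(tree: FlatTree, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return the leaf value."""
    node = 0
    while tree.feature[node] != -1:
        v = x[tree.feature[node]]
        if math.isnan(v):
            node = tree.left[node] if tree.missing_left[node] else tree.right[node]
        elif v < tree.threshold[node]:
            node = tree.left[node]
        else:
            node = tree.right[node]
    return tree.value[node]


def predict_proba(trees: Sequence[FlatTree], x: Sequence[float], base_margin: float = 0.0) -> float:
    """Sum leaf values across trees and squash with a sigmoid."""
    margin = base_margin + sum(score_tree(t, x) for t in trees)
    return 1.0 / (1.0 + math.exp(-margin))


# A single stump: feature 0 < 0.5 goes to leaf 1, otherwise leaf 2; NaN goes right.
stump = FlatTree(
    feature=array("i", [0, -1, -1]),
    threshold=array("d", [0.5, 0.0, 0.0]),
    left=array("i", [1, -1, -1]),
    right=array("i", [2, -1, -1]),
    missing_left=array("b", [0, 0, 0]),
    value=array("d", [0.0, 0.4, -0.4]),
)

p = predict_proba([stump], [0.2])
```

Keeping every node field in its own contiguous array avoids per-node object lookups, which is the same motivation behind using TypedArrays on the JavaScript side.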
LEGACY / REFERENCE — Old VPS bot code · vps-deploy/
debunker.js (665 lines)
Old comparison engine
Helius WebSocket, single classifier
LEGACY
beat-live-sel09.js (1,914 lines)
Old trading bot
Used as reference for tx building
LEGACY
token-tracker-v3.js
176-feature accumulator
Reference for live-bot accumulator
REFERENCE
pump-tx.js
Transaction builder
Ported to live-bot/tx-builder.js
PORTED

DATA FLOW

Geyser Raw Data → replay.js → parquet files → build-unified-features.py → unified-features.parquet
                                              → compute-features-negatives-batched.py → neg-computed/
                                              → build-full-universe.py → full-universe-features.parquet

unified-features.parquet → train-all-models.py → model .pkl files → export-models-json.py → model .json files
                         → simulate-fast.py → backtest.json → dashboard API routes → pages

backtest.json → /api/stats, /api/summary, /api/trades, /api/equity
PostgreSQL    → /api/replay, /api/all-trades, /api/all-mints, /api/mint-trades
scored-mints  → /api/live-tuning

model .json   → live-bot/xgboost-scorer.js → model-pipeline.js → index.js
BlockRazor    → geyser-client.js → accumulator.js → model-pipeline.js → risk-manager.js → executor.js → 0slot
ws-server.js  → dashboard /live page (WebSocket)
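
The offline part of the flow above can be encoded as a dependency graph to answer two practical questions: in what order artifacts can be rebuilt, and what becomes stale when one of them changes. The sketch below uses Python's standard graphlib module. The edges are transcribed from this document's diagram and dependency list; the node labels and function names are chosen here for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Maps each artifact to the artifacts it is built from.
DEPENDS_ON: Dict[str, Set[str]] = {
    "raw replay parquet": {"replay.js"},
    "unified-features.parquet": {"raw replay parquet"},
    "neg-computed/": {"raw replay parquet"},
    "full-universe-features.parquet": {"raw replay parquet"},
    "model .pkl files": {"unified-features.parquet"},
    "model .json files": {"model .pkl files"},
    "backtest.json": {"unified-features.parquet", "model .pkl files", "beat-context.csv"},
    "dashboard API routes": {"backtest.json"},
}


def rebuild_order() -> List[str]:
    """Return every node with all of its prerequisites listed before it."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def downstream_of(changed: str) -> List[str]:
    """Artifacts that become stale when `changed` is rebuilt, in build order."""
    order = rebuild_order()
    stale = {changed}
    # One pass is enough because `order` is already topologically sorted.
    for node in order:
        if DEPENDS_ON.get(node, set()) & stale:
            stale.add(node)
    return [n for n in order if n in stale and n != changed]


order = rebuild_order()
stale_after_retrain = downstream_of("model .pkl files")
```

For example, downstream_of("model .pkl files") lists the JSON export, backtest.json, and the dashboard API routes, which matches the chain shown in the diagram.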
  

CRITICAL DEPENDENCIES

simulate-fast.py DEPENDS ON: unified-features.parquet, all 5 model .pkl files, beat-context.csv
backtest.json DEPENDS ON: simulate-fast.py output
/replay page DEPENDS ON: backtest.json + PostgreSQL (trades, wallet_trades, model_scores)
/live-tuning DEPENDS ON: scored-mints.jsonl.gz (pre-computed model scores)
live-bot DEPENDS ON: model .json files, BlockRazor token, 0slot API key, wallet keypair
  

WHAT NEEDS FIXING

1. full-universe-features.parquet has only the 32 shared features; it needs 70+ once the computed features are included
   → compute-features-negatives-batched.py ran successfully (119 cols), but the combine step failed
   → Fix the combine step: align Beat (73 cols) with Negatives (119 cols) into a unified set of ~70 cols

2. All models are trained on Beat-only features (73 cols) and need retraining on the full universe
   → retrain-all-full-universe.py exists but was killed (it was using the 32-feature version)

3. The best autoresearch exit model (23.9% ±5f) is not yet deployed to the production simulation

4. The Geyser client needs the BlockRazor proto file before it can connect

5. The live-bot accumulator needs validation against the backtest accumulator (feature drift check)
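
For the feature drift check in item 5, one simple approach is to compute the same frame's features through both accumulators and compare them name by name with a numeric tolerance. The sketch below assumes each accumulator's output can be expressed as a dict of feature name to float; the function name, tolerances, and feature names are hypothetical.

```python
import math
from typing import Dict, Tuple


def feature_drift(
    live: Dict[str, float],
    backtest: Dict[str, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> Dict[str, Tuple[float, float]]:
    """Return {feature: (live_value, backtest_value)} for every mismatch.

    A feature missing on one side is treated as NaN. Two NaNs count as equal,
    so a feature that is missing on one side and NaN on the other is not flagged.
    """
    mismatches: Dict[str, Tuple[float, float]] = {}
    for name in sorted(set(live) | set(backtest)):
        a = live.get(name, math.nan)
        b = backtest.get(name, math.nan)
        if math.isnan(a) and math.isnan(b):
            continue
        if math.isnan(a) or math.isnan(b) or not math.isclose(
            a, b, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches[name] = (a, b)
    return mismatches


# Hypothetical feature vectors for the same mint and frame.
live_feats = {"buy_count": 12.0, "sell_ratio": math.nan, "vol_sol": 2.0}
backtest_feats = {"buy_count": 12.0 + 1e-12, "sell_ratio": math.nan, "vol_sol": 2.5}

drift = feature_drift(live_feats, backtest_feats)
```

Replaying a sample of recorded trades through the live accumulator and asserting that feature_drift returns an empty dict at each frame would turn the claim "mirrors backtest features exactly" into a repeatable test.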