CODE REVIEW GRAPH — Pumpfun Replay

Complete project dependency map · data flow · component status

DATA LAYER — Parquet files, CSV, PostgreSQL · 3.75 GB+ on disk

unified-features.parquet

7.3M rows · 73 cols · Beat's 30K tokens
Pre-joined features for all Beat-entered tokens

ACTIVE

full-universe-features.parquet

56M rows · 38 cols · 640K tokens
Beat + random tokens (shared features only)

32 FEATS

negative-replays (23 batches)

48.7M rows · 610K mints
Random pump.fun tokens for validation

ACTIVE

neg-computed/ (23 batches)

48.7M rows · 119 cols
Negatives with computed features

NEW

beat-trades/

beat-jan2026-trades.csv (94K trades)
beat-context.csv (portfolio state)

ACTIVE

PostgreSQL (debunker)

trades: 3.9M rows · wallet_trades: 94K
model_scores: 44K · tokens: 33K

ACTIVE

backtest.json

45K trades with PnL, sizing, exit types
Dashboard reads this for performance/trades

+3,015 SOL

scored-mints.jsonl.gz

30K mints × all frames with model scores
Used by live-tuning SSE

157 MB

▼ feeds into ▼

TRAINING — Python scripts · training/

build-unified-features.py

Builds pre-joined parquet from raw replays
→ unified-features.parquet

ACTIVE

build-full-universe.py

Combines Beat + negatives (shared cols)
→ full-universe-features.parquet

32 FEATS

compute-features-negatives-batched.py

Computes 40 Beat-only features for negatives
→ neg-computed/ (119 cols each)

NEEDS COMBINE
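The NEEDS COMBINE status refers to reconciling the 119-column negatives with Beat's 73-column table. A minimal sketch of the alignment idea follows, assuming column names are available as plain lists. `align_columns`, `project_row`, and the sample column names are hypothetical illustrations, not taken from the project's scripts.

```python
def align_columns(beat_cols, neg_cols, required=()):
    """Return the columns present in both schemas, in Beat's order.

    Raises ValueError if any column listed in `required` is missing
    from the shared set, so a bad combine fails loudly instead of
    silently dropping a key column.
    """
    neg_set = set(neg_cols)
    shared = [c for c in beat_cols if c in neg_set]
    missing = [c for c in required if c not in shared]
    if missing:
        raise ValueError(f"required columns missing after align: {missing}")
    return shared


def project_row(row, columns):
    """Keep only `columns` from a dict-shaped row, in the given order."""
    return {c: row[c] for c in columns}


# Hypothetical column names, for illustration only.
beat = ["mint", "frame", "buy_volume", "beat_only_signal"]
neg = ["frame", "mint", "buy_volume", "extra_neg_col"]
shared = align_columns(beat, neg, required=["mint", "frame"])
aligned = project_row(
    {"frame": 8, "mint": "abc", "buy_volume": 1.5, "extra_neg_col": 0}, shared
)
```

Applying the same `shared` list to both tables before concatenating keeps column order identical across batches, which is the property the combine step needs.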

train-all-models.py

Original 9-model training pipeline
Filter, Entry, Sizing, Runner, Exit, Post-mig

ACTIVE

retrain-all-full-universe.py

Retrain all models on full universe
32 shared features only

KILLED

simulate-fast.py

40-second vectorized simulation
→ backtest.json (dashboard data)

+3,015 SOL

retrain-exit-frame-match.py

Exit model experiments (4 variants)
±3f, ±1f, regression, exact

DONE

retrain-sizing-9class.py

Sizing expanded to 9 classes
81% accuracy on Beat's grid

DONE

retrain-filter-honest.py

Validated filter on 610K random tokens
AUC 1.0 confirmed legit

DONE

sweep-exit-models.py

10 variants: v1/v2 × 5 thresholds
Best: v1 at 0.45 (+3,015 SOL)

DONE

sweep-regime.py

7 regime re-entry variants
Fixed WARM outperforms regime

DONE

export-models-json.py

Export .pkl → .json for JS inference
5 models exported to live-bot/models/

ACTIVE

precompute-scores.py

Score all 7M frames through all models
→ scored-frames.parquet

ACTIVE

▼ produces models ▼

MODELS — .pkl (Python) + .json (JavaScript) · data/ + live-bot/models/

Filter (Model 0)

AUC 1.0 · 29 feats · 300 trees
Beat vs random at frame 8. Grade: B

Entry (Model 1)

AUC 0.826 · 27 feats · 400 trees
"Buy now?" 68% of Beat entries. Grade: B

Sizing (Model 2)

9-class · 70 feats · 4500 trees
81% train accuracy, 24% exact match. Grade: D

Runner

AUC 0.799 · 58 feats · 400 trees
Feeds exit model + gates re-entries

Exit (Model 5)

AUC 0.860 · 64 feats · 400 trees
8.9% within ±5 frames; autoresearch variant reaches 23.9%. Grade: F

Exit (autoresearch best)

Asymmetric -12/+1 labels · depth 16
23.9% ±5f · not yet in production

EXPERIMENTAL

▼ used by simulation + bot ▼

SIMULATION & ANALYSIS · training/ + autoresearch/

simulate-fast.py

Full backtest: filter→entry→sizing→exit→re-entry
+3,015 SOL · 45K trades · 47.9% WR · 40s runtime

PRODUCTION

autoresearch/

Autonomous experiment loop (Karpathy pattern)
prepare.py (fixed eval) + train.py (agent edits)

20 EXPERIMENTS

Audit findings

+2,163 SOL early-entry advantage
Filter legit · 0% false positives · Exit weakest

DOCUMENTED

▼ feeds dashboard + bot ▼

DASHBOARD — Next.js 16 + React 19 · dashboard/

/live

Real-time bot monitor (mock + WebSocket)
Positions, PnL curve, alerts, kill switch

NEW

/performance

Backtest results from backtest.json
PnL, WR, exit/tier/buy breakdowns, daily

ACTIVE

/trades

Sortable trade browser with Beat comparison
Click → Beat's full history + sizing analysis

ACTIVE

/replay

Trade-by-trade replay (1,600+ lines)
Positions, charts, accuracy, pump.fun feed

ACTIVE

/live-tuning

Client-side simulation with sliders
Drag params, watch PnL change in real time

ACTIVE

/strategy

All models + rules documented
Scorecard, audit findings, vulnerabilities

UPDATED

/tuning

Current config (thresholds, params)

ACTIVE

9 API routes

stats, summary, trades, equity, all-trades
all-mints, mint-trades, replay (SSE), live-tuning (SSE)

ACTIVE

▼ live bot connects to ▼

LIVE BOT — Node.js · live-bot/

geyser-client.js

BlockRazor gRPC connection
Subscribe to pump.fun transactions

NEEDS PROTO

accumulator.js

O(1) per-trade feature engine
Mirrors backtest features exactly

READY
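To illustrate what an O(1) per-trade feature engine can look like, here is a small sketch in Python (the real module is JavaScript). The class name, the feature names, and the choice of Welford's running-variance update are assumptions made for illustration; the actual accumulator's features are not shown in this document.

```python
from dataclasses import dataclass


@dataclass
class TradeAccumulator:
    """Running per-token statistics, updated in constant time per trade."""
    n_trades: int = 0
    buy_volume: float = 0.0
    sell_volume: float = 0.0
    mean_size: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations (Welford)

    def update(self, size_sol: float, is_buy: bool) -> None:
        """Fold one trade into the running state without storing history."""
        self.n_trades += 1
        if is_buy:
            self.buy_volume += size_sol
        else:
            self.sell_volume += size_sol
        delta = size_sol - self.mean_size
        self.mean_size += delta / self.n_trades
        self._m2 += delta * (size_sol - self.mean_size)

    def features(self) -> dict:
        """Snapshot of the current feature values (population variance)."""
        total = self.buy_volume + self.sell_volume
        return {
            "n_trades": self.n_trades,
            "buy_ratio": self.buy_volume / total if total else 0.0,
            "mean_size": self.mean_size,
            "size_var": self._m2 / self.n_trades if self.n_trades else 0.0,
        }


acc = TradeAccumulator()
acc.update(1.0, is_buy=True)
acc.update(3.0, is_buy=False)
snapshot = acc.features()
```

Because no trade list is kept, memory per token stays fixed, and the same update rule can be replayed over backtest data to compare outputs.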

xgboost-scorer.js

Compiled tree inference (0.6ms pipeline)
TypedArray flat arrays, NaN handling

OPTIMIZED
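The flat-array idea behind this scorer can be sketched as follows, in Python with the stdlib `array` module standing in for JavaScript TypedArrays. The node layout (`feature`, `threshold`, `left`, `right`, `default_left`, `value`) and the logistic output are assumptions for illustration and may differ from the exported model format.

```python
import math
from array import array


class FlatTree:
    """One decision tree stored as parallel typed arrays, indexed by node id."""

    def __init__(self, feature, threshold, left, right, default_left, value):
        self.feature = array("i", feature)        # -1 marks a leaf
        self.threshold = array("d", threshold)
        self.left = array("i", left)
        self.right = array("i", right)
        self.default_left = array("b", default_left)  # 1: NaN goes left
        self.value = array("d", value)            # leaf output

    def predict(self, x):
        """Walk from the root to a leaf and return the leaf value."""
        node = 0
        while self.feature[node] >= 0:
            v = x[self.feature[node]]
            if math.isnan(v):
                go_left = self.default_left[node] == 1
            else:
                go_left = v < self.threshold[node]
            node = self.left[node] if go_left else self.right[node]
        return self.value[node]


def score(trees, x, base_margin=0.0):
    """Sum leaf values across trees and squash with a sigmoid."""
    margin = base_margin + sum(t.predict(x) for t in trees)
    return 1.0 / (1.0 + math.exp(-margin))


# A single hypothetical stump: split on feature 0 at 0.5, NaN goes left.
stump = FlatTree(
    feature=[0, -1, -1],
    threshold=[0.5, 0.0, 0.0],
    left=[1, -1, -1],
    right=[2, -1, -1],
    default_left=[1, 0, 0],
    value=[0.0, -1.0, 2.0],
)
p = score([stump], [0.9])
```

Keeping every field in its own contiguous array avoids per-node object allocation, which is the usual motivation for this layout.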

model-pipeline.js

All 5 models with feature enrichment
Position-relative features for exit

READY

risk-manager.js

P0 kill switches + P1 circuit breakers
Daily loss, drawdown, position caps, stale burst

READY
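A latching kill switch of the kind listed here might look like the sketch below. It is written in Python for illustration (the module itself is JavaScript), and the class names, fields, and limit values are hypothetical; only the daily-loss, drawdown, and position-cap checks come from the card above.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    max_daily_loss_sol: float
    max_drawdown_sol: float
    max_open_positions: int


class RiskManager:
    """Tracks PnL and blocks new positions once any limit is breached."""

    def __init__(self, limits: RiskLimits):
        self.limits = limits
        self.daily_pnl = 0.0
        self.equity = 0.0
        self.peak_equity = 0.0
        self.open_positions = 0
        self.killed = False  # latches: stays True until manually reset

    def record_pnl(self, pnl_sol: float) -> None:
        """Apply a realized PnL and trip the kill switch if a limit is hit."""
        self.daily_pnl += pnl_sol
        self.equity += pnl_sol
        self.peak_equity = max(self.peak_equity, self.equity)
        drawdown = self.peak_equity - self.equity
        if (-self.daily_pnl >= self.limits.max_daily_loss_sol
                or drawdown >= self.limits.max_drawdown_sol):
            self.killed = True

    def can_open(self) -> bool:
        """True only if not killed and below the open-position cap."""
        return (not self.killed
                and self.open_positions < self.limits.max_open_positions)


# Hypothetical limits, for illustration only.
rm = RiskManager(RiskLimits(max_daily_loss_sol=5.0,
                            max_drawdown_sol=10.0,
                            max_open_positions=2))
rm.record_pnl(-3.0)
allowed_after_small_loss = rm.can_open()
rm.record_pnl(-2.5)
allowed_after_breach = rm.can_open()
```

Making the switch latch means a later winning trade cannot silently re-enable trading after a breach.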

tx-builder.js

Pump.fun buy/sell with PDA cache
Jito anti-MEV + 0slot tips

READY

sender.js

0slot HTTP with keep-alive
sendFast() fire-and-forget for buys

READY

executor.js

Execution manager
Blockhash pre-fetch, paper/live toggle

READY

ws-server.js

WebSocket → dashboard /live page
Trades, positions, stats, commands

READY

config.js

Hot-reloadable config
Dashboard can write config.json

READY

index.js

Main pipeline wiring
Filter→Entry→Sizing→Exit→Re-entry→Execute

READY

LEGACY / REFERENCE — Old VPS bot code · vps-deploy/

debunker.js (665 lines)

Old comparison engine
Helius WebSocket, single classifier

LEGACY

beat-live-sel09.js (1,914 lines)

Old trading bot
Used as reference for tx building

LEGACY

token-tracker-v3.js

176-feature accumulator
Reference for live-bot accumulator

REFERENCE

pump-tx.js

Transaction builder
Ported to live-bot/tx-builder.js

PORTED

DATA FLOW

Geyser Raw Data → replay.js → parquet files → build-unified-features.py → unified-features.parquet
                                              → compute-features-negatives-batched.py → neg-computed/
                                              → build-full-universe.py → full-universe-features.parquet

unified-features.parquet → train-all-models.py → model .pkl files → export-models-json.py → model .json files
                         → simulate-fast.py → backtest.json → dashboard API routes → pages

backtest.json → /api/stats, /api/summary, /api/trades, /api/equity
PostgreSQL    → /api/replay, /api/all-trades, /api/all-mints, /api/mint-trades
scored-mints  → /api/live-tuning

model .json   → live-bot/xgboost-scorer.js → model-pipeline.js → index.js
BlockRazor    → geyser-client.js → accumulator.js → model-pipeline.js → risk-manager.js → executor.js → 0slot
ws-server.js  → dashboard /live page (WebSocket)

CRITICAL DEPENDENCIES

simulate-fast.py DEPENDS ON: unified-features.parquet, all 5 model .pkl files, beat-context.csv
backtest.json DEPENDS ON: simulate-fast.py output
/replay page DEPENDS ON: backtest.json + PostgreSQL (trades, wallet_trades, model_scores)
/live-tuning DEPENDS ON: scored-mints.jsonl.gz (pre-computed model scores)
live-bot DEPENDS ON: model .json files, BlockRazor token, 0slot API key, wallet keypair
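The edges listed under DATA FLOW and CRITICAL DEPENDENCIES form a directed graph, so a rebuild order and the set of artifacts affected by a change can be derived mechanically. The sketch below uses Python's stdlib `graphlib` (3.9+). The node labels are shortened from the lists above, external credentials are omitted, and `build_order` and `downstream` are hypothetical helper names.

```python
from graphlib import TopologicalSorter

# Each key depends on every item in its set.
DEPENDS_ON = {
    "simulate-fast.py": {"unified-features.parquet", "model .pkl files",
                         "beat-context.csv"},
    "backtest.json": {"simulate-fast.py"},
    "/replay": {"backtest.json", "PostgreSQL"},
    "/live-tuning": {"scored-mints.jsonl.gz"},
    "live-bot": {"model .json files"},
    "model .pkl files": {"unified-features.parquet"},  # via train-all-models.py
    "model .json files": {"model .pkl files"},         # via export-models-json.py
}


def build_order(graph):
    """Return nodes so that every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


def downstream(graph, changed):
    """Return everything that transitively depends on `changed`."""
    hit = {changed}
    grew = True
    while grew:
        grew = False
        for node, deps in graph.items():
            if node not in hit and deps & hit:
                hit.add(node)
                grew = True
    hit.discard(changed)
    return hit


order = build_order(DEPENDS_ON)
stale = downstream(DEPENDS_ON, "unified-features.parquet")
```

Here `stale` lists what would need regenerating after the unified feature file is rebuilt, which is useful when planning the retraining described under WHAT NEEDS FIXING.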

WHAT NEEDS FIXING

1. full-universe-features.parquet only has 32 shared features — needs 70+ with computed features
   → compute-features-negatives-batched.py ran successfully (119 cols) but combine step failed
   → Need to fix combine: align Beat (73 cols) + Negatives (119 cols) → unified ~70 cols

2. All models trained on Beat-only features (73 cols) — need retraining on full universe
   → retrain-all-full-universe.py exists but was killed (was using 32-feat version)

3. Exit model autoresearch best (23.9%) not yet deployed to production simulation

4. Geyser client needs BlockRazor proto file to actually connect

5. Accumulator (live-bot) needs validation against backtest accumulator (feature drift check)
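For item 5, one way to run a feature drift check is to feed the same trades through both accumulators and compare the resulting feature dictionaries with a tolerance. The sketch below assumes features are exposed as name-to-float mappings; `feature_drift`, its tolerances, and the sample feature names are hypothetical.

```python
import math


def feature_drift(backtest_feats, live_feats, rel_tol=1e-6, abs_tol=1e-9):
    """Return {name: (backtest_value, live_value)} for each mismatch.

    A feature missing on one side is reported with NaN in that slot.
    Two NaN values for the same feature count as a match.
    """
    drift = {}
    for name in sorted(set(backtest_feats) | set(live_feats)):
        a = backtest_feats.get(name, math.nan)
        b = live_feats.get(name, math.nan)
        present_in_both = name in backtest_feats and name in live_feats
        if present_in_both:
            if math.isnan(a) and math.isnan(b):
                continue
            if math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                continue
        drift[name] = (a, b)
    return drift


# Hypothetical feature snapshots for the same mint and frame.
clean = feature_drift({"buy_ratio": 0.25, "gap": math.nan},
                      {"buy_ratio": 0.25 + 1e-12, "gap": math.nan})
dirty = feature_drift({"buy_ratio": 0.25, "n_trades": 2.0},
                      {"buy_ratio": 0.30})
```

An empty result over a representative sample of mints and frames would support the claim that the live accumulator mirrors the backtest features; any non-empty result names the exact features to investigate.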