Open

4.0 #402

197 commits
ab0f857
Should be a stable standalone for multisweeps to test
Sep 15, 2025
9f352eb
Stable multigpu hyper sweeps
Sep 16, 2025
40da904
Stable pure sample eff multisweep
Sep 16, 2025
c017b81
Compute data
Sep 18, 2025
b028b77
stable
Sep 19, 2025
2d11375
Fix sweep len
Sep 19, 2025
337f520
hide dash
Sep 19, 2025
95ba005
Upgrade vis
Sep 20, 2025
10d96f0
breakout config
Sep 20, 2025
9df4db5
Merge branch 'multisweep' of https://github.com/pufferai/pufferlib in…
Sep 20, 2025
cded41d
Eff sweep pong
Sep 23, 2025
c90cc80
eff sweep breakout
Sep 23, 2025
a9be3df
Fix hacky max cost
Sep 23, 2025
2e8b020
Simple plot
Sep 23, 2025
51bfd91
breakout config
Sep 23, 2025
d9b9b0d
Merge branch 'multisweep' of https://github.com/pufferai/pufferlib in…
Sep 23, 2025
41e2415
Basic plots
Sep 24, 2025
9f017aa
small fix
Sep 24, 2025
663bd4f
Plot colors
Sep 24, 2025
a6c8445
Headless video save demo
Sep 26, 2025
6d97022
Temp garbo
Sep 29, 2025
a46d7fa
Merge branch 'multisweep' of https://github.com/pufferai/pufferlib in…
Sep 29, 2025
7cb168c
Still a mess but better
Sep 30, 2025
82f2e93
decent prototype
Sep 30, 2025
e288451
Update configs
Sep 30, 2025
352e0cc
configs
Sep 30, 2025
6bb09d1
Simplify
Sep 30, 2025
367c890
stable demo
Oct 4, 2025
805f558
Cull dead code
Oct 4, 2025
2ea0cb1
Constellation initial prototype
Oct 4, 2025
8df26d5
Configs for sweeping
Oct 4, 2025
fd710a8
snake config
Oct 4, 2025
4f417ca
Config
Oct 6, 2025
5ab588a
Mem fix for puffer moba w/ shared mem close. Add perf metric to pacman
Oct 7, 2025
6260d32
comment dash
Oct 7, 2025
11b0ca7
triple triad config; verbose option in pufferl
Oct 7, 2025
3a0ff8f
plot c
Oct 7, 2025
e9b1185
Merge branch 'multisweep' of https://github.com/pufferai/pufferlib in…
Oct 7, 2025
a06fa9c
Initial plot dash
Oct 7, 2025
5ca627f
fixes
Oct 7, 2025
d351aa4
Temp
Oct 7, 2025
f540556
cogames
Oct 7, 2025
a9a2e19
cogames
Oct 7, 2025
cdab278
3d
Oct 7, 2025
58e3423
Cap snake score
Oct 8, 2025
a09a8cf
rware config
Oct 8, 2025
eb3fec4
Two small profile scripts
Oct 8, 2025
1c4fcf8
update cogames
Oct 8, 2025
d319dbc
merge
Oct 8, 2025
86dcf77
Impulse wars config
Oct 9, 2025
f94dc05
nmmo3 config
Oct 9, 2025
2d21214
cogames
Oct 9, 2025
bcb6a9f
nmmo3 net params
Oct 9, 2025
af07504
fuck this
capnspacehook Aug 20, 2025
d6b2eab
Fix impulse wars
Oct 9, 2025
c42db1c
IW sweep
Oct 9, 2025
c32430f
initial box plot
Oct 9, 2025
d85672a
merge
Oct 9, 2025
492ec08
To pandas
Oct 9, 2025
80c5712
progress
Oct 10, 2025
3e6dd21
constellation cache
Oct 11, 2025
6036f81
Prototype in color
Oct 11, 2025
2b0d829
UI
Oct 11, 2025
dad30c9
prototype
Oct 11, 2025
6f67327
pretty
Oct 12, 2025
6503dbd
Progress!
Oct 13, 2025
854c3ec
Progress!
Oct 13, 2025
9077b0e
shaders!
Oct 14, 2025
80d8127
UI
Oct 15, 2025
ce29f53
UI
Oct 15, 2025
bfc2862
shaders
Oct 15, 2025
aa4f90f
constellations!
Oct 15, 2025
c746d1a
minor refactor
Oct 15, 2025
48c5329
Tooltip prototype
Oct 16, 2025
5c7e270
Initial refactor
Oct 17, 2025
33bd507
tooltip
Oct 17, 2025
21d66c7
Latest
Oct 18, 2025
740eee3
latest
Oct 18, 2025
65cdba0
UNSTABLE TESTING. DO NOT USE
Oct 18, 2025
f168bd6
BROKEN- DO NOT USE
Oct 18, 2025
9d23693
Initial cuda bind
Oct 19, 2025
62d361f
Build flags
Oct 20, 2025
3452174
comprehensive perf test
Oct 23, 2025
883a3d8
Iniital puffer cpp
Oct 23, 2025
81de3c5
4.6M sps test
Oct 24, 2025
4d3cdbd
Un-cppify a bit
Oct 24, 2025
3699447
Cosine anneal lr
Oct 24, 2025
8f64e12
bf16. It is slower. Will fix after
Oct 24, 2025
bcef486
Add logs
Oct 27, 2025
c68c4c6
Progress on port -- need a full matching test from python to Cpp for …
Oct 27, 2025
d1395f1
Reproducible net init
Oct 28, 2025
c0b6cf5
almost there!
Oct 30, 2025
eeba3ed
Assert check passes
Oct 30, 2025
d38cec6
fixed the bug! Was the lstm shared init for cpp
Oct 30, 2025
e94ea35
CPU check pass
Oct 31, 2025
9c4fef7
Enable gpu. Check fails, trains, 4msps)
Oct 31, 2025
6ca97e0
bf16. Slower for now because missing kernel. May need to do 32b accum
Oct 31, 2025
5b0638c
Mingru
Nov 4, 2025
0b50ca1
Multilayer
Nov 4, 2025
d7f8880
progress
Nov 4, 2025
a8d1423
Add torch muon
Nov 4, 2025
3074858
Merge pull request #411 from PufferAI/muon
jsuarez5341 Nov 4, 2025
bc74c3f
Stable
Nov 4, 2025
676bdf2
Update sweep defaults
Nov 4, 2025
0575058
Mamba
Nov 5, 2025
faede5f
Merge branch '4.0' into mamba
jsuarez5341 Nov 5, 2025
bde228c
Merge pull request #412 from PufferAI/mamba
jsuarez5341 Nov 5, 2025
9df8eda
Testing new archs
Nov 5, 2025
35530d5
Minor
Nov 6, 2025
6efab69
Ready for sweeps
Nov 6, 2025
86bc81c
Initial mingru
Nov 6, 2025
304990c
Update configs for sweeping
Nov 7, 2025
a9926ce
merge
Nov 7, 2025
01071cb
Adam for now so we can run
Nov 7, 2025
84636a7
Initial kern
Nov 8, 2025
47e4d24
merge
Nov 8, 2025
de94e75
merge
Nov 8, 2025
0f67928
More kernels
Nov 8, 2025
effe468
test kerns
Nov 8, 2025
47a0471
decent kernels
Nov 8, 2025
5d5655a
Progress
Nov 9, 2025
77011ed
Kernels pass
Nov 9, 2025
018da4d
Fix cast
Nov 9, 2025
f1ef5e2
comment prints
Nov 9, 2025
0b7f7b4
Perfect logcoeff kernel
Nov 11, 2025
456b56f
Numerically stable fused scan
Nov 11, 2025
ba446d5
Stable train
Nov 12, 2025
b22b41d
Fix bias in entropy grad
Nov 12, 2025
6f2bd90
6.6m w/ cpu envs
Nov 12, 2025
0b63fbb
skip connect and rmsnorm makes mingru stable
Nov 13, 2025
e2cbb3e
Fix breakout; solid mingru in python, runnable cpp
Nov 13, 2025
b86792c
multilayer
Nov 13, 2025
91a0e9f
latest
Nov 14, 2025
7eb5799
nmmo3
Nov 14, 2025
fa76d1b
Initial muon (needs binds)
Nov 14, 2025
0838c91
Merge branch '4.0' of https://github.com/pufferai/pufferlib into 4.0
Nov 14, 2025
6fbc382
RMSNorm
Nov 15, 2025
52920d6
Initial dll vec
Nov 15, 2025
ef0ab9a
DLL-based training initial
Nov 15, 2025
6dbb80f
Merge branch '4.0' into 4.0-merge
jsuarez5341 Nov 17, 2025
ba605bc
Merge pull request #419 from PufferAI/4.0-merge
jsuarez5341 Nov 17, 2025
60c6e62
merge
jsuarez5341 Nov 17, 2025
da8334a
Merge branch '4.0' of https://github.com/pufferai/pufferlib into 4.0
jsuarez5341 Nov 17, 2025
1f7da1a
ready g2048 for testing
jsuarez5341 Nov 17, 2025
35dfaeb
fix optim and config
Nov 17, 2025
a6cd57a
Tweak torch muon
Nov 17, 2025
a55b5e9
Fix sweeps
Nov 17, 2025
4219ee9
Torch muon matched to heavyball muon numerics
jsuarez5341 Nov 18, 2025
4a143b1
Migrate optimizer to custom muon
jsuarez5341 Nov 18, 2025
b24f130
Fix big layers
jsuarez5341 Nov 18, 2025
804fa1c
Initial heavyball muon to cpp
jsuarez5341 Nov 18, 2025
e529fb6
Working initial vec
jsuarez5341 Nov 20, 2025
6e973cc
Initial pinned mem
jsuarez5341 Nov 20, 2025
42a5b3d
Progress on kerns
jsuarez5341 Nov 22, 2025
7904c0b
latest
jsuarez5341 Nov 25, 2025
abd7965
latest
jsuarez5341 Nov 25, 2025
87f0529
Merge branch '4.0' into merge
jsuarez5341 Nov 26, 2025
685933e
Merge pull request #427 from PufferAI/merge
jsuarez5341 Nov 26, 2025
3b32ffc
temp fixes
jsuarez5341 Nov 26, 2025
51ad295
Initial cuda buffering
jsuarez5341 Nov 26, 2025
ed81f3e
Initial buffered vec. Compiles, hangs with >1 buffer
jsuarez5341 Nov 26, 2025
f0fdb6a
temp
jsuarez5341 Nov 28, 2025
e79ba75
env c files
jsuarez5341 Nov 28, 2025
e5d66a2
20k map gen for tower climb, fast load, made pufferl work
kywch Nov 28, 2025
5eb660b
Initial buffered vec runnable
Nov 28, 2025
99daf55
Merge pull request #431 from kywch/tower-20k
jsuarez5341 Nov 28, 2025
49ee5db
Blocked perf test
jsuarez5341 Nov 29, 2025
825bc9b
works 1 buffer, not 2
jsuarez5341 Nov 29, 2025
1f8d8ac
Fix bug. Am idiot
jsuarez5341 Dec 1, 2025
17f4920
latest vec
jsuarez5341 Dec 3, 2025
1437acb
initial vec
jsuarez5341 Dec 6, 2025
d2fc984
config etc
jsuarez5341 Dec 6, 2025
430b8ed
Mutex test
Dec 18, 2025
707f0f0
Progress
Dec 19, 2025
84905f8
Working initial cpu!
Dec 19, 2025
e954a9b
Now it actually works on cpu
Dec 19, 2025
9b4d8c1
Initial gpu c vec
Dec 19, 2025
0e14de0
6m sps breakout training. A bit unstable, but we got a 22s solve
Dec 19, 2025
a932085
Still unstable but good start for profiling
Dec 19, 2025
96c3b4b
Muon optims
Dec 20, 2025
0254d42
temp
Dec 20, 2025
39f1cd8
Temp
Dec 26, 2025
f15764d
temp cudagraphs
Dec 26, 2025
ec1dc6b
single buffer graph:
Dec 26, 2025
de2fe54
Working kernel. Major bug fix. Launch on same stream as torch
Dec 27, 2025
3b53176
9m with cuda trace backward
Dec 27, 2025
be99a4f
fixed fused loss for now
Dec 27, 2025
07948e9
Initial refactor
Dec 29, 2025
7f09246
Kern/graph options
Dec 29, 2025
a829b2d
pufferlib/pufferl.py
Dec 30, 2025
1a7e3c9
Initial numerical match to unoptimized cpp
Dec 30, 2025
367c9b2
stable training with cudagraphs
Dec 31, 2025
b5accce
Major env vec bug fix
Dec 31, 2025
a8e3c4a
6m stable
Dec 31, 2025
393afe0
graph rollout copy
Dec 31, 2025
0065ce3
20s with norm, 16s (unstable) without)
Dec 31, 2025
9b8f4d0
New gate!
Jan 3, 2026
22 changes: 22 additions & 0 deletions assets/dash.css
@@ -0,0 +1,22 @@
:root {
    --font-color: #F1F1F1;
    --dropdown-bg: #005050;
}

body {
    background-color: black !important;
    color: var(--font-color) !important;
}

.rc-slider-mark-text {
    color: var(--font-color) !important;
}

.Select-control, .Select-menu-outer, .Select-value-label, .Select-option {
    color: var(--font-color) !important;
    background-color: var(--dropdown-bg) !important;
}

h1, h2, h3, h4, h5, h6 {
    color: var(--font-color) !important;
}
185 changes: 185 additions & 0 deletions cache_data.py
@@ -0,0 +1,185 @@
import numpy as np

import json
import glob
import os


env_names = sorted([
    'breakout',
    'impulse_wars',
    'pacman',
    'tetris',
    'g2048',
    'moba',
    'pong',
    'tower_climb',
    'grid',
    'nmmo3',
    'snake',
    'tripletriad'
])

HYPERS = [
    'train/learning_rate',
    'train/ent_coef',
    'train/gamma',
    'train/gae_lambda',
    'train/vtrace_rho_clip',
    'train/vtrace_c_clip',
    'train/clip_coef',
    'train/vf_clip_coef',
    'train/vf_coef',
    'train/max_grad_norm',
    'train/adam_beta1',
    'train/adam_beta2',
    'train/adam_eps',
    'train/prio_alpha',
    'train/prio_beta0',
    'train/bptt_horizon',
    'train/num_minibatches',
    'train/minibatch_size',
    'policy/hidden_size',
    'env/num_envs',
]

ALL_KEYS = [
    'agent_steps',
    'cost',
    'environment/score',
    'environment/perf'
] + HYPERS

def pareto_idx(steps, costs, scores):
    idxs = []
    for i in range(len(steps)):
        better = [scores[j] >= scores[i] and
            costs[j] < costs[i] and steps[j] < steps[i]
            for j in range(len(scores))]
        if not any(better):
            idxs.append(i)

    return idxs

def load_sweep_data(path):
    data = {}
    keys = None
    for fpath in glob.glob(path):
        if 'cache.json' in fpath:
            continue

        with open(fpath, 'r') as f:
            exp = json.load(f)

        if not data:
            for kk in exp.keys():
                if kk == 'data':
                    for k, v in exp[kk][-1].items():
                        data[k] = []
                else:
                    data[kk] = []

        discard = False
        for kk in list(data.keys()):
            if kk not in exp and kk not in exp['data'][-1]:
                discard = True
                break

        if discard:
            continue

        for kk in list(data.keys()):
            if kk in exp:
                v = exp[kk]
                sweep_key = f'sweep/{kk}/distribution'
                if sweep_key in data and exp[sweep_key] == 'logit_normal':
                    v = 1 - v
                elif kk in ('train/vtrace_rho_clip', 'train/vtrace_c_clip'):
                    v = max(v, 0.1)

                data[kk].append(v)
            else:
                data[kk].append(exp['data'][-1][kk])

    steps = data['agent_steps']
    costs = data['cost']
    scores = data['environment/score']

    idxs = pareto_idx(steps, costs, scores)

    # Filter to pareto
    for k in data:
        data[k] = [data[k][i] for i in idxs]

    # Monkey patch: Cap performance
    data['environment/perf'] = [min(e, 1.0) for e in data['environment/perf']]

    # Monkey patch: Adjust steps by frameskip if present
    if 'env/frameskip' in data:
        skip = data['env/frameskip']
        data['agent_steps'] = [n*m for n, m in zip(data['agent_steps'], skip)]

    return data

def cached_sweep_load(path, env_name):
    cache_file = os.path.join(path, 'c_cache.json')
    if not os.path.exists(cache_file):
        data = load_sweep_data(os.path.join(path, '*.json'))
        with open(cache_file, 'w') as f:
            json.dump(data, f)

    with open(cache_file, 'r') as f:
        data = json.load(f)

    print(f'Loaded {env_name}')
    return data

def compute_tsne():
    data = {name: cached_sweep_load(f'experiments/logs/puffer_{name}', name) for name in env_names}

    flat = []
    flat_mmin = []
    flat_mmax = []
    for env in env_names:
        flat.append(np.stack([data[env][hyper] for hyper in HYPERS], axis=1))
        flat_mmin.append(np.stack([data[env][f'sweep/{hyper}/min'] for hyper in HYPERS], axis=1))
        flat_mmax.append(np.stack([data[env][f'sweep/{hyper}/max'] for hyper in HYPERS], axis=1))

    flat_distribution = [data[env][f'sweep/{hyper}/distribution'] for env in env_names for hyper in HYPERS]

    flat = np.concatenate(flat, axis=0)
    flat_mmin = np.concatenate(flat_mmin, axis=0).min(axis=0)
    flat_mmax = np.concatenate(flat_mmax, axis=0).max(axis=0)

    normed = flat.copy()
    for i in range(len(HYPERS)):
        dist = flat_distribution[i]
        if 'log' in dist or 'pow2' in dist:
            flat_mmin[i] = np.log(flat_mmin[i])
            flat_mmax[i] = np.log(flat_mmax[i])
            normed[:, i] = np.log(flat[:, i])

        normed[:, i] = (normed[:, i] - flat_mmin[i]) / (flat_mmax[i] - flat_mmin[i])

    from sklearn.manifold import TSNE
    proj = TSNE(n_components=2)
    reduced = proj.fit_transform(normed)

    row = 0
    for env in env_names:
        '''
        for i, hyper in enumerate(HYPERS):
            sz = len(data[env][hyper])
            data[env][hyper] = normed[row:row+sz, i].tolist()
        '''
        sz = len(data[env]['agent_steps'])

        data[env] = {k: v for k, v in data[env].items() if k in ALL_KEYS}
        data[env]['tsne1'] = reduced[row:row+sz, 0].tolist()
        data[env]['tsne2'] = reduced[row:row+sz, 1].tolist()
        row += sz

    json.dump(data, open('all_cache.json', 'w'))

if __name__ == '__main__':
    compute_tsne()
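
Note on `pareto_idx` in `cache_data.py`: the filter drops a run only when some other run scores at least as well while being strictly cheaper and strictly shorter in agent steps. The sketch below restates that rule on made-up numbers to show which runs survive. The toy values and the `front` variable are illustrative assumptions, not data from any sweep, and the function is a behavior-equivalent restatement rather than the file's exact text.

```python
def pareto_idx(steps, costs, scores):
    """Return indices of runs that no other run dominates.

    Run j dominates run i when its score is at least as high and it is
    strictly better on both cost and agent steps.
    """
    idxs = []
    for i in range(len(steps)):
        dominated = any(
            scores[j] >= scores[i]
            and costs[j] < costs[i]
            and steps[j] < steps[i]
            for j in range(len(scores))
        )
        if not dominated:
            idxs.append(i)
    return idxs


# Hypothetical toy sweep: four runs described by (steps, cost, score).
steps = [100, 200, 150, 300]
costs = [1.0, 2.0, 1.5, 3.0]
scores = [0.5, 0.9, 0.4, 0.9]

front = pareto_idx(steps, costs, scores)
print(front)  # [0, 1]
```

Run 2 is removed because run 0 scores higher with lower cost and fewer steps, and run 3 is removed because run 1 matches its score at lower cost and fewer steps. Because the cost and step comparisons are strict, a run that ties another on either axis is never dropped by that other run, so the filtered set can be larger than a textbook Pareto front.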