84 changes: 84 additions & 0 deletions README.md
@@ -21,3 +21,87 @@
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=pufferai/pufferlib&type=Date" />
</picture>
</a>

## Installing required software
This project makes heavy use of C to speed up operations in Python, so packages for building native extensions are required:
```bash
sudo apt update && sudo apt install -y git curl wget nano software-properties-common build-essential python3-dev
```
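
As a quick sanity check, you can confirm that the compiler and the Python headers landed where the build expects them (exact version output will vary by system):
```bash
gcc --version
python3-config --includes
```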

Make sure that your NVIDIA drivers are configured correctly so that your GPU can be used to accelerate the RL training. It also helps to have NVCC, the CUDA compiler, installed: it enables better performance by compiling some PufferLib-specific kernels, but this is optional. You can check whether nvcc is installed by running:
```bash
nvcc -V
```

If nvcc or any NVIDIA drivers are missing, you can install them using the command below. There are also [alternative installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) for the CUDA Toolkit.
```bash
sudo apt -y install nvidia-cuda-toolkit
```
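
To confirm that the driver itself is working, independently of the compiler, `nvidia-smi` should list your GPU and driver version:
```bash
nvidia-smi
```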

UV is the preferred package manager for this project, but you are free to just use pip and suffer. There are [alternative installation instructions](https://docs.astral.sh/uv/getting-started/installation/) if the script below does not suit you:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
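
After the installer finishes (you may need to open a new shell so the PATH update takes effect), verify that uv is available:
```bash
uv --version
```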

The preferred compiler toolchain for PufferLib is the latest stable version of clang. You can also download the executable directly from the [LLVM releases page](https://releases.llvm.org/):
```bash
bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
```
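
If you would rather pin a specific major version instead of taking the latest, the same script accepts a version argument, e.g. for clang-20 (matching the alias used below):
```bash
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 20
```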

On Ubuntu, clang is installed as `clang-<version-number>`, e.g. clang-20, but this project expects the executable to be named clang, so it's best to introduce an alias in ~/.bashrc or another file that is preloaded in your preferred shell:
```bash
alias clang="clang-20"
```
and reload the file
```bash
source ~/.bashrc
```
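
Note that aliases only take effect in interactive shells. If you want `clang` to resolve from scripts as well, one alternative is a symlink (a sketch; adjust the version number to match your install):
```bash
sudo ln -sf "$(command -v clang-20)" /usr/local/bin/clang
```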

## Running your first Reinforcement Learning experiment
Now that you have all of the required software, you can start experimenting with PufferLib. First, clone the repository:

```bash
git clone https://github.com/PufferAI/PufferLib.git && cd PufferLib
```

Now you can create a virtual environment and activate it.
```bash
uv venv && source .venv/bin/activate
```

Install the package locally in editable mode. This also installs, in this folder, the appropriate versions of [Raylib](https://www.raylib.com/), a minimalistic library for building video games in C, and [Box2D](https://box2d.org/), a physics engine for 2D games. This can take a while because the CUDA dependencies are large (over 1 GB) and compiling the custom kernels, if enabled, also takes some time.
```bash
uv pip install -e .
```
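
A minimal smoke test for the install is importing the package from the activated venv:
```bash
python -c "import pufferlib; print('pufferlib imported ok')"
```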

Now you can compile your first RL environment. The build_ocean.sh script builds Ocean RL environments, PufferLib's native environment framework. "target" is the name of the environment; you can view its files, as well as the other Ocean environments, under pufferlib/ocean/target. Each environment is configured by a .ini file that specifies the name, RL policy, and training configuration; the one for the target env is located at config/ocean/target.ini. "local" is the build type: local builds contain debug symbols, use an address sanitizer, and let you verify that the environment works as you intend, while a production version can be compiled using the "fast" build. You can also build a "web" version of the env, which generates an HTML page with WebAssembly.


```bash
scripts/build_ocean.sh target local
```
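
The other build types described above follow the same pattern, for example:
```bash
scripts/build_ocean.sh target fast  # optimized production build
scripts/build_ocean.sh target web   # WebAssembly build that generates an HTML page
```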

Then you can run the executable created by the local build, which demonstrates the environment:
```bash
./target
```

If you see pufferfish chasing the stars, everything works correctly. This demo loads a neural net that has already been trained; that is why the fish chase the stars rather than bouncing around aimlessly, as they would if the weights were selected at random.

## Training your first neural net using Reinforcement Learning
To train the env, we use the puffer train command with the env_name from the config, rather than the file-name-based one that was used for building:
```bash
puffer train puffer_target
```
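
You can list every available option for a given command and environment with --help:
```bash
puffer train puffer_target --help
```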

After you begin training, you should see the policy_loss decrease over time. This means the fish are getting better at obtaining the reward, which comes from eating the stars. You should also see the episode_length decrease, meaning the fish eat all of the stars faster over time because they get better at heading to the closest one. Finally, the explained_variance should increase, which means the value function explains a growing fraction of the variance in the returns: the situation in the environment becomes less random and more dependent on the trained policy as the fish get better at following the stars.

Now you can export your learned weights so that they can be used in the environment demo. By default, this command exports the latest checkpoint for the specified environment from the experiments/ folder. If you wish to export a specific one, use the "--load_model_path" option.

```bash
puffer export puffer_target
```
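
For example, to export a specific checkpoint instead of the latest (the path below is a hypothetical placeholder; substitute a real checkpoint from your experiments/ folder):
```bash
# <your_run> and <checkpoint> are placeholders, not real paths
puffer export puffer_target --load_model_path experiments/<your_run>/<checkpoint>.pt
```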

This should generate a puffer_target_weights.bin file that contains all of the learned weights of the neural net for this environment. Now you can see how these weights behave in the demo. You need to edit the weight load path in pufferlib/ocean/target/target.c at line 21 from "resources/target/target_weights.bin" to the "puffer_target_weights.bin" that you trained. After recompiling the environment, you should notice that the fish are somewhat dumber than the ones that come with this env by default. You can now try changing the config (e.g. training for longer or adjusting other training params) and see how that impacts the behavior.
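
If you prefer to script that edit, something like the following should work (a sketch; it rewrites the path string in place, so double-check target.c afterwards):
```bash
sed -i 's|resources/target/target_weights.bin|puffer_target_weights.bin|' pufferlib/ocean/target/target.c
scripts/build_ocean.sh target local && ./target
```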

As a side note, pufferlib/ocean/target/target.c is just a demonstration that lets you see how your weights behave; it is not required for training. All of the code required to train the env is located in pufferlib/ocean/target/target.h (the C code), pufferlib/ocean/target/binding.c (some C code exposed to Python), and pufferlib/ocean/target/target.py (the Python env that calls functions from binding.c, target.h, and PufferLib).
14 changes: 10 additions & 4 deletions pufferlib/pufferl.py
@@ -53,6 +53,7 @@
# Assume advantage kernel has been built if torch has been compiled with CUDA or HIP support
# and can find CUDA or HIP in the system
ADVANTAGE_CUDA = bool(CUDA_HOME or ROCM_HOME)
HELP_MESSAGE = 'Usage: puffer [train, eval, sweep, autotune, profile, export] [env_name] [optional args].\nYou can access help for specific command and environment using: puffer [train, eval, sweep, autotune, profile, export] [env_name] --help e.g. puffer train puffer_target --help'

class PuffeRL:
    def __init__(self, config, vecenv, policy, logger=None):
@@ -1195,7 +1196,11 @@ def load_config(env_name, parser=None):
        p.read([puffer_default_config, path])
        if env_name in p['base']['env_name'].split(): break
    else:
        raise pufferlib.APIUsageError('No config for env_name {}'.format(env_name))
        if env_name == "--help":
            print(HELP_MESSAGE)
            exit(0)
        else:
            raise pufferlib.APIUsageError('No config for env_name {}'.format(env_name))

    return process_config(p, parser=parser)

@@ -1285,9 +1290,9 @@ def auto_type(value):
    return args

def main():
    err = 'Usage: puffer [train, eval, sweep, autotune, profile, export] [env_name] [optional args]. --help for more info'
    if len(sys.argv) < 3:
        raise pufferlib.APIUsageError(err)
        print(HELP_MESSAGE)
        return

    mode = sys.argv.pop(1)
    env_name = sys.argv.pop(1)
@@ -1304,7 +1309,8 @@ def main():
    elif mode == 'export':
        export(env_name=env_name)
    else:
        raise pufferlib.APIUsageError(err)
        print(HELP_MESSAGE)
        return

if __name__ == '__main__':
    main()
2 changes: 1 addition & 1 deletion scripts/build_ocean.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/bash -i
@elevatorguy commented on Nov 22, 2025:

PufferLib aside, some systems use dash as the default shell instead of bash, so a compile or build script may rely on ~/.bashrc being loaded, for example for oneAPI initialization; thanks.


# Usage: ./build_env.sh pong [local|fast|web]

2 changes: 1 addition & 1 deletion scripts/build_simple.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/bash -i

# Usage: ./build.sh your_file.c [debug|release]
