diff --git a/README.md b/README.md
index dc1bfaa81..11ef6e405 100644
--- a/README.md
+++ b/README.md
@@ -21,3 +21,87 @@ All of our documentation is hosted at [puffer.ai](https://puffer.ai "PufferLib D
 Star History Chart
+
+## Installing required software
+This project makes heavy use of C to speed up operations in Python, so packages for building native extensions are required.
+```bash
+sudo apt update && sudo apt install -y git curl wget nano software-properties-common build-essential python3-dev
+```
+
+Make sure that your NVIDIA drivers are configured correctly so that your GPU can be used to accelerate RL training. Having NVCC, the CUDA compiler, installed enables better performance by compiling some PufferLib-specific kernels, but this is optional. You can check whether nvcc is installed by running:
+```bash
+nvcc -V
+```
+
+If nvcc or some NVIDIA drivers are missing, you can install them using the command below. Here are [alternative installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) for the CUDA Toolkit.
+```bash
+sudo apt -y install nvidia-cuda-toolkit
+```
+
+uv is the preferred package manager for this project, but you are free to just use pip and suffer. Here are [alternative installation instructions](https://docs.astral.sh/uv/getting-started/installation/).
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+The preferred compiler toolchain for PufferLib is the latest stable version of clang. You can also download the executable directly from the [LLVM releases page](https://releases.llvm.org/).
+```bash
+bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
+```
+
+On Ubuntu, clang is installed as clang-<version>, e.g. clang-20, but this project expects the executable to be named clang, so it's best to introduce an alias in ~/.bashrc or another file that is preloaded by your preferred shell:
+```bash
+alias clang="clang-20"
+```
+and reload the file:
+```bash
+source ~/.bashrc
+```
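+
+Note that an alias defined in ~/.bashrc is only expanded by interactive shells. If you want a plain clang that also resolves in scripts, a symlink is an alternative; this is a sketch that assumes apt installed clang-20 and that /usr/local/bin is on your PATH, so adjust the version number to match your system.
+```bash
+# Alternative to the alias: symlink the versioned binary (assumes clang-20).
+sudo ln -sf "$(which clang-20)" /usr/local/bin/clang
+```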
+
+## Running your first Reinforcement Learning experiment
+Now that you have all of the required software, you can start experimenting with PufferLib. Start by cloning the repository:
+```bash
+git clone https://github.com/PufferAI/PufferLib.git && cd PufferLib
+```
+
+Now you can create a virtual environment and activate it.
+```bash
+uv venv && source .venv/bin/activate
+```
+
+Install the local package. This also installs into this folder the appropriate versions of [Raylib](https://www.raylib.com/), a minimalistic library for building video games in C, and [Box2D](https://box2d.org/), a physics engine for 2D games. This can take a while because the CUDA dependencies are large (over 1 GB), and compiling the custom kernels, if enabled, also takes some time.
+```bash
+uv pip install -e .
+```
+
+Now you can compile your first RL environment. The build_ocean.sh script builds Ocean RL environments, PufferLib's native suite. "target" is the name of the environment; you can view its files, as well as the other Ocean environments, at pufferlib/ocean/target. The environment is configured by a .ini config file which specifies the name, RL policy, and training configuration; the one for the target env is located at config/ocean/target.ini. "local" is the type of the build. Local builds contain debug symbols, enable the address sanitizer, and let you verify that the environment works as you intend; a production version can be compiled using the "fast" build. You can also build a "web" version of the env, which generates an HTML page with WebAssembly.
+```bash
+scripts/build_ocean.sh target local
+```
+
+Then you can run the created executable, which demonstrates the environment.
+```bash
+./target
+```
+
+If you see pufferfish chasing the stars, everything works correctly. This demo loads a neural net that has already been trained, which is why the fish chase the stars rather than bounce around aimlessly as they would if the weights were selected at random.
+
+## Training your first neural net using Reinforcement Learning
+To train the env, use the puffer train command with the env_name from the config (puffer_target) rather than the file-name-based one (target) that was used for building.
+```bash
+puffer train puffer_target
+```
+
+After you begin training, you should see the policy_loss decrease over time. This means that the fish are getting better at obtaining the reward, which comes from eating the stars. You should also see the episode_length decrease, which means that over time the fish eat all of the stars faster because they get better at running to the closest one. Finally, the explained_variance increases, which means the value function is getting better at predicting the returns: the situation in the environment becomes less random and more dependent on the trained policy as the fish get better at following the stars.
+
+Now you can export your learned weights so that they can be used in the environment demo. By default, this command exports the latest version for the specified environment from the experiments/ folder. If you wish to export a specific one, use the "--load_model_path" option.
+```bash
+puffer export puffer_target
+```
+
+This should generate a puffer_target_weights.bin file which contains all of the learned weights of the neural net for this environment. Now you can see how these weights behave in real life. You need to edit the load path for the demo's weights in pufferlib/ocean/target/target.c at line 21, from "resources/target/target_weights.bin" to the "puffer_target_weights.bin" you just trained. After recompiling the environment, you should notice that the fish are somewhat dumber than the ones which come by default for this env. You can now try changing the config (e.g. training for longer or changing other training params) and see how that impacts the behavior.
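+
+If you prefer to make that edit from the shell, something like the following should work; it's a sketch that assumes the default path string appears verbatim in target.c.
+```bash
+# Point the demo at the freshly exported weights, then rebuild and rerun.
+sed -i 's|resources/target/target_weights.bin|puffer_target_weights.bin|' pufferlib/ocean/target/target.c
+scripts/build_ocean.sh target local
+./target
+```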
+
+Just as a side note, pufferlib/ocean/target/target.c is just a demonstration that lets you see how your weights behave; it's not required for training. All of the code that is required to train the env is located in pufferlib/ocean/target/target.h (the C code), pufferlib/ocean/target/binding.c (C code exposed to Python), and pufferlib/ocean/target/target.py (the Python env that calls functions from binding.c, target.h, and PufferLib).
diff --git a/pufferlib/pufferl.py b/pufferlib/pufferl.py
index 5fe9c14a2..502aabf67 100644
--- a/pufferlib/pufferl.py
+++ b/pufferlib/pufferl.py
@@ -53,6 +53,7 @@
 # Assume advantage kernel has been built if torch has been compiled with CUDA or HIP support
 # and can find CUDA or HIP in the system
 ADVANTAGE_CUDA = bool(CUDA_HOME or ROCM_HOME)
+HELP_MESSAGE = 'Usage: puffer [train, eval, sweep, autotune, profile, export] [env_name] [optional args].\nYou can access help for a specific command and environment using: puffer [train, eval, sweep, autotune, profile, export] [env_name] --help, e.g. puffer train puffer_target --help'
 
 class PuffeRL:
     def __init__(self, config, vecenv, policy, logger=None):
@@ -1195,7 +1196,11 @@ def load_config(env_name, parser=None):
         p.read([puffer_default_config, path])
         if env_name in p['base']['env_name'].split():
             break
     else:
-        raise pufferlib.APIUsageError('No config for env_name {}'.format(env_name))
+        if env_name == '--help':
+            print(HELP_MESSAGE)
+            sys.exit(0)
+        else:
+            raise pufferlib.APIUsageError('No config for env_name {}'.format(env_name))
 
     return process_config(p, parser=parser)
@@ -1285,9 +1290,9 @@ def auto_type(value):
     return args
 
 def main():
-    err = 'Usage: puffer [train, eval, sweep, autotune, profile, export] [env_name] [optional args]. --help for more info'
     if len(sys.argv) < 3:
-        raise pufferlib.APIUsageError(err)
+        print(HELP_MESSAGE)
+        return
 
     mode = sys.argv.pop(1)
     env_name = sys.argv.pop(1)
@@ -1304,7 +1309,8 @@ def main():
     elif mode == 'export':
         export(env_name=env_name)
     else:
-        raise pufferlib.APIUsageError(err)
+        print(HELP_MESSAGE)
+        return
 
 if __name__ == '__main__':
     main()
diff --git a/scripts/build_ocean.sh b/scripts/build_ocean.sh
index 88909d44f..255478fe0 100755
--- a/scripts/build_ocean.sh
+++ b/scripts/build_ocean.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/bash -i
 
 # Usage: ./build_env.sh pong [local|fast|web]
diff --git a/scripts/build_simple.sh b/scripts/build_simple.sh
index 8d0711370..a65d4ebe3 100644
--- a/scripts/build_simple.sh
+++ b/scripts/build_simple.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/bash -i
 
 # Usage: ./build.sh your_file.c [debug|release]