-
Notifications
You must be signed in to change notification settings - Fork 14
Add map_sources option for mixing maps from multiple directories #241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 3.0
Are you sure you want to change the base?
Conversation
Allows training with maps from multiple sources with configurable weights. Format: "path1:weight1,path2:weight2,..." Example: --env.map-sources "resources/drive/binaries/training:0.75,resources/drive/binaries/carla:0.25" This creates a temp directory with symlinks to maps from each source, sampled according to the specified weights. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Greptile SummaryThis PR adds a Key Changes:
Issues Identified:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Drive.__init__
participant create_mixed_map_directory
participant Filesystem
participant binding.shared
User->>Drive.__init__: Initialize with map_sources param
alt map_sources provided
Drive.__init__->>create_mixed_map_directory: Parse map_sources_str, num_maps
create_mixed_map_directory->>create_mixed_map_directory: Parse and normalize weights
create_mixed_map_directory->>create_mixed_map_directory: Calculate map counts per source
create_mixed_map_directory->>Filesystem: Create temp directory
loop For each source
create_mixed_map_directory->>Filesystem: List .bin files
create_mixed_map_directory->>create_mixed_map_directory: Sample maps (with/without replacement)
end
create_mixed_map_directory->>create_mixed_map_directory: Shuffle selected maps
loop For each selected map
create_mixed_map_directory->>Filesystem: Create symlink in temp dir
end
create_mixed_map_directory->>Filesystem: Register atexit cleanup handler
create_mixed_map_directory-->>Drive.__init__: Return temp_dir path
Drive.__init__->>Drive.__init__: Update self.map_dir to temp_dir
else map_dir provided
Drive.__init__->>Filesystem: Check map_dir exists
Drive.__init__->>Filesystem: Count available .bin files
Drive.__init__->>Drive.__init__: Validate num_maps vs available
end
Drive.__init__->>binding.shared: Initialize with map_dir, num_maps
binding.shared-->>Drive.__init__: Return agent_offsets, map_ids, num_envs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, 3 comments
| map_counts.append(count) | ||
|
|
||
| # Create temp directory | ||
| temp_dir = tempfile.mkdtemp(prefix="pufferdrive_maps_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Temp directory created but never cleaned up - will accumulate in /tmp over multiple runs. Consider using tempfile.TemporaryDirectory() context manager or add cleanup in the close() method
Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/drive.py
Line: 54:54
Comment:
**logic:** Temp directory created but never cleaned up - will accumulate in `/tmp` over multiple runs. Consider using `tempfile.TemporaryDirectory()` context manager or add cleanup in the `close()` method
How can I resolve this? If you propose a fix, please make it concise.
pufferlib/ocean/drive/drive.py
Outdated
| Path to temp directory containing symlinked maps | ||
| """ | ||
| if seed is not None: | ||
| random.seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Using module-level random.seed() affects global random state - may cause unexpected behavior if other code uses random module. Use random.Random(seed) instance instead
rng = random.Random(seed) if seed is not None else random.Random()
# Then use rng.choices() and rng.shuffle() insteadPrompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/drive.py
Line: 27:27
Comment:
**logic:** Using module-level `random.seed()` affects global random state - may cause unexpected behavior if other code uses `random` module. Use `random.Random(seed)` instance instead
```python
rng = random.Random(seed) if seed is not None else random.Random()
# Then use rng.choices() and rng.shuffle() instead
```
How can I resolve this? If you propose a fix, please make it concise.Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (1)
-
tests/test_map_sources.py, line 94-113 (link)style: Test is non-deterministic due to lack of seeding in
create_mixed_map_directory(). The assertion on line 112-113 could fail if random sampling produces different distributions. Consider either:- Adding seed support to the function and seeding this test
- Using a tolerance-based check instead of exact equality
4 files reviewed, 3 comments
| sampled = random.sample(available, k=count) | ||
| else: | ||
| sampled = random.choices(available, k=count) | ||
| for map_file in sampled: | ||
| selected_maps.append(os.path.join(path, map_file)) | ||
|
|
||
| # Shuffle to mix sources | ||
| random.shuffle(selected_maps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Missing seed parameter for reproducibility - random.sample(), random.choices(), and random.shuffle() use global random state. Different runs with same config will produce different map selections.
Consider adding a seed parameter to create_mixed_map_directory() and using random.Random(seed) instance:
def create_mixed_map_directory(map_sources_str, num_maps, seed=None):
rng = random.Random(seed) if seed is not None else random.Random()
# Then use rng.sample(), rng.choices(), rng.shuffle()Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/drive.py
Line: 64:71
Comment:
**logic:** Missing seed parameter for reproducibility - `random.sample()`, `random.choices()`, and `random.shuffle()` use global random state. Different runs with same config will produce different map selections.
Consider adding a `seed` parameter to `create_mixed_map_directory()` and using `random.Random(seed)` instance:
```python
def create_mixed_map_directory(map_sources_str, num_maps, seed=None):
rng = random.Random(seed) if seed is not None else random.Random()
# Then use rng.sample(), rng.choices(), rng.shuffle()
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| map_dir = create_mixed_map_directory(map_sources, num_maps) | ||
| self.map_dir = map_dir # Update instance variable to point to temp directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Need to pass seed to create_mixed_map_directory() for reproducible map selection:
| map_dir = create_mixed_map_directory(map_sources, num_maps) | |
| self.map_dir = map_dir # Update instance variable to point to temp directory | |
| map_dir = create_mixed_map_directory(map_sources, num_maps, seed) | |
| self.map_dir = map_dir # Update instance variable to point to temp directory |
Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/drive.py
Line: 212:213
Comment:
**logic:** Need to pass seed to `create_mixed_map_directory()` for reproducible map selection:
```suggestion
map_dir = create_mixed_map_directory(map_sources, num_maps, seed)
self.map_dir = map_dir # Update instance variable to point to temp directory
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
map_sourcesconfig option for training with maps from multiple directories"path1:weight1,path2:weight2,..."Example Usage
puffer train puffer_drive --env.map-sources "resources/drive/binaries/training:0.75,resources/drive/binaries/carla:0.25"Output:
Test plan
🤖 Generated together with Claude Code