This repository follows a standard setup I use for data science projects, which includes:
- A research compendium layout, including a local Python package (see File Structure).
- Visual Studio Code (VSC) as the preferred IDE, with recommended extensions.
- A VS Code Dev Container, powered by Docker, as a reproducible development environment (using a Debian image).
- pre-commit to manage git hooks.
- Python tooling:
  - Black for code formatting (pre-commit and VSC extension). In addition, I mostly follow the Google style guide.
  - Ruff (pre-commit and VSC extension) for linting.
  - mypy for type checking (VSC extension).
  - uv to compile requirements.
  - pdoc to generate API documentation (including a pre-commit hook for generating local documentation). Python docstrings are written following the Google docstring format, with the help of the autoDocstring VSC extension.
  - pytest for testing, with doctest enabled (see the docstring example after this list).
  - Automatic versioning of the local package from git tags via setuptools_scm, following semantic versioning.
- SQLFluff as a formatter and linter for SQL files (pre-commit and VSC extension).
- prettier (VSC extension) as a formatter for YAML, JSON and Markdown files.
- markdownlint (VSC extension) as a linter for Markdown files.
- Taplo (VSC extension) as a formatter for TOML files.
- shfmt (VSC extension) as a formatter for shell scripts.
- SonarLint (VSC extension) as an additional multi-language linter.
- typos (VSC extension) as a code spell checker.
- A Makefile to provide an interface to common tasks (see Make commands).
- Conventional commit messages (enforced by pre-commit).
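As an illustration of the docstring and testing conventions above, here is a minimal, made-up function with a Google-style docstring whose example doubles as a doctest; pytest collects such examples when doctest support is enabled (e.g. via `--doctest-modules`):

```python
def scale(values: list[float], factor: float = 2.0) -> list[float]:
    """Multiply each value by a constant factor.

    Args:
        values: Numbers to scale.
        factor: Multiplicative factor applied to each value.

    Returns:
        A new list with the scaled values.

    Examples:
        >>> scale([1.0, 2.0], factor=3.0)
        [3.0, 6.0]
    """
    return [v * factor for v in values]
```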
```
.
├── analysis/                 # Analysis scripts and notebooks
├── data/                     # Data files (usually git ignored)
├── docs/                     # API documentation (git ignored)
├── results/                  # Output files: figures, tables, etc. (git ignored)
├── scripts/                  # Utility scripts (e.g. env setup)
├── src/                      # Local Python package
│   ├── __init__.py
│   └── config.py             # Configs, constants, settings
├── tests/                    # Unit tests for src/
│   └── test_*.py
├── .devcontainer/            # VS Code Dev Container setup
├── .vscode/                  # VS Code settings and extensions
├── Dockerfile                # Dockerfile used for dev container
├── Makefile                  # Utility commands (docs, env, versioning)
├── pyproject.toml            # Configs for package, tools (Ruff, mypy, etc.) and direct deps
├── requirements.txt          # Pinned dependencies (generated)
├── taplo.toml                # Configs for TOML formatter
├── .editorconfig             # Configs for shell formatter
├── .pre-commit-config.yaml   # Configs for pre-commit
└── .sqlfluff                 # Configs for SQLFluff
```
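For illustration, `src/config.py` centralises the paths, constants and settings shared across the project. A minimal sketch, assuming the conventional directories from the tree above (the exact constants will vary per project), could look like:

```python
"""Project configuration: paths, constants and settings."""

from pathlib import Path

# Repository root, resolved relative to this file (src/config.py).
ROOT_DIR = Path(__file__).resolve().parents[1]

# Conventional locations from the file structure above.
DATA_DIR = ROOT_DIR / "data"
RES_DIR = ROOT_DIR / "results"

# Illustrative project-wide setting.
RANDOM_SEED = 42
```

Constants such as `RES_DIR` can then be reused elsewhere, for example in the MLflow snippet further below.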
The preferred development environment for this project is a VS Code Dev Container, which provides a consistent and reproducible setup using Docker.
- Install and launch Docker.
- Install VS Code and open the project in it.
- Open the container via the command palette (`Ctrl + Shift + P`) by searching for "Dev Containers: Open Folder in Container...".
The dependencies specified in requirements.txt are automatically installed in the container and the local package is available in editable mode.
If needed, the container can be rebuilt by searching for "Dev Containers: Rebuild Container...".
For more details regarding Dev Containers, or alternative environment setups (venv, Conda, etc.), please refer to DEVELOPMENT.md.
Regardless of the environment, install the Git hooks after setup with `pre-commit install` to ensure the code is automatically linted and formatted on commit.
Requirements are managed with:
- `pyproject.toml` to list direct dependencies of the `src` package and development dependencies (e.g. for the analysis).
- `requirements.txt` to pin all dependencies (direct and indirect). This file is automatically generated with uv and is used to fully recreate the environment.
The local package (`src`) is not included in `requirements.txt`, so installation is a two-step process.
- Initial setup or adding new direct dependencies:
  - Add dependencies to `pyproject.toml`.
  - Run `make reqs` to compile `requirements.txt`.
- Upgrading packages: compile new requirements with `uv pip compile pyproject.toml -o requirements.txt --all-extras --upgrade`, then run `make deps`.

Finally, run `make deps` to install the pinned dependencies and the local package in editable mode.
Common utility commands are available via the Makefile, including:
- `make reqs`: Compile `requirements.txt` from `pyproject.toml`.
- `make deps`: Install requirements and the local package.
- `make docs`: Generate the package documentation.
- `make tag`: Create and push a new Git tag by incrementing the version.
- `make venv`: Set up a venv environment (see `DEVELOPMENT.md`).
The full list of targets can be displayed with `make help`.
Delete this section after initialising a project from the template.
This template aims to be relatively lightweight and tailored to my needs. It is therefore opinionated and in constant evolution, reflecting my data science journey with Python. It is also worth noting that this template focuses more on experimentation than on sharing a single final product.
- Initialise your GitHub repository with this template. Alternatively, fork (or copy the content of) this repository.
- Update:
  - the project metadata in `pyproject.toml`, such as the description and the authors.
  - the repository name (if the template was forked).
  - the README (title, badges, sections).
  - the license.
- Set up your preferred development environment (see Development Environment).
- Specify, compile and install your requirements (see Managing requirements).
- Adjust the configurations to your needs (e.g. the Python configuration in `src/config.py`, the SQL dialect in `.sqlfluff`, etc.).
- Add a git tag for the initial version with `git tag -a v0.1.0 -m "Initial setup"`, and push it with `git push origin --tags`. Alternatively, use `make tag`.
- (Optional) Update pre-commit with `pre-commit autoupdate`.
The `src/` package could contain the following modules or sub-packages, depending on the project:
- `utils` for utility functions.
- `data_processing`, `data` or `datasets` for data processing functions.
- `features` for extracting features.
- `models` for defining models.
- `evaluation` for evaluating performance.
- `plots` for plotting functions.
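Because the package is installed in editable mode, analysis scripts and notebooks can import these modules directly; a hypothetical sketch (the helper and constant names are placeholders, not part of the template):

```python
# Hypothetical imports from the local package; adapt to the modules you create.
from src import config
from src.utils import set_seed  # placeholder helper

set_seed(config.RANDOM_SEED)  # placeholder constant
```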
The repository structure could be extended with:
- `models/` to store model files.
- subfolders in `data/`, such as `data/raw/` for storing raw data.
MLflow can be used as a tool to track Machine Learning experiments.
Often, MLflow will be configured so that results are saved to a remote database and artifact store.
If this is not the case, the following can be added in `src/config.py` to set up local storage for MLflow experiments:

```python
import os

# Assumes RES_DIR (the results/ directory) is defined earlier in config.py.
MLRUNS_DIR = RES_DIR / "mlruns"
TRACKING_URI = "file:///" + MLRUNS_DIR.as_posix()
os.environ["MLFLOW_TRACKING_URI"] = TRACKING_URI
```

Then, the MLflow UI can be launched with:

```sh
mlflow ui --backend-store-uri file:///path/to/results/mlruns
```

For a slightly more elaborate setup running an MLflow server with a local database and artifact store as part of a Dev Container, see ghurault/mlflow-devcontainer.
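To sketch how experiments might then be tracked (the experiment, parameter and metric names below are made up, and `src.config` is imported first so that the tracking URI above is set):

```python
import mlflow

from src import config  # noqa: F401  (sets MLFLOW_TRACKING_URI as a side effect)

# Hypothetical experiment, parameter and metric names, for illustration only.
mlflow.set_experiment("baseline-model")
with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("rmse", 0.42)
```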
Configurations, such as credentials, can be loaded from a .env file.
This can be achieved by mounting a .env file directly in the Dev Container, updating the runArgs option in .devcontainer/devcontainer.json accordingly.
Alternatively, one can use the python-dotenv package and add the following in src/config.py:
```python
from dotenv import load_dotenv

load_dotenv()
```

Full project documentation (beyond the API) could be generated using mkdocs or quartodoc.
This template is not tied to a specific platform and does not include continuous integration workflows. Nevertheless, the template could be extended with the following integrations:
- GitHub's Dependabot for dependency updates, or pip-audit.
- Testing and code coverage.
- Building and hosting documentation.
This template is inspired by the concept of a research compendium, by similar templates I created for R projects (e.g. reproducible-workflow), and by other, more exhaustive templates such as: