@jharris1679

What is sniffbench?

sniffbench is a benchmark suite for evaluating AI coding agents—like pytest, but for AI assistants.

When you change your AI coding setup (switching models, adding MCP servers, updating prompts), you're flying blind. Did it get better? Worse? sniffbench gives you that data.

Key Features

  • A/B test configurations — Register variants, run evaluations, compare results (see the sketch after this list)
  • Real-world evaluation cases — Use closed issues from your actual repos as test cases
  • Track what matters — Token usage, cost, tool efficiency, cache hit ratio
  • Multi-agent support — Works with Claude Code, with Cursor/Aider support planned
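
To make the A/B workflow concrete, here is a minimal sketch of what comparing two variants on the tracked metrics could look like. Everything in it (`RunMetrics`, `compare`, the numbers) is hypothetical illustration, not sniffbench's actual API or output:

```python
# Hypothetical illustration only: these names (RunMetrics, compare) and numbers
# are NOT sniffbench's API; they just sketch an A/B comparison over the metrics
# the feature list mentions (tokens, cost, tool calls, cache hit ratio).
from dataclasses import dataclass


@dataclass
class RunMetrics:
    tokens: int          # total tokens consumed across the evaluation run
    cost_usd: float      # API spend for the run
    tool_calls: int      # number of tool invocations the agent made
    cache_hits: int      # prompt-cache hits
    cache_lookups: int   # total prompt-cache lookups

    @property
    def cache_hit_ratio(self) -> float:
        return self.cache_hits / self.cache_lookups if self.cache_lookups else 0.0


def compare(baseline: RunMetrics, variant: RunMetrics) -> None:
    """Print per-metric deltas of a variant config against a baseline."""
    rows = [
        ("tokens", baseline.tokens, variant.tokens),
        ("cost_usd", baseline.cost_usd, variant.cost_usd),
        ("tool_calls", baseline.tool_calls, variant.tool_calls),
        ("cache_hit_ratio", baseline.cache_hit_ratio, variant.cache_hit_ratio),
    ]
    for name, base, new in rows:
        print(f"{name:16} baseline={base:<10.4g} variant={new:<10.4g} delta={new - base:+.4g}")


# Placeholder numbers: e.g. a baseline Claude Code config vs. one with an extra MCP server.
compare(
    RunMetrics(tokens=52_000, cost_usd=0.81, tool_calls=34, cache_hits=12, cache_lookups=40),
    RunMetrics(tokens=47_500, cost_usd=0.74, tool_calls=29, cache_hits=22, cache_lookups=41),
)
```

The point of the sketch is the shape of the comparison: one set of per-run metrics for each registered variant, diffed metric by metric so a config change shows up as a signed delta rather than a gut feeling.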

Links

Thanks for considering!
