@dcbartlett (Collaborator) commented Dec 24, 2025

Description

Adds a new test runner designed to evaluate tests over multiple configurable runs. This allows tests whose output is non-deterministic to be judged across runs rather than on a single result. The test matrix can also be configured with suite-specific variables, and the runner will run the suite with multiple sets of variables to "evaluate" performance across different inputs.

Adds a new test suite that utilizes the new test runner for apply-diff tests.
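
As a rough illustration of the idea described above (not the actual `@roo-code/evally` API), the sketch below runs one test body several times for each set of variables and reports a pass rate per set. The names `MatrixCase`, `runMatrix`, and `meetsThreshold` are hypothetical and chosen only for this example.

```typescript
// Hypothetical types for illustration only; these are NOT the evally package's API.
type Variables = Record<string, string | number | boolean>

interface MatrixCase {
  name: string
  variables: Variables
}

interface CaseSummary {
  name: string
  runs: number
  passes: number
  passRate: number
}

// A test body reports success as a boolean and may be non-deterministic.
type TestBody = (variables: Variables) => Promise<boolean> | boolean

// Run the test `iterations` times for every set of variables and count passes.
async function runMatrix(
  cases: MatrixCase[],
  iterations: number,
  test: TestBody,
): Promise<CaseSummary[]> {
  const summaries: CaseSummary[] = []
  for (const testCase of cases) {
    let passes = 0
    for (let i = 0; i < iterations; i++) {
      try {
        if (await test(testCase.variables)) passes++
      } catch {
        // A thrown error counts as a failed run instead of aborting the matrix.
      }
    }
    summaries.push({
      name: testCase.name,
      runs: iterations,
      passes,
      passRate: iterations > 0 ? passes / iterations : 0,
    })
  }
  return summaries
}

// Judge a case on its pass rate rather than on any single run.
function meetsThreshold(summary: CaseSummary, minPassRate: number): boolean {
  return summary.passRate >= minPassRate
}
```

A caller might then accept a case when, say, `meetsThreshold(summary, 0.8)` holds, which tolerates occasional failures from a non-deterministic model response.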

Test Procedure

cd apps/vscode-evals && pnpm i && pnpm test:run

Pre-Submission Checklist

  • [-] Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • [-] Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Important

Introduces evally test runner for non-deterministic outputs and matrix configurations, adding apply-diff test suite and supporting configurations in vscode-evals.

  • Behavior:
    • Introduces evally test runner for non-deterministic test outputs and configurable test matrices.
    • Adds apply-diff test suite using evally in applyDiff.matrix.test.ts.
    • Supports multiple test iterations and variable configurations.
  • Configuration:
    • Adds .env.local.sample, .vscode-test.mjs, eslint.config.mjs, and tsconfig.esm.json for vscode-evals.
    • Updates package.json scripts for testing and building in vscode-evals.
    • Modifies knip.json to ignore vscode-evals.
  • Implementation:
    • Implements runTest.ts to execute tests using @vscode/test-electron.
    • Defines matrix tests in applyDiff.matrix.test.ts and sampleMatrix.test.ts.
    • Adds standaloneRunner.ts for running matrix tests from CLI.
    • Provides utility functions in utils.ts for test execution.
  • Package:
    • Adds @roo-code/evally package with TestMatrixRunner and types for matrix testing.
    • Configures evally package with build and test scripts in package.json.

This description was created by Ellipsis for 032d4a5.

@dosubot (bot) added the size:XXL and Enhancement labels on Dec 24, 2025

@roomote (bot, Contributor) commented Dec 24, 2025

All issues from previous reviews have been addressed. No new issues found in the latest commit.

  • Remove unused suiteJsonStats variable in packages/evally/src/runner/TestMatrixRunner.ts
  • Add missing rimraf dependency to apps/vscode-evals/package.json
  • Fix incorrect exports in packages/evally/package.json (types export points to .d.ts for import/require)
  • Verify model ID openai/gpt-5.1 is intentional or update to a valid model (dismissed by author)

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

await resetTestFile(file)
}
}
const waitFor = async (

The waitFor and sleep utility functions are defined here and also re-implemented in other files (e.g. in src/suite/index.ts). Consider extracting these functions into a shared utility module to avoid code duplication.

This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.
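
A minimal sketch of what such a shared module could contain is shown below. The PR's actual `waitFor` signature is truncated in the diff context, so the parameters here (`condition`, `timeout`, `interval`) and their defaults are assumptions made for illustration.

```typescript
// Sketch of a shared utility module; the signatures are assumed, not taken from the PR.
// In a real shared module both helpers would be exported and imported by each suite.

// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms))

interface WaitForOptions {
  timeout?: number // maximum total wait in milliseconds (assumed default)
  interval?: number // delay between polls in milliseconds (assumed default)
}

// Poll `condition` until it returns true, or reject once the timeout elapses.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  { timeout = 30_000, interval = 250 }: WaitForOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeout
  while (!(await condition())) {
    if (Date.now() >= deadline) {
      throw new Error(`waitFor timed out after ${timeout}ms`)
    }
    await sleep(interval)
  }
}
```

With one definition in place, the copies in the test suites could be replaced by an import from the shared file, so that a change to polling or timeout behavior only has to be made once.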

@hannesrudolph added the Issue/PR - Triage label on Dec 24, 2025

Labels

  • Enhancement: New feature or request
  • Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.

Projects

Status: Triage

3 participants