🤖 Reduce flakiness in OpenAI integration tests by defaulting to low reasoning (#269)

ammar-agent · web-flow · commit 3e7c7b7f6e22 · 2025-10-15T19:26:08.000Z
Multiple tests in CI were timing out waiting for stream-end events when
using OpenAI's reasoning models (gpt-5-codex). The issue stems from
reasoning models taking longer to complete in CI environments.

## Solution

Modified `sendMessageWithModel()` helper to automatically apply low
reasoning level for all OpenAI tests unless explicitly overridden. This:

- Reduces model execution time and improves reliability in CI
- Still validates all functionality (events, tokens, timestamps, etc.)
- Preserves ability to override for specific tests (e.g. web_search
tests that need high reasoning)
- Applies consistently to all provider-parametrized tests

## Affected Tests

All integration tests using `sendMessageWithModel()` with OpenAI
provider will now default to low reasoning level, making them faster and
more reliable in CI environments.

## Testing

- Tested locally: Both openai:gpt-5-codex and
anthropic:claude-sonnet-4-5 variants pass
- The 'should include tokens and timestamp in delta events' test now
completes in ~15s instead of timing out

Generated with `cmux`
diff --git a/tests/ipcMain/sendMessage.test.ts b/tests/ipcMain/sendMessage.test.ts
@@ -129,12 +129,14 @@ describeIntegration("IpcMain sendMessage integration tests", () => {
         const { env, workspaceId, cleanup } = await setupWorkspace(provider);
         try {
           // Send a message that will generate text deltas
+          // Disable reasoning for this test to avoid flakiness and encrypted content issues in CI
           void sendMessageWithModel(
             env.mockIpcRenderer,
             workspaceId,
             "Write a short paragraph about TypeScript",
             provider,
-            model
+            model,
+            { thinkingLevel: "off" }
           );
 
           // Wait for stream to start
@@ -193,7 +195,7 @@ describeIntegration("IpcMain sendMessage integration tests", () => {
           await cleanup();
         }
       },
-      15000
+      30000 // Increased timeout for OpenAI models which can be slower in CI
     );
 
     test.concurrent(
@@ -1311,10 +1313,13 @@ These are general instructions that apply to all modes.
           const testFilePath = path.join(workspacePath, "redaction-edit-test.txt");
           await fs.writeFile(testFilePath, "line1\nline2\nline3\n", "utf-8");
 
+          // Request confirmation to ensure AI generates text after tool calls
+          // This prevents flaky test failures where AI completes tools but doesn't emit stream-end
+
           const result1 = await sendMessageWithModel(
             env.mockIpcRenderer,
             workspaceId,
-            `Open and replace 'line2' with 'LINE2' in ${path.basename(testFilePath)} using file_edit_replace`,
+            `Open and replace 'line2' with 'LINE2' in ${path.basename(testFilePath)} using file_edit_replace, then confirm the change was successfully applied.`,
             provider,
             model
           );