Files
centdix d3cb0c6220 fix: improve flow chat and benchmark coverage (#8825)
* fix: support special flow modules in evals

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: extract shared flow helper logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: make special flow tools openai-compatible

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: improve flow eval prompts and validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: relax flow benchmark overfits

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: record updated flow benchmark history

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address flow review findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: source flow chat special module prompt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: narrow rawscript helper return type

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: dedupe flow chat prompt guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: relax flow test10 validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-15 16:22:39 +00:00
..