* fix: support special flow modules in evals Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: extract shared flow helper logic Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: make special flow tools openai-compatible Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: improve flow eval prompts and validation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test: relax flow benchmark overfits Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test: record updated flow benchmark history Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address flow review findings Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: source flow chat special module prompt Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: narrow rawscript helper return type Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: dedupe flow chat prompt guidance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: relax flow test10 validation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>