Compare commits

...

186 Commits

Author SHA1 Message Date
centdix
d3cb0c6220 fix: improve flow chat and benchmark coverage (#8825)
* fix: support special flow modules in evals

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: extract shared flow helper logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: make special flow tools openai-compatible

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: improve flow eval prompts and validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: relax flow benchmark overfits

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: record updated flow benchmark history

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address flow review findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: source flow chat special module prompt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: narrow rawscript helper return type

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: dedupe flow chat prompt guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: relax flow test10 validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-15 16:22:39 +00:00
Ruben Fiszel
a3f24aeff8 sqlx 2026-04-15 15:14:44 +00:00
centdix
f1e84cb088 chore: add backend preview validation to ai evals (#8827)
* feat: add backend preview validation to ai evals

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: refresh shared preview workspace assets

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: harden shared backend preview validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-15 15:11:25 +00:00
Ruben Fiszel
3aa279cfd7 nit tx commit cj 2026-04-15 12:05:11 +00:00
centdix
5c179e5448 fix: preserve gemini thought signatures in ai chat (#8837)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-15 11:49:57 +00:00
Diego Imbert
12d0a3de08 fix: parse assets on inline script module creation to avoid false toast (#8835)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:01:07 +00:00
Ruben Fiszel
a98f5b9dfd chore(main): release 1.684.1 (#8834)
* chore(main): release 1.684.1

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-14 21:48:03 +00:00
Ruben Fiszel
75e204dad1 pin tree-sitter 2026-04-14 21:36:00 +00:00
Ruben Fiszel
6158ff2ebe fix: stop escalating missing email recipients to critical alert (#8833)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-14 21:21:50 +00:00
Ruben Fiszel
8ee14644f4 chore(main): release 1.684.0 (#8831)
* chore(main): release 1.684.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-14 20:58:06 +00:00
Ruben Fiszel
f273341759 remove axios deps 2026-04-14 20:50:20 +00:00
hugocasa
64ba3a632e feat: cascade trigger script_path on runnable rename + fix trigger permissioned_as (#8823)
* feat: cascade trigger script_path updates on script/flow rename + fix trigger permissioned_as

Backend: When a script or flow path is renamed, automatically update script_path
across all trigger tables (http, email, kafka, websocket, postgres, mqtt, nats,
sqs, gcp, native). Long-running triggers get server_id reset to force restart.
Native triggers additionally get async webhook URL re-registration with external
services (Google, Nextcloud) via token rotation + handler.update().

Frontend: Fix permissioned_as handling across all trigger/schedule editors:
- Allow setting permissioned_as on trigger creation (not just edit) for admins
- Fix hasChanged detection for permissioned_as changes
- Fix FolderEditor group selector showing usernames instead of group names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rename script_rename -> runnable_rename for consistency

"Runnable" is the correct term for both scripts and flows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove native trigger re-registration from runnable rename

Keep it simple — only update script_path in the DB for non-native triggers.
Native triggers require external service re-registration (token rotation +
webhook URL update) which adds significant complexity; defer to a future PR.

sqlx files for the updated CTE query need regenerating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* sqlx

* refactor: call update_triggers_script_path directly, remove windmill-trigger wrapper

No need for the extra module/dep — the common function is called directly
from scripts.rs and flows.rs with inline error mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reject empty principal in folder default permissioned_as validation

`u/` and `g/` (no name after prefix) were passing validation. Use regex
to require at least one character after the prefix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent async folder-default load from overwriting user's permissioned_as choice

Split the initialization effect into two: one that resets on trigger switch
(tracks permissionedAs), and one that handles folder default loading (tracks
folderDefault.value). The second effect is guarded by a userHasSelected flag
set in handleSelect, so a late-arriving folder default doesn't wipe the
user's explicit selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* lock

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:42:13 +00:00
Ruben Fiszel
aebf758412 fix: allow dedicated flow substeps to inherit parent tag (#8832)
Flow substeps that inherit the parent flow's tag were re-validated
against CUSTOM_TAGS, which rejected dedicated flow tags
(`{workspace_id}:flow/{path}`) since they are never user-registered.
The parent flow's tag was already validated at push time, so skip the
redundant check when the substep simply inherits it.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:38:37 +00:00
hugocasa
91064ce857 feat(frontend): improve permissions drawer UX and auto-share resource variables (#8824)
* feat: improve permissions drawer UX and auto-share resource variables

- When sharing a resource, automatically detect linked variables ($var: refs)
  and offer to apply the same permission changes via a toggle (on by default)
- Rename "Share" to "Permissions" across all dropdown menus (resources, variables,
  scripts, flows, apps, schedules, triggers)
- Replace Share icon with Shield icon for consistency
- Show default permissions (folder/user/group) as a separate section in the drawer
- Move item path into drawer title ("Permissions for {path}")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard async results against stale drawer state and null-safe extra_perms

- Add path staleness check in loadLinkedVarPaths and loadDefaultPerms
  to prevent late async responses from overwriting state when the drawer
  was reopened for a different item
- Use ?? {} fallback for folder.extra_perms which can be undefined

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:30:36 +00:00
Ruben Fiszel
2c1fe88fed fix ws_specific grant 2026-04-14 20:25:32 +00:00
Diego Imbert
7fe639d91e fix: hide serial types in column type dropdown for existing columns (#8828)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-14 20:17:51 +00:00
Diego Imbert
06fe809ecc fix: DB Manager delete/update for timestamp and serial types (#8830)
* Fix time(stamp)(tz) comparisons in pg_executor

* fix serial bug

* UPDATE and DELETE use primary key only
2026-04-14 20:17:39 +00:00
Diego Imbert
5069a3b2e3 Better S3 error context (#8829) 2026-04-14 20:17:28 +00:00
Diego Imbert
e1dbce02c2 fix: compute wall-clock duration for flow job groups in CLI (#8826)
The total duration of a for-loop/branchall group was computed as the
naive sum of all iteration durations. This is wrong for parallel
execution and doesn't account for orchestration overhead. Instead,
compute actual wall-clock time as max(completed_at) - min(started_at).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:16:50 +00:00
Ruben Fiszel
6bb80ff28b chore(main): release 1.683.2 (#8820)
* chore(main): release 1.683.2

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-14 00:23:20 +00:00
Ruben Fiszel
5b3913052e refactor: convert read-hot globals to AtomicBool/I64 and ArcSwap (#8815)
* refactor: extract load helpers from reload_setting family

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: convert atomic primitive globals to AtomicBool/AtomicI64

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: convert CRITICAL_*/HUB_API_SECRET/INSTANCE_EVENTS_WEBHOOK/JWT_SECRET to ArcSwap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: pin ee-repo-ref to arcswap-refactor EE branch commit

* refactor: convert BASE_URL/HUB_BASE_URL/MIN_VERSION/LICENSE_KEY*/LICENSE_KEY_ID to ArcSwap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: convert worker hot-path globals to ArcSwap (WORKER_CONFIG et al)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: pin ee-repo-ref to combined arcswap-urls+worker EE commit

* chore: update ee-repo-ref to d8be8f88cb8898c8f6b27421989d53528223815d

This commit updates the EE repository reference after PR #532 was merged in windmill-ee-private.

Previous ee-repo-ref: c375aaaac9ec0fc0480993627d0defc8054c31a4

New ee-repo-ref: d8be8f88cb8898c8f6b27421989d53528223815d

Automated by sync-ee-ref workflow.

* fix: cleanup unused imports + fix 2 missed WORKER_CONFIG readers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to ce0f8fbbbde09c4a858312d2d8716d224e99042c

This commit updates the EE repository reference after PR #534 was merged in windmill-ee-private.

Previous ee-repo-ref: 450b601b5aba0ca0b2045f4b5071aa8701b4bfb7

New ee-repo-ref: ce0f8fbbbde09c4a858312d2d8716d224e99042c

Automated by sync-ee-ref workflow.

* fix: secret_backend_integration test — BASE_URL.write().await → .store()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: convert APP_WORKSPACED_ROUTE to AtomicBool for symmetry with HTTP_ROUTE_WORKSPACED_ROUTE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to e587df8 (post-#535 merge)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-14 00:04:10 +00:00
Ruben Fiszel
4dc54ca3aa fix: persist indexer max_index_time_window_secs setting (#8821)
* fix: persist indexer max_index_time_window_secs setting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: toggle UX for indexer time window cap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:56:38 +00:00
Ruben Fiszel
89c8e4bb96 fix: detect WAC v2 Python workflows that only use step() (no @task) (#8819)
is_wac_v2_py required both @workflow and @task, so a workflow using
only inline step() calls fell through to the regular Python path and
returned the raw coroutine object instead of its awaited result. Match
the TS detector and accept @workflow alone.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:22:30 +00:00
Ruben Fiszel
eb85da932a chore(main): release 1.683.1 (#8817)
* chore(main): release 1.683.1

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-13 22:49:27 +00:00
Ruben Fiszel
f7f26b3224 fix: use OpenAPI 3.0 nullable pattern for getOpenDeploymentRequest (#8816)
The response schema used `oneOf: [$ref, {type: null}]` which is
OpenAPI 3.1 syntax, but the spec is declared as 3.0.3. Both
oapi-codegen (Go) and openapi-python-client rejected it, breaking
the client release jobs. Switched to the standard 3.0 pattern
(`nullable: true` + `allOf: [$ref]`), matching existing usage at
openapi.yaml:21410.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:41:59 +00:00
Ruben Fiszel
e0066b266f chore(main): release 1.683.0 (#8802)
* chore(main): release 1.683.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-13 22:21:47 +00:00
Ruben Fiszel
42d3e8c789 fix: enrich OTEL log records with per-request LogContext (#8812)
* fix: enrich OTEL log records with per-request LogContext

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add otlp_smoke example for manual OTEL log bridge verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 5d6b713b74fc46735807f5c32883002e8d976fbc

This commit updates the EE repository reference after PR #529 was merged in windmill-ee-private.

Previous ee-repo-ref: 45959d063bc941c567488d330b5819601cdd2d3d

New ee-repo-ref: 5d6b713b74fc46735807f5c32883002e8d976fbc

Automated by sync-ee-ref workflow.

* refactor: store LogContext in ArcSwap instead of Mutex

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: pin ee-repo-ref to ArcSwap branch commit

* chore: update ee-repo-ref to be2f3d4d11bb7110200524d7157caab3aac53996

This commit updates the EE repository reference after PR #530 was merged in windmill-ee-private.

Previous ee-repo-ref: 45b4d7963a9ebcd583d1a87abe7d07d3d521584a

New ee-repo-ref: be2f3d4d11bb7110200524d7157caab3aac53996

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-13 21:50:50 +00:00
centdix
c889a185d5 refactor: extract flow delete helpers (#8746)
* refactor: extract flow delete helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: unify flow delete planning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: stabilize flow delete execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: simplify flow delete plan execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-13 20:35:01 +00:00
Ruben Fiszel
baeb202037 nit npm check 2026-04-13 20:31:28 +00:00
hugocasa
9fb78164b4 feat: allow non-admins to create and edit HTTP triggers (#8810)
* feat: allow non-admin users to create HTTP triggers with forced workspaced routes

Non-admin users can now create and fully edit HTTP triggers, but are forced
to use workspaced routes (workspace-prefixed URLs). Instance-wide routes
remain admin-only to prevent cross-workspace URL conflicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing RLS INSERT/DELETE policies for http_trigger table

Non-admin users were blocked by row-level security when creating HTTP triggers.
Added INSERT, DELETE, see_own, and see_member policies matching other trigger tables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: allow user paths for HTTP triggers

Remove the hideUser restriction on the Path component so HTTP triggers
can be created under user paths (u/username/...) in addition to folder paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove added note from instance settings description

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: allow non-admins to edit non-workspaced routes without changing route config

Non-admins can now open and edit existing non-workspaced HTTP triggers
(created by admins) as long as they don't modify route_path, http_method,
or workspaced_route. The workspaced prefix is only forced on new triggers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: allow non-admins to change route_path on workspaced routes

The prevent_route_path_change DB trigger blocked all route_path changes
for windmill_user, even on workspaced routes. Now only instance-wide
(non-workspaced) routes are protected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add explicit GRANT and force workspaced routes in OpenAPI generator

- Add explicit GRANT INSERT, DELETE on http_trigger to windmill_user
  for safety on customer instances
- Force workspaced_route: true for non-admins in OpenAPI route generator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:43:49 +00:00
Ruben Fiszel
64c58c824f feat: add deploy restriction rule and fork review requests (#8804)
* feat: add deploy restriction rule and fork review requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt for fork review requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review comments on fork review requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rename fork review requests to deployment requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt for deployment request rename

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: inline deployment request panel into deploy layout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: place Request deployment button to the left of Deploy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: inline fork triggers into main deploy list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: open real trigger detail drawer for inline fork triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: email notifications for merge completion and reply pings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update deployment_request + protection_rule tables on workspace id rename

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 972893c3870e4c4a70a35748abed282d88904805

This commit updates the EE repository reference after PR #528 was merged in windmill-ee-private.

Previous ee-repo-ref: 5684d1c17d930b17849c1e5d7577891e64682d45

New ee-repo-ref: 972893c3870e4c4a70a35748abed282d88904805

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-13 17:10:37 +00:00
Ruben Fiszel
b3ef4bc26c perf: add inline-persist fast path for WAC v2 step() (#8807) 2026-04-13 16:49:53 +00:00
Ruben Fiszel
3f5841f84d feat: instance-level ruff config auto-pulled by LSP container (#8803)
* feat: add instance-level ruff config auto-pulled by LSP container

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move ruff config to new LSP tab in instance settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:04:49 +00:00
Ruben Fiszel
78a877eb96 avoid lock file race in repro_diffname CLI test on windows (#8811)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:27:36 +00:00
hugocasa
378ba78284 fix: silence user-facing toast for non-critical hub script tracking error (#8808)
* fix: silence user-facing toast for non-critical hub script tracking error

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* n

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-13 14:21:54 +00:00
hugocasa
95411b2563 feat: display agent message in flow graph (#8806)
* feat: display message and web search content in agent graph node status

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: use markdown renderer for agent message output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert web search output display — content not useful

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve web search alert text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: align message title styling with other node status sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 13:39:14 +00:00
Ruben Fiszel
b6f1cc70cd fix(cli): make cli help resilient to npm registry fetch failures (#8809)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-13 13:38:38 +00:00
centdix
cdcc56461b feat: add black-box ai eval benchmarks (#8618) 2026-04-13 14:05:46 +02:00
Ruben Fiszel
60211c1d19 feat: folder default_permissioned_as rules for ownership defaults on deploy (#8801)
* feat: add folder default_permissioned_as rules for ownership defaults on deploy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unnecessary auth guard on default_permissioned_as — rules are advisory only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate system prompts with new CLI commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CI review findings — TOCTOU, race condition, email validation, type coercion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add sqlx offline cache for test queries (fixes cargo_test CI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review findings — incomplete request bodies, dead code, redundant import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review findings — full script fields, reactive stores, catch-all validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: app/schedule/trigger set-permissioned-as fetch remote first to avoid data loss

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: app set-permissioned-as avoid creating redundant app version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: compact user/group toggle + select for folder default_permissioned_as rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: collapse default_permissioned_as section by default in folder editor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: include default_permissioned_as in FolderFile CLI type for YAML round-trip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: process folder.meta changes before items in push to apply new rules immediately

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clone default_permissioned_as on fork/rename + add full lifecycle tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add no-op guarantee test — folder without rules behaves like before

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rename cliBehavior to syncBehavior — more accurate scope

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:14:07 +00:00
Ruben Fiszel
6cebc6f61b chore(main): release 1.682.0 (#8798)
* chore(main): release 1.682.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-10 17:41:42 +00:00
Ruben Fiszel
59c457a138 feat: enrich hanging flow error with worker and service log info (#8800)
* feat: enrich hanging flow error with worker and service log info

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review on hanging flow diagnostics

- Widen log_file lookup window to [-90s, +30s] around worker last ping
  so the batch containing the crash is captured (log files are
  minute-aligned; looking forward only was missing the relevant bucket).
- Log a warning on log_file query errors instead of silently swallowing,
  so a misconfigured table is not reported as "no log files found".
- Note that service log download URLs require S3/parquet collection.
- Fix memory display when only worker_memory_total is known.
- Regenerate sqlx offline cache for the new/modified queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:19:20 +00:00
Ruben Fiszel
b783bf2d83 fix: show full path on hover in deploy drawer and widen drawer (#8799)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:40:23 +00:00
Ruben Fiszel
9c85565221 fix: bypass OTEL MITM tracing proxy for git sync jobs (#8796)
Git sync runs as a DeploymentCallback job. When the OTEL MITM tracing
proxy is enabled, all HTTP/HTTPS traffic from the script is rerouted
through a local intercepting proxy that chains to the corporate upstream
proxy. Git's HTTPS to GitHub fails in this setup (TLS interception with
chained CONNECT tunneling is fragile, and git's CA env handling diverges
from what the proxy injects), so customers see "GitHub.com URL couldn't
be reached" until they disable OTEL.

Detect DeploymentCallback jobs in get_proxy_envs_for_lang and fall back
to the stock PROXY_ENVS so git talks to the corporate proxy directly,
unmodified. The git sync script is system code; we don't need HTTP spans
for it.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:32:19 +00:00
Ruben Fiszel
e48c7cf448 move CiTestResult schema outside python-client inline markers (#8795)
CiTestResult was defined between the INLINE START/END markers, which
python-client/build.sh strips and replaces with a wildcard $ref to
openflow.openapi.yaml, breaking the PyPI publish job.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:28:09 +00:00
Ruben Fiszel
8b2a8882bc chore(main): release 1.681.0 (#8769)
* chore(main): release 1.681.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-10 14:53:04 +00:00
Alexander Petric
5eb9a2e965 add instance onboarding telemetry (#8792)
* [ee] feat: add instance onboarding telemetry

Update ee-repo-ref to include instance_onboarding telemetry field
in the daily stats payload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 5f912375340225876a8c1740c3301f39cd6cbd6d

This commit updates the EE repository reference after PR #527 was merged in windmill-ee-private.

Previous ee-repo-ref: b0b10d81060ab6dabee81a5a067ffadc6b48e074

New ee-repo-ref: 5f912375340225876a8c1740c3301f39cd6cbd6d

Automated by sync-ee-ref workflow.

* sqlx

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
Co-authored-by: Ruben Fiszel <ruben@windmill.dev>
2026-04-10 14:48:38 +00:00
hugocasa
946848feef fix: limit multi-runnable dedicated workers to one job at a time (#8782)
* feat: thread concurrency semaphore through dedicated worker executors

Pass the concurrency_semaphore parameter through bun, deno, and python
start_worker functions to handle_dedicated_process. Also fix the
DedicatedWorkersSelector to use listWorkspacesAsSuperAdmin (so all
workspaces including admins are visible) and skip loading when disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to limit-workers-one-job branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 54037e77cdd37777560755fef7075d35906c96d8

This commit updates the EE repository reference after PR #523 was merged in windmill-ee-private.

Previous ee-repo-ref: 56890ea8fca2c1c44a1338a27011b4dd1137d9c9

New ee-repo-ref: 54037e77cdd37777560755fef7075d35906c96d8

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
Co-authored-by: Ruben Fiszel <ruben@windmill.dev>
2026-04-10 14:35:13 +00:00
Diego Imbert
3d43d31aba fix: refresh custom instance user password if auth failed (#8787)
* Refresh custom instance user pwd if connection failed

* No longer need to check on startup

* nit: unneeded inner function

* fix
2026-04-10 14:26:53 +00:00
Diego Imbert
8957d8f19b fix: bypass sql type injection during formatting to prevent offset corruption (#8786)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:26:33 +00:00
Diego Imbert
3c64a4282d Prompt to analyse assets for whole flow on undetected assets (#8784) 2026-04-10 14:26:20 +00:00
Ruben Fiszel
ec9cec1d02 fix: treat empty global setting strings as unset (#8793)
* fix: treat empty global setting strings as unset

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: close protected-setting whitespace gap in diff and preserve empty ws override

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:23:37 +00:00
Ruben Fiszel
09666af157 refactor(git-sync): remove force_branch UI option (#8794)
The new workspaces: section in wmill.yaml lets the CLI auto-select the
right entry by matching baseUrl + workspaceId against the existing
--base-url and --workspace flags the backend already passes, making the
force_branch override redundant.

Backend field and serializer are intentionally left intact for backward
compat with any repository that already has force_branch saved.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:22:25 +00:00
Ruben Fiszel
6cf7ffc26b feat(vault): add skip_ssl_verify option for HashiCorp Vault (#8791)
* [ee] feat(vault): add skip_ssl_verify option for HashiCorp Vault

Adds an optional skip_ssl_verify boolean to VaultSettings so
self-signed Vault deployments can be used in development without
needing a custom CA bundle. The flag is surfaced as a Toggle in the
HashiCorp Vault section of the secret backend instance settings and
plumbed through to the EE Vault HTTP client builder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to bcfb663f9e902539abbbf69c517715eb8d4ce8f9

This commit updates the EE repository reference after PR #526 was merged in windmill-ee-private.

Previous ee-repo-ref: 7e1372b8f59fe81aaf61212970ebdf2286be864d

New ee-repo-ref: bcfb663f9e902539abbbf69c517715eb8d4ce8f9

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-10 13:55:22 +00:00
hugocasa
ce3e676f4a feat: list external JWT tokens in instance settings (#8783)
* [ee] feat: add external JWT tokens listing in instance settings

Add the ability for superadmins to view all external JWT tokens that have
been used for authentication, along with their claim metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move external JWT tokens listing to users tab

- Move list endpoint from /oidc/ext_jwt_tokens to /users/ext_jwt_tokens
- Display as a sub-tab below the instance Users tab, only shown when tokens exist
- Use DataTable's built-in load-more pattern for pagination
- Add "Recently active only" toggle (tokens used in the last 30 days)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add dev_override cargo feature to windmill-common

* feat: show placeholder for legacy external JWT entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 62a462461271b900351c18b0ab1ca78651154b2a

This commit updates the EE repository reference after PR #524 was merged in windmill-ee-private.

Previous ee-repo-ref: 7b493a337abe00a47cf9d94847babe3cb3a6799f

New ee-repo-ref: 62a462461271b900351c18b0ab1ca78651154b2a

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-10 13:11:00 +00:00
Ruben Fiszel
4fff89f98c fix: hide legacy global_settings.worker_configs ghost row (#8790)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 06:00:03 +00:00
Ruben Fiszel
d243eb31b0 fix: CLI falls back to workspace whoami for workspace-scoped tokens (#8789)
* fix: CLI falls back to workspace whoami when global whoami is 401

Workspace-scoped tokens (token.workspace_id set) cannot call
/api/users/whoami — the backend's token lookup filters by workspace_id
which is NULL on global paths, so auth returns 401 before the handler
runs. This breaks the CLI entirely: requireLogin calls globalWhoami at
the start of every command, so no command works with a
workspace-scoped token, not even `wmill workspace whoami`.

Fix it CLI-side: if the global whoami returns 401, fall back to the
workspace-scoped /api/w/{w}/users/whoami using the workspace already
known from the CLI profile, and adapt the response shape to
GlobalUserInfo. Also drop the redundant second globalWhoami call in
`wmill workspace whoami` — use requireLogin's return value instead.

No backend changes: the workspace_id binding on the token stays
strictly enforced for every global endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use name-based ApiError check in whoami fallback

Review feedback from PR #8789: `instanceof ApiError` can silently
return false when bundling produces multiple module instances of
`gen/core/ApiError.ts` (bun build for npm, JSR dev path), which would
skip the workspace-whoami fallback and reintroduce the exact bug this
PR fixes. Match the name-based check already used at
`cli/src/main.ts:232` and drop the `ApiError` import.

Also add a comment on `workspaceUserToGlobalUserInfo` listing the
fields that aren't derivable from the workspace-scoped User response
and are filled with placeholder values, so future callers don't trust
them downstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 05:55:53 +00:00
Ruben Fiszel
a7512f9034 chore: update git sync script version to 28191 (#8788)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:18:01 -04:00
Ruben Fiszel
5b97092997 feat: unify CLI config to workspaces, deprecate gitBranches/environments (#8767)
* refactor: unify CLI config to workspaces, deprecate gitBranches/environments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: update frontend examples and regenerate system prompts for workspaces config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: update test files to use workspaces config instead of gitBranches

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: handle --branch with --base-url correctly in sync pull/push

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: warn when --workspace overrides auto-detected branch or misses config entry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: show reason why workspace was selected in log message

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: clarify specificItems file naming uses gitBranch as suffix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: rename branch-specific to workspace-specific, use workspace name as file suffix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: rename branch-specific to workspace-specific, add comprehensive integration tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: simplify bind and init to be workspace-centric

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: make bind/unbind interactive with --workspace and --branch flags

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: make bind interactive with profile selection, workspace name, and optional branch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: init offers to bind workspace using same flow as wmill workspace bind

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: skip backend git-sync check in init when no workspace was bound

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: skip all API calls in init when no workspace was bound

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: log when RT namespace is skipped, offer to generate it after bind

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: warn when no workspace bound during init

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: init git-sync check uses bound workspace, not active profile

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: init uses selected profile directly, avoids re-resolving and duplicate prompt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: init skips requireLogin, uses bound profile token directly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: auto-pick or prompt workspace from config when no branch matches

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: show configured workspaces list and bind hint in resolution messages

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: cache bound profile to avoid duplicate profile selection prompts in init

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: hoist boundProfile scope, add 2 comprehensive integration tests covering all flows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: rt.d.ts prompt defaults to no when file exists, better description

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: remove empty overrides from generated config, add specificItems hint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add inline comments for non-trivial fields, add overrides/promotionOverrides hints to bound workspaces

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: regenerate system prompts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-09 19:46:34 +00:00
Diego Imbert
29e7701972 nit text too long (#8785) 2026-04-09 19:33:16 +00:00
hugocasa
435b25e6a4 feat: add user offboarding flow with object reassignment (#8647)
* feat: add user offboarding flow with object reassignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: require new_operator for permissioned_as when reassigning to folder

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update on_behalf_of_email on scripts/flows during offboarding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract offboarding to separate module and add integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: delete tokens, add operator preview counts, remove token reassignment UI

Tokens are now always deleted during offboarding. Preview now shows
scripts/flows/apps with on_behalf_of and schedules/triggers with
permissioned_as referencing the departing user (even outside their path).
Token reassignment UI removed since webhooks break on path changes anyway.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: rich preview with path lists, warnings, and downloadable report

Preview now returns full path lists (not just counts) for owned objects
and objects executing on behalf of the user. Adds warnings for:
- HTTP triggers (webhook URLs will change)
- Email triggers (addresses will change)
- Broken $var:/$res: references in resources/variables
Frontend provides "Export list" button to download affected content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add coverage for dynamic queries (triggers, extra_perms, operator schedules)

Adds HTTP trigger, extra_perms reference, and shared schedule to test
fixture. Tests verify that non-macro sqlx queries (trigger reassignment,
extra_perms cleanup, operator schedule update) work correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove broken_references, add full dynamic query test coverage

Remove broken_references field from preview (user's resources/variables
are already in the owned paths list). Add shared HTTP trigger fixture
to test all dynamic query paths: trigger operator preview (line 232),
trigger permissioned_as update for non-user-path (line 951), and
extra_perms cleanup on trigger tables (line 983).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add referencing field to preview for content/value path references

Preview now includes a 'referencing' section listing scripts (by content),
flows (by value JSON), apps (by policy/extra_perms), and resources (by value)
that contain references to u/{username}/ paths. These references may break
after reassignment. Shown in export list and as a warning in the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rename objects to items in UI, detect on_behalf_of items in hasItems

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace remaining objects with items in UI text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rename operator to on_behalf_of, separate owned vs on-behalf UI sections

- Rename new_operator to new_on_behalf_of_user in API and frontend
- Rename op_ prefixed variables to obo_ in backend
- UI now shows separate sections for owned items and items running
  on behalf, with the operator selector shown only when needed
- canSubmit logic updated: operator needed for folder targets OR
  when on-behalf items exist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: CSV export, side-by-side layout, always accept new_on_behalf_of_user

- Export affected items as CSV instead of text
- Owned items and on-behalf items shown side by side in summary boxes
- new_on_behalf_of_user always accepted (defaults to target user for
  user targets, required for folder targets)
- On_behalf_of selector always visible, auto-defaults when user target
  is selected

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: proper pluralization and bottom-aligned counts in summary boxes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: stack info boxes vertically, referencing box as warning style at top

Info boxes (owned, on-behalf, referencing) now one per row instead of
side-by-side. Referencing box uses warning colors. Webhook/email trigger
alerts shown below boxes. Proper pluralization in global modal too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CSV exports only referencing items, export button inside warning box

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: use ToggleButtonGroup for user/folder, add reassign toggle on remove

- User/Folder selection now uses ToggleButtonGroup component
- When removing a user, a "Reassign items before removing" toggle lets
  the admin skip reassignment and just delete directly
- In reassign-only mode, the toggle is not shown (always reassigns)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show token details with labels and scopes in preview

Preview now returns token label, scopes, and expiration instead of just
a count. Frontend shows a dedicated token box listing each token with
its scopes. Test updated to verify token label in preview response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: extract shared offboarding components, per-type trigger links, hash deep linking

- Extract OffboardItemsBox, OffboardReassignControls, OffboardWorkspaceSection,
  and offboarding-utils.ts as shared components used by both workspace and global modals
- Change triggers in OffboardAffectedPaths from Vec<String> to HashMap<String, Vec<String>>
  so frontend knows which trigger page to link to
- Add hash-based deep linking to all 9 trigger pages and schedules page
- Preserve URL hash in updateQueryFilters across all trigger pages
- Only open editor drawer if the item is found in the list
- Reassign toggle at top with warning alert when disabled (both modals)
- Referencing items box uses yellow warning variant with expandable path links
- Cleaner labels: "Move u/{username}/* items to", "Update triggers/runnables permissions to"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rename on_behalf_of section label to match flow advanced settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate sqlx query cache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review issues for offboarding

- Add 9 trigger tables to check_path_conflicts for user-friendly conflict messages
- Fix submit button no-op when user has only on-behalf items (show target selector, fix canSubmit)
- Only delete workspace user when reassignment entry exists (prevent orphaned objects)
- Add $azure_kv: prefix to vault secret query (match rename_user pattern)
- Use Svelte 5 onSelected callback instead of deprecated on:selected
- Make ScriptBuilder section label conditional on canPreserve
- Fix CSV export to include trigger paths via flattenPaths utility
- Fix test_offboard_reassign_only to remove conflicts and assert on response
- Parallelize workspace config fetches in global modal with Promise.all
- Delete tokens when deleting workspace user
- Return structured JSON from global offboard endpoint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* sqlx

* fix: address second round of PR review issues

- Accumulate per-workspace OffboardSummary in global offboard instead of returning zeros
- Delete workspace user unconditionally when delete_user=true (prevent orphaned usr rows)
- Filter archived/deleted scripts in check_path_conflicts to match preview
- Reset form state when workspace offboard modal reopens
- Move hashHandled=true inside trigger-found guard on all 10 deep-link pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: improve offboarding integration tests

- Add second workspace to fixture for multi-workspace global offboard testing
- Add test_global_offboard_execution: verifies items reassigned across 2
  workspaces, user deleted from both, and password row deleted from instance
- Add test_offboard_invalid_target: verifies 400 for nonexistent user,
  nonexistent folder, and invalid target format
- Fix test_offboard_to_user: use single DELETE, add explicit new_on_behalf_of_user
- Fix test_global_offboard_preview: assert 2 workspaces instead of 1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address third round of PR review issues

- Fix ScriptBuilder tooltip to match conditional section label wording
- Clear stale conflicts in global modal on reopen
- Fix test_offboard_to_folder to assert on specific moved path, not pre-existing data
- Allow deleting user with zero items (show Offboard button, skip reassignment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add global token deletion warning in instance-level offboard modal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* update sqlx

* fix: add raw_app path and dependency_map path reassignment to offboarding

Audit found these tables with user-scoped paths were not being updated:
- raw_app: mirrors app paths, needs path reassignment
- dependency_map: importer_path and imported_path reference user paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move user cleanup to delete_workspace_user_internal, fix review issues

- Move extra_perms, folder owners, drafts, favorites, inputs, captures
  cleanup into delete_workspace_user_internal so any user deletion gets
  proper cleanup (not just offboard path)
- Fix flow INSERT missing labels and lock_error_logs columns (data loss)
- Fix validate_target returning 404 instead of 400 for nonexistent targets
- Fix canSubmit blocking delete when user has no items to reassign
- Fix token preview query filtering out tokens without scopes
- Fix token warning messages: workspace-level mentions webhooks/HTTP triggers,
  instance-level mentions API calls using credentials
- Fix "Schedules and triggers" -> "Triggers and runnables" wording
- Show token section at instance level only when tokens exist
- Show Offboard button at instance level when user has no items but deleteUser=true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:31:36 +00:00
Ruben Fiszel
1deb31f1e0 fix: error on flow/app folder suffix format mismatch during sync push/pull (#8775)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:30:59 +00:00
Ruben Fiszel
c57c769dea feat: add CI test scripts with auto-trigger on deploy (#8736)
* feat: add CI test scripts with auto-trigger on deploy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: fix annotation parser early return and handle renames correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move CI test results to top of script/flow detail pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve CI test results spacing, icon, and remove pass label

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: support one-line annotation and use script/path format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: move CI test trigger logic to EE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: move CI badge next to New badge and add deduplicated CI summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add CI test e2e tests and fix nullable column annotations

Add integration tests for CI test annotation parsing (creates/removes
ci_test_reference rows) and the CI test results API (single + batch
endpoints). Add backend test for auto-trigger on deploy (private+python).

Fix sqlx LEFT JOIN LATERAL nullable column annotations in
get_ci_test_results and get_ci_test_results_batch queries — sqlx
cannot infer nullability from LATERAL subqueries, causing runtime
decode errors when no matching job exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix build/sqlx

* fix

* feat: CI test improvements and templates

- Fix windmill-dep-map/private feature propagation in worker, api-scripts,
  and api-flows Cargo.toml so CI test triggers actually fire in EE mode
- Clone ci_test_reference rows during workspace fork
- Add polling to CiTestResults component (refetch every 3s while running)
- Add running state and auto-refresh to ForkWorkspaceBanner CI summary
- Add yellow "CI test" badge on script list rows and detail page
- Fix Library badge border color (remove indigo border override)
- Add CI Test TypeScript and CI Test Python templates in ScriptBuilder
- Update sqlx offline cache
- Add debug tracing for CI test trigger in worker_lockfiles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing children prop to WorkspaceDeployLayout

Fixes svelte-fast-check type error when passing named snippets as
children content inside the component tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review feedback

- Remove empty wrapper divs around CiTestResults, move mb-4 into component
- Add batch endpoint size cap (max 200 items)
- Add ON DELETE CASCADE to ci_test_reference workspace FK (new migration)
- Downgrade CI test trigger logs from info to debug
- Fix false-positive polling: only treat status='running' as running,
  not null status (CiTestResults, CompareWorkspaces, ForkWorkspaceBanner)
- Fix test numbering in integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to latest EE commit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to d9d68c2406df0b59f413ea0b2cb24780a9817d04

This commit updates the EE repository reference after PR #516 was merged in windmill-ee-private.

Previous ee-repo-ref: d7ccd9b86da99ec056a0e8708e3637d64290387a

New ee-repo-ref: d9d68c2406df0b59f413ea0b2cb24780a9817d04

Automated by sync-ee-ref workflow.

* fix: treat queued jobs (job_id set, null status) as running

Jobs that have been pushed but not yet picked up by a worker have a
job_id but null status. Treat these as 'running' to avoid showing
misleading 'pass' badges or '0 passing'. Tests that were never
triggered (no job_id, null status) remain neutral/hidden.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hugocasa <hugo@casademont.ch>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-09 17:21:36 +00:00
centdix
b73be37916 feat: add edit yaml button to raw app settings (#8771)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:21:02 +00:00
Diego Imbert
4b876392a0 feat: oauth manual connect option (#8770) 2026-04-09 17:19:25 +00:00
centdix
5f57727a4d feat: allow selecting hub flows as raw app backend runnables (#8772)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:18:40 +00:00
Diego Imbert
6d36eca216 fix: Flow status viewer layout nits (avoid excess y space and scroll) (#8780) 2026-04-09 17:15:14 +00:00
Ruben Fiszel
3fb557a7f5 fix: flow step testing UX improvements (#8781)
* fix: flow dev page layout and compact toolbar improvements

- Fix JSON.parse error on /flows/dev page when editor not yet initialized
- Increase compact topbar threshold from 700px to 800px
- Reposition "Test flow" button below settings bar when pane is narrow on dev pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: flow step testing UX improvements

- Store and display logs for step test results (previewLogs in flowState)
- Add logs toggle button in output picker popover
- Fix AI proxy 401 in VS Code extension by passing OpenAPI.TOKEN
- Prevent output picker from closing when clicking Run on same node
- Make toggleOpen idempotent to avoid flicker
- Show loading placeholder in badge area during test execution
- Keep pin button visible during test runs
- Auto-refresh step history when new test completes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: track previous previewJobId to avoid redundant history refreshes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: dev page insert popover z-index, summary editing, output picker UX

- Add #flow-editor portal div to /flows/dev page for correct popover stacking
- Add summary text field at bottom of dev pages when a step node is selected
- Keep pin button visible during test runs
- Show loading placeholder badge to prevent content shift
- Exclude same-node run button from output picker outside-click detection
- Make toggleOpen idempotent when popover already open

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: reuse findModuleInFlow instead of duplicated findModule

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:40:34 +00:00
Diego Imbert
e63924e377 fix: disable scroll-to-change-number on number inputs (#8777)
* fix: disable scroll-to-change-number on number inputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: add comment explaining wheel handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:39:49 +00:00
Diego Imbert
3d02be98f7 fix: normalize multi-word pg types in build_parameters to fix float8 serialization (#8778)
Multi-word Postgres type names like "double precision" caused the SQL
parser regex to fail (no spaces allowed in type group), falling back to
otyp="text". When Postgres inferred float8 for the column, the
text-typed null couldn't serialize, breaking DB Manager inserts/updates.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:39:32 +00:00
Ruben Fiszel
89920e77f3 fix: flow dev page layout and compact toolbar improvements (#8776)
- Fix JSON.parse error on /flows/dev page when editor not yet initialized
- Increase compact topbar threshold from 700px to 800px
- Reposition "Test flow" button below settings bar when pane is narrow on dev pages

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:08:35 +00:00
Henri Courdent
11ecb5a774 Volumes link (#8773) 2026-04-09 08:00:35 -04:00
Ruben Fiszel
506b7f55e1 fix: zero-downtime coordinated restarts for OTEL and other setting changes (#8768)
* fix: zero-downtime coordinated restarts for OTEL and other setting changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use background_task_state for server heartbeats and fix stale heartbeat detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show restart propagation toast when saving settings that trigger server restarts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:35:27 +00:00
Ruben Fiszel
25f4242a87 chore(main): release 1.680.0 (#8757)
* chore(main): release 1.680.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-08 16:39:17 +00:00
Ruben Fiszel
609d94aa31 fix bun lock 2026-04-08 16:22:56 +00:00
Ruben Fiszel
80c8e076fc cli nit 2026-04-08 16:16:34 +00:00
Diego Imbert
d2992af8be refactor: move ws_specific from resource column to separate table (#8766)
* Move ws_specific to separate table

* on delete cascade

* feat: handle ws_specific on resource rename and delete

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* is_false never used

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-08 16:10:41 +00:00
Ruben Fiszel
e36d440a25 fix: resolve esbuild host/binary version mismatch in app sync push (#8765)
* fix: resolve esbuild host/binary version mismatch in app sync push

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "fix: resolve esbuild host/binary version mismatch in app sync push"

This reverts commit 8822614f8e.

* fix: update esbuild to 0.28.0 and pin version exactly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:09:58 +00:00
Ruben Fiszel
fa668707c0 fix: move alert config from config table to global_settings (#8762)
* feat: move alert config from config table to global_settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rename alert setting to alert_job_queue_waiting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add CLI unit tests for pullInstanceConfigs/pushInstanceConfigs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt to merged main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:54:44 +00:00
Diego Imbert
c69f10d20d fix: skip serializing ws_specific on resources when false (#8764)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-08 15:53:01 +00:00
Ruben Fiszel
84778ca3e9 chore: update ee-repo-ref.txt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:52:10 +00:00
Ruben Fiszel
c4c003dab8 chore: update ee-repo-ref.txt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:51:14 +00:00
Diego Imbert
470b8aa5f1 feat: add status indicator dots to parallel loop iteration picker (#8761)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:26:21 +00:00
Diego Imbert
5713760b7a Fix TS typechecker for Ducklake emitting error for NULL params (#8760) 2026-04-08 12:43:41 +00:00
Ruben Fiszel
f5c9ff709b sqlx 2026-04-08 06:11:22 +00:00
Ruben Fiszel
f0bb270723 add missing delete_after_secs column to explicit SQL queries (#8759)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:00:37 +00:00
Ruben Fiszel
4c16877366 update sys prompts 2026-04-08 05:38:38 +00:00
Ruben Fiszel
4342c18541 feat: add CLI workspace merge command and enhance fork with datatable/color support (#8756)
* feat: add CLI workspace merge command and enhance fork with datatable/color support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: abort fork on git branch failure, per-datatable error handling, guard resetDiffTally

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: add fork/merge integration tests covering full cycle

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: support deploying deletions during fork merge (archive/delete in target)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: share deploy logic between CLI and frontend via windmill-utils-internal

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: revert frontend to self-contained deploy, fix failure_module handling

The frontend imports windmill-utils-internal from npm (published v1.3.4)
which doesn't have the new deploy module yet. Revert frontend to its own
self-contained implementation with two improvements:
- Pass failure_module to getAllModules in flow deploy and getItemValue
- Add deleteItemInWorkspace for deploying deletions during merge

The shared deploy.ts in windmill-utils-internal remains for CLI use.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: share deploy logic via published windmill-utils-internal, add comprehensive integration tests

- Publish windmill-utils-internal v1.3.8 with DeployProvider interface
- Frontend now uses shared deploy module (deployItem, deleteItemInWorkspace,
  checkItemExists, getOnBehalfOf, getItemValue) via provider adapter
- Add 4 new integration test sub-tests: all item types, secret variables,
  special characters, partial deploy + resetDiffTally

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: remove unused folderName function from frontend utils_workspace_deploy

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-08 05:29:48 +00:00
Ruben Fiszel
01e6414ddb Squashed commit of the following:
commit a5400b92cc4d523589d7e3c98d866c56d950dd9f
Author: Ruben Fiszel <ruben@windmill.dev>
Date:   Wed Apr 8 04:24:25 2026 +0000

    fix
2026-04-08 04:25:26 +00:00
Ruben Fiszel
2d18a68099 feat: add scheduled job deletion with configurable retention period (#8753)
* feat: add scheduled job deletion with configurable retention period

Extends delete_after_use with delete_after_secs to enable configurable
retention periods for job args/result/logs. At completion, jobs can be
scheduled for future deletion via a new job_delete_schedule table,
processed by a monitor task. Supports per-script, per-flow, and
per-flow-step configuration. Backward compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add integration tests, revert query! macros, fix review issues

- Add integration tests for resolve_delete_after_secs, schedule_job_deletion,
  flow-level and module-level delete_after_secs, backward compat
- Revert sqlx::query() back to sqlx::query!() macros for compile-time safety
- Regenerate sqlx offline cache
- Fix FlowModule/NewScript/FlowValue constructions in all test files
- Fix autoscaling_ee.rs for updated script_path_to_payload return type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt for autoscaling_ee fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gate cleanup_scheduled_job_deletions behind enterprise feature

Prevents dead_code warning (which CI treats as error via -D warnings)
when compiling without enterprise feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate sqlx cache after merge with main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback on scheduled deletion

- Monitor: roll back transaction on any cleanup error so schedule rows
  survive for retry on next cycle (instead of best-effort then discard)
- Migration: add FK with ON DELETE CASCADE to job_delete_schedule.job_id
  to prevent orphan rows when jobs are deleted through other means
- Simplify bool-to-Option conversion with .then_some(true)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: stop setting delete_after_use alongside delete_after_secs

No mixed-version deployment scenario exists, so delete_after_secs alone
is sufficient. The backend's resolve_delete_after_secs handles
(None, Some(secs)) correctly without needing delete_after_use set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove delete_after_use from public API surface

Remove delete_after_use from OpenAPI spec, API client, runtime client,
and workspace export. Only delete_after_secs is exposed going forward.

The field remains in Rust backend types with #[serde(skip_serializing)]
for backward-compatible deserialization of existing scripts/flows that
were saved with delete_after_use: true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 1d4b7a31fc115d6aba8640f7cd3fd5a01abe6806

This commit updates the EE repository reference after PR #519 was merged in windmill-ee-private.

Previous ee-repo-ref: 9eba09a13b778caafc6ae65098b90e53c91984d3

New ee-repo-ref: 1d4b7a31fc115d6aba8640f7cd3fd5a01abe6806

Automated by sync-ee-ref workflow.

* fix: regenerate system prompts, remove unused import

- Regenerate auto-generated system prompts after openflow schema change
- Remove unused serde_json::json import in test file (CI -D warnings)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: insert dummy v2_job row in schedule tests for FK constraint

The job_delete_schedule table has a FK to v2_job, so tests need a
real v2_job row before inserting into the schedule table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: trigger CI re-run

* fix: remove heavy flow integration tests to avoid CI worker contention

The flow integration tests spawn workers that compete for CPU with
the existing relock_skip tests under --test-threads=10, causing
consistent 60s timeouts in CI. Keep only the lightweight unit tests
and DB integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore correct ee-repo-ref for our branch

The ref was overwritten to main's EE ref during a rebase. Restore to
our branch's EE commit that includes the autoscaling tuple fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: retrigger CI on fresh runner

* fix: remove FK constraint from job_delete_schedule to unblock CI

The FK with ON DELETE CASCADE to v2_job may have caused performance
overhead during test DB setup (each sqlx::test creates a fresh DB
with all migrations). Remove the FK — orphan schedule rows are
harmlessly cleaned by the monitor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ee-ref

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-08 04:15:28 +00:00
Ruben Fiszel
9e6427d150 chore(main): release 1.679.0 (#8755)
* chore(main): release 1.679.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-07 21:14:29 +00:00
Diego Imbert
3d4f4c6c38 feat: Fork datatables (#8339)
* export_datatable_schema

* Propose to fork the datatable on ws fork

* dump datatable

* Dockerfile

* Fix import_datatable_dump

* datatable schema fork works!

* Option to copy both schema and data

* Datatable fork behavior

* nit ui

* use psql instead

* remove fork_datatable route

* feat: add fork_pg_database and export_pg_schema routes with DB Manager UI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: pluralize "schema" to "schemas" in DB Manager export/import UI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add import mode select (schema only vs schema + data) to DB Manager import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Select schema or schema+data when important database

* fix: prepend $res: prefix to resource paths in DB Manager import/export

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: dynamic import button label based on selected mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* nits

* feat: add warning alert when schema+data import mode is selected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* nit hide on cloud hosted

* refactor: remove fork_behavior from datatable settings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split CreateWorkspace into layout wrapper and CreateWorkspaceInner

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: instantiate CreateWorkspaceInner in globalForkModal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* nit icons

* Data table fork UI

* feat: pass per-datatable fork behaviors from UI to backend during workspace fork

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix fork overwriting all datatables

* UI nits

* custom instance db refactor

* custom instance db wizard btn for all in dropdown

* nit

* Delete custom instance database button

* Disable forking for resource datatables

* Big import buttons when db empty

* Revert "Disable forking for resource datatables"

This reverts commit 9561cc8fd4.

* feat: add non_diffable flag to resource table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add resource-type datatable fork with CREATE DATABASE

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: tag forked datatables with nonDiffable and forkedFrom

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: diff datatable and ducklake settings individually on workspace merge

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: skip non_diffable resources and datatables in workspace diff

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: default datatable fork behavior to keep_original

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: make grant permissions non-fatal in instance datatable fork

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: make datatable and ducklake diffs visible in workspace comparison

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: remove datatable fork logic from workspace fork route

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: correct ahead/behind logic for datatable and ducklake diffs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Revert "fix: correct ahead/behind logic for datatable and ducklake diffs"

This reverts commit 6b50884dc6.

* revert: remove datatable and ducklake settings diffing logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add datatable clone UI with step-by-step confirmation modal

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: extract datatable fork UI into ForkDatatableSection component

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit

* fix: run datatable cloning before workspace fork creation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit disable fork admins

* nit fix switching workspace prematurely

* fix: use source workspace for forkPgDatabase calls during fork

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: update forked workspace datatable settings after fork creation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add forked_from field to DataTable and set it for instance forks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit onFinish

* fix: add forked_from to DataTableSettings OpenAPI schema

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: track datatable table DDL changes in workspace_diff

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Revert "feat: track datatable table DDL changes in workspace_diff"

This reverts commit 7526dd68b9.

* feat: add get_datatable_full_schema endpoint and snapshot schema on fork

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix duplicate migration key

* fix: set forked_from on datatable config for both instance and resource types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nits

* feat: drop forked databases on workspace deletion with confirmation UI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: extract drop_forked_datatable_databases from delete_workspace

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: cast pg char columns to text in FK schema query

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: show dbname instead of resource type in fork deletion modal

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* ui nit

* refactor: extract drop_custom_instance_database into windmill-common

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add datatable schema diff section to merge UI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* UI

* feat: add review drawer with YAML diff and SQL migration runner

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: use Monaco DiffEditor for YAML diff in review drawer

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit

* Revert "refactor: use Monaco DiffEditor for YAML diff in review drawer"

This reverts commit a86008ba4c.

* Revert "feat: add review drawer with YAML diff and SQL migration runner"

This reverts commit 0a0deb5ddb.

* feat: add review drawer with DiffEditor and SQL migration runner

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* ui nits

* fix: show diff between forked_from schema and changed side

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: re-fetch target live schema after migration for correct baseline

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* revert

* nit auto next

* feat: add confirmation modal before deploying migration to parent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: handle missing columns/foreignKeys in schema conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nits

* refactor: use temp file on disk for pg_dump instead of in-memory string

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Don't replace postgres dbname

* fix: add validation to drop_custom_instance_database and use source db for CREATE/DROP

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: type DataTable.forked_from as DataTableForkedFrom struct

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: simplify fork_pg_database to take source + target_dbname

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* dead code

* feat: enforce schema_and_data admin-only and extract create_custom_instance_database

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: rename fork_pg_database to import_pg_database with source/target/override params

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit

* refactor: remove original_dbname/original_resource from forked_from, resolve from parent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit

* fix: resolve forked dbname from fork workspace when dropping resource databases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nits

* fix: always clean up global_settings even if database doesn't exist

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: check datatable resource_type from config instead of URL prefix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: wrap PG default value expressions in braces to prevent CAST quoting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Revert "fix: wrap PG default value expressions in braces to prevent CAST quoting"

This reverts commit 77f5a2c4e8.

* refactor: reuse columnDefToTableEditorValuesColumn for default value handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: store raw API schema in forked_from to avoid double transformation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Revert "fix: store raw API schema in forked_from to avoid double transformation"

This reverts commit e326197a20.

* Revert "refactor: reuse columnDefToTableEditorValuesColumn for default value handling"

This reverts commit bd8f071d9f.

* fix: validate dbname with strict regex to prevent SQL injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix default value

* always validate dbname

* refactor: move get_datatable_full_schema structs and logic to query_builders.rs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: split import_pg_database into create_pg_database + import_pg_database

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: extract drop_forked_datatable_databases into its own route

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: require admin when using $res: resource paths in import_pg_database

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use UserDB for $res: resource access and restrict dbname creation

- resolve_pg_source_checked uses UserDB (row-level security) for $res: paths
- transform_json_unchecked is now pub(crate) to prevent misuse
- Non-superadmins can only create databases with wm_fork_ prefix
- datatable:// remains accessible to everyone

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: refuse to drop forked databases unless name starts with wm_fork_

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: remove resolve_pg_source, use resolve_pg_source_checked everywhere

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix permissions

* sqlx prepare

* compilation nits

* sqlx prepare

* sqlx prepare

* wrong route syntax

* fix: allow workspace owner to edit datatable config for fork setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Revert "fix: allow workspace owner to edit datatable config for fork setup"

This reverts commit ab683e637b.

* refactor: move datatable fork setup into create_workspace_fork backend

Instead of updating datatable settings from the frontend after fork
creation (which required admin/owner access), pass forked_datatables
info to create_workspace_fork and handle it atomically in the same
transaction. Removes applyPostForkDatatableUpdates from frontend.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: snapshot schema in backend during fork instead of frontend

The schema snapshot is now taken by the backend in apply_forked_datatable
via snapshot_datatable_schema, which connects to the parent workspace's
datatable and runs pg_get_full_schema. This removes the need for the
frontend to call getDatatableFullSchema and pass the schema through.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use get_resource_value_interpolated_internal for $res: to resolve $var: references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit

* sqlx prepare

* fix: add permission check to drop_forked_datatable_databases, validate dbnames, restrict temp file perms

- drop_forked_datatable_databases: same permission as delete_workspace
  (fork owner or super admin)
- validate_dbname on target_dbname_override and ForkedDatatableInfo.new_dbname
- Enforce wm_fork_ prefix on forked datatable new_dbname
- DumpFile: set /tmp/windmill/ to 0700 and create files with 0600

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* nit CLI

* Rename to ws_specific

* sqlx prepare

* nit always validate dbname

* fix: include foreign keys in CREATE TABLE migration for added tables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: detect nextval defaults and use SERIAL/BIGSERIAL types in CREATE TABLE

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update frontend/src/lib/components/DBManagerDrawer.svelte

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update backend/windmill-common/src/lib.rs

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update backend/windmill-common/src/lib.rs

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* fix: sort foreign keys by constraint name for deterministic schema output

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* sqlx prepare

* rename migration to update timestamp

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-04-07 21:03:06 +00:00
Ruben Fiszel
0bcbc8bd3c chore(main): release 1.678.0 (#8745)
* chore(main): release 1.678.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-07 20:25:49 +00:00
Diego Imbert
2413dbefe3 fix: Fix FlowTimeline duplicate key (#8754) 2026-04-07 20:13:30 +00:00
hugocasa
db55e8efb0 fix: remove span.enter() in dedicated worker to prevent tracing panic (#8749)
* [ee] fix: remove span.enter() in dedicated worker to prevent tracing panic

Update EE ref to include fix for dedicated worker tracing span panic that
caused benchmark failures after ~8000 jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 86158dde674238fd94f925bdcd5155759e823ed6

This commit updates the EE repository reference after PR #518 was merged in windmill-ee-private.

Previous ee-repo-ref: a0480130c241d32b7e02951bfb5a03fdfc5737c8

New ee-repo-ref: 86158dde674238fd94f925bdcd5155759e823ed6

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-07 17:59:46 +00:00
wendrul
5125263859 add tracings for long debounce key errors (#8747)
* Add messages about debug key

* update ee repo ref

* undo b

* update merged ee repo ref
2026-04-07 17:59:35 +00:00
hugocasa
d938625785 feat: add download all logs button for flow jobs (#8748)
* feat: add download all logs button for flow jobs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use recursive CTE to include all nested flow jobs in log download

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: start iteration index at 1 and interleave children with parents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: distinguish branch vs loop iteration in log section headers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: include flownode and singlestepflow kinds in branch/iteration labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve branch labels (branchone: default/1/2, branchall: 1/2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve module types from flow_node table for nested structures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use full path in iteration/branch labels and show step kind name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: show iteration index for simple module forloop optimized jobs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle aiagent jobs as intermediate flow jobs with tool call children

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: reuse existing get_logs_from_store/disk instead of duplicating

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* sqlx

* sqlx

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:59:16 +00:00
wendrul
eb32206940 cloud debounce keys potential 'value too long' error (#8750) 2026-04-07 17:57:48 +00:00
Ruben Fiszel
8b9523e03c fix: delete raw_script_temp rows before workspace deletion to avoid FK violation (#8752)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:34:18 +00:00
centdix
2f7ba9edac fix: restore ai agent tool deletion (#8744)
* fix: restore ai agent tool deletion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: reduce ai tool delete tree walks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-07 12:32:57 +00:00
hugocasa
208a597d59 feat: accept any content type on webhooks/http triggers with fallback (#8743)
* Revert "feat: restore bun for dedicated workers, fix dispatch & serialization, cross-workspace deps (#8645)"

This reverts commit 619ebb65ce.

* feat: accept any content type on webhooks/http triggers with fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Reapply "feat: restore bun for dedicated workers, fix dispatch & serialization, cross-workspace deps (#8645)"

This reverts commit ee5420e401.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 12:14:24 +00:00
Ruben Fiszel
c4be833bc0 chore(main): release 1.677.0 (#8737)
* chore(main): release 1.677.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-06 19:54:41 +00:00
Ruben Fiszel
b5c1eb3137 test: verify esbuild bundle output in app dev server for React and Svelte (#8741)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:29:07 +00:00
Ruben Fiszel
edfe074e98 fix: use runnable key for file naming in generate-metadata to prevent duplicate scripts in raw apps (#8740)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:08:07 +00:00
Ruben Fiszel
c09a4311fd fix: remove stale KMS openapi/description, restore stripped doc comments
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 15:19:03 +00:00
Ruben Fiszel
09bbc18bb7 feat: add AWS Secrets Manager as secret storage backend (Beta) (#8734)
* feat: add AWS KMS as secret backend (EE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: switch from AWS KMS to AWS Secrets Manager as secret backend

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add AWS Secrets Manager integration tests (requires LocalStack)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mark AWS Secrets Manager as beta

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove leftover KMS handler functions from api-settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to include AWS Secrets Manager EE impl

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use full commit hash in ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* sqlx

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 15:17:15 +00:00
Ruben Fiszel
a78eb6e93d test: add integration test for public_app_by_custom_path endpoint (#8735)
Regression test for the missing labels column bug. Creates an app with
a custom path and anonymous execution mode, then fetches it via the
public custom path endpoint.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:17:00 +00:00
Ruben Fiszel
af1f6506d2 chore(main): release 1.676.0 (#8730)
* chore(main): release 1.676.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-06 12:45:08 +00:00
Ruben Fiszel
d2abc0d430 fix: fix custom urls not found 2026-04-06 12:33:21 +00:00
Ruben Fiszel
e32662169a feat: add path name autocomplete with ghost text and folder cycling (#8731)
* feat: add path name autocomplete with ghost text and folder cycling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: filter out archived/deleted/draft paths from autocomplete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show subfolders immediately after Tab-navigating into a folder

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: remove 2-char minimum for suggestions, hide placeholder when suggestions show

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show LCP ghost text for multiple matches, Enter accepts it for Tab cycling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: suppress Path.svelte Enter dispatch when ghost text is accepted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: compute LCP inline in Enter handler to avoid reactive timing issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Enter picks the first folder and navigates into it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Enter picks the currently Tab-highlighted folder, not always the first

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: remove stray blank lines in applyCycleOrComplete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review fixes — $bindable default, openapi cache description, non-null assertion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add sqlx query cache for path_autocomplete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 03:44:21 +00:00
Ruben Fiszel
c721fac466 perf: add partial index for expired cache resource cleanup (#8728)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-05 18:33:03 +00:00
Ruben Fiszel
7eaaf3021a chore(main): release 1.675.1 (#8727)
* chore(main): release 1.675.1

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-05 17:01:17 +00:00
Ruben Fiszel
f703fba1ef fix: log cleanup scans S3 orphans and works cross-server (#8729)
* fix: log cleanup scans S3 orphans and works cross-server

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: don't skip service log orphan scan when job retention is disabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: time-based heartbeat + flag partial folder sizes on list errors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: move background_task module from common to api-settings

Only log_cleanup and storage_usage use it today, both in windmill-api-settings.
Keeping it in the consumer crate narrows the blast radius; if workers or
indexer later need cross-server lease+progress coordination they can move it
back to common then.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-05 16:56:46 +00:00
Ruben Fiszel
eae46a21a9 perf: add indexes for cleanup deletes on concurrency_key and autoscaling_event (#8726)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:10:30 -04:00
Ruben Fiszel
31df861ee2 chore(main): release 1.675.0 (#8716)
* chore(main): release 1.675.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-05 14:18:34 +00:00
Ruben Fiszel
e605bc4e07 sqlx 2026-04-05 14:15:24 +00:00
Yoaquim Cintrón
7bf6ac2b69 fix: enrich OTEL spans with job_kind, trigger_kind, trigger, created_by, and script_hash (#8718)
Add five new attributes to the `job` and `job_postprocessing` tracing spans
so that OTEL-consuming backends (Sentry, Honeycomb, Datadog, etc.) can
filter and group telemetry by how a job was triggered and what type it is.

New span attributes:
- `job_kind`     — Script, Flow, AppScript, AIAgent, Preview, etc.
- `created_by`   — the user or system identity that queued the job
- `trigger_kind` — schedule, webhook, kafka, http, sqs, etc.
- `trigger`      — the schedule/trigger path (when applicable)
- `runnable_id`  — the id of the runnable that ran

Also adds `JobKind::as_str()` for a consistent lowercase string
representation, following the same pattern as `ScriptLang::as_str()`.

Existing attributes (job_id, workspace_id, script_path, language, tag,
flow_step_id, parent_job, root_job) are unchanged.

Note: the EE `full_job` span in `otel_ee.rs` and the log records emitted
by `job_logger_ee.rs` would also benefit from these attributes. This PR
covers only the public-repo spans; a follow-up EE change would propagate
the same fields to logs and the full_job span.
2026-04-05 14:11:22 +00:00
Ruben Fiszel
01e39d9cd1 fix: split DB health endpoint and add slow query controls (#8725)
Split the DB health page into independent panes so fast pg_catalog-based
diagnostics render without waiting for the slower job table scans, and
enrich the slow queries view with server-side sort, reset, and better
setup guidance.

Backend:
- Split /api/db_health into two endpoints: fast panes (database_size,
  connection_pool, table_maintenance, slow_queries, datatables) and
  /jobs (job_retention, large_results with scan_limit).
- Add GET /api/db_health/slow_queries?sort=total|mean|calls for
  server-side sorting of pg_stat_statements queries (sort whitelisted
  via enum, SQL-injection safe).
- Add POST /api/db_health/slow_queries/reset to call
  pg_stat_statements_reset().
- Return stats_reset timestamp from pg_stat_statements_info (PG 14+).
- Bump slow queries to top 50 sorted by total_exec_time (was top 10 by
  mean_exec_time, which misses high-cumulative-load queries).
- Truncate slow queries to 500 chars (was 200).
- Filter table_maintenance to tables with >= 1000 total tuples.

Frontend (DbHealth.svelte):
- Two tabs (Overview / Jobs) with auto-refresh on selection.
- Refresh buttons right-aligned in both tabs; Jobs tab keeps the
  scan_limit selector on the left.
- Job Retention & Large Results always render, with "Click Refresh to
  load" placeholders when no data yet.
- Slow queries table: clickable column headers for server-side sort,
  click a row to toggle the full query text.
- Reset stats button with confirmation dialog, displays "Stats since"
  timestamp for before/after comparison workflow.
- When pg_stat_statements is not installed, show numbered setup
  instructions with copyable SQL snippets.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:00:44 +00:00
Ruben Fiszel
02d0ee9198 feat: add object storage usage view and manual log cleanup (#8724) 2026-04-05 13:10:48 +00:00
Ruben Fiszel
dd39c110a8 fix: add admin check to count_completed_jobs_detail and document query builder SQL safety (#8722)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 12:06:14 +00:00
Ruben Fiszel
342defecd2 block adding/inviting members to admins workspace (#8721)
* fix: block adding/inviting members to admins workspace on CE

The admins workspace is reserved for superadmins only. On CE (non-enterprise),
prevent adding or inviting users to it via both API and UI.

Backend: add #[cfg(not(feature = "enterprise"))] guards to invite_user and
add_user endpoints that reject requests targeting the admins workspace.

Frontend: show an info alert on the admins workspace members page and hide
the add/invite/auto-add buttons.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use derived variable for admins workspace alert consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:06:14 +00:00
Ruben Fiszel
2b865c0694 fix: allow private AI base URLs in ai_proxy integration test (#8715)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-03 21:26:24 -04:00
Ruben Fiszel
7c3bd67639 chore(main): release 1.674.2 (#8714)
* chore(main): release 1.674.2

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-04 01:00:33 +00:00
Ruben Fiszel
ff8e39c69b fix: enforce RLS on $var: resolution in AI proxy (GHSA-jwg4-v3cj-rvfm) (#8713)
* fix: enforce RLS on $var: resolution in AI proxy to prevent secret exfiltration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update sqlx prepared queries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:55:57 +00:00
Ruben Fiszel
f394e674f2 fix: SSRF via X-Resource-Path header in AI proxy endpoint (#8712)
* fix: validate AI provider base URLs to prevent SSRF via X-Resource-Path header

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: improve SSRF error message to mention ALLOW_PRIVATE_AI_BASE_URLS env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-04 00:43:08 +00:00
Ruben Fiszel
6bb5cac0c4 chore(main): release 1.674.1 (#8711)
* chore(main): release 1.674.1

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-04 00:05:05 +00:00
Ruben Fiszel
3e9a9f44bb nit windows tests 2026-04-04 00:04:49 +00:00
Ruben Fiszel
aff95c33b2 fix: create pg connection for cloud-hosted jobs instead of panicking (#8710)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:00:22 +00:00
Ruben Fiszel
653356011c chore(main): release 1.674.0 (#8693)
* chore(main): release 1.674.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-03 22:52:49 +00:00
Diego Imbert
fda68a72e5 feat: Support .ducklake() and .datatable() in agent workers (#8697)
* Update clients to check for agent workers

* fixes

* typescript uses 127.0.0.1

* Refresh system prompts

* fix: check both localhost and 127.0.0.1 in workerHasInternalServer detection

Both Python and TypeScript clients now check for both hostnames to avoid
silent breakage if BASE_INTERNAL_URL uses one or the other. Also adds
return type annotation to the Python method.

Co-authored-by: Diego Imbert <diegoimbert@users.noreply.github.com>

* refresh system prompts

* nit localhost regex boundary

* fix: use provider.language instead of undefined bare language in sqlUtils

The language variable was referenced as a bare identifier in the fetch
calls, resolving to undefined at runtime instead of reading from
provider.language.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Diego Imbert <diegoimbert@users.noreply.github.com>
Co-authored-by: Ruben Fiszel <ruben@windmill.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:34:34 +00:00
Ruben Fiszel
ef74a1bb4d add type predicates to .filter() in sqlUtils for strict TypeScript (#8709)
The discriminated union type from values.map() wasn't being narrowed by
.filter((info) => !info.raw), causing info.argNum to be typed as
number | undefined instead of number.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:27:38 +00:00
Ruben Fiszel
1a0f580f3d nit 2026-04-03 22:21:31 +00:00
Ruben Fiszel
6d58d1a74d fix: pipeline DISCARD ALL with first query on cached pg connections (#8707)
* perf: pipeline DISCARD ALL with first query on cached pg connections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use RESET ALL instead of DISCARD ALL for lighter session reset

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add integration test for pg session reset on cached connections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: release MutexGuard before caching so pg connection cache actually works

The old code shadowed the MutexGuard variable without dropping it, so
try_lock() in the post-query caching path always failed — connection
caching was effectively dead code. Restructure to explicitly drop the
guard before connecting.

Also adds a CACHE_HITS counter and clear_pg_cache() helper so the
integration test can verify the cached-connection path is exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add single-worker session isolation test for SET ROLE + search_path

Pushes 3 jobs into the queue before starting the worker so a single
worker processes them all sequentially (matching production). Verifies
SET ROLE and SET search_path do not leak between jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add RESET ROLE to session reset (RESET ALL does not undo SET ROLE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use DISCARD ALL for full session reset and retry on stale connections

- Switch from pipelined RESET ROLE; RESET ALL to eager DISCARD ALL when
  validating cached connections. This resets everything: role, GUCs,
  prepared statements, temp tables, advisory locks, LISTEN registrations.
- DISCARD ALL also serves as a health check: if it fails, the stale
  connection is discarded and a fresh one is created transparently.
- Extract new_pg_connection() helper to avoid duplicating the connect +
  spawn-connection-task logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add 100-job single-worker cache stress test

Runs 100 varied PG jobs (plain SELECTs, SET ROLE, SET search_path,
multi-statement) through one worker. Verifies all succeed, 99 hit the
cache, and no session state leaks between jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:16:48 +00:00
Diego Imbert
ce290f68db feat: sql.raw in Typescript client (#8706)
* feat: detect sql.raw() in TS parser and tag queries with has_raw_interpolation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: filter out sql.raw queries from type-checking and preparation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: implement sql.raw() for inline raw SQL fragments in template literals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: split sqlProviderImpl into provider interface + shared builder

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix ts client compilation

* update asset parser

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 21:20:37 +00:00
Ruben Fiszel
dcd615fdc3 feat: add Azure Key Vault as secret storage backend (#8704)
* feat: add --main flag to write_latest_ee_ref.sh to point to latest EE main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Azure Key Vault as secret storage backend (EE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt to azure-key-vault-support branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add token auth, insecure TLS for emulator, and integration tests

Adds optional `token` field to AzureKeyVaultSettings for direct Bearer
auth (bypasses OAuth2), enables self-signed cert acceptance in token mode,
and includes 4 integration tests against the Azure KV emulator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle Azure KV soft-delete and emulator quirks

- Purge soft-deleted secrets after delete to allow name reuse
- Retry set_secret on 409 Conflict (purge stale soft-deleted secret)
- Accept self-signed certs when using static token (emulator mode)
- Work around emulator version-ordering bug in CRUD test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 47b0d9d5d163efdab1e145ee012bdb2eb1373b78

This commit updates the EE repository reference after PR #511 was merged in windmill-ee-private.

Previous ee-repo-ref: d432d78bda151d611d8065162de7c1b7edce92e9

New ee-repo-ref: 47b0d9d5d163efdab1e145ee012bdb2eb1373b78

Automated by sync-ee-ref workflow.

* fix: accept token OR client_secret in Azure KV validation, add token UI field

- isAzureKvConfigValid() now accepts either client_secret or token
- Added token input field to the Azure KV config form for emulator/dev use

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-03 21:02:36 +00:00
Ruben Fiszel
18eb6e0df7 point to latest otel 2026-04-03 19:59:30 +00:00
Ruben Fiszel
adc9fe722d fix: gate relock_skip tests on private feature and update ee-repo-ref (#8703)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 19:58:54 +00:00
Ruben Fiszel
0aea49f960 feat: add http/protobuf support for OTEL exporters (#8702)
* feat: add --main flag to write_latest_ee_ref.sh to point to latest EE main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [ee] feat: add http/protobuf support for OTEL exporters

Add http-proto and reqwest-client features to opentelemetry-otlp to
enable HTTP/protobuf transport as an alternative to gRPC.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: expose OTEL protocol selector in instance settings UI

Replace the hardcoded "gRPC" label with a dropdown allowing users to
select between grpc (default) and http/protobuf.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 681b725781611510ed3040c00e8f9b8497d6feda

This commit updates the EE repository reference after PR #509 was merged in windmill-ee-private.

Previous ee-repo-ref: 50051ded8183e662a9e932d87d17258501f3e944

New ee-repo-ref: 681b725781611510ed3040c00e8f9b8497d6feda

Automated by sync-ee-ref workflow.

* fix: remove reqwest-client feature to avoid conflict with default reqwest-blocking-client

The opentelemetry-otlp crate only activates the reqwest-client HTTP client
when reqwest-blocking-client is NOT also enabled. Since the default features
include reqwest-blocking-client, having both resulted in no HTTP client being
created. The default reqwest-blocking-client works correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* iterate

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-03 19:30:25 +00:00
Ruben Fiszel
ba214709b9 fix: add secretKeyRef support for jwt_secret and rsa_keys (#8698)
* feat: add secretKeyRef support for jwt_secret and extra fields (rsa_keys)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: update ee-repo-ref to 2c24cf597fdf8c4dccd483f1f1e5c49eb42ef3a3

This commit updates the EE repository reference after PR #508 was merged in windmill-ee-private.

Previous ee-repo-ref: ade3bb76f8e0a6e658313b54c7180577fc9efc37

New ee-repo-ref: 2c24cf597fdf8c4dccd483f1f1e5c49eb42ef3a3

Automated by sync-ee-ref workflow.

* test: replace unit tests with integration tests for jwt_secret and rsa_keys

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: update ee-repo-ref.txt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-03 18:17:27 +00:00
hugocasa
bffa61e33f fix: dedicated worker dispatch, cross-workspace deps, UI improvements (#8689)
* feat: restore bun as default runtime for dedicated workers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add context comment for bun dedicated worker nodejs migration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: dedicated worker dispatch for flows + add E2E tests

- Add workspace_id prefix to dedicated worker map lookup keys
- Update ee-repo-ref for dedicated worker path handling fix
- Add spawn_test_worker_dedicated/in_test_worker_dedicated test helpers
- Add 6 E2E tests for dedicated workers:
  - test_dedicated_flow_rawscript (regression for "Script not found" bug)
  - test_dedicated_flow_workspace_script
  - test_dedicated_flow_multiple_steps
  - test_dedicated_standalone_script
  - test_dedicated_runner_group
  - test_dedicated_flow_runners
- Add dedicated_flows.sql fixture with scripts, flows, and worker config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: always run dependency job for dedicated worker scripts

When a script with dedicated_worker=true is deployed with a pre-computed
lock (e.g. via wmill sync push), no dependency job was created, so the
dedicated worker never detected the update and kept running the old version.

Now dedicated worker scripts always generate a dependency job regardless
of whether a lock is provided. The dependency job runs on the dedicated
worker and triggers a restart so it picks up the new script version.

Fixes #8638

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use serial_test for dedicated worker tests to avoid WORKER_CONFIG races

Dedicated worker tests need non-default worker tags in the global
WORKER_CONFIG. When run in parallel (CI uses --test-threads=10),
multiple tests clobber each other's config. Use #[serial] to ensure
dedicated worker tests run sequentially.

Also load worker config from DB via load_worker_config() instead of
manually setting WORKER_CONFIG fields, ensuring consistency with the
monitor's reload path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: nodejs dedicated worker script_path shadowing + add multi-language E2E tests

Fix script_path shadowing in bun_executor nodejs branch where the wrapper
file path was passed to handle_dedicated_process instead of the logical
path, causing "Script not found" for all //nodejs dedicated workers.

Add E2E tests for dedicated flows in all supported languages:
- test_dedicated_flow_deno
- test_dedicated_flow_python
- test_dedicated_flow_bunnative (V8 PrewarmedIsolate path)
- test_dedicated_flow_bun_nodejs (//nodejs annotation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify dedicated worker dispatch + add serialization and E2E tests

- Unified lookup: always use {workspace}:{runnable_path} for dedicated
  worker dispatch, replacing the flow_step_id iteration approach
- Added serialization_semaphore parameter to executor start_worker fns
- Added E2E tests: cross-workspace isolation, conflicting flow step IDs,
  preprocessor on dedicated worker
- Added workspace field to RunJob for cross-workspace test support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: cross-workspace workspace dependencies on workers page

Add two new instance-level endpoints to the configs router:
- GET /configs/list_all_workspace_dependencies
- GET /configs/list_all_dedicated_with_deps

Both require devops role and return data across all workspaces,
enabling the workers page to show a consistent view of which
workspace dependencies exist regardless of which workspace the
user is browsing.

Update DedicatedWorkersSelector to use the new cross-workspace
endpoints with fallback to per-workspace calls for non-devops users.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to include dedicated worker lookup simplification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: use branch name for ee-repo-ref (CI can't fetch by SHA from non-default branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ee-repo-ref.txt with new reference

* sqlx

* fix: revert serialization semaphore, multi-workspace picker, dep conflict warnings

- Remove serialization_semaphore from executor start_worker signatures
- Remove serialization test and fixtures
- Fix DedicatedWorkersSelector to preserve tags from other workspaces
  when toggling in the picker
- Track workspace deps per-workspace for conflict detection
- Show warning when dep exists in another workspace but not the script's
- Group runner groups per-workspace to prevent cross-workspace merging
- Add workspace to dep badge link URL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify exec protocol — execd: for single-script, exec: for runner groups

Add execd:/execd_preprocess: commands to bun/deno/python wrappers for
single-script dedicated workers (no path needed). Runner groups keep
exec:/exec_preprocess: with path for multi-script disambiguation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add unit tests for execd:/exec: wrapper protocol

Verify generate_multi_script_wrapper produces both execd: (single-script)
and exec: (runner group) protocol handlers, including preprocessor variants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update commit reference in ee-repo-ref.txt

* fix: remove beta badge from squash loop, keep tooltip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update protocol tests to use execd: for single-script wrappers

Deno and bun single-script protocol tests now send execd:{args} instead
of exec:{path}:{args}, matching the updated wrapper protocol. Multi-script
(runner group) tests continue to use exec:{path}:{args}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unused TEST_SCRIPT_PATH in deno protocol tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — down migration, push_as workspace, UI improvements

- Use regexp_replace in down migration for positional accuracy
- Fix push_as() to use self.workspace_id instead of hardcoded value
- Remove per-workspace API fallbacks, use cross-workspace endpoints only
- Skip devops-only API calls when user is not devops (disabled prop)
- Fix duplicate key error for cross-workspace runner groups
- Add workspace to RunnerGroup for unique keying
- Reuse tagRow snippet for standalone items with expand/collapse
- Fix picker alignment: remove empty column for non-expandable items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comprehensive dedicated worker test coverage, fix Python execd_preprocess

- Add Python execd_preprocess: handler (was missing for single-script dedicated workers)
- Add 10 E2E tests: flow+standalone conflict, mixed lang fallback, unsupported lang
  flow runners, python runner group, bun/python/deno/bunnative preprocessors,
  runner group preprocessors, branchone flow
- Add 4 Python unit tests for execd:/execd_preprocess: protocol
- Update EE ref

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — migration escaping, deno try/catch, loadRunnables guard

- Down migration: use E'...' so \n matches actual newlines
- Up migration: anchor regex with ^ to avoid mid-content matches
- Deno execd_preprocess: move JSON.parse inside try/catch
- DedicatedWorkersSelector: skip devops-only API calls when disabled

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add dedicated worker relative import tests for bun and python

Verifies that build_loader's CURRENT_PATH correctly resolves workspace-
relative imports when running on a dedicated worker subprocess.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: dedicated worker dispatch for nested flow structures (branches/loops)

- Add extract_flow_root() to strip nesting segments from runnable_path
- Dispatch uses flow_root/flow_step_id for nested paths, runnable_path
  for flat paths — deterministic, O(1)
- Fix assert_ran_on_dedicated_worker to BFS all descendants
- Fix python mode labels (python vs python3 for runner groups)
- Add tests: simple forloop, multi-step forloop, whileloop, branchall,
  nested branch-in-loop, mixed lang fallback, unsupported lang runners

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: fix ee-repo-ref SHA

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: hide picker and skip API calls for read-only users, hide empty runner badge

- Hide "Add more scripts/flows" section when disabled (read-only)
- Skip per-runnable API calls (getScriptByPath, getFlowByPath) for
  disabled users — just show path info
- Hide "0 runners" badge on flows with no eligible steps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 9422b189762ae27edfc346541ae668a4ad728325

This commit updates the EE repository reference after PR #503 was merged in windmill-ee-private.

Previous ee-repo-ref: 4c6ba214bfc23fff05d1dc3200ac59e650af3f4f

New ee-repo-ref: 9422b189762ae27edfc346541ae668a4ad728325

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-03 17:50:07 +00:00
Ruben Fiszel
27ca417201 fix: resolve schedule update deadlock (#8701)
* feat: add --main flag to write_latest_ee_ref.sh to point to latest EE main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve schedule update deadlock by fixing lock ordering in edit_schedule

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:44:10 +00:00
Ruben Fiszel
c4c9ef5fd7 feat: add optional labels to scripts, flows, apps, schedules, triggers (#8609)
* feat: add optional labels to scripts, flows, apps, raw apps, schedules, and triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update sqlx cache, make labels optional in openapi, regenerate system prompts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add minimal labels input UI to script, flow, and schedule editors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reduce gap between summary and labels input

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add labels to script/flow detail pages and summary/path popover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move labels inside SummaryPathDisplay trigger for clickable area, reduce gap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: display labels inline to the right of summary, not below

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: increase gap between summary and labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add labels to resources/variables, make labels nullable, add home page label filter badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add labels to workspace export/import, resources, variables + test coverage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: make migration idempotent, regenerate sqlx cache after merge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pass labels in script create and flow create/update API calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add labels input UI to resource and variable editors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove negative margin from LabelsInput to prevent overlap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add top and left margin to LabelsInput for better spacing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reduce left margin on LabelsInput

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: widen label input to w-32

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use inline-flex so LabelsInput doesn't stretch full width

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove flex-wrap so label input stays on same line as badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add label filter presets to resources, variables, and schedules search

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use max-w-32 on label input to prevent stretching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pull labels closer to summary with negative top margin

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: increase negative margin to pull labels even closer to summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pass labels in schedule create/update API calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use COALESCE to preserve existing labels when not provided in schedule/flow update

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add labels to CreateResource, EditResource, CreateVariable, EditVariable in OpenAPI spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: display label badges on resource and variable list pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: display label badges on schedule and all trigger list pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add folder and label presets to schedules search filter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: apply user_folders_only filter on all workspaces including admins

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add label presets to resources and variables search filters

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: derive folder presets from loaded items, not all workspace folders

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add label query parameter to resource and variable list endpoints in OpenAPI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: display label filter badges inline with folder filters on home page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "feat: display label filter badges inline with folder filters on home page"

This reverts commit 6767a50aa6.

* feat: support comma-separated label filters (allowMultiple) in all list endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: append label presets with comma for allowMultiple filters instead of duplicating key

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: hide label presets that are already in the comma-separated filter value

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace unsafe manual SQL ARRAY construction with parameterized queries, add labels to ScriptWDraft

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: complete down migration, add labels to Resource/Variable OpenAPI schemas, remove type cast, add label length validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add labels field to Schedule test fixture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add labels field to Rust client struct constructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: regenerate sqlx cache with --all-features for EE builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate sqlx cache and package-lock after merge with main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: squash two migrations into one, use IF NOT EXISTS for idempotency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: track label changes in SummaryPathDisplay to enable save button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use JSON string comparison for label dirty tracking in popover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: navigate to script by path after save from popover to load new version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update initialLabels after save so subsequent label changes enable save again

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use onchange callback for label dirty tracking instead of derived comparison

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reload script by path after label save to fetch new version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: propagate script/flow labels to jobs at push time

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show script/flow labels on runs page, merge with wm_labels for completed jobs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: change job labels type from JSONB to text[], show labels on job detail page, fix type mismatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add labels to QueuedJob struct, fix get_job queries to return v2_job.labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace +Label text with icon only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add tag icon before labels on job detail page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move tag icon inside badge on job detail page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use blue badge with tag icon in RunBadges, remove duplicate labels from JobDetailHeader

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: set icon position to left so tag icon renders in badge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: render Tag icon inline in badge children instead of via icon prop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: retry icon prop with small badge and position left

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add hover tooltip showing "Label: X" on job label badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: include v2_job.labels in runs page label filter and broad search

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate sqlx cache and system prompts after merge with main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add labels to EE JobPayload constructions, regenerate sqlx cache with --all-features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: regenerate sqlx cache CE-only (without EE symlinks that cause conflicts)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update remaining wm_labels JSONB queries to use text[] merge expression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify job labels to just read v2_job.labels (wm_labels already merged at completion)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: consistent label badge spacing with gap-0.5 wrapper and px-0.5 on badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add labels: None to test utils JobPayload construction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add labels to all test fixture JobPayload/NewFlow/EditApp constructions, regenerate sqlx cache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: fix vertical content shift by fixing container and input height to h-5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: npm_check errors - unused imports, combinedItems order, flow.labels type, badge px-1 padding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unused FolderService imports, fix label badge alignment in RunBadges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore deleted service imports in variables page, remove empty loadFolders

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: trigger CI with updated ee-repo-ref

* chore: update ee-repo-ref to merged EE companion PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: trigger fresh CI run for updated ee-repo-ref

* fix: match label badge size with other badges in RunBadges using {large} prop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove icon from RunBadges label badge to fix vertical alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: shorten "Job kind" to "Kind" in run badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add small inline tag icon (10px, -mt-px) to label badge without disrupting height

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add "Label: X" hover tooltip to all label badges, show hidden labels on +N hover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add tag icon and "Label: X" tooltip to home page label filter badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: show LabelsInput even when path is hidden in ResourceEditor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add labels input to new resource creation drawer (AppConnectInner)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* iterate

* fix: add LabelsInput to all resource creation steps in AppConnectInner

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reduce LabelsInput top margin from -mt-3 to -mt-1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: increase negative margin to -mt-2 for tighter spacing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: split the difference with -mt-1.5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adjust to -mt-1 for label spacing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: per-site label spacing via class prop instead of global negative margin

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: make label badges clickable to toggle label filter on resources, variables, schedules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use proper array indexOf for label filter toggle, set undefined correctly on removal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use delete instead of undefined to properly clear label filter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /labels/list endpoint and autocomplete dropdown to LabelsInput

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use inline preventDefault for Svelte 5 event handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add "Create new" option in label autocomplete, regenerate sqlx cache with update_sqlx.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add GIN indexes on labels column for all 16 tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove CONCURRENTLY from GIN index creation in migration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add comprehensive label coverage for pull, edit, removal across all item types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify job label filters to only use v2_job.labels, remove wm_labels back-compat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add integration tests for job label propagation, display, and filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review findings — missing labels in fetch_script_for_update, app rename, escape key bug

- Add `labels` to SELECT in `fetch_script_for_update` to prevent lost labels on script clone
- Pass `labels` in app branch of `moveRenameManager.ts` so app renames preserve labels
- Clear `inputValue` before `adding = false` in LabelsInput escape handler to prevent accidental label add via onblur
- Fix `test_job_label_filter` to complete jobs via SQL (label filtering only works on completed jobs)
- Add `test_wm_labels_from_result_merged_with_static_labels` integration test using Bun

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:39:32 +00:00
centdix
b960598431 fix: hide deprecated cli metadata commands (#8699)
* fix: hide deprecated cli metadata commands

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: simplify generate-metadata guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-03 15:38:01 +00:00
centdix
f234df97ec fix: support raw app deployment history (#8657)
* fix: support raw app deployment history

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: refresh deployment history diffs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: preserve deployment history preview context

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: limit deployment history to diffs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: remove unused history backend hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-03 14:56:56 +00:00
Diego Imbert
a3073ad824 fix: debounce S3 proxy logs (#8694)
* Debounce S3 proxy logs

* missing workspace id

* nit perf

* nit

* prevent DOS

* handle 4xx/5xx statuses

* fix magic numbers
2026-04-03 13:46:09 +00:00
Alexander Petric
ceaa613522 chore: add .playwright-mcp/ to .gitignore (#8696)
Prevent Playwright MCP console logs from being accidentally committed.
Addresses GitHub security advisory for leaked credentials in log files.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:39:10 +00:00
Ruben Fiszel
0317d5891c feat: add powershell common parameters support (#8683)
* feat: add powershell common parameters support (-Verbose, -Debug, -ErrorAction, -WhatIf)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add powershell common params to script editor test panel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: detect CmdletBinding from code instead of schema in script editor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: ignore commented-out CmdletBinding in powershell detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use preference variables for -Verbose/-Debug instead of CLI args

Verbose/Debug output goes to PowerShell stream 4/5 which isn't captured
by the 2>&1 redirect. Setting $VerbosePreference/$DebugPreference in the
wrapper scope propagates to child scripts and output flows through the
host to stderr, which Windmill captures as logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use *>&1 to capture all powershell streams including verbose/debug

The previous 2>&1 only captured error stream. Verbose (stream 4) and
debug (stream 5) output was silently lost. Using *>&1 redirects all
streams to success stream so they flow through Tee-Object into logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use targeted stream redirects (4>&1 5>&1 2>&1) instead of *>&1

*>&1 breaks $PSCmdlet.ShouldProcess() by redirecting internal streams.
Only redirect verbose (4), debug (5), and error (2) to success stream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert to 2>&1 redirect — stream 4/5 redirects break powershell

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use 4>&1 5>&1 for verbose/debug capture, remove WhatIf support

Stream 4/5 redirects capture verbose/debug in the pipeline. WhatIf is
removed because $PSCmdlet.ShouldProcess() doesn't work when scripts
are invoked through Windmill's wrapper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: redirect verbose/debug to files to keep result pipeline clean

Verbose (4) and debug (5) streams are redirected to separate log files
during script execution, then output via Write-Host after the script
completes. This keeps them out of the Tee-Object pipeline (used for
result extraction) while still showing them in the job logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: output verbose/debug to stderr via Console.Error for log capture

Write-Host goes to stdout which gets mixed with result output and
truncated by OSS log threshold. Using [Console]::Error.WriteLine()
writes to stderr which Windmill captures separately as logs, with
VERBOSE:/DEBUG: prefixes for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: redirect script output to file only, send verbose/debug to stdout

The OSS log storage has a 9KB threshold. Previously, Tee-Object sent
the full JSON result to both stdout (logs) and the pipe file, eating
the log budget. Now script output goes only to the pipe file (> $pipe),
and only verbose/debug messages go to stdout for the log viewer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: preserve original Tee-Object behavior, append verbose/debug after

Keep the original wrapper behavior (Tee-Object to stdout + pipe file).
Only add 4>verbose.log 5>debug.log to capture those streams, and
output them at the end of logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: inject preference vars into main.ps1 instead of CLI args

Passing -Verbose/-Debug as CLI args causes PowerShell module loading
to emit verbose noise. Instead, inject $VerbosePreference/$DebugPreference
inside main.ps1's try block so they only affect user code. Stream 4/5
are still redirected to files in the wrapper for log output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore common param toggles from previous job args on Run Again

Extract _wm_ps_* keys from loaded args and initialize the toggle
states in PowerShellCommonParams. Also strip them from main args
so they don't appear as unknown schema form inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show active common param badges when section is collapsed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: inject ErrorAction as preference variable instead of CLI arg

-ErrorAction as a CLI arg only affects the caller, not the script's
internal error handling. Setting $ErrorActionPreference inside main.ps1
correctly overrides the default 'Stop' behavior for the user's code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: ensure full backward compatibility with existing powershell scripts

- Only filter common param names when [CmdletBinding()] is present
  (without it, $Verbose etc. are regular user-defined parameters)
- Only add 4>verbose.log 5>debug.log and log output lines when common
  params are actually enabled — original wrapper is unchanged otherwise

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: lighter styling for common params section

Replaced heavy Section component with a subtle inline chevron toggle
labeled "Common parameters". Smaller text, secondary color, indented
options. Badges still show when collapsed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rename section to CmdletBinding parameters

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add ..Default::default() to windmill-parser-r (new parser from main)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: missing comma in graphql parser test + merge main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing commas before ..Default::default() in parser tests

Merge from main brought test constructors with formatting issues
from the original automated script (missing comma between last field
and ..Default::default()).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore comment markers in nu parser test that script broke

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review — ignore commented CmdletBinding, clear stale params

1. Parser: strip comment lines before detecting [CmdletBinding()] to
   avoid false positives from commented-out attributes
2. RunForm: always assign psCommonParams (not just when non-empty) so
   stale settings from a previous run don't leak into later runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:03:22 +00:00
Ruben Fiszel
7485a5db04 remove dead legacyBehaviour param from metadata functions (#8695)
The legacyBehaviour parameter on generateFlowLockInternal,
generateAppLocksInternal, and generateScriptMetadataInternal was never
passed as true — the tree parameter alone determines the code path.
Replace `!legacyBehaviour && tree` with just `tree` and remove the
param from all call sites. getRawWorkspaceDependencies keeps its
legacyBehaviour param since it has a real effect there.

Also adds 6 integration tests covering generate-metadata lockfile
generation and idempotency for scripts, flows, and apps.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 12:01:34 +00:00
Diego Imbert
0cfa462c37 fix: optimize S3 proxy performance (#8685)
* perf: re-export GetOptions and GetRange from object_store

Needed by S3 proxy to use get_opts with range for single-request
range fetches instead of HEAD + get_range.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Avoid logging S3 proxy requests as info

* Revert "Avoid logging S3 proxy requests as info"

This reverts commit b6359a7a03.

* Don't log s3 proxy

* Revert "Don't log s3 proxy"

This reverts commit 2b21ee3c78.

* Update duckdb

* AUTOMATIC_MIGRATION for ducklake

* ee repo ref

* wrong comment

* chore: update ee-repo-ref to 41b0d1cb312919109407640fc4bd7060cfe0e107

This commit updates the EE repository reference after PR #505 was merged in windmill-ee-private.

Previous ee-repo-ref: 9b97a1c563365006657c4c6cde6e7df31c5173c3

New ee-repo-ref: 41b0d1cb312919109407640fc4bd7060cfe0e107

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
Co-authored-by: Ruben Fiszel <ruben@windmill.dev>
2026-04-03 11:53:10 +00:00
Kain
6656b46f10 fix: align script push metadata warning with generated locks (#8690) 2026-04-03 11:41:31 +00:00
Alexander Petric
5b7fa63bf1 feat: add application-level heartbeat support for websocket triggers (#8686)
* feat: add application-level heartbeat support for websocket triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update SQLx metadata

* chore: regenerate auto-generated schema and skill files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle missing heartbeat channel gracefully, fix TextInput props

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: only clone heartbeat sender when heartbeat is configured

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-03 11:31:08 +00:00
hugocasa
cdf3c29664 fix: use pre-aggregated stats for telemetry job usage queries (#8688)
* fix: use pre-aggregated worker_group_job_stats for telemetry job usage queries

Replace slow v2_job_completed JOIN v2_job scans with reads from the
pre-aggregated worker_group_job_stats table for the schedule-only
job_usage (48h) and daily_job_usage queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to telemetry-query-timeout branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 94567b204a5536ec3dc7591830c58c5bdc1d8381

This commit updates the EE repository reference after PR #506 was merged in windmill-ee-private.

Previous ee-repo-ref: da62a74e965a079d95eea6510f2ac7fc004cdccc

New ee-repo-ref: 94567b204a5536ec3dc7591830c58c5bdc1d8381

Automated by sync-ee-ref workflow.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-03 11:30:33 +00:00
Ruben Fiszel
81ab777cbb chore(main): release 1.673.0 (#8660)
* chore(main): release 1.673.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-02 19:41:12 -04:00
hugocasa
61a867f086 Revert "feat: restore bun for dedicated workers, fix dispatch & serialization, cross-workspace deps (#8645)" (#8687)
This reverts commit 619ebb65ce.
2026-04-02 23:09:38 +00:00
Diego Imbert
8581a3300d fix: Run typed pg queries in a single protocol conversation (#8679)
* Run typed pg queries in one protocol conversation

* Update pg queries for db manager to use explicit type syntax

* Unused import
2026-04-02 22:00:38 +00:00
Ruben Fiszel
ff5fa9f64f fix: poll for preview results to avoid undici headers timeout (#8682)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 20:55:46 +00:00
Ruben Fiszel
55e8a5cff1 fix: add HMAC signature verification to Slack interactive callback endpoint (#8611)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 20:55:23 +00:00
Ruben Fiszel
1a39bd538d add opt-in SMTP click tracking disable for email links (#8665)
* feat: add opt-in SMTP click tracking disable for email links

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt for email clicktracking branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt after simplification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: exclude trailing commas from URL regex in clicktracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 57dd88faa3b0b354f813385cf3f6a34eca54a4a1

This commit updates the EE repository reference after PR #504 was merged in windmill-ee-private.

Previous ee-repo-ref: 5cf901db7fb0ea169b09564372e444f28e23ac3a

New ee-repo-ref: 57dd88faa3b0b354f813385cf3f6a34eca54a4a1

Automated by sync-ee-ref workflow.

* chore: update ee-repo-ref.txt to include dedicated worker fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-02 20:44:08 +00:00
Ruben Fiszel
1049c1026b fix Windows compilation errors in MemoryLimitedChild (#8684)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 20:33:23 +00:00
hugocasa
619ebb65ce feat: restore bun for dedicated workers, fix dispatch & serialization, cross-workspace deps (#8645)
* feat: restore bun as default runtime for dedicated workers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add context comment for bun dedicated worker nodejs migration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: dedicated worker dispatch for flows + add E2E tests

- Add workspace_id prefix to dedicated worker map lookup keys
- Update ee-repo-ref for dedicated worker path handling fix
- Add spawn_test_worker_dedicated/in_test_worker_dedicated test helpers
- Add 6 E2E tests for dedicated workers:
  - test_dedicated_flow_rawscript (regression for "Script not found" bug)
  - test_dedicated_flow_workspace_script
  - test_dedicated_flow_multiple_steps
  - test_dedicated_standalone_script
  - test_dedicated_runner_group
  - test_dedicated_flow_runners
- Add dedicated_flows.sql fixture with scripts, flows, and worker config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: always run dependency job for dedicated worker scripts

When a script with dedicated_worker=true is deployed with a pre-computed
lock (e.g. via wmill sync push), no dependency job was created, so the
dedicated worker never detected the update and kept running the old version.

Now dedicated worker scripts always generate a dependency job regardless
of whether a lock is provided. The dependency job runs on the dedicated
worker and triggers a restart so it picks up the new script version.

Fixes #8638

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use serial_test for dedicated worker tests to avoid WORKER_CONFIG races

Dedicated worker tests need non-default worker tags in the global
WORKER_CONFIG. When run in parallel (CI uses --test-threads=10),
multiple tests clobber each other's config. Use #[serial] to ensure
dedicated worker tests run sequentially.

Also load worker config from DB via load_worker_config() instead of
manually setting WORKER_CONFIG fields, ensuring consistency with the
monitor's reload path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: nodejs dedicated worker script_path shadowing + add multi-language E2E tests

Fix script_path shadowing in bun_executor nodejs branch where the wrapper
file path was passed to handle_dedicated_process instead of the logical
path, causing "Script not found" for all //nodejs dedicated workers.

Add E2E tests for dedicated flows in all supported languages:
- test_dedicated_flow_deno
- test_dedicated_flow_python
- test_dedicated_flow_bunnative (V8 PrewarmedIsolate path)
- test_dedicated_flow_bun_nodejs (//nodejs annotation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify dedicated worker dispatch + add serialization and E2E tests

- Unified lookup: always use {workspace}:{runnable_path} for dedicated
  worker dispatch, replacing the flow_step_id iteration approach
- Added serialization_semaphore parameter to executor start_worker fns
- Added E2E tests: cross-workspace isolation, conflicting flow step IDs,
  preprocessor on dedicated worker
- Added workspace field to RunJob for cross-workspace test support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: cross-workspace workspace dependencies on workers page

Add two new instance-level endpoints to the configs router:
- GET /configs/list_all_workspace_dependencies
- GET /configs/list_all_dedicated_with_deps

Both require devops role and return data across all workspaces,
enabling the workers page to show a consistent view of which
workspace dependencies exist regardless of which workspace the
user is browsing.

Update DedicatedWorkersSelector to use the new cross-workspace
endpoints with fallback to per-workspace calls for non-devops users.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to include dedicated worker lookup simplification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: use branch name for ee-repo-ref (CI can't fetch by SHA from non-default branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ee-repo-ref.txt with new reference

* sqlx

* fix: revert serialization semaphore, multi-workspace picker, dep conflict warnings

- Remove serialization_semaphore from executor start_worker signatures
- Remove serialization test and fixtures
- Fix DedicatedWorkersSelector to preserve tags from other workspaces
  when toggling in the picker
- Track workspace deps per-workspace for conflict detection
- Show warning when dep exists in another workspace but not the script's
- Group runner groups per-workspace to prevent cross-workspace merging
- Add workspace to dep badge link URL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify exec protocol — execd: for single-script, exec: for runner groups

Add execd:/execd_preprocess: commands to bun/deno/python wrappers for
single-script dedicated workers (no path needed). Runner groups keep
exec:/exec_preprocess: with path for multi-script disambiguation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add unit tests for execd:/exec: wrapper protocol

Verify generate_multi_script_wrapper produces both execd: (single-script)
and exec: (runner group) protocol handlers, including preprocessor variants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update commit reference in ee-repo-ref.txt

* fix: remove beta badge from squash loop, keep tooltip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update protocol tests to use execd: for single-script wrappers

Deno and bun single-script protocol tests now send execd:{args} instead
of exec:{path}:{args}, matching the updated wrapper protocol. Multi-script
(runner group) tests continue to use exec:{path}:{args}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unused TEST_SCRIPT_PATH in deno protocol tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:37:02 +00:00
Ruben Fiszel
d2d6810db9 feat: add LIMIT_WINDOWS_TO_1CU env var for Windows worker memory limits (#8681)
* feat: add LIMIT_WINDOWS_TO_1CU env var for Windows worker memory limits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CI review — stricter env var parsing and SAFETY comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:31:07 +00:00
Ruben Fiszel
39af1b75af fix: skip generate-metadata confirmation prompt in non-interactive CI (#8678)
* fix: generate-metadata non-interactive CI and misleading log path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add hash consistency tests for workspace deps staleness checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:28:39 +00:00
Ruben Fiszel
d569e9e29c fix: resolve race condition where flow sync push reverts to stale version (#8673)
* fix: resolve race condition where flow sync push reverts to stale version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add sqlx offline cache for new queries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add version guard before writing to prevent TOCTOU race

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:17:37 +00:00
Ruben Fiszel
381011a4a8 fix: pass selected language to AI agent when generating flow scripts (#8680)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:06:22 +00:00
hugocasa
f0437eba19 feat: add endpoint to restart workers in a worker group (#8659)
* feat: add endpoint to restart workers in a worker group

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate sqlx query cache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing modules field to RawCode in tests and regenerate sqlx cache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* update sqlx

* fix: use require_devops_role for restart worker group endpoint

Matches the permission level of the clean cache endpoint (update_config),
allowing both superadmin and devops role users.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback for restart worker group

- Fix OpenAPI description to say "devops role" instead of "superadmin"
- Add dispatch('reload') after restart to refresh worker list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: only dispatch reload on successful restart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:36:48 +00:00
Ruben Fiszel
d8edaad99c improve bun bundle error message for syntax errors (#8677)
* fix: improve bun bundle error message for syntax errors like unclosed brackets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: remove error hint from node_builder.ts wrapper catch blocks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:19:46 +00:00
Ruben Fiszel
7fd0bf974d fix: respect disabled fields in JSON input mode (#8663)
* fix: respect disabled fields in JSON input mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard against undefined default in disabled field enforcement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: show toast when disabled fields are reset to defaults on run

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:13:12 +00:00
hugocasa
6a5cfbc159 feat: add Entra ID (Azure Workload Identity) database auth (#8526)
* feat: add Entra ID (Azure Workload Identity) support for database auth

Add support for Azure Workload Identity to authenticate to Azure Database
for PostgreSQL using short-lived Entra ID tokens. Mirrors the existing
AWS IAM RDS auth pattern.

- Extract shared DatabaseParams to db_params.rs for reuse across providers
- Add DatabaseUrl::EntraId variant with token refresh
- Detect "entraid" magic password in DATABASE_URL
- Unified background refresh task for both IAM RDS and Entra ID
- Support sovereign clouds via AZURE_AUTHORITY_HOST env var

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore needs_refresh() check in background token refresh task

The unified refresh task was missing the needs_refresh() gate, causing
it to refresh tokens every 10 seconds instead of only when near expiry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref.txt for Entra ID branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move entraid env var reads inside cfg(private) block

Fixes unused variable warnings in OSS and EE-without-private builds
where -D warnings is enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update ee-repo-ref to 0e001bab643e449b3310b0692dd3598ee0902ecc

This commit updates the EE repository reference after PR #483 was merged in windmill-ee-private.

Previous ee-repo-ref: 44199013ed0c96680672e718f35124aa34a5d010

New ee-repo-ref: 0e001bab643e449b3310b0692dd3598ee0902ecc

Automated by sync-ee-ref workflow.

* refactor: add needs_refresh() and refresh_if_needed() to DatabaseUrl

Simplify duplicated refresh logic per Claude review suggestion.
Background task and get_database_url() now use shared methods
instead of matching on each variant individually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: windmill-internal-app[bot] <windmill-internal-app[bot]@users.noreply.github.com>
2026-04-02 16:00:33 +02:00
Ruben Fiszel
8c3c97f7a6 fix: sanitize MCP tool schemas for JSON Schema draft 2020-12 compliance (#8666)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-02 13:53:16 +00:00
Diego Imbert
c3a1c26be1 nit: revert ee.rs in substitute_ee_code.sh (#8672) 2026-04-02 10:16:28 +00:00
Ruben Fiszel
c87a6a0f2c fix: support branch-specific folder.meta.yaml in missing-meta check (#8661)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-02 10:05:02 +00:00
hugocasa
350ffdce29 fix: pre-fix trigger edited_by for superadmins not in workspace (#8669)
Add a migration that runs just before 20260318000000 (add_permissioned_as).
For each trigger table, if the email column still exists, update edited_by
to the trigger's email when the user is not in the workspace but is a
superadmin. This ensures the subsequent permissioned_as migration stores
the raw email instead of an invalid u/{username} reference.

If 20260318000000 was already applied, the migration is a no-op (email
column is gone, guarded by information_schema check).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 08:39:44 +00:00
centdix
28c073056c fix: correct raw app flow inputs (#8667)
* fix: correct raw app flow inputs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: remove raw app legacy migration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-02 01:57:56 +00:00
Ruben Fiszel
c86846ac19 rate limit token creation on CLOUD_HOSTED (10/min per user) (#8664)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 01:52:02 +00:00
Ruben Fiszel
7ab0ea581d fix: strip f/ prefix from folder paths when deploying from workspace forks (#8662)
* fix: strip f/ prefix from folder paths when deploying from workspace forks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract folderName helper for f/ prefix stripping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:06:19 +00:00
Ruben Fiszel
bcce627387 fix: validate rd redirect on login with same rules as logout (#8655)
* fix: validate rd redirect on login with same rules as logout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sanitize rd at source in login callback to prevent leaking to goto

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validate rd redirect in Login component for fresh login flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 19:44:22 +00:00
Ruben Fiszel
175af8032f chore(main): release 1.672.0 (#8654)
* chore(main): release 1.672.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-04-01 12:05:44 -04:00
Diego Imbert
1784bed4ac Fix WAC RunForm layout (#8658) 2026-04-01 12:01:29 -04:00
Ruben Fiszel
a46aa641f9 feat: add R language support (#8263)
* feat: add R language support

Add R as a new supported scripting language in Windmill, following the
same pattern used for Ruby. Includes:

- Backend: ScriptLang::Rlang enum variant, DB migration, tree-sitter-r
  parser crate with tests, WASM parser binding, R executor with NSJail
  sandboxing, job dispatch and signature parsing
- Frontend: language picker, R icon, syntax highlighting, editor bar
  insertions (Sys.getenv, get_variable, get_resource), schema inference,
  init code template, BETA badge
- CLI: .r extension mapping, sync support, bootstrap template

R scripts use `main <- function(...)` syntax, jsonlite for JSON
serialization, and system curl for the Windmill client helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add R package resolution and installation

Parse library()/require() calls from R scripts to extract dependencies.
Resolve versions from CRAN, cache lockfiles in pip_resolution_cache,
and install packages to a shared R library cache. The run step sets
R_LIBS_USER so installed packages are available to the script.

- Parser: parse_r_requirements() extracts package names from AST
- Executor: resolve() generates lockfile, install() installs from CRAN
- Worker lockfiles: wire up R resolve for dependency jobs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add nsjail sandboxing for R resolve and install phases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: fix R get_variable/get_resource and add sandbox annotation + e2e tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: fix R arg inference with JS fallback parser and get_variable/get_resource

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix flake

* nsjail

* nits

* fix: R install improvements - suppress verbose output, flat lockfile logging, Dockerfile R support, rlimits

- Suppress renv verbose output during resolve and install (controlled by #verbose annotation)
- Filter renv from install list (already loaded, causes noisy restart message)
- Log compact "resolved N packages" instead of full renv.lock JSON
- Add R (r-base, r-cran-renv) to DockerfileFull and DockerfileFullEe
- Use disable_rl for nsjail install config (R compiles from source)
- Reduce default concurrency from 20 to 5
- Add rlang to openflow.openapi.yaml
- Fix MainArgSignature (no_main_func -> auto_kind) after main merge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* final

* fix: remove accidental R install from multiplayer Dockerfile

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: remove R from Windows build and DockerfileExtra

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: rename R migration to avoid timestamp collision with trigger_filter_logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* all

* fix: R install improvements - suppress verbose output, flat lockfile logging, Dockerfile R support, rlimits

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add clear error when Rscript binary is missing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: fix type errors in R fallback parser, use format! in wrap(), add R system prompts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: pyranota <pyra@duck.com>
2026-04-01 06:11:37 +00:00
Alexander Petric
7069202190 fix: approval page freeze, stale state, and missing approval link (#8653)
* fix: prevent browser freeze when approval form number field has no default value

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: disable approval buttons and keep polling after approve/deny action

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: restore approval page link and prevent double resume in flow viewer

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: guard against NaN fallback in Range and reset actionTaken on new approval step

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix approval page url

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-01 05:25:22 +02:00
Ruben Fiszel
df7a8eebcf chore(main): release 1.671.0 (#8650)
* chore(main): release 1.671.0

* Apply automatic changes

---------

Co-authored-by: rubenfiszel <275584+rubenfiszel@users.noreply.github.com>
2026-03-31 21:26:34 +00:00
centdix
2862c1cf56 add codex PR review workflow (#8626)
* feat: add codex PR review workflow

* refactor: simplify codex PR review comments

* chore: use ubicloud for codex review

* fix: harden codex review workflow

* chore: use chatgpt auth for codex review
2026-03-31 19:21:39 +00:00
centdix
d67223de9b chore: use fully qualified tmux pane targets in webmux systemPrompt (#8651)
* fix: use fully qualified tmux pane targets in webmux systemPrompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: anchor tmux pane targets to $TMUX_PANE for stability across window switches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:16:38 +00:00
Ruben Fiszel
da8886be85 feat: add configurable preview job tag override in default tags settings (#8649)
* feat: add configurable preview job tag override in default tags settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: skip re-tagging for FlowPreview jobs when preview override is active

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:59:23 +00:00
centdix
040a199685 feat: support hub flows in raw app runnables (#8627)
* feat: support hub flows in raw app runnables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: support hub flow previews in app ui

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: move trigger context into flow graph viewer

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use script viewer for hub flow steps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: stretch raw app flow previews to pane height

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: improve hub flow run links

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: stabilize hub flow preview drawer

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: align hub flow id validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: fix runnable panel indentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 18:26:56 +00:00
Alexander Petric
6c3c971af5 feat: improve CLI flow log streaming and job inspection (#8644)
* fix: improve CLI flow log streaming, sub-job listing, and failure handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add hierarchical flow status in job get and aggregated flow logs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove duplicate ansi color hint in job logs output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update cli-commands skill with new job/flow features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add integration tests for flow job inspection and log aggregation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal friction discovery doc from branch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: trim cli-commands skill to reduce context bloat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update job command descriptions and regenerate skills.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: commit auto-generated files from system_prompts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments on flow streaming and test assertions

- Move for-loop waiting logic outside --silent guard (Cubic #2)
- Break outer loop when for-loop module fails (Cubic #3)
- Strengthen test assertion: toContain("a") -> toContain("a: Generate data") (Cubic #1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: generator regex truncating descriptions with parentheses

The .command() regex used [^)]+ for the second arg, stopping at the
first ')' inside description strings like "(machine-friendly)".
Now matches quoted strings properly before falling back.

Fixes 6 truncated descriptions across job, flow, and script commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 18:22:18 +00:00
Ruben Fiszel
852c59efbb fix: return default_args/enums in approval info and fix subflow resume buttons (#8648)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:16:47 +00:00
1002 changed files with 63279 additions and 13436 deletions

23
.github/codex/pr-review.prompt.md vendored Normal file
View File

@@ -0,0 +1,23 @@
You are reviewing a GitHub pull request for this repository.
Review policy:
- Read `CLAUDE.md` before reviewing code.
- Only report issues you are confident are real and introduced by this pull request.
- Focus on bugs, security problems, and clear `CLAUDE.md` violations.
- Do not report style nits, speculative concerns, pre-existing issues, or problems that a normal linter/typechecker would obviously catch.
- Keep the review high signal. If there is no clear issue, return no findings.
Repository context:
- Read `./.github/codex/pr-review-context.md` for the PR metadata and the exact diff commands to use.
- Review only the changes introduced by this PR.
- Read additional files only when the diff is not enough to validate a finding.
- Do not modify any files.
Output requirements:
- Return a GitHub PR comment in markdown, not JSON.
- Start with `## Codex Review`.
- Give a short overall summary first.
- If you found high-signal issues, list them in a short numbered list with file paths and line numbers when you know them confidently.
- If you found no high-signal issues, say that explicitly.
- End with a `### Reproduction instructions` section containing a short descriptive paragraph for a tester explaining how to navigate the app to observe the change. Do not make it a numbered list. If the diff is not enough to infer this safely, say that plainly.
- Prefer at most 10 findings.

145
.github/workflows/codex-pr-review.yml vendored Normal file
View File

@@ -0,0 +1,145 @@
name: Codex Auto Review
on:
pull_request:
types: [ready_for_review, opened]
concurrency:
group: codex-review-${{ github.event.pull_request.number }}
cancel-in-progress: true
jobs:
codex-review:
runs-on: ubicloud-standard-2
timeout-minutes: 30
if: github.event.pull_request.draft == false && github.event.pull_request.head.repo.fork == false
permissions:
contents: read
issues: write
steps:
- name: Check Codex configuration
id: codex_config
env:
CODEX_AUTH_JSON: ${{ secrets.CODEX_AUTH_JSON }}
run: |
if [ -n "$CODEX_AUTH_JSON" ]; then
echo "enabled=true" >> "$GITHUB_OUTPUT"
else
echo "enabled=false" >> "$GITHUB_OUTPUT"
echo "CODEX_AUTH_JSON is not configured; skipping Codex review."
fi
- name: Checkout repository
if: steps.codex_config.outputs.enabled == 'true'
uses: actions/checkout@v5
with:
ref: refs/pull/${{ github.event.pull_request.number }}/merge
fetch-depth: 1
- name: Set up Node.js
if: steps.codex_config.outputs.enabled == 'true'
uses: actions/setup-node@v4
with:
node-version: 22
- name: Install Codex CLI
if: steps.codex_config.outputs.enabled == 'true'
run: npm install --global @openai/codex@0.117.0
- name: Configure file-backed Codex auth
if: steps.codex_config.outputs.enabled == 'true'
env:
CODEX_AUTH_JSON: ${{ secrets.CODEX_AUTH_JSON }}
run: |
CODEX_HOME="$HOME/.codex"
echo "CODEX_HOME=$CODEX_HOME" >> "$GITHUB_ENV"
mkdir -p "$CODEX_HOME"
chmod 700 "$CODEX_HOME"
cat > "$CODEX_HOME/config.toml" <<'EOF'
cli_auth_credentials_store = "file"
EOF
printf '%s' "$CODEX_AUTH_JSON" > "$CODEX_HOME/auth.json"
chmod 600 "$CODEX_HOME/auth.json"
node -e 'JSON.parse(require("fs").readFileSync(process.argv[1], "utf8"))' "$CODEX_HOME/auth.json"
- name: Pre-fetch base and head refs for the PR
if: steps.codex_config.outputs.enabled == 'true'
env:
PR_BASE_REF: ${{ github.event.pull_request.base.ref }}
PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
git fetch --no-tags origin \
"$PR_BASE_REF" \
"+refs/pull/$PR_NUMBER/head"
- name: Write Codex review context
if: steps.codex_config.outputs.enabled == 'true'
env:
PR_REPOSITORY: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
PR_TITLE: ${{ github.event.pull_request.title }}
PR_BODY: ${{ github.event.pull_request.body || '' }}
run: |
mkdir -p .github/codex
node <<'NODE'
const fs = require('fs');
const lines = [
`Repository: ${process.env.PR_REPOSITORY}`,
`PR number: ${process.env.PR_NUMBER}`,
`Base SHA: ${process.env.PR_BASE_SHA}`,
`Head SHA: ${process.env.PR_HEAD_SHA}`,
'',
'PR title:',
process.env.PR_TITLE || '(empty)',
'',
'PR body:',
process.env.PR_BODY || '(empty)',
'',
'Changed commits command:',
`git log --oneline ${process.env.PR_BASE_SHA}...${process.env.PR_HEAD_SHA}`,
'',
'Changed files command:',
`git diff --stat ${process.env.PR_BASE_SHA}...${process.env.PR_HEAD_SHA}`,
'',
'Full review diff command:',
`git diff --unified=0 ${process.env.PR_BASE_SHA}...${process.env.PR_HEAD_SHA}`
];
fs.writeFileSync('.github/codex/pr-review-context.md', `${lines.join('\n')}\n`);
NODE
- name: Run Codex review
if: steps.codex_config.outputs.enabled == 'true'
run: |
codex exec \
-C "$GITHUB_WORKSPACE" \
-m gpt-5.4 \
-c 'model_reasoning_effort="xhigh"' \
-s read-only \
-o codex-final-message.md \
- < .github/codex/pr-review.prompt.md
- name: Post Codex review comment
if: steps.codex_config.outputs.enabled == 'true'
uses: actions/github-script@v7
with:
github-token: ${{ github.token }}
script: |
const fs = require('fs');
const path = `${process.env.GITHUB_WORKSPACE}/codex-final-message.md`;
if (!fs.existsSync(path)) {
core.info('Codex did not produce a final message; skipping PR comment.');
return;
}
const body = fs.readFileSync(path, 'utf8').trim();
if (!body) {
core.info('Codex final message was empty; skipping PR comment.');
return;
}
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.payload.pull_request.number,
body,
});

3
.gitignore vendored
View File

@@ -25,7 +25,10 @@ rust-client/Cargo.toml
backend/target
frontend/node_modules
typescript-client/node_modules
ai_evals/node_modules
ai_evals/results/
frontend/.svelte-kit
backend/chrome_profiler.json
.fast-check/
__pycache__/
.playwright-mcp/

View File

@@ -43,7 +43,7 @@ profiles:
- Pane 0: this pane (claude agent)
- Pane 1: backend (cargo watch -x run)
- Pane 2: frontend (npm run dev)
To check logs, use: \`tmux capture-pane -t .1 -p -S -50\` (backend) or \`tmux capture-pane -t .2 -p -S -50\` (frontend).
To check logs, use: \`tmux capture-pane -t $(tmux display-message -t "$TMUX_PANE" -p '#{session_name}:#{window_name}').1 -p -S -50\` (backend) or \`tmux capture-pane -t $(tmux display-message -t "$TMUX_PANE" -p '#{session_name}:#{window_name}').2 -p -S -50\` (frontend).
For this window specifically, backend is running on: ${BACKEND_PORT} and frontend is running on: ${FRONTEND_PORT}.
To connect to the database, use this connection string: ${DATABASE_URL}
Because we are running backend with cargo watch, to verify your changes, just check the logs in the backend pane. No need for cargo check.
@@ -72,7 +72,7 @@ profiles:
Pane layout (current window):
- Pane 0: this pane (claude agent)
- Pane 1: frontend (npm run dev)
To check logs, use: \`tmux capture-pane -t .1 -p -S -50\` (frontend).
To check logs, use: \`tmux capture-pane -t $(tmux display-message -t "$TMUX_PANE" -p '#{session_name}:#{window_name}').1 -p -S -50\` (frontend).
On this window specifically, frontend is running on: ${FRONTEND_PORT}.
To connect to the database, use this connection string: ${DATABASE_URL}
Because we are running frontend with npm run dev, to verify your changes, just check the logs in the frontend pane. No need for npm run build.

View File

@@ -1,5 +1,303 @@
# Changelog
## [1.684.1](https://github.com/windmill-labs/windmill/compare/v1.684.0...v1.684.1) (2026-04-14)
### Bug Fixes
* stop escalating missing email recipients to critical alert ([#8833](https://github.com/windmill-labs/windmill/issues/8833)) ([6158ff2](https://github.com/windmill-labs/windmill/commit/6158ff2ebe29d6a9a7ff4d524e152bb2f7c24dfc))
## [1.684.0](https://github.com/windmill-labs/windmill/compare/v1.683.2...v1.684.0) (2026-04-14)
### Features
* cascade trigger script_path on runnable rename + fix trigger permissioned_as ([#8823](https://github.com/windmill-labs/windmill/issues/8823)) ([64ba3a6](https://github.com/windmill-labs/windmill/commit/64ba3a632eee041d09093e89961f63f2a090fcad))
* **frontend:** improve permissions drawer UX and auto-share resource variables ([#8824](https://github.com/windmill-labs/windmill/issues/8824)) ([91064ce](https://github.com/windmill-labs/windmill/commit/91064ce85712b85e32e3f8cff2a0794cd5597ed6))
### Bug Fixes
* allow dedicated flow substeps to inherit parent tag ([#8832](https://github.com/windmill-labs/windmill/issues/8832)) ([aebf758](https://github.com/windmill-labs/windmill/commit/aebf758412383dd65e0bf6c72de8f2668561cd88))
* compute wall-clock duration for flow job groups in CLI ([#8826](https://github.com/windmill-labs/windmill/issues/8826)) ([e1dbce0](https://github.com/windmill-labs/windmill/commit/e1dbce02c22bcaa3d7d447ee54db69373bc1cf7b))
* DB Manager delete/update for timestamp and serial types ([#8830](https://github.com/windmill-labs/windmill/issues/8830)) ([06fe809](https://github.com/windmill-labs/windmill/commit/06fe809ecc3c6b37af7582175f9dd90c2c2a8f98))
* hide serial types in column type dropdown for existing columns ([#8828](https://github.com/windmill-labs/windmill/issues/8828)) ([7fe639d](https://github.com/windmill-labs/windmill/commit/7fe639d91e93a6b3069e0d87b57c232d67c8ad65))
## [1.683.2](https://github.com/windmill-labs/windmill/compare/v1.683.1...v1.683.2) (2026-04-14)
### Bug Fixes
* detect WAC v2 Python workflows that only use step() (no [@task](https://github.com/task)) ([#8819](https://github.com/windmill-labs/windmill/issues/8819)) ([89c8e4b](https://github.com/windmill-labs/windmill/commit/89c8e4bb9680c179bf44a66a22dcf047334944ae))
* persist indexer max_index_time_window_secs setting ([#8821](https://github.com/windmill-labs/windmill/issues/8821)) ([4dc54ca](https://github.com/windmill-labs/windmill/commit/4dc54ca3aa14beab175da59eb8b9072918301b43))
## [1.683.1](https://github.com/windmill-labs/windmill/compare/v1.683.0...v1.683.1) (2026-04-13)
### Bug Fixes
* use OpenAPI 3.0 nullable pattern for getOpenDeploymentRequest ([#8816](https://github.com/windmill-labs/windmill/issues/8816)) ([f7f26b3](https://github.com/windmill-labs/windmill/commit/f7f26b32244536b6efb7c1b5aafd4a7644dcb42f))
## [1.683.0](https://github.com/windmill-labs/windmill/compare/v1.682.0...v1.683.0) (2026-04-13)
### Features
* add black-box ai eval benchmarks ([#8618](https://github.com/windmill-labs/windmill/issues/8618)) ([cdcc564](https://github.com/windmill-labs/windmill/commit/cdcc56461b77554964622f490ae901f170886595))
* add deploy restriction rule and fork review requests ([#8804](https://github.com/windmill-labs/windmill/issues/8804)) ([64c58c8](https://github.com/windmill-labs/windmill/commit/64c58c824fcefe00f15405b7e3877eb566a3ffa2))
* allow non-admins to create and edit HTTP triggers ([#8810](https://github.com/windmill-labs/windmill/issues/8810)) ([9fb7816](https://github.com/windmill-labs/windmill/commit/9fb78164b4baa14c10d10f91ae969d48590c29f3))
* display agent message in flow graph ([#8806](https://github.com/windmill-labs/windmill/issues/8806)) ([95411b2](https://github.com/windmill-labs/windmill/commit/95411b256332fa41816a93b19906f1534da9b300))
* folder default_permissioned_as rules for ownership defaults on deploy ([#8801](https://github.com/windmill-labs/windmill/issues/8801)) ([60211c1](https://github.com/windmill-labs/windmill/commit/60211c1d1910b5f7ac6fed112f790201d2047a4c))
* instance-level ruff config auto-pulled by LSP container ([#8803](https://github.com/windmill-labs/windmill/issues/8803)) ([3f5841f](https://github.com/windmill-labs/windmill/commit/3f5841f84d878cd3f43c435fa237d3f0c2265fb9))
### Bug Fixes
* **cli:** make cli help resilient to npm registry fetch failures ([#8809](https://github.com/windmill-labs/windmill/issues/8809)) ([b6f1cc7](https://github.com/windmill-labs/windmill/commit/b6f1cc70cd87c61df7112d3838fbb5fe9bcdc145))
* enrich OTEL log records with per-request LogContext ([#8812](https://github.com/windmill-labs/windmill/issues/8812)) ([42d3e8c](https://github.com/windmill-labs/windmill/commit/42d3e8c7893cd959c7faffd19cd210c869c604f8))
* silence user-facing toast for non-critical hub script tracking error ([#8808](https://github.com/windmill-labs/windmill/issues/8808)) ([378ba78](https://github.com/windmill-labs/windmill/commit/378ba7828456c871b5778f1144c4bb559bd5a733))
### Performance Improvements
* add inline-persist fast path for WAC v2 step() ([#8807](https://github.com/windmill-labs/windmill/issues/8807)) ([b3ef4bc](https://github.com/windmill-labs/windmill/commit/b3ef4bc26c5696624efee89b5e4e33e77e10cf15))
## [1.682.0](https://github.com/windmill-labs/windmill/compare/v1.681.0...v1.682.0) (2026-04-10)
### Features
* enrich hanging flow error with worker and service log info ([#8800](https://github.com/windmill-labs/windmill/issues/8800)) ([59c457a](https://github.com/windmill-labs/windmill/commit/59c457a13881e35c229baed3edd87e618f89b9a0))
### Bug Fixes
* bypass OTEL MITM tracing proxy for git sync jobs ([#8796](https://github.com/windmill-labs/windmill/issues/8796)) ([9c85565](https://github.com/windmill-labs/windmill/commit/9c855652212dbac0e49f87dedd447d3d7d7b500a))
* show full path on hover in deploy drawer and widen drawer ([#8799](https://github.com/windmill-labs/windmill/issues/8799)) ([b783bf2](https://github.com/windmill-labs/windmill/commit/b783bf2d835cde0843739f7d1099193bb0af042e))
## [1.681.0](https://github.com/windmill-labs/windmill/compare/v1.680.0...v1.681.0) (2026-04-10)
### Features
* add CI test scripts with auto-trigger on deploy ([#8736](https://github.com/windmill-labs/windmill/issues/8736)) ([c57c769](https://github.com/windmill-labs/windmill/commit/c57c769deaa207e7ba7995f75649d3630774e898))
* add edit yaml button to raw app settings ([#8771](https://github.com/windmill-labs/windmill/issues/8771)) ([b73be37](https://github.com/windmill-labs/windmill/commit/b73be37916de808dc64bec1337edf6e7d3993c5e))
* add user offboarding flow with object reassignment ([#8647](https://github.com/windmill-labs/windmill/issues/8647)) ([435b25e](https://github.com/windmill-labs/windmill/commit/435b25e6a4c7272c0189cbcfb83526379f41ebf0))
* allow selecting hub flows as raw app backend runnables ([#8772](https://github.com/windmill-labs/windmill/issues/8772)) ([5f57727](https://github.com/windmill-labs/windmill/commit/5f57727a4d956a9066b005b3c55f08dd6780475a))
* list external JWT tokens in instance settings ([#8783](https://github.com/windmill-labs/windmill/issues/8783)) ([ce3e676](https://github.com/windmill-labs/windmill/commit/ce3e676f4ab0c442058c64db4ebf35545a805ef5))
* oauth manual connect option ([#8770](https://github.com/windmill-labs/windmill/issues/8770)) ([4b87639](https://github.com/windmill-labs/windmill/commit/4b876392a0ce41ae42bd882ced10fe0187e532bc))
* unify CLI config to workspaces, deprecate gitBranches/environments ([#8767](https://github.com/windmill-labs/windmill/issues/8767)) ([5b97092](https://github.com/windmill-labs/windmill/commit/5b9709299761b83a88df17a4259c431dfcd244f9))
* **vault:** add skip_ssl_verify option for HashiCorp Vault ([#8791](https://github.com/windmill-labs/windmill/issues/8791)) ([6cf7ffc](https://github.com/windmill-labs/windmill/commit/6cf7ffc26bcbc8f4ef0e4ad2879fcd114332c4e2))
### Bug Fixes
* bypass sql type injection during formatting to prevent offset corruption ([#8786](https://github.com/windmill-labs/windmill/issues/8786)) ([8957d8f](https://github.com/windmill-labs/windmill/commit/8957d8f19bce3430871c2858b3accd53e0be178f))
* CLI falls back to workspace whoami for workspace-scoped tokens ([#8789](https://github.com/windmill-labs/windmill/issues/8789)) ([d243eb3](https://github.com/windmill-labs/windmill/commit/d243eb31b014781a249f903b2a467aa58909ddd6))
* disable scroll-to-change-number on number inputs ([#8777](https://github.com/windmill-labs/windmill/issues/8777)) ([e63924e](https://github.com/windmill-labs/windmill/commit/e63924e3778b40486813192dc2913e565e0a765e))
* error on flow/app folder suffix format mismatch during sync push/pull ([#8775](https://github.com/windmill-labs/windmill/issues/8775)) ([1deb31f](https://github.com/windmill-labs/windmill/commit/1deb31f1e01d6168eee3c2cc242cb483272d1965))
* flow dev page layout and compact toolbar improvements ([#8776](https://github.com/windmill-labs/windmill/issues/8776)) ([89920e7](https://github.com/windmill-labs/windmill/commit/89920e77f3f5dc45db939ec938d92c881dccc8a0))
* Flow status viewer layout nits (avoid excess y space and scroll) ([#8780](https://github.com/windmill-labs/windmill/issues/8780)) ([6d36eca](https://github.com/windmill-labs/windmill/commit/6d36eca21684f9d3ab36658c2b66f85b9be8d331))
* flow step testing UX improvements ([#8781](https://github.com/windmill-labs/windmill/issues/8781)) ([3fb557a](https://github.com/windmill-labs/windmill/commit/3fb557a7f51dbbd3fac445734196f1b9a1d2e287))
* hide legacy global_settings.worker_configs ghost row ([#8790](https://github.com/windmill-labs/windmill/issues/8790)) ([4fff89f](https://github.com/windmill-labs/windmill/commit/4fff89f98ce72997a055cc313c8fe217d2f1fe78))
* limit multi-runnable dedicated workers to one job at a time ([#8782](https://github.com/windmill-labs/windmill/issues/8782)) ([946848f](https://github.com/windmill-labs/windmill/commit/946848feef60aba2a54bc2f5b686b33cc96ec9ef))
* normalize multi-word pg types in build_parameters to fix float8 serialization ([#8778](https://github.com/windmill-labs/windmill/issues/8778)) ([3d02be9](https://github.com/windmill-labs/windmill/commit/3d02be98f748d985f688243f3215d15ca4227f8f))
* refresh custom instance user password if auth failed ([#8787](https://github.com/windmill-labs/windmill/issues/8787)) ([3d43d31](https://github.com/windmill-labs/windmill/commit/3d43d31aba276f23903f16f06035a4c4955b52e2))
* treat empty global setting strings as unset ([#8793](https://github.com/windmill-labs/windmill/issues/8793)) ([ec9cec1](https://github.com/windmill-labs/windmill/commit/ec9cec1d02d87328db92a71a1b3a945e9e0c6bd2))
* zero-downtime coordinated restarts for OTEL and other setting changes ([#8768](https://github.com/windmill-labs/windmill/issues/8768)) ([506b7f5](https://github.com/windmill-labs/windmill/commit/506b7f55e17472d1384e9676c1b6df7a9d7a118b))
## [1.680.0](https://github.com/windmill-labs/windmill/compare/v1.679.0...v1.680.0) (2026-04-08)
### Features
* add CLI workspace merge command and enhance fork with datatable/color support ([#8756](https://github.com/windmill-labs/windmill/issues/8756)) ([4342c18](https://github.com/windmill-labs/windmill/commit/4342c1854134500d3b2bc46280f9885ee84e2c9e))
* add scheduled job deletion with configurable retention period ([#8753](https://github.com/windmill-labs/windmill/issues/8753)) ([2d18a68](https://github.com/windmill-labs/windmill/commit/2d18a680991babe317ca315bbce40e6ce733afda))
* add status indicator dots to parallel loop iteration picker ([#8761](https://github.com/windmill-labs/windmill/issues/8761)) ([470b8aa](https://github.com/windmill-labs/windmill/commit/470b8aa5f1870e26fea022c1e2a9f48471d8a205))
### Bug Fixes
* move alert config from config table to global_settings ([#8762](https://github.com/windmill-labs/windmill/issues/8762)) ([fa66870](https://github.com/windmill-labs/windmill/commit/fa668707c0ee7f261d78e145666b1073471259fd))
* resolve esbuild host/binary version mismatch in app sync push ([#8765](https://github.com/windmill-labs/windmill/issues/8765)) ([e36d440](https://github.com/windmill-labs/windmill/commit/e36d440a251a43ea888e3ce378d0bb8ed8f42e11))
* skip serializing ws_specific on resources when false ([#8764](https://github.com/windmill-labs/windmill/issues/8764)) ([c69f10d](https://github.com/windmill-labs/windmill/commit/c69f10d20dd064f0c329934096c2945424ff81f2))
## [1.679.0](https://github.com/windmill-labs/windmill/compare/v1.678.0...v1.679.0) (2026-04-07)
### Features
* Fork datatables ([#8339](https://github.com/windmill-labs/windmill/issues/8339)) ([3d4f4c6](https://github.com/windmill-labs/windmill/commit/3d4f4c6c38155396e9b2236a6a7a7ad4e02da877))
## [1.678.0](https://github.com/windmill-labs/windmill/compare/v1.677.0...v1.678.0) (2026-04-07)
### Features
* accept any content type on webhooks/http triggers with fallback ([#8743](https://github.com/windmill-labs/windmill/issues/8743)) ([208a597](https://github.com/windmill-labs/windmill/commit/208a597d599b4d203f7ab817a5d8ce2c06f79d0a))
* add download all logs button for flow jobs ([#8748](https://github.com/windmill-labs/windmill/issues/8748)) ([d938625](https://github.com/windmill-labs/windmill/commit/d938625785ba301fbd2c5f3d001c320eab1c504c))
### Bug Fixes
* delete raw_script_temp rows before workspace deletion to avoid FK violation ([#8752](https://github.com/windmill-labs/windmill/issues/8752)) ([8b9523e](https://github.com/windmill-labs/windmill/commit/8b9523e03c82c5a095b7cb2d5f70a87b7bbc8608))
* Fix FlowTimeline duplicate key ([#8754](https://github.com/windmill-labs/windmill/issues/8754)) ([2413dbe](https://github.com/windmill-labs/windmill/commit/2413dbefe3cc3b65c28bea437cd4471cf7e9ecba))
* remove span.enter() in dedicated worker to prevent tracing panic ([#8749](https://github.com/windmill-labs/windmill/issues/8749)) ([db55e8e](https://github.com/windmill-labs/windmill/commit/db55e8efb0c9ae198ca5ac7013439a94dfe9f550))
* restore ai agent tool deletion ([#8744](https://github.com/windmill-labs/windmill/issues/8744)) ([2f7ba9e](https://github.com/windmill-labs/windmill/commit/2f7ba9edac1a57dfc0eb3417574c72292855fc56))
## [1.677.0](https://github.com/windmill-labs/windmill/compare/v1.676.0...v1.677.0) (2026-04-06)
### Features
* add AWS Secrets Manager as secret storage backend (Beta) ([#8734](https://github.com/windmill-labs/windmill/issues/8734)) ([09bbc18](https://github.com/windmill-labs/windmill/commit/09bbc18bb773d9ffaa5aaa4bd9d7ce296f3ac468))
### Bug Fixes
* remove stale KMS openapi/description, restore stripped doc comments ([c09a431](https://github.com/windmill-labs/windmill/commit/c09a4311fd73c58acc8f3997428f002598dacce6))
* use runnable key for file naming in generate-metadata to prevent duplicate scripts in raw apps ([#8740](https://github.com/windmill-labs/windmill/issues/8740)) ([edfe074](https://github.com/windmill-labs/windmill/commit/edfe074e98cb3955be0768de7ed19e6ed8525916))
## [1.676.0](https://github.com/windmill-labs/windmill/compare/v1.675.1...v1.676.0) (2026-04-06)
### Features
* add path name autocomplete with ghost text and folder cycling ([#8731](https://github.com/windmill-labs/windmill/issues/8731)) ([e326621](https://github.com/windmill-labs/windmill/commit/e32662169a9762605de2dbe058514ddefbe07982))
### Bug Fixes
* fix custom urls not found ([d2abc0d](https://github.com/windmill-labs/windmill/commit/d2abc0d4300bb53f4035102f214d3c05bf0976a1))
### Performance Improvements
* add partial index for expired cache resource cleanup ([#8728](https://github.com/windmill-labs/windmill/issues/8728)) ([c721fac](https://github.com/windmill-labs/windmill/commit/c721fac466524747de04e3623c8cd62de8bd4dae))
## [1.675.1](https://github.com/windmill-labs/windmill/compare/v1.675.0...v1.675.1) (2026-04-05)
### Bug Fixes
* log cleanup scans S3 orphans and works cross-server ([#8729](https://github.com/windmill-labs/windmill/issues/8729)) ([f703fba](https://github.com/windmill-labs/windmill/commit/f703fba1ef56c89a97b2b4da7b4c188158f4c982))
### Performance Improvements
* add indexes for cleanup deletes on concurrency_key and autoscaling_event ([#8726](https://github.com/windmill-labs/windmill/issues/8726)) ([eae46a2](https://github.com/windmill-labs/windmill/commit/eae46a21a93fe7ab191228658dd5825f472bd851))
## [1.675.0](https://github.com/windmill-labs/windmill/compare/v1.674.2...v1.675.0) (2026-04-05)
### Features
* add object storage usage view and manual log cleanup ([#8724](https://github.com/windmill-labs/windmill/issues/8724)) ([02d0ee9](https://github.com/windmill-labs/windmill/commit/02d0ee919880823a33b112bcaf626a8933e1f715))
### Bug Fixes
* add admin check to count_completed_jobs_detail and document query builder SQL safety ([#8722](https://github.com/windmill-labs/windmill/issues/8722)) ([dd39c11](https://github.com/windmill-labs/windmill/commit/dd39c110a8468bf31d42428fc978cd302426fa86))
* allow private AI base URLs in ai_proxy integration test ([#8715](https://github.com/windmill-labs/windmill/issues/8715)) ([2b865c0](https://github.com/windmill-labs/windmill/commit/2b865c0694d79ce6477e5f14a077b73837007500))
* enrich OTEL spans with job_kind, trigger_kind, trigger, created_by, and script_hash ([#8718](https://github.com/windmill-labs/windmill/issues/8718)) ([7bf6ac2](https://github.com/windmill-labs/windmill/commit/7bf6ac2b694fc829327248ff2480c20c97e03e48))
* split DB health endpoint and add slow query controls ([#8725](https://github.com/windmill-labs/windmill/issues/8725)) ([01e39d9](https://github.com/windmill-labs/windmill/commit/01e39d9cd1b841d085bcc28a578654a5486cf76e))
## [1.674.2](https://github.com/windmill-labs/windmill/compare/v1.674.1...v1.674.2) (2026-04-04)
### Bug Fixes
* enforce RLS on $var: resolution in AI proxy (GHSA-jwg4-v3cj-rvfm) ([#8713](https://github.com/windmill-labs/windmill/issues/8713)) ([ff8e39c](https://github.com/windmill-labs/windmill/commit/ff8e39c69b1438defcaabd9d4906e7adafa7010c))
* SSRF via X-Resource-Path header in AI proxy endpoint ([#8712](https://github.com/windmill-labs/windmill/issues/8712)) ([f394e67](https://github.com/windmill-labs/windmill/commit/f394e674f22af13bb77915f33aa1e8de402b6fe1))
## [1.674.1](https://github.com/windmill-labs/windmill/compare/v1.674.0...v1.674.1) (2026-04-04)
### Bug Fixes
* create pg connection for cloud-hosted jobs instead of panicking ([#8710](https://github.com/windmill-labs/windmill/issues/8710)) ([aff95c3](https://github.com/windmill-labs/windmill/commit/aff95c33b2fd4c248dfaf595b8d18a6dbc50f0e6))
## [1.674.0](https://github.com/windmill-labs/windmill/compare/v1.673.0...v1.674.0) (2026-04-03)
### Features
* add application-level heartbeat support for websocket triggers ([#8686](https://github.com/windmill-labs/windmill/issues/8686)) ([5b7fa63](https://github.com/windmill-labs/windmill/commit/5b7fa63bf1800313e9b82465b8a4399a48634371))
* add Azure Key Vault as secret storage backend ([#8704](https://github.com/windmill-labs/windmill/issues/8704)) ([dcd615f](https://github.com/windmill-labs/windmill/commit/dcd615fdc3c66ec2a8e39c01f8142a7e7c82f534))
* add http/protobuf support for OTEL exporters ([#8702](https://github.com/windmill-labs/windmill/issues/8702)) ([0aea49f](https://github.com/windmill-labs/windmill/commit/0aea49f9607d5cbb5bcfa3068a179c9b7bf9afd6))
* add optional labels to scripts, flows, apps, schedules, triggers ([#8609](https://github.com/windmill-labs/windmill/issues/8609)) ([c4c9ef5](https://github.com/windmill-labs/windmill/commit/c4c9ef5fd7b41052b08ee941725434e8ca4ac970))
* add powershell common parameters support ([#8683](https://github.com/windmill-labs/windmill/issues/8683)) ([0317d58](https://github.com/windmill-labs/windmill/commit/0317d5891cfcfbde7b04795c034c088e933ee3d0))
* sql.raw in Typescript client ([#8706](https://github.com/windmill-labs/windmill/issues/8706)) ([ce290f6](https://github.com/windmill-labs/windmill/commit/ce290f68db866c07b30c97c2c0b3e39fee0a26d8))
* Support .ducklake() and .datatable() in agent workers ([#8697](https://github.com/windmill-labs/windmill/issues/8697)) ([fda68a7](https://github.com/windmill-labs/windmill/commit/fda68a72e5dfcded2350d1ff33ca4c695ab337b7))
### Bug Fixes
* add secretKeyRef support for jwt_secret and rsa_keys ([#8698](https://github.com/windmill-labs/windmill/issues/8698)) ([ba21470](https://github.com/windmill-labs/windmill/commit/ba214709b94f9467738e66b016331e97ac7d5d10))
* align script push metadata warning with generated locks ([#8690](https://github.com/windmill-labs/windmill/issues/8690)) ([6656b46](https://github.com/windmill-labs/windmill/commit/6656b46f10408e1c15961a72cde4c13b5c5b3923))
* debounce S3 proxy logs ([#8694](https://github.com/windmill-labs/windmill/issues/8694)) ([a3073ad](https://github.com/windmill-labs/windmill/commit/a3073ad8244efd9043e27f6731f7b53dbda662c1))
* dedicated worker dispatch, cross-workspace deps, UI improvements ([#8689](https://github.com/windmill-labs/windmill/issues/8689)) ([bffa61e](https://github.com/windmill-labs/windmill/commit/bffa61e33f2305bbeb79a2c91989a47baa7dff31))
* gate relock_skip tests on private feature and update ee-repo-ref ([#8703](https://github.com/windmill-labs/windmill/issues/8703)) ([adc9fe7](https://github.com/windmill-labs/windmill/commit/adc9fe722d8511a5914d81faac40af757e7f5e3f))
* hide deprecated cli metadata commands ([#8699](https://github.com/windmill-labs/windmill/issues/8699)) ([b960598](https://github.com/windmill-labs/windmill/commit/b96059843168c072f24072f93fecd80431e5d4cf))
* optimize S3 proxy performance ([#8685](https://github.com/windmill-labs/windmill/issues/8685)) ([0cfa462](https://github.com/windmill-labs/windmill/commit/0cfa462c379e887fdb5ad5e3bbff7798648d4e91))
* pipeline DISCARD ALL with first query on cached pg connections ([#8707](https://github.com/windmill-labs/windmill/issues/8707)) ([6d58d1a](https://github.com/windmill-labs/windmill/commit/6d58d1a74d1e69b163210a795502a7b3931001b5))
* resolve schedule update deadlock ([#8701](https://github.com/windmill-labs/windmill/issues/8701)) ([27ca417](https://github.com/windmill-labs/windmill/commit/27ca417201c99cf6fe0ae5b52a63c0395033e196))
* support raw app deployment history ([#8657](https://github.com/windmill-labs/windmill/issues/8657)) ([f234df9](https://github.com/windmill-labs/windmill/commit/f234df97ec3cdc480ee9d403370a3512496b024b))
* use pre-aggregated stats for telemetry job usage queries ([#8688](https://github.com/windmill-labs/windmill/issues/8688)) ([cdf3c29](https://github.com/windmill-labs/windmill/commit/cdf3c29664e4142c0f4487c07e585d1af3f97f91))
## [1.673.0](https://github.com/windmill-labs/windmill/compare/v1.672.0...v1.673.0) (2026-04-02)
### Features
* add endpoint to restart workers in a worker group ([#8659](https://github.com/windmill-labs/windmill/issues/8659)) ([f0437eb](https://github.com/windmill-labs/windmill/commit/f0437eba1925a9aa4c430008027d637a0c89ee39))
* add Entra ID (Azure Workload Identity) database auth ([#8526](https://github.com/windmill-labs/windmill/issues/8526)) ([6a5cfbc](https://github.com/windmill-labs/windmill/commit/6a5cfbc159a0ad7925fd7ce5eefc8eaa21bbb70b))
* add LIMIT_WINDOWS_TO_1CU env var for Windows worker memory limits ([#8681](https://github.com/windmill-labs/windmill/issues/8681)) ([d2d6810](https://github.com/windmill-labs/windmill/commit/d2d6810db954114f3333853bd3476cb8fc735f92))
* restore bun for dedicated workers, fix dispatch & serialization, cross-workspace deps ([#8645](https://github.com/windmill-labs/windmill/issues/8645)) ([619ebb6](https://github.com/windmill-labs/windmill/commit/619ebb65ce8dce8264add31c3147919802a8286a))
### Bug Fixes
* add HMAC signature verification to Slack interactive callback endpoint ([#8611](https://github.com/windmill-labs/windmill/issues/8611)) ([55e8a5c](https://github.com/windmill-labs/windmill/commit/55e8a5cff1f185b1dbd332d37b877972efa1ed7d))
* correct raw app flow inputs ([#8667](https://github.com/windmill-labs/windmill/issues/8667)) ([28c0730](https://github.com/windmill-labs/windmill/commit/28c073056c65d4ed1600e39679497e5af964347f))
* pass selected language to AI agent when generating flow scripts ([#8680](https://github.com/windmill-labs/windmill/issues/8680)) ([381011a](https://github.com/windmill-labs/windmill/commit/381011a4a8e48454e9c146c64db502293e646b99))
* poll for preview results to avoid undici headers timeout ([#8682](https://github.com/windmill-labs/windmill/issues/8682)) ([ff5fa9f](https://github.com/windmill-labs/windmill/commit/ff5fa9f64fe4aaf33e06b20f02373894b5df0f95))
* pre-fix trigger edited_by for superadmins not in workspace ([#8669](https://github.com/windmill-labs/windmill/issues/8669)) ([350ffdc](https://github.com/windmill-labs/windmill/commit/350ffdce297ba5b84f9dd247eede6da0c6b0956c))
* resolve race condition where flow sync push reverts to stale version ([#8673](https://github.com/windmill-labs/windmill/issues/8673)) ([d569e9e](https://github.com/windmill-labs/windmill/commit/d569e9e29c588243a90b1cd25f866efb0d178640))
* respect disabled fields in JSON input mode ([#8663](https://github.com/windmill-labs/windmill/issues/8663)) ([7fd0bf9](https://github.com/windmill-labs/windmill/commit/7fd0bf974d2ba2644bb01dd5e9ddc84749e166f5))
* Run typed pg queries in a single protocol conversation ([#8679](https://github.com/windmill-labs/windmill/issues/8679)) ([8581a33](https://github.com/windmill-labs/windmill/commit/8581a3300d056040b7e3ab77d629c74f034c9c97))
* sanitize MCP tool schemas for JSON Schema draft 2020-12 compliance ([#8666](https://github.com/windmill-labs/windmill/issues/8666)) ([8c3c97f](https://github.com/windmill-labs/windmill/commit/8c3c97f7a670d47019cc666219f8187f48499672))
* skip generate-metadata confirmation prompt in non-interactive CI ([#8678](https://github.com/windmill-labs/windmill/issues/8678)) ([39af1b7](https://github.com/windmill-labs/windmill/commit/39af1b75afc8458f85dec4fe51dfaed3d0cb000d))
* strip f/ prefix from folder paths when deploying from workspace forks ([#8662](https://github.com/windmill-labs/windmill/issues/8662)) ([7ab0ea5](https://github.com/windmill-labs/windmill/commit/7ab0ea581d349fbfdb56d22cf9903a90efa045bb))
* support branch-specific folder.meta.yaml in missing-meta check ([#8661](https://github.com/windmill-labs/windmill/issues/8661)) ([c87a6a0](https://github.com/windmill-labs/windmill/commit/c87a6a0f2c1346bf5e21f128d32d89bdca039243))
* validate rd redirect on login with same rules as logout ([#8655](https://github.com/windmill-labs/windmill/issues/8655)) ([bcce627](https://github.com/windmill-labs/windmill/commit/bcce62738791a4e9b9f4dbc64731eef163230172))
## [1.672.0](https://github.com/windmill-labs/windmill/compare/v1.671.0...v1.672.0) (2026-04-01)
### Features
* add R language support ([#8263](https://github.com/windmill-labs/windmill/issues/8263)) ([a46aa64](https://github.com/windmill-labs/windmill/commit/a46aa641f9d72809c52a0eb11a877a0f2d587c32))
### Bug Fixes
* approval page freeze, stale state, and missing approval link ([#8653](https://github.com/windmill-labs/windmill/issues/8653)) ([7069202](https://github.com/windmill-labs/windmill/commit/70692021909443b86ed61fa621fe49f28742fb54))
## [1.671.0](https://github.com/windmill-labs/windmill/compare/v1.670.0...v1.671.0) (2026-03-31)
### Features
* add configurable preview job tag override in default tags settings ([#8649](https://github.com/windmill-labs/windmill/issues/8649)) ([da8886b](https://github.com/windmill-labs/windmill/commit/da8886be8575dd925b6d24c55ab379bc6984c5f8))
* improve CLI flow log streaming and job inspection ([#8644](https://github.com/windmill-labs/windmill/issues/8644)) ([6c3c971](https://github.com/windmill-labs/windmill/commit/6c3c971af5aa1362632ee0deeddf91b8bc47c853))
* support hub flows in raw app runnables ([#8627](https://github.com/windmill-labs/windmill/issues/8627)) ([040a199](https://github.com/windmill-labs/windmill/commit/040a199685cea5c99c944bacb5584a381d6ec829))
### Bug Fixes
* return default_args/enums in approval info and fix subflow resume buttons ([#8648](https://github.com/windmill-labs/windmill/issues/8648)) ([852c59e](https://github.com/windmill-labs/windmill/commit/852c59efbb04510e5e6f99919707effcf6769a2f))
## [1.670.0](https://github.com/windmill-labs/windmill/compare/v1.669.1...v1.670.0) (2026-03-31)

View File

@@ -26,6 +26,7 @@ Open-source platform for internal tools, workflows, API integrations, background
- **DB**: `psql postgres://postgres:changeme@localhost:5432/windmill`
- **Login**: `admin@windmill.dev` / `changeme`
- **Instance settings**: navigate to `/#superadmin-settings`
- **Migrations**: use `cargo sqlx migrate add -r <name>` from `backend/` to create new migrations (never generate timestamps manually)
## Banned Patterns

View File

@@ -162,11 +162,19 @@ ENV PATH /usr/local/bin:/root/.local/bin:/tmp/.local/bin:$PATH
RUN apt-get update \
&& apt-get install -y --no-install-recommends netbase tzdata ca-certificates wget curl jq unzip build-essential unixodbc xmlsec1 software-properties-common tini \
&& apt-get install -y --no-install-recommends netbase tzdata ca-certificates wget curl jq unzip build-essential unixodbc xmlsec1 software-properties-common tini gnupg lsb-release \
&& if echo "$features" | grep -q "ee"; then apt-get install -y --no-install-recommends libsasl2-modules-gssapi-mit krb5-user; fi \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install latest PostgreSQL client (pg_dump) from official PostgreSQL apt repository
RUN curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor -o /usr/share/keyrings/postgresql-archive-keyring.gpg \
&& echo "deb [signed-by=/usr/share/keyrings/postgresql-archive-keyring.gpg] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list \
&& apt-get update \
&& apt-get install -y --no-install-recommends postgresql-client \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN if [ "$WITH_GIT" = "true" ]; then \
apt-get update -y \
&& apt-get install -y git \

2
ai_evals/.gitignore vendored Normal file
View File

@@ -0,0 +1,2 @@
.env
results/

172
ai_evals/AGENTS.md Normal file
View File

@@ -0,0 +1,172 @@
# AI Evals Authoring Guide
This folder contains black-box benchmark cases for:
- `flow`
- `app`
- `script`
- `cli`
The goal is to test the current production prompts and guidance with realistic user requests, not to test one exact implementation shape.
## Core rules
1. Write prompts like a real user request.
2. Prefer behavior, inputs, constraints, and outcomes over internal implementation details.
3. Keep deterministic validation narrow and hard.
4. Put semantic expectations in `judgeChecklist`.
5. Use `expected` fixtures only when exact structure really matters.
## Prompt writing
Prompts should sound like something a user would naturally ask.
Good:
- "Create a flow that routes support requests based on customer tier."
- "Add a reset button that sets the counter back to 0."
- "Create a flow that reuses the existing greeting script instead of duplicating the logic."
Bad:
- "Use `branchone` with 3 branches and a default branch."
- "Create a `rawscript` step with this exact topology."
- "This is a benchmark harness."
Do not write prompts as if the user knows Windmill internals unless the case is explicitly testing a power-user workflow.
## Flow-specific rules
This is the main principle you asked for:
- flow prompts should read like requests from a user who does not know the product internals
- the user should ask for behavior, not for `branchone`, `branchall`, `rawscript`, `preprocessor_module`, `failure_module`, exact graph topology, or other internal constructs
That means:
- creation cases should describe the business behavior and expected result
- modification cases may mention existing step names, because the user can see the current flow
- only mention special Windmill constructs when the case is explicitly about those constructs
Examples:
- acceptable creation prompt:
"Create a purchase approval flow that pauses for approval and asks the approver for a comment."
- avoid:
"Create a suspend step with one required event and a resume form."
For flow cases, do not fail a case just because the model chose a different valid topology.
## App-specific rules
App prompts should focus on user-visible behavior:
- what the UI should let the user do
- what should persist
- what backend behavior is needed
Avoid prompting in terms of React structure, component names, or implementation unless the case is specifically about editing an existing app.
## CLI-specific rules
CLI prompts can be more explicit about paths and file names because real CLI users often do specify them.
Still, avoid benchmark phrasing. The prompt should read like a repo task, not a harness instruction.
When relevant, ask the assistant to tell the user which `wmill` commands to run next. That is part of the benchmarked behavior.
## Deterministic validation
Use deterministic validation only for hard failures such as:
- missing required files
- unexpected extra files when the prompt says not to create them
- syntax errors
- unresolved flow refs
- missing required special modules or suspend config
- obvious artifact corruption
Do not use deterministic validation to enforce one preferred implementation for broad creation tasks.
Examples of bad hard checks:
- exact step topology for a creation flow
- exact branch structure when the prompt only asked for routing behavior
- exact input shape when multiple reasonable shapes are acceptable
## Judge checklist
Every non-trivial case should have a `judgeChecklist`.
The checklist should capture:
- the user-visible behavior that must be present
- important constraints
- key completion criteria
The checklist should not duplicate low-level implementation details unless they are truly required by the task.
Good checklist items:
- "the flow calculates the order total with 8% tax"
- "the app persists recipes appropriately for a raw Windmill app"
- "the flow reuses the existing workspace script instead of rewriting the logic"
Bad checklist items:
- "uses `branchone`"
- "contains a `rawscript` node"
## When to use `expected`
Use `expected` fixtures when the case is structure-sensitive, for example:
- exact file creation
- exact script content
- modification cases where a specific file must change in a specific way
- cases where preserving an existing structure is part of the requirement
Do not use a full `expected` artifact as the semantic oracle for broad creation tasks when multiple valid outputs should pass.
## When to use `initial`
Use `initial` when the benchmark is about:
- editing an existing artifact
- reusing existing workspace assets
- preserving existing behavior while adding a change
If the case is greenfield, prefer no `initial`.
## Case design ladder
Prefer suites that get gradually harder:
1. trivial create case
2. realistic create case
3. reuse-existing-assets case
4. modification case
5. refactor case
6. edge-case or niche product behavior
The last cases in a suite should cover unusual or product-specific behavior.
## Anti-patterns
Avoid these:
- benchmark framing in prompts
- over-specified internal topology for creation tasks
- judge checklists that just restate implementation details
- deterministic validation that encodes one preferred solution
- fixtures that are so minimal or brittle that they create false negatives
## Before adding a case
Ask:
1. Would a real user plausibly write this prompt?
2. If the model solves it in a different valid way, would the case still pass?
3. Are the hard deterministic checks only catching objectively broken output?
4. Does the `judgeChecklist` describe the real success criteria?
5. If this case fails, will the reason be understandable from the saved artifacts?

1
ai_evals/CLAUDE.md Normal file
View File

@@ -0,0 +1 @@
@AGENTS.md

197
ai_evals/README.md Normal file
View File

@@ -0,0 +1,197 @@
# AI Evals
Small benchmark runner for the four Windmill AI generation modes:
- `cli`
- `flow`
- `script`
- `app`
The benchmark always tests the current production prompts, tools, and guidance in this checkout.
Each attempt runs:
1. the real production path
2. deterministic validation
3. LLM judging
## Install
```bash
cd ai_evals
bun install
```
Frontend modes also require frontend dependencies:
```bash
cd frontend
bun install
```
## Commands
List model aliases:
```bash
cd ai_evals
bun run cli -- models
```
List cases:
```bash
cd ai_evals
bun run cli -- cases
bun run cli -- cases flow
```
Run benchmarks:
```bash
cd ai_evals
bun run cli -- run flow
bun run cli -- run flow flow-test4-order-processing-loop --model opus
bun run cli -- run flow flow-test0-sum-two-numbers --models haiku,opus,4o
bun run cli -- run flow flow-test0-sum-two-numbers --runs 3 --verbose
bun run cli -- run flow --record
WMILL_AI_EVAL_BACKEND_URL=http://127.0.0.1:8000 bun run cli -- run flow --backend-validation preview
bun run cli -- run cli bun-hello-script
```
Public CLI surface:
- `models`
- `cases [mode]`
- `run <mode> [caseIds...]`
`run` options:
- `--runs <n>`: repeat each case `n` times
- `--output <path>`: custom result JSON path
- `--model <alias>`: choose the model under test
- `--models <a,b,c>`: run the same cases sequentially against several model aliases
- `--verbose`: stream assistant output for frontend runs
- `--record`: append a compact tracked summary line to `ai_evals/history/<mode>.jsonl` for full-suite runs only
- `--backend-validation <mode>`: optional backend smoke validation (`off` or `preview`) for `script` and `flow` evals
## Models
Use `bun run cli -- models` to see the current aliases.
Today:
- `haiku`
- `sonnet`
- `opus`
- `4o`
- `gemini-flash`
- `gemini-pro`
- `gemini-3-flash-preview`
- `gemini-3.1-pro-preview`
Notes:
- the command also prints accepted alias spellings such as `gpt-4o`, `claude-opus-4.6`, and `claude-haiku-4.5`
- frontend modes (`flow`, `script`, `app`) can use Anthropic, OpenAI, and Gemini-backed aliases
- `cli` mode always uses the Anthropic agent SDK, so only Anthropic aliases are valid there
- the judge model is separate and currently defaults to `claude-sonnet-4-6`
## Case Format
Cases live in one YAML file per mode under `ai_evals/cases/`.
Minimal shape:
```yaml
- id: flow-test0-sum-two-numbers
prompt: |-
Create a flow that takes two numbers, `a` and `b`, and returns their sum.
initial: ai_evals/fixtures/...
expected: ai_evals/fixtures/...
```
Optional fields:
- `initial`: starting state fixture
- `expected`: expected artifact fixture
- `validate`: extra deterministic validation rules
- `runtime.backendPreview`: optional real backend preview config for smoke validation
For `flow` mode, `validate` can express requirements such as:
- accepted input schema shapes
- required `results.*` reference validity
- required module/code/input characteristics
For `flow` mode, an `initial` fixture can also include a benchmark workspace catalog of
existing scripts and flows. That lets the real `search_workspace` and
`get_runnable_details` tools discover reusable workspace runnables during evals.
If `--backend-validation preview` is enabled:
- `script` evals run a real backend script preview in an isolated temp workspace
- `flow` evals run a real backend flow preview only for cases that define `runtime.backendPreview`
- `flow` cases with `initial.workspace` fixtures seed those scripts and flows into the preview workspace before preview
- when `WMILL_AI_EVAL_BACKEND_WORKSPACE` is set, `ai_evals` treats that workspace as a dedicated test workspace, clears managed eval assets under `f/evals/*` before each preview run, and then reseeds the current case fixtures
Supported backend validation env vars:
- `WMILL_AI_EVAL_BACKEND_VALIDATION=preview`
- `WMILL_AI_EVAL_BACKEND_URL=http://127.0.0.1:8000`
- `WMILL_AI_EVAL_BACKEND_EMAIL=admin@windmill.dev`
- `WMILL_AI_EVAL_BACKEND_PASSWORD=changeme`
- `WMILL_AI_EVAL_BACKEND_WORKSPACE=integration-tests` to reuse an existing workspace on CE installs with low workspace limits
- `WMILL_AI_EVAL_KEEP_WORKSPACES=1`
- `WMILL_AI_EVAL_WORKSPACE_PREFIX=ai-evals`
## Results And Artifacts
Every run writes:
- a summary JSON under `ai_evals/results/`
- generated artifacts in a sibling directory
If `--record` is used, the CLI also appends one compact JSON line to:
- `ai_evals/history/flow.jsonl`
- `ai_evals/history/script.jsonl`
- `ai_evals/history/app.jsonl`
- `ai_evals/history/cli.jsonl`
Each recorded line contains:
- run metadata (`createdAt`, `gitSha`, `mode`, `runModel`, `judgeModel`)
- suite totals (`caseCount`, `attemptCount`, `passedAttempts`, `passRate`, `averageDurationMs`, `averageJudgeScore`)
- average token usage (`averageTokenUsagePerAttempt`)
- per-case metrics under `cases[]` (`averageDurationMs`, `averageJudgeScore`, `averageTokenUsagePerAttempt`, pass rate)
- `failedCaseIds`
Example:
- summary: `ai_evals/results/2026-04-09T09-40-33.051Z__flow.json`
- artifacts: `ai_evals/results/2026-04-09T09-40-33.051Z__flow/`
Typical artifacts by mode:
- `flow`: `flow.json`
- `script`: `script.json` plus the generated script file
- `app`: `app.json` plus frontend/backend files
- `cli`: `assistant-output.txt` plus generated workspace files
- backend-validated attempts also include `backend-preview.json`
## Layout
- `cases/`: one YAML file per mode
- `fixtures/`: initial and expected fixtures
- `core/`: shared loading, model resolution, validation, judging, and result writing
- `modes/`: one runner per mode
- `history/`: optional tracked pass-rate history written by `run --record`, one JSONL file per mode
- `results/`: local benchmark output and artifacts
## Notes
- Frontend modes reuse the production frontend chat code through the Vitest bridge.
- CLI mode creates an isolated workspace, writes the current checkout guidance into it, and benchmarks the real skills / `AGENTS.md` flow.
- Frontend progress streams live while the benchmark is running.
- Deterministic validators should stay focused on real correctness constraints, not one exact implementation shape.

View File

@@ -0,0 +1,72 @@
import { describe, expect, it } from "bun:test";
import {
anthropicUsageToBenchmarkTokenUsage,
extractCliResultTokenUsage,
} from "./runtime";
describe("anthropicUsageToBenchmarkTokenUsage", () => {
it("includes cache tokens in prompt usage", () => {
expect(
anthropicUsageToBenchmarkTokenUsage({
input_tokens: 120,
output_tokens: 45,
cache_creation_input_tokens: 30,
cache_read_input_tokens: 5,
})
).toEqual({
prompt: 155,
completion: 45,
total: 200,
});
});
it("returns null when usage is absent", () => {
expect(anthropicUsageToBenchmarkTokenUsage(null)).toBeNull();
});
});
describe("extractCliResultTokenUsage", () => {
it("reads aggregate usage from the SDK result event", () => {
expect(
extractCliResultTokenUsage({
type: "result",
usage: {
input_tokens: 400,
output_tokens: 120,
cache_creation_input_tokens: 50,
cache_read_input_tokens: 25,
},
})
).toEqual({
prompt: 475,
completion: 120,
total: 595,
});
});
it("falls back to modelUsage when aggregate usage is unavailable", () => {
expect(
extractCliResultTokenUsage({
type: "result",
modelUsage: {
opus: {
inputTokens: 200,
outputTokens: 60,
cacheCreationInputTokens: 10,
cacheReadInputTokens: 5,
},
haiku: {
inputTokens: 80,
outputTokens: 20,
cacheCreationInputTokens: 0,
cacheReadInputTokens: 15,
},
},
})
).toEqual({
prompt: 310,
completion: 80,
total: 390,
});
});
});

View File

@@ -0,0 +1,199 @@
import { query, type Options } from "@anthropic-ai/claude-agent-sdk";
import { join } from "path";
import { fileURLToPath } from "url";
import { getCliEvalModel, resolveEvalModel, type CliEvalModelConfig } from "../../core/models";
import type { BenchmarkTokenUsage } from "../../core/types";
export interface ToolInvocation {
tool: string;
input: Record<string, unknown>;
timestamp: number;
}
export interface PromptRunResult {
toolsUsed: ToolInvocation[];
skillsInvoked: string[];
output: string;
durationMs: number;
assistantMessageCount: number;
tokenUsage: BenchmarkTokenUsage | null;
}
interface AnthropicUsageLike {
input_tokens?: number | null;
output_tokens?: number | null;
cache_creation_input_tokens?: number | null;
cache_read_input_tokens?: number | null;
}
interface AnthropicModelUsageLike {
inputTokens?: number | null;
outputTokens?: number | null;
cacheCreationInputTokens?: number | null;
cacheReadInputTokens?: number | null;
}
interface CliResultMessageLike {
type?: string;
usage?: AnthropicUsageLike | null;
modelUsage?: Record<string, AnthropicModelUsageLike> | null;
}
const REPO_ROOT = fileURLToPath(new URL("../../../", import.meta.url));
export const DEFAULT_CLI_EVAL_MODEL: CliEvalModelConfig = getCliEvalModel(resolveEvalModel("cli"));
export function getGeneratedSkillsSource(): string {
return join(REPO_ROOT, "system_prompts", "auto-generated", "skills");
}
export function anthropicUsageToBenchmarkTokenUsage(
usage: AnthropicUsageLike | null | undefined
): BenchmarkTokenUsage | null {
if (!usage) {
return null;
}
const prompt =
(usage.input_tokens ?? 0) +
(usage.cache_creation_input_tokens ?? 0) +
(usage.cache_read_input_tokens ?? 0);
const completion = usage.output_tokens ?? 0;
return {
prompt,
completion,
total: prompt + completion,
};
}
export function extractCliResultTokenUsage(message: unknown): BenchmarkTokenUsage | null {
if (!message || typeof message !== "object") {
return null;
}
const resultMessage = message as CliResultMessageLike;
if (resultMessage.type !== "result") {
return null;
}
const usage = anthropicUsageToBenchmarkTokenUsage(resultMessage.usage);
if (usage) {
return usage;
}
if (!resultMessage.modelUsage || typeof resultMessage.modelUsage !== "object") {
return null;
}
let prompt = 0;
let completion = 0;
let sawModelUsage = false;
for (const modelUsage of Object.values(resultMessage.modelUsage)) {
if (!modelUsage || typeof modelUsage !== "object") {
continue;
}
prompt +=
(modelUsage.inputTokens ?? 0) +
(modelUsage.cacheCreationInputTokens ?? 0) +
(modelUsage.cacheReadInputTokens ?? 0);
completion += modelUsage.outputTokens ?? 0;
sawModelUsage = true;
}
if (!sawModelUsage) {
return null;
}
return {
prompt,
completion,
total: prompt + completion,
};
}
export async function runPromptAndCapture(
prompt: string,
cwd: string,
maxTurns: number = 3,
modelConfig: CliEvalModelConfig = DEFAULT_CLI_EVAL_MODEL
): Promise<PromptRunResult> {
const toolsUsed: ToolInvocation[] = [];
const skillsInvoked: string[] = [];
let output = "";
let assistantMessageCount = 0;
let tokenUsage: BenchmarkTokenUsage | null = null;
const startedAt = Date.now();
const options: Options = {
cwd,
model: modelConfig.model,
maxTurns,
settingSources: ["project"],
allowedTools: ["Skill", "Read", "Glob", "Grep", "Bash", "Write", "Edit"]
};
for await (const message of query({ prompt, options })) {
if (message.type === "assistant") {
assistantMessageCount += 1;
const content = message.message?.content;
if (Array.isArray(content)) {
for (const block of content) {
if (block.type === "tool_use") {
toolsUsed.push({
tool: block.name,
input: block.input as Record<string, unknown>,
timestamp: Date.now()
});
if (block.name === "Skill" && typeof block.input === "object" && block.input !== null) {
const skillInput = block.input as { skill?: string };
if (skillInput.skill) {
skillsInvoked.push(skillInput.skill);
}
}
} else if (block.type === "text") {
output += block.text;
}
}
}
} else if (message.type === "result") {
const resultMessage = message as { result?: string };
tokenUsage = extractCliResultTokenUsage(message) ?? tokenUsage;
if (typeof resultMessage.result === "string") {
output += resultMessage.result;
}
}
}
return {
toolsUsed,
skillsInvoked,
output,
durationMs: Date.now() - startedAt,
assistantMessageCount,
tokenUsage,
};
}
export function wasSkillInvoked(result: PromptRunResult, skillName: string): boolean {
return result.skillsInvoked.some((skill) => skill === skillName || skill.includes(skillName));
}
export function wasToolUsed(result: PromptRunResult, toolName: string): boolean {
return result.toolsUsed.some((tool) => tool.tool === toolName);
}
export function formatCliRunModelLabel(modelConfig: CliEvalModelConfig): string {
return `${modelConfig.provider}:${modelConfig.model}`;
}
export function getToolInputs(
result: PromptRunResult,
toolName: string
): Record<string, unknown>[] {
return result.toolsUsed
.filter((tool) => tool.tool === toolName)
.map((tool) => tool.input);
}

View File

@@ -0,0 +1,246 @@
import { afterEach, describe, expect, it } from 'bun:test'
import type { BackendValidationSettings } from '../../core/backendValidation'
import { BackendPreviewClient } from './backendPreview'
const ORIGINAL_FETCH = globalThis.fetch
afterEach(() => {
globalThis.fetch = ORIGINAL_FETCH
})
describe('BackendPreviewClient', () => {
it('updates an existing seeded script on path conflict and waits for deployment', async () => {
const requests: Array<{ url: string; init?: RequestInit }> = []
globalThis.fetch = mockFetch(
requests,
textResponse(200, 'token'),
textResponse(200, ''),
textResponse(400, 'Path conflict for f/evals/add_two_numbers with non-archived hash 123'),
jsonResponse(200, { hash: '123' }),
textResponse(200, '456'),
jsonResponse(200, { lock: 'script.lock', lock_error_logs: null })
)
const client = new BackendPreviewClient(
buildSettings({ baseUrl: 'http://backend.test/script-upsert' })
)
await client.createScript({
workspaceId: 'test',
path: 'f/evals/add_two_numbers',
summary: 'Add two numbers',
content: 'export async function main(a: number, b: number) { return a + b }',
language: 'bun'
})
expect(requests.map((entry) => entry.url)).toEqual([
'http://backend.test/script-upsert/api/auth/login',
'http://backend.test/script-upsert/api/w/test/folders/create',
'http://backend.test/script-upsert/api/w/test/scripts/create',
'http://backend.test/script-upsert/api/w/test/scripts/get/p/f/evals/add_two_numbers',
'http://backend.test/script-upsert/api/w/test/scripts/create',
'http://backend.test/script-upsert/api/w/test/scripts/deployment_status/h/456'
])
const updateRequest = requests[4]
expect(updateRequest.init?.method).toBe('POST')
expect(JSON.parse(String(updateRequest.init?.body))).toMatchObject({
path: 'f/evals/add_two_numbers',
parent_hash: '123',
language: 'bun'
})
})
it('updates an existing seeded flow on create conflict', async () => {
const requests: Array<{ url: string; init?: RequestInit }> = []
globalThis.fetch = mockFetch(
requests,
textResponse(200, 'token'),
textResponse(200, ''),
textResponse(400, 'Flow f/evals/add_numbers_flow already exists'),
textResponse(200, '')
)
const client = new BackendPreviewClient(
buildSettings({ baseUrl: 'http://backend.test/flow-upsert' })
)
await client.createFlow({
workspaceId: 'test',
path: 'f/evals/add_numbers_flow',
summary: 'Add numbers',
value: { modules: [] }
})
expect(requests.map((entry) => entry.url)).toEqual([
'http://backend.test/flow-upsert/api/auth/login',
'http://backend.test/flow-upsert/api/w/test/folders/create',
'http://backend.test/flow-upsert/api/w/test/flows/create',
'http://backend.test/flow-upsert/api/w/test/flows/update/f/evals/add_numbers_flow'
])
const updateRequest = requests[3]
expect(updateRequest.init?.method).toBe('POST')
expect(JSON.parse(String(updateRequest.init?.body))).toMatchObject({
path: 'f/evals/add_numbers_flow',
value: { modules: [] }
})
})
it('serializes shared-workspace validations inside the overridden workspace', async () => {
globalThis.fetch = async (input) => {
const url = String(input)
if (url.endsWith('/api/auth/login')) {
return textResponse(200, 'token')
}
if (url.endsWith('/api/workspaces/exists')) {
return textResponse(200, 'true')
}
if (url.endsWith('/api/w/shared-preview/flows/list_paths')) {
return jsonResponse(200, [])
}
if (url.endsWith('/api/w/shared-preview/scripts/list_paths')) {
return jsonResponse(200, [])
}
throw new Error(`Unexpected fetch: ${url}`)
}
const client = new BackendPreviewClient(
buildSettings({
baseUrl: 'http://backend.test/shared-lock',
workspaceOverride: 'shared-preview'
})
)
const order: string[] = []
let releaseFirst: (() => void) | undefined
let notifyFirstStart: (() => void) | undefined
const firstStarted = new Promise<void>((resolve) => {
notifyFirstStart = resolve
})
const first = client.withWorkspace('flow-test1', 1, async () => {
order.push('first:start')
notifyFirstStart?.()
await new Promise<void>((resolve) => {
releaseFirst = resolve
})
order.push('first:end')
})
const second = client.withWorkspace('flow-test2', 1, async () => {
order.push('second:start')
order.push('second:end')
})
await firstStarted
expect(order).toEqual(['first:start'])
releaseFirst?.()
await Promise.all([first, second])
expect(order).toEqual(['first:start', 'first:end', 'second:start', 'second:end'])
})
it('clears managed shared-workspace assets before preview runs', async () => {
const requests: Array<{ url: string; init?: RequestInit }> = []
globalThis.fetch = mockFetch(
requests,
textResponse(200, 'token'),
textResponse(200, 'true'),
jsonResponse(200, ['f/evals/old_subflow', 'u/admin/keep_flow']),
textResponse(200, ''),
jsonResponse(200, ['f/evals/old_script', 'f/shared/keep_script']),
textResponse(200, '')
)
const client = new BackendPreviewClient(
buildSettings({
baseUrl: 'http://backend.test/shared-cleanup',
workspaceOverride: 'shared-preview'
})
)
await client.withWorkspace('flow-test1', 1, async () => undefined)
expect(requests.map((entry) => entry.url)).toEqual([
'http://backend.test/shared-cleanup/api/auth/login',
'http://backend.test/shared-cleanup/api/workspaces/exists',
'http://backend.test/shared-cleanup/api/w/shared-preview/flows/list_paths',
'http://backend.test/shared-cleanup/api/w/shared-preview/flows/delete/f/evals/old_subflow',
'http://backend.test/shared-cleanup/api/w/shared-preview/scripts/list_paths',
'http://backend.test/shared-cleanup/api/w/shared-preview/scripts/delete/p/f/evals/old_script'
])
})
it('retries login after a cached login failure', async () => {
const requests: Array<{ url: string; init?: RequestInit }> = []
globalThis.fetch = mockFetch(
requests,
textResponse(503, 'backend starting'),
textResponse(200, 'token'),
textResponse(200, 'true'),
jsonResponse(200, []),
jsonResponse(200, [])
)
const client = new BackendPreviewClient(
buildSettings({
baseUrl: 'http://backend.test/login-retry',
workspaceOverride: 'shared-preview'
})
)
await expect(client.withWorkspace('flow-test1', 1, async () => undefined)).rejects.toThrow(
'login for backend validation failed'
)
await expect(client.withWorkspace('flow-test1', 1, async () => 'ok')).resolves.toBe('ok')
expect(
requests.filter((entry) => entry.url === 'http://backend.test/login-retry/api/auth/login')
).toHaveLength(2)
})
})
function buildSettings(
overrides: Partial<BackendValidationSettings> = {}
): BackendValidationSettings {
return {
mode: 'preview',
baseUrl: 'http://backend.test/default',
email: 'admin@windmill.dev',
password: 'changeme',
keepWorkspaces: true,
workspacePrefix: 'ai-evals',
pollIntervalMs: 1,
maxWaitMs: 50,
...overrides
}
}
function mockFetch(
requests: Array<{ url: string; init?: RequestInit }>,
...responses: Response[]
): typeof fetch {
const queue = [...responses]
return async (input, init) => {
const url = String(input)
requests.push({ url, init })
const next = queue.shift()
if (!next) {
throw new Error(`Unexpected fetch: ${url}`)
}
return next
}
}
function jsonResponse(status: number, body: unknown): Response {
return new Response(JSON.stringify(body), {
status,
headers: { 'Content-Type': 'application/json' }
})
}
function textResponse(status: number, body: string): Response {
return new Response(body, { status })
}

View File

@@ -0,0 +1,502 @@
import { randomUUID } from 'node:crypto'
import type { BackendValidationSettings } from '../../core/backendValidation'
interface CompletedJobResultMaybe {
completed: boolean
result: unknown
success?: boolean
started?: boolean
}
interface ScriptDeploymentStatus {
lock?: unknown
lock_error_logs?: string | null
}
export interface CompletedPreviewJob {
id: string
success: boolean
result: unknown
logs?: string | null
raw: Record<string, unknown>
}
const tokenCache = new Map<string, Promise<string>>()
const sharedWorkspaceQueue = new Map<string, Promise<void>>()
const managedSharedWorkspacePrefixes = ['f/evals/']
export class BackendPreviewClient {
constructor(private readonly settings: BackendValidationSettings) {}
async withWorkspace<T>(
caseId: string,
attempt: number,
body: (workspaceId: string) => Promise<T>
): Promise<T> {
const workspaceId =
this.settings.workspaceOverride ??
buildWorkspaceId(this.settings.workspacePrefix, caseId, attempt)
const run = async () => {
await this.ensureWorkspace(workspaceId)
if (this.settings.workspaceOverride) {
await this.clearManagedSharedWorkspaceAssets(workspaceId)
}
try {
return await body(workspaceId)
} finally {
if (!this.settings.keepWorkspaces && !this.settings.workspaceOverride) {
await this.deleteWorkspace(workspaceId).catch(() => undefined)
}
}
}
if (this.settings.workspaceOverride) {
return await withSharedWorkspaceLock(workspaceId, run)
}
return await run()
}
async createScript(input: {
workspaceId: string
path: string
summary: string
description?: string
schema?: Record<string, unknown>
content: string
language: string
}): Promise<void> {
await this.ensureFolderForPath(input.workspaceId, input.path)
const payload = {
path: input.path,
summary: input.summary,
description: input.description ?? '',
content: input.content,
schema: input.schema ?? { type: 'object', properties: {}, required: [] },
is_template: false,
language: input.language,
kind: 'script'
}
const response = await this.request(`/w/${encodeURIComponent(input.workspaceId)}/scripts/create`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload)
})
if (response.ok) {
await this.waitForScriptDeployment(input.workspaceId, input.path, (await response.text()).trim())
return
}
const message = await response.text()
if (!isConflictMessage(message)) {
throw new Error(`create script ${input.path} failed: ${response.status} ${response.statusText} - ${message}`)
}
const currentScript = await this.getScriptByPath(input.workspaceId, input.path)
const currentHash = readStringField(currentScript, 'hash', `script ${input.path}`)
const updateResponse = await this.request(
`/w/${encodeURIComponent(input.workspaceId)}/scripts/create`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
...payload,
parent_hash: currentHash
})
}
)
await expectOk(updateResponse, `update script ${input.path}`)
await this.waitForScriptDeployment(input.workspaceId, input.path, (await updateResponse.text()).trim())
}
async createFlow(input: {
workspaceId: string
path: string
summary: string
description?: string
schema?: Record<string, unknown>
value: Record<string, unknown>
}): Promise<void> {
await this.ensureFolderForPath(input.workspaceId, input.path)
const payload = {
path: input.path,
summary: input.summary,
description: input.description ?? '',
schema: input.schema ?? { type: 'object', properties: {}, required: [] },
value: input.value
}
const response = await this.request(`/w/${encodeURIComponent(input.workspaceId)}/flows/create`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload)
})
if (response.ok) {
return
}
const message = await response.text()
if (!isConflictMessage(message)) {
throw new Error(`create flow ${input.path} failed: ${response.status} ${response.statusText} - ${message}`)
}
const updateResponse = await this.request(
`/w/${encodeURIComponent(input.workspaceId)}/flows/update/${input.path}`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload)
}
)
await expectOk(updateResponse, `update flow ${input.path}`)
}
async runScriptPreview(input: {
workspaceId: string
content: string
args: Record<string, unknown>
language: string
path?: string
timeoutSeconds?: number
}): Promise<CompletedPreviewJob> {
const response = await this.request(
withQuery(`/w/${encodeURIComponent(input.workspaceId)}/jobs/run/preview`, {
timeout: input.timeoutSeconds
}),
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
content: input.content,
args: input.args,
language: input.language,
path: input.path
})
}
)
await expectOk(response, 'start script preview')
const jobId = (await response.text()).trim()
return await this.waitForCompletedJob(input.workspaceId, jobId)
}
async runFlowPreview(input: {
workspaceId: string
value: Record<string, unknown>
args: Record<string, unknown>
timeoutSeconds?: number
path?: string
}): Promise<CompletedPreviewJob> {
const response = await this.request(
withQuery(`/w/${encodeURIComponent(input.workspaceId)}/jobs/run/preview_flow`, {
timeout: input.timeoutSeconds
}),
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
value: input.value,
args: input.args,
path: input.path
})
}
)
await expectOk(response, 'start flow preview')
const jobId = (await response.text()).trim()
return await this.waitForCompletedJob(input.workspaceId, jobId)
}
private async ensureWorkspace(workspaceId: string): Promise<void> {
const existsResponse = await this.request('/workspaces/exists', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ id: workspaceId })
})
await expectOk(existsResponse, `check workspace ${workspaceId}`)
if ((await existsResponse.text()).trim() === 'true') {
return
}
const createResponse = await this.request('/workspaces/create', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ id: workspaceId, name: workspaceId })
})
try {
await expectOk(createResponse, `create workspace ${workspaceId}`)
} catch (error) {
const message = error instanceof Error ? error.message : String(error)
if (message.includes('maximum number of workspaces')) {
throw new Error(
`${message}. Reuse an existing workspace with WMILL_AI_EVAL_BACKEND_WORKSPACE=<workspace-id>.`
)
}
throw error
}
}
private async deleteWorkspace(workspaceId: string): Promise<void> {
const response = await this.request(`/workspaces/delete/${encodeURIComponent(workspaceId)}`, {
method: 'DELETE'
})
await expectOk(response, `delete workspace ${workspaceId}`)
}
private async ensureFolderForPath(workspaceId: string, path: string): Promise<void> {
const folderName = extractFolderName(path)
if (!folderName) {
return
}
const response = await this.request(`/w/${encodeURIComponent(workspaceId)}/folders/create`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: folderName })
})
if (response.ok) {
return
}
const message = await response.text()
if (!message.toLowerCase().includes('already exists')) {
throw new Error(`Failed to create folder ${folderName}: ${message}`)
}
}
private async waitForCompletedJob(
workspaceId: string,
jobId: string
): Promise<CompletedPreviewJob> {
const deadline = Date.now() + this.settings.maxWaitMs
while (Date.now() < deadline) {
const maybeResponse = await this.request(
`/w/${encodeURIComponent(workspaceId)}/jobs_u/completed/get_result_maybe/${encodeURIComponent(jobId)}?get_started=false`
)
await expectOk(maybeResponse, `poll job ${jobId}`)
const maybeResult = (await maybeResponse.json()) as CompletedJobResultMaybe
if (maybeResult.completed) {
const completedResponse = await this.request(
`/w/${encodeURIComponent(workspaceId)}/jobs_u/completed/get/${encodeURIComponent(jobId)}`
)
await expectOk(completedResponse, `get completed job ${jobId}`)
const completedJob = (await completedResponse.json()) as Record<string, unknown>
return {
id: jobId,
success: Boolean(maybeResult.success),
result: maybeResult.result,
logs:
typeof completedJob.logs === 'string' || completedJob.logs === null
? (completedJob.logs as string | null)
: null,
raw: completedJob
}
}
await new Promise((resolve) => setTimeout(resolve, this.settings.pollIntervalMs))
}
throw new Error(`Timed out waiting for preview job ${jobId} to complete`)
}
private async getScriptByPath(workspaceId: string, path: string): Promise<Record<string, unknown>> {
const response = await this.request(`/w/${encodeURIComponent(workspaceId)}/scripts/get/p/${path}`)
await expectOk(response, `get script ${path}`)
return (await response.json()) as Record<string, unknown>
}
private async clearManagedSharedWorkspaceAssets(workspaceId: string): Promise<void> {
const flowPaths = await this.listFlowPaths(workspaceId)
for (const path of flowPaths.filter(isManagedSharedWorkspacePath)) {
await this.deleteFlowByPath(workspaceId, path)
}
const scriptPaths = await this.listScriptPaths(workspaceId)
for (const path of scriptPaths.filter(isManagedSharedWorkspacePath)) {
await this.deleteScriptByPath(workspaceId, path)
}
}
private async listFlowPaths(workspaceId: string): Promise<string[]> {
const response = await this.request(`/w/${encodeURIComponent(workspaceId)}/flows/list_paths`)
await expectOk(response, `list flows in workspace ${workspaceId}`)
return await response.json()
}
private async listScriptPaths(workspaceId: string): Promise<string[]> {
const response = await this.request(`/w/${encodeURIComponent(workspaceId)}/scripts/list_paths`)
await expectOk(response, `list scripts in workspace ${workspaceId}`)
return await response.json()
}
private async deleteFlowByPath(workspaceId: string, path: string): Promise<void> {
const response = await this.request(`/w/${encodeURIComponent(workspaceId)}/flows/delete/${path}`, {
method: 'DELETE'
})
await expectOk(response, `delete flow ${path}`)
}
private async deleteScriptByPath(workspaceId: string, path: string): Promise<void> {
const response = await this.request(`/w/${encodeURIComponent(workspaceId)}/scripts/delete/p/${path}`, {
method: 'POST'
})
await expectOk(response, `delete script ${path}`)
}
private async waitForScriptDeployment(
workspaceId: string,
path: string,
hash: string
): Promise<void> {
const deadline = Date.now() + this.settings.maxWaitMs
while (Date.now() < deadline) {
const response = await this.request(
`/w/${encodeURIComponent(workspaceId)}/scripts/deployment_status/h/${encodeURIComponent(hash)}`
)
await expectOk(response, `check deployment status for script ${path}`)
const deployment = (await response.json()) as ScriptDeploymentStatus
if (deployment.lock != null) {
return
}
if (deployment.lock_error_logs) {
throw new Error(`Script deployment failed for ${path}: ${deployment.lock_error_logs}`)
}
await new Promise((resolve) => setTimeout(resolve, this.settings.pollIntervalMs))
}
throw new Error(`Timed out waiting for script ${path} (${hash}) to deploy`)
}
private async request(path: string, init?: RequestInit): Promise<Response> {
const token = await this.getToken()
return await fetch(`${this.settings.baseUrl}/api${path}`, {
...init,
headers: {
Authorization: `Bearer ${token}`,
...(init?.headers ?? {})
}
})
}
private async getToken(): Promise<string> {
const cacheKey = `${this.settings.baseUrl}|${this.settings.email}`
let tokenPromise = tokenCache.get(cacheKey)
if (!tokenPromise) {
tokenPromise = this.login().catch((error) => {
if (tokenCache.get(cacheKey) === tokenPromise) {
tokenCache.delete(cacheKey)
}
throw error
})
tokenCache.set(cacheKey, tokenPromise)
}
return await tokenPromise
}
private async login(): Promise<string> {
const response = await fetch(`${this.settings.baseUrl}/api/auth/login`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
email: this.settings.email,
password: this.settings.password
})
})
await expectOk(response, 'login for backend validation')
return (await response.text()).trim()
}
}
async function withSharedWorkspaceLock<T>(workspaceId: string, body: () => Promise<T>): Promise<T> {
const previous = sharedWorkspaceQueue.get(workspaceId) ?? Promise.resolve()
let releaseCurrent: (() => void) | undefined
const current = new Promise<void>((resolve) => {
releaseCurrent = resolve
})
const tail = previous.catch(() => undefined).then(() => current)
sharedWorkspaceQueue.set(workspaceId, tail)
await previous.catch(() => undefined)
try {
return await body()
} finally {
releaseCurrent?.()
if (sharedWorkspaceQueue.get(workspaceId) === tail) {
sharedWorkspaceQueue.delete(workspaceId)
}
}
}
function buildWorkspaceId(prefix: string, caseId: string, attempt: number): string {
const caseSlug = caseId
.toLowerCase()
.replace(/[^a-z0-9-]+/g, '-')
.replace(/^-+|-+$/g, '')
.slice(0, 30)
const suffix = randomUUID().slice(0, 8)
return `${prefix}-${caseSlug || 'case'}-a${attempt}-${suffix}`
}
function extractFolderName(path: string): string | null {
if (!path.startsWith('f/')) {
return null
}
const segments = path.split('/').slice(1, -1)
return segments.length > 0 ? segments.join('/') : null
}
function withQuery(
path: string,
params: Record<string, string | number | undefined>
): string {
const query = new URLSearchParams()
for (const [key, value] of Object.entries(params)) {
if (value === undefined) {
continue
}
query.set(key, String(value))
}
const suffix = query.toString()
return suffix ? `${path}?${suffix}` : path
}
async function expectOk(response: Response, context: string): Promise<void> {
if (response.ok) {
return
}
throw new Error(`${context} failed: ${response.status} ${response.statusText} - ${await response.text()}`)
}
function readStringField(
value: Record<string, unknown>,
field: string,
context: string
): string {
const candidate = value[field]
if (typeof candidate === 'string' && candidate.length > 0) {
return candidate
}
throw new Error(`${context} is missing string field ${field}`)
}
function isConflictMessage(message: string): boolean {
const normalized = message.toLowerCase()
return normalized.includes('already exists') || normalized.includes('path conflict')
}
function isManagedSharedWorkspacePath(path: string): boolean {
return managedSharedWorkspacePrefixes.some((prefix) => path.startsWith(prefix))
}

View File

@@ -0,0 +1,93 @@
import { loadSelectedCases } from "../../core/cases";
import { resolveBackendValidationSettings } from "../../core/backendValidation";
import {
formatRunModelLabel,
getFrontendEvalModel,
resolveEvalModel,
} from "../../core/models";
import { buildRunResult } from "../../core/results";
import { runSuite } from "../../core/runSuite";
import type { BenchmarkRunResult, ModeRunner } from "../../core/types";
import { emitFrontendBenchmarkProgress } from "./progress";
import { createAppModeRunner } from "../../modes/app";
import { createFlowModeRunner } from "../../modes/flow";
import { createScriptModeRunner } from "../../modes/script";
import { DEFAULT_JUDGE_MODEL } from "../../core/judge";
export type FrontendBenchmarkMode = "flow" | "app" | "script";
export async function runFrontendBenchmarkFromEnv(): Promise<BenchmarkRunResult> {
const mode = parseMode(process.env.WMILL_FRONTEND_AI_EVAL_MODE);
const caseIds = parseOptionalJsonStringArray(process.env.WMILL_FRONTEND_AI_EVAL_CASE_IDS);
const runs = parsePositiveInteger(process.env.WMILL_FRONTEND_AI_EVAL_RUNS, "WMILL_FRONTEND_AI_EVAL_RUNS");
const emitProgress = process.env.WMILL_FRONTEND_AI_EVAL_PROGRESS === "1";
const verbose = process.env.WMILL_FRONTEND_AI_EVAL_VERBOSE === "1";
const model = resolveEvalModel(mode, process.env.WMILL_FRONTEND_AI_EVAL_MODEL);
const backendValidation = resolveBackendValidationSettings({
evalMode: mode,
requestedMode: process.env.WMILL_FRONTEND_AI_EVAL_BACKEND_VALIDATION,
});
const selectedCases = await loadSelectedCases(mode, caseIds);
const modeRunner = getModeRunner(mode, getFrontendEvalModel(model), backendValidation);
const runModel = formatRunModelLabel(mode, model);
const caseResults = await runSuite({
modeRunner,
cases: selectedCases,
runs,
runModel,
judgeModel: DEFAULT_JUDGE_MODEL,
concurrency: verbose ? 1 : undefined,
verbose,
onProgress: emitProgress ? (event) => emitFrontendBenchmarkProgress(event) : undefined,
});
return buildRunResult({
mode,
runs,
runModel,
judgeModel: DEFAULT_JUDGE_MODEL,
caseResults,
});
}
function getModeRunner(
mode: FrontendBenchmarkMode,
model: ReturnType<typeof getFrontendEvalModel>,
backendValidation: ReturnType<typeof resolveBackendValidationSettings>
): ModeRunner<any, any, any> {
switch (mode) {
case "flow":
return createFlowModeRunner(model, backendValidation);
case "app":
return createAppModeRunner(model);
case "script":
return createScriptModeRunner(model, backendValidation);
}
}
function parseMode(value: string | undefined): FrontendBenchmarkMode {
if (value === "flow" || value === "app" || value === "script") {
return value;
}
throw new Error(`Unsupported frontend benchmark mode: ${String(value)}`);
}
function parseOptionalJsonStringArray(value: string | undefined): string[] {
if (!value) {
return [];
}
const parsed = JSON.parse(value) as unknown;
if (!Array.isArray(parsed) || parsed.some((entry) => typeof entry !== "string")) {
throw new Error("WMILL_FRONTEND_AI_EVAL_CASE_IDS must be a JSON string array");
}
return parsed;
}
function parsePositiveInteger(value: string | undefined, envName: string): number {
const parsed = Number(value);
if (!Number.isInteger(parsed) || parsed <= 0) {
throw new Error(`${envName} must be a positive integer`);
}
return parsed;
}

View File

@@ -0,0 +1,92 @@
import { mkdtemp } from 'fs/promises'
import { tmpdir } from 'os'
import { join } from 'path'
import type {
AppFiles,
BackendRunnable,
AppAIChatHelpers
} from '../../../../../frontend/src/lib/components/copilot/chat/app/core'
import {
getAppTools,
prepareAppSystemMessage,
prepareAppUserMessage
} from '../../../../../frontend/src/lib/components/copilot/chat/app/core'
import type { Tool as ProductionTool } from '../../../../../frontend/src/lib/components/copilot/chat/shared'
import { createAppFileHelpers } from './fileHelpers'
import { runEval } from '../shared'
import type { AIProvider } from '$lib/gen/types.gen'
import type { ModeRunContext } from '../../../../core/types'
import type { TokenUsage } from '../shared/types'
export interface AppEvalResult {
success: boolean
files: AppFiles
error?: string
assistantMessageCount: number
toolCallCount: number
toolsUsed: string[]
tokenUsage: TokenUsage
}
export interface AppEvalOptions {
initialFrontend?: Record<string, string>
initialBackend?: Record<string, BackendRunnable>
model?: string
maxIterations?: number
provider?: AIProvider
workspaceRoot?: string
runContext?: ModeRunContext
}
export async function runAppEval(
userPrompt: string,
apiKey: string,
options?: AppEvalOptions
): Promise<AppEvalResult> {
const workspaceRoot =
options?.workspaceRoot ??
(await mkdtemp(join(tmpdir(), 'wmill-frontend-app-benchmark-')))
const { helpers, getFiles, cleanup } = await createAppFileHelpers(
options?.initialFrontend ?? {},
options?.initialBackend ?? {},
workspaceRoot
)
try {
const systemMessage = prepareAppSystemMessage()
const tools = getAppTools() as ProductionTool<AppAIChatHelpers>[]
const model = options?.model ?? 'claude-haiku-4-5-20251001'
const userMessage = prepareAppUserMessage(userPrompt, helpers.getSelectedContext())
const rawResult = await runEval({
userPrompt,
systemMessage,
userMessage,
tools,
helpers,
apiKey,
getOutput: getFiles,
onAssistantMessageStart: options?.runContext?.onAssistantMessageStart,
onAssistantToken: options?.runContext?.onAssistantChunk,
onAssistantMessageEnd: options?.runContext?.onAssistantMessageEnd,
options: {
maxIterations: options?.maxIterations,
model,
workspace: workspaceRoot,
provider: options?.provider
}
})
return {
files: rawResult.output,
success: rawResult.success,
error: rawResult.error,
assistantMessageCount: rawResult.iterations,
toolCallCount: rawResult.toolCallsCount,
toolsUsed: rawResult.toolsCalled,
tokenUsage: rawResult.tokenUsage
}
} finally {
await cleanup()
}
}

View File

@@ -1,4 +1,8 @@
import type { AppFiles, BackendRunnable, InlineScript } from '../../app/core'
import type {
AppFiles,
BackendRunnable,
InlineScript
} from '../../../../../frontend/src/lib/components/copilot/chat/app/core'
/**
* Backend runnable metadata stored in meta.json files.

View File

@@ -0,0 +1,255 @@
import { mkdir, rm, writeFile } from 'fs/promises'
import { dirname, join } from 'path'
import type {
AppAIChatHelpers,
AppFiles,
BackendRunnable,
DataTableSchema,
LintResult,
SelectedContext
} from '../../../../../frontend/src/lib/components/copilot/chat/app/core'
function createEmptyLintResult(): LintResult {
return {
errorCount: 0,
warningCount: 0,
errors: { frontend: {}, backend: {} },
warnings: { frontend: {}, backend: {} }
}
}
async function writeFrontendFile(
workspaceRoot: string | undefined,
path: string,
content: string
): Promise<void> {
if (!workspaceRoot) {
return
}
const relativePath = path.startsWith('/') ? path.slice(1) : path
const fullPath = join(workspaceRoot, 'frontend', relativePath)
await mkdir(dirname(fullPath), { recursive: true })
await writeFile(fullPath, content, 'utf8')
}
async function removeFrontendFile(workspaceRoot: string | undefined, path: string): Promise<void> {
if (!workspaceRoot) {
return
}
const relativePath = path.startsWith('/') ? path.slice(1) : path
await rm(join(workspaceRoot, 'frontend', relativePath), { force: true })
}
async function writeBackendRunnable(
workspaceRoot: string | undefined,
key: string,
runnable: BackendRunnable
): Promise<void> {
if (!workspaceRoot) {
return
}
const runnableDir = join(workspaceRoot, 'backend', key)
await mkdir(runnableDir, { recursive: true })
const meta: { name: string; language?: string; type?: string; path?: string } = {
name: runnable.name
}
if (runnable.type === 'inline' && runnable.inlineScript) {
meta.language = runnable.inlineScript.language
const extension = runnable.inlineScript.language === 'python3' ? 'py' : 'ts'
await writeFile(
join(runnableDir, `main.${extension}`),
runnable.inlineScript.content,
'utf8'
)
} else {
meta.type = runnable.type
if (runnable.path) {
meta.path = runnable.path
}
}
await writeFile(join(runnableDir, 'meta.json'), JSON.stringify(meta, null, 2) + '\n', 'utf8')
}
async function removeBackendRunnable(workspaceRoot: string | undefined, key: string): Promise<void> {
if (!workspaceRoot) {
return
}
await rm(join(workspaceRoot, 'backend', key), { recursive: true, force: true })
}
async function persistDatatables(
workspaceRoot: string | undefined,
datatables: DataTableSchema[]
): Promise<void> {
if (!workspaceRoot) {
return
}
await writeFile(
join(workspaceRoot, 'datatables.json'),
JSON.stringify(datatables, null, 2) + '\n',
'utf8'
)
}
export async function createAppFileHelpers(
initialFrontend: Record<string, string> = {},
initialBackend: Record<string, BackendRunnable> = {},
workspaceRoot?: string
): Promise<{
helpers: AppAIChatHelpers
getFiles: () => AppFiles
getFrontend: () => Record<string, string>
getBackend: () => Record<string, BackendRunnable>
cleanup: () => Promise<void>
workspaceDir: string | null
}> {
let frontend = { ...initialFrontend }
let backend = { ...initialBackend }
let snapshotId = 0
const snapshots = new Map<
number,
{ frontend: Record<string, string>; backend: Record<string, BackendRunnable> }
>()
const datatables: DataTableSchema[] = []
for (const [path, content] of Object.entries(frontend)) {
await writeFrontendFile(workspaceRoot, path, content)
}
for (const [key, runnable] of Object.entries(backend)) {
await writeBackendRunnable(workspaceRoot, key, runnable)
}
await persistDatatables(workspaceRoot, datatables)
const helpers: AppAIChatHelpers = {
listFrontendFiles: () => Object.keys(frontend),
getFrontendFile: (path: string) => frontend[path],
getFrontendFiles: () => ({ ...frontend }),
setFrontendFile: (path: string, content: string) => {
frontend[path] = content
void writeFrontendFile(workspaceRoot, path, content)
return createEmptyLintResult()
},
deleteFrontendFile: (path: string) => {
delete frontend[path]
void removeFrontendFile(workspaceRoot, path)
},
listBackendRunnables: () =>
Object.entries(backend).map(([key, runnable]) => ({
key,
name: runnable.name
})),
getBackendRunnable: (key: string) => backend[key],
getBackendRunnables: () => ({ ...backend }),
setBackendRunnable: async (key: string, runnable: BackendRunnable) => {
backend[key] = runnable
await writeBackendRunnable(workspaceRoot, key, runnable)
return createEmptyLintResult()
},
deleteBackendRunnable: (key: string) => {
delete backend[key]
void removeBackendRunnable(workspaceRoot, key)
},
getFiles: (): AppFiles => ({
frontend: { ...frontend },
backend: { ...backend }
}),
getSelectedContext: (): SelectedContext => ({ type: 'none' }),
snapshot: () => {
const id = ++snapshotId
snapshots.set(id, {
frontend: { ...frontend },
backend: { ...backend }
})
return id
},
revertToSnapshot: (id: number) => {
const snapshot = snapshots.get(id)
if (!snapshot) {
return
}
frontend = { ...snapshot.frontend }
backend = { ...snapshot.backend }
void syncWorkspace()
},
lint: () => createEmptyLintResult(),
getDatatables: async () => structuredClone(datatables),
getAvailableDatatableNames: () => datatables.map((datatable) => datatable.datatable_name),
execDatatableSql: async (
datatableName: string,
sql: string,
newTable?: { schema: string; name: string }
) => {
if (newTable) {
datatables.push({
datatable_name: datatableName,
schemas: {
[newTable.schema]: {
[newTable.name]: {}
}
}
})
await persistDatatables(workspaceRoot, datatables)
}
return {
success: true,
result: [
{
datatableName,
sql
}
]
}
},
addTableToWhitelist: (datatableName: string, schemaName: string, tableName: string) => {
const existing = datatables.find((entry) => entry.datatable_name === datatableName)
if (existing) {
existing.schemas[schemaName] ??= {}
existing.schemas[schemaName][tableName] ??= {}
} else {
datatables.push({
datatable_name: datatableName,
schemas: {
[schemaName]: {
[tableName]: {}
}
}
})
}
void persistDatatables(workspaceRoot, datatables)
}
}
async function syncWorkspace(): Promise<void> {
if (!workspaceRoot) {
return
}
await rm(join(workspaceRoot, 'frontend'), { recursive: true, force: true })
await rm(join(workspaceRoot, 'backend'), { recursive: true, force: true })
for (const [path, content] of Object.entries(frontend)) {
await writeFrontendFile(workspaceRoot, path, content)
}
for (const [key, runnable] of Object.entries(backend)) {
await writeBackendRunnable(workspaceRoot, key, runnable)
}
await persistDatatables(workspaceRoot, datatables)
}
return {
helpers,
getFiles: () => ({
frontend: { ...frontend },
backend: { ...backend }
}),
getFrontend: () => ({ ...frontend }),
getBackend: () => ({ ...backend }),
cleanup: async () => {
if (workspaceRoot) {
await rm(workspaceRoot, { recursive: true, force: true })
}
},
workspaceDir: workspaceRoot ?? null
}
}

View File

@@ -0,0 +1,169 @@
import { mkdir, rm, writeFile } from 'fs/promises'
import { dirname, join } from 'path'
import type { FlowModule, InputTransform } from '../../../../../frontend/src/lib/gen'
import type { ExtendedOpenFlow } from '../../../../../frontend/src/lib/components/flows/types'
import type { FlowAIChatHelpers } from '../../../../../frontend/src/lib/components/copilot/chat/flow/core'
import type { ScriptLintResult } from '../../../../../frontend/src/lib/components/copilot/chat/shared'
import { getSubModules } from '../../../../../frontend/src/lib/components/flows/flowExplorer'
import {
createInlineScriptSession
} from '../../../../../frontend/src/lib/components/copilot/chat/flow/inlineScriptsUtils'
import {
applyFlowJsonUpdate,
getFlowModuleById,
updateRawScriptModuleContent
} from '../../../../../frontend/src/lib/components/copilot/chat/flow/helperUtils'
import {
registerBenchmarkWorkspace,
registerBenchmarkWorkspaceRunnables,
unregisterBenchmarkWorkspaceRunnables,
createBenchmarkCompletedJob,
type BenchmarkWorkspaceFlow,
type BenchmarkWorkspaceScript
} from '../../mockBackend'
const EMPTY_SCRIPT_LINT_RESULT: ScriptLintResult = {
errorCount: 0,
warningCount: 0,
errors: [],
warnings: []
}
export interface FlowWorkspaceFixtures {
scripts?: BenchmarkWorkspaceScript[]
flows?: BenchmarkWorkspaceFlow[]
}
export async function createFlowFileHelpers(
initialModules: FlowModule[] = [],
initialSchema?: Record<string, any>,
initialPreprocessorModule?: FlowModule,
initialFailureModule?: FlowModule,
workspaceRoot?: string,
workspaceFixtures?: FlowWorkspaceFixtures
): Promise<{
helpers: FlowAIChatHelpers
getFlow: () => ExtendedOpenFlow
getModules: () => FlowModule[]
cleanup: () => Promise<void>
workspaceDir: string | null
}> {
let flow: ExtendedOpenFlow = {
value: {
modules: structuredClone(initialModules),
preprocessor_module: structuredClone(initialPreprocessorModule),
failure_module: structuredClone(initialFailureModule)
},
summary: '',
schema: initialSchema ?? {
$schema: 'https://json-schema.org/draft/2020-12/schema',
properties: {},
required: [],
type: 'object'
}
}
const inlineScriptSession = createInlineScriptSession()
const flowFilePath = workspaceRoot ? join(workspaceRoot, 'flow.json') : null
async function persistFlow(): Promise<void> {
if (!flowFilePath) {
return
}
await mkdir(dirname(flowFilePath), { recursive: true })
await writeFile(flowFilePath, JSON.stringify(flow, null, 2) + '\n', 'utf8')
}
await persistFlow()
if (workspaceRoot) {
registerBenchmarkWorkspace(workspaceRoot)
if (workspaceFixtures) {
registerBenchmarkWorkspaceRunnables(workspaceRoot, workspaceFixtures)
}
}
const helpers: FlowAIChatHelpers = {
getFlowAndSelectedId: () => ({ flow, selectedId: '' }),
getModules: (id?: string) => {
if (!id) return flow.value.modules
const module = getFlowModuleById(flow, id)
return module ? getSubModules(module).flat() : []
},
inlineScriptSession,
setSnapshot: () => {},
revertToSnapshot: () => {},
setCode: async (id: string, code: string) => {
updateRawScriptModuleContent(flow, id, code)
inlineScriptSession.set(id, code)
await persistFlow()
},
setFlowJson: async (
modules: FlowModule[] | undefined,
schema: Record<string, any> | undefined,
preprocessorModule: FlowModule | null | undefined,
failureModule: FlowModule | null | undefined
) => {
applyFlowJsonUpdate(flow, inlineScriptSession, {
modules,
schema,
preprocessorModule,
failureModule
})
await persistFlow()
},
getFlowInputsSchema: async () => flow.schema ?? {},
updateExprsToSet: (_id: string, _inputTransforms: Record<string, InputTransform>) => {},
acceptAllModuleActions: () => {},
rejectAllModuleActions: () => {},
hasPendingChanges: () => false,
selectStep: (_id: string) => {},
testFlow: async (args?: Record<string, any>) => {
if (workspaceRoot) {
const runPath = join(workspaceRoot, 'test-run.json')
await writeFile(
runPath,
JSON.stringify(
{
requestedArgs: args ?? {},
modules: flow.value.modules.map((module) => module.id),
preprocessor_module: flow.value.preprocessor_module?.id ?? null,
failure_module: flow.value.failure_module?.id ?? null
},
null,
2
) + '\n',
'utf8'
)
}
return createBenchmarkCompletedJob({
workspace: workspaceRoot ?? 'benchmark',
jobKind: 'flowpreview',
result: {
requestedArgs: args ?? {},
modules: flow.value.modules.map((module) => module.id),
preprocessor_module: flow.value.preprocessor_module?.id ?? null,
failure_module: flow.value.failure_module?.id ?? null,
mocked: true
},
logs: 'Mock benchmark flow test run completed successfully.'
})
},
getLintErrors: async () => EMPTY_SCRIPT_LINT_RESULT
}
return {
helpers,
getFlow: () => flow,
getModules: () => flow.value.modules,
cleanup: async () => {
if (workspaceRoot) {
unregisterBenchmarkWorkspaceRunnables(workspaceRoot)
}
if (workspaceRoot) {
await rm(workspaceRoot, { recursive: true, force: true })
}
},
workspaceDir: workspaceRoot ?? null
}
}

View File

@@ -0,0 +1,107 @@
import { mkdtemp } from 'fs/promises'
import { tmpdir } from 'os'
import { join } from 'path'
import type { FlowModule } from '$lib/gen'
import type { AIProvider } from '$lib/gen/types.gen'
import type { ExtendedOpenFlow } from '$lib/components/flows/types'
import {
flowTools,
prepareFlowSystemMessage,
prepareFlowUserMessage,
type FlowAIChatHelpers
} from '../../../../../frontend/src/lib/components/copilot/chat/flow/core'
import type { Tool as ProductionTool } from '../../../../../frontend/src/lib/components/copilot/chat/shared'
import { createFlowFileHelpers, type FlowWorkspaceFixtures } from './fileHelpers'
import { runEval } from '../shared'
import type { ModeRunContext } from '../../../../core/types'
import type { TokenUsage } from '../shared/types'
export interface FlowFixture {
value?: {
modules?: FlowModule[]
preprocessor_module?: FlowModule
failure_module?: FlowModule
}
schema?: Record<string, unknown>
}
export interface FlowEvalResult {
success: boolean
flow: ExtendedOpenFlow
error?: string
assistantMessageCount: number
toolCallCount: number
toolsUsed: string[]
tokenUsage: TokenUsage
}
export interface FlowEvalOptions {
initialFlow?: FlowFixture
workspaceFixtures?: FlowWorkspaceFixtures
model?: string
maxIterations?: number
provider?: AIProvider
workspaceRoot?: string
runContext?: ModeRunContext
}
export async function runFlowEval(
userPrompt: string,
apiKey: string,
options?: FlowEvalOptions
): Promise<FlowEvalResult> {
const workspaceRoot =
options?.workspaceRoot ??
(await mkdtemp(join(tmpdir(), 'wmill-frontend-flow-benchmark-')))
const { helpers, getFlow, cleanup } = await createFlowFileHelpers(
options?.initialFlow?.value?.modules ?? [],
options?.initialFlow?.schema,
options?.initialFlow?.value?.preprocessor_module,
options?.initialFlow?.value?.failure_module,
workspaceRoot,
options?.workspaceFixtures
)
try {
const systemMessage = prepareFlowSystemMessage()
const tools = flowTools as ProductionTool<FlowAIChatHelpers>[]
const model = options?.model ?? 'claude-haiku-4-5-20251001'
const userMessage = prepareFlowUserMessage(
userPrompt,
helpers.getFlowAndSelectedId(),
[],
helpers.inlineScriptSession
)
const rawResult = await runEval({
userPrompt,
systemMessage,
userMessage,
tools,
helpers,
apiKey,
getOutput: getFlow,
onAssistantMessageStart: options?.runContext?.onAssistantMessageStart,
onAssistantToken: options?.runContext?.onAssistantChunk,
onAssistantMessageEnd: options?.runContext?.onAssistantMessageEnd,
options: {
maxIterations: options?.maxIterations,
model,
workspace: workspaceRoot,
provider: options?.provider
}
})
return {
flow: rawResult.output,
success: rawResult.success,
error: rawResult.error,
assistantMessageCount: rawResult.iterations,
toolCallCount: rawResult.toolCallsCount,
toolsUsed: rawResult.toolsCalled,
tokenUsage: rawResult.tokenUsage
}
} finally {
await cleanup()
}
}

View File

@@ -0,0 +1,73 @@
import { mkdir, rm, writeFile } from 'fs/promises'
import { dirname, join } from 'path'
import type { ScriptLang } from '../../../../../frontend/src/lib/gen/types.gen'
import type { ReviewChangesOpts } from '../../../../../frontend/src/lib/components/copilot/chat/monaco-adapter'
import type { ScriptChatHelpers } from '../../../../../frontend/src/lib/components/copilot/chat/script/core'
import { buildScriptLintResult } from './preview'
import { registerBenchmarkWorkspace, unregisterBenchmarkWorkspace } from '../../mockBackend'
export interface ScriptEvalState {
code: string
lang: ScriptLang | 'bunnative'
path: string
args: Record<string, any>
}
export async function createScriptFileHelpers(
initialScript: ScriptEvalState,
workspaceRoot?: string
): Promise<{
helpers: ScriptChatHelpers
getScript: () => ScriptEvalState
cleanup: () => Promise<void>
workspaceDir: string | null
}> {
let script = structuredClone(initialScript)
const scriptFilePath = workspaceRoot ? join(workspaceRoot, script.path) : null
async function persistScript(): Promise<void> {
if (!scriptFilePath) {
return
}
await mkdir(dirname(scriptFilePath), { recursive: true })
await writeFile(scriptFilePath, script.code, 'utf8')
}
await persistScript()
if (workspaceRoot) {
registerBenchmarkWorkspace(workspaceRoot)
}
const helpers: ScriptChatHelpers = {
getScriptOptions: () => ({
code: script.code,
lang: script.lang,
path: script.path,
args: structuredClone(script.args)
}),
applyCode: async (code: string, opts?: ReviewChangesOpts) => {
if (opts?.mode === 'revert') {
return
}
script = {
...script,
code
}
await persistScript()
},
getLintErrors: () => buildScriptLintResult(script.code, script.lang)
}
return {
helpers,
getScript: () => structuredClone(script),
cleanup: async () => {
if (workspaceRoot) {
unregisterBenchmarkWorkspace(workspaceRoot)
await rm(workspaceRoot, { recursive: true, force: true })
}
},
workspaceDir: workspaceRoot ?? null
}
}

View File

@@ -0,0 +1,96 @@
import ts from 'typescript'
import type { ScriptLang } from '../../../../../frontend/src/lib/gen/types.gen'
import type { ScriptLintResult } from '../../../../../frontend/src/lib/components/copilot/chat/shared'
export type ScriptPreviewLanguage = ScriptLang | 'bunnative'
const TS_LIKE_LANGUAGES = new Set<ScriptPreviewLanguage>(['bun', 'deno', 'nativets', 'bunnative'])
const JS_LIKE_LANGUAGES = new Set<ScriptPreviewLanguage>(['bun', 'deno', 'nativets', 'bunnative'])
function hasSupportedEntrypoint(code: string): boolean {
return (
/export\s+(async\s+)?function\s+main\s*\(/.test(code) ||
/export\s+(async\s+)?function\s+preprocessor\s*\(/.test(code)
)
}
function compilerOptionsForLanguage(lang: ScriptPreviewLanguage): ts.CompilerOptions | null {
if (!TS_LIKE_LANGUAGES.has(lang)) {
return null
}
return {
target: ts.ScriptTarget.ES2022,
module: ts.ModuleKind.ESNext,
moduleResolution: ts.ModuleResolutionKind.Bundler,
noEmit: true,
allowJs: true,
checkJs: false,
strict: false,
skipLibCheck: true
}
}
function getLineAndColumn(sourceText: string, start: number): { line: number; column: number } {
const prefix = sourceText.slice(0, Math.max(0, start))
const line = prefix.split('\n').length
const lastNewline = prefix.lastIndexOf('\n')
const column = lastNewline === -1 ? prefix.length + 1 : prefix.length - lastNewline
return { line, column }
}
export function buildScriptLintResult(
code: string,
lang: ScriptPreviewLanguage
): ScriptLintResult {
const diagnostics: ScriptLintResult['errors'] = []
const compilerOptions = compilerOptionsForLanguage(lang)
if (compilerOptions) {
const sourceFile = ts.createSourceFile(
'script.ts',
code,
ts.ScriptTarget.ES2022,
true,
JS_LIKE_LANGUAGES.has(lang) ? ts.ScriptKind.TS : ts.ScriptKind.JS
)
const output = ts.transpileModule(code, {
compilerOptions,
fileName: sourceFile.fileName,
reportDiagnostics: true
})
for (const diagnostic of output.diagnostics ?? []) {
const start = diagnostic.start ?? 0
const length = diagnostic.length ?? 1
const { line, column } = getLineAndColumn(code, start)
const message = ts.flattenDiagnosticMessageText(diagnostic.messageText, '\n')
diagnostics.push({
startLineNumber: line,
startColumn: column,
endLineNumber: line,
endColumn: column + Math.max(1, length),
message,
severity: 8
} as ScriptLintResult['errors'][number])
}
}
if (!hasSupportedEntrypoint(code)) {
diagnostics.push({
startLineNumber: 1,
startColumn: 1,
endLineNumber: 1,
endColumn: 1,
message: 'Script must export a main or preprocessor function.',
severity: 8
} as ScriptLintResult['errors'][number])
}
return {
errorCount: diagnostics.length,
warningCount: 0,
errors: diagnostics,
warnings: []
}
}

View File

@@ -0,0 +1,109 @@
import { mkdtemp } from 'fs/promises'
import { tmpdir } from 'os'
import { join } from 'path'
import type { AIProvider, AIProviderModel, ScriptLang } from '$lib/gen/types.gen'
import type { ContextElement } from '../../../../../frontend/src/lib/components/copilot/chat/context'
import {
prepareScriptSystemMessage,
prepareScriptTools,
prepareScriptUserMessage,
type ScriptChatHelpers
} from '../../../../../frontend/src/lib/components/copilot/chat/script/core'
import type { Tool as ProductionTool } from '../../../../../frontend/src/lib/components/copilot/chat/shared'
import { createScriptFileHelpers, type ScriptEvalState } from './fileHelpers'
import { runEval } from '../shared'
import type { ModeRunContext } from '../../../../core/types'
import type { TokenUsage } from '../shared/types'
export interface ScriptEvalResult {
success: boolean
script: ScriptEvalState
error?: string
assistantMessageCount: number
toolCallCount: number
toolsUsed: string[]
tokenUsage: TokenUsage
}
export interface ScriptEvalOptions {
initialScript: ScriptEvalState
model?: string
maxIterations?: number
provider?: AIProvider
workspaceRoot?: string
runContext?: ModeRunContext
}
function resolveModelProvider(
model: string,
provider?: AIProvider
): AIProviderModel {
if (provider) {
return { provider, model }
}
if (model.startsWith('claude')) {
return { provider: 'anthropic', model }
}
return { provider: 'openai', model }
}
export async function runScriptEval(
userPrompt: string,
apiKey: string,
options: ScriptEvalOptions
): Promise<ScriptEvalResult> {
const workspaceRoot =
options.workspaceRoot ?? (await mkdtemp(join(tmpdir(), 'wmill-frontend-script-benchmark-')))
const { helpers, getScript, cleanup } = await createScriptFileHelpers(
options.initialScript,
workspaceRoot
)
try {
const model = options.model ?? 'claude-haiku-4-5-20251001'
const modelProvider = resolveModelProvider(model, options.provider)
const selectedContext: ContextElement[] = []
const systemMessage = prepareScriptSystemMessage(
modelProvider,
options.initialScript.lang,
{}
)
const tools = prepareScriptTools(
modelProvider,
options.initialScript.lang,
selectedContext
) as ProductionTool<ScriptChatHelpers>[]
const userMessage = prepareScriptUserMessage(userPrompt, selectedContext)
const rawResult = await runEval({
userPrompt,
systemMessage,
userMessage,
tools,
helpers,
apiKey,
getOutput: getScript,
onAssistantMessageStart: options.runContext?.onAssistantMessageStart,
onAssistantToken: options.runContext?.onAssistantChunk,
onAssistantMessageEnd: options.runContext?.onAssistantMessageEnd,
options: {
maxIterations: options.maxIterations,
model,
workspace: workspaceRoot,
provider: modelProvider.provider
}
})
return {
script: rawResult.output,
success: rawResult.success,
error: rawResult.error,
assistantMessageCount: rawResult.iterations,
toolCallCount: rawResult.toolCallsCount,
toolsUsed: rawResult.toolsCalled,
tokenUsage: rawResult.tokenUsage
}
} finally {
await cleanup()
}
}

View File

@@ -1,29 +1,19 @@
import OpenAI from 'openai'
import Anthropic from '@anthropic-ai/sdk'
import type {
ChatCompletionMessageParam,
ChatCompletionSystemMessageParam
} from 'openai/resources/chat/completions.mjs'
import type { AIProvider, AIProviderModel } from '$lib/gen/types.gen'
import type { TokenUsage, ToolCallDetail, EvalRunnerOptions } from './types'
import type { Tool } from './baseVariants'
import { runChatLoop, type ChatClients } from '../../chatLoop'
import type { Tool as ProductionTool, ToolCallbacks } from '../../shared'
/**
* Result from a single eval run (before domain-specific evaluation).
*/
export interface RawEvalResult<TOutput> {
success: boolean
output: TOutput
error?: string
tokenUsage: TokenUsage
toolCallsCount: number
toolsCalled: string[]
toolCallDetails: ToolCallDetail[]
iterations: number
messages: ChatCompletionMessageParam[]
}
import type { AIProviderModel } from '$lib/gen/types.gen'
import type { TokenUsage, ToolCallDetail, EvalRunnerOptions, RawEvalResult } from './types'
import { runChatLoop, type ChatClients } from '../../../../../frontend/src/lib/components/copilot/chat/chatLoop'
import type {
Tool as ProductionTool,
ToolCallbacks
} from '../../../../../frontend/src/lib/components/copilot/chat/shared'
import {
createEvalClients,
type FrontendEvalProvider,
resolveEvalModelProvider
} from './providerConfig'
/**
* Parameters for running a base evaluation.
@@ -38,7 +28,7 @@ export interface RunEvalParams<THelpers, TOutput> {
/** Tool definitions for the LLM API (unused — derived from tools) */
toolDefs?: unknown
/** Full tool implementations for execution */
tools: Tool<THelpers>[]
tools: ProductionTool<THelpers>[]
/** Domain-specific helpers for tool execution */
helpers: THelpers
/** API key for the provider */
@@ -47,35 +37,9 @@ export interface RunEvalParams<THelpers, TOutput> {
getOutput: () => TOutput
/** Optional configuration */
options?: EvalRunnerOptions
}
/**
* Creates SDK clients for the given provider.
*/
function createEvalClients(provider: AIProvider, apiKey: string): ChatClients {
if (provider === 'anthropic') {
return {
openai: new OpenAI({ apiKey: 'unused' }),
anthropic: new Anthropic({ apiKey })
}
}
return {
openai: new OpenAI({ apiKey }),
anthropic: new Anthropic({ apiKey: 'unused' })
}
}
/**
* Resolves model string to AIProviderModel.
*/
function resolveModelProvider(
model: string,
provider?: AIProvider
): AIProviderModel {
if (provider) return { provider, model }
if (model.startsWith('claude')) return { provider: 'anthropic', model }
if (model.startsWith('gpt') || model.startsWith('o')) return { provider: 'openai', model }
return { provider: 'openai', model }
onAssistantMessageStart?: () => void
onAssistantToken?: (token: string) => void
onAssistantMessageEnd?: () => void
}
/**
@@ -92,16 +56,23 @@ export async function runEval<THelpers, TOutput>(
helpers,
apiKey,
getOutput,
options
options,
onAssistantMessageStart,
onAssistantToken,
onAssistantMessageEnd
} = params
let shouldEmitMessageStart = true
const model = options?.model ?? 'gpt-4o'
const maxIterations = options?.maxIterations ?? 20
const workspace = options?.workspace ?? 'test-workspace'
const provider = options?.provider
const modelProvider = resolveModelProvider(model, provider)
const clients = createEvalClients(modelProvider.provider, apiKey)
const modelProvider = resolveEvalModelProvider(
model,
provider as FrontendEvalProvider | undefined
) as AIProviderModel
const clients = createEvalClients(modelProvider.provider, apiKey) as ChatClients
const messages: ChatCompletionMessageParam[] = [userMessage]
let toolCallsCount = 0
@@ -128,7 +99,7 @@ export async function runEval<THelpers, TOutput>(
}
return tool.fn(p)
}
})) as ProductionTool<THelpers>[]
}))
// No-op callbacks for eval
const callbacks: ToolCallbacks & {
@@ -137,8 +108,19 @@ export async function runEval<THelpers, TOutput>(
} = {
setToolStatus: () => {},
removeToolStatus: () => {},
onNewToken: () => {},
onMessageEnd: () => {}
onNewToken: (token: string) => {
if (shouldEmitMessageStart) {
onAssistantMessageStart?.()
shouldEmitMessageStart = false
}
onAssistantToken?.(token)
},
onMessageEnd: () => {
if (!shouldEmitMessageStart) {
onAssistantMessageEnd?.()
}
shouldEmitMessageStart = true
}
}
const abortController = new AbortController()
@@ -161,7 +143,7 @@ export async function runEval<THelpers, TOutput>(
return {
success: true,
output: getOutput(),
tokenUsage: { prompt: 0, completion: 0, total: 0 },
tokenUsage: result.tokenUsage,
toolCallsCount,
toolsCalled,
toolCallDetails,

View File

@@ -0,0 +1,3 @@
export type { TokenUsage, ToolCallDetail, EvalRunnerOptions, RawEvalResult } from './types'
export type { RunEvalParams } from './baseEvalRunner'
export { runEval } from './baseEvalRunner'

View File

@@ -0,0 +1,41 @@
import { describe, expect, it } from "bun:test";
import {
buildOpenAICompatibleClientOptions,
resolveEvalModelProvider,
} from "./providerConfig";
describe("buildOpenAICompatibleClientOptions", () => {
it("adds Gemini's OpenAI-compatible base URL and client header", () => {
const options = buildOpenAICompatibleClientOptions("googleai", "gemini-test-key");
expect(options).toMatchObject({
apiKey: "gemini-test-key",
baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
defaultHeaders: {
"x-goog-api-client": "windmill-ai-evals/1.0",
},
});
});
it("keeps the default OpenAI-compatible config for OpenAI", () => {
expect(buildOpenAICompatibleClientOptions("openai", "openai-test-key")).toEqual({
apiKey: "openai-test-key",
});
});
});
describe("resolveEvalModelProvider", () => {
it("infers googleai from Gemini model ids", () => {
expect(resolveEvalModelProvider("gemini-2.5-flash")).toEqual({
provider: "googleai",
model: "gemini-2.5-flash",
});
});
it("preserves an explicit provider", () => {
expect(resolveEvalModelProvider("gemini-2.5-pro", "googleai")).toEqual({
provider: "googleai",
model: "gemini-2.5-pro",
});
});
});

View File

@@ -0,0 +1,71 @@
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";
import type { FrontendEvalModelConfig } from "../../../../core/models";
export type FrontendEvalProvider = FrontendEvalModelConfig["provider"];
export interface EvalClients {
openai: OpenAI;
anthropic: Anthropic;
}
export interface ResolvedEvalModelProvider {
provider: FrontendEvalProvider;
model: string;
}
const GEMINI_OPENAI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/";
const GEMINI_GOOG_API_CLIENT = "windmill-ai-evals/1.0";
export function buildOpenAICompatibleClientOptions(
provider: Exclude<FrontendEvalProvider, "anthropic">,
apiKey: string
): ConstructorParameters<typeof OpenAI>[0] {
if (provider === "googleai") {
return {
apiKey,
baseURL: GEMINI_OPENAI_BASE_URL,
defaultHeaders: {
"x-goog-api-client": GEMINI_GOOG_API_CLIENT,
},
};
}
return { apiKey };
}
export function createEvalClients(
provider: FrontendEvalProvider,
apiKey: string
): EvalClients {
if (provider === "anthropic") {
return {
openai: new OpenAI({ apiKey: "unused" }),
anthropic: new Anthropic({ apiKey }),
};
}
return {
openai: new OpenAI(buildOpenAICompatibleClientOptions(provider, apiKey)),
anthropic: new Anthropic({ apiKey: "unused" }),
};
}
export function resolveEvalModelProvider(
model: string,
provider?: FrontendEvalProvider
): ResolvedEvalModelProvider {
if (provider) {
return { provider, model };
}
if (model.startsWith("claude")) {
return { provider: "anthropic", model };
}
if (model.startsWith("gemini")) {
return { provider: "googleai", model };
}
if (model.startsWith("gpt") || model.startsWith("o")) {
return { provider: "openai", model };
}
return { provider: "openai", model };
}

View File

@@ -0,0 +1,32 @@
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions.mjs'
import type { AIProvider } from '$lib/gen/types.gen'
export interface TokenUsage {
prompt: number
completion: number
total: number
}
export interface ToolCallDetail {
name: string
arguments: Record<string, unknown>
}
export interface EvalRunnerOptions {
maxIterations?: number
model?: string
workspace?: string
provider?: AIProvider
}
export interface RawEvalResult<TOutput> {
success: boolean
output: TOutput
error?: string
tokenUsage: TokenUsage
toolCallsCount: number
toolsCalled: string[]
toolCallDetails: ToolCallDetail[]
iterations: number
messages: ChatCompletionMessageParam[]
}

View File

@@ -0,0 +1,270 @@
import { randomUUID } from 'node:crypto'
import type { CompletedJob, Flow, Script } from '../../../frontend/src/lib/gen'
import type { ScriptLang } from '../../../frontend/src/lib/gen/types.gen'
import { buildScriptLintResult } from './core/script/preview'
const BENCHMARK_TIMESTAMP = '1970-01-01T00:00:00.000Z'
export interface BenchmarkWorkspaceScript {
path: string
summary: string
description?: string
language: Script['language']
schema?: Record<string, unknown>
content: string
}
export interface BenchmarkWorkspaceFlow {
path: string
summary: string
description?: string
schema?: Record<string, unknown>
value: Flow['value']
}
export interface BenchmarkWorkspaceRunnables {
scripts?: BenchmarkWorkspaceScript[]
flows?: BenchmarkWorkspaceFlow[]
}
type BenchmarkCompletedJob = CompletedJob & { type: 'CompletedJob' }
const benchmarkWorkspaces = new Set<string>()
const benchmarkWorkspaceRunnables = new Map<string, BenchmarkWorkspaceRunnables>()
const benchmarkJobs = new Map<string, { workspace: string; job: BenchmarkCompletedJob }>()
export function resetBenchmarkMockBackend(): void {
benchmarkWorkspaces.clear()
benchmarkWorkspaceRunnables.clear()
benchmarkJobs.clear()
}
export function registerBenchmarkWorkspace(workspace: string): void {
benchmarkWorkspaces.add(workspace)
}
export function registerBenchmarkWorkspaceRunnables(
workspace: string,
runnables: BenchmarkWorkspaceRunnables
): void {
benchmarkWorkspaces.add(workspace)
benchmarkWorkspaceRunnables.set(workspace, runnables)
}
export function unregisterBenchmarkWorkspace(workspace: string): void {
benchmarkWorkspaces.delete(workspace)
benchmarkWorkspaceRunnables.delete(workspace)
for (const [jobId, entry] of benchmarkJobs.entries()) {
if (entry.workspace === workspace) {
benchmarkJobs.delete(jobId)
}
}
}
export function unregisterBenchmarkWorkspaceRunnables(workspace: string): void {
unregisterBenchmarkWorkspace(workspace)
}
export function hasBenchmarkWorkspace(workspace: string): boolean {
return benchmarkWorkspaces.has(workspace)
}
export function listBenchmarkScripts(workspace: string): Script[] | null {
const runnables = benchmarkWorkspaceRunnables.get(workspace)
if (!runnables) {
return null
}
return (runnables.scripts ?? []).map(buildBenchmarkScript)
}
export function listBenchmarkFlows(workspace: string): Flow[] | null {
const runnables = benchmarkWorkspaceRunnables.get(workspace)
if (!runnables) {
return null
}
return (runnables.flows ?? []).map(buildBenchmarkFlow)
}
export function getBenchmarkScriptByPath(workspace: string, path: string): Script | null {
const script = benchmarkWorkspaceRunnables
.get(workspace)
?.scripts?.find((entry) => entry.path === path)
return script ? buildBenchmarkScript(script) : null
}
export function getBenchmarkScriptByHash(workspace: string, hash: string): Script | null {
const script = benchmarkWorkspaceRunnables
.get(workspace)
?.scripts?.find((entry) => buildBenchmarkScriptHash(entry.path) === hash)
return script ? buildBenchmarkScript(script) : null
}
export function getBenchmarkFlowByPath(workspace: string, path: string): Flow | null {
const flow = benchmarkWorkspaceRunnables
.get(workspace)
?.flows?.find((entry) => entry.path === path)
return flow ? buildBenchmarkFlow(flow) : null
}
export function createBenchmarkCompletedJob(input: {
workspace: string
jobKind: CompletedJob['job_kind']
success?: boolean
result?: unknown
logs?: string
scriptPath?: string
scriptHash?: string
args?: Record<string, unknown>
}): string {
const jobId = `benchmark-job-${randomUUID()}`
const now = new Date().toISOString()
const job: BenchmarkCompletedJob = {
type: 'CompletedJob',
id: jobId,
workspace_id: input.workspace,
created_by: 'ai-evals',
created_at: now,
started_at: now,
completed_at: now,
duration_ms: 0,
success: input.success ?? true,
script_path: input.scriptPath,
script_hash: input.scriptHash,
args: input.args,
result: input.result,
logs: input.logs,
canceled: false,
job_kind: input.jobKind,
permissioned_as: 'u/ai-evals',
is_flow_step: false,
is_skipped: false,
email: 'ai-evals@local',
visible_to_owner: true,
tag: 'benchmark'
}
benchmarkJobs.set(jobId, { workspace: input.workspace, job })
return jobId
}
export function getBenchmarkCompletedJob(
workspace: string,
jobId: string
): BenchmarkCompletedJob | null {
const entry = benchmarkJobs.get(jobId)
if (!entry || entry.workspace !== workspace) {
return null
}
return structuredClone(entry.job)
}
export function runBenchmarkScriptPreview(input: {
workspace: string
requestBody: {
content?: string
language?: ScriptLang | 'bunnative'
args?: Record<string, unknown>
path?: string
}
}): string {
const content = input.requestBody.content ?? ''
const language = input.requestBody.language ?? 'bun'
const lintResult = buildScriptLintResult(content, language)
const success = lintResult.errorCount === 0
return createBenchmarkCompletedJob({
workspace: input.workspace,
jobKind: 'preview',
success,
scriptPath: input.requestBody.path,
args: input.requestBody.args,
result: success
? {
path: input.requestBody.path,
args: input.requestBody.args ?? {},
validated: true
}
: {
path: input.requestBody.path,
args: input.requestBody.args ?? {},
errorCount: lintResult.errorCount,
errors: lintResult.errors.map((entry) => ({
line: entry.startLineNumber,
message: entry.message
}))
}
})
}
export function runBenchmarkFlowByPath(input: {
workspace: string
path: string
args?: Record<string, unknown>
}): string {
const flow = getBenchmarkFlowByPath(input.workspace, input.path)
return createBenchmarkCompletedJob({
workspace: input.workspace,
jobKind: 'flowpreview',
success: flow !== null,
args: input.args,
result:
flow !== null
? {
path: input.path,
args: input.args ?? {},
mocked: true
}
: {
error: `Flow "${input.path}" not found in benchmark workspace`
},
logs:
flow !== null
? 'Mock benchmark flow run completed successfully.'
: `Flow "${input.path}" not found in benchmark workspace.`
})
}
function buildBenchmarkScriptHash(path: string): string {
return `benchmark:${path}`
}
function buildBenchmarkScript(script: BenchmarkWorkspaceScript): Script {
return {
workspace_id: 'benchmark',
hash: buildBenchmarkScriptHash(script.path),
path: script.path,
parent_hashes: [],
summary: script.summary,
description: script.description ?? '',
content: script.content,
created_by: 'benchmark',
created_at: BENCHMARK_TIMESTAMP,
archived: false,
schema: script.schema ?? {},
deleted: false,
is_template: false,
extra_perms: {},
language: script.language,
kind: 'script',
starred: false,
has_preprocessor: false,
modules: null
}
}
function buildBenchmarkFlow(flow: BenchmarkWorkspaceFlow): Flow {
return {
path: flow.path,
summary: flow.summary,
description: flow.description ?? '',
value: flow.value,
schema: flow.schema ?? {},
edited_by: 'benchmark',
edited_at: BENCHMARK_TIMESTAMP,
archived: false,
extra_perms: {}
} as Flow
}

View File

@@ -0,0 +1,133 @@
export type FrontendBenchmarkProgressSurface = 'flow' | 'app' | 'script'
export type FrontendBenchmarkProgressEvent =
| {
type: 'run-start'
surface: FrontendBenchmarkProgressSurface
totalCases: number
runs: number
concurrency: number
}
| {
type: 'attempt-start'
surface: FrontendBenchmarkProgressSurface
caseId: string
caseNumber: number
totalCases: number
attempt: number
runs: number
}
| {
type: 'attempt-finish'
surface: FrontendBenchmarkProgressSurface
caseId: string
caseNumber: number
totalCases: number
attempt: number
runs: number
passed: boolean
durationMs: number
judgeScore: number | null
error: string | null
}
| {
type: 'assistant-message-start'
surface: FrontendBenchmarkProgressSurface
caseId: string
caseNumber: number
totalCases: number
attempt: number
runs: number
}
| {
type: 'assistant-chunk'
surface: FrontendBenchmarkProgressSurface
caseId: string
caseNumber: number
totalCases: number
attempt: number
runs: number
chunk: string
}
| {
type: 'assistant-message-end'
surface: FrontendBenchmarkProgressSurface
caseId: string
caseNumber: number
totalCases: number
attempt: number
runs: number
}
export const FRONTEND_BENCHMARK_PROGRESS_PREFIX = 'WMILL_FRONTEND_AI_EVAL_PROGRESS '
export function emitFrontendBenchmarkProgress(event: FrontendBenchmarkProgressEvent): void {
process.stderr.write(
`${FRONTEND_BENCHMARK_PROGRESS_PREFIX}${JSON.stringify(event)}\n`
)
}
export function parseFrontendBenchmarkProgressLine(
line: string
): FrontendBenchmarkProgressEvent | null {
if (!line.startsWith(FRONTEND_BENCHMARK_PROGRESS_PREFIX)) {
return null
}
try {
const parsed = JSON.parse(
line.slice(FRONTEND_BENCHMARK_PROGRESS_PREFIX.length)
) as FrontendBenchmarkProgressEvent
return parsed?.type ? parsed : null
} catch {
return null
}
}
export function formatFrontendBenchmarkProgressEvent(
event: FrontendBenchmarkProgressEvent
): string {
switch (event.type) {
case 'run-start':
return `Running ${event.surface}: ${event.totalCases} cases x ${event.runs} run${event.runs === 1 ? '' : 's'}, concurrency ${event.concurrency}`
case 'attempt-start':
return `${formatCasePrefix(event.caseNumber, event.totalCases)} ${event.caseId} attempt ${event.attempt}/${event.runs}...`
case 'attempt-finish': {
const parts = [
`${formatCasePrefix(event.caseNumber, event.totalCases)} ${event.caseId} attempt ${event.attempt}/${event.runs} ${event.passed ? 'pass' : 'fail'}`,
formatDuration(event.durationMs)
]
if (event.judgeScore !== null) {
parts.push(`judge ${formatNumber(event.judgeScore)}`)
}
if (event.error) {
parts.push(truncateSingleLine(event.error, 120))
}
return parts.join(' | ')
}
case 'assistant-message-start':
case 'assistant-chunk':
case 'assistant-message-end':
return ''
}
}
function formatCasePrefix(caseNumber: number, totalCases: number): string {
return `[${caseNumber}/${totalCases}]`
}
function formatDuration(durationMs: number): string {
return `${formatNumber(durationMs / 1000)}s`
}
function formatNumber(value: number): string {
return Number.isInteger(value) ? String(value) : value.toFixed(1)
}
function truncateSingleLine(value: string, maxLength: number): string {
const normalized = value.replace(/\s+/g, ' ').trim()
if (normalized.length <= maxLength) {
return normalized
}
return `${normalized.slice(0, Math.max(0, maxLength - 3))}...`
}

View File

@@ -0,0 +1,218 @@
import { spawn } from 'node:child_process'
import { mkdtemp, readFile, rm } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import path from 'node:path'
import { fileURLToPath } from 'node:url'
import {
formatFrontendBenchmarkProgressEvent,
parseFrontendBenchmarkProgressLine
} from './progress'
import type { BenchmarkRunResult } from '../../core/types'
const REPO_ROOT = fileURLToPath(new URL('../../../', import.meta.url))
const FRONTEND_DIR = path.join(REPO_ROOT, 'frontend')
const FRONTEND_BENCHMARK_TEST = '../ai_evals/adapters/frontend/vitestAdapter.test.ts'
const FRONTEND_BENCHMARK_CONFIG = '../ai_evals/adapters/frontend/vitest.config.ts'
export type FrontendMode = 'flow' | 'app' | 'script'
export async function runFrontendBenchmarkAdapter(input: {
mode: FrontendMode
caseIds: string[]
runs: number
model?: string
verbose?: boolean
backendValidation?: string
}): Promise<BenchmarkRunResult> {
const tempDir = await mkdtemp(path.join(tmpdir(), 'wmill-frontend-benchmark-'))
const outputPath = path.join(tempDir, 'result.json')
try {
await runVitestBenchmark(
path.join(FRONTEND_DIR, 'node_modules', '.bin', 'vitest'),
[
'run',
FRONTEND_BENCHMARK_TEST,
'--project',
'server',
'--config',
FRONTEND_BENCHMARK_CONFIG
],
{
cwd: FRONTEND_DIR,
env: {
...process.env,
BROWSERSLIST_IGNORE_OLD_DATA: '1',
WMILL_FRONTEND_AI_EVAL_OUTPUT_PATH: outputPath,
WMILL_FRONTEND_AI_EVAL_MODE: input.mode,
WMILL_FRONTEND_AI_EVAL_CASE_IDS: JSON.stringify(input.caseIds),
WMILL_FRONTEND_AI_EVAL_RUNS: String(input.runs),
WMILL_FRONTEND_AI_EVAL_MODEL: input.model ?? "",
WMILL_FRONTEND_AI_EVAL_PROGRESS: '1',
WMILL_FRONTEND_AI_EVAL_VERBOSE: input.verbose ? '1' : '0',
WMILL_FRONTEND_AI_EVAL_BACKEND_VALIDATION: input.backendValidation ?? ''
}
}
)
const raw = await readFile(outputPath, 'utf8')
return JSON.parse(raw) as BenchmarkRunResult
} catch (error) {
throw new Error(`Frontend benchmark adapter failed:\n${toErrorMessage(error)}`)
} finally {
await rm(tempDir, { recursive: true, force: true })
}
}
async function runVitestBenchmark(
command: string,
args: string[],
options: {
cwd: string
env: NodeJS.ProcessEnv
}
): Promise<void> {
const child = spawn(command, args, {
cwd: options.cwd,
env: options.env,
stdio: ['ignore', 'pipe', 'pipe']
})
let stdout = ''
let stderr = ''
let stderrLineBuffer = ''
let assistantStreamOpen = false
child.stdout?.setEncoding('utf8')
child.stdout?.on('data', (chunk: string) => {
stdout += chunk
})
child.stderr?.setEncoding('utf8')
child.stderr?.on('data', (chunk: string) => {
stderrLineBuffer += chunk
const { remainder, passthrough, nextAssistantStreamOpen } = drainProgressLines(
stderrLineBuffer,
assistantStreamOpen
)
stderrLineBuffer = remainder
stderr += passthrough
assistantStreamOpen = nextAssistantStreamOpen
})
await new Promise<void>((resolve, reject) => {
child.once('error', reject)
child.once('close', (code) => {
if (stderrLineBuffer.length > 0) {
const {
remainder,
passthrough,
nextAssistantStreamOpen
} = drainProgressLines(`${stderrLineBuffer}\n`, assistantStreamOpen)
stderrLineBuffer = remainder
stderr += passthrough
assistantStreamOpen = nextAssistantStreamOpen
}
if (code === 0) {
if (assistantStreamOpen) {
process.stderr.write('\n')
}
resolve()
return
}
const details = [`vitest exited with code ${code}`, stdout, stderr].filter(Boolean).join('\n')
reject(new Error(details))
})
})
}
function drainProgressLines(buffer: string): {
remainder: string
passthrough: string
nextAssistantStreamOpen: boolean
}
function drainProgressLines(
buffer: string,
initialAssistantStreamOpen: boolean
): {
remainder: string
passthrough: string
nextAssistantStreamOpen: boolean
} {
let remainder = buffer
let passthrough = ''
let assistantStreamOpen = initialAssistantStreamOpen
while (true) {
const newlineIndex = remainder.indexOf('\n')
if (newlineIndex === -1) {
return { remainder, passthrough, nextAssistantStreamOpen: assistantStreamOpen }
}
const line = remainder.slice(0, newlineIndex).replace(/\r$/, '')
remainder = remainder.slice(newlineIndex + 1)
const progressEvent = parseFrontendBenchmarkProgressLine(line)
if (progressEvent) {
if (progressEvent.type === 'assistant-message-start') {
if (assistantStreamOpen) {
process.stderr.write('\n')
}
process.stderr.write(
`${formatCasePrefix(progressEvent.caseNumber, progressEvent.totalCases)} ${progressEvent.caseId} attempt ${progressEvent.attempt}/${progressEvent.runs} assistant:\n`
)
assistantStreamOpen = true
continue
}
if (progressEvent.type === 'assistant-chunk') {
process.stderr.write(progressEvent.chunk)
continue
}
if (progressEvent.type === 'assistant-message-end') {
if (assistantStreamOpen) {
process.stderr.write('\n')
}
assistantStreamOpen = false
continue
}
if (assistantStreamOpen) {
process.stderr.write('\n')
assistantStreamOpen = false
}
process.stderr.write(`${formatFrontendBenchmarkProgressEvent(progressEvent)}\n`)
continue
}
if (shouldSuppressFrontendStderrLine(line)) {
continue
}
passthrough += `${line}\n`
process.stderr.write(`${line}\n`)
}
}
function formatCasePrefix(caseNumber: number, totalCases: number): string {
return `[${caseNumber}/${totalCases}]`
}
function shouldSuppressFrontendStderrLine(line: string): boolean {
return (
line.startsWith('[baseline-browser-mapping] ') ||
line.startsWith('Browserslist: browsers data (caniuse-lite) is ') ||
line.includes('update-browserslist-db@latest') ||
line.includes('update-db#readme')
)
}
function toErrorMessage(error: unknown): string {
if (error instanceof Error) {
return error.message
}
return String(error)
}

View File

@@ -0,0 +1,28 @@
import { fileURLToPath } from 'node:url'
import frontendConfig from '../../../frontend/vite.config.js'
const FRONTEND_VITE_CONFIG_PATH = fileURLToPath(new URL('../../../frontend/vite.config.js', import.meta.url))
const FRONTEND_TEST_SETUP_PATH = fileURLToPath(
new URL('../../../frontend/src/lib/test-setup.ts', import.meta.url)
)
const ADAPTER_TEST_PATH = fileURLToPath(new URL('./vitestAdapter.test.ts', import.meta.url))
const config = {
...frontendConfig,
test: {
...frontendConfig.test,
projects: [
{
extends: FRONTEND_VITE_CONFIG_PATH,
test: {
name: 'server',
environment: 'node',
include: [ADAPTER_TEST_PATH],
setupFiles: [FRONTEND_TEST_SETUP_PATH]
}
}
]
}
}
export default config

View File

@@ -0,0 +1,165 @@
import { expect, it, vi } from 'vitest'
// @ts-ignore - Node.js fs/promises
import { mkdir, writeFile } from 'fs/promises'
// @ts-ignore - Node.js path
import { dirname, resolve } from 'path'
vi.mock('monaco-editor', () => ({
editor: {},
languages: {},
KeyCode: {},
Uri: {
parse: (value: string) => ({ toString: () => value })
},
MarkerSeverity: {
Error: 8,
Warning: 4,
Info: 2,
Hint: 1
}
}))
vi.mock('@codingame/monaco-vscode-standalone-typescript-language-features', () => ({
getTypeScriptWorker: async () => async () => ({}),
typescriptVersion: 'test'
}))
vi.mock('@codingame/monaco-vscode-languages-service-override', () => ({
default: () => ({})
}))
vi.mock('$lib/components/vscode', () => ({}))
vi.mock('$lib/gen', async () => {
const actual = await vi.importActual<any>('$lib/gen')
const {
getBenchmarkCompletedJob,
getBenchmarkFlowByPath,
getBenchmarkScriptByHash,
getBenchmarkScriptByPath,
hasBenchmarkWorkspace,
listBenchmarkFlows,
listBenchmarkScripts,
runBenchmarkFlowByPath,
runBenchmarkScriptPreview
} = await import('./mockBackend')
function wrapService<T extends object>(target: T, overrides: Record<string, unknown>): T {
return new Proxy(target, {
get(source, property, receiver) {
if (typeof property === 'string' && property in overrides) {
return overrides[property]
}
return Reflect.get(source, property, receiver)
}
})
}
return {
...actual,
ScriptService: wrapService(actual.ScriptService, {
listScripts: async (data: { workspace: string }) =>
hasBenchmarkWorkspace(data.workspace)
? (listBenchmarkScripts(data.workspace) ?? [])
: actual.ScriptService.listScripts(data),
getScriptByPath: async (data: { workspace: string; path: string }) => {
if (hasBenchmarkWorkspace(data.workspace)) {
const script = getBenchmarkScriptByPath(data.workspace, data.path)
if (!script) {
throw new Error(`Script "${data.path}" not found in benchmark workspace`)
}
return script
}
return actual.ScriptService.getScriptByPath(data)
},
getScriptByHash: async (data: { workspace: string; hash: string }) => {
if (hasBenchmarkWorkspace(data.workspace)) {
const script = getBenchmarkScriptByHash(data.workspace, data.hash)
if (!script) {
throw new Error(`Script hash "${data.hash}" not found in benchmark workspace`)
}
return script
}
return actual.ScriptService.getScriptByHash(data)
}
}),
FlowService: wrapService(actual.FlowService, {
listFlows: async (data: { workspace: string }) =>
hasBenchmarkWorkspace(data.workspace)
? (listBenchmarkFlows(data.workspace) ?? [])
: actual.FlowService.listFlows(data),
getFlowByPath: async (data: { workspace: string; path: string }) => {
if (hasBenchmarkWorkspace(data.workspace)) {
const flow = getBenchmarkFlowByPath(data.workspace, data.path)
if (!flow) {
throw new Error(`Flow "${data.path}" not found in benchmark workspace`)
}
return flow
}
return actual.FlowService.getFlowByPath(data)
}
}),
JobService: wrapService(actual.JobService, {
runScriptPreview: async (data: {
workspace: string
requestBody?: {
content?: string
language?: string
args?: Record<string, unknown>
path?: string
}
}) =>
hasBenchmarkWorkspace(data.workspace)
? runBenchmarkScriptPreview({
workspace: data.workspace,
requestBody: data.requestBody ?? {}
})
: actual.JobService.runScriptPreview(data),
runFlowByPath: async (data: {
workspace: string
path: string
requestBody?: Record<string, unknown>
}) =>
hasBenchmarkWorkspace(data.workspace)
? runBenchmarkFlowByPath({
workspace: data.workspace,
path: data.path,
args: data.requestBody
})
: actual.JobService.runFlowByPath(data),
getJob: async (data: { workspace: string; id: string }) => {
if (hasBenchmarkWorkspace(data.workspace)) {
const job = getBenchmarkCompletedJob(data.workspace, data.id)
if (!job) {
throw new Error(`Job "${data.id}" not found in benchmark workspace`)
}
return job
}
return actual.JobService.getJob(data)
}
})
}
})
const benchmarkOutputPath = process.env.WMILL_FRONTEND_AI_EVAL_OUTPUT_PATH
const benchmarkIt = benchmarkOutputPath ? it : it.skip
benchmarkIt(
'runs the frontend benchmark adapter from environment input',
async () => {
const { resetBenchmarkMockBackend } = await import('./mockBackend')
resetBenchmarkMockBackend()
const { runFrontendBenchmarkFromEnv } = await import('./benchmarkRunner')
try {
const payload = await runFrontendBenchmarkFromEnv()
const absoluteOutputPath = resolve(benchmarkOutputPath!)
await mkdir(dirname(absoluteOutputPath), { recursive: true })
await writeFile(absoluteOutputPath, JSON.stringify(payload, null, 2) + '\n', 'utf8')
expect(payload.cases.length).toBeGreaterThan(0)
} finally {
resetBenchmarkMockBackend()
}
},
600_000
)

313
ai_evals/bun.lock Normal file
View File

@@ -0,0 +1,313 @@
{
"lockfileVersion": 1,
"configVersion": 1,
"workspaces": {
"": {
"name": "windmill-ai-evals",
"dependencies": {
"@anthropic-ai/claude-agent-sdk": "^0.2.25",
"@anthropic-ai/sdk": "^0.39.0",
"commander": "^14.0.3",
"openai": "^6.9.1",
"yaml": "^2.8.3",
},
"devDependencies": {
"@types/bun": "latest",
"typescript": "^5.0.0",
},
},
},
"packages": {
"@anthropic-ai/claude-agent-sdk": ["@anthropic-ai/claude-agent-sdk@0.2.87", "", { "dependencies": { "@anthropic-ai/sdk": "^0.74.0", "@modelcontextprotocol/sdk": "^1.27.1" }, "optionalDependencies": { "@img/sharp-darwin-arm64": "^0.34.2", "@img/sharp-darwin-x64": "^0.34.2", "@img/sharp-linux-arm": "^0.34.2", "@img/sharp-linux-arm64": "^0.34.2", "@img/sharp-linux-x64": "^0.34.2", "@img/sharp-linuxmusl-arm64": "^0.34.2", "@img/sharp-linuxmusl-x64": "^0.34.2", "@img/sharp-win32-arm64": "^0.34.2", "@img/sharp-win32-x64": "^0.34.2" }, "peerDependencies": { "zod": "^4.0.0" } }, "sha512-WWmgBPxPhBOvNT0ujI8vPTI2lK+w5YEkEZ/y1mH0EDkK/0kBnxVJNhCtG5vnueiAViwLoUOFn66pbkDiivijdA=="],
"@anthropic-ai/sdk": ["@anthropic-ai/sdk@0.39.0", "", { "dependencies": { "@types/node": "^18.11.18", "@types/node-fetch": "^2.6.4", "abort-controller": "^3.0.0", "agentkeepalive": "^4.2.1", "form-data-encoder": "1.7.2", "formdata-node": "^4.3.2", "node-fetch": "^2.6.7" } }, "sha512-eMyDIPRZbt1CCLErRCi3exlAvNkBtRe+kW5vvJyef93PmNr/clstYgHhtvmkxN82nlKgzyGPCyGxrm0JQ1ZIdg=="],
"@babel/runtime": ["@babel/runtime@7.29.2", "", {}, "sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g=="],
"@hono/node-server": ["@hono/node-server@1.19.12", "", { "peerDependencies": { "hono": "^4" } }, "sha512-txsUW4SQ1iilgE0l9/e9VQWmELXifEFvmdA1j6WFh/aFPj99hIntrSsq/if0UWyGVkmrRPKA1wCeP+UCr1B9Uw=="],
"@img/sharp-darwin-arm64": ["@img/sharp-darwin-arm64@0.34.5", "", { "optionalDependencies": { "@img/sharp-libvips-darwin-arm64": "1.2.4" }, "os": "darwin", "cpu": "arm64" }, "sha512-imtQ3WMJXbMY4fxb/Ndp6HBTNVtWCUI0WdobyheGf5+ad6xX8VIDO8u2xE4qc/fr08CKG/7dDseFtn6M6g/r3w=="],
"@img/sharp-darwin-x64": ["@img/sharp-darwin-x64@0.34.5", "", { "optionalDependencies": { "@img/sharp-libvips-darwin-x64": "1.2.4" }, "os": "darwin", "cpu": "x64" }, "sha512-YNEFAF/4KQ/PeW0N+r+aVVsoIY0/qxxikF2SWdp+NRkmMB7y9LBZAVqQ4yhGCm/H3H270OSykqmQMKLBhBJDEw=="],
"@img/sharp-libvips-darwin-arm64": ["@img/sharp-libvips-darwin-arm64@1.2.4", "", { "os": "darwin", "cpu": "arm64" }, "sha512-zqjjo7RatFfFoP0MkQ51jfuFZBnVE2pRiaydKJ1G/rHZvnsrHAOcQALIi9sA5co5xenQdTugCvtb1cuf78Vf4g=="],
"@img/sharp-libvips-darwin-x64": ["@img/sharp-libvips-darwin-x64@1.2.4", "", { "os": "darwin", "cpu": "x64" }, "sha512-1IOd5xfVhlGwX+zXv2N93k0yMONvUlANylbJw1eTah8K/Jtpi15KC+WSiaX/nBmbm2HxRM1gZ0nSdjSsrZbGKg=="],
"@img/sharp-libvips-linux-arm": ["@img/sharp-libvips-linux-arm@1.2.4", "", { "os": "linux", "cpu": "arm" }, "sha512-bFI7xcKFELdiNCVov8e44Ia4u2byA+l3XtsAj+Q8tfCwO6BQ8iDojYdvoPMqsKDkuoOo+X6HZA0s0q11ANMQ8A=="],
"@img/sharp-libvips-linux-arm64": ["@img/sharp-libvips-linux-arm64@1.2.4", "", { "os": "linux", "cpu": "arm64" }, "sha512-excjX8DfsIcJ10x1Kzr4RcWe1edC9PquDRRPx3YVCvQv+U5p7Yin2s32ftzikXojb1PIFc/9Mt28/y+iRklkrw=="],
"@img/sharp-libvips-linux-x64": ["@img/sharp-libvips-linux-x64@1.2.4", "", { "os": "linux", "cpu": "x64" }, "sha512-tJxiiLsmHc9Ax1bz3oaOYBURTXGIRDODBqhveVHonrHJ9/+k89qbLl0bcJns+e4t4rvaNBxaEZsFtSfAdquPrw=="],
"@img/sharp-libvips-linuxmusl-arm64": ["@img/sharp-libvips-linuxmusl-arm64@1.2.4", "", { "os": "linux", "cpu": "arm64" }, "sha512-FVQHuwx1IIuNow9QAbYUzJ+En8KcVm9Lk5+uGUQJHaZmMECZmOlix9HnH7n1TRkXMS0pGxIJokIVB9SuqZGGXw=="],
"@img/sharp-libvips-linuxmusl-x64": ["@img/sharp-libvips-linuxmusl-x64@1.2.4", "", { "os": "linux", "cpu": "x64" }, "sha512-+LpyBk7L44ZIXwz/VYfglaX/okxezESc6UxDSoyo2Ks6Jxc4Y7sGjpgU9s4PMgqgjj1gZCylTieNamqA1MF7Dg=="],
"@img/sharp-linux-arm": ["@img/sharp-linux-arm@0.34.5", "", { "optionalDependencies": { "@img/sharp-libvips-linux-arm": "1.2.4" }, "os": "linux", "cpu": "arm" }, "sha512-9dLqsvwtg1uuXBGZKsxem9595+ujv0sJ6Vi8wcTANSFpwV/GONat5eCkzQo/1O6zRIkh0m/8+5BjrRr7jDUSZw=="],
"@img/sharp-linux-arm64": ["@img/sharp-linux-arm64@0.34.5", "", { "optionalDependencies": { "@img/sharp-libvips-linux-arm64": "1.2.4" }, "os": "linux", "cpu": "arm64" }, "sha512-bKQzaJRY/bkPOXyKx5EVup7qkaojECG6NLYswgktOZjaXecSAeCWiZwwiFf3/Y+O1HrauiE3FVsGxFg8c24rZg=="],
"@img/sharp-linux-x64": ["@img/sharp-linux-x64@0.34.5", "", { "optionalDependencies": { "@img/sharp-libvips-linux-x64": "1.2.4" }, "os": "linux", "cpu": "x64" }, "sha512-MEzd8HPKxVxVenwAa+JRPwEC7QFjoPWuS5NZnBt6B3pu7EG2Ge0id1oLHZpPJdn3OQK+BQDiw9zStiHBTJQQQQ=="],
"@img/sharp-linuxmusl-arm64": ["@img/sharp-linuxmusl-arm64@0.34.5", "", { "optionalDependencies": { "@img/sharp-libvips-linuxmusl-arm64": "1.2.4" }, "os": "linux", "cpu": "arm64" }, "sha512-fprJR6GtRsMt6Kyfq44IsChVZeGN97gTD331weR1ex1c1rypDEABN6Tm2xa1wE6lYb5DdEnk03NZPqA7Id21yg=="],
"@img/sharp-linuxmusl-x64": ["@img/sharp-linuxmusl-x64@0.34.5", "", { "optionalDependencies": { "@img/sharp-libvips-linuxmusl-x64": "1.2.4" }, "os": "linux", "cpu": "x64" }, "sha512-Jg8wNT1MUzIvhBFxViqrEhWDGzqymo3sV7z7ZsaWbZNDLXRJZoRGrjulp60YYtV4wfY8VIKcWidjojlLcWrd8Q=="],
"@img/sharp-win32-arm64": ["@img/sharp-win32-arm64@0.34.5", "", { "os": "win32", "cpu": "arm64" }, "sha512-WQ3AgWCWYSb2yt+IG8mnC6Jdk9Whs7O0gxphblsLvdhSpSTtmu69ZG1Gkb6NuvxsNACwiPV6cNSZNzt0KPsw7g=="],
"@img/sharp-win32-x64": ["@img/sharp-win32-x64@0.34.5", "", { "os": "win32", "cpu": "x64" }, "sha512-+29YMsqY2/9eFEiW93eqWnuLcWcufowXewwSNIT6UwZdUUCrM3oFjMWH/Z6/TMmb4hlFenmfAVbpWeup2jryCw=="],
"@modelcontextprotocol/sdk": ["@modelcontextprotocol/sdk@1.29.0", "", { "dependencies": { "@hono/node-server": "^1.19.9", "ajv": "^8.17.1", "ajv-formats": "^3.0.1", "content-type": "^1.0.5", "cors": "^2.8.5", "cross-spawn": "^7.0.5", "eventsource": "^3.0.2", "eventsource-parser": "^3.0.0", "express": "^5.2.1", "express-rate-limit": "^8.2.1", "hono": "^4.11.4", "jose": "^6.1.3", "json-schema-typed": "^8.0.2", "pkce-challenge": "^5.0.0", "raw-body": "^3.0.0", "zod": "^3.25 || ^4.0", "zod-to-json-schema": "^3.25.1" }, "peerDependencies": { "@cfworker/json-schema": "^4.1.1" }, "optionalPeers": ["@cfworker/json-schema"] }, "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ=="],
"@types/bun": ["@types/bun@1.3.11", "", { "dependencies": { "bun-types": "1.3.11" } }, "sha512-5vPne5QvtpjGpsGYXiFyycfpDF2ECyPcTSsFBMa0fraoxiQyMJ3SmuQIGhzPg2WJuWxVBoxWJ2kClYTcw/4fAg=="],
"@types/node": ["@types/node@18.19.130", "", { "dependencies": { "undici-types": "~5.26.4" } }, "sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg=="],
"@types/node-fetch": ["@types/node-fetch@2.6.13", "", { "dependencies": { "@types/node": "*", "form-data": "^4.0.4" } }, "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw=="],
"abort-controller": ["abort-controller@3.0.0", "", { "dependencies": { "event-target-shim": "^5.0.0" } }, "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg=="],
"accepts": ["accepts@2.0.0", "", { "dependencies": { "mime-types": "^3.0.0", "negotiator": "^1.0.0" } }, "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng=="],
"agentkeepalive": ["agentkeepalive@4.6.0", "", { "dependencies": { "humanize-ms": "^1.2.1" } }, "sha512-kja8j7PjmncONqaTsB8fQ+wE2mSU2DJ9D4XKoJ5PFWIdRMa6SLSN1ff4mOr4jCbfRSsxR4keIiySJU0N9T5hIQ=="],
"ajv": ["ajv@8.18.0", "", { "dependencies": { "fast-deep-equal": "^3.1.3", "fast-uri": "^3.0.1", "json-schema-traverse": "^1.0.0", "require-from-string": "^2.0.2" } }, "sha512-PlXPeEWMXMZ7sPYOHqmDyCJzcfNrUr3fGNKtezX14ykXOEIvyK81d+qydx89KY5O71FKMPaQ2vBfBFI5NHR63A=="],
"ajv-formats": ["ajv-formats@3.0.1", "", { "dependencies": { "ajv": "^8.0.0" } }, "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ=="],
"asynckit": ["asynckit@0.4.0", "", {}, "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q=="],
"body-parser": ["body-parser@2.2.2", "", { "dependencies": { "bytes": "^3.1.2", "content-type": "^1.0.5", "debug": "^4.4.3", "http-errors": "^2.0.0", "iconv-lite": "^0.7.0", "on-finished": "^2.4.1", "qs": "^6.14.1", "raw-body": "^3.0.1", "type-is": "^2.0.1" } }, "sha512-oP5VkATKlNwcgvxi0vM0p/D3n2C3EReYVX+DNYs5TjZFn/oQt2j+4sVJtSMr18pdRr8wjTcBl6LoV+FUwzPmNA=="],
"bun-types": ["bun-types@1.3.11", "", { "dependencies": { "@types/node": "*" } }, "sha512-1KGPpoxQWl9f6wcZh57LvrPIInQMn2TQ7jsgxqpRzg+l0QPOFvJVH7HmvHo/AiPgwXy+/Thf6Ov3EdVn1vOabg=="],
"bytes": ["bytes@3.1.2", "", {}, "sha512-/Nf7TyzTx6S3yRJObOAV7956r8cr2+Oj8AC5dt8wSP3BQAoeX58NoHyCU8P8zGkNXStjTSi6fzO6F0pBdcYbEg=="],
"call-bind-apply-helpers": ["call-bind-apply-helpers@1.0.2", "", { "dependencies": { "es-errors": "^1.3.0", "function-bind": "^1.1.2" } }, "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ=="],
"call-bound": ["call-bound@1.0.4", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.2", "get-intrinsic": "^1.3.0" } }, "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg=="],
"combined-stream": ["combined-stream@1.0.8", "", { "dependencies": { "delayed-stream": "~1.0.0" } }, "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg=="],
"commander": ["commander@14.0.3", "", {}, "sha512-H+y0Jo/T1RZ9qPP4Eh1pkcQcLRglraJaSLoyOtHxu6AapkjWVCy2Sit1QQ4x3Dng8qDlSsZEet7g5Pq06MvTgw=="],
"content-disposition": ["content-disposition@1.0.1", "", {}, "sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q=="],
"content-type": ["content-type@1.0.5", "", {}, "sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA=="],
"cookie": ["cookie@0.7.2", "", {}, "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w=="],
"cookie-signature": ["cookie-signature@1.2.2", "", {}, "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg=="],
"cors": ["cors@2.8.6", "", { "dependencies": { "object-assign": "^4", "vary": "^1" } }, "sha512-tJtZBBHA6vjIAaF6EnIaq6laBBP9aq/Y3ouVJjEfoHbRBcHBAHYcMh/w8LDrk2PvIMMq8gmopa5D4V8RmbrxGw=="],
"cross-spawn": ["cross-spawn@7.0.6", "", { "dependencies": { "path-key": "^3.1.0", "shebang-command": "^2.0.0", "which": "^2.0.1" } }, "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA=="],
"debug": ["debug@4.4.3", "", { "dependencies": { "ms": "^2.1.3" } }, "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA=="],
"delayed-stream": ["delayed-stream@1.0.0", "", {}, "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ=="],
"depd": ["depd@2.0.0", "", {}, "sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw=="],
"dunder-proto": ["dunder-proto@1.0.1", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.1", "es-errors": "^1.3.0", "gopd": "^1.2.0" } }, "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A=="],
"ee-first": ["ee-first@1.1.1", "", {}, "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow=="],
"encodeurl": ["encodeurl@2.0.0", "", {}, "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg=="],
"es-define-property": ["es-define-property@1.0.1", "", {}, "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g=="],
"es-errors": ["es-errors@1.3.0", "", {}, "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw=="],
"es-object-atoms": ["es-object-atoms@1.1.1", "", { "dependencies": { "es-errors": "^1.3.0" } }, "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA=="],
"es-set-tostringtag": ["es-set-tostringtag@2.1.0", "", { "dependencies": { "es-errors": "^1.3.0", "get-intrinsic": "^1.2.6", "has-tostringtag": "^1.0.2", "hasown": "^2.0.2" } }, "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA=="],
"escape-html": ["escape-html@1.0.3", "", {}, "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow=="],
"etag": ["etag@1.8.1", "", {}, "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg=="],
"event-target-shim": ["event-target-shim@5.0.1", "", {}, "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ=="],
"eventsource": ["eventsource@3.0.7", "", { "dependencies": { "eventsource-parser": "^3.0.1" } }, "sha512-CRT1WTyuQoD771GW56XEZFQ/ZoSfWid1alKGDYMmkt2yl8UXrVR4pspqWNEcqKvVIzg6PAltWjxcSSPrboA4iA=="],
"eventsource-parser": ["eventsource-parser@3.0.6", "", {}, "sha512-Vo1ab+QXPzZ4tCa8SwIHJFaSzy4R6SHf7BY79rFBDf0idraZWAkYrDjDj8uWaSm3S2TK+hJ7/t1CEmZ7jXw+pg=="],
"express": ["express@5.2.1", "", { "dependencies": { "accepts": "^2.0.0", "body-parser": "^2.2.1", "content-disposition": "^1.0.0", "content-type": "^1.0.5", "cookie": "^0.7.1", "cookie-signature": "^1.2.1", "debug": "^4.4.0", "depd": "^2.0.0", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "etag": "^1.8.1", "finalhandler": "^2.1.0", "fresh": "^2.0.0", "http-errors": "^2.0.0", "merge-descriptors": "^2.0.0", "mime-types": "^3.0.0", "on-finished": "^2.4.1", "once": "^1.4.0", "parseurl": "^1.3.3", "proxy-addr": "^2.0.7", "qs": "^6.14.0", "range-parser": "^1.2.1", "router": "^2.2.0", "send": "^1.1.0", "serve-static": "^2.2.0", "statuses": "^2.0.1", "type-is": "^2.0.1", "vary": "^1.1.2" } }, "sha512-hIS4idWWai69NezIdRt2xFVofaF4j+6INOpJlVOLDO8zXGpUVEVzIYk12UUi2JzjEzWL3IOAxcTubgz9Po0yXw=="],
"express-rate-limit": ["express-rate-limit@8.3.2", "", { "dependencies": { "ip-address": "10.1.0" }, "peerDependencies": { "express": ">= 4.11" } }, "sha512-77VmFeJkO0/rvimEDuUC5H30oqUC4EyOhyGccfqoLebB0oiEYfM7nwPrsDsBL1gsTpwfzX8SFy2MT3TDyRq+bg=="],
"fast-deep-equal": ["fast-deep-equal@3.1.3", "", {}, "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q=="],
"fast-uri": ["fast-uri@3.1.0", "", {}, "sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA=="],
"finalhandler": ["finalhandler@2.1.1", "", { "dependencies": { "debug": "^4.4.0", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "on-finished": "^2.4.1", "parseurl": "^1.3.3", "statuses": "^2.0.1" } }, "sha512-S8KoZgRZN+a5rNwqTxlZZePjT/4cnm0ROV70LedRHZ0p8u9fRID0hJUZQpkKLzro8LfmC8sx23bY6tVNxv8pQA=="],
"form-data": ["form-data@4.0.5", "", { "dependencies": { "asynckit": "^0.4.0", "combined-stream": "^1.0.8", "es-set-tostringtag": "^2.1.0", "hasown": "^2.0.2", "mime-types": "^2.1.12" } }, "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w=="],
"form-data-encoder": ["form-data-encoder@1.7.2", "", {}, "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A=="],
"formdata-node": ["formdata-node@4.4.1", "", { "dependencies": { "node-domexception": "1.0.0", "web-streams-polyfill": "4.0.0-beta.3" } }, "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ=="],
"forwarded": ["forwarded@0.2.0", "", {}, "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow=="],
"fresh": ["fresh@2.0.0", "", {}, "sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A=="],
"function-bind": ["function-bind@1.1.2", "", {}, "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA=="],
"get-intrinsic": ["get-intrinsic@1.3.0", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.2", "es-define-property": "^1.0.1", "es-errors": "^1.3.0", "es-object-atoms": "^1.1.1", "function-bind": "^1.1.2", "get-proto": "^1.0.1", "gopd": "^1.2.0", "has-symbols": "^1.1.0", "hasown": "^2.0.2", "math-intrinsics": "^1.1.0" } }, "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ=="],
"get-proto": ["get-proto@1.0.1", "", { "dependencies": { "dunder-proto": "^1.0.1", "es-object-atoms": "^1.0.0" } }, "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g=="],
"gopd": ["gopd@1.2.0", "", {}, "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg=="],
"has-symbols": ["has-symbols@1.1.0", "", {}, "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ=="],
"has-tostringtag": ["has-tostringtag@1.0.2", "", { "dependencies": { "has-symbols": "^1.0.3" } }, "sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw=="],
"hasown": ["hasown@2.0.2", "", { "dependencies": { "function-bind": "^1.1.2" } }, "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ=="],
"hono": ["hono@4.12.9", "", {}, "sha512-wy3T8Zm2bsEvxKZM5w21VdHDDcwVS1yUFFY6i8UobSsKfFceT7TOwhbhfKsDyx7tYQlmRM5FLpIuYvNFyjctiA=="],
"http-errors": ["http-errors@2.0.1", "", { "dependencies": { "depd": "~2.0.0", "inherits": "~2.0.4", "setprototypeof": "~1.2.0", "statuses": "~2.0.2", "toidentifier": "~1.0.1" } }, "sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ=="],
"humanize-ms": ["humanize-ms@1.2.1", "", { "dependencies": { "ms": "^2.0.0" } }, "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ=="],
"iconv-lite": ["iconv-lite@0.7.2", "", { "dependencies": { "safer-buffer": ">= 2.1.2 < 3.0.0" } }, "sha512-im9DjEDQ55s9fL4EYzOAv0yMqmMBSZp6G0VvFyTMPKWxiSBHUj9NW/qqLmXUwXrrM7AvqSlTCfvqRb0cM8yYqw=="],
"inherits": ["inherits@2.0.4", "", {}, "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ=="],
"ip-address": ["ip-address@10.1.0", "", {}, "sha512-XXADHxXmvT9+CRxhXg56LJovE+bmWnEWB78LB83VZTprKTmaC5QfruXocxzTZ2Kl0DNwKuBdlIhjL8LeY8Sf8Q=="],
"ipaddr.js": ["ipaddr.js@1.9.1", "", {}, "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g=="],
"is-promise": ["is-promise@4.0.0", "", {}, "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ=="],
"isexe": ["isexe@2.0.0", "", {}, "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw=="],
"jose": ["jose@6.2.2", "", {}, "sha512-d7kPDd34KO/YnzaDOlikGpOurfF0ByC2sEV4cANCtdqLlTfBlw2p14O/5d/zv40gJPbIQxfES3nSx1/oYNyuZQ=="],
"json-schema-to-ts": ["json-schema-to-ts@3.1.1", "", { "dependencies": { "@babel/runtime": "^7.18.3", "ts-algebra": "^2.0.0" } }, "sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g=="],
"json-schema-traverse": ["json-schema-traverse@1.0.0", "", {}, "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug=="],
"json-schema-typed": ["json-schema-typed@8.0.2", "", {}, "sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA=="],
"math-intrinsics": ["math-intrinsics@1.1.0", "", {}, "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g=="],
"media-typer": ["media-typer@1.1.0", "", {}, "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw=="],
"merge-descriptors": ["merge-descriptors@2.0.0", "", {}, "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g=="],
"mime-db": ["mime-db@1.54.0", "", {}, "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ=="],
"mime-types": ["mime-types@3.0.2", "", { "dependencies": { "mime-db": "^1.54.0" } }, "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A=="],
"ms": ["ms@2.1.3", "", {}, "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="],
"negotiator": ["negotiator@1.0.0", "", {}, "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg=="],
"node-domexception": ["node-domexception@1.0.0", "", {}, "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ=="],
"node-fetch": ["node-fetch@2.7.0", "", { "dependencies": { "whatwg-url": "^5.0.0" }, "peerDependencies": { "encoding": "^0.1.0" }, "optionalPeers": ["encoding"] }, "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A=="],
"object-assign": ["object-assign@4.1.1", "", {}, "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg=="],
"object-inspect": ["object-inspect@1.13.4", "", {}, "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew=="],
"on-finished": ["on-finished@2.4.1", "", { "dependencies": { "ee-first": "1.1.1" } }, "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg=="],
"once": ["once@1.4.0", "", { "dependencies": { "wrappy": "1" } }, "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w=="],
"openai": ["openai@6.34.0", "", { "peerDependencies": { "ws": "^8.18.0", "zod": "^3.25 || ^4.0" }, "optionalPeers": ["ws", "zod"], "bin": { "openai": "bin/cli" } }, "sha512-yEr2jdGf4tVFYG6ohmr3pF6VJuveP0EA/sS8TBx+4Eq5NT10alu5zg2dmxMXMgqpihRDQlFGpRt2XwsGj+Fyxw=="],
"parseurl": ["parseurl@1.3.3", "", {}, "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ=="],
"path-key": ["path-key@3.1.1", "", {}, "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q=="],
"path-to-regexp": ["path-to-regexp@8.4.1", "", {}, "sha512-fvU78fIjZ+SBM9YwCknCvKOUKkLVqtWDVctl0s7xIqfmfb38t2TT4ZU2gHm+Z8xGwgW+QWEU3oQSAzIbo89Ggw=="],
"pkce-challenge": ["pkce-challenge@5.0.1", "", {}, "sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ=="],
"proxy-addr": ["proxy-addr@2.0.7", "", { "dependencies": { "forwarded": "0.2.0", "ipaddr.js": "1.9.1" } }, "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg=="],
"qs": ["qs@6.15.0", "", { "dependencies": { "side-channel": "^1.1.0" } }, "sha512-mAZTtNCeetKMH+pSjrb76NAM8V9a05I9aBZOHztWy/UqcJdQYNsf59vrRKWnojAT9Y+GbIvoTBC++CPHqpDBhQ=="],
"range-parser": ["range-parser@1.2.1", "", {}, "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg=="],
"raw-body": ["raw-body@3.0.2", "", { "dependencies": { "bytes": "~3.1.2", "http-errors": "~2.0.1", "iconv-lite": "~0.7.0", "unpipe": "~1.0.0" } }, "sha512-K5zQjDllxWkf7Z5xJdV0/B0WTNqx6vxG70zJE4N0kBs4LovmEYWJzQGxC9bS9RAKu3bgM40lrd5zoLJ12MQ5BA=="],
"require-from-string": ["require-from-string@2.0.2", "", {}, "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw=="],
"router": ["router@2.2.0", "", { "dependencies": { "debug": "^4.4.0", "depd": "^2.0.0", "is-promise": "^4.0.0", "parseurl": "^1.3.3", "path-to-regexp": "^8.0.0" } }, "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ=="],
"safer-buffer": ["safer-buffer@2.1.2", "", {}, "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg=="],
"send": ["send@1.2.1", "", { "dependencies": { "debug": "^4.4.3", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "etag": "^1.8.1", "fresh": "^2.0.0", "http-errors": "^2.0.1", "mime-types": "^3.0.2", "ms": "^2.1.3", "on-finished": "^2.4.1", "range-parser": "^1.2.1", "statuses": "^2.0.2" } }, "sha512-1gnZf7DFcoIcajTjTwjwuDjzuz4PPcY2StKPlsGAQ1+YH20IRVrBaXSWmdjowTJ6u8Rc01PoYOGHXfP1mYcZNQ=="],
"serve-static": ["serve-static@2.2.1", "", { "dependencies": { "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "parseurl": "^1.3.3", "send": "^1.2.0" } }, "sha512-xRXBn0pPqQTVQiC8wyQrKs2MOlX24zQ0POGaj0kultvoOCstBQM5yvOhAVSUwOMjQtTvsPWoNCHfPGwaaQJhTw=="],
"setprototypeof": ["setprototypeof@1.2.0", "", {}, "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw=="],
"shebang-command": ["shebang-command@2.0.0", "", { "dependencies": { "shebang-regex": "^3.0.0" } }, "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA=="],
"shebang-regex": ["shebang-regex@3.0.0", "", {}, "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A=="],
"side-channel": ["side-channel@1.1.0", "", { "dependencies": { "es-errors": "^1.3.0", "object-inspect": "^1.13.3", "side-channel-list": "^1.0.0", "side-channel-map": "^1.0.1", "side-channel-weakmap": "^1.0.2" } }, "sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw=="],
"side-channel-list": ["side-channel-list@1.0.0", "", { "dependencies": { "es-errors": "^1.3.0", "object-inspect": "^1.13.3" } }, "sha512-FCLHtRD/gnpCiCHEiJLOwdmFP+wzCmDEkc9y7NsYxeF4u7Btsn1ZuwgwJGxImImHicJArLP4R0yX4c2KCrMrTA=="],
"side-channel-map": ["side-channel-map@1.0.1", "", { "dependencies": { "call-bound": "^1.0.2", "es-errors": "^1.3.0", "get-intrinsic": "^1.2.5", "object-inspect": "^1.13.3" } }, "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA=="],
"side-channel-weakmap": ["side-channel-weakmap@1.0.2", "", { "dependencies": { "call-bound": "^1.0.2", "es-errors": "^1.3.0", "get-intrinsic": "^1.2.5", "object-inspect": "^1.13.3", "side-channel-map": "^1.0.1" } }, "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A=="],
"statuses": ["statuses@2.0.2", "", {}, "sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw=="],
"toidentifier": ["toidentifier@1.0.1", "", {}, "sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA=="],
"tr46": ["tr46@0.0.3", "", {}, "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw=="],
"ts-algebra": ["ts-algebra@2.0.0", "", {}, "sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw=="],
"type-is": ["type-is@2.0.1", "", { "dependencies": { "content-type": "^1.0.5", "media-typer": "^1.1.0", "mime-types": "^3.0.0" } }, "sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw=="],
"typescript": ["typescript@5.9.3", "", { "bin": { "tsc": "bin/tsc", "tsserver": "bin/tsserver" } }, "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw=="],
"undici-types": ["undici-types@5.26.5", "", {}, "sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA=="],
"unpipe": ["unpipe@1.0.0", "", {}, "sha512-pjy2bYhSsufwWlKwPc+l3cN7+wuJlK6uz0YdJEOlQDbl6jo/YlPi4mb8agUkVC8BF7V8NuzeyPNqRksA3hztKQ=="],
"vary": ["vary@1.1.2", "", {}, "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg=="],
"web-streams-polyfill": ["web-streams-polyfill@4.0.0-beta.3", "", {}, "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug=="],
"webidl-conversions": ["webidl-conversions@3.0.1", "", {}, "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ=="],
"whatwg-url": ["whatwg-url@5.0.0", "", { "dependencies": { "tr46": "~0.0.3", "webidl-conversions": "^3.0.0" } }, "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw=="],
"which": ["which@2.0.2", "", { "dependencies": { "isexe": "^2.0.0" }, "bin": { "node-which": "./bin/node-which" } }, "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA=="],
"wrappy": ["wrappy@1.0.2", "", {}, "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ=="],
"yaml": ["yaml@2.8.3", "", { "bin": { "yaml": "bin.mjs" } }, "sha512-AvbaCLOO2Otw/lW5bmh9d/WEdcDFdQp2Z2ZUH3pX9U2ihyUY0nvLv7J6TrWowklRGPYbB/IuIMfYgxaCPg5Bpg=="],
"zod": ["zod@4.3.6", "", {}, "sha512-rftlrkhHZOcjDwkGlnUtZZkvaPHCsDATp4pGpuOOMDaTdDDXF91wuVDJoWoPsKX/3YPQ5fHuF3STjcYyKr+Qhg=="],
"zod-to-json-schema": ["zod-to-json-schema@3.25.2", "", { "peerDependencies": { "zod": "^3.25.28 || ^4" } }, "sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA=="],
"@anthropic-ai/claude-agent-sdk/@anthropic-ai/sdk": ["@anthropic-ai/sdk@0.74.0", "", { "dependencies": { "json-schema-to-ts": "^3.1.1" }, "peerDependencies": { "zod": "^3.25.0 || ^4.0.0" }, "optionalPeers": ["zod"], "bin": { "anthropic-ai-sdk": "bin/cli" } }, "sha512-srbJV7JKsc5cQ6eVuFzjZO7UR3xEPJqPamHFIe29bs38Ij2IripoAhC0S5NslNbaFUYqBKypmmpzMTpqfHEUDw=="],
"@types/node-fetch/@types/node": ["@types/node@25.5.0", "", { "dependencies": { "undici-types": "~7.18.0" } }, "sha512-jp2P3tQMSxWugkCUKLRPVUpGaL5MVFwF8RDuSRztfwgN1wmqJeMSbKlnEtQqU8UrhTmzEmZdu2I6v2dpp7XIxw=="],
"bun-types/@types/node": ["@types/node@25.5.0", "", { "dependencies": { "undici-types": "~7.18.0" } }, "sha512-jp2P3tQMSxWugkCUKLRPVUpGaL5MVFwF8RDuSRztfwgN1wmqJeMSbKlnEtQqU8UrhTmzEmZdu2I6v2dpp7XIxw=="],
"form-data/mime-types": ["mime-types@2.1.35", "", { "dependencies": { "mime-db": "1.52.0" } }, "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw=="],
"@types/node-fetch/@types/node/undici-types": ["undici-types@7.18.2", "", {}, "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w=="],
"bun-types/@types/node/undici-types": ["undici-types@7.18.2", "", {}, "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w=="],
"form-data/mime-types/mime-db": ["mime-db@1.52.0", "", {}, "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg=="],
}
}

93
ai_evals/cases/app.yaml Normal file
View File

@@ -0,0 +1,93 @@
- id: app-test1-counter-create
prompt: |-
Create a simple counter app with increment and decrement buttons.
judgeChecklist:
- shows the current count in the UI
- includes an increment button
- includes a decrement button
- clicking the buttons updates the count correctly
- id: app-test2-counter-reset
prompt: |-
Add a reset button that sets the counter back to 0
initial: ai_evals/fixtures/frontend/app/initial/test1_counter_app
judgeChecklist:
- adds a reset control to the existing counter app
- clicking reset sets the count back to 0
- keeps the existing increment and decrement behavior working
- id: app-test3-shopping-cart-quantity
prompt: |-
Add a quantity selector (+ and - buttons) to each cart item so users can adjust quantities without removing and re-adding items
initial: ai_evals/fixtures/frontend/app/initial/shopping_cart
judgeChecklist:
- each cart item has visible plus and minus quantity controls
- users can increase quantity without re-adding the product
- users can decrease quantity from the cart UI
- cart totals stay in sync with quantity changes
- id: app-test4-shopping-cart-discount
prompt: |-
Add a discount code input field in the cart.
When the code "SAVE10" is entered, apply a 10% discount to the total
initial: ai_evals/fixtures/frontend/app/initial/shopping_cart
judgeChecklist:
- adds a discount code input to the cart
- recognizes the code SAVE10
- applies a 10 percent discount to the displayed total
- keeps the rest of the cart behavior intact
- id: app-test5-file-manager-search
prompt: |-
Add a search bar in the toolbar that filters files and folders by name as the user types
initial: ai_evals/fixtures/frontend/app/initial/file_manager
judgeChecklist:
- adds a search input in the toolbar
- filters files and folders by name as the user types
- updates the visible file list from the search query
- keeps the rest of the file manager usable
- id: app-test6-file-manager-inline-rename
prompt: |-
Let users rename files and folders directly from the file list without leaving the page.
initial: ai_evals/fixtures/frontend/app/initial/file_manager
judgeChecklist:
- adds a visible rename action or inline edit mode in the file list
- lets users edit an item's name directly from the list
- saves the renamed item through the app's existing rename behavior
- refreshes the displayed name after a successful rename
- id: app-test7-file-manager-select-all
prompt: |-
Add a "Select All" checkbox in the file list header and individual checkboxes for each file.
Add a "Delete Selected" button that appears when items are selected
initial: ai_evals/fixtures/frontend/app/initial/file_manager
judgeChecklist:
- adds a select-all control in the file list header
- adds per-item selection controls
- shows a delete-selected action only when there is a selection
- deleting selected items updates the visible list
- id: app-test8-inventory-tracker-create
prompt: |-
Create an inventory tracker app for a small store.
Users should be able to add items with a name, sku, quantity, and price, search items by name or sku, and delete items.
The inventory should persist between sessions.
judgeChecklist:
- includes a form to add inventory items with name, sku, quantity, and price
- shows a list or table of saved inventory items
- supports searching or filtering by name or sku
- lets users delete existing inventory items
- persists the inventory data appropriately for a raw Windmill app
- id: app-test9-recipe-book-create
prompt: |-
Create a recipe book app where users can add recipes with a name, ingredients list, and instructions.
Include a search bar to filter recipes by name and the ability to delete recipes.
Recipes should persist between sessions.
judgeChecklist:
- includes a form to add recipes with name, ingredients, and instructions
- shows saved recipes in the app
- supports searching recipes by name
- lets users delete recipes
- persists recipes appropriately for a raw Windmill app

66
ai_evals/cases/cli.yaml Normal file
View File

@@ -0,0 +1,66 @@
- id: bun-hello-script
prompt: |-
Create a Windmill Bun script at `f/evals/hello.ts`.
It should take a `name` input and return a greeting object like `{ greeting: "Hello, Alice!" }`.
expected: ai_evals/fixtures/cli/expected/bun-hello-script
judgeChecklist:
- creates the requested Bun script at f/evals/hello.ts
- takes a name input
- returns an object containing the greeting
- id: bun-hello-flow
prompt: |-
Create a Windmill flow at `f/evals/hello__flow`.
It should take a `name` input and return a greeting object like `{ greeting: "Hello, Alice!" }`.
Put the step code in `hello.ts`.
expected: ai_evals/fixtures/cli/expected/bun-hello-flow
judgeChecklist:
- creates the requested flow folder with flow.yaml and hello.ts
- wires the name input into the flow step
- returns the greeting object
- id: python-add-numbers-script
prompt: |-
Add a Windmill Python script at `f/evals/add_numbers.py`.
It should take `a` and `b` as inputs and return `{ "total": a + b }`.
expected: ai_evals/fixtures/cli/expected/python-add-numbers-script
judgeChecklist:
- creates the requested Python script at f/evals/add_numbers.py
- takes `a` and `b` as inputs
- returns an object with total equal to a plus b
- id: bun-hello-script-uppercase
prompt: |-
Update `f/evals/hello.ts` so it accepts an optional `uppercase` boolean.
Keep returning `{ greeting: ... }`, but when `uppercase` is true the greeting should be uppercased before returning it.
initial: ai_evals/fixtures/cli/initial/bun-hello-script-uppercase
expected: ai_evals/fixtures/cli/expected/bun-hello-script-uppercase
judgeChecklist:
- updates the existing hello.ts file rather than creating a new script
- accepts an optional uppercase boolean input
- keeps returning an object with greeting
- uppercases the greeting when uppercase is true
- id: bun-hello-flow-punctuation
prompt: |-
Update the existing flow in `f/evals/hello__flow` so it also accepts an optional `punctuation` input.
The greeting should use that punctuation and default to `!` when it is missing.
initial: ai_evals/fixtures/cli/initial/bun-hello-flow-punctuation
expected: ai_evals/fixtures/cli/expected/bun-hello-flow-punctuation
judgeChecklist:
- updates the existing hello flow instead of creating a new one
- adds an optional punctuation input to the flow
- updates the step code so the returned greeting uses punctuation
- defaults punctuation to an exclamation mark when omitted
- id: flow-reuse-existing-script
prompt: |-
There is already a reusable greeting script at `f/lib/format_greeting.ts`.
Create a flow at `f/evals/reuse_greeting__flow` that takes a `name` input and reuses that existing script instead of duplicating the logic inline.
initial: ai_evals/fixtures/cli/initial/flow-reuse-existing-script
expected: ai_evals/fixtures/cli/expected/flow-reuse-existing-script
judgeChecklist:
- creates the requested flow at f/evals/reuse_greeting__flow
- reuses the existing script from f/lib by path
- does not duplicate the greeting logic in a new inline script
- wires the name input into the reused script

335
ai_evals/cases/flow.yaml Normal file
View File

@@ -0,0 +1,335 @@
- id: flow-test0-sum-two-numbers
prompt: |-
Create a flow that takes two numbers, `a` and `b`, and returns their sum.
Keep it simple and use a single step named `sum_numbers`.
expected: ai_evals/fixtures/frontend/flow/expected/test0_sum_two_numbers.json
runtime:
backendPreview:
args:
a: 4
b: 5
judgeChecklist:
- "the flow takes `a` and `b` as inputs"
- "the main step is named `sum_numbers`"
- the flow returns the sum of the two numbers
- id: flow-test1-reuse-existing-script
prompt: |-
I need a flow that adds two numbers.
If there is already a script in the workspace that does that, reuse it instead of rewriting the logic.
The flow should take `a` and `b` as inputs and use a single step named `sum_numbers`.
initial: ai_evals/fixtures/frontend/flow/initial/test1_reuse_existing_script_initial.json
expected: ai_evals/fixtures/frontend/flow/expected/test1_reuse_existing_script.json
runtime:
backendPreview:
args:
a: 2
b: 3
judgeChecklist:
- "the flow takes `a` and `b` as inputs"
- "the main step is named `sum_numbers`"
- the flow reuses the existing workspace script instead of rewriting the addition logic
- id: flow-test2-call-existing-subflow
prompt: |-
Create a parent flow that adds two numbers by reusing an existing flow in the workspace if one already exists.
The parent flow should take `a` and `b` as inputs and delegate the calculation instead of inlining it.
Use a single step named `call_add_numbers`.
initial: ai_evals/fixtures/frontend/flow/initial/test2_call_existing_subflow_initial.json
expected: ai_evals/fixtures/frontend/flow/expected/test2_call_existing_subflow.json
runtime:
backendPreview:
args:
a: 7
b: 8
judgeChecklist:
- "the parent flow takes `a` and `b` as inputs"
- "the main step is named `call_add_numbers`"
- the parent flow delegates to an existing workspace subflow instead of inlining the addition logic
- id: flow-test3-branchone-routing
prompt: |-
Create a flow that routes incoming support requests based on the customer's tier.
The input should contain a string field named `tier`.
Free, pro, and enterprise requests should go to different queues, and unknown tiers should fall back to a default queue.
Name the main routing step `route_by_tier`.
expected: ai_evals/fixtures/frontend/flow/expected/test3_branchone_routing.json
judgeChecklist:
- "the input schema includes a string field named `tier`"
- "the main routing step is named `route_by_tier`"
- free requests go to a free queue
- pro requests go to a pro queue
- enterprise requests go to an enterprise queue
- unknown tiers fall back to a default queue
- id: flow-test4-order-processing-loop
prompt: |-
Build an order-processing flow.
The input should include an order with:
- an `items` array containing `name`, `price`, and `quantity`
- `customer_email`
- `shipping_address`
The flow should:
- validate that every item has a positive price and quantity
- calculate the order total with 8% tax
- check inventory for each item using placeholder availability data
- create a shipment if everything is in stock, otherwise create a backorder
- send a confirmation using placeholder email logic
- return a final order summary with the status
validate:
schemaAnyOf:
- requiredPaths:
- order
- order.items
- order.customer_email
- order.shipping_address
- requiredPaths:
- items
- customer_email
- shipping_address
resolveResultsRefs: true
judgeChecklist:
- the flow validates that every item has a positive price and quantity
- the flow calculates the order total with 8% tax
- the flow checks inventory for each item using placeholder availability data
- the flow creates a shipment if everything is in stock, otherwise a backorder
- the flow sends a confirmation using placeholder email logic
- the flow returns a final order summary with the resulting status
- id: flow-test5-parallel-data-pipeline
prompt: |-
Create a data-processing flow for three external data sources.
It should:
- load a small placeholder configuration listing the three sources
- fetch placeholder records from each source
- clean and validate each source's records
- combine everything into one dataset
- compute an overall quality score
- store the result differently depending on the score:
- 90 or above goes to the primary database
- 70 to 89 goes to a secondary database with a warning
- below 70 goes to quarantine and triggers an alert
- return a processing report with total records, quality score, and destination
judgeChecklist:
- the flow loads a placeholder configuration listing three external sources
- the flow fetches placeholder records from each source
- the flow cleans and validates each source's records
- the flow combines everything into one dataset
- the flow computes an overall quality score
- scores of 90 or above go to the primary database
- scores from 70 to 89 go to a secondary database with a warning
- scores below 70 go to quarantine and trigger an alert
- the final report includes total records, quality score, and destination
- id: flow-test6-ai-agent-tools
prompt: |-
Create a customer support flow.
The input should include `customer_id` and `query_text`.
The flow should load the customer's profile and order history, then use an AI assistant to help with the request.
The assistant should be able to:
- look up orders
- check refund eligibility
- search FAQs
- open a support ticket when needed
After that, log the interaction and return the assistant's response.
judgeChecklist:
- "the input schema includes `customer_id` and `query_text`"
- the flow loads the customer's profile and order history
- the flow uses an AI assistant step
- the assistant can look up orders
- the assistant can check refund eligibility
- the assistant can search FAQs
- the assistant can open a support ticket
- the flow logs the interaction
- the final output returns the assistant response
- id: flow-test7-simple-modification
prompt: |-
Update this flow so it validates processed data before saving it.
After `process_data`, add a `validate_data` step that checks the data array is not empty.
If the array is empty, the flow should surface the message `No data to save` and prevent saving.
If validation passes, let the save continue normally.
Update `save_results` so it uses the validation outcome instead of bypassing it.
initial: ai_evals/fixtures/frontend/flow/initial/test5_initial.json
validate:
topLevelStepIds:
- fetch_data
- process_data
- validate_data
topLevelStepOrder:
- fetch_data
- process_data
- validate_data
topLevelStepTypes:
- id: fetch_data
type: rawscript
- id: process_data
type: rawscript
- id: validate_data
type: rawscript
judgeChecklist:
- the updated flow keeps the original fetch and process steps intact
- "a `validate_data` step is added after `process_data`"
- "`validate_data` checks that the processed data array is not empty"
- "when processed data is empty, the flow surfaces the message `No data to save` and does not save results"
- "`save_results` uses the validation outcome instead of reading `results.process_data` directly"
- "exact field names or wrapper object shape for the validation result are not important"
- id: flow-test8-branching-in-loop
prompt: |-
Update the order-processing logic inside `loop_orders` so different order types are handled differently.
For `express`, mark the order as priority and use a shipping cost of $15.99.
For `standard`, use a shipping cost of $5.99.
For `pickup`, mark it as no shipping required with a cost of $0.
Keep the existing processing as a fallback for unknown order types.
Each path should return the orderId, shipping cost, and shipping type.
initial: ai_evals/fixtures/frontend/flow/initial/test6_initial.json
judgeChecklist:
- "the existing `loop_orders` flow still handles per-order processing"
- exact branching topology is not required as long as `loop_orders` handles the order types correctly
- express orders are marked as priority and use a shipping cost of 15.99
- standard orders use a shipping cost of 5.99
- pickup orders use a shipping cost of 0 and are treated as no shipping required
- unknown order types still follow a fallback path
- "each processed order returns `orderId`, `shippingCost`, and `shippingType`"
- id: flow-test9-parallel-refactor
prompt: |-
Refactor this flow so the enrichment work no longer runs one step at a time.
`enrich_price`, `enrich_inventory`, and `enrich_reviews` should run independently.
Each one should return a fallback value if it fails.
Update `combine_data` so it merges the enrichment results and sets a `hasFallbacks` flag when any fallback was used.
Keep `get_item` as the first step and `return_result` as the last step.
initial: ai_evals/fixtures/frontend/flow/initial/test7_initial.json
validate:
topLevelStepIds:
- get_item
- combine_data
- return_result
topLevelStepOrder:
- get_item
- combine_data
- return_result
topLevelStepTypeCountsAtLeast:
- type: branchall
count: 1
topLevelStepTypes:
- id: get_item
type: rawscript
- id: combine_data
type: rawscript
- id: return_result
type: rawscript
moduleRules:
- id: enrich_price
- id: enrich_inventory
- id: enrich_reviews
judgeChecklist:
- "the updated flow keeps `get_item` as the first step"
- "the updated flow keeps `return_result` as the last step"
- "`enrich_price`, `enrich_inventory`, and `enrich_reviews` run independently rather than sequentially"
- each enrichment path returns a fallback value if it fails
- "`combine_data` merges the enrichment results"
- "`combine_data` sets `hasFallbacks` when any fallback was used"
- id: flow-test10-while-loop-counter
prompt: |-
Create a flow that keeps incrementing a counter until it reaches a target value.
The input should include a number field named `target`.
Use a top-level loop step named `count_until_target`.
Inside it, use a single step named `increment_counter` that increments the current counter.
The loop should stop once the counter reaches `target`.
After the loop, add a top-level step named `return_final_counter` that returns the last counter value.
validate:
exactTopLevelStepIds:
- count_until_target
- return_final_counter
topLevelStepOrder:
- count_until_target
- return_final_counter
topLevelStepTypes:
- id: count_until_target
type: whileloopflow
- id: return_final_counter
type: rawscript
moduleRules:
- id: count_until_target
hasStopAfterIf: true
hasStopAfterAllItersIf: false
exactImmediateChildStepIds:
- increment_counter
immediateChildStepTypes:
- id: increment_counter
type: rawscript
moduleFieldRules:
- id: count_until_target
path: stop_after_if.expr
equals: result >= flow_input.target
judgeChecklist:
- "the input schema includes a number field named `target`"
- "the top-level while loop step is named `count_until_target`"
- "`count_until_target` contains a single increment step named `increment_counter`"
- "`count_until_target` uses module-level `stop_after_if` to stop when the counter reaches `target`"
- "`increment_counter` uses `flow_input.iter.value` or an equivalent loop-state expression and falls back to `0` on the first iteration"
- "`return_final_counter` returns the final counter value"
- id: flow-test11-preprocessor-and-failure-handler
prompt: |-
Create an event-processing flow for a string payload.
Before the main processing runs, trim the payload and reject empty strings.
The main step should be named `process_event` and return a simple success object.
If anything fails, return a compact error object with the error message and the failing step id.
expected: ai_evals/fixtures/frontend/flow/expected/test11_preprocessor_failure.json
validate:
requireSpecialModules:
- preprocessor_module
- failure_module
judgeChecklist:
- the flow trims the payload before the main processing runs
- the flow rejects empty payload strings
- "the main step is named `process_event`"
- "`process_event` returns a simple success object"
- failures return a compact error object with the error message and failing step id
- id: flow-test12-approval-step
prompt: |-
Create a purchase approval flow.
The input should include `requester_email` and `amount`.
Add an approval step named `request_approval` that pauses the flow and asks the approver for a comment.
One approval should be enough to continue.
After approval, add a final step named `finalize_purchase` that returns an approved status object.
validate:
topLevelStepIds:
- request_approval
- finalize_purchase
topLevelStepOrder:
- request_approval
- finalize_purchase
topLevelStepTypes:
- id: finalize_purchase
type: rawscript
schemaRequiredPaths:
- requester_email
- amount
requireSuspendSteps:
- id: request_approval
requiredEvents: 1
resumeRequiredStringFieldAnyOf:
- comment
- approver_comment
judgeChecklist:
- "the flow includes an approval step named `request_approval`"
- "`request_approval` pauses the flow and asks the approver for a comment"
- one approval is enough to continue
- "the flow includes a final step named `finalize_purchase`"
- "`finalize_purchase` returns an approved status object after approval"

View File

@@ -0,0 +1,11 @@
- id: script-test1-greet-user
prompt: |-
Update the current Bun script so it takes the existing `name` input and returns a plain greeting string like `Hello, Alice!`.
Do not wrap the result in an object or array.
Keep it simple and do not add external dependencies.
initial: ai_evals/fixtures/frontend/script/initial/test1_empty_bun.json
expected: ai_evals/fixtures/frontend/script/expected/test1_greet_user.json
judgeChecklist:
- uses the existing `name` input
- returns a plain greeting string
- does not wrap the result in an object or array

314
ai_evals/cli/index.ts Normal file
View File

@@ -0,0 +1,314 @@
#!/usr/bin/env bun
import { Command, InvalidArgumentError } from "commander";
import { loadCases, loadSelectedCases } from "../core/cases";
import {
BACKEND_VALIDATION_MODES,
parseBackendValidationMode,
} from "../core/backendValidation";
import {
EVAL_MODELS,
type EvalModelSpec,
formatRunModelLabel,
getCliEvalModel,
getEvalModelHelpText,
resolveEvalModel,
} from "../core/models";
import {
appendHistoryRecord,
buildRunResult,
formatRunSummary,
resolveRunOutputPath,
writeRunArtifacts,
writeRunResult,
} from "../core/results";
import { runSuite } from "../core/runSuite";
import { EVAL_MODES, type EvalMode } from "../core/types";
import { DEFAULT_JUDGE_MODEL } from "../core/judge";
import { createCliModeRunner } from "../modes/cli";
import { runFrontendBenchmarkAdapter } from "../adapters/frontend/runtime";
async function main() {
const program = new Command()
.name("bun run cli --")
.description("Run AI eval cases against the current production prompts and guidance")
.showHelpAfterError()
.showSuggestionAfterError()
.addHelpText(
"after",
[
"",
"Examples:",
" bun run cli -- models",
" bun run cli -- cases",
" bun run cli -- cases flow",
" bun run cli -- run flow",
" bun run cli -- run flow --model 4o",
" bun run cli -- run flow --models haiku,opus,4o",
" bun run cli -- run flow flow-test0-sum-two-numbers --verbose",
" bun run cli -- run flow --record",
" bun run cli -- run flow --backend-validation preview",
" bun run cli -- run flow flow-test5-simple-modification --runs 3",
" bun run cli -- run cli bun-hello-script",
"",
"Models:",
getEvalModelHelpText(),
].join("\n")
);
program
.command("models")
.description("List available model aliases")
.action(() => {
handleModels();
});
program
.command("cases")
.description("List available cases")
.argument("[mode]", "cli, flow, script, or app", parseOptionalMode)
.action(async (mode?: EvalMode) => {
await handleCases(mode);
});
program
.command("run")
.description("Run one benchmark mode")
.argument("<mode>", "cli, flow, script, or app", parseMode)
.argument("[caseIds...]", "specific case ids to run")
.option("--runs <n>", "number of attempts per case", parsePositiveInteger, 1)
.option("--output <path>", "write the result JSON to this path")
.option("--model <name>", `model alias (${EVAL_MODELS.map((entry) => entry.id).join(", ")})`)
.option("--models <names>", "comma-separated model aliases to run sequentially")
.option("--verbose", "stream assistant output during frontend runs")
.option("--record", "append a compact summary line to ai_evals/history/<mode>.jsonl")
.option(
"--backend-validation <mode>",
`backend smoke validation (${BACKEND_VALIDATION_MODES.join(", ")})`
)
.action(
async (
mode: EvalMode,
caseIds: string[],
options: {
runs: number;
output?: string;
model?: string;
models?: string;
verbose?: boolean;
record?: boolean;
backendValidation?: string;
}
) => {
await handleRun({
mode,
caseIds,
runs: options.runs,
outputPath: options.output,
model: options.model,
models: options.models,
verbose: options.verbose ?? false,
record: options.record ?? false,
backendValidation: options.backendValidation,
});
}
);
await program.parseAsync(process.argv);
}
async function handleCases(mode?: EvalMode) {
const modes = mode ? [mode] : [...EVAL_MODES];
for (const entry of modes) {
const cases = await loadCases(entry);
process.stdout.write(`${entry} (${cases.length})\n`);
for (const evalCase of cases) {
process.stdout.write(`- ${evalCase.id}\n`);
}
process.stdout.write("\n");
}
}
function handleModels() {
process.stdout.write("Available models\n");
for (const model of EVAL_MODELS) {
const supports = [
...(model.frontend ? ["flow", "script", "app"] : []),
...(model.cli ? ["cli"] : []),
];
const aliases = [model.id, ...model.aliases.filter((alias) => alias !== model.id)];
process.stdout.write(`- ${model.id}: ${model.label}\n`);
process.stdout.write(` aliases: ${aliases.join(", ")}\n`);
process.stdout.write(` modes: ${supports.join(", ")}\n`);
}
process.stdout.write(`\nJudge model: ${DEFAULT_JUDGE_MODEL}\n`);
}
async function handleRun(input: {
mode: EvalMode;
caseIds: string[];
runs: number;
outputPath?: string;
model?: string;
models?: string;
verbose: boolean;
record: boolean;
backendValidation?: string;
}) {
if (input.record && input.caseIds.length > 0) {
throw new Error("--record only supports full-suite runs; omit case ids to record history");
}
if (input.model && input.models) {
throw new Error("Use either --model or --models, not both");
}
const selectedCases = await loadSelectedCases(input.mode, input.caseIds);
const models = resolveRequestedModels(input.mode, input.model, input.models);
const backendValidation = parseBackendValidationMode(
input.backendValidation ?? process.env.WMILL_AI_EVAL_BACKEND_VALIDATION
);
if (input.outputPath && models.length > 1) {
throw new Error("--output only supports a single model run");
}
if (backendValidation !== "off" && input.mode !== "flow" && input.mode !== "script") {
throw new Error("--backend-validation currently supports only flow and script modes");
}
const summaries: Array<{ label: string; passRate: number; averageDurationMs: number }> = [];
for (const [index, model] of models.entries()) {
const runModel = formatRunModelLabel(input.mode, model);
if (models.length > 1) {
process.stdout.write(
`${index > 0 ? "\n" : ""}=== ${input.mode} ${model.id} (${runModel}) ===\n`
);
}
process.stderr.write(`Starting ${input.mode} benchmark...\n`);
const result =
input.mode === "cli"
? await runCliBenchmark(selectedCases, input.runs, getCliEvalModel(model), runModel)
: await runFrontendBenchmarkAdapter({
mode: input.mode,
caseIds: input.caseIds,
runs: input.runs,
model: model.id,
verbose: input.verbose,
backendValidation,
});
const resolvedOutputPath =
models.length === 1
? resolveRunOutputPath(input.mode, input.outputPath)
: resolveRunOutputPath(input.mode);
const artifactsPath = await writeRunArtifacts(result, resolvedOutputPath);
const resultPath = await writeRunResult(result, resolvedOutputPath);
const historyPath = input.record ? await appendHistoryRecord(result) : null;
process.stdout.write(`${formatRunSummary(result)}\n`);
process.stdout.write(`Saved: ${resultPath}\n`);
if (artifactsPath) {
process.stdout.write(`Artifacts: ${artifactsPath}\n`);
}
if (historyPath) {
process.stdout.write(`Recorded: ${historyPath}\n`);
}
summaries.push({
label: `${model.id} (${runModel})`,
passRate: result.passRate,
averageDurationMs: result.averageDurationMs,
});
}
if (summaries.length > 1) {
process.stdout.write("\nModel summary\n");
for (const summary of summaries) {
process.stdout.write(
`- ${summary.label}: ${formatPercent(summary.passRate)} | ${Math.round(summary.averageDurationMs)}ms\n`
);
}
}
}
async function runCliBenchmark(
cases: Awaited<ReturnType<typeof loadSelectedCases>>,
runs: number,
model: ReturnType<typeof getCliEvalModel>,
runModel: string
) {
const caseResults = await runSuite({
modeRunner: createCliModeRunner(model),
cases,
runs,
runModel,
judgeModel: DEFAULT_JUDGE_MODEL,
});
return buildRunResult({
mode: "cli",
runs,
runModel,
judgeModel: DEFAULT_JUDGE_MODEL,
caseResults,
});
}
function parseMode(value: string): EvalMode {
if (EVAL_MODES.includes(value as EvalMode)) {
return value as EvalMode;
}
throw new InvalidArgumentError(`mode must be one of: ${EVAL_MODES.join(", ")}`);
}
function parseOptionalMode(value: string | undefined): EvalMode | undefined {
return value ? parseMode(value) : undefined;
}
function parsePositiveInteger(value: string): number {
const parsed = Number(value);
if (!Number.isInteger(parsed) || parsed <= 0) {
throw new InvalidArgumentError("must be a positive integer");
}
return parsed;
}
function resolveRequestedModels(
mode: EvalMode,
singleModel?: string,
multipleModels?: string
): EvalModelSpec[] {
if (!multipleModels) {
return [resolveEvalModel(mode, singleModel)];
}
const aliases = multipleModels
.split(",")
.map((value) => value.trim())
.filter(Boolean);
if (aliases.length === 0) {
throw new Error("--models requires at least one model alias");
}
const seen = new Set<string>();
const models: EvalModelSpec[] = [];
for (const alias of aliases) {
const model = resolveEvalModel(mode, alias);
if (seen.has(model.id)) {
continue;
}
seen.add(model.id);
models.push(model);
}
return models;
}
function formatPercent(value: number): string {
return `${(value * 100).toFixed(1)}%`;
}
void main().catch((error) => {
const message = error instanceof Error ? error.message : String(error);
process.stderr.write(`${message}\n`);
process.exit(1);
});

View File

@@ -0,0 +1,36 @@
import { describe, expect, it } from "bun:test";
import {
parseBackendValidationMode,
resolveBackendValidationSettings,
} from "./backendValidation";
describe("parseBackendValidationMode", () => {
it("defaults to off", () => {
expect(parseBackendValidationMode(undefined)).toBe("off");
expect(parseBackendValidationMode("0")).toBe("off");
expect(parseBackendValidationMode("false")).toBe("off");
});
it("accepts preview aliases", () => {
expect(parseBackendValidationMode("preview")).toBe("preview");
expect(parseBackendValidationMode("1")).toBe("preview");
expect(parseBackendValidationMode("true")).toBe("preview");
});
it("rejects unknown modes", () => {
expect(() => parseBackendValidationMode("maybe")).toThrow(
"Unsupported backend validation mode: maybe"
);
});
});
describe("resolveBackendValidationSettings", () => {
it("rejects unsupported eval modes", () => {
expect(() =>
resolveBackendValidationSettings({
evalMode: "app",
requestedMode: "preview",
})
).toThrow('Backend validation mode "preview" is only supported for flow and script evals');
});
});

View File

@@ -0,0 +1,104 @@
import type { EvalMode } from "./types";
export const BACKEND_VALIDATION_MODES = ["off", "preview"] as const;
export type BackendValidationMode = (typeof BACKEND_VALIDATION_MODES)[number];
export interface BackendValidationSettings {
mode: BackendValidationMode;
baseUrl: string;
email: string;
password: string;
keepWorkspaces: boolean;
workspaceOverride?: string;
workspacePrefix: string;
pollIntervalMs: number;
maxWaitMs: number;
}
export function parseBackendValidationMode(value?: string | null): BackendValidationMode {
const normalized = value?.trim().toLowerCase();
if (!normalized || normalized === "off" || normalized === "false" || normalized === "0") {
return "off";
}
if (normalized === "preview" || normalized === "true" || normalized === "1") {
return "preview";
}
throw new Error(
`Unsupported backend validation mode: ${value}. Use one of: ${BACKEND_VALIDATION_MODES.join(", ")}`
);
}
export function resolveBackendValidationSettings(input: {
evalMode: EvalMode;
requestedMode?: string | null;
}): BackendValidationSettings {
const mode = parseBackendValidationMode(
input.requestedMode ?? process.env.WMILL_AI_EVAL_BACKEND_VALIDATION
);
if (mode !== "off" && input.evalMode !== "flow" && input.evalMode !== "script") {
throw new Error(
`Backend validation mode "${mode}" is only supported for flow and script evals`
);
}
return {
mode,
baseUrl: normalizeBaseUrl(
process.env.WMILL_AI_EVAL_BACKEND_URL ??
process.env.WINDMILL_URL ??
process.env.WINDMILL_BASE_URL ??
process.env.REMOTE ??
"http://127.0.0.1:8000"
),
email: process.env.WMILL_AI_EVAL_BACKEND_EMAIL ?? "admin@windmill.dev",
password: process.env.WMILL_AI_EVAL_BACKEND_PASSWORD ?? "changeme",
keepWorkspaces: isTruthy(process.env.WMILL_AI_EVAL_KEEP_WORKSPACES),
workspaceOverride: sanitizeOptionalWorkspaceId(process.env.WMILL_AI_EVAL_BACKEND_WORKSPACE),
workspacePrefix: sanitizeWorkspacePrefix(
process.env.WMILL_AI_EVAL_WORKSPACE_PREFIX ?? "ai-evals"
),
pollIntervalMs: parsePositiveInteger(
process.env.WMILL_AI_EVAL_BACKEND_POLL_INTERVAL_MS,
2000
),
maxWaitMs: parsePositiveInteger(process.env.WMILL_AI_EVAL_BACKEND_MAX_WAIT_MS, 120000),
};
}
function normalizeBaseUrl(value: string): string {
return value.replace(/\/+$/, "");
}
function sanitizeWorkspacePrefix(value: string): string {
const sanitized = value
.trim()
.toLowerCase()
.replace(/[^a-z0-9-]+/g, "-")
.replace(/^-+|-+$/g, "");
return sanitized.length > 0 ? sanitized : "ai-evals";
}
function sanitizeOptionalWorkspaceId(value: string | undefined): string | undefined {
const trimmed = value?.trim();
return trimmed ? trimmed : undefined;
}
function isTruthy(value: string | undefined): boolean {
if (!value) {
return false;
}
return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}
function parsePositiveInteger(value: string | undefined, fallback: number): number {
if (!value) {
return fallback;
}
const parsed = Number(value);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

View File

@@ -0,0 +1,18 @@
import { describe, expect, it } from "bun:test";
import { loadCases } from "./cases";
describe("loadCases", () => {
it("loads backend preview runtime config for opt-in flow cases", async () => {
const flowCases = await loadCases("flow");
const caseEntry = flowCases.find((entry) => entry.id === "flow-test1-reuse-existing-script");
expect(caseEntry?.runtime).toEqual({
backendPreview: {
args: {
a: 2,
b: 3,
},
},
});
});
});

73
ai_evals/core/cases.ts Normal file
View File

@@ -0,0 +1,73 @@
import { readFile } from "node:fs/promises";
import path from "node:path";
import { fileURLToPath } from "node:url";
import { parse } from "yaml";
import type { EvalCase, EvalCaseRuntimeSpec, EvalMode, FlowValidationSpec } from "./types";
const REPO_ROOT = fileURLToPath(new URL("../../", import.meta.url));
const CASES_DIR = path.join(REPO_ROOT, "ai_evals", "cases");
interface RawEvalCase {
id: string;
prompt: string;
initial?: string;
expected?: string;
validate?: FlowValidationSpec;
judgeChecklist?: string[];
runtime?: EvalCaseRuntimeSpec;
}
export function getRepoRoot(): string {
return REPO_ROOT;
}
export function getAiEvalsRoot(): string {
return path.join(REPO_ROOT, "ai_evals");
}
export async function loadCases(mode: EvalMode): Promise<EvalCase[]> {
const filePath = path.join(CASES_DIR, `${mode}.yaml`);
const raw = await readFile(filePath, "utf8");
const parsed = parse(raw);
if (!Array.isArray(parsed)) {
throw new Error(`Expected ${filePath} to contain a YAML list of cases`);
}
return parsed.map((entry) => ({
id: entry.id,
prompt: entry.prompt,
initialPath: resolveFixturePath(entry.initial),
expectedPath: resolveFixturePath(entry.expected),
validate: entry.validate,
judgeChecklist: entry.judgeChecklist,
runtime: entry.runtime,
}));
}
export async function loadSelectedCases(
mode: EvalMode,
selectedIds: string[]
): Promise<EvalCase[]> {
const allCases = await loadCases(mode);
if (selectedIds.length === 0) {
return allCases;
}
const caseMap = new Map(allCases.map((entry) => [entry.id, entry]));
const missing = selectedIds.filter((id) => !caseMap.has(id));
if (missing.length > 0) {
throw new Error(
`Unknown ${mode} case${missing.length === 1 ? "" : "s"}: ${missing.join(", ")}`
);
}
return selectedIds.map((id) => caseMap.get(id)!);
}
function resolveFixturePath(value: string | undefined): string | undefined {
if (!value) {
return undefined;
}
return path.isAbsolute(value) ? value : path.join(REPO_ROOT, value);
}

67
ai_evals/core/files.ts Normal file
View File

@@ -0,0 +1,67 @@
import { access, copyFile, mkdir, readdir, readFile } from "node:fs/promises";
import path from "node:path";
export async function exists(filePath: string): Promise<boolean> {
try {
await access(filePath);
return true;
} catch {
return false;
}
}
export async function readJsonFile<T>(filePath: string): Promise<T> {
const raw = await readFile(filePath, "utf8");
return JSON.parse(raw) as T;
}
export async function readDirectoryFiles(
rootDir: string,
options: {
ignore?: Set<string>;
} = {}
): Promise<Record<string, string>> {
const files: Record<string, string> = {};
await walkDirectory(rootDir, "", files, options.ignore ?? new Set());
return files;
}
export async function copyDirectory(sourceDir: string, targetDir: string): Promise<void> {
const entries = await readdir(sourceDir, { withFileTypes: true });
await mkdir(targetDir, { recursive: true });
for (const entry of entries) {
const sourcePath = path.join(sourceDir, entry.name);
const targetPath = path.join(targetDir, entry.name);
if (entry.isDirectory()) {
await copyDirectory(sourcePath, targetPath);
continue;
}
await mkdir(path.dirname(targetPath), { recursive: true });
await copyFile(sourcePath, targetPath);
}
}
async function walkDirectory(
absoluteDir: string,
relativeDir: string,
output: Record<string, string>,
ignore: Set<string>
): Promise<void> {
const entries = await readdir(absoluteDir, { withFileTypes: true });
for (const entry of entries) {
const relativePath = relativeDir ? `${relativeDir}/${entry.name}` : entry.name;
if (ignore.has(relativePath) || ignore.has(entry.name)) {
continue;
}
const absolutePath = path.join(absoluteDir, entry.name);
if (entry.isDirectory()) {
await walkDirectory(absolutePath, relativePath, output, ignore);
continue;
}
output[relativePath] = await readFile(absolutePath, "utf8");
}
}

151
ai_evals/core/judge.ts Normal file
View File

@@ -0,0 +1,151 @@
import Anthropic from "@anthropic-ai/sdk";
import type { EvalMode, JudgeResult } from "./types";
export const DEFAULT_JUDGE_MODEL = "claude-sonnet-4-6";
const JUDGE_TOOL_NAME = "submit_judgement";
export async function judgeOutput(input: {
mode: EvalMode;
prompt: string;
checklist?: string[];
initial?: unknown;
expected?: unknown;
actual: unknown;
model?: string;
}): Promise<JudgeResult> {
const apiKey = process.env.ANTHROPIC_API_KEY;
if (!apiKey) {
return {
success: false,
score: 0,
summary: "Judge unavailable",
error: "ANTHROPIC_API_KEY is not set",
};
}
const client = new Anthropic({ apiKey });
const model = input.model ?? DEFAULT_JUDGE_MODEL;
const system = [
"You evaluate benchmark outputs for Windmill AI generation.",
"Deterministic checks already run separately. Focus on whether the final output satisfies the user request.",
"If expected state is provided, treat it as a valid example and reward semantically equivalent outputs.",
"If a checklist is provided, treat it as the explicit acceptance criteria for this case.",
"Be strict about missing requested functionality.",
"When the prompt wording is ambiguous, prefer the checklist over inferred structural requirements.",
"Do not invent additional Windmill-specific constraints that are not explicit in the prompt, checklist, or expected state.",
"Do not lower the score just because the output uses a different but valid Windmill idiom, naming choice, or equivalent field shape.",
"Do not require exact ids, exact topology, or exact field names unless the prompt, checklist, or expected state clearly requires them.",
`Always respond by calling the ${JUDGE_TOOL_NAME} tool exactly once.`,
].join("\n\n");
const user = [
`Mode: ${input.mode}`,
"",
"User prompt:",
input.prompt,
"",
"Checklist:",
formatChecklist(input.checklist),
"",
"Initial state:",
formatJsonBlock(input.initial),
"",
"Expected state:",
formatJsonBlock(input.expected),
"",
"Actual result:",
formatJsonBlock(input.actual),
].join("\n");
try {
const response = await client.messages.create({
model,
max_tokens: 1024,
temperature: 0,
system,
messages: [{ role: "user", content: user }],
tools: [
{
name: JUDGE_TOOL_NAME,
description: "Submit the benchmark judgement as structured data.",
input_schema: {
type: "object",
properties: {
score: {
type: "integer",
minimum: 0,
maximum: 100,
},
summary: {
type: "string",
},
},
required: ["score", "summary"],
},
},
],
tool_choice: {
type: "tool",
name: JUDGE_TOOL_NAME,
disable_parallel_tool_use: true,
},
});
const toolUseBlock = response.content.find(
(block): block is Anthropic.ToolUseBlock =>
block.type === "tool_use" && block.name === JUDGE_TOOL_NAME
);
if (!toolUseBlock) {
return {
success: false,
score: 0,
summary: "Judge returned no tool output",
error: "Expected structured tool output from judge",
};
}
const parsed = toolUseBlock.input as {
score: number;
summary: string;
};
return {
success: true,
score: normalizeScore(parsed.score),
summary: parsed.summary,
};
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return {
success: false,
score: 0,
summary: "Judge failed",
error: message,
};
}
}
function formatJsonBlock(value: unknown): string {
if (value === undefined) {
return "(none)";
}
return JSON.stringify(value, null, 2);
}
function formatChecklist(checklist: string[] | undefined): string {
if (!checklist || checklist.length === 0) {
return "(none)";
}
return checklist.map((item) => `- ${item}`).join("\n");
}
function normalizeScore(value: number): number {
if (!Number.isFinite(value)) {
return 0;
}
return Math.max(0, Math.min(100, Math.round(value)));
}

View File

@@ -0,0 +1,29 @@
import { describe, expect, it } from "bun:test";
import { resolveEvalModel } from "./models";
describe("resolveEvalModel", () => {
it("supports Gemini aliases for frontend evals", () => {
expect(resolveEvalModel("flow", "gemini").frontend).toEqual({
provider: "googleai",
model: "gemini-2.5-flash",
});
expect(resolveEvalModel("app", "gemini-pro").frontend).toEqual({
provider: "googleai",
model: "gemini-2.5-pro",
});
expect(resolveEvalModel("script", "gemini-3-flash-preview").frontend).toEqual({
provider: "googleai",
model: "gemini-3-flash-preview",
});
expect(resolveEvalModel("flow", "gemini-3.1-pro-preview").frontend).toEqual({
provider: "googleai",
model: "gemini-3.1-pro-preview",
});
});
it("rejects Gemini aliases for cli evals", () => {
expect(() => resolveEvalModel("cli", "gemini")).toThrow(
"Model gemini-flash is not supported for cli mode"
);
});
});

185
ai_evals/core/models.ts Normal file
View File

@@ -0,0 +1,185 @@
import type { EvalMode } from "./types";
export interface FrontendEvalModelConfig {
provider: "anthropic" | "openai" | "googleai";
model: string;
}
export interface CliEvalModelConfig {
provider: "anthropic";
model: string;
}
export interface EvalModelSpec {
id: string;
label: string;
aliases: string[];
frontend?: FrontendEvalModelConfig;
cli?: CliEvalModelConfig;
}
export const EVAL_MODELS: EvalModelSpec[] = [
{
id: "haiku",
label: "Claude Haiku 4.5",
aliases: [
"haiku",
"haiku-4.5",
"claude-haiku",
"claude-haiku-4.5",
"claude-haiku-4-5",
"claude-haiku-4-5-20251001",
],
frontend: {
provider: "anthropic",
model: "claude-haiku-4-5-20251001",
},
cli: {
provider: "anthropic",
model: "haiku",
},
},
{
id: "sonnet",
label: "Claude Sonnet 4.5",
aliases: [
"sonnet",
"sonnet-4.5",
"claude-sonnet",
"claude-sonnet-4.5",
"claude-sonnet-4-5",
"claude-sonnet-4-5-20250929",
],
frontend: {
provider: "anthropic",
model: "claude-sonnet-4-5-20250929",
},
cli: {
provider: "anthropic",
model: "sonnet",
},
},
{
id: "opus",
label: "Claude Opus 4.6",
aliases: [
"opus",
"opus-4.6",
"claude-opus",
"claude-opus-4.6",
"claude-opus-4-6",
],
frontend: {
provider: "anthropic",
model: "claude-opus-4-6",
},
cli: {
provider: "anthropic",
model: "opus",
},
},
{
id: "4o",
label: "GPT-4o",
aliases: ["4o", "gpt-4o"],
frontend: {
provider: "openai",
model: "gpt-4o",
},
},
{
id: "gemini-flash",
label: "Gemini 2.5 Flash",
aliases: ["gemini", "gemini-flash", "gemini-2.5-flash"],
frontend: {
provider: "googleai",
model: "gemini-2.5-flash",
},
},
{
id: "gemini-pro",
label: "Gemini 2.5 Pro",
aliases: ["gemini-pro", "gemini-2.5-pro"],
frontend: {
provider: "googleai",
model: "gemini-2.5-pro",
},
},
{
id: "gemini-3-flash-preview",
label: "Gemini 3 Flash Preview",
aliases: ["gemini-3-flash-preview", "gemini-3-flash"],
frontend: {
provider: "googleai",
model: "gemini-3-flash-preview",
},
},
{
id: "gemini-3.1-pro-preview",
label: "Gemini 3.1 Pro Preview",
aliases: ["gemini-3.1-pro-preview", "gemini-3.1-pro", "gemini-3-pro-preview"],
frontend: {
provider: "googleai",
model: "gemini-3.1-pro-preview",
},
},
];
export function resolveEvalModel(mode: EvalMode, alias?: string): EvalModelSpec {
const spec = alias ? findEvalModel(alias) : getDefaultEvalModel(mode);
if (!spec) {
throw new Error(`Unknown model: ${alias}`);
}
if (mode === "cli" && !spec.cli) {
throw new Error(`Model ${spec.id} is not supported for cli mode`);
}
if (mode !== "cli" && !spec.frontend) {
throw new Error(`Model ${spec.id} is not supported for ${mode} mode`);
}
return spec;
}
export function getEvalModelHelpText(): string {
return EVAL_MODELS.map((model) => {
const modes = [
...(model.frontend ? ["flow", "script", "app"] : []),
...(model.cli ? ["cli"] : []),
];
return ` ${model.id.padEnd(8)} ${model.label} (${modes.join(", ")})`;
}).join("\n");
}
export function formatRunModelLabel(mode: EvalMode, model: EvalModelSpec): string {
if (mode === "cli") {
return `${model.cli!.provider}:${model.cli!.model}`;
}
return `${model.frontend!.provider}:${model.frontend!.model}`;
}
export function getFrontendEvalModel(model: EvalModelSpec): FrontendEvalModelConfig {
if (!model.frontend) {
throw new Error(`Model ${model.id} does not support frontend evals`);
}
return model.frontend;
}
export function getCliEvalModel(model: EvalModelSpec): CliEvalModelConfig {
if (!model.cli) {
throw new Error(`Model ${model.id} does not support cli evals`);
}
return model.cli;
}
function getDefaultEvalModel(mode: EvalMode): EvalModelSpec {
return mode === "cli" ? EVAL_MODELS[0]! : EVAL_MODELS[0]!;
}
function findEvalModel(alias: string): EvalModelSpec | undefined {
const normalized = alias.trim().toLowerCase();
return EVAL_MODELS.find((model) =>
[model.id, ...model.aliases].some((candidate) => candidate.toLowerCase() === normalized)
);
}

296
ai_evals/core/results.ts Normal file
View File

@@ -0,0 +1,296 @@
import { appendFile, mkdir, rm, writeFile } from "node:fs/promises";
import path from "node:path";
import { execFileSync } from "node:child_process";
import { getAiEvalsRoot, getRepoRoot } from "./cases";
import type {
BenchmarkArtifactFile,
BenchmarkCaseResult,
BenchmarkRunResult,
BenchmarkTokenUsage,
EvalMode,
} from "./types";
export async function writeRunResult(
result: BenchmarkRunResult,
outputPath?: string
): Promise<string> {
const targetPath = resolveRunOutputPath(result.mode, outputPath);
await mkdir(path.dirname(targetPath), { recursive: true });
await writeFile(targetPath, JSON.stringify(toSerializableRunResult(result), null, 2) + "\n", "utf8");
return targetPath;
}
export async function appendHistoryRecord(
result: BenchmarkRunResult,
historyPath = resolveHistoryPath(result.mode)
): Promise<string> {
await mkdir(path.dirname(historyPath), { recursive: true });
await appendFile(historyPath, JSON.stringify(toHistoryRecord(result)) + "\n", "utf8");
return historyPath;
}
export async function writeRunArtifacts(
result: BenchmarkRunResult,
outputPath?: string
): Promise<string | null> {
const targetPath = resolveRunOutputPath(result.mode, outputPath);
const artifactRoot = defaultArtifactsRoot(targetPath);
await rm(artifactRoot, { recursive: true, force: true });
let wroteArtifacts = false;
for (const caseResult of result.cases) {
for (const attempt of caseResult.attempts) {
const artifactFiles = attempt.artifactFiles ?? [];
if (artifactFiles.length === 0) {
attempt.artifactsPath = null;
continue;
}
const attemptDir = path.join(artifactRoot, caseResult.id, `attempt-${attempt.attempt}`);
await writeArtifactFiles(attemptDir, artifactFiles);
attempt.artifactsPath = attemptDir;
wroteArtifacts = true;
}
}
result.artifactsPath = wroteArtifacts ? artifactRoot : null;
return result.artifactsPath ?? null;
}
export function buildRunResult(input: {
mode: EvalMode;
runs: number;
runModel: string | null;
judgeModel: string | null;
caseResults: BenchmarkCaseResult[];
}): BenchmarkRunResult {
const attemptCount = input.caseResults.reduce((sum, entry) => sum + entry.attempts.length, 0);
const passedAttempts = input.caseResults.reduce(
(sum, entry) => sum + entry.attempts.filter((attempt) => attempt.passed).length,
0
);
const durationTotal = input.caseResults.reduce(
(sum, entry) => sum + entry.attempts.reduce((inner, attempt) => inner + attempt.durationMs, 0),
0
);
const tokenUsageTotal = input.caseResults.reduce<BenchmarkTokenUsage | null>(
(sum, entry) => {
for (const attempt of entry.attempts) {
if (!attempt.tokenUsage) {
continue;
}
sum ??= { prompt: 0, completion: 0, total: 0 };
sum.prompt += attempt.tokenUsage.prompt;
sum.completion += attempt.tokenUsage.completion;
sum.total += attempt.tokenUsage.total;
}
return sum;
},
null
);
return {
version: 1,
mode: input.mode,
createdAt: new Date().toISOString(),
gitSha: getGitSha(),
runs: input.runs,
runModel: input.runModel,
judgeModel: input.judgeModel,
caseCount: input.caseResults.length,
attemptCount,
passedAttempts,
passRate: attemptCount === 0 ? 0 : passedAttempts / attemptCount,
averageDurationMs: attemptCount === 0 ? 0 : durationTotal / attemptCount,
totalTokenUsage: tokenUsageTotal,
averageTokenUsagePerAttempt:
attemptCount === 0 || !tokenUsageTotal
? null
: {
prompt: tokenUsageTotal.prompt / attemptCount,
completion: tokenUsageTotal.completion / attemptCount,
total: tokenUsageTotal.total / attemptCount,
},
cases: input.caseResults,
};
}
export function formatRunSummary(result: BenchmarkRunResult): string {
const lines = [
`${result.mode} benchmark complete`,
`Pass rate: ${formatPercent(result.passRate)} (${result.passedAttempts}/${result.attemptCount})`,
`Average duration: ${Math.round(result.averageDurationMs)}ms`,
];
const failures = collectFailures(result);
if (failures.length > 0) {
lines.push("Failures:");
for (const entry of failures.slice(0, 10)) {
lines.push(`- ${entry}`);
}
}
return lines.join("\n");
}
function collectFailures(result: BenchmarkRunResult): string[] {
const failures: string[] = [];
for (const caseResult of result.cases) {
for (const attempt of caseResult.attempts) {
if (attempt.passed) {
continue;
}
const failedChecks = attempt.checks.filter((check) => !check.passed).map((check) => check.name);
failures.push(
`${caseResult.id} attempt ${attempt.attempt}: ${failedChecks.join(", ") || attempt.error || "failed"}`
);
}
}
return failures;
}
function defaultFileName(mode: EvalMode): string {
return `${new Date().toISOString().replaceAll(":", "-")}__${mode}.json`;
}
export function resolveRunOutputPath(mode: EvalMode, outputPath?: string): string {
return outputPath ?? path.join(getAiEvalsRoot(), "results", defaultFileName(mode));
}
export function resolveHistoryPath(mode: EvalMode): string {
return path.join(getAiEvalsRoot(), "history", `${mode}.jsonl`);
}
function defaultArtifactsRoot(resultPath: string): string {
return resultPath.endsWith(".json")
? resultPath.slice(0, -".json".length)
: `${resultPath}.artifacts`;
}
async function writeArtifactFiles(
rootDir: string,
files: BenchmarkArtifactFile[]
): Promise<void> {
for (const file of files) {
const relativePath = normalizeArtifactPath(file.path);
const targetPath = path.join(rootDir, relativePath);
await mkdir(path.dirname(targetPath), { recursive: true });
await writeFile(targetPath, file.content, "utf8");
}
}
function normalizeArtifactPath(filePath: string): string {
const normalized = filePath.replaceAll("\\", "/").replace(/^\/+/, "");
const parts = normalized.split("/").filter(Boolean);
if (parts.length === 0 || parts.some((part) => part === "." || part === "..")) {
throw new Error(`Invalid artifact path: ${filePath}`);
}
return parts.join("/");
}
function toSerializableRunResult(result: BenchmarkRunResult): BenchmarkRunResult {
return {
...result,
cases: result.cases.map((caseResult) => ({
...caseResult,
attempts: caseResult.attempts.map(({ artifactFiles, ...attempt }) => attempt),
})),
};
}
function toHistoryRecord(result: BenchmarkRunResult) {
const judgeScores = result.cases.flatMap((caseResult) =>
caseResult.attempts.flatMap((attempt) =>
typeof attempt.judgeScore === "number" ? [attempt.judgeScore] : []
)
);
return {
createdAt: result.createdAt,
gitSha: result.gitSha,
mode: result.mode,
runs: result.runs,
runModel: result.runModel,
judgeModel: result.judgeModel,
caseCount: result.caseCount,
attemptCount: result.attemptCount,
passedAttempts: result.passedAttempts,
passRate: result.passRate,
averageDurationMs: result.averageDurationMs,
averageJudgeScore:
judgeScores.length === 0
? null
: judgeScores.reduce((sum, score) => sum + score, 0) / judgeScores.length,
averageTokenUsagePerAttempt: result.averageTokenUsagePerAttempt ?? null,
failedCaseIds: Array.from(
new Set(
result.cases
.filter((caseResult) => caseResult.attempts.some((attempt) => !attempt.passed))
.map((caseResult) => caseResult.id)
)
),
cases: result.cases.map((caseResult) => {
const attemptCount = caseResult.attempts.length;
const passedAttempts = caseResult.attempts.filter((attempt) => attempt.passed).length;
const totalDurationMs = caseResult.attempts.reduce(
(sum, attempt) => sum + attempt.durationMs,
0
);
const judgeScores = caseResult.attempts.flatMap((attempt) =>
typeof attempt.judgeScore === "number" ? [attempt.judgeScore] : []
);
const totalTokenUsage = caseResult.attempts.reduce<BenchmarkTokenUsage | null>(
(sum, attempt) => {
if (!attempt.tokenUsage) {
return sum;
}
sum ??= { prompt: 0, completion: 0, total: 0 };
sum.prompt += attempt.tokenUsage.prompt;
sum.completion += attempt.tokenUsage.completion;
sum.total += attempt.tokenUsage.total;
return sum;
},
null
);
return {
id: caseResult.id,
attemptCount,
passedAttempts,
passRate: attemptCount === 0 ? 0 : passedAttempts / attemptCount,
averageDurationMs: attemptCount === 0 ? 0 : totalDurationMs / attemptCount,
averageJudgeScore:
judgeScores.length === 0
? null
: judgeScores.reduce((sum, score) => sum + score, 0) / judgeScores.length,
averageTokenUsagePerAttempt:
attemptCount === 0 || !totalTokenUsage
? null
: {
prompt: totalTokenUsage.prompt / attemptCount,
completion: totalTokenUsage.completion / attemptCount,
total: totalTokenUsage.total / attemptCount,
},
};
}),
};
}
function getGitSha(): string | null {
try {
return execFileSync("git", ["rev-parse", "HEAD"], {
cwd: getRepoRoot(),
encoding: "utf8",
stdio: ["ignore", "pipe", "ignore"],
}).trim();
} catch {
return null;
}
}
function formatPercent(value: number): string {
return `${(value * 100).toFixed(1)}%`;
}

301
ai_evals/core/runSuite.ts Normal file
View File

@@ -0,0 +1,301 @@
import { judgeOutput, DEFAULT_JUDGE_MODEL } from "./judge";
import type {
BenchmarkAttemptResult,
BenchmarkCaseResult,
BenchmarkCheck,
EvalCase,
FrontendBenchmarkProgressEvent,
ModeRunner,
} from "./types";
export async function runSuite<TInitial, TExpected, TActual>(input: {
modeRunner: ModeRunner<TInitial, TExpected, TActual>;
cases: EvalCase[];
runs: number;
runModel: string | null;
judgeModel?: string | null;
concurrency?: number;
verbose?: boolean;
onProgress?: (event: FrontendBenchmarkProgressEvent) => void;
}): Promise<BenchmarkCaseResult[]> {
const judgeModel = input.judgeModel ?? DEFAULT_JUDGE_MODEL;
const concurrency = Math.max(1, input.concurrency ?? input.modeRunner.concurrency);
const results = new Array<BenchmarkCaseResult>(input.cases.length);
let cursor = 0;
if (input.modeRunner.mode !== "cli") {
input.onProgress?.({
type: "run-start",
surface: input.modeRunner.mode,
totalCases: input.cases.length,
runs: input.runs,
concurrency,
});
}
async function worker(): Promise<void> {
while (true) {
const caseIndex = cursor++;
if (caseIndex >= input.cases.length) {
return;
}
const evalCase = input.cases[caseIndex];
results[caseIndex] = {
id: evalCase.id,
prompt: evalCase.prompt,
initialPath: evalCase.initialPath,
expectedPath: evalCase.expectedPath,
attempts: await runCaseAttempts({
caseIndex,
evalCase,
runs: input.runs,
judgeModel,
judgeThreshold: input.modeRunner.judgeThreshold ?? 80,
modeRunner: input.modeRunner,
totalCases: input.cases.length,
verbose: input.verbose ?? false,
onProgress: input.onProgress,
}),
};
}
}
await Promise.all(
Array.from({ length: Math.min(concurrency, input.cases.length) }, () => worker())
);
return results;
}
async function runCaseAttempts<TInitial, TExpected, TActual>(input: {
caseIndex: number;
evalCase: EvalCase;
runs: number;
judgeModel: string;
judgeThreshold: number;
modeRunner: ModeRunner<TInitial, TExpected, TActual>;
totalCases: number;
verbose: boolean;
onProgress?: (event: FrontendBenchmarkProgressEvent) => void;
}): Promise<BenchmarkAttemptResult[]> {
const attempts: BenchmarkAttemptResult[] = [];
const surface = input.modeRunner.mode === "cli" ? null : input.modeRunner.mode;
for (let attempt = 1; attempt <= input.runs; attempt += 1) {
if (surface) {
input.onProgress?.({
type: "attempt-start",
surface,
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
});
}
const startedAt = Date.now();
try {
const initial = await input.modeRunner.loadInitial(input.evalCase.initialPath);
const expected = await input.modeRunner.loadExpected(input.evalCase.expectedPath);
const run = await input.modeRunner.run(input.evalCase.prompt, initial, {
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
verbose: input.verbose,
onAssistantMessageStart: input.verbose && surface
? () =>
input.onProgress?.({
type: "assistant-message-start",
surface,
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
})
: undefined,
onAssistantChunk: input.verbose && surface
? (chunk: string) =>
input.onProgress?.({
type: "assistant-chunk",
surface,
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
chunk,
})
: undefined,
onAssistantMessageEnd: input.verbose && surface
? () =>
input.onProgress?.({
type: "assistant-message-end",
surface,
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
})
: undefined,
});
const checks: BenchmarkCheck[] = [
buildCheck("run succeeded", run.success, run.error),
...input.modeRunner.validate({
evalCase: input.evalCase,
prompt: input.evalCase.prompt,
initial,
expected,
actual: run.actual,
run,
}),
];
const artifactFiles = input.modeRunner.buildArtifacts?.(run.actual) ?? [];
if (run.success && input.modeRunner.backendValidate) {
try {
const backendValidation = await input.modeRunner.backendValidate({
evalCase: input.evalCase,
prompt: input.evalCase.prompt,
initial,
expected,
actual: run.actual,
run,
context: {
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
verbose: input.verbose,
onAssistantMessageStart: undefined,
onAssistantChunk: undefined,
onAssistantMessageEnd: undefined,
},
});
if (backendValidation) {
checks.push(...backendValidation.checks);
artifactFiles.push(...(backendValidation.artifactFiles ?? []));
}
} catch (error) {
checks.push(
buildCheck(
"backend validation succeeded",
false,
error instanceof Error ? error.message : String(error)
)
);
}
}
let judgeScore: number | null = null;
let judgeSummary: string | null = null;
if (run.success) {
const judge = await judgeOutput({
mode: input.modeRunner.mode,
prompt: input.evalCase.prompt,
checklist: input.evalCase.judgeChecklist,
initial,
expected: input.modeRunner.mode === "cli" ? undefined : expected,
actual: run.actual,
model: input.judgeModel,
});
judgeScore = judge.success ? judge.score : null;
judgeSummary = judge.summary;
checks.push(buildCheck("judge succeeded", judge.success, judge.error));
checks.push(
buildCheck(
`judge score >= ${input.judgeThreshold}`,
(judgeScore ?? 0) >= input.judgeThreshold,
judge.success ? `score=${judgeScore}` : judge.error
)
);
}
const attemptResult: BenchmarkAttemptResult = {
attempt,
passed: checks.every((check) => check.passed),
durationMs: Date.now() - startedAt,
assistantMessageCount: run.assistantMessageCount,
toolCallCount: run.toolCallCount,
toolsUsed: uniqueStrings(run.toolsUsed),
skillsInvoked: uniqueStrings(run.skillsInvoked),
checks,
judgeScore,
judgeSummary,
error: run.error ?? null,
tokenUsage: run.tokenUsage ?? null,
artifactsPath: null,
artifactFiles,
};
if (surface) {
input.onProgress?.({
type: "attempt-finish",
surface,
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
passed: attemptResult.passed,
durationMs: attemptResult.durationMs,
judgeScore: attemptResult.judgeScore,
error: attemptResult.error,
});
}
attempts.push(attemptResult);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
const failedAttempt: BenchmarkAttemptResult = {
attempt,
passed: false,
durationMs: Date.now() - startedAt,
assistantMessageCount: 0,
toolCallCount: 0,
toolsUsed: [],
skillsInvoked: [],
checks: [buildCheck("run crashed", false, message)],
judgeScore: null,
judgeSummary: null,
error: message,
tokenUsage: null,
};
if (surface) {
input.onProgress?.({
type: "attempt-finish",
surface,
caseId: input.evalCase.id,
caseNumber: input.caseIndex + 1,
totalCases: input.totalCases,
attempt,
runs: input.runs,
passed: false,
durationMs: failedAttempt.durationMs,
judgeScore: null,
error: message,
});
}
attempts.push(failedAttempt);
}
}
return attempts;
}
function buildCheck(name: string, passed: boolean, details?: string): BenchmarkCheck {
return details ? { name, passed, details } : { name, passed };
}
function uniqueStrings(values: string[]): string[] {
return [...new Set(values)];
}

255
ai_evals/core/types.ts Normal file
View File

@@ -0,0 +1,255 @@
export const EVAL_MODES = ["cli", "flow", "script", "app"] as const;
export type EvalMode = (typeof EVAL_MODES)[number];
export interface EvalCaseRuntimeBackendPreview {
args?: Record<string, unknown>;
timeoutSeconds?: number;
}
export interface EvalCaseRuntimeSpec {
backendPreview?: EvalCaseRuntimeBackendPreview;
}
export interface FlowValidationSpec {
schemaRequiredPaths?: string[];
schemaAnyOf?: Array<{
requiredPaths: string[];
}>;
exactTopLevelStepIds?: string[];
topLevelStepIds?: string[];
topLevelStepOrder?: string[];
topLevelStepTypeCountsAtLeast?: Array<{
type: string;
count: number;
}>;
topLevelStepTypes?: Array<{
id: string;
type: string;
}>;
moduleRules?: Array<{
id: string;
hasStopAfterIf?: boolean;
hasStopAfterAllItersIf?: boolean;
immediateChildStepIds?: string[];
exactImmediateChildStepIds?: string[];
immediateChildStepTypes?: Array<{
id: string;
type: string;
}>;
requiredInputTransforms?: Array<{
type?: string;
expr?: string;
exprAnyOf?: string[];
value?: string | number | boolean | null;
}>;
}>;
moduleFieldRules?: Array<{
id: string;
path: string;
equals: string | number | boolean | null;
}>;
resolveResultsRefs?: boolean;
requireSpecialModules?: Array<"preprocessor_module" | "failure_module">;
requireSuspendSteps?: Array<{
id: string;
requiredEvents?: number;
resumeRequiredStringFieldAnyOf?: string[];
}>;
}
export interface EvalCase {
id: string;
prompt: string;
initialPath?: string;
expectedPath?: string;
validate?: FlowValidationSpec;
judgeChecklist?: string[];
runtime?: EvalCaseRuntimeSpec;
}
export interface BenchmarkCheck {
name: string;
passed: boolean;
details?: string;
}
export interface JudgeResult {
success: boolean;
score: number;
summary: string;
error?: string;
}
export interface BenchmarkArtifactFile {
path: string;
content: string;
}
export interface BackendValidationResult {
checks: BenchmarkCheck[];
artifactFiles?: BenchmarkArtifactFile[];
}
export interface BenchmarkTokenUsage {
prompt: number;
completion: number;
total: number;
}
export interface ModeRunOutput<TActual> {
success: boolean;
actual: TActual;
error?: string;
assistantMessageCount: number;
toolCallCount: number;
toolsUsed: string[];
skillsInvoked: string[];
tokenUsage?: BenchmarkTokenUsage | null;
}
export interface ModeRunContext {
caseId: string;
caseNumber: number;
totalCases: number;
attempt: number;
runs: number;
verbose: boolean;
onAssistantMessageStart?: () => void;
onAssistantChunk?: (chunk: string) => void;
onAssistantMessageEnd?: () => void;
}
export interface ModeRunner<TInitial, TExpected, TActual> {
mode: EvalMode;
concurrency: number;
judgeThreshold?: number;
loadInitial(path?: string): Promise<TInitial | undefined>;
loadExpected(path?: string): Promise<TExpected | undefined>;
run(
prompt: string,
initial: TInitial | undefined,
context: ModeRunContext
): Promise<ModeRunOutput<TActual>>;
validate(input: {
evalCase: EvalCase;
prompt: string;
initial: TInitial | undefined;
expected: TExpected | undefined;
actual: TActual;
run: ModeRunOutput<TActual>;
}): BenchmarkCheck[];
backendValidate?(input: {
evalCase: EvalCase;
prompt: string;
initial: TInitial | undefined;
expected: TExpected | undefined;
actual: TActual;
run: ModeRunOutput<TActual>;
context: ModeRunContext;
}): Promise<BackendValidationResult | null>;
buildArtifacts?(actual: TActual): BenchmarkArtifactFile[];
}
export interface BenchmarkAttemptResult {
attempt: number;
passed: boolean;
durationMs: number;
assistantMessageCount: number;
toolCallCount: number;
toolsUsed: string[];
skillsInvoked: string[];
checks: BenchmarkCheck[];
judgeScore: number | null;
judgeSummary: string | null;
error: string | null;
tokenUsage?: BenchmarkTokenUsage | null;
artifactsPath?: string | null;
artifactFiles?: BenchmarkArtifactFile[];
}
export interface BenchmarkCaseResult {
id: string;
prompt: string;
initialPath?: string;
expectedPath?: string;
attempts: BenchmarkAttemptResult[];
}
export interface BenchmarkRunResult {
version: 1;
mode: EvalMode;
createdAt: string;
gitSha: string | null;
runs: number;
runModel: string | null;
judgeModel: string | null;
caseCount: number;
attemptCount: number;
passedAttempts: number;
passRate: number;
averageDurationMs: number;
totalTokenUsage?: BenchmarkTokenUsage | null;
averageTokenUsagePerAttempt?: BenchmarkTokenUsage | null;
artifactsPath?: string | null;
cases: BenchmarkCaseResult[];
}
export type FrontendBenchmarkProgressEvent =
| {
type: "run-start";
surface: Exclude<EvalMode, "cli">;
totalCases: number;
runs: number;
concurrency: number;
}
| {
type: "attempt-start";
surface: Exclude<EvalMode, "cli">;
caseId: string;
caseNumber: number;
totalCases: number;
attempt: number;
runs: number;
}
| {
type: "attempt-finish";
surface: Exclude<EvalMode, "cli">;
caseId: string;
caseNumber: number;
totalCases: number;
attempt: number;
runs: number;
passed: boolean;
durationMs: number;
judgeScore: number | null;
error: string | null;
}
| {
type: "assistant-message-start";
surface: Exclude<EvalMode, "cli">;
caseId: string;
caseNumber: number;
totalCases: number;
attempt: number;
runs: number;
}
| {
type: "assistant-chunk";
surface: Exclude<EvalMode, "cli">;
caseId: string;
caseNumber: number;
totalCases: number;
attempt: number;
runs: number;
chunk: string;
}
| {
type: "assistant-message-end";
surface: Exclude<EvalMode, "cli">;
caseId: string;
caseNumber: number;
totalCases: number;
attempt: number;
runs: number;
};

View File

@@ -0,0 +1,36 @@
import { describe, expect, it } from "bun:test";
import { validateScriptState } from "./validators";
describe("validateScriptState", () => {
it("accepts semantically equivalent script implementations", () => {
const checks = validateScriptState({
actual: {
path: "f/evals/greet_user.ts",
lang: "bun",
code: "export async function main(name: string): Promise<string> {\n return `Hello, ${name}!`;\n}\n",
},
expected: {
path: "f/evals/greet_user.ts",
lang: "bun",
code: "export async function main(name: string) {\n\treturn `Hello, ${name}!`\n}\n",
},
});
expect(checks.every((check) => check.passed)).toBe(true);
});
it("still requires an exported main entrypoint", () => {
const checks = validateScriptState({
actual: {
path: "f/evals/greet_user.ts",
lang: "bun",
code: "async function main(name: string) {\n return `Hello, ${name}!`;\n}\n",
},
});
expect(checks).toContainEqual({
name: "script exports entrypoint",
passed: false,
});
});
});

1281
ai_evals/core/validators.ts Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,2 @@
main(name: string)
greeting: `Hello, ${name}!`

View File

@@ -0,0 +1,3 @@
export async function main(name: string) {
return { greeting: `Hello, ${name}!` };
}

View File

@@ -0,0 +1,2 @@
type: script
path: f/lib/format_greeting

View File

@@ -0,0 +1,3 @@
export async function main(name: string) {
return { greeting: `Hello, ${name}!` };
}

View File

@@ -0,0 +1,2 @@
def main(
return {"total": a + b}

View File

@@ -0,0 +1,20 @@
summary: Simple greeting flow
schema:
type: object
properties:
name:
type: string
description: Name to greet
required:
- name
value:
modules:
- id: hello_step
value:
type: rawscript
language: bun
content: !inline hello.ts
input_transforms:
name:
type: javascript
expr: flow_input.name

View File

@@ -0,0 +1,3 @@
export async function main(name: string) {
return { greeting: `Hello, ${name}!` };
}

View File

@@ -0,0 +1,3 @@
export async function main(name: string) {
return { greeting: `Hello, ${name}!` };
}

View File

@@ -0,0 +1,3 @@
export async function main(name: string) {
return { greeting: `Hello, ${name}!` };
}

Some files were not shown because too many files have changed in this diff Show More