When the Wizard and the Validator Disagree
Defense in Depth for Sony Eclipse Bracket Scripts
A user ran a corona bracket script on a Sony ZV-E10 using the libgphoto2 fallback path, and the safety guard that was supposed to catch unsafe scripts failed on four separate fronts. This is the postmortem.
TL;DR
A user ran a corona bracket script on a Sony ZV-E10 (libgphoto2 fallback) and hit four bugs in sequence:
- Bug 1: the UI capped brackets at 3 frames but the arm-time validator rejected ≥3, so the wizard let users build exactly what the runtime would refuse.
- Bug 2: the renderer snapshotted the camera backend once at init, so the Sony lazy bridge-swap was invisible to the UI and the gphoto2 bracket restriction wasn't shown at all.
- Bug 3: the wizard defaulted to "grouped" bracket mode on every open, meaning a user who toggled "sequential" last session had to re-toggle it every time. A grouped script on gphoto2 collapses back into the exact burst pattern the driver crashes on.
- Bug 4: the first version of the executor's safety override was silent; pre-existing grouped scripts ran as sequential on Sony+gphoto2 with no log entry and no warning.
All four are fixed in PR #116.
Background
The previous Sony post ended on what seemed like a solved problem: libgphoto2's Sony PC Remote driver terminates on back-to-back captures, which was fixed by guarding the burst-rate test path and surfacing a manual FPS entry instead. The IPC handler for camera:measureSlotFps now refuses to run the burst-rate test on a Sony+gphoto2 slot. The Equipment page hides the test button and shows an "enter FPS manually" prompt instead. Problem closed.
That guard protected the FPS measurement flow. What it didn't protect was the bracket execution flow — the actual sequence of captures fired during totality. The libgphoto2 Sony driver crashes the same way on N back-to-back gp_camera_capture calls during a corona bracket sequence. The burst-rate guard was the right fix for its specific code path. It didn't cover the code path that users run during an eclipse.
When a user ran a corona bracket script on a ZV-E10 and the bridge crashed mid-sequence, the arm-time validator did catch it — but only after the wizard had already let the user build the script and review the timeline. The error appeared at the moment of arming, not at the moment of choosing the wrong option. That latency is the shape of a deeper problem.
This post traces the four bugs that PR #116 fixed, each with its own failure mode and its own lesson about where constraints need to be enforced.
Bug 1: The UI Cap and the Runtime Validator Disagreed
The wizard's bracket-frame selector had a banner for Sony+gphoto2 slots: "Sony+gphoto2 limits brackets to 3 frames." The UI enforced this by capping the frame count at 3. Users could build a 3-frame corona bracket script, review the timeline, and click through to the Arm step.
The arm-time validator independently enforced a different rule: it rejected any totality sequence that contained ≥3 captures in rapid succession. Its error message read "reduce to 2 shots." So the wizard let you build exactly the configuration the runtime would refuse, and you didn't find out until you tried to arm.
The root cause is textbook: two layers each implementing the same constraint independently. They started at the same number and drifted. The fix was two-part. First, lower the cap to 2 (one under-exposed frame, one center frame — the safe minimum for corona bracketing). Second, pull the constraint logic into a shared module that both the arm-time validator and the wizard's Timeline Review step import from. The wizard's "Continue" button is now hard-blocked with the same error copy the validator would show — the user sees the problem at the step where they can still change their mind.
The lesson: when two layers enforce the same constraint, they must share a source of truth. Independent implementations of "the rule" silently drift. A shared validator module is the cheapest way to guarantee they stay aligned.
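A minimal sketch of what such a shared module could look like. The module shape, function names, and error copy here are illustrative assumptions, not the project's actual code; the point is only that the wizard and the arm-time validator call one function and therefore cannot drift.

```typescript
// Hypothetical shared module (names and shape are assumptions, not the project's code).
const SONY_GPHOTO2_MAX_BRACKET_FRAMES = 2; // the cap described in this post

interface SlotBackendInfo {
  vendor: string;  // e.g. 'sony'
  backend: string; // e.g. 'gphoto2' or 'sony_remote'
}

interface BracketCheck {
  ok: boolean;
  message?: string; // one piece of error copy, shown by every caller
}

function isSonyGphoto2(slot: SlotBackendInfo): boolean {
  return slot.vendor === 'sony' && slot.backend === 'gphoto2';
}

// The only place the rule is written down.
function checkBracketFrames(slot: SlotBackendInfo, frames: number): BracketCheck {
  if (!isSonyGphoto2(slot) || frames <= SONY_GPHOTO2_MAX_BRACKET_FRAMES) {
    return { ok: true };
  }
  return {
    ok: false,
    message: `Sony+gphoto2 limits brackets to ${SONY_GPHOTO2_MAX_BRACKET_FRAMES} frames; reduce to ${SONY_GPHOTO2_MAX_BRACKET_FRAMES} shots.`,
  };
}

// Wizard side: decides whether "Continue" is enabled.
function wizardCanContinue(slot: SlotBackendInfo, frames: number): boolean {
  return checkBracketFrames(slot, frames).ok;
}

// Arm side: rejects with the same copy the wizard would show.
function validateAtArm(slot: SlotBackendInfo, frames: number): void {
  const result = checkBracketFrames(slot, frames);
  if (!result.ok) throw new Error(result.message);
}
```

Changing the cap or the wording now means editing one constant or one template string, and both layers pick it up together.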
A bug I introduced fixing this one
The updated banner read: "Sony+gphoto2 limits brackets to 2 frames (one under/over pair, no center frame)." That's wrong. The bracket math is:
    // computeBracketShutterSpeeds(base, 2, evStep)
    const half = Math.floor(2 / 2); // = 1
    const offsets = [0, 1].map(i => (i - half) * evStep);
    // i=0 → (0-1)*evStep = -evStep (under-exposed)
    // i=1 → (1-1)*evStep = 0       (center)
With 2 frames and half = floor(2/2) = 1, the offsets are -evStep and 0 — that's under + center, not under + over. The user's execution log confirmed it: L29 [1/2]=1/250 (under), L29 [2/2]=1/125 (center). The banner advertised a pair the code never produces.
Wrong copy in a safety banner is worse than no copy. It tells users the system is doing something it isn't, which erodes trust in every other thing the UI says. The banner now reads "limits brackets to 2 frames (one stop under, one at base exposure)." The phrasing derives directly from what the algorithm produces, not from a hand-written description of what we thought it produced.
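One way to keep banner copy honest is to generate it from the same offset math. The sketch below assumes the offset formula quoted above generalizes to any frame count; computeBracketOffsets, describeOffset, and bannerCopy are hypothetical helpers written for illustration, and the wording they produce is not the shipped banner text.

```typescript
// Generalizes the quoted snippet: offsets in EV relative to the base exposure.
// Assumption: the real computeBracketShutterSpeeds uses this same formula for any N.
function computeBracketOffsets(frames: number, evStep: number): number[] {
  const half = Math.floor(frames / 2);
  return Array.from({ length: frames }, (_, i) => (i - half) * evStep);
}

// Hypothetical helper: turns one EV offset into human-readable copy.
function describeOffset(ev: number): string {
  if (ev === 0) return 'base exposure';
  return `${Math.abs(ev)} EV ${ev < 0 ? 'under' : 'over'}`;
}

// Hypothetical helper: the banner text is derived, never hand-written.
function bannerCopy(frames: number, evStep: number): string {
  const parts = computeBracketOffsets(frames, evStep).map(describeOffset);
  return `limits brackets to ${frames} frames (${parts.join(', ')})`;
}

console.log(bannerCopy(2, 1)); // limits brackets to 2 frames (1 EV under, base exposure)
```

With this shape, a future change to the offset formula changes the banner in the same commit, so the copy cannot describe a pair the code never produces.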
Bug 2: Stale Backend Field in the Renderer
The wizard's Sony+gphoto2 restrictions — the banner, the frame cap, the bracket-mode restrictions — all depend on the renderer knowing that a given slot is running via libgphoto2. That information lives in a backend field on each camera slot, exposed through the useCameraStore Zustand store.
The store populated this field once, at init, via a single getAllSlotInfos() IPC call. The problem: SonyCameraService selects its bridge lazily — it tries CrSDK first, and only swaps to gphoto2 inside the getCameras() call when CrSDK returns zero devices. That swap happens after the initial getAllSlotInfos() snapshot.
So on a ZV-E10 (which CrSDK doesn't support), the renderer fetched backend: 'sony_remote' at init, the service swapped to gphoto2 in the background, and the renderer's store stayed on 'sony_remote' forever. The sonyGphoto2BracketCap flag computed from that field stayed false. The wizard showed no banner. The user couldn't see the restriction until they hit arm-time.
The fix is in scanSlot and connectSlot, which now re-fetch via a targeted camera:getSlotInfo IPC call and merge the live backend into the store:
    // camera-store.ts, scanSlot (after connect/scan completes)
    const liveInfo = await window.api.invoke('camera:getSlotInfo', slotId);
    if (liveInfo) {
      set(state => {
        state.slots[slotId].backend = liveInfo.backend;
        state.slots[slotId].vendor = liveInfo.vendor;
      });
    }
camera:getSlotInfo asks the main process for the current slot state — including whichever backend SonyCameraService settled on after its lazy swap. The renderer now always has the live backend, not a startup snapshot.
The lesson: reactive UI state must subscribe to the source of truth, not snapshot it at startup. Lazy bridge swaps are invisible to a snapshot-once architecture. Whenever a service can change a property after its initial connection, the store needs an explicit re-fetch path tied to the event that causes the change.
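The stale-snapshot failure can be reproduced with a pure function, which also makes the merge step easy to unit-test without IPC. This is a sketch under assumptions: the CameraState shape, mergeLiveSlotInfo, and the way the bracket-cap flag is derived are simplified stand-ins, not the real store.

```typescript
interface SlotState {
  vendor: string;
  backend: string;
}

interface CameraState {
  slots: Record<string, SlotState>;
}

// Hypothetical pure merge: returns a new state with the live backend applied.
function mergeLiveSlotInfo(
  state: CameraState,
  slotId: string,
  live: SlotState | null,
): CameraState {
  const current = state.slots[slotId];
  if (!live || !current) return state; // nothing to merge
  return {
    ...state,
    slots: {
      ...state.slots,
      [slotId]: { ...current, backend: live.backend, vendor: live.vendor },
    },
  };
}

// Assumed derivation of the flag named in this post.
function sonyGphoto2BracketCap(slot: SlotState): boolean {
  return slot.vendor === 'sony' && slot.backend === 'gphoto2';
}

// Snapshot taken at init, before the lazy bridge swap.
const atInit: CameraState = { slots: { A: { vendor: 'sony', backend: 'sony_remote' } } };
// What the main process reports after the swap.
const afterScan = mergeLiveSlotInfo(atInit, 'A', { vendor: 'sony', backend: 'gphoto2' });

console.log(sonyGphoto2BracketCap(atInit.slots['A']));    // false: banner hidden
console.log(sonyGphoto2BracketCap(afterScan.slots['A'])); // true: banner shown
```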
Bug 3: Wizard Default Resets to "Grouped" on Every Open
eclipseClick's bracket executor has two modes for the totality bracket sequence: grouped (fire all N bracket lines back-to-back as a continuous burst before moving to the next script line) and sequential (fire one capture per bracket line, interleave with other script lines). Grouped is the right default for Canon and Nikon — it minimizes gap time inside a bracket set. For Sony+gphoto2, grouped mode collapses N bracket lines into N rapid-fire gp_camera_capture calls on the same settings — exactly the burst pattern the libgphoto2 Sony driver crashes on.
The wizard defaulted totalityBracketMode to 'grouped' on every fresh open — meaning every new script opened in a fresh wizard session started in the crash mode. A user who had correctly toggled "Sequential" on a previous script had to re-toggle it for every new script. The user's reported expectation was clear: "I asked it not to group" — they had toggled it in a prior session, and the next script quietly reverted.
The arm-time burst validator catches single-line bracketSteps ≥ 3, which Bug 1 addressed. But the grouped executor path fuses bracket lines from different script lines together — a cross-line fusion the arm validator couldn't see, because the validator analyzes each script line in isolation.
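The blind spot is easy to show with a toy model. The sketch below assumes that grouped mode fires the bracket captures of every listed line as one continuous burst; ScriptLine, the limit constant, and all three functions are illustrative and far simpler than the real script format.

```typescript
interface ScriptLine {
  id: string;
  bracketSteps: number;
}

const MAX_BACK_TO_BACK = 2; // per-line limit discussed in this post

// What a line-by-line validator sees: each line in isolation.
function passesLineByLine(lines: ScriptLine[]): boolean {
  return lines.every(line => line.bracketSteps <= MAX_BACK_TO_BACK);
}

// Toy model of grouped execution: every bracket capture fuses into one burst.
function groupedBurstLength(lines: ScriptLine[]): number {
  return lines.reduce((total, line) => total + line.bracketSteps, 0);
}

// A whole-script check that accounts for the fusion.
function passesWholeScript(lines: ScriptLine[], mode: 'grouped' | 'sequential'): boolean {
  if (mode === 'grouped') return groupedBurstLength(lines) <= MAX_BACK_TO_BACK;
  return passesLineByLine(lines);
}

const totality: ScriptLine[] = [
  { id: 'inner-corona', bracketSteps: 2 },
  { id: 'outer-corona', bracketSteps: 2 },
];

console.log(passesLineByLine(totality));             // true: each line looks safe
console.log(groupedBurstLength(totality));           // 4 back-to-back captures
console.log(passesWholeScript(totality, 'grouped')); // false
```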
The fix: for Sony+gphoto2 slots, the wizard auto-selects sequential and disables the Group toggle with an explanatory tooltip. The executor also force-overrides any pre-existing grouped metadata on a Sony+gphoto2 slot. The grouping check in the executor looks like this:
    // bracket-executor.ts
    const scriptWantsGrouped = script.metadata.totalityBracketMode === 'grouped';
    const sonyGphoto2 = service.getBackend() === 'gphoto2' && service.vendor === 'sony';
    const groupingAllowed = scriptWantsGrouped && !sonyGphoto2;
The lesson: defaults that don't persist across sessions silently regress user intent. If a user has to re-set the same option every time they open the wizard, the default is wrong for them. Prefer persisting the last-used value; when that's not feasible, auto-select the safe value for the active hardware and surface the choice visibly.
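That priority order (hardware safety first, then the user's last choice, then the generic default) can be written as one small function. This is a sketch only: resolveBracketModeDefault, the lastUsed field, and the tooltip wording are assumptions, and this post does not say whether last-used persistence was actually implemented.

```typescript
type BracketMode = 'grouped' | 'sequential';

interface WizardDefaultInput {
  sonyGphoto2: boolean;   // live backend check for the active slot
  lastUsed?: BracketMode; // hypothetical persisted preference
}

interface WizardDefault {
  mode: BracketMode;
  toggleDisabled: boolean;
  tooltip?: string; // explains why the toggle is locked
}

function resolveBracketModeDefault(input: WizardDefaultInput): WizardDefault {
  // 1. Hardware safety wins: grouped mode is never offered on Sony+gphoto2.
  if (input.sonyGphoto2) {
    return {
      mode: 'sequential',
      toggleDisabled: true,
      tooltip: 'Sony+gphoto2 does not support back-to-back captures.',
    };
  }
  // 2. Otherwise honor what the user chose last time, if anything.
  // 3. Fall back to grouped, the right default for Canon and Nikon.
  return { mode: input.lastUsed ?? 'grouped', toggleDisabled: false };
}
```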
Bug 4: The Fix to Bug 3 Had Its Own Bug
The first version of the executor override was silent. When groupingAllowed evaluated to false on Sony+gphoto2, the executor just ran sequentially — no log entry, nothing in the execution-log UI. The user might have a saved script with totalityBracketMode: 'grouped' from before the fix. That script would now run as sequential with no indication that anything changed. From the user's perspective: the script appears to work, but produces a capture pattern they didn't configure and can't explain.
The model reviewer flagged this during the PR: "any pre-existing script with grouped metadata runs as sequential on Sony+gphoto2 with no log, no warning, nothing in the execution-log UI. The user already burned a test cycle being confused about grouping." That's a trust problem. Users who observe that the system behaves differently from what they configured will start second-guessing every outcome.
The fix: the first time the override fires in each arm cycle, the executor pushes a yellow warning to the execution log and writes a FileLogger.info line:
[WARN] Script requested grouped bracket mode, but Sony+gphoto2 does not support back-to-back captures. Running as sequential instead. Update the script to avoid this warning.
The lesson: when you override a user's setting for safety, surface the override prominently. A silent correction is indistinguishable from a bug. A loud correction, even a yellow warning, tells the user exactly what happened and what to do about it. Invisible safety mechanisms erode predictability.
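A minimal sketch of a warn-once-per-arm-cycle override, assuming a simple boolean latch that resets when a new cycle begins. The GroupingOverride class and its method names are hypothetical; only the groupingAllowed expression and the warning text come from this post.

```typescript
interface LogEntry {
  level: 'info' | 'warn';
  message: string;
}

const OVERRIDE_WARNING =
  'Script requested grouped bracket mode, but Sony+gphoto2 does not support ' +
  'back-to-back captures. Running as sequential instead. Update the script to avoid this warning.';

class GroupingOverride {
  readonly logEntries: LogEntry[] = [];
  private warnedThisCycle = false;

  // Called at the start of every arm cycle: clears the log and the latch.
  beginArmCycle(): void {
    this.logEntries.length = 0;
    this.warnedThisCycle = false;
  }

  // Returns whether grouping may be used, warning once per cycle on override.
  resolve(scriptWantsGrouped: boolean, sonyGphoto2: boolean): boolean {
    const groupingAllowed = scriptWantsGrouped && !sonyGphoto2;
    const overridden = scriptWantsGrouped && !groupingAllowed;
    if (overridden && !this.warnedThisCycle) {
      this.warnedThisCycle = true;
      this.logEntries.push({ level: 'warn', message: OVERRIDE_WARNING });
    }
    return groupingAllowed;
  }
}
```

The latch keeps a long totality sequence from flooding the log, while the reset on each arm cycle guarantees the user sees the warning again on every run until the script is updated.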
Defense in Depth as Architecture
After the four fixes, there are now three distinct layers preventing unsafe bracket scripts from running on Sony+gphoto2:
Layer 1: Wizard gate (UX, wizard Timeline Review step)
├ Checks: frame count ≥3, grouped mode on gphoto2 slot
├ Action: hard-blocks "Continue", shows error copy
└ Catches: user error at the moment of authoring

Layer 2: Arm-time validator (last-chance, ipc.ts)
├ Checks: bracketSteps ≥3 in any totality sequence line
├ Action: rejects arm() with a structured error
└ Catches: scripts loaded from disk, API-created scripts

Layer 3: Executor override (defense, bracket-executor.ts)
├ Checks: grouped metadata on a gphoto2 Sony slot
├ Action: forces sequential + emits yellow warning to execution log
└ Catches: pre-existing scripts, race conditions in state
Each layer catches a different class of failure that the layers above it can miss. The wizard gate only runs for scripts built in the current session, so it can't inspect a script file loaded from disk. The arm-time validator checks the final script object but works line-by-line; it can't see the cross-line fusion that grouped mode creates. The executor override is the final backstop: it runs against the live backend at execution time, regardless of how the script was produced.
None of these layers can be removed without re-introducing a specific bug class. The wizard gate is the cheapest catch: it stops the problem before the user invests more time. The arm-time validator is the authoritative pre-flight check. The executor override is the safety net when the others are bypassed or absent. Together they cover every application-layer failure path described in this post.
Test Design: One Test That Wouldn't Pass
PR #116 adds 11 new tests across sony-gphoto2-arm-block.test.ts and slot-camera-store.test.ts. Most passed on the first run. One didn't, and it's worth explaining why.
The test for the executor warning message checked executor.logEntries after calling arm(). It passed when run in isolation. It failed when run after any other test in the file, because arm() clears logEntries at the start of each arm cycle. Any prior test that triggered the same executor path had already consumed and flushed the warning.
The fix was to capture warnings via the sendFn spy — the third constructor argument to the executor, which accumulates push notifications across arm cycles rather than clearing with each one. The test now asserts against the accumulated sendFn calls instead of the per-cycle log. This pattern also happens to be the correct way to test any "first-time-per-cycle" warning: the in-cycle log is ephemeral by design; the notification channel is the durable signal.
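The difference between the two signals can be shown with a toy executor and a hand-rolled spy. Everything here is a simplified assumption: ToyExecutor takes the send function as its only constructor argument (the real executor takes it third), and the channel name and payload shape are invented for the example.

```typescript
interface Notification {
  channel: string;
  level: string;
  message: string;
}

type SendFn = (channel: string, level: string, message: string) => void;

// Hand-rolled spy: records every call and never clears between arm cycles.
function makeSendSpy(): { fn: SendFn; calls: Notification[] } {
  const calls: Notification[] = [];
  const fn: SendFn = (channel, level, message) => {
    calls.push({ channel, level, message });
  };
  return { fn, calls };
}

class ToyExecutor {
  logEntries: string[] = [];
  private readonly send: SendFn;

  constructor(send: SendFn) {
    this.send = send;
  }

  // overrideFires stands in for "grouped script on a Sony+gphoto2 slot".
  arm(overrideFires: boolean): void {
    this.logEntries = []; // per-cycle log is cleared on every arm()
    if (overrideFires) {
      const message = 'Running as sequential instead.';
      this.logEntries.push(message);
      this.send('execution-log', 'warn', message);
    }
  }
}

const spy = makeSendSpy();
const executor = new ToyExecutor(spy.fn);
executor.arm(true);  // warning fires
executor.arm(false); // a later cycle wipes the per-cycle log

console.log(executor.logEntries.length); // 0: the ephemeral log lost the warning
console.log(spy.calls.length);           // 1: the spy still has it
```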
What This Didn't Fix
The four bugs above are all application-layer issues — wizard state, renderer store freshness, executor logic, logging. The underlying libgphoto2 bridge has its own open limitations that are separate work:
- Post-capture gate sized for fast cards. The bridge inserts a 500ms SONY_POST_CAPTURE_GATE_US delay between captures, sized for UHS-II V90 cards. Slower cards still silently drop captures: the gate is too short for them, and there's no dynamic backpressure mechanism to extend it when the card is busy.
- No capture confirmation. gp_camera_trigger_capture is fire-and-forget. The bridge doesn't wait for a GP_EVENT_FILE_ADDED event before reporting success, so a capture that the camera acknowledges but silently fails to write will look successful in the execution log.
- F-number set failures bypass the busy-retry loop. When the executor sets aperture between bracket steps, libgphoto2 sometimes returns -2 (GP_ERROR_BAD_PARAMETERS) rather than -110 (GP_ERROR_CAMERA_BUSY). The busy-retry loop only handles -110; -2 falls through without a retry and silently skips the property set.
These are bridge-side problems that require changes in nikon-camera-bridge-native/ (the gPhoto2Bridge executable). The application layer can't compensate for a bridge that drops captures silently or skips property sets — the fix has to happen at the protocol level. That work is tracked separately from this PR.