mcu(health): poll PD15 + dispatch ERROR_FPGA_DSP_STALL (AUDIT-S10 follow-up)

AUDIT-S10 (commit `58154a6`) split the FPGA's six-flag aggregate
gpio_dig5 into two MCU-visible bits: gpio_dig5 keeps signal-saturation
(AGC reacts), gpio_dig7 (PD15) carries control-fault classes
(range_decim_watchdog | cic_fir_overrun). Until now the MCU did NOT
poll PD15, so DSP control faults were invisible to the recovery
dispatcher.

Changes:

- New `ERROR_FPGA_DSP_STALL` enum value placed AFTER ERROR_WATCHDOG_TIMEOUT
  so the dispatcher routes to attemptErrorRecovery (FPGA reset pulse) not
  Emergency_Stop. Updated error_strings[] in lockstep (static_assert
  enforces).

- checkSystemHealth section 10 polls PD15 at 1 Hz with 2-sample debounce.
  `last_dsp_check` is committed BEFORE the early return per AUDIT-CAL
  pattern, so a flapping fault never bypasses the rate-limit. Streak
  counter resets to 0 after firing (armed for next post-recovery
  assertion) AND resets naturally when PD15 returns LOW.

- attemptErrorRecovery: ERROR_FPGA_DSP_STALL fans into the existing
  ERROR_FPGA_COMM PD12 reset case (stacked case labels, same body). No
  MCU-driven reset_monitors path exists; full bitstream reload clears
  all sticky monitors as a side effect.

Tests:
- tests/test_audit_s10_dsp_stall_polling.c (NEW, 7 scenarios, 7/7 PASS):
  T1 healthy 60s, T2 single-sample glitch blocked by debounce, T3
  sustained fault fires once, T4 post-fire rate-limit holds within
  window, T5 sustained fault rate bounded (29 errors / 60s -- MCU-N1
  latch at error_count>10 fires in ~22s, gives operator time to
  intervene), T6 counter-test demos no-debounce false-positive on
  glitch, T7 HAL_GetTick 32-bit wrap.
- MCU host suite 35/35 PASS (was 34/34; +1 new, 0 regressions).
This commit is contained in:
Jason
2026-04-29 23:42:21 +05:45
parent 853d2a5fd9
commit 534905263f
3 changed files with 313 additions and 10 deletions
@@ -75,6 +75,7 @@ TESTS_STANDALONE := test_bug12_pa_cal_loop_inverted \
test_audit_c17_bmp180_sentinel_and_cast \
test_audit_cal_bmp180_begin \
test_audit_imu_watchdog_cadence \
test_audit_s10_dsp_stall_polling \
test_gap3_iwdg_config \
test_gap3_temperature_max \
test_gap3_idq_periodic_reread \
@@ -195,6 +196,9 @@ test_audit_cal_bmp180_begin: test_audit_cal_bmp180_begin.c
test_audit_imu_watchdog_cadence: test_audit_imu_watchdog_cadence.c
$(CC) $(CFLAGS) $< -o $@
test_audit_s10_dsp_stall_polling: test_audit_s10_dsp_stall_polling.c
$(CC) $(CFLAGS) $< -o $@
# Gap-3 safety tests -- mock-only (needs spy log for GPIO sequence)
test_gap3_emergency_stop_rails: test_gap3_emergency_stop_rails.c $(MOCK_OBJS)
$(CC) $(CFLAGS) $(INCLUDES) $< $(MOCK_OBJS) -o $@