mcu(health): commit rate-limit window before early returns (AUDIT-CAL follow-up)

checkSystemHealth() had three watchdog blocks with the identical
"last_X_check not updated on error path" bug — same root cause as
AUDIT-CAL (BMP180 fix in commit 95aed35), distinct sites:

  AD9523 clock check   (5 s)   main.cpp:693-705
  ADAR1000 comm check  (2 s)   main.cpp:729-749
  IMU comm check       (10 s)  main.cpp:752-760

Pre-fix, each block placed `last_X_check = HAL_GetTick();` below the
early-return path, so once the underlying check (STATUS0/1 RESET,
SCRATCHPAD verify fail, GY85_Update false) started failing, the
rate-limit window never engaged. Every subsequent iteration of the
main while(1) loop re-fired the corresponding ERROR_*. With
error_count > 10 latching system_emergency_state per MCU-N1, the
radar would trip into SAFE-MODE within ~10 main-loop iterations of
the first transient — far short of the intended ~100-150 s grace
window meant for operator intervention or attemptErrorRecovery
to succeed. An ADAR1000 comm failure also re-ran the 16 ms blocking
SPI verify (4 devices × 4 ms HAL_Delay) on every iteration, which
added chirp jitter.

Fix at all three sites: move the timestamp update INTO the if-block
and BEFORE any sub-check call. Mirrors the AUDIT-CAL post-fix
BMP180 block at main.cpp:771-780. ADAR1000 overtemp check stays
per-loop (unchanged) — over-temperature must remain responsive.
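
A minimal host-side sketch of the reordering, using the ADAR1000 site
(2 s window, 16 ms verify) as the example. It is not the main.cpp
code: last_adar_check, adar_verify_all(), the `>` comparison and the
1 ms loop period are assumptions made for illustration, and
HAL_GetTick() is a local stub, not the STM32 HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADAR_CHECK_PERIOD_MS 2000u  /* 2 s window, from the commit message */
#define ADAR_VERIFY_COST_MS  16u    /* 4 devices x 4 ms, from the commit message */

static uint32_t sim_tick_ms;        /* simulated millisecond tick */
static unsigned verify_calls;       /* how often the blocking verify ran */
static uint32_t last_adar_check;    /* hypothetical name for last_X_check */

/* Local stub standing in for the HAL tick source. */
static uint32_t HAL_GetTick(void) { return sim_tick_ms; }

/* Hypothetical SCRATCHPAD verify: always fails and burns blocking time. */
static bool adar_verify_all(void)
{
    verify_calls++;
    sim_tick_ms += ADAR_VERIFY_COST_MS;
    return false;
}

/* Pre-fix shape: the timestamp update sits below the early return. */
static bool adar_check_prefix(void)
{
    if (HAL_GetTick() - last_adar_check > ADAR_CHECK_PERIOD_MS) {
        if (!adar_verify_all()) {
            return false;                   /* window never committed */
        }
        last_adar_check = HAL_GetTick();
    }
    return true;
}

/* Post-fix shape: commit the window first, then run the sub-check. */
static bool adar_check_postfix(void)
{
    if (HAL_GetTick() - last_adar_check > ADAR_CHECK_PERIOD_MS) {
        last_adar_check = HAL_GetTick();    /* committed before any return */
        if (!adar_verify_all()) {
            return false;
        }
    }
    return true;
}

/* Run `passes` loop iterations, 1 ms apart, against a failing device
 * and report how many times the blocking verify was invoked. */
static unsigned count_verifies(bool (*check)(void), unsigned passes)
{
    assert(check);
    sim_tick_ms = ADAR_CHECK_PERIOD_MS + 1u;  /* window already expired */
    last_adar_check = 0u;
    verify_calls = 0u;
    for (unsigned i = 0; i < passes; i++) {
        (void)check();
        sim_tick_ms += 1u;
    }
    return verify_calls;
}
```

Under these assumptions, 100 passes trigger 100 blocking verifies with
the pre-fix shape and a single one with the post-fix shape.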

Test: tests/test_audit_imu_watchdog_cadence.c (6 tests, 6/6 PASS)
exercises the post-fix predicate against simulated HAL_GetTick()
ticks and a controllable GY85_Update() mock; counter-test runs the
pre-fix predicate to demonstrate the regression. Test uses IMU as
representative; AD9523 (5 s) and ADAR1000 (2 s) sites have identical
control flow.
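
The cadence idea can be sketched as below. This is an illustration
only, not the contents of tests/test_audit_imu_watchdog_cadence.c:
the mock names, last_imu_check, the `>` comparison, the 1 ms loop
period and ms_until_latch() are all assumptions. Only the 10 s window
and the error_count > 10 latch threshold come from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IMU_CHECK_PERIOD_MS   10000u  /* 10 s window, from the commit message */
#define ERROR_LATCH_THRESHOLD 10u     /* error_count > 10 latches (MCU-N1) */

static uint32_t mock_tick_ms;         /* simulated HAL_GetTick() value */
static bool     mock_imu_ok;          /* result returned by the IMU mock */
static uint32_t last_imu_check;       /* hypothetical name for last_X_check */
static unsigned error_count;

/* Local mocks; neither is the real HAL or GY85 driver. */
static uint32_t HAL_GetTick(void) { return mock_tick_ms; }
static bool GY85_Update(void) { return mock_imu_ok; }

/* Pre-fix predicate: a failing update skips the timestamp commit. */
static void imu_watchdog_prefix(void)
{
    if (HAL_GetTick() - last_imu_check > IMU_CHECK_PERIOD_MS) {
        if (!GY85_Update()) {
            error_count++;
            return;
        }
        last_imu_check = HAL_GetTick();
    }
}

/* Post-fix predicate: the window is committed before the update runs. */
static void imu_watchdog_postfix(void)
{
    if (HAL_GetTick() - last_imu_check > IMU_CHECK_PERIOD_MS) {
        last_imu_check = HAL_GetTick();
        if (!GY85_Update()) {
            error_count++;
        }
    }
}

/* Simulated ms from the first failing check until error_count exceeds
 * the latch threshold, stepping 1 ms per loop pass. Returns UINT32_MAX
 * if the latch is not reached within limit_ms. */
static uint32_t ms_until_latch(void (*watchdog)(void), uint32_t limit_ms)
{
    assert(watchdog);
    mock_tick_ms = IMU_CHECK_PERIOD_MS + 1u;  /* window already expired */
    last_imu_check = 0u;
    error_count = 0u;
    mock_imu_ok = false;                      /* IMU fails permanently */

    const uint32_t start = mock_tick_ms;
    while (mock_tick_ms - start < limit_ms) {
        watchdog();
        if (error_count > ERROR_LATCH_THRESHOLD) {
            return mock_tick_ms - start;
        }
        mock_tick_ms += 1u;
    }
    return UINT32_MAX;
}
```

With these assumed numbers the pre-fix predicate reaches the latch
after 10 simulated ms, while the post-fix predicate takes 100010 ms
(about 100 s).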

Verification: full MCU host suite 34/34 PASS (was 33/33; +1 new test,
0 regressions).
Jason
2026-04-29 20:57:50 +05:45
parent 1f307f77a9
commit 1b1b5f4fb2
3 changed files with 276 additions and 5 deletions
@@ -74,6 +74,7 @@ TESTS_STANDALONE := test_bug12_pa_cal_loop_inverted \
 test_mcu_a4_ocxo_warm_restart \
 test_audit_c17_bmp180_sentinel_and_cast \
 test_audit_cal_bmp180_begin \
+test_audit_imu_watchdog_cadence \
 test_gap3_iwdg_config \
 test_gap3_temperature_max \
 test_gap3_idq_periodic_reread \
@@ -191,6 +192,9 @@ test_audit_c17_bmp180_sentinel_and_cast: test_audit_c17_bmp180_sentinel_and_cast
 test_audit_cal_bmp180_begin: test_audit_cal_bmp180_begin.c
 	$(CC) $(CFLAGS) $< -o $@
 
+test_audit_imu_watchdog_cadence: test_audit_imu_watchdog_cadence.c
+	$(CC) $(CFLAGS) $< -o $@
+
 # Gap-3 safety tests -- mock-only (needs spy log for GPIO sequence)
 test_gap3_emergency_stop_rails: test_gap3_emergency_stop_rails.c $(MOCK_OBJS)
 	$(CC) $(CFLAGS) $(INCLUDES) $< $(MOCK_OBJS) -o $@