API hygiene. setADTR1107Mode flips ADTR1107 PA/LNA bias registers but
does NOT touch the per-channel ADAR1000 RX/TX enable bits. Production
always reaches it through setAllDevicesTXMode / setAllDevicesRXMode,
which emit both halves. Leaving setADTR1107Mode public after F-6.1
removed the other public mode-switch wrappers invited a future caller
to invoke it directly and end up in a mismatched bias-vs-enable state.
Move the declaration to the private section with a short comment
explaining why the wrappers are the only sanctioned entry point.
Latent bit-mask hygiene gap. setADTR1107Mode(TX) was asserting BIAS_EN
(bit 5) without first clearing LNA_BIAS_OUT_EN (bit 4); the RX branch
mirrored the bug. On any TX→RX→TX (or symmetric) transition through
this register both PA and LNA bias outputs would end up enabled
simultaneously. Production today only ever calls one direction at boot
and the opposite at shutdown — never both during normal operation —
so the bug was unreachable, but a future per-chirp SPI mode switch
would trip it.
Now each branch resetBit's the opposite enable before asserting its
own. 1 line per branch loop (not 1 per device — used the existing
for-dev loop).
Stage-6 ADTR1107 audit cleanup. Delete 4 unused public methods plus
their 2 internal helpers from ADAR1000_Manager.cpp/h. Production boot
goes through main.cpp's C-style systemPowerUpSequence(), so the C++
ADAR1000Manager::powerUpSystem / powerDownSystem / switchToTXMode /
switchToRXMode wrappers had zero call sites; the same was true of the
setPABias / setLNABias helpers, only ever invoked from the dead
switchTo* paths. -130 LOC, no behavioral change.
kPaBiasRxSafe is intentionally KEPT — it is live-used inside
setADTR1107Mode(RX) as the safe PA bias when transitioning to RX.
F-5.1: revert PWM scaffolding to binary DELADJ. Schematic-verified:
PG7/PG13 on STM32F746ZGT7 have no TIM3 alternate function (Port G AFs are
FMC/ETH/USART6/SAI2/SDMMC2 — no TIMx routes), and the FreqSynth-board
DELADJ net has only a 200 kOhm pulldown (R22, R35) — no series-R +
shunt-C LPF for PWM-to-DC. The 3979693 (Bug #5) + c466021 (B15) PWM
scaffolding was a false-fix; 5fbe97f's original honest TODO matched the
actual hardware. Delete htim3, MX_TIM3_Init, start/stop_deladj_pwm,
phase_ps_to_duty_cycle. Rewrite test_bug5 for binary; delete test_bug15.
F-5.2: split ADF4382A ref_div per device. RX 10.38 GHz / 300 MHz = 34.6 is
fractional mode, but ADF4382_PFD_FREQ_FRAC_MAX = 250 MHz — driver does
not reject the out-of-spec config, ldwin_pw silently left at 0. Set
rx_param.ref_div = 2 -> PFD = 150 MHz, in spec. TX unchanged (integer).
F-5.3: free prior tx_dev/rx_dev in Manager_Init before re-allocating. The
recovery dispatch on TX/RX unlock calls Manager_Init again; previous
adf4382_dev allocations were leaking. Mirrors F-4.5 fix for AD9523.
F-5.4: fix upstream adf4382_remove() — only freed dev struct on FAILED SPI
removal (success path leaked) and always returned 0. Now: NULL guard,
unconditional free, propagate ret.
F-5.8: lock-detect uses register reg[0x58] LOCKED bit as authoritative.
GPIO disagreement still logged via DIAG_WARN but no longer flips the
result — a mis-routed GPIO LKDET would otherwise trigger false-unlock
recovery loops.
F-5.10: drop stale "EZSYNC" diagnostic string (post-C-14a residue).
Bench-side checks for first power-on:
- Scope PG13 (TX_DELADJ) and PG7 (RX_DELADJ) — both should be HIGH (3.3V)
after SetPhaseShift(500,500) runs at boot.
- Confirm both ADF4382A LOs lock with PFD=150 MHz on RX (was 300 MHz).
Lock-time may be slightly longer; phase-noise sidebands shift.
- Confirm no false-unlock storms on the recovery path — the GPIO LKDET
disagreement DIAG_WARN should no longer flip the lock decision.
Regression: tests/ make test 34/34 PASS (was 35/35 baseline; -1 from
test_bug15 deletion as planned).
The F-4.1+4.2+4.7 patch (ddc0df4) made ad9523_init() run before the
user pdata overrides, which means pll1_bypass_en=0 (the previous
override) is now actually honoured by the driver. Combined with the
fact that pll1_charge_pump_current_nA and pll1_feedback_div were
never set in main.cpp, PLL1 would be expected active but couldn't
lock (CP=0) — ad9523_status() with bypass_en=0 checks PLL1+REFA+REFB
bits, so the failure surfaces, returns -1, and configure_ad9523()
halts boot at main.cpp:1742.
Option A: set pll1_bypass_en=1. VCXO free-runs on its own crystal
stability; ad9523_status() skips PLL1 checks. Boot path is now
clean. Trade-off: VCXO frequency drifts with temperature (~±20 ppm
over -40°C..+85°C for typical XO) — acceptable for first-flight
checkout, but eventual production should re-enable PLL1 (Option B,
deferred to F-4.3/4.4 with measured loop-filter values).
Comment notes the deferral and what's needed before flipping to
bypass=0 (CP current + loop filter rzero tuned to VCXO Kvco).
Regression: 86/0.
F-4.5: ad9523_setup() malloc's both an ad9523_dev and a no_os SPI
descriptor (ad9523.c:430,435). Previously the dev pointer was local
to configure_ad9523() and fell out of scope on return — every
recovery cycle (ERROR_AD9523_CLOCK → re-run configure_ad9523) leaked
one struct + one SPI desc. STM32F7 heap is bounded; sustained
brown-out flapping would eventually exhaust it. Move dev to a file-
scope `g_ad9523_dev` and call ad9523_remove() at the top of
configure_ad9523() to free the previous instance before re-setup.
Initial boot path is unaffected (g_ad9523_dev=NULL → remove call
gated by NULL check).
F-4.6: ad9523_setup() called ad9523_calibrate() but discarded its
return value (ad9523.c:707). VCO calibration can fail silently — if
the target VCO is outside the 3.6-4.0 GHz band (e.g. F-4.1 wipe left
PLL2 N=16, target 1.6 GHz), calibrate would report failure but setup
still proceeded to ad9523_status(), where PLL2_LD might pass
spuriously. Capture and propagate the calibrate return so a failed
calibration aborts setup with a clear non-zero status code instead
of being absorbed.
Both fixes are mechanical and don't change correct-path behaviour.
Regression: 86/0 (mocks bypass real driver, so F-4.6 is not covered
by tests; F-4.5 changes are in main.cpp and don't trip mocked
configure_ad9523).
Three coupled bugs in configure_ad9523() that together prevented the
AD9523 from producing the labelled output frequencies:
F-4.1: ad9523_init() unconditionally overwrites every field in the
caller's pdata (vcxo_freq=0, pll1_bypass_en=1, pll2_ndiv_b_cnt=4,
all channel fields). Calling it AFTER customization wiped every user
value. Reorder: call ad9523_init() before the pdata.X = Y block; user
overrides land on top of ADI defaults instead of being wiped.
F-4.2: pll2_vco_diff_m1 / m2 are required (range 3..5 per datasheet)
but were left at 0 from memset. The driver's AD_IFE() macro promotes
m=0 to M_PWR_DOWN_EN, killing channels 4-9 (ADC, SYNC, FPGA system
clock, DAC). Set m1=m2=3 explicitly.
F-4.7: AD9523 has no VCO-direct path for OUT4-OUT9; channels source
M1 or M2 only (datasheet + ad9523_vco_out_map register definitions
confirmed). With VCO 3.6 GHz and m1=3, channel dividers see 1.2 GHz,
not 3.6 GHz — every channel_divider in main.cpp was 3x too large.
Updated values:
OUT0/1 (ADF4382A REF, 300 MHz): /12 -> /4
OUT4/5 (ADC + FPGA_ADC, 400 MHz): /9 -> /3
OUT6 (FPGA SYSCLK, 100 MHz): /36 -> /12
OUT7 (FPGA TEST, 20 MHz): /180 -> /60
OUT8/9 (SYNC, 60 MHz): /60 -> /20
OUT10/11 (DAC, 120 MHz): /30 -> /10
m1=3 is the unique choice for this labelled frequency set (m1=4 fails
OUT4, m1=5 fails OUT0/1).
PLL1 (F-4.3/4.4) is not addressed here — pll1_bypass_en=0 with
pll1_charge_pump_current_nA still 0 means PLL1 won't lock and status()
will report it. Decide bypass strategy before bench.
Test mocks (ad_driver_mock.c) bypass the real driver, so this is not
caught by make. Regression: 86/0 (unchanged).
Bench-verify OUT4=400MHz and OUT6=100MHz with scope before trusting
downstream. F-1.10 (which crystal is fitted on X5/X6) goes in the
same bench session — F-4.7 resolution shows 100 MHz VCXO is the only
math-coherent choice regardless of BOM document.
Production firmware never used SYNC_METHOD_EZSYNC — both callsites
(main.cpp:938 recovery, main.cpp:1955 boot) pass SYNC_METHOD_TIMED.
The original audit C-14 flagged TX/RX SPI skew in EZSync's trigger
sequence, but the path was dead from production; only test_bug3
referenced it for spy-harness regression coverage.
Removed:
- SYNC_METHOD_EZSYNC enum value
- ADF4382A_SetupEZSync function (and declaration)
- ADF4382A_TriggerEZSync function (and declaration)
- EZSync branch in ADF4382A_Manager_Init (collapsed to unconditional
SetupTimedSync call)
- test_bug3_timed_sync_noop.c Test C (EZSync regression coverage)
Production header and test shim header both cleaned. SyncMethod enum
kept as single-value to avoid touching the 7 other test callers that
pass SYNC_METHOD_TIMED.
Residual concern (separate from original C-14): ADF4382A_TriggerTimedSync
uses the same TX-then-RX sw_sync SPI sequencing pattern as the deleted
EZSync trigger. ~5 µs SPI gap between TX-armed and RX-armed means TX
and RX may capture different SYNCP/SYNCN edges (60 MHz cycle = 16.7 ns,
~300 edges in the gap). External SYNCP only provides simultaneity if
both devices are armed before a common edge. Hardware bench-test
required to confirm operational tolerance; cannot fix in firmware
without DMA SPI burst rewrite.
Regression: 86/0 (matches baseline).
main.cpp pre-PR-F constants caused two issues:
- m_max = 32 disagreed with RP_CHIRPS_PER_FRAME = 48 (3 sub-frames * 16);
getStatusString reported "32 chirps/position" to the GUI, false telemetry.
- PRI MEDIUM = 161 us (PR-Q.1 stagger) was missing entirely; the MCU only
knew SHORT=175 / LONG=167. T2 was also stuck at the pre-PR-E 0.5 us
SHORT chirp width; PR-E switched to 1.0 us.
Fixes:
- m_max 32 -> 48; T2 0.5 -> 1.0; new T_MEDIUM=5.0, PRI_MEDIUM=161.0 constants.
- Big doc-comment above runRadarPulseSequence states the production stance:
FPGA cold-resets to mode 2'b01 (auto-scan) so the MCU's chirp GPIO toggles
are no-ops; pass-through mode 2'b00 needs a 3-PRI loop the MCU does not
yet emit, so mode-00 is operationally unsupported until that's built.
- Removed the redundant /* */ block-comment shadow of the same constants
that had `T2` defined twice (typo for `PRI2`); pure dead-code cleanup.
- test_bug16_runradar_shadows_globals.c m_max 32 -> 48 with refreshed
arithmetic comment; binary still PASSes all 4 checks (g_m wraps to 1
each iter regardless of m_max value).
No GPIO timing change (would need hardware verification). Audit P-5 closes
with the documented mode-01 stance; rebuilding the loop for mode-00 stays
on the backlog if/when pass-through becomes a deployment requirement.
AUDIT-S10 (commit `58154a6`) split the FPGA's six-flag aggregate
gpio_dig5 into two MCU-visible bits: gpio_dig5 keeps signal-saturation
(AGC reacts), gpio_dig7 (PD15) carries control-fault classes
(range_decim_watchdog | cic_fir_overrun). Until now the MCU did NOT
poll PD15, so DSP control faults were invisible to the recovery
dispatcher.
Changes:
- New `ERROR_FPGA_DSP_STALL` enum value placed AFTER ERROR_WATCHDOG_TIMEOUT
so the dispatcher routes to attemptErrorRecovery (FPGA reset pulse) not
Emergency_Stop. Updated error_strings[] in lockstep (static_assert
enforces).
- checkSystemHealth section 10 polls PD15 at 1 Hz with 2-sample debounce.
`last_dsp_check` is committed BEFORE the early return per AUDIT-CAL
pattern, so a flapping fault never bypasses the rate-limit. Streak
counter resets to 0 after firing (armed for next post-recovery
assertion) AND resets naturally when PD15 returns LOW.
- attemptErrorRecovery: ERROR_FPGA_DSP_STALL fans into the existing
ERROR_FPGA_COMM PD12 reset case (stacked case labels, same body). No
MCU-driven reset_monitors path exists; full bitstream reload clears
all sticky monitors as a side effect.
Tests:
- tests/test_audit_s10_dsp_stall_polling.c (NEW, 7 scenarios, 7/7 PASS):
T1 healthy 60s, T2 single-sample glitch blocked by debounce, T3
sustained fault fires once, T4 post-fire rate-limit holds within
window, T5 sustained fault rate bounded (29 errors / 60s -- MCU-N1
latch at error_count>10 fires in ~22s, gives operator time to
intervene), T6 counter-test demos no-debounce false-positive on
glitch, T7 HAL_GetTick 32-bit wrap.
- MCU host suite 35/35 PASS (was 34/34; +1 new, 0 regressions).
checkSystemHealth() had three watchdog blocks with the identical
"last_X_check not updated on error path" bug — same root cause as
AUDIT-CAL (BMP180 fix in commit 95aed35), distinct sites:
AD9523 clock check (5 s) main.cpp:693-705
ADAR1000 comm check (2 s) main.cpp:729-749
IMU comm check (10 s) main.cpp:752-760
Pre-fix, each block placed `last_X_check = HAL_GetTick();` below the
early-return path, so once the underlying check (STATUS0/1 RESET,
SCRATCHPAD verify fail, GY85_Update false) started failing, the
rate-limit window never engaged. Every subsequent iteration of the
main while(1) loop re-fired the corresponding ERROR_*. With
error_count > 10 latching system_emergency_state per MCU-N1, the
radar would trip into SAFE-MODE within ~10 main-loop iterations of
the first transient — far short of the intended ~100-150 s grace
window meant for operator intervention or attemptErrorRecovery
to succeed. ADAR1000 comm-failure also re-ran the 16 ms blocking
SPI verify (4 devices × 4 ms HAL_Delay) per iteration → chirp jitter.
Fix at all three sites: move the timestamp update INTO the if-block
and BEFORE any sub-check call. Mirrors the AUDIT-CAL post-fix
BMP180 block at main.cpp:771-780. ADAR1000 overtemp check stays
per-loop (unchanged) — over-temperature must remain responsive.
Test: tests/test_audit_imu_watchdog_cadence.c (6 tests, 6/6 PASS)
exercises the post-fix predicate against simulated HAL_GetTick()
ticks and a controllable GY85_Update() mock; counter-test runs the
pre-fix predicate to demonstrate the regression. Test uses IMU as
representative; AD9523 (5 s) and ADAR1000 (2 s) sites have identical
control flow.
Verification: full MCU host suite 34/34 PASS (was 33/33; +1 new test,
0 regressions).
The BMP180 driver had no public init method and never called
readCalibrationCoefficients() from anywhere -- _calCoeff ran at the
C++ in-class member-initializer defaults (all zeros) at runtime.
Consequence chain:
- computeB5(UT) short-circuited via 0/0 (Cortex-M7 SDIV with
SCB->CCR.DIV_0_TRP=0 returns 0 silently -- system_stm32f7xx.c does
not enable the trap)
- getPressure() always tripped the `if (B4 == 0)` guard, returning
the I2C-error sentinel (post-AUDIT-C17: INT32_MIN; pre-: 255)
- health watchdog at main.cpp:758 fired ERROR_BMP180_COMM every
main-loop iteration because last_bmp_check was only updated on the
success path, so the 15 s rate-limit never engaged once the check
started failing
- error_count > 10 latched system_emergency_state = true (per the
MCU-N1 fix), driving SAFE-MODE within ~25 s of every boot
Fix:
- Added BMP180::begin() public method: probes chip ID, then reads the
11 factory cal coefficients (registers 0xAA..0xBE step 2). Returns
true only on full success; false on chip-ID mismatch or any I2C
failure mid-loop.
- main.cpp BAROMETER INIT calls myBMP.begin() with up to 3 retries
(50 ms backoff) and sets a file-scope bmp180_operational flag.
Altitude-baseline loop now gated on success -- failure path leaves
RADAR_Altitude at 0.0f instead of letting pow(negative, fractional)
propagate NaN into gps_data telemetry.
- Health watchdog gates BMP180 check on bmp180_operational AND
updates last_bmp_check regardless of the error path. A single bad
pressure reading no longer tight-loops into SAFE-MODE; legit sensor
failure now takes the intended ~150 s (10 errors x 15 s) before
the MCU-N1 latch trips, giving the operator time to intervene.
Verification:
- new test_audit_cal_bmp180_begin.c, 3/3 PASS:
T1 every coefficient loaded in order with correct signed/unsigned types
T2 chip-mismatch and I2C-fail short-circuit semantics correct
T3 regression demo: zero-cal computeB5 returns 0 for any UT (the
silent-fail mode); datasheet cal reproduces 15.0 C
- full MCU regression 33/33 PASS (was 32/32; +1 new test, 0 regressions)
Bug introduced in 5fbe97f (initial upload of the driver from the
Arduino enjoyneering79 BMP180 library -- the begin()/init pattern from
the upstream Arduino version was lost in the STM32 port). Latent until
this audit cycle.
BMP180_ERROR=255 was an in-band sentinel returned by uint16_t I/O helpers
(read16, readRawTemperature) on I2C failure. 255 is also a valid uint16
register reading (0x00FF appears across the calibration block and is
reachable as a raw temperature/pressure sample), so a sensor failure was
indistinguishable from a real reading.
getTemperature() additionally narrowed the uint16_t raw read to int16_t
before passing to computeB5(). Raw bit-patterns >= 0x8000 (reachable across
the BMP180 -40..+85 C operating window) flipped to negative int16_t and
sign-extended into computeB5(), producing temperature errors of order
100s of C (e.g. -347 C instead of +51 C for raw UT = 0x8000).
Fix:
- Internal I/O helpers (read8/read16/readRawTemperature/readRawPressure)
now return bool and pass the value through an out-param. None of the
new sentinels collide with valid sensor output:
* getTemperature -> NaN on error
* getPressure -> INT32_MIN on error
* getSeaLevelPressure -> INT32_MIN on error
- getTemperature() keeps raw as uint16_t and widens value-preservingly
via (int32_t)raw before computeB5().
- readRawPressure() reads XLSB through the bool-out-param contract;
previously OR'd in 0xFF on I2C fail, silently corrupting the LSB.
Verification: test_audit_c17_bmp180_sentinel_and_cast 4/4 PASS, including
datasheet UT=27898 -> 15.0 C reproduction and 64/64 finite outputs across
a full uint16 sweep (vs 32/32 collapses in the upper half under the buggy
narrowing). Full MCU regression 32/32 PASS.
Caller-side: no external code references BMP180_ERROR; main.cpp's existing
range check at the health-watchdog catches INT32_MIN via the < 30000.0
branch.
Every boot waited the full 180 s OCXO warmup soak — even an
IWDG/SYSRESETREQ reset that takes seconds and leaves the OCXO oven hot
lost three minutes of bringup time.
Added BKPSRAM slot 3 (magic 0xCA1C1F1E) with warmup_persist_set/check
helpers next to the existing MCU-A2/A7 BKPSRAM block. Cold-boot path
now arms the flag at the end of the full 180 s soak; subsequent boots
that find the flag still set know the OCXO oven is still hot and the
crystal is settled, so they wait 5 s and move on. Power-cycle clears
BKPSRAM and forces the full soak again — safe default, operator can't
accidentally skip the warmup by yanking and re-applying power.
Added test_mcu_a4_ocxo_warm_restart (7 cases): cold boot soaks 180 s
and sets the flag; warm reset is 5 s; 5 consecutive warm resets stay
fast; power-cycle restores the cold path; cold-after-power-cycle
re-arms the bypass; pre-fix regression confirms 10 warm restarts save
1750 s vs the old always-180-s path. MCU regression now 82/82.
The magnetometer yaw correction used a hardcoded -0.61 deg literal baked
in for one deployment site. Yaw_Sensor was wrong by (site_decl + 0.61)
deg at every other site whenever the UM982 dual-antenna heading was
unavailable.
Backed the value with BKPSRAM (slots 1+2 — slot 0 is the MCU-A7
emergency flag) and exposed set_mag_declination_deg / get_mag_declination_deg.
Default returns the legacy -0.61 deg when no override has been written so
the original site stays correct out of the box; a host command (or
future GPS-derived auto-calibration) writes the new site value once and
it persists across every reset path until main-power removal.
Hardened with a +/-30 deg range clamp on both write AND read paths — real
magnetic declinations are roughly +/-25 deg worldwide, so a wider value
indicates a calibration error or BKPSRAM corruption (VBAT brown-out, bit
flip) rather than a legitimate site. Defensive read-side clamp prevents
a corrupted slot from propagating a wild heading offset.
Replaced the single use site at the magnetometer yaw computation with
the getter; legacy global Mag_Declination retained and kept in sync by
the setter for any external linkage.
Added test_mcu_a2_mag_declination (10 cases): default, set/get,
persistence across reset, power-cycle clear, write-side clamp both
directions, plausible-site passthrough, defensive read-side clamp on
corruption, wrong-magic fallback, pre-fix bearing-error regression.
MCU regression now 81/81.
attemptErrorRecovery() previously fell through to the default log-only
branch for both ERROR_AD9523_CLOCK and ERROR_FPGA_COMM. checkSystemHealth
keeps re-firing the same error every pass with no recovery action ever
attempted, so the system limps along until escalation kicks in.
ERROR_AD9523_CLOCK: AD9523_RESET_ASSERT, 10 ms settle, then re-run
configure_ad9523() (releases reset, selects REFB, reprograms, waits for
lock). On second failure we log and let the next health pass re-fire so
a transient brown-out on the 100 MHz reference does not drop straight
into Emergency_Stop.
ERROR_FPGA_COMM: pulse PD12 LOW->10 ms->HIGH (matches the boot reset
pattern). PA rails left untouched at runtime; brief adar_tr_x undefined
window is acceptable vs. losing the radar entirely.
Added test_mcu_a6_recovery_dispatch (11 cases) covering both new
handlers, all existing routes, the default branch, a pre-fix regression
check, and an explicit assertion that RF_PA_OVERCURRENT escalates
upstream (handleSystemError) rather than recovering inline. MCU
regression now 80/80.
The boot-time Idq calibration walks DAC_val from 126 down toward the
1.680 A target. Mid-walk readings sit well above the 2.5 A overcurrent
threshold by design, and a channel that hits the safety_counter timeout
(50 iters) can be left above the window. Without a gate, the next
checkSystemHealth() pass would trip ERROR_RF_PA_OVERCURRENT and route
straight into Emergency_Stop, killing the system mid-bringup.
Added a `pa_calibration_in_progress` flag set TRUE around both DAC1 and
DAC2 cal walks. checkSystemHealth's Idq window short-circuits while the
flag is set; bias-fault and overcurrent thresholds remain fully active
once the walk completes, so any genuinely stuck-high channel surfaces on
the very next health pass and routes through the normal handler.
Other health checks (lock, comm, temperature, watchdog) stay live during
cal — no behavioural change to anything except the Idq window.
Added test_mcu_a5_pa_cal_gate (7 cases): mid-walk masking, post-cal
re-arming, stuck-high channel surfacing after gate clears, bias-fault
gating, PowerAmplifier=false short-circuit, and a pre-fix regression
case showing the buggy path would have tripped overcurrent mid-walk.
MCU regression now 79/79.
Emergency_Stop's hold loop refreshed IWDG forever, so any reset path that
DID fire (SYSRESETREQ from another fault, brown-out) would re-run
startup and re-energize the PA rails — there was no record that the
system had been in emergency state. Watchdog defeat in the hold loop
masked the problem.
BKPSRAM gives us a flag that survives every reset path but is lost on
main-power removal — exactly the recovery semantics we want:
power-cycle is the deliberate operator action that clears emergency,
every other reset stays in safe-hold.
- Added emergency_persist_set/check helpers (BKPSRAM @ 0x40024000,
magic 0xDEAD5A5A); enable PWR + backup-access + BKPSRAM clock.
- Emergency_Stop now writes the flag BEFORE the rail-cut sequence so
even an interrupted shutdown still leaves the persisted state set.
- main() checks the flag immediately after MX_IWDG_Init and before
any PA enable code; if set, calls Emergency_Stop directly. GPIO
init has already forced all PA enables LOW, so the safe-hold path
is reached without a single PA rail going hot.
Hold-loop IWDG refresh kept intentionally: a healthy hold loop does not
need to cycle the MCU, but if the loop itself wedges (stack corruption,
bus fault), refresh stops, IWDG fires, and the persist flag routes the
reset right back into safe-hold.
Added test_mcu_a7_emergency_persist (6 cases) modelling BKPSRAM
persistence vs power-cycle, including a regression check that exercises
the pre-fix "no persistence" boot to confirm it would have re-energized
the PAs. MCU regression now 78/78.
Cooling-fan trip in main.cpp's periodic temperature block was a 25 C dev
stub that latched the fan ON at room temperature on every boot. Replaced
with production thermal control: ON at 70 C, OFF at 60 C. The 10 C
dead-band prevents relay/fan chatter near the threshold; the 70 C ON
point sits below the 75 C SAFE-mode gate in checkSystemHealth() so the
fan engages before the system shuts down.
Driven from the existing `temperature` global (max of 8 sensors,
populated just above by the GAP-3 fix) instead of re-OR'ing the eight
Temperature_N variables — single source of truth, and the diag now
prints the actual peak temperature on each transition.
Added test_mcu_a1_cooling_hysteresis (9 cases) covering cold-start,
upward crossing, dead-band hold, downward crossing, and a regression
guard at 30 C that would have engaged the fan under the old stub.
MCU regression now 77/77.
runRadarPulseSequence was redeclaring `int m, n, y` at function scope,
which shadowed the file-scope `uint8_t m, n, y` globals at lines
~190-192 that getStatusString reports to the GUI as
BeamPos|Azimuth|ChirpCount. The function's increments updated only the
locals, then discarded them — so telemetry was permanently frozen at
"BeamPos:1|Azimuth:1|ChirpCount:1" no matter how many beam positions
or revolutions had elapsed.
Fix: drop the three local declarations; the body already references
m/n/y by name, so removing the locals lets the writes hit the globals.
A comment documents the pitfall so the locals do not get re-added by
a future cleanup. Numeric ranges are safe (m_max=32, n_max=31,
y_max=50, all fit in uint8_t).
Test: new standalone test_bug16_runradar_shadows_globals.c reproduces
both the buggy (locals shadow globals) and fixed (globals advance)
patterns and asserts the expected post-sweep values
(g_n=16, g_m=1 wraps each iter, g_y=2 after one revolution).
MCU regression: 76/76 (was 75).
MCU-N4: delay_us(us) reset TIM1 then waited for the counter to reach `us`,
but TIM1 ARR is 0xffff-1 (~65 ms at the 1 MHz tick). Any caller passing
us > 65534 spun forever after the first wrap — a real hazard with the PA
energized. Chunk requests larger than ARR into ARR-sized waits, then the
remainder in the existing single wait. Current callers (T1, PRI1-T1,
Guard, 500us spots) are all well under the bound; this is defensive.
GUI-S4: radar_protocol.STREAM_CONTROL was annotated "3-bit stream enable
mask"; the FPGA accepts usb_cmd_value[5:0] = 6 bits. The wire protocol
already carried the full 32-bit value field, so the upper bits were
reachable via Custom Command — only the comment was wrong. Updated to
match radar_system_top.v:1004.
Verified: 75/75 MCU tests pass; 83/83 v7 GUI tests pass (covered by GUI-C3 commit).
FPGA — RX chain
matched_filter_multi_segment.v: drop the gratuitous /4 scaling on
DDC sign-extended input (was ddc_i[17:2] + ddc_i[1]); use
ddc_i[15:0] directly. fft_engine has INTERNAL_W=32 with
saturating 16-bit output, so full 16-bit input is safe. Restores
~12 dB of MF input dynamic range.
radar_receiver_final.v: remove latency_buffer (count-N-pulses-then-
prime FIFO that left frame 1 with all-zero ref). Replaced with
a single-FF alignment register on ref_i/ref_q that matches the
1-FF stage multi_segment ST_PROCESSING uses on adc_data.
Verified by tb/tb_rxb_fullchain_latency.v — autocorrelation peak
at bin 0 with peak/mean ~88x.
doppler_processor.v / mti_canceller.v / cfar_ca.v /
range_bin_decimator.v / radar_receiver_final.v / radar_system_top.v
/ usb_data_interface_ft2232h.v: switch port and parameter widths
from RP_NUM_RANGE_BINS / RP_RANGE_BIN_BITS (always 512 / 9-bit)
to RP_MAX_OUTPUT_BINS / RP_RANGE_BIN_WIDTH_MAX (auto-scales:
50T 512 / 9-bit, 200T 4096 / 12-bit). Unblocks 200T 20 km mode
at the RX module boundary; USB wire-protocol extension still
pending.
radar_receiver_final.v: doppler_frame_done_prev reset value 0 -> 1
to prevent false done pulse on cycle 1 when level signal is
HIGH at reset.
matched_filter_processing_chain.v: delete the broken `ifdef
SIMULATION inline behavioural FFT (482 lines removed). It
produced wrong-bin peaks and 100-1000x weak magnitudes. Chain
now uses production fft_engine.v + frequency_matched_filter.v
in both iverilog and Vivado. Iverilog tests are ~38x slower per
chain pass but produce correct results. Misleading "OK with
Xilinx IP" comments at three test sites updated since the FFT
is in-house, not an IP placeholder.
FPGA — testbenches
tb/tb_rxb_latency_measure.v (new): measures chain internal pipeline
depth (~2057 cycles, chirp-agnostic).
tb/tb_rxb_fullchain_latency.v (new): full-chain autocorrelation
verification — drives ddc with the same chirp samples the loader
serves as ref, finds peak position and peak/mean.
tb/tb_matched_filter_processing_chain.v: wait timeouts bumped
50000 -> 500000 cycles to accommodate production FFT pipeline.
MCU
main.cpp checkSystemHealthStatus: latch system_emergency_state on
the error_count > 10 path so the SAFE-MODE blink loop in main()
actually engages (was bypassed because predicate was false).
main.cpp: move FPGA reset BEFORE the if(PowerAmplifier) block so
adar_tr_x is driven LOW (RX commanded externally) before PA Vdd
reaches 22 V. Old reset block at the original location removed.
main.cpp MX_GPIO_Init: add GPIO_PIN_12 (FPGA reset) to the
explicit WritePin(LOW) list so the safe initial state is no
longer implicit.
main.cpp checkSystemHealth: rate-limit ADAR1000
verifyDeviceCommunication (HAL_Delay 1ms x 4 devices = 4 ms
blocking SPI burst per main-loop iteration) from every-loop to
every 2 s. readTemperature stays per-loop so over-temp
detection latency is unchanged.
USBHandler.cpp processSettingsData: dispatch threshold bumped
74 -> 82 (matches parser minimum); buffer drained after parse
attempt (slide remaining bytes left) so a false END find no
longer sticks the buffer until 256-byte overflow.
GUI
radar_protocol.py: NUM_RANGE_BINS 64 -> 512 (matches FPGA
RP_NUM_RANGE_BINS); NUM_CELLS 2048 -> 16384.
radar_protocol.py _ingest_sample: honor FPGA frame_start bit for
resync after a USB drop; capture range_profile[rbin] once per
range bin at dbin == 0 (FPGA emits the same range_i/range_q for
all 32 Doppler cells of a given range bin; previous accumulator
inflated the profile 32x).
v7/models.py RadarSettings: range_resolution 24 -> 6 m (matches
c/(2*100MHz)*4); max_distance and coverage_radius 1536 -> 3072 m;
map_size 2000 -> 4000.
v7/models.py WaveformConfig: n_range_bins 64 -> 512, fft_size
1024 -> 2048, decimation_factor 16 -> 4.
GUI_V65_Tk.py: _RANGE_PER_BIN math and stale "~24 m / ~1536 m"
comments updated.
test_v7.py: assertion values updated to match new defaults.
Tests
test_ddc_cosim_fuzz.py: remove unused os/tempfile imports, wrap
three long lines for ruff E501 compliance.
Per-chirp T/R switching is owned by the FPGA plfm_chirp_controller
driving adar_tr_x pins (TR_SOURCE=1 in REG_SW_CONTROL, already set by
initializeSingleDevice). The MCU's SPI RMW path via fastTXMode/
fastRXMode/pulseTXMode/pulseRXMode/setADTR1107Control was:
(a) architecturally redundant — raced the FPGA-driven TR line,
(b) toggled the wrong bit (TR_SOURCE instead of TR_SPI),
(c) in setFastSwitchMode(true) bundled a datasheet-violating
PA+LNA-simultaneously-biased side effect.
Removed methods and their backing state (fast_switch_mode_,
switch_settling_time_us_). Call sites in executeChirpSequence /
runRadarPulseSequence updated to rely on the FPGA chirp FSM (GPIOD_8
new_chirp trigger unchanged).
Tests: adds CMSIS-Core DWT/CoreDebug/SystemCoreClock stubs to
stm32_hal_mock so F-4.7's DWT-based delayUs() compiles under the host
mock build. SystemCoreClock=0 makes the busy-wait exit immediately.
The `broadcast=1` path on adarWrite() emitted the 0x08 broadcast opcode
but setChipSelect() only asserts one device's CS line, so only the single
selected chip ever saw the frame. The opcode path has also never been
validated on silicon. Until a HIL test confirms multi-CS semantics, route
broadcast=1 through a unicast loop over all devices so caller intent
(all four take the write) is preserved and the dead opcode path becomes
unreachable. Logs a DIAG_WARN on entry for visibility.
Addresses the remaining actionable items from
docs/DEVELOP_AUDIT_2026-04-19.md after commit 3f47d1e.
XDC (dead waivers — F-0.4, F-0.5, F-0.6, F-0.7):
- ft_clkout_IBUF CLOCK_DEDICATED_ROUTE now uses hierarchical filter;
flat net name did not exist post-synth.
- reset_sync_reg[*] false-path rewritten to walk hierarchy and filter
on CLR/PRE pins.
- adc_clk_mmcm.xdc ft601_clk_in references replaced with foreach-loop
over real USB clock names, gated on -quiet existence.
- MMCM LOCKED waiver uses REF_PIN_NAME filter instead of the
previously-missing u_core/ literal path.
CDC (F-1.1, F-1.2, F-1.3):
- Documented the quasi-static-bus stability invariant above the
FT601 cmd_valid toggle block.
- cdc_adc_to_processing gains an `overrun` output; the two CIC->FIR
instances feed a sticky cdc_cic_fir_overrun flag surfaced on
gpio_dig5 so silent sample drops become visible to the MCU.
- Removed the dead mixers_enable synchronizer in ddc_400m.v; the _sync
output was unused and every caller ties the port to 1'b1.
Diagnostics (F-6.4):
- range_bin_decimator watchdog_timeout plumbed through receiver
and top-level, OR'd into gpio_dig5.
ADAR (F-4.7):
- delayUs() replaced with DWT cycle counter; self-initialising
TRCENA/CYCCNTENA, overflow-safe unsigned subtraction.
Regression: tb_cdc_modules.v 57/57 passes under iverilog after
the cdc_modules.v change. Remote Vivado verification in progress.
Addresses findings from docs/DEVELOP_AUDIT_2026-04-19.md:
P0 source-level:
- F-4.3 ADAR1000_Manager::adarSetTxPhase now writes REG_LOAD_WORKING
with LD_WRK_REGS_LDTX_OVERRIDE (0x02) instead of 0x01. Previous value
toggled the LDRX latch on a TX-phase write, so host TX phase updates
never reached the working registers.
- F-6.1 DDC mixer_saturation / filter_overflow / diagnostics were deleted
at the receiver boundary. Now plumbed to new outputs on
radar_receiver_final (ddc_overflow_any, ddc_saturation_count) and
aggregated into gpio_dig5 in radar_system_top. Added mark_debug
attributes for ILA visibility. Test/debug inputs tied low explicitly.
- F-0.8 adc_clk_mmcm.xdc set_clock_uncertainty: removed invalid -add
flag (Vivado silently rejected it, applying zero guardband). Now uses
absolute 0.150 ns which covers 53 ps jitter + ~100 ps PVT margin.
P1:
- F-4.2 adarSetBit / adarResetBit reject broadcast=ON — the RMW sampled
a single device but wrote to all four, clobbering the other three's
state.
- F-4.4 initializeSingleDevice returns false and leaves initialized=false
when scratchpad verification fails; previously marked the device
initialized anyway so downstream PA enable could drive a dead bus.
- F-6.2 FIR I/Q filter_overflow ports, previously unconnected, now OR'd
into the module-level filter_overflow output.
- F-6.3 mti_canceller exposes 8-bit saturation counter. Saturation was
previously invisible and produces spurious Doppler harmonics.
Verification:
- 27/27 iverilog testbenches pass
- 228/228 pytest pass (cross-layer contract + cosim)
- MCU unit tests 51/51 + 24/24 pass
- Remote Vivado 2025.2 build: bitstream writes; 400 MHz mixer pipeline
now shows WNS -0.109 ns which MATCHES the audit's F-0.9 prediction
that the design only closed because F-0.8's guardband was silently
dropped. ft_clkout F-0.9 remains a show-stopper (requires MRCC pin
move), tracked separately.
Not addressed in this PR (larger scope, follow-up tickets):
F-0.4, F-0.5, F-0.6, F-0.7, F-0.9, F-1.1, F-1.2, F-2.2, F-3.2, F-4.1,
F-4.7, F-6.4, F-6.5.
Three conflicts — all resolved in favor of develop, which has a more
refined version of the same work this branch introduced:
- radar_system_top.v: develop's cleaner USB_MODE=1 comment (same value).
- run_regression.sh: develop's ${SYSTEM_RTL[@]} refactor + added
USB_MODE=1 test variants.
- tb/radar_system_tb.v: develop's ifdef USB_MODE_1 to dump the correct
USB instance based on mode.
The 400 MHz reset fan-out fix (nco_400m_enhanced, cic_decimator_4x_enhanced,
ddc_400m) and ADAR1000 channel-indexing fix remain intact on this branch.
The four channel-indexed ADAR1000 setters (adarSetRxPhase, adarSetTxPhase,
adarSetRxVgaGain, adarSetTxVgaGain) computed their register offset as
`(channel & 0x03) * stride`, which silently aliased CH4 (channel=4 ->
mask=0) onto CH1 and shifted CH1..CH3 by one. The API contract (1-based
CH1..CH4) is documented in ADAR1000_AGC.cpp:76 and matches the ADI
datasheet; every existing caller already passes `ch + 1`.
Fix: subtract 1 before masking -- `((channel - 1) & 0x03) * stride` --
and reject `channel < 1 || channel > 4` early with a DIAG message so a
future stale 0-based caller fails loudly instead of writing to CH4.
Adds TestTier1Adar1000ChannelRegisterRoundTrip (9 tests) which closes
the loop independently of the driver:
- parses the ADI register map directly from ADAR1000_Manager.h,
- verifies the datasheet stride invariants (gain=1, phase=2),
- auto-discovers every C++ TU under MCU_LIB_DIR / MCU_CODE_DIR so a
new caller cannot silently escape the round-trip check,
- asserts every caller's channel argument evaluates to {1,2,3,4} for
ch in {0,1,2,3} (catches bare 0-based or literal-0 callers at CI
time before the runtime bounds-check would silently drop them),
- round-trips each (caller, ch) through the helper arithmetic and
checks the final address equals REG_CH{ch+1}_*.
Adversarially validated: reverting any one helper, all four helpers,
corrupting the parsed register map, injecting a bare-ch caller, and
auto-discovering a literal-0 caller in a fresh TU each cause the
expected (and only the expected) test to fail.
Stacked on fix/adar1000-vm-tables (PR #107).
The ADAR1000 vector-modulator I/Q lookup tables VM_I[128] and VM_Q[128]
were declared but defined as empty initialiser lists since the first
commit (5fbe97f). Every call to adarSetRxPhase / adarSetTxPhase therefore
wrote (I=0x00, Q=0x00) to registers 0x21/0x23 (Rx) and 0x32/0x34 (Tx)
regardless of the requested phase state, leaving beam steering completely
non-functional in firmware.
This commit:
* Populates VM_I[128] and VM_Q[128] from ADAR1000 datasheet Rev. B
Tables 13-16 (p.34) on a uniform 2.8125 deg grid (360 / 128 states).
Byte format: bits[7:6] reserved 0, bit[5] polarity (1 = positive
lobe), bits[4:0] 5-bit unsigned magnitude - exactly as specified.
* Removes VM_GAIN[128] declaration and (empty) definition. The
ADAR1000 has no separate VM gain register; per-channel VGA gain is
set via CHx_RX_GAIN (0x10-0x13) / CHx_TX_GAIN (0x1C-0x1F) by
adarSetRxVgaGain / adarSetTxVgaGain. VM_GAIN was never populated,
never read anywhere in the firmware, and its presence falsely
suggested a missing scaling step in the signal path.
* Adds 9_Firmware/tests/cross_layer/adar1000_vm_reference.py: an
independently-derived ground-truth module containing the full
datasheet table plus byte-format / uniform-grid / quadrant-symmetry
/ cardinal-point invariant checkers and a tolerant C array parser.
* Adds TestTier2Adar1000VmTableGroundTruth (9 tests) to
test_cross_layer_contract.py, including a tokenising C/C++
comment+string stripper used by the VM_GAIN reintroduction guard,
and an adversarial self-test that corrupts one byte and asserts
the comparison detects it (defends against silent bypass via
future fixture/parser refactors).
Adversarially validated: removing the firmware definitions, flipping
a single byte, or reintroducing VM_GAIN as code each cause the suite
to fail; restoring causes it to pass. VM_GAIN appearing inside string
literals or comments correctly does NOT trip the guard.
Closes the empty-table half of the ADAR1000 phase-control bug class.
The separate channel-rotation issue (#90) will be addressed in a
follow-up PR.
Refs: 7_Components Datasheets and Application notes/ADAR1000.pdf
Rev. B Tables 13-16 p.34
The firmware uses the C++ ADAR1000_Manager class exclusively. The C-style
driver pair (adar1000.c, 693 LoC; adar1000.h, 294 LoC) has no external
call sites:
grep -rn "Adar_Set|Adar_Read|Adar_Write|Adar_Soft" 9_Firmware
grep -rn "AdarDevice|AdarBiasCurrents|AdarDeviceInfo" 9_Firmware
Both return hits only inside adar1000.c/h themselves. ADAR1000_Manager.h
has its own copies of REG_CH1_*, REG_INTERFACE_CONFIG_A, etc. and does
not include adar1000.h. main.cpp had a lone #include "adar1000.h" but
referenced no symbols from it; the REG_* macros it uses resolve through
ADAR1000_Manager.h on the next line.
No behaviour change: the deleted code was unreachable.
Side note on #90: adar1000.c contained a second copy of the
REG_CH1_* + (channel & 0x03) channel-rotation pattern tracked in #90
(lines 349, 397-398, 472, 520-521). This commit does not fix#90 --
the live path in ADAR1000_Manager.cpp still needs the channel-index
fix -- but it removes the dormant copy so the bug has one less place
to hide.
Verification:
- 9_Firmware/9_1_Microcontroller/tests: make clean && make -> all passing
(51/51 UM982 GPS, 24/24 driver, 13/13 ADAR1000_AGC, bugs #1-15, Gap-3
fixes 1-5, safety fixes)
- 9_Firmware/tests/cross_layer: 29 passed
- grep -rn "adar1000\.h|adar1000\.c|Adar_|AdarDevice" 9_Firmware: 0 hits
Resolve cross-layer AGC control mismatch where opcode 0x28 only
controlled the FPGA inner-loop AGC but the STM32 outer-loop AGC
(ADAR1000_AGC) ran independently with its own enable state.
FPGA: Drive gpio_dig6 from host_agc_enable instead of tied low,
making the FPGA register the single source of truth for AGC state.
MCU: Change ADAR1000_AGC constructor default from enabled(true) to
enabled(false) so boot state matches FPGA reset default (AGC off).
Read DIG_6 GPIO every frame with 2-frame confirmation debounce to
sync outerAgc.enabled — prevents single-sample glitch from causing
spurious AGC state transitions.
Tests: Update MCU unit tests for new default, add 6 cross-layer
contract tests verifying the FPGA-MCU-GUI AGC invariant chain.
- Add ERROR_COUNT sentinel to SystemError_t enum
- Change error_strings[] to static const char* const
- Add static_assert to enforce enum/array sync at compile time
- Add runtime bounds check with fallback for invalid error codes
- Add all missing test binary names to .gitignore
The gap3, agc, and gps test binaries (Mach-O executables compiled on macOS)
were accidentally tracked. CI runs on Linux and fails with 'Exec format error'.
Removed from index and added to .gitignore.
FPGA-001: The previous fix derived frame boundaries from chirp_counter==0,
but that counter comes from plfm_chirp_controller_enhanced which overflows
to N (not wrapping at chirps_per_elev). This caused frame pulses only on
6-bit rollover (every 64 chirps) instead of every N chirps. Now wires the
CDC-synchronized tx_new_chirp_frame_sync signal from the transmitter into
radar_receiver_final, giving correct per-frame timing for any N.
STM32-004: Changed ad9523_init() failure path from Error_Handler() to
return -1, matching the pattern used by ad9523_setup() and ad9523_status()
in the same function. Both halt the system, but return -1 keeps IRQs
enabled for diagnostic output.
checkSystemHealth()'s internal watchdog (pre-fix step 9) had two linked
defects that, combined with the previous commit's escalation of
ERROR_WATCHDOG_TIMEOUT to Emergency_Stop(), would false-latch AERIS-10:
1. Cold-start false trip:
static uint32_t last_health_check = 0;
if (HAL_GetTick() - last_health_check > 60000) { trip; }
On the first call, last_health_check == 0, so the subtraction
against a seeded-zero sentinel exceeds 60 000 ms as soon as the MCU
has been up >60 s -- normal after the ADAR1000 / AD9523 / ADF4382
init sequence -- and the watchdog trips spuriously.
2. Stale timestamp after early returns:
last_health_check = HAL_GetTick(); // at END of function
Every earlier sub-check (IMU, BMP180, GPS, PA Idq, temperature) has
an `if (fault) return current_error;` path that skips the update.
After ~60 s of transient faults, the next clean call compares
against a long-stale last_health_check and trips.
With ERROR_WATCHDOG_TIMEOUT now escalating to Emergency_Stop(), either
failure mode would cut the RF rails on a perfectly healthy system.
Fix: move the watchdog check to function ENTRY. A dedicated cold-start
branch seeds the timestamp on the first call without checking. On every
subsequent call, the elapsed delta is captured first and
last_health_check is updated BEFORE any sub-check runs, so early returns
no longer leave a stale value. 32-bit tick-wrap semantics are preserved
because the subtraction remains on uint32_t.
Add test_gap3_health_watchdog_cold_start.c covering cold-start, paced
main-loop, stall detection, boundary (exactly 60 000 ms), recovery
after trip, and 32-bit HAL_GetTick() wrap -- wired into tests/Makefile
alongside the existing gap-3 safety tests.
STM32-006: Remove blocking do-while loop that waited for legacy GUI start
flag — production V7 PyQt GUI never sends it, hanging the MCU at boot.
STM32-004: Check ad9523_init() return code and call Error_Handler() on
failure, matching the pattern used by all other hardware init calls.
FPGA-001: Simplify frame boundary detection to only trigger on
chirp_counter wrap-to-zero. Previous conditions checking == N and == 2N
were unreachable dead code (counter wraps at N-1). Now correct for any
chirps_per_elev value.
- Rename ERROR_STEPPER_FAULT → ERROR_STEPPER_MOTOR to match main.cpp enum
- Update critical-error predicate to include ERROR_TEMPERATURE_HIGH and
ERROR_WATCHDOG_TIMEOUT (was testing stale pre-fix logic)
- Test 4 now asserts overtemp DOES trigger e-stop (previously asserted opposite)
- Add Test 5 (watchdog triggers e-stop) and Test 6 (memory alloc does not)
- Add ERROR_MEMORY_ALLOC and ERROR_WATCHDOG_TIMEOUT to local enum
- 7 tests, all pass
The previous commit accidentally introduced the literal 2-byte sequence
'\r' at the end of two backslash-continuation lines (TESTS_STANDALONE
and the .PHONY list). GNU make on Linux treats that as text rather than
a line continuation, which orphans the following line with leading
spaces and aborts CI with:
Makefile:68: *** missing separator (did you mean TAB instead of 8 spaces?)
Strip the extraneous 'r' so each continuation ends with a real backslash
+ LF.
handleSystemError() only called Emergency_Stop() for error codes in
[ERROR_RF_PA_OVERCURRENT .. ERROR_POWER_SUPPLY] (9..13). Two critical
faults were left out of the gate and fell through to attemptErrorRecovery()'s
default log-and-continue branch:
- ERROR_TEMPERATURE_HIGH (14): raised by checkSystemHealth() when the
hottest of 8 PA thermal sensors exceeds 75 C. Without cutting bias
(DAC CLR) and the PA 5V0/5V5/RFPA_VDD rails, the 10 W GaN QPA2962
stages remain biased in an overtemperature state -- a thermal-runaway
path in AERIS-10E.
- ERROR_WATCHDOG_TIMEOUT (16): indicates the health-check loop has
stalled (>60 s since last pass). Transmitter state is unknown;
relying on IWDG to reset the MCU re-runs startup and re-energises
the PA rails rather than latching the safe state.
Fix: extend the critical-error predicate so these two codes also trigger
Emergency_Stop(). Add test_gap3_overtemp_emergency_stop.c covering all
17 SystemError_t values (must-trigger and must-not-trigger), wired into
tests/Makefile alongside the existing gap-3 safety tests.