Commit Graph

23 Commits

Author SHA1 Message Date
Jason 4a102e30fe fix(mcu): MCU-A6 — recovery handlers for AD9523_CLOCK and FPGA_COMM
attemptErrorRecovery() previously fell through to the default log-only
branch for both ERROR_AD9523_CLOCK and ERROR_FPGA_COMM. checkSystemHealth()
kept re-firing the same error on every pass with no recovery action ever
attempted, so the system limped along until escalation kicked in.

ERROR_AD9523_CLOCK: AD9523_RESET_ASSERT, 10 ms settle, then re-run
configure_ad9523() (releases reset, selects REFB, reprograms, waits for
lock). On a second failure we log and let the next health pass re-fire,
so a transient brown-out on the 100 MHz reference does not drop straight
into Emergency_Stop.

ERROR_FPGA_COMM: pulse PD12 LOW->10 ms->HIGH (matches the boot reset
pattern). PA rails left untouched at runtime; brief adar_tr_x undefined
window is acceptable vs. losing the radar entirely.

Added test_mcu_a6_recovery_dispatch (11 cases) covering both new
handlers, all existing routes, the default branch, a pre-fix regression
check, and an explicit assertion that RF_PA_OVERCURRENT escalates
upstream (handleSystemError) rather than recovering inline. MCU
regression now 80/80.
2026-04-28 09:26:35 +05:45
Jason 1317a91e01 fix(mcu): MCU-A5 — gate Idq health-window during PA calibration walk
The boot-time Idq calibration walks DAC_val from 126 down toward the
1.680 A target. Mid-walk readings sit well above the 2.5 A overcurrent
threshold by design, and a channel that hits the safety_counter timeout
(50 iters) can be left above the window. Without a gate, the next
checkSystemHealth() pass would trip ERROR_RF_PA_OVERCURRENT and route
straight into Emergency_Stop, killing the system mid-bringup.

Added a `pa_calibration_in_progress` flag set TRUE around both DAC1 and
DAC2 cal walks. checkSystemHealth's Idq window short-circuits while the
flag is set; bias-fault and overcurrent thresholds remain fully active
once the walk completes, so any genuinely stuck-high channel surfaces on
the very next health pass and routes through the normal handler.
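
A minimal host-side sketch of the gate described above, assuming the Idq check can be reduced to a single comparison. Only `pa_calibration_in_progress` and the 1.680 A / 2.5 A figures come from the commit; `idq_overcurrent_fault()` and the milliamp macros are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the commit message, expressed in milliamps. */
#define IDQ_TARGET_MA       1680
#define IDQ_OVERCURRENT_MA  2500

static_assert(IDQ_TARGET_MA < IDQ_OVERCURRENT_MA,
              "calibration target must sit below the overcurrent threshold");

/* Set TRUE around the DAC1 and DAC2 calibration walks. */
static bool pa_calibration_in_progress = false;

/* Hypothetical slice of checkSystemHealth(): reports whether an Idq
 * reading should raise ERROR_RF_PA_OVERCURRENT. */
static bool idq_overcurrent_fault(int idq_ma)
{
    if (pa_calibration_in_progress) {
        return false;   /* Idq window short-circuits during the walk */
    }
    return idq_ma > IDQ_OVERCURRENT_MA;
}
```

With the flag set, a 3000 mA mid-walk reading is masked; once the flag clears, the same reading faults on the next call, which is the re-arming behaviour the commit describes.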

Other health checks (lock, comm, temperature, watchdog) stay live during
cal — no behavioural change to anything except the Idq window.

Added test_mcu_a5_pa_cal_gate (7 cases): mid-walk masking, post-cal
re-arming, stuck-high channel surfacing after gate clears, bias-fault
gating, PowerAmplifier=false short-circuit, and a pre-fix regression
case showing the buggy path would have tripped overcurrent mid-walk.
MCU regression now 79/79.
2026-04-28 09:21:43 +05:45
Jason f28a0eaa80 fix(mcu): MCU-A7 — persist emergency state across MCU resets in BKPSRAM
Emergency_Stop's hold loop refreshed IWDG forever, so the watchdog never
reset the MCU while in safe-hold. Any reset that DID fire (SYSRESETREQ
from another fault, brown-out) re-ran startup and re-energized the PA
rails, because there was no record that the system had been in emergency
state. Defeating the watchdog in the hold loop masked the problem.

BKPSRAM gives us a flag that survives every reset path but is lost on
main-power removal — exactly the recovery semantics we want:
power-cycle is the deliberate operator action that clears emergency,
every other reset stays in safe-hold.

  - Added emergency_persist_set/check helpers (BKPSRAM @ 0x40024000,
    magic 0xDEAD5A5A); enable PWR + backup-access + BKPSRAM clock.
  - Emergency_Stop now writes the flag BEFORE the rail-cut sequence so
    even an interrupted shutdown still leaves the persisted state set.
  - main() checks the flag immediately after MX_IWDG_Init and before
    any PA enable code; if set, calls Emergency_Stop directly. GPIO
    init has already forced all PA enables LOW, so the safe-hold path
    is reached without a single PA rail going hot.
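
The persistence semantics in the list above can be modelled on a host as follows. This is a sketch under stated assumptions: the helper names follow the commit's `emergency_persist_set/check`, but their signatures are guessed, the backup-SRAM word is an ordinary variable rather than the real location at 0x40024000, and power loss is modelled as the word reading 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EMERGENCY_MAGIC 0xDEAD5A5Au   /* magic value from the commit message */

static_assert(EMERGENCY_MAGIC != 0u,
              "magic must differ from the modelled power-loss value");

/* Host-side stand-in for the backup-SRAM word (assumption: on the target
 * this would be a volatile uint32_t at 0x40024000). */
static uint32_t bkpsram_word;

static void emergency_persist_set(void)   { bkpsram_word = EMERGENCY_MAGIC; }
static bool emergency_persist_check(void) { return bkpsram_word == EMERGENCY_MAGIC; }

/* Hypothetical models of the two ways the MCU can restart. */
static void model_warm_reset(void)  { /* IWDG, SYSRESETREQ: word retained */ }
static void model_power_cycle(void) { bkpsram_word = 0u; /* contents lost */ }

/* Boot decision: true means go straight to safe-hold and never enable
 * the PA rails. */
static bool boot_must_hold_safe(void) { return emergency_persist_check(); }
```

A warm reset after `emergency_persist_set()` still boots into safe-hold, while `model_power_cycle()` clears it, matching the operator-action semantics the commit asks for.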

Hold-loop IWDG refresh kept intentionally: a healthy hold loop does not
need to cycle the MCU, but if the loop itself wedges (stack corruption,
bus fault), refresh stops, IWDG fires, and the persist flag routes the
reset right back into safe-hold.

Added test_mcu_a7_emergency_persist (6 cases) modelling BKPSRAM
persistence vs power-cycle, including a regression check that exercises
the pre-fix "no persistence" boot to confirm it would have re-energized
the PAs. MCU regression now 78/78.
2026-04-27 19:52:13 +05:45
Jason df0b2fd469 fix(mcu): MCU-A1 — replace 25 C cooling stub with 70/60 C hysteresis
Cooling-fan trip in main.cpp's periodic temperature block was a 25 C dev
stub that latched the fan ON at room temperature on every boot. Replaced
with production thermal control: ON at 70 C, OFF at 60 C. The 10 C
dead-band prevents relay/fan chatter near the threshold; the 70 C ON
point sits below the 75 C SAFE-mode gate in checkSystemHealth() so the
fan engages before the system shuts down.
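
The 70/60 C dead-band can be sketched as a pure function of the previous fan state and the peak temperature. `cooling_fan_next()` is a hypothetical name, and whether the real code trips at exactly 70 C or just above it is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds from the commit message, in whole degrees C. */
#define FAN_ON_C     70
#define FAN_OFF_C    60
#define SAFE_MODE_C  75   /* SAFE-mode gate in checkSystemHealth() */

static_assert(FAN_OFF_C < FAN_ON_C, "dead-band must be positive");
static_assert(FAN_ON_C < SAFE_MODE_C, "fan must engage before SAFE mode");

/* Hypothetical helper: returns the new fan state. Inside the dead-band
 * the previous state is held, which is what prevents chatter. */
static bool cooling_fan_next(bool fan_on, float peak_temp_c)
{
    if (peak_temp_c >= FAN_ON_C)  return true;
    if (peak_temp_c <= FAN_OFF_C) return false;
    return fan_on;
}
```

At 65 C the function returns whatever state it was given, so a fan that switched on at 70 C stays on until the temperature falls to 60 C.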

Driven from the existing `temperature` global (max of 8 sensors,
populated just above by the GAP-3 fix) instead of re-OR'ing the eight
Temperature_N variables — single source of truth, and the diag now
prints the actual peak temperature on each transition.

Added test_mcu_a1_cooling_hysteresis (9 cases) covering cold-start,
upward crossing, dead-band hold, downward crossing, and a regression
guard at 30 C that would have engaged the fan under the old stub.
MCU regression now 77/77.
2026-04-27 19:42:42 +05:45
Jason 2c34323bcb fix(mcu): MCU-N5/C4 — runRadarPulseSequence stops shadowing m/n/y globals
runRadarPulseSequence was redeclaring `int m, n, y` at function scope,
which shadowed the file-scope `uint8_t m, n, y` globals at lines
~190-192 that getStatusString reports to the GUI as
BeamPos|Azimuth|ChirpCount. The function's increments updated only the
locals, then discarded them — so telemetry was permanently frozen at
"BeamPos:1|Azimuth:1|ChirpCount:1" no matter how many beam positions
or revolutions had elapsed.

Fix: drop the three local declarations; the body already references
m/n/y by name, so removing the locals lets the writes hit the globals.
A comment documents the pitfall so the locals do not get re-added by
a future cleanup. Numeric ranges are safe (m_max=32, n_max=31,
y_max=50, all fit in uint8_t).
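
The shadowing pitfall reduces to a few lines. The sketch below uses a single counter `m` and two hypothetical functions standing in for the pre-fix and post-fix bodies of runRadarPulseSequence; the bounds are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bounds quoted in the commit message. */
#define M_MAX 32
#define N_MAX 31
#define Y_MAX 50
static_assert(M_MAX <= UINT8_MAX && N_MAX <= UINT8_MAX && Y_MAX <= UINT8_MAX,
              "counters must fit in uint8_t");

/* File-scope counter that telemetry reports (one of m/n/y). */
static uint8_t m = 1;

static void pulse_sequence_buggy(void)
{
    int m = 1;      /* local shadows the file-scope m */
    m++;            /* updates the local only */
    (void)m;
}

static void pulse_sequence_fixed(void)
{
    m++;            /* no local: the write reaches the global */
}
```

After `pulse_sequence_buggy()` the global still reads 1, which is the frozen telemetry value; after `pulse_sequence_fixed()` it advances to 2.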

Test: new standalone test_bug16_runradar_shadows_globals.c reproduces
both the buggy (locals shadow globals) and fixed (globals advance)
patterns and asserts the expected post-sweep values
(g_n=16, g_m=1 wraps each iter, g_y=2 after one revolution).

MCU regression: 76/76 (was 75).
2026-04-27 13:36:28 +05:45
Jason 25a280c200 refactor(mcu): remove redundant ADAR1000 T/R SPI paths (FPGA-owned)
Per-chirp T/R switching is owned by the FPGA plfm_chirp_controller
driving adar_tr_x pins (TR_SOURCE=1 in REG_SW_CONTROL, already set by
initializeSingleDevice). The MCU's SPI RMW path via fastTXMode/
fastRXMode/pulseTXMode/pulseRXMode/setADTR1107Control was:
  (a) architecturally redundant — raced the FPGA-driven TR line,
  (b) toggled the wrong bit (TR_SOURCE instead of TR_SPI),
  (c) in setFastSwitchMode(true) bundled a datasheet-violating
      PA+LNA-simultaneously-biased side effect.

Removed methods and their backing state (fast_switch_mode_,
switch_settling_time_us_). Call sites in executeChirpSequence /
runRadarPulseSequence updated to rely on the FPGA chirp FSM (GPIOD_8
new_chirp trigger unchanged).

Tests: adds CMSIS-Core DWT/CoreDebug/SystemCoreClock stubs to
stm32_hal_mock so F-4.7's DWT-based delayUs() compiles under the host
mock build. SystemCoreClock=0 makes the busy-wait exit immediately.
2026-04-21 01:09:38 +05:45
Jason 658752abb7 fix: propagate FPGA AGC enable to MCU outer loop via DIG_6 GPIO
Resolve cross-layer AGC control mismatch where opcode 0x28 only
controlled the FPGA inner-loop AGC but the STM32 outer-loop AGC
(ADAR1000_AGC) ran independently with its own enable state.

FPGA: Drive gpio_dig6 from host_agc_enable instead of tied low,
making the FPGA register the single source of truth for AGC state.

MCU: Change ADAR1000_AGC constructor default from enabled(true) to
enabled(false) so boot state matches FPGA reset default (AGC off).
Read DIG_6 GPIO every frame with 2-frame confirmation debounce to
sync outerAgc.enabled — prevents single-sample glitch from causing
spurious AGC state transitions.
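
One way to read the 2-frame confirmation debounce is sketched below. The struct and function names are hypothetical, and the actual implementation may track its state differently; the sketch only shows that a single-frame glitch on DIG_6 does not change the enable state.

```c
#include <assert.h>
#include <stdbool.h>

#define AGC_CONFIRM_FRAMES 2   /* confirmation depth from the commit message */
static_assert(AGC_CONFIRM_FRAMES >= 1, "need at least one confirming sample");

typedef struct {
    bool enabled;   /* debounced state; false matches the FPGA reset default */
    int  pending;   /* consecutive frames the raw level has disagreed */
} AgcEnableDebounce;

/* Feed one raw DIG_6 sample per frame; returns the debounced enable state.
 * The state flips only after AGC_CONFIRM_FRAMES consecutive disagreeing
 * samples. */
static bool agc_enable_update(AgcEnableDebounce *d, bool raw)
{
    if (raw == d->enabled) {
        d->pending = 0;              /* glitch ended, or nothing to do */
        return d->enabled;
    }
    d->pending++;
    if (d->pending >= AGC_CONFIRM_FRAMES) {
        d->enabled = raw;
        d->pending = 0;
    }
    return d->enabled;
}
```

Feeding the sequence high, low, high, high leaves the state disabled until the fourth frame, because the isolated first sample is discarded.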

Tests: Update MCU unit tests for new default, add 6 cross-layer
contract tests verifying the FPGA-MCU-GUI AGC invariant chain.
2026-04-17 00:04:37 +05:45
copilot-swe-agent[bot] df875bdf4d Merge origin/develop into feat/um982-gps-driver
Co-authored-by: JJassonn69 <83615043+JJassonn69@users.noreply.github.com>
2026-04-16 06:23:05 +00:00
Jason bcbbfabbdb harden error_strings[] safety and update .gitignore
- Add ERROR_COUNT sentinel to SystemError_t enum
- Change error_strings[] to static const char* const
- Add static_assert to enforce enum/array sync at compile time
- Add runtime bounds check with fallback for invalid error codes
- Add all missing test binary names to .gitignore
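
The error_strings[] hardening in the first four bullets follows a common C pattern, sketched here with a three-entry enum. Only `ERROR_COUNT`, `SystemError_t`, and `error_strings[]` are names from the commit; the enum entries shown, the string contents, and `error_to_string()` are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-entry enum; the real SystemError_t is larger. */
typedef enum {
    ERROR_NONE,
    ERROR_AD9523_CLOCK,
    ERROR_FPGA_COMM,
    ERROR_COUNT          /* sentinel: must stay last */
} SystemError_t;

static const char *const error_strings[] = {
    "NONE",
    "AD9523_CLOCK",
    "FPGA_COMM",
};

/* Compile-time check that the table and the enum stay in sync. */
static_assert(sizeof(error_strings) / sizeof(error_strings[0])
                  == (size_t)ERROR_COUNT,
              "error_strings[] must have one entry per SystemError_t value");

/* Runtime bounds check with a fallback for out-of-range codes. */
static const char *error_to_string(int code)
{
    if (code < 0 || code >= (int)ERROR_COUNT) {
        return "UNKNOWN_ERROR";
    }
    return error_strings[code];
}
```

Adding an enum value without a matching string now fails the build, and a corrupted code at runtime yields the fallback string instead of an out-of-bounds read.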
2026-04-16 02:12:37 +05:45
Jason b9c36dcca5 fix(ci): remove macOS test binaries from git, update .gitignore
The gap3, agc, and gps test binaries (Mach-O executables compiled on macOS)
were accidentally tracked. CI runs on Linux and fails with 'Exec format error'.
Removed from index and added to .gitignore.
2026-04-16 00:45:52 +05:45
3aLaee 35539ea934 fix(mcu): harden checkSystemHealth() watchdog against cold-start + stale-ts
checkSystemHealth()'s internal watchdog (pre-fix step 9) had two linked
defects that, combined with the previous commit's escalation of
ERROR_WATCHDOG_TIMEOUT to Emergency_Stop(), would false-latch AERIS-10:

  1. Cold-start false trip:
       static uint32_t last_health_check = 0;
       if (HAL_GetTick() - last_health_check > 60000) { trip; }
     On the first call, last_health_check == 0, so the subtraction
     against a seeded-zero sentinel exceeds 60 000 ms as soon as the MCU
     has been up >60 s -- normal after the ADAR1000 / AD9523 / ADF4382
     init sequence -- and the watchdog trips spuriously.

  2. Stale timestamp after early returns:
       last_health_check = HAL_GetTick();   // at END of function
     Every earlier sub-check (IMU, BMP180, GPS, PA Idq, temperature) has
     an `if (fault) return current_error;` path that skips the update.
     After ~60 s of transient faults, the next clean call compares
     against a long-stale last_health_check and trips.

With ERROR_WATCHDOG_TIMEOUT now escalating to Emergency_Stop(), either
failure mode would cut the RF rails on a perfectly healthy system.

Fix: move the watchdog check to function ENTRY. A dedicated cold-start
branch seeds the timestamp on the first call without checking. On every
subsequent call, the elapsed delta is captured first and
last_health_check is updated BEFORE any sub-check runs, so early returns
no longer leave a stale value. 32-bit tick-wrap semantics are preserved
because the subtraction remains on uint32_t.
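
The entry-of-function watchdog can be sketched as below, with `now_ms` standing in for HAL_GetTick(). The function name, the `seeded` flag, and the timeout macro are hypothetical; the real code may detect the cold start differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEALTH_TIMEOUT_MS 60000u

/* Unsigned subtraction wraps modulo 2^32, so a delta taken across a
 * tick rollover is still correct. */
static_assert((uint32_t)(5u - 0xFFFFFFFBu) == 10u, "tick-wrap arithmetic");

/* Returns true when the watchdog should trip. */
static bool health_watchdog_check(uint32_t now_ms)
{
    static bool     seeded = false;
    static uint32_t last_health_check = 0;

    if (!seeded) {                       /* cold start: seed, do not check */
        seeded = true;
        last_health_check = now_ms;
        return false;
    }
    uint32_t elapsed = now_ms - last_health_check;  /* capture delta first */
    last_health_check = now_ms;          /* update before any sub-check runs */
    return elapsed > HEALTH_TIMEOUT_MS;
}
```

Because the timestamp is refreshed before the function can return early, a first call at 90 s uptime does not trip, a gap of exactly 60 000 ms does not trip, and a call made 32 ms after a tick rollover sees an elapsed time of 32 ms rather than a huge value.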

Add test_gap3_health_watchdog_cold_start.c covering cold-start, paced
main-loop, stall detection, boundary (exactly 60 000 ms), recovery
after trip, and 32-bit HAL_GetTick() wrap -- wired into tests/Makefile
alongside the existing gap-3 safety tests.
2026-04-15 20:36:19 +02:00
Jason b0e5b298fe feat(gps): add UM982 GPS driver replacing broken TinyGPS++
Implement a complete UM982 GNSS driver (um982_gps.h/.c) with:
- NMEA parser for GGA, RMC, THS, VTG with multi-talker support (GP/GN/GL/GA/GB)
- Correct coordinate parsing using decimal-point-based degree detection
  (fixes PR #68 bug: 3-digit longitude degrees)
- Checksum verification on all incoming sentences
- Non-blocking line assembler with ring buffer
- Init sequence: UNLOG, HEADING FIXLENGTH, baseline config, NMEA enables,
  VERSIONA handshake (no SAVECONFIG to avoid NVM wear)
- Validity/age checks with configurable timeouts
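
Two of the parser items above, checksum verification and decimal-point-based degree detection, can be sketched as standalone helpers. These are not the driver's functions: `nmea_checksum_ok()` and `nmea_coord_to_degrees()` are hypothetical names, and hemisphere handling and digit validation are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Verify an NMEA 0183 checksum: XOR of every byte between '$' and '*',
 * compared with the two hex digits that follow '*'. */
static bool nmea_checksum_ok(const char *sentence)
{
    assert(sentence != NULL);
    if (sentence[0] != '$') return false;
    const char *star = strchr(sentence, '*');
    if (star == NULL || strlen(star) < 3) return false;
    if (!isxdigit((unsigned char)star[1]) ||
        !isxdigit((unsigned char)star[2])) return false;

    unsigned char sum = 0;
    for (const char *p = sentence + 1; p < star; ++p) {
        sum ^= (unsigned char)*p;
    }
    char hex[3] = { star[1], star[2], '\0' };
    return strtoul(hex, NULL, 16) == sum;
}

/* Convert a ddmm.mmmm or dddmm.mmmm field to decimal degrees. The two
 * digits before the decimal point are always minutes, so everything to
 * their left is degrees, whether that is two digits or three. */
static bool nmea_coord_to_degrees(const char *field, double *out_deg)
{
    assert(field != NULL && out_deg != NULL);
    const char *dot = strchr(field, '.');
    if (dot == NULL || dot - field < 3) return false;

    size_t deg_digits = (size_t)(dot - field) - 2;
    if (deg_digits > 3) return false;
    char deg_buf[4] = {0};
    memcpy(deg_buf, field, deg_digits);

    double degrees = strtod(deg_buf, NULL);
    double minutes = strtod(field + deg_digits, NULL);
    *out_deg = degrees + minutes / 60.0;
    return true;
}
```

Locating the decimal point first means a longitude such as `12311.120` is split as 123 degrees and 11.120 minutes, instead of being cut at a fixed two-digit offset.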

Integration into main.cpp:
- Replace TinyGPSPlus with UM982_GPS_t, UART5 baud 9600->115200
- Non-blocking um982_process() in main loop (single-byte UART reads)
- GPS heading override with magnetometer fallback
- Health check using um982_position_age()

Test infrastructure:
- 49 unit tests covering checksums, coordinate parsing, all sentence types,
  talker IDs, feed/assembly, validity, init sequence, edge cases
- Mock HAL_UART_Receive with per-UART ring buffer for integration tests
- All 72 MCU tests passing (23 existing + 49 new)

Fixes all 12 bugs identified in PR #68 analysis (5 compile errors + 7 functional).
2026-04-15 17:46:21 +05:45
Jason 0b25db08b5 fix(test): align emergency_state_ordering test with overtemp/watchdog fix
- Rename ERROR_STEPPER_FAULT → ERROR_STEPPER_MOTOR to match main.cpp enum
- Update critical-error predicate to include ERROR_TEMPERATURE_HIGH and
  ERROR_WATCHDOG_TIMEOUT (was testing stale pre-fix logic)
- Test 4 now asserts overtemp DOES trigger e-stop (previously asserted opposite)
- Add Test 5 (watchdog triggers e-stop) and Test 6 (memory alloc does not)
- Add ERROR_MEMORY_ALLOC and ERROR_WATCHDOG_TIMEOUT to local enum
- 7 tests, all pass
2026-04-15 13:18:07 +05:45
3aLaee 4900282042 fix(mcu-tests): strip stray literal backslash-r in Makefile continuations
The previous commit accidentally introduced the literal 2-byte sequence
'\r' at the end of two backslash-continuation lines (TESTS_STANDALONE
and the .PHONY list). GNU make on Linux treats that as text rather than
a line continuation, which orphans the following line with leading
spaces and aborts CI with:

  Makefile:68: *** missing separator (did you mean TAB instead of 8 spaces?)

Strip the extraneous 'r' so each continuation ends with a real backslash
+ LF.
2026-04-15 09:16:03 +02:00
3aLaee a2686b7424 fix(mcu): escalate overtemp and watchdog-timeout faults to Emergency_Stop()
handleSystemError() only called Emergency_Stop() for error codes in
[ERROR_RF_PA_OVERCURRENT .. ERROR_POWER_SUPPLY] (9..13). Two critical
faults were left out of the gate and fell through to attemptErrorRecovery()'s
default log-and-continue branch:

  - ERROR_TEMPERATURE_HIGH (14): raised by checkSystemHealth() when the
    hottest of 8 PA thermal sensors exceeds 75 C. Without cutting bias
    (DAC CLR) and the PA 5V0/5V5/RFPA_VDD rails, the 10 W GaN QPA2962
    stages remain biased in an overtemperature state -- a thermal-runaway
    path in AERIS-10E.

  - ERROR_WATCHDOG_TIMEOUT (16): indicates the health-check loop has
    stalled (>60 s since last pass). Transmitter state is unknown;
    relying on IWDG to reset the MCU re-runs startup and re-energises
    the PA rails rather than latching the safe state.

Fix: extend the critical-error predicate so these two codes also trigger
Emergency_Stop(). Add test_gap3_overtemp_emergency_stop.c covering all
17 SystemError_t values (must-trigger and must-not-trigger), wired into
tests/Makefile alongside the existing gap-3 safety tests.
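
The extended predicate might look like the sketch below. Only the four error names and their numeric values are taken from the message above; `is_critical_error()` is a hypothetical name, and the remaining SystemError_t entries are omitted rather than guessed.

```c
#include <assert.h>
#include <stdbool.h>

/* Names and values quoted in the commit message; other entries omitted. */
typedef enum {
    ERROR_RF_PA_OVERCURRENT = 9,
    ERROR_POWER_SUPPLY      = 13,
    ERROR_TEMPERATURE_HIGH  = 14,
    ERROR_WATCHDOG_TIMEOUT  = 16
} SystemError_t;

static_assert(ERROR_RF_PA_OVERCURRENT < ERROR_POWER_SUPPLY,
              "range check assumes the RF/power codes are ordered");

/* Hypothetical predicate: true when handleSystemError() should call
 * Emergency_Stop() instead of attempting recovery. */
static bool is_critical_error(int code)
{
    if (code >= ERROR_RF_PA_OVERCURRENT && code <= ERROR_POWER_SUPPLY) {
        return true;                        /* original 9..13 gate */
    }
    return code == ERROR_TEMPERATURE_HIGH   /* added by the fix */
        || code == ERROR_WATCHDOG_TIMEOUT;  /* added by the fix */
}
```

Listing the two new codes explicitly, rather than widening the range to 9..16, keeps code 15 on the recovery path in this sketch.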
2026-04-14 21:53:39 +02:00
Jason 666527fa7d feat: AGC phases 4-5 — STM32 outer-loop AGC class + main.cpp integration
Implements the STM32 outer-loop AGC (ADAR1000_AGC) that reads the FPGA
saturation flag on DIG_5/PD13 once per radar frame and adjusts the
ADAR1000 VGA common gain across all 16 RX channels.

Phase 4 — ADAR1000_AGC class (new files):
- ADAR1000_AGC.h/.cpp: attack/recovery/holdoff logic, per-channel
  calibration offsets, effectiveGain() with OOB safety
- test_agc_outer_loop.cpp: 13 tests covering saturation, holdoff,
  recovery, clamping, calibration, SPI spy, reset, mixed sequences

Phase 5 — main.cpp integration:
- Added #include and global outerAgc instance
- AGC update+applyGain call between runRadarPulseSequence() and
  HAL_IWDG_Refresh() in main loop

Build system & shim fixes:
- Makefile: added CXX/CXXFLAGS, C++ object rules, TESTS_WITH_CXX in
  ALL_TESTS (21 total tests)
- stm32_hal_mock.h: const uint8_t* for HAL_UART_Transmit (C++ compat),
  __NOP() macro for host builds
- shims/main.h + real main.h: FPGA_DIG5_SAT pin defines

All tests passing: MCU 21/21, GUI 92/92, cross-layer 29/29.
2026-04-13 20:14:31 +05:45
Jason f3bbf77ca1 Gap 3 Safety Architecture: IWDG watchdog, Emergency_Stop PA rail cutoff, temp max, periodic IDQ re-read, emergency state ordering + 5 tests (20/20 pass)
2026-03-19 21:58:39 +02:00
Jason c466021bb6 Fix bugs B12-B17 (PA cal loop, ADC buffer, DIAG_SECTION args, htim3 init, stale annotations) with regression tests
B12: PA IDQ calibration loop condition inverted (< 0.2 -> > 0.2) for both DAC1/DAC2
B13: DAC2 ADC buffer mismatch — reads from hadc2 now correctly stored to adc2_readings
B14: DIAG_SECTION macro call sites changed from 2-arg to 1-arg form (4 sites)
B15: htim3 definition + MX_TIM3_Init() added (PWM mode, CH2+CH3, Period=999)
B16: Removed stale NO-OP annotation on TriggerTimedSync (already fixed in Bug #3)
B17: Updated stale GPIO-only warnings to reflect TIM3 PWM implementation (Bug #5)

All 15 tests pass (11 original + 4 new for B12-B15).
2026-03-19 11:04:53 +02:00
Jason 49c9aa28ad Fix Bug #11 (platform SPI transmit-only), FPGA B2 (chirp BRAM migration), FPGA B3 (DSP48 pipelining)
Bug #11: platform_noos_stm32.c used HAL_SPI_Transmit instead of
HAL_SPI_TransmitReceive — reads returned garbage. Changed to in-place
full-duplex. Dead code (never called), fixed per audit recommendation.
Test added: test_bug11_platform_spi_transmit_only.c. Mock infrastructure
updated with SPI spy types. All 11 firmware tests pass.

FPGA B2: Migrated long_chirp_lut[0:3599] from ~700 lines of hardcoded
assignments to BRAM with (* ram_style = "block" *) attribute and
$readmemh("long_chirp_lut.mem"). Added sync-only read block for proper
BRAM inference. 1-cycle read latency introduced. short_chirp_lut left
as distributed RAM (60 entries, too small for BRAM).

FPGA B3: Added BREG (window_val_reg) and MREG (mult_i_raw/mult_q_raw)
pipeline stages to doppler_processor.v. Eliminates DPIP-1 and DPOP-2
DRC warnings. S_LOAD_FFT retimed: fft_input_valid starts at sub=2,
+1 cycle total latency. BREG primed in S_PRE_READ at no extra cost.
Both FPGA files compile clean with Icarus Verilog.
2026-03-19 10:31:16 +02:00
Jason 3b32f67087 Fix SPI bugs #9 (NULL platform_ops) and #10 (missing CS toggle), widen chip_select to uint16_t
Bug #9: Both TX and RX SPI init params had platform_ops = NULL, causing
adf4382_init() -> no_os_spi_init() to fail with -EINVAL. Fixed by setting
platform_ops = &stm32_spi_ops and passing stm32_spi_extra with correct CS
port/pin for each device.

Bug #10: stm32_spi_write_and_read() never toggled chip select. Since TX
and RX ADF4382A share SPI4, every register write hit both PLLs. Rewrote
stm32_spi.c to assert CS LOW before transfer and deassert HIGH after,
using stm32_spi_extra metadata. Backward-compatible: legacy callers
(e.g., AD9523) with cs_port=NULL skip CS management.

Also widened chip_select from uint8_t to uint16_t in no_os_spi.h since
STM32 GPIO_PIN_xx values (e.g., GPIO_PIN_14=0x4000) overflow uint8_t.
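
The truncation that motivated the wider field is easy to demonstrate. The struct and function names below are hypothetical stand-ins for the before and after forms of the `chip_select` member; only the 0x4000 value for GPIO_PIN_14 comes from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define GPIO_PIN_14_MASK 0x4000u   /* value quoted in the commit */

/* Narrowing to uint8_t keeps only the low 8 bits, so the pin mask is lost. */
static_assert((uint8_t)GPIO_PIN_14_MASK == 0u, "uint8_t truncates GPIO_PIN_14");
static_assert((uint16_t)GPIO_PIN_14_MASK == 0x4000u, "uint16_t preserves it");

/* Hypothetical before/after views of the chip_select field. */
typedef struct { uint8_t  chip_select; } spi_param_narrow;
typedef struct { uint16_t chip_select; } spi_param_wide;

static uint16_t stored_cs_narrow(uint16_t pin)
{
    spi_param_narrow p = { (uint8_t)pin };   /* high byte discarded */
    return p.chip_select;
}

static uint16_t stored_cs_wide(uint16_t pin)
{
    spi_param_wide p = { pin };
    return p.chip_select;
}
```

Any pin mask for pins 8 through 15 stores as 0 in the narrow form, so the CS toggle would act on no pin at all.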

10/10 tests pass (8 original + 2 new regression tests).
2026-03-19 10:00:05 +02:00
Jason 397969348e Fix all 8 firmware bugs with regression tests
Bugs fixed in adf4382a_manager.c:
- Bug #1: Move initialized=true before sync setup, propagate sync failure
- Bug #3: Implement TriggerTimedSync with sw_sync pulse (was no-op)
- Bug #5: Replace GPIO-only placeholder with TIM3 PWM for DELADJ
- Bug #7: Correct GPIOG pin definitions to match CubeMX (pins 6-15)

Bugs fixed in main.cpp:
- Bug #2: Remove pre-reset ad9523_setup() call (keep only post-reset)
- Bug #4: Move init error check before phase shift calls
- Bug #6: Fix timer variable (last_check -> last_check1) in temp block
- Bug #8: Uncomment uart_print/uart_println debug helpers

Test harness updates:
- All 8 tests rewritten to assert correct post-fix behavior
- Added TIM PWM mock (SPY_TIM_PWM_START/STOP/SET_COMPARE)
- Added mock_adf4382_set_timed_sync_retval for failure injection
- Updated shims and Makefile for new test dependencies
- All 8 tests pass: make clean && make test -> 8/8 passed
2026-03-19 09:42:59 +02:00
Jason b93ee04592 Add .gitignore for test build artifacts, remove committed binaries and .o files
2026-03-19 09:28:48 +02:00
Jason 28a66889ad Add MCU firmware test harness with 8 bug-confirming tests
Complete test infrastructure for the observe-before-fix methodology:

- stm32_hal_mock: HAL stub types + spy/recording ring buffer (512 entries)
- ad_driver_mock: ADF4382/AD9523 mock drivers with configurable returns
- 9 shim headers redirecting real #includes to mock types
- Makefile with individual (test_bug1..8) and aggregate (test) targets

All 8 tests pass, confirming:
  #1 Timed sync init ordering (SetupTimedSync before initialized=true)
  #2 AD9523 double setup (first call before reset release)
  #3 TriggerTimedSync no-op (prints messages, no HW action)
  #4 Phase shift before init error check
  #5 SetFinePhaseShift GPIO-only placeholder (no PWM)
  #6 Timer variable collision (last_check vs last_check1)
  #7 GPIO pin mapping conflict (manager.h vs CubeMX main.h)
  #8 uart_print/uart_println commented out
2026-03-19 09:28:19 +02:00