Commit Graph

72 Commits

Author SHA1 Message Date
Jason fd6036b49b PR-AB.b expanded commit 3: XDC + MCU GPIO scrub (PD9 / PD10)
Strip the FPGA-side pin constraints and MCU-side GPIO init+toggles for
the two STM32→FPGA beam-step GPIOs that the commit 1 RTL strip rendered
unreachable. The MCU was toggling PD9 once per beam_pos iteration and
PD10 once per azimuth step; both edges fed FPGA edge_detector_enhanced
instances that drove elevation_counter / azimuth_counter regs in
plfm_chirp_controller_v2 — counters that were never consumed (status
pack didn't carry them; on 50T they went to _nc; on 200T to
unconstrained outputs). GUI already uses MCU-side software counters
m/n/y via USB-CDC.

- constraints/xc7a50t_ftg256.xdc: delete PACKAGE_PIN E16 (PD9) +
  D16 (PD10); tighten stm32_new_* wildcard to explicit stm32_new_chirp.
- constraints/xc7a200t_fbg484.xdc: delete PACKAGE_PIN N18 (PD9) +
  N19 (PD10); tighten wildcard same as 50T.
- main.cpp:633: delete HAL_GPIO_TogglePin(GPIOD, GPIO_PIN_9) inside the
  matrix1/matrix2 beam_pos loop.
- main.cpp:655: delete HAL_GPIO_TogglePin(GPIOD, GPIO_PIN_10) at the
  azimuth-step / stepper-rotate boundary.
- main.cpp:3118 (MX_GPIO_Init output level): drop PD9 + PD10 from the
  GPIOD WritePin OR-mask.
- main.cpp:3172-3174 (MX_GPIO_Init pin config): drop PD9 + PD10 from
  the GPIOD pin OR-mask + comment. PD9 + PD10 now default to high-Z
  inputs after MCU reset — no leakage path because the FPGA-side ports
  are gone.

MCU regression: 51/51 + 34/34 suites green. FPGA regression unchanged
at 42/0/0 (XDC isn't consumed by iverilog).

The remaining DIG_0..DIG_3 bus pins are PD8 stm32_new_chirp (kept until
commit 5 renames it to stm32_beam_ready), PD11 stm32_mixers_enable, and
PD12 reset_n.
2026-05-11 11:06:21 +05:45
Jason ada170ef1f feat(fpga,mcu,gui): PR-AB.b — drift-free dwell sync via DIG_6 frame_pulse + AGC always-on policy
FPGA (Phase 1+2):
- gpio_dig6 (PD14) now carries chirp_scheduler frame_pulse, FPGA-stretched
  to ~100 ns so the STM32 EXTI on PD14 can latch reliably.
- gpio_dig7 (PD15) returns to its pre-PR-AB.b role: control-fault OR
  (range_decim_watchdog | CDC overrun); MCU stuck-high sampler unchanged.
- rx_range_decim_watchdog gains a sticky in source clock domain so a slow
  status poll cannot miss a 1-cycle assertion (Phase 1).
- New tb_dig6_frame_pulse.v (13 checks); tb_status_words_stickies.v extended
  with DIG_7 fault-OR coverage (14 checks); retired tb_audit_s10_gpio_split.v.
- Port comments in radar_system_top.v / _50t.v and XDC roles refreshed.

MCU (Phase 3):
- PD14 reconfigured to GPIO_MODE_IT_RISING + GPIO_PULLDOWN; new
  EXTI15_10_IRQHandler in stm32f7xx_it.c dispatches to HAL_GPIO_EXTI_Callback
  that bumps a volatile g_frame_pulse_count.
- runRadarPulseSequence dwell loop replaces 3x HAL_Delay(8) with
  waitForFramePulse(20): per-pattern dwell now tracks the actual mask-aware
  ladder length (drift-free), with a 20 ms timeout as a safety net.
- AGC outer loop is ALWAYS-ON in production (compile-time policy); bench
  builds compile the body out via -DMCU_AGC_FORCE_DISABLED. The runtime
  enable/debounce + DIG_6 polling that previously gated AGC are removed.
- main.h adds FPGA_FRAME_PULSE_* aliases pointing at FPGA_DIG6_*.
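
The counter-plus-bounded-wait pattern in the MCU bullets above can be modelled on a host as follows. This is a minimal sketch, not the firmware's code: `FramePulseSync`, `on_exti_rising`, and the injected `TickFn` are hypothetical names, and the tick source is assumed to be a free-running millisecond counter such as HAL_GetTick().

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a millisecond tick source (assumed to behave like HAL_GetTick()).
using TickFn = uint32_t (*)();

// Hypothetical host-side model of the PD14 frame-pulse wait.
struct FramePulseSync {
    volatile uint32_t pulse_count = 0;   // bumped from the modelled EXTI callback

    // Called once per rising edge, from interrupt context on a real target.
    void on_exti_rising() { pulse_count = pulse_count + 1; }

    // Spin until the counter advances or timeout_ms elapses.
    // Returns true if a pulse was seen, false on timeout.
    bool wait_for_pulse(uint32_t timeout_ms, TickFn now_ms) const {
        assert(now_ms != nullptr);
        const uint32_t seen  = pulse_count;
        const uint32_t start = now_ms();
        // Unsigned subtraction stays correct across a 32-bit tick wrap.
        while (static_cast<uint32_t>(now_ms() - start) < timeout_ms) {
            if (pulse_count != seen) return true;
        }
        return pulse_count != seen;
    }
};
```

Because the wait is keyed to the counter rather than a fixed delay, the dwell follows whatever period the pulse source actually has; the timeout only bounds the wait when no pulse arrives.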

GUI (Phase 4):
- Settings tab gains a Bench / Diagnostics group with a BENCH-MODE checkbox
  (off by default, persisted via QSettings).
- AGC group header swaps between a green "AGC: ALWAYS-ON" badge (production)
  and Enable/Disable AGC buttons (bench), pinned to the top of the group.
  The redundant 0/1 spinbox row for opcode 0x28 is removed — buttons send
  the same opcode and cannot accept invalid input.
- Both the FPGA Control AGC Status box and the AGC Monitor strip share a
  helper that honours bench-mode; in production it always shows ALWAYS-ON in
  green, so the two views never disagree with the badge.
- _add_fpga_param_row uses setFixedWidth on label and Set button + explicit
  stretch=1 on the hint, so all rows align column-wise whether they sit
  directly in a QVBoxLayout or inside a wrapper QWidget.

Regression: FPGA 42/0/0 (PR-M.4 baseline); MCU 34/34; GPS extended 51/51;
GUI v7 150/150; BENCH-MODE flip behaviorally verified.
Hardware-blocked steps deferred: bench-scope verify (PD14 dwell pulse,
counter advance, PD15 stuck-high recovery still triggers).

Closes #182.
2026-05-07 13:29:48 +05:45
Jason b215caa294 fix(mcu): PR-AB.a — move vector_0 out of inner beam_pos loop
runRadarPulseSequence used to fire vector_0 (broadside reference)
between every matrix1 and matrix2 pattern, i.e. 15 times per azimuth.
At 8 ms dwell each, that is 15 × 8 ms = 120 ms per azimuth; across
50 azimuths, ~6 s of the 18.4 s revisit time was burned on redundant
broadside frames.

Pull vector_0 out of the loop and fire it once per azimuth before the
sweep. Each azimuth now produces 1 broadside frame + 30 steered frames
(matrix1 + matrix2 across 15 beam_pos), down from 15 + 30 = 45 frames.
Revisit time drops from 18.4 s to ~12.8 s (about a 30% improvement).
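
The frame-count arithmetic above can be checked at compile time. This is a rough sketch using only the numbers quoted in this message (8 ms dwell, 15 beam positions, 50 azimuths); the constant and function names are illustrative, and any non-dwell overhead is assumed to be unchanged by the fix.

```cpp
#include <cstdint>

// Values taken from the commit message; names are illustrative.
constexpr uint32_t kDwellMs       = 8;    // per-frame dwell
constexpr uint32_t kBeamPositions = 15;   // beam_pos iterations per azimuth
constexpr uint32_t kAzimuths      = 50;

// Frames per azimuth: broadside frames plus matrix1 + matrix2 at every beam_pos.
constexpr uint32_t frames_per_azimuth(uint32_t broadside_frames) {
    return broadside_frames + 2 * kBeamPositions;
}

// Total dwell time for one full revisit, in milliseconds.
constexpr uint32_t revisit_dwell_ms(uint32_t broadside_frames) {
    return frames_per_azimuth(broadside_frames) * kDwellMs * kAzimuths;
}

static_assert(frames_per_azimuth(15) == 45, "old loop: 15 + 30 frames");
static_assert(frames_per_azimuth(1)  == 31, "new loop: 1 + 30 frames");
// 14 broadside frames removed per azimuth gives 5600 ms less dwell per revisit,
// consistent with the quoted drop from 18.4 s to about 12.8 s.
static_assert(revisit_dwell_ms(15) - revisit_dwell_ms(1) == 5600, "saved dwell");
```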

If multiple per-position broadside frames are ever needed, gate them
behind a runtime switch — the comment block flags this.

test_bug16_runradar_shadows_globals updated to mirror the new
1-outside + 2-inside m-counter pattern; 13/13 PASS, full MCU
regression 51/0 + 34/0.
2026-05-06 12:18:00 +05:45
Jason 83cbc91d8b refactor(mcu): PR-W F-6.7 — privatize setADTR1107Mode
API hygiene. setADTR1107Mode flips ADTR1107 PA/LNA bias registers but
does NOT touch the per-channel ADAR1000 RX/TX enable bits. Production
always reaches it through setAllDevicesTXMode / setAllDevicesRXMode,
which emit both halves. Leaving setADTR1107Mode public after F-6.1
removed the other public mode-switch wrappers invited a future caller
to invoke it directly and end up in a mismatched bias-vs-enable state.

Move the declaration to the private section with a short comment
explaining why the wrappers are the only sanctioned entry point.
2026-05-05 11:30:46 +05:45
Jason e3bd885be9 fix(mcu): PR-W F-6.3 — clear opposite REG_MISC_ENABLES bit in setADTR1107Mode
Latent bit-mask hygiene gap. setADTR1107Mode(TX) was asserting BIAS_EN
(bit 5) without first clearing LNA_BIAS_OUT_EN (bit 4); the RX branch
mirrored the bug. On any TX→RX→TX (or symmetric) transition through
this register both PA and LNA bias outputs would end up enabled
simultaneously. Production today only ever calls one direction at boot
and the opposite at shutdown — never both during normal operation —
so the bug was unreachable, but a future per-chirp SPI mode switch
would trip it.

Now each branch calls resetBit on the opposite enable before asserting its
own. One line per branch (not one per device): the clear sits inside the
existing for-dev loop.
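
A minimal sketch of the clear-then-set rule, written as a pure function on the register value. The bit positions are the ones quoted above; `apply_mode` and `Mode` are hypothetical names, and it is assumed here that the RX branch asserts LNA_BIAS_OUT_EN in the same way the TX branch asserts BIAS_EN.

```cpp
#include <cstdint>

// Bit positions as stated in the commit message.
constexpr uint8_t kLnaBiasOutEn = 1u << 4;  // LNA_BIAS_OUT_EN
constexpr uint8_t kBiasEn       = 1u << 5;  // BIAS_EN (PA bias)

enum class Mode { TX, RX };

// Return the new REG_MISC_ENABLES value for the requested mode,
// clearing the opposite enable before asserting the wanted one.
// All other bits are passed through unchanged.
constexpr uint8_t apply_mode(uint8_t reg, Mode mode) {
    return static_cast<uint8_t>(
        (reg & ~((mode == Mode::TX) ? kLnaBiasOutEn : kBiasEn)) |
        ((mode == Mode::TX) ? kBiasEn : kLnaBiasOutEn));
}

// A TX -> RX -> TX round trip never leaves both enables set.
static_assert(apply_mode(apply_mode(kBiasEn, Mode::RX), Mode::TX) == kBiasEn,
              "only the PA bias enable remains after returning to TX");
```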
2026-05-05 11:30:25 +05:45
Jason f23b35b719 chore(mcu): PR-W F-6.1 — prune dead ADAR1000Manager surface
Stage-6 ADTR1107 audit cleanup. Delete 4 unused public methods plus
their 2 internal helpers from ADAR1000_Manager.cpp/h. Production boot
goes through main.cpp's C-style systemPowerUpSequence(), so the C++
ADAR1000Manager::powerUpSystem / powerDownSystem / switchToTXMode /
switchToRXMode wrappers had zero call sites; the same was true of the
setPABias / setLNABias helpers, only ever invoked from the dead
switchTo* paths. -130 LOC, no behavioral change.

kPaBiasRxSafe is intentionally KEPT — it is live-used inside
setADTR1107Mode(RX) as the safe PA bias when transitioning to RX.
2026-05-05 11:29:32 +05:45
Jason 00d5d5f220 fix(mcu): PR-V — ADF4382A Stage-5 audit fixes (F-5.1..F-5.10)
F-5.1: revert PWM scaffolding to binary DELADJ. Schematic-verified:
  PG7/PG13 on STM32F746ZGT7 have no TIM3 alternate function (Port G AFs are
  FMC/ETH/USART6/SAI2/SDMMC2 — no TIMx routes), and the FreqSynth-board
  DELADJ net has only a 200 kOhm pulldown (R22, R35) — no series-R +
  shunt-C LPF for PWM-to-DC. The 3979693 (Bug #5) + c466021 (B15) PWM
  scaffolding was a false-fix; 5fbe97f's original honest TODO matched the
  actual hardware. Delete htim3, MX_TIM3_Init, start/stop_deladj_pwm,
  phase_ps_to_duty_cycle. Rewrite test_bug5 for binary; delete test_bug15.

F-5.2: split ADF4382A ref_div per device. RX 10.38 GHz / 300 MHz = 34.6
  requires fractional mode, but ADF4382_PFD_FREQ_FRAC_MAX = 250 MHz. The
  driver does not reject the out-of-spec config; it silently leaves ldwin_pw
  at 0. Set rx_param.ref_div = 2 -> PFD = 150 MHz, in spec. TX unchanged
  (integer).
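
  The F-5.2 arithmetic can be sketched as a compile-time check. This is an illustration only: `pfd_in_spec` is a hypothetical validity test of the kind the message says the driver lacks, fractional mode is assumed to mean "LO is not an integer multiple of the PFD", and the integer-mode PFD limit is not modelled.

```cpp
#include <cstdint>

// Limit quoted in the commit message (ADF4382_PFD_FREQ_FRAC_MAX = 250 MHz).
constexpr uint64_t kPfdFracMaxHz = 250000000ULL;

// PFD frequency for a given reference and reference divider (ref_div must be nonzero).
constexpr uint64_t pfd_hz(uint64_t ref_hz, uint32_t ref_div) {
    return ref_hz / ref_div;
}

// True when the LO is not an integer multiple of the PFD.
constexpr bool is_fractional(uint64_t lo_hz, uint64_t pfd) {
    return (lo_hz % pfd) != 0;
}

// Hypothetical check: fractional configurations must keep the PFD under the limit.
constexpr bool pfd_in_spec(uint64_t lo_hz, uint64_t ref_hz, uint32_t ref_div) {
    return !is_fractional(lo_hz, pfd_hz(ref_hz, ref_div)) ||
           pfd_hz(ref_hz, ref_div) <= kPfdFracMaxHz;
}

static_assert(!pfd_in_spec(10380000000ULL, 300000000ULL, 1), "RX with 300 MHz PFD");
static_assert( pfd_in_spec(10380000000ULL, 300000000ULL, 2), "RX with 150 MHz PFD");
```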

F-5.3: free prior tx_dev/rx_dev in Manager_Init before re-allocating. The
  recovery dispatch on TX/RX unlock calls Manager_Init again; previous
  adf4382_dev allocations were leaking. Mirrors F-4.5 fix for AD9523.

F-5.4: fix upstream adf4382_remove() — only freed dev struct on FAILED SPI
  removal (success path leaked) and always returned 0. Now: NULL guard,
  unconditional free, propagate ret.

F-5.8: lock-detect uses register reg[0x58] LOCKED bit as authoritative.
  GPIO disagreement still logged via DIAG_WARN but no longer flips the
  result — a mis-routed GPIO LKDET would otherwise trigger false-unlock
  recovery loops.

F-5.10: drop stale "EZSYNC" diagnostic string (post-C-14a residue).

Bench-side checks for first power-on:
- Scope PG13 (TX_DELADJ) and PG7 (RX_DELADJ) — both should be HIGH (3.3V)
  after SetPhaseShift(500,500) runs at boot.
- Confirm both ADF4382A LOs lock with PFD=150 MHz on RX (was 300 MHz).
  Lock-time may be slightly longer; phase-noise sidebands shift.
- Confirm no false-unlock storms on the recovery path — the GPIO LKDET
  disagreement DIAG_WARN should no longer flip the lock decision.

Regression: tests/ make test 34/34 PASS (was 35/35 baseline; -1 from
test_bug15 deletion as planned).
2026-05-05 09:20:06 +05:45
Jason e1e5ae464a fix(mcu): F-4.3/4.4 (Option A) — AD9523 PLL1 bypass for first bring-up
The F-4.1+4.2+4.7 patch (ddc0df4) made ad9523_init() run before the
user pdata overrides, which means pll1_bypass_en=0 (the previous
override) is now actually honoured by the driver. Combined with the
fact that pll1_charge_pump_current_nA and pll1_feedback_div were
never set in main.cpp, PLL1 would be expected active but couldn't
lock (CP=0) — ad9523_status() with bypass_en=0 checks PLL1+REFA+REFB
bits, so the failure surfaces, returns -1, and configure_ad9523()
halts boot at main.cpp:1742.

Option A: set pll1_bypass_en=1. VCXO free-runs on its own crystal
stability; ad9523_status() skips PLL1 checks. Boot path is now
clean. Trade-off: VCXO frequency drifts with temperature (~±20 ppm
over -40°C..+85°C for typical XO) — acceptable for first-flight
checkout, but eventual production should re-enable PLL1 (Option B,
deferred to F-4.3/4.4 with measured loop-filter values).

Comment notes the deferral and what's needed before flipping to
bypass=0 (CP current + loop filter rzero tuned to VCXO Kvco).

Regression: 86/0.
2026-05-04 23:39:06 +05:45
Jason 05472c1493 fix(mcu): F-4.5 + F-4.6 — AD9523 heap/lifecycle hygiene
F-4.5: ad9523_setup() malloc's both an ad9523_dev and a no_os SPI
descriptor (ad9523.c:430,435). Previously the dev pointer was local
to configure_ad9523() and fell out of scope on return — every
recovery cycle (ERROR_AD9523_CLOCK → re-run configure_ad9523) leaked
one struct + one SPI desc. STM32F7 heap is bounded; sustained
brown-out flapping would eventually exhaust it. Move dev to a file-
scope `g_ad9523_dev` and call ad9523_remove() at the top of
configure_ad9523() to free the previous instance before re-setup.
Initial boot path is unaffected (g_ad9523_dev=NULL → remove call
gated by NULL check).
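
The free-before-re-setup pattern described for F-4.5 can be sketched with a toy device type. Everything here is a hypothetical stand-in (`Dev`, `dev_setup`, `dev_remove`, `configure`); it does not reproduce the real driver's allocation, only the lifecycle rule.

```cpp
#include <cstdlib>

// Hypothetical stand-ins for the driver's device type and setup/remove pair.
struct Dev { int id; };
static int g_live_allocs = 0;            // outstanding allocations, for illustration

int dev_setup(Dev** out) {
    Dev* d = static_cast<Dev*>(std::malloc(sizeof(Dev)));
    if (d == nullptr) return -1;
    d->id = 0;
    ++g_live_allocs;
    *out = d;
    return 0;
}

int dev_remove(Dev* d) {
    if (d == nullptr) return -1;         // NULL guard
    std::free(d);
    --g_live_allocs;
    return 0;
}

static Dev* g_dev = nullptr;             // file-scope handle survives across calls

// Re-entrant configure: release the previous instance before allocating again.
int configure() {
    if (g_dev != nullptr) {
        dev_remove(g_dev);
        g_dev = nullptr;
    }
    return dev_setup(&g_dev);
}
```

With the handle held at file scope, repeated recovery cycles keep exactly one live allocation instead of adding one per cycle.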

F-4.6: ad9523_setup() called ad9523_calibrate() but discarded its
return value (ad9523.c:707). VCO calibration can fail silently — if
the target VCO is outside the 3.6-4.0 GHz band (e.g. F-4.1 wipe left
PLL2 N=16, target 1.6 GHz), calibrate would report failure but setup
still proceeded to ad9523_status(), where PLL2_LD might pass
spuriously. Capture and propagate the calibrate return so a failed
calibration aborts setup with a clear non-zero status code instead
of being absorbed.

Both fixes are mechanical and don't change correct-path behaviour.
Regression: 86/0 (mocks bypass real driver, so F-4.6 is not covered
by tests; F-4.5 changes are in main.cpp and don't trip mocked
configure_ad9523).
2026-05-04 22:06:08 +05:45
Jason ddc0df464e fix(mcu): F-4.1+4.2+4.7 — AD9523 init order + M1 divider + channel math
Three coupled bugs in configure_ad9523() that together prevented the
AD9523 from producing the labelled output frequencies:

F-4.1: ad9523_init() unconditionally overwrites every field in the
caller's pdata (vcxo_freq=0, pll1_bypass_en=1, pll2_ndiv_b_cnt=4,
all channel fields). Calling it AFTER customization wiped every user
value. Reorder: call ad9523_init() before the pdata.X = Y block; user
overrides land on top of ADI defaults instead of being wiped.

F-4.2: pll2_vco_diff_m1 / m2 are required (range 3..5 per datasheet)
but were left at 0 from memset. The driver's AD_IFE() macro promotes
m=0 to M_PWR_DOWN_EN, killing channels 4-9 (ADC, SYNC, FPGA system
clock, DAC). Set m1=m2=3 explicitly.

F-4.7: AD9523 has no VCO-direct path for OUT4-OUT9; channels source
M1 or M2 only (datasheet + ad9523_vco_out_map register definitions
confirmed). With VCO 3.6 GHz and m1=3, channel dividers see 1.2 GHz,
not 3.6 GHz — every channel_divider in main.cpp was 3x too large.
Updated values:
  OUT0/1 (ADF4382A REF, 300 MHz):  /12 -> /4
  OUT4/5 (ADC + FPGA_ADC, 400 MHz): /9 -> /3
  OUT6 (FPGA SYSCLK, 100 MHz):     /36 -> /12
  OUT7 (FPGA TEST, 20 MHz):       /180 -> /60
  OUT8/9 (SYNC, 60 MHz):           /60 -> /20
  OUT10/11 (DAC, 120 MHz):         /30 -> /10

m1=3 is the unique choice for this labelled frequency set (m1=4 fails
OUT4, m1=5 fails OUT0/1).
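
The divider table and the uniqueness claim for m1 can be reproduced with a few lines of integer arithmetic. This is a sketch built only from the frequencies listed above; the function names are illustrative, and the real part's divider range limits are not modelled.

```cpp
#include <cstdint>

// Number from the commit message: 3.6 GHz VCO.
constexpr uint32_t kVcoMHz = 3600;

// Integer channel divider for a target output, or 0 when the M-divided
// clock is not an exact multiple of the target.
constexpr uint32_t channel_divider(uint32_t m, uint32_t out_mhz) {
    return ((kVcoMHz % m) == 0 && ((kVcoMHz / m) % out_mhz) == 0)
               ? (kVcoMHz / m) / out_mhz
               : 0;
}

// True when every labelled output has an exact integer divider for this M.
constexpr bool m_fits_all(uint32_t m) {
    return channel_divider(m, 300) && channel_divider(m, 400) &&
           channel_divider(m, 100) && channel_divider(m, 20) &&
           channel_divider(m, 60)  && channel_divider(m, 120);
}

static_assert(channel_divider(3, 300) == 4,  "OUT0/1");
static_assert(channel_divider(3, 400) == 3,  "OUT4/5");
static_assert(channel_divider(3, 100) == 12, "OUT6");
static_assert(channel_divider(3, 20)  == 60, "OUT7");
static_assert(channel_divider(3, 60)  == 20, "OUT8/9");
static_assert(channel_divider(3, 120) == 10, "OUT10/11");
static_assert(m_fits_all(3) && !m_fits_all(4) && !m_fits_all(5), "only m1=3 fits");
```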

PLL1 (F-4.3/4.4) is not addressed here — pll1_bypass_en=0 with
pll1_charge_pump_current_nA still 0 means PLL1 won't lock and status()
will report it. Decide bypass strategy before bench.

Test mocks (ad_driver_mock.c) bypass the real driver, so this is not
caught by make. Regression: 86/0 (unchanged).

Bench-verify OUT4=400MHz and OUT6=100MHz with scope before trusting
downstream. F-1.10 (which crystal is fitted on X5/X6) goes in the
same bench session — F-4.7 resolution shows 100 MHz VCXO is the only
math-coherent choice regardless of BOM document.
2026-05-04 21:52:53 +05:45
Jason b84aa6a6f3 fix(mcu): F-3.1 Error_Handler reset + audit cleanup tail
F-3.1 (functional): Error_Handler() now calls NVIC_SystemReset() instead
of __disable_irq(); while(1). Every MX_*_Init() helper invokes
Error_Handler before MX_IWDG_Init() runs, so an infinite spin would brick
the MCU on any transient boot-time glitch with no watchdog to recover.
SystemReset turns a hard-to-debug brick into a visible reboot loop.

F-3.3..F-3.8 (comment hygiene in main.cpp init helpers + post-init):
  - TIM3 init: clarify 1 MHz tick @ 72 MHz timer clock (APB1=36 MHz but
    RCC_TIMPRES_ACTIVATED forces TIMxCLK=HCLK)
  - GPIO init: fix EN_P_3V3_ADAR12EN_P_3V3_VDD_SW_Pin → EN_P_3V3_VDD_SW_Pin
    typo; correct PD8-11 → PD8-12 and PD12-15 → PD13-15 ranges
  - SystemClock_Config: add VOS3 + 72 MHz intent comment
  - MPU_Config: decode SubRegionDisable=0x87 bitmask

D1/D6/D7 (ADAR cleanup tail): code was already deleted in a prior pass;
this strips the residual tombstone comments per the no-tombstone feedback
policy.
  - ADAR1000_Manager.h: 5 tombstone blocks removed (fastTXMode/etc,
    setBeamAngle/4-phase/BeamConfig, setADTR1107Control, Configuration
    section + setSwitchSettlingTime/setFastSwitchMode/setBeamDwellTime,
    setTRSwitchPosition)
  - ADAR1000_Manager.cpp: 6 tombstone comments removed; switchToRXMode
    Step 4→3, Step 5→4 renumbered after Step-3 gap
  - ADAR1000_AGC.cpp: stale "(matching the convention in setBeamAngle)"
    reference removed
  - main.cpp:556-557: redundant "setFastSwitchMode(true) call removed"
    tombstone removed

D2 (comment-only): initializeBeamMatrices() and runRadarPulseSequence
descriptions rewritten to describe array-math peak (matrix1 → NEGATIVE
θ peak, matrix2 → POSITIVE θ peak) instead of the misleading "positive
phase difference" framing. Sky/ground sign vs antenna mount explicitly
flagged unverified — functional sign question remains hardware-blocked
pending calibrated-source bench test.

Regression: 86/0.
2026-05-04 21:06:23 +05:45
Jason 53f7d1e3ee chore(mcu): C-14a — delete dead ADF4382A EZSync surface
Production firmware never used SYNC_METHOD_EZSYNC — both callsites
(main.cpp:938 recovery, main.cpp:1955 boot) pass SYNC_METHOD_TIMED.
The original audit C-14 flagged TX/RX SPI skew in EZSync's trigger
sequence, but the path was dead from production; only test_bug3
referenced it for spy-harness regression coverage.

Removed:
  - SYNC_METHOD_EZSYNC enum value
  - ADF4382A_SetupEZSync function (and declaration)
  - ADF4382A_TriggerEZSync function (and declaration)
  - EZSync branch in ADF4382A_Manager_Init (collapsed to unconditional
    SetupTimedSync call)
  - test_bug3_timed_sync_noop.c Test C (EZSync regression coverage)

Production header and test shim header both cleaned. SyncMethod enum
kept as single-value to avoid touching the 7 other test callers that
pass SYNC_METHOD_TIMED.

Residual concern (separate from original C-14): ADF4382A_TriggerTimedSync
uses the same TX-then-RX sw_sync SPI sequencing pattern as the deleted
EZSync trigger. ~5 µs SPI gap between TX-armed and RX-armed means TX
and RX may capture different SYNCP/SYNCN edges (60 MHz cycle = 16.7 ns,
~300 edges in the gap). External SYNCP only provides simultaneity if
both devices are armed before a common edge. Hardware bench-test
required to confirm operational tolerance; cannot fix in firmware
without DMA SPI burst rewrite.

Regression: 86/0 (matches baseline).
2026-05-04 21:05:50 +05:45
Jason b505266f33 fix(mcu): P-5 — align radar params with PR-F/PR-Q.1; document mode-01 production stance
main.cpp pre-PR-F constants caused two issues:
  - m_max = 32 disagreed with RP_CHIRPS_PER_FRAME = 48 (3 sub-frames * 16);
    getStatusString reported "32 chirps/position" to the GUI, false telemetry.
  - PRI MEDIUM = 161 us (PR-Q.1 stagger) was missing entirely; the MCU only
    knew SHORT=175 / LONG=167. T2 was also stuck at the pre-PR-E 0.5 us
    SHORT chirp width; PR-E switched to 1.0 us.

Fixes:
  - m_max 32 -> 48; T2 0.5 -> 1.0; new T_MEDIUM=5.0, PRI_MEDIUM=161.0 constants.
  - Big doc-comment above runRadarPulseSequence states the production stance:
    FPGA cold-resets to mode 2'b01 (auto-scan) so the MCU's chirp GPIO toggles
    are no-ops; pass-through mode 2'b00 needs a 3-PRI loop the MCU does not
    yet emit, so mode-00 is operationally unsupported until that's built.
  - Removed the redundant /* */ block-comment shadow of the same constants
    that had `T2` defined twice (typo for `PRI2`); pure dead-code cleanup.
  - test_bug16_runradar_shadows_globals.c m_max 32 -> 48 with refreshed
    arithmetic comment; binary still PASSes all 4 checks (g_m wraps to 1
    each iter regardless of m_max value).

No GPIO timing change (would need hardware verification). Audit P-5 closes
with the documented mode-01 stance; rebuilding the loop for mode-00 stays
on the backlog if/when pass-through becomes a deployment requirement.
2026-05-02 16:40:32 +05:45
Jason 534905263f mcu(health): poll PD15 + dispatch ERROR_FPGA_DSP_STALL (AUDIT-S10 follow-up)
AUDIT-S10 (commit `58154a6`) split the FPGA's six-flag aggregate
gpio_dig5 into two MCU-visible bits: gpio_dig5 keeps signal-saturation
(AGC reacts), gpio_dig7 (PD15) carries control-fault classes
(range_decim_watchdog | cic_fir_overrun). Until now the MCU did NOT
poll PD15, so DSP control faults were invisible to the recovery
dispatcher.

Changes:

- New `ERROR_FPGA_DSP_STALL` enum value placed AFTER ERROR_WATCHDOG_TIMEOUT
  so the dispatcher routes to attemptErrorRecovery (FPGA reset pulse) not
  Emergency_Stop. Updated error_strings[] in lockstep (static_assert
  enforces).

- checkSystemHealth section 10 polls PD15 at 1 Hz with 2-sample debounce.
  `last_dsp_check` is committed BEFORE the early return per AUDIT-CAL
  pattern, so a flapping fault never bypasses the rate-limit. Streak
  counter resets to 0 after firing (armed for next post-recovery
  assertion) AND resets naturally when PD15 returns LOW.

- attemptErrorRecovery: ERROR_FPGA_DSP_STALL fans into the existing
  ERROR_FPGA_COMM PD12 reset case (stacked case labels, same body). No
  MCU-driven reset_monitors path exists; full bitstream reload clears
  all sticky monitors as a side effect.
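
The polling rule in the second bullet (1 Hz rate limit, 2-sample debounce, timestamp committed before any early return) can be modelled as below. `DspStallPoller` is a hypothetical name and the code is a host-side sketch, not the checkSystemHealth implementation.

```cpp
#include <cstdint>

// Hypothetical model of a 1 Hz, 2-sample debounced fault poll.
struct DspStallPoller {
    uint32_t last_check_ms = 0;
    uint8_t  streak        = 0;

    // Call once per main-loop pass. Returns true when the fault should be raised.
    bool poll(uint32_t now_ms, bool pin_high) {
        if (static_cast<uint32_t>(now_ms - last_check_ms) < 1000u) return false;
        last_check_ms = now_ms;          // commit the window before any early return
        if (!pin_high) { streak = 0; return false; }   // pin LOW clears the streak
        if (++streak < 2) return false;  // first high sample only arms the debounce
        streak = 0;                      // re-arm for the next assertion
        return true;
    }
};
```

In this model a sustained fault can fire at most once every two seconds, and a single high sample followed by a low one never fires.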

Tests:
- tests/test_audit_s10_dsp_stall_polling.c (NEW, 7 scenarios, 7/7 PASS):
  T1 healthy 60s, T2 single-sample glitch blocked by debounce, T3
  sustained fault fires once, T4 post-fire rate-limit holds within
  window, T5 sustained fault rate bounded (29 errors / 60s -- MCU-N1
  latch at error_count>10 fires in ~22s, gives operator time to
  intervene), T6 counter-test demos no-debounce false-positive on
  glitch, T7 HAL_GetTick 32-bit wrap.
- MCU host suite 35/35 PASS (was 34/34; +1 new, 0 regressions).
2026-04-29 23:42:21 +05:45
Jason 1b1b5f4fb2 mcu(health): commit rate-limit window before early returns (AUDIT-CAL follow-up)
checkSystemHealth() had three watchdog blocks with the identical
"last_X_check not updated on error path" bug — same root cause as
AUDIT-CAL (BMP180 fix in commit 95aed35), distinct sites:

  AD9523 clock check   (5 s)  main.cpp:693-705
  ADAR1000 comm check  (2 s)  main.cpp:729-749
  IMU comm check       (10 s) main.cpp:752-760

Pre-fix, each block placed `last_X_check = HAL_GetTick();` below the
early-return path, so once the underlying check (STATUS0/1 RESET,
SCRATCHPAD verify fail, GY85_Update false) started failing, the
rate-limit window never engaged. Every subsequent iteration of the
main while(1) loop re-fired the corresponding ERROR_*. With
error_count > 10 latching system_emergency_state per MCU-N1, the
radar would trip into SAFE-MODE within ~10 main-loop iterations of
the first transient — far short of the intended ~100-150 s grace
window meant for operator intervention or attemptErrorRecovery
to succeed. ADAR1000 comm-failure also re-ran the 16 ms blocking
SPI verify (4 devices × 4 ms HAL_Delay) per iteration → chirp jitter.

Fix at all three sites: move the timestamp update INTO the if-block
and BEFORE any sub-check call. Mirrors the AUDIT-CAL post-fix
BMP180 block at main.cpp:771-780. ADAR1000 overtemp check stays
per-loop (unchanged) — over-temperature must remain responsive.
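
The difference between the pre-fix and post-fix placement of the timestamp update can be shown side by side. This is a simplified sketch: `RateLimitedCheck` and its methods are hypothetical, and `check_ok` stands in for whatever status read a given site performs.

```cpp
#include <cstdint>

// Hypothetical rate-limited health check.
struct RateLimitedCheck {
    uint32_t period_ms;
    uint32_t last_check_ms = 0;

    explicit RateLimitedCheck(uint32_t period) : period_ms(period) {}

    // Pre-fix shape: the timestamp only advances when the check succeeds,
    // so a failing check re-fires on every call once the window has elapsed.
    bool run_buggy(uint32_t now_ms, bool check_ok) {
        if (static_cast<uint32_t>(now_ms - last_check_ms) < period_ms) return false;
        if (!check_ok) return true;      // error raised, window never re-armed
        last_check_ms = now_ms;
        return false;
    }

    // Post-fix shape: the timestamp is committed before the sub-check result is used.
    bool run_fixed(uint32_t now_ms, bool check_ok) {
        if (static_cast<uint32_t>(now_ms - last_check_ms) < period_ms) return false;
        last_check_ms = now_ms;
        return !check_ok;                // true means "raise the error"
    }
};
```

Feeding both variants a permanently failing check for 20 consecutive calls after the window opens raises 20 errors in the first shape and one in the second.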

Test: tests/test_audit_imu_watchdog_cadence.c (6 tests, 6/6 PASS)
exercises the post-fix predicate against simulated HAL_GetTick()
ticks and a controllable GY85_Update() mock; counter-test runs the
pre-fix predicate to demonstrate the regression. Test uses IMU as
representative; AD9523 (5 s) and ADAR1000 (2 s) sites have identical
control flow.

Verification: full MCU host suite 34/34 PASS (was 33/33; +1 new test,
0 regressions).
2026-04-29 20:57:50 +05:45
Jason 95aed35d89 mcu(bmp180): call cal-coefficient init at boot + watchdog cadence fix (AUDIT-CAL)
The BMP180 driver had no public init method and never called
readCalibrationCoefficients() from anywhere -- _calCoeff ran at the
C++ in-class member-initializer defaults (all zeros) at runtime.

Consequence chain:
  - computeB5(UT) short-circuited via 0/0 (Cortex-M7 SDIV with
    SCB->CCR.DIV_0_TRP=0 returns 0 silently -- system_stm32f7xx.c does
    not enable the trap)
  - getPressure() always tripped the `if (B4 == 0)` guard, returning
    the I2C-error sentinel (post-AUDIT-C17: INT32_MIN; pre-: 255)
  - health watchdog at main.cpp:758 fired ERROR_BMP180_COMM every
    main-loop iteration because last_bmp_check was only updated on the
    success path, so the 15 s rate-limit never engaged once the check
    started failing
  - error_count > 10 latched system_emergency_state = true (per the
    MCU-N1 fix), driving SAFE-MODE within ~25 s of every boot

Fix:
  - Added BMP180::begin() public method: probes chip ID, then reads the
    11 factory cal coefficients (registers 0xAA..0xBE step 2). Returns
    true only on full success; false on chip-ID mismatch or any I2C
    failure mid-loop.
  - main.cpp BAROMETER INIT calls myBMP.begin() with up to 3 retries
    (50 ms backoff) and sets a file-scope bmp180_operational flag.
    Altitude-baseline loop now gated on success -- failure path leaves
    RADAR_Altitude at 0.0f instead of letting pow(negative, fractional)
    propagate NaN into gps_data telemetry.
  - Health watchdog gates BMP180 check on bmp180_operational AND
    updates last_bmp_check regardless of the error path. A single bad
    pressure reading no longer tight-loops into SAFE-MODE; legit sensor
    failure now takes the intended ~150 s (10 errors x 15 s) before
    the MCU-N1 latch trips, giving the operator time to intervene.

Verification:
  - new test_audit_cal_bmp180_begin.c, 3/3 PASS:
      T1 every coefficient loaded in order with correct signed/unsigned types
      T2 chip-mismatch and I2C-fail short-circuit semantics correct
      T3 regression demo: zero-cal computeB5 returns 0 for any UT (the
         silent-fail mode); datasheet cal reproduces 15.0 C
  - full MCU regression 33/33 PASS (was 32/32; +1 new test, 0 regressions)

Bug introduced in 5fbe97f (initial upload of the driver from the
Arduino enjoyneering79 BMP180 library -- the begin()/init pattern from
the upstream Arduino version was lost in the STM32 port). Latent until
this audit cycle.
2026-04-29 19:21:35 +05:45
Jason 4b142166be mcu(bmp180): replace in-band sentinel + fix uint16->int16 narrowing (AUDIT-C17)
BMP180_ERROR=255 was an in-band sentinel returned by uint16_t I/O helpers
(read16, readRawTemperature) on I2C failure. 255 is also a valid uint16
register reading (0x00FF appears across the calibration block and is
reachable as a raw temperature/pressure sample), so a sensor failure was
indistinguishable from a real reading.

getTemperature() additionally narrowed the uint16_t raw read to int16_t
before passing to computeB5(). Raw bit-patterns >= 0x8000 (reachable across
the BMP180 -40..+85 C operating window) flipped to negative int16_t and
sign-extended into computeB5(), producing temperature errors of order
100s of C (e.g. -347 C instead of +51 C for raw UT = 0x8000).

Fix:
  - Internal I/O helpers (read8/read16/readRawTemperature/readRawPressure)
    now return bool and pass the value through an out-param. None of the
    new sentinels collide with valid sensor output:
      * getTemperature       -> NaN          on error
      * getPressure          -> INT32_MIN    on error
      * getSeaLevelPressure  -> INT32_MIN    on error
  - getTemperature() keeps raw as uint16_t and widens value-preservingly
    via (int32_t)raw before computeB5().
  - readRawPressure() reads XLSB through the bool-out-param contract;
    previously OR'd in 0xFF on I2C fail, silently corrupting the LSB.
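
The narrowing problem and the bool/out-param contract can be illustrated together. This sketch uses the temperature formula and example calibration values from the BMP180 datasheet's worked example; the function names are hypothetical and the code is not the driver's computeB5().

```cpp
#include <cstdint>

// Example calibration values from the BMP180 datasheet's worked example.
constexpr int32_t kAC5 = 32757, kAC6 = 23153, kMC = -8711, kMD = 2868;

// Temperature in 0.1 C from a raw reading, following the datasheet formula.
// Returns false (leaving *out untouched) when the divisor would be zero.
bool temperature_decic(int32_t ut, int32_t* out) {
    const int32_t x1 = static_cast<int32_t>(
        (static_cast<int64_t>(ut) - kAC6) * kAC5 / 32768);
    if (x1 + kMD == 0) return false;
    const int32_t x2 = kMC * 2048 / (x1 + kMD);
    *out = (x1 + x2 + 8) / 16;
    return true;
}

// Value-preserving widening: raw stays unsigned until it becomes int32_t.
bool temperature_widened(uint16_t raw, int32_t* out) {
    return temperature_decic(static_cast<int32_t>(raw), out);
}

// Pre-fix shape: narrowing to int16_t turns raw >= 0x8000 negative
// (two's-complement wraparound on common compilers).
bool temperature_narrowed(uint16_t raw, int32_t* out) {
    return temperature_decic(static_cast<int16_t>(raw), out);
}
```

With these coefficients, raw 0x8000 evaluates to 511 (51.1 C) through the widened path and to -3472 (-347.2 C) through the narrowed one, which matches the magnitudes quoted above.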

Verification: test_audit_c17_bmp180_sentinel_and_cast 4/4 PASS, including
datasheet UT=27898 -> 15.0 C reproduction and 64/64 finite outputs across
a full uint16 sweep (vs 32/32 collapses in the upper half under the buggy
narrowing). Full MCU regression 32/32 PASS.

Caller-side: no external code references BMP180_ERROR; main.cpp's existing
range check at the health-watchdog catches INT32_MIN via the < 30000.0
branch.
2026-04-29 18:55:48 +05:45
Jason 26f8d1fa72 fix(mcu): MCU-A4 — BKPSRAM warm-restart bypass for OCXO 180 s warmup
Every boot waited the full 180 s OCXO warmup soak. Even an
IWDG/SYSRESETREQ reset, which takes seconds and leaves the OCXO oven
hot, cost three minutes of bring-up time.

Added BKPSRAM slot 3 (magic 0xCA1C1F1E) with warmup_persist_set/check
helpers next to the existing MCU-A2/A7 BKPSRAM block. Cold-boot path
now arms the flag at the end of the full 180 s soak; subsequent boots
that find the flag still set know the OCXO oven is still hot and the
crystal is settled, so they wait 5 s and move on. Power-cycle clears
BKPSRAM and forces the full soak again — safe default, operator can't
accidentally skip the warmup by yanking and re-applying power.
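
A host-side model of the warm-restart decision might look like the following. `BackupSram`, `boot_soak_seconds`, and `power_cycle` are hypothetical names; the struct merely stands in for a battery-backed region that survives resets and is cleared when power is removed.

```cpp
#include <cstdint>

constexpr uint32_t kWarmMagic   = 0xCA1C1F1Eu;  // magic quoted in the commit
constexpr uint32_t kColdSoakSec = 180;
constexpr uint32_t kWarmSoakSec = 5;

// Stand-in for the backup SRAM slot used by the warm-restart flag.
struct BackupSram { uint32_t slot3 = 0; };

// Returns the soak time for this boot and arms the flag after a cold soak.
uint32_t boot_soak_seconds(BackupSram& bk) {
    if (bk.slot3 == kWarmMagic) return kWarmSoakSec;  // oven assumed still hot
    bk.slot3 = kWarmMagic;                            // armed once the full soak is done
    return kColdSoakSec;
}

// Power removal wipes the backup region, forcing the next boot down the cold path.
void power_cycle(BackupSram& bk) { bk = BackupSram{}; }
```

In this model, ten warm restarts cost 50 s in total instead of 1800 s, the 1750 s saving cited in the test description.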

Added test_mcu_a4_ocxo_warm_restart (7 cases): cold boot soaks 180 s
and sets the flag; warm reset is 5 s; 5 consecutive warm resets stay
fast; power-cycle restores the cold path; cold-after-power-cycle
re-arms the bypass; pre-fix regression confirms 10 warm restarts save
1750 s vs the old always-180-s path. MCU regression now 82/82.
2026-04-28 09:50:32 +05:45
Jason 0a49320e31 fix(mcu): MCU-A2 — site-configurable mag declination, persisted in BKPSRAM
The magnetometer yaw correction used a hardcoded -0.61 deg literal baked
in for one deployment site. Yaw_Sensor was wrong by (site_decl + 0.61)
deg at every other site whenever the UM982 dual-antenna heading was
unavailable.

Backed the value with BKPSRAM (slots 1+2 — slot 0 is the MCU-A7
emergency flag) and exposed set_mag_declination_deg / get_mag_declination_deg.
Default returns the legacy -0.61 deg when no override has been written so
the original site stays correct out of the box; a host command (or
future GPS-derived auto-calibration) writes the new site value once and
it persists across every reset path until main-power removal.

Hardened with a +/-30 deg range clamp on both write AND read paths — real
magnetic declinations are roughly +/-25 deg worldwide, so a wider value
indicates a calibration error or BKPSRAM corruption (VBAT brown-out, bit
flip) rather than a legitimate site. Defensive read-side clamp prevents
a corrupted slot from propagating a wild heading offset.
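
The write-side and read-side clamps can be sketched as follows. The slot layout, the magic value, and the function names are all illustrative assumptions; only the default of -0.61 deg and the +/-30 deg limit come from this message.

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint32_t kDeclMagic    = 0x4D414744u;  // illustrative magic, not from the source
constexpr float    kDefaultDecl  = -0.61f;       // legacy site value
constexpr float    kDeclLimitDeg = 30.0f;

// Hypothetical slot pair: one magic word plus the stored value.
struct DeclSlots { uint32_t magic = 0; float value_deg = 0.0f; };

inline float clamp_decl(float deg) {
    return std::max(-kDeclLimitDeg, std::min(deg, kDeclLimitDeg));
}

void set_decl(DeclSlots& s, float deg) {
    s.value_deg = clamp_decl(deg);   // write-side clamp
    s.magic = kDeclMagic;
}

float get_decl(const DeclSlots& s) {
    if (s.magic != kDeclMagic) return kDefaultDecl;  // nothing written, or wrong magic
    return clamp_decl(s.value_deg);                  // read-side clamp guards corruption
}
```

Clamping on read as well as on write means a value corrupted after it was stored still cannot push the heading offset past the limit.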

Replaced the single use site at the magnetometer yaw computation with
the getter; legacy global Mag_Declination retained and kept in sync by
the setter for any external linkage.

Added test_mcu_a2_mag_declination (10 cases): default, set/get,
persistence across reset, power-cycle clear, write-side clamp both
directions, plausible-site passthrough, defensive read-side clamp on
corruption, wrong-magic fallback, pre-fix bearing-error regression.
MCU regression now 81/81.
2026-04-28 09:45:41 +05:45
Jason 4a102e30fe fix(mcu): MCU-A6 — recovery handlers for AD9523_CLOCK and FPGA_COMM
attemptErrorRecovery() previously fell through to the default log-only
branch for both ERROR_AD9523_CLOCK and ERROR_FPGA_COMM. checkSystemHealth
kept re-firing the same error every pass with no recovery action ever
attempted, so the system limped along until escalation kicked in.

ERROR_AD9523_CLOCK: AD9523_RESET_ASSERT, 10 ms settle, then re-run
configure_ad9523() (releases reset, selects REFB, reprograms, waits for
lock). On second failure we log and let the next health pass re-fire so
a transient brown-out on the 100 MHz reference does not drop straight
into Emergency_Stop.

ERROR_FPGA_COMM: pulse PD12 LOW->10 ms->HIGH (matches the boot reset
pattern). PA rails left untouched at runtime; brief adar_tr_x undefined
window is acceptable vs. losing the radar entirely.

Added test_mcu_a6_recovery_dispatch (11 cases) covering both new
handlers, all existing routes, the default branch, a pre-fix regression
check, and an explicit assertion that RF_PA_OVERCURRENT escalates
upstream (handleSystemError) rather than recovering inline. MCU
regression now 80/80.
2026-04-28 09:26:35 +05:45
Jason 1317a91e01 fix(mcu): MCU-A5 — gate Idq health-window during PA calibration walk
The boot-time Idq calibration walks DAC_val from 126 down toward the
1.680 A target. Mid-walk readings sit well above the 2.5 A overcurrent
threshold by design, and a channel that hits the safety_counter timeout
(50 iters) can be left above the window. Without a gate, the next
checkSystemHealth() pass would trip ERROR_RF_PA_OVERCURRENT and route
straight into Emergency_Stop, killing the system mid-bringup.

Added a `pa_calibration_in_progress` flag set TRUE around both DAC1 and
DAC2 cal walks. checkSystemHealth's Idq window short-circuits while the
flag is set; bias-fault and overcurrent thresholds remain fully active
once the walk completes, so any genuinely stuck-high channel surfaces on
the very next health pass and routes through the normal handler.
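
The gate can be sketched as a flag that short-circuits only the Idq window. `PaMonitor`, `Health`, and `check_idq` are hypothetical names; the 2.5 A threshold is the one quoted above, and the bias-fault branch is left out for brevity.

```cpp
#include <cstdint>

// Threshold from the commit message.
constexpr float kOvercurrentA = 2.5f;

enum class Health { Ok, Overcurrent };

// Hypothetical model of the Idq health window with a calibration gate.
struct PaMonitor {
    bool calibration_in_progress = false;

    // Idq window check, skipped while the calibration walk is running.
    Health check_idq(float idq_amps) const {
        if (calibration_in_progress) return Health::Ok;
        return (idq_amps > kOvercurrentA) ? Health::Overcurrent : Health::Ok;
    }
};
```

Once the flag is cleared, a channel that is still above the threshold is reported on the next call, so the gate delays detection only for the duration of the walk.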

Other health checks (lock, comm, temperature, watchdog) stay live during
cal — no behavioural change to anything except the Idq window.

Added test_mcu_a5_pa_cal_gate (7 cases): mid-walk masking, post-cal
re-arming, stuck-high channel surfacing after gate clears, bias-fault
gating, PowerAmplifier=false short-circuit, and a pre-fix regression
case showing the buggy path would have tripped overcurrent mid-walk.
MCU regression now 79/79.
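
A rough sketch of the gate, assuming a simplified Idq check. The flag
name and the 2.5 A threshold come from the text above; sketch_check_idq
and the enum are hypothetical stand-ins for the real health code.

```c
#include <stdbool.h>

#define IDQ_OVERCURRENT_A 2.5f   /* overcurrent threshold quoted above */

typedef enum { HEALTH_OK, HEALTH_PA_OVERCURRENT } sketch_health_t;

/* Set true around the DAC1/DAC2 calibration walks. */
static bool pa_calibration_in_progress = false;

/* Hypothetical stand-in for the Idq window inside checkSystemHealth(). */
sketch_health_t sketch_check_idq(const float *idq_amps, int channels)
{
    if (pa_calibration_in_progress) {
        return HEALTH_OK;        /* window masked while the walk runs */
    }
    for (int i = 0; i < channels; ++i) {
        if (idq_amps[i] > IDQ_OVERCURRENT_A) {
            return HEALTH_PA_OVERCURRENT;
        }
    }
    return HEALTH_OK;
}
```

Because the gate only short-circuits the Idq window, a channel left
above threshold is reported on the first pass after the flag clears.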
2026-04-28 09:21:43 +05:45
Jason f28a0eaa80 fix(mcu): MCU-A7 — persist emergency state across MCU resets in BKPSRAM
Emergency_Stop's hold loop refreshed IWDG forever, so any reset path that
DID fire (SYSRESETREQ from another fault, brown-out) would re-run
startup and re-energize the PA rails — there was no record that the
system had been in emergency state. Watchdog defeat in the hold loop
masked the problem.

BKPSRAM gives us a flag that survives every reset path but is lost on
main-power removal — exactly the recovery semantics we want:
power-cycle is the deliberate operator action that clears emergency,
every other reset stays in safe-hold.

  - Added emergency_persist_set/check helpers (BKPSRAM @ 0x40024000,
    magic 0xDEAD5A5A); enable PWR + backup-access + BKPSRAM clock.
  - Emergency_Stop now writes the flag BEFORE the rail-cut sequence so
    even an interrupted shutdown still leaves the persisted state set.
  - main() checks the flag immediately after MX_IWDG_Init and before
    any PA enable code; if set, calls Emergency_Stop directly. GPIO
    init has already forced all PA enables LOW, so the safe-hold path
    is reached without a single PA rail going hot.

Hold-loop IWDG refresh kept intentionally: a healthy hold loop does not
need to cycle the MCU, but if the loop itself wedges (stack corruption,
bus fault), refresh stops, IWDG fires, and the persist flag routes the
reset right back into safe-hold.

Added test_mcu_a7_emergency_persist (6 cases) modelling BKPSRAM
persistence vs power-cycle, including a regression check that exercises
the pre-fix "no persistence" boot to confirm it would have re-energized
the PAs. MCU regression now 78/78.
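
A host-side model of the persistence semantics, not the target code.
The magic value is the one quoted above; the backup-SRAM word is
replaced by a plain variable, power removal is modelled as clearing it,
and all sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EMERGENCY_MAGIC 0xDEAD5A5Au  /* magic value quoted above */

/* Stand-in for the BKPSRAM word; on target this would be a volatile
 * access to backup SRAM (assumption of this sketch). */
static uint32_t sketch_bkpsram_word = 0;

void sketch_emergency_persist_set(void)
{
    sketch_bkpsram_word = EMERGENCY_MAGIC;
}

bool sketch_emergency_persist_check(void)
{
    return sketch_bkpsram_word == EMERGENCY_MAGIC;
}

/* Models a reset: contents survive unless main power was removed. */
void sketch_reset(bool power_cycled)
{
    if (power_cycled) {
        sketch_bkpsram_word = 0;
    }
}

/* Boot decision: true means go straight to safe-hold, no PA enable. */
bool sketch_boot_must_hold(void)
{
    return sketch_emergency_persist_check();
}
```

The model shows the intended asymmetry: any reset with the flag set
returns to safe-hold, and only a power cycle clears it.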
2026-04-27 19:52:13 +05:45
Jason df0b2fd469 fix(mcu): MCU-A1 — replace 25 C cooling stub with 70/60 C hysteresis
Cooling-fan trip in main.cpp's periodic temperature block was a 25 C dev
stub that latched the fan ON at room temperature on every boot. Replaced
with production thermal control: ON at 70 C, OFF at 60 C. The 10 C
dead-band prevents relay/fan chatter near the threshold; the 70 C ON
point sits below the 75 C SAFE-mode gate in checkSystemHealth() so the
fan engages before the system shuts down.

Driven from the existing `temperature` global (max of 8 sensors,
populated just above by the GAP-3 fix) instead of re-OR'ing the eight
Temperature_N variables — single source of truth, and the diag now
prints the actual peak temperature on each transition.

Added test_mcu_a1_cooling_hysteresis (9 cases) covering cold-start,
upward crossing, dead-band hold, downward crossing, and a regression
guard at 30 C that would have engaged the fan under the old stub.
MCU regression now 77/77.
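
The 70/60 C hysteresis can be sketched as a pure function. The function
name is hypothetical, and treating both thresholds as inclusive is an
assumption; the commit does not state the comparison operators.

```c
#include <stdbool.h>

#define FAN_ON_C  70.0f   /* fan engages at or above this temperature */
#define FAN_OFF_C 60.0f   /* fan releases at or below this temperature */

/* Returns the new fan state given the previous state and the peak
 * temperature; inside the 60..70 C dead-band the state is held. */
bool sketch_fan_update(bool fan_on, float peak_temp_c)
{
    if (!fan_on && peak_temp_c >= FAN_ON_C) {
        return true;
    }
    if (fan_on && peak_temp_c <= FAN_OFF_C) {
        return false;
    }
    return fan_on;
}
```

Feeding it 30, 71, 65, 59, 65 C in sequence gives off, on, on, off,
off, which is the dead-band behaviour described above.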
2026-04-27 19:42:42 +05:45
Jason 2c34323bcb fix(mcu): MCU-N5/C4 — runRadarPulseSequence stops shadowing m/n/y globals
runRadarPulseSequence was redeclaring `int m, n, y` at function scope,
which shadowed the file-scope `uint8_t m, n, y` globals at lines
~190-192 that getStatusString reports to the GUI as
BeamPos|Azimuth|ChirpCount. The function's increments updated only the
locals, then discarded them — so telemetry was permanently frozen at
"BeamPos:1|Azimuth:1|ChirpCount:1" no matter how many beam positions
or revolutions had elapsed.

Fix: drop the three local declarations; the body already references
m/n/y by name, so removing the locals lets the writes hit the globals.
A comment documents the pitfall so the locals do not get re-added by
a future cleanup. Numeric ranges are safe (m_max=32, n_max=31,
y_max=50, all fit in uint8_t).

Test: new standalone test_bug16_runradar_shadows_globals.c reproduces
both the buggy (locals shadow globals) and fixed (globals advance)
patterns and asserts the expected post-sweep values
(g_n=16, g_m=1 wraps each iter, g_y=2 after one revolution).

MCU regression: 76/76 (was 75).
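
A reduced illustration of the shadowing pitfall, using a single counter
instead of m/n/y. The function names are hypothetical; only the pattern
matches the description above.

```c
#include <stdint.h>

static uint8_t m = 1;   /* file-scope counter reported in telemetry */

/* Buggy pattern: the local `m` shadows the file-scope one, so the
 * increment is applied to the local and then discarded. */
void sketch_sweep_buggy(void)
{
    int m = 1;          /* shadows the file-scope m */
    m++;
    (void)m;
}

/* Fixed pattern: no local declaration, so the write hits the global. */
void sketch_sweep_fixed(void)
{
    m++;
}
```

Compiling with -Wshadow flags the buggy variant, which may be a cheaper
guard than a comment alone.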
2026-04-27 13:36:28 +05:45
Jason 6f68f3263a fix: MCU-N4 delay_us bound; GUI-S4 STREAM_CONTROL comment
MCU-N4: delay_us(us) reset TIM1 then waited for the counter to reach `us`,
but TIM1 ARR is 0xffff-1 (~65 ms at the 1 MHz tick). Any caller passing
us > 65534 spun forever after the first wrap — a real hazard with the PA
energized. Chunk requests larger than ARR into ARR-sized waits, then the
remainder in the existing single wait. Current callers (T1, PRI1-T1,
Guard, 500us spots) are all well under the bound; this is defensive.
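
A host-runnable sketch of the chunking, assuming a single-shot wait
primitive that is only valid up to ARR. sketch_wait_once is a
hypothetical stand-in that just accumulates the requested time; the
real code waits on the TIM1 counter.

```c
#include <stdint.h>

#define TIM_ARR 0xFFFEu   /* 0xffff-1, per the text above */

static uint32_t sketch_total_waited_us = 0;
static uint32_t sketch_wait_calls = 0;

/* Hypothetical single wait; must never be asked for more than TIM_ARR. */
static void sketch_wait_once(uint16_t us)
{
    sketch_total_waited_us += us;
    sketch_wait_calls++;
}

/* Splits a long delay into ARR-sized waits, then waits the remainder. */
void sketch_delay_us(uint32_t us)
{
    while (us > TIM_ARR) {
        sketch_wait_once((uint16_t)TIM_ARR);
        us -= TIM_ARR;
    }
    sketch_wait_once((uint16_t)us);
}
```

A 150000 us request becomes two full-ARR waits plus an 18932 us
remainder, so no single wait exceeds the counter range.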

GUI-S4: radar_protocol.STREAM_CONTROL was annotated "3-bit stream enable
mask"; the FPGA accepts usb_cmd_value[5:0] = 6 bits. The wire protocol
already carried the full 32-bit value field, so the upper bits were
reachable via Custom Command — only the comment was wrong. Updated to
match radar_system_top.v:1004.

Verified: 75/75 MCU tests pass; 83/83 v7 GUI tests pass (covered by GUI-C3 commit).
2026-04-23 07:43:53 +05:45
Jason 9d1eb4b11c fix(radar): RX chain corrections, GUI bin alignment, MCU boot ordering
FPGA — RX chain
  matched_filter_multi_segment.v: drop the gratuitous /4 scaling on
    DDC sign-extended input (was ddc_i[17:2] + ddc_i[1]); use
    ddc_i[15:0] directly. fft_engine has INTERNAL_W=32 with
    saturating 16-bit output, so full 16-bit input is safe. Restores
    ~12 dB of MF input dynamic range.
  radar_receiver_final.v: remove latency_buffer (count-N-pulses-then-
    prime FIFO that left frame 1 with all-zero ref). Replaced with
    a single-FF alignment register on ref_i/ref_q that matches the
    1-FF stage multi_segment ST_PROCESSING uses on adc_data.
    Verified by tb/tb_rxb_fullchain_latency.v — autocorrelation peak
    at bin 0 with peak/mean ~88x.
  doppler_processor.v / mti_canceller.v / cfar_ca.v /
    range_bin_decimator.v / radar_receiver_final.v / radar_system_top.v
    / usb_data_interface_ft2232h.v: switch port and parameter widths
    from RP_NUM_RANGE_BINS / RP_RANGE_BIN_BITS (always 512 / 9-bit)
    to RP_MAX_OUTPUT_BINS / RP_RANGE_BIN_WIDTH_MAX (auto-scales:
    50T 512 / 9-bit, 200T 4096 / 12-bit). Unblocks 200T 20 km mode
    at the RX module boundary; USB wire-protocol extension still
    pending.
  radar_receiver_final.v: doppler_frame_done_prev reset value 0 -> 1
    to prevent false done pulse on cycle 1 when level signal is
    HIGH at reset.
  matched_filter_processing_chain.v: delete the broken `ifdef
    SIMULATION inline behavioural FFT (482 lines removed). It
    produced wrong-bin peaks and 100-1000x weak magnitudes. Chain
    now uses production fft_engine.v + frequency_matched_filter.v
    in both iverilog and Vivado. Iverilog tests are ~38x slower per
    chain pass but produce correct results. Misleading "OK with
    Xilinx IP" comments at three test sites updated since the FFT
    is in-house, not an IP placeholder.

FPGA — testbenches
  tb/tb_rxb_latency_measure.v (new): measures chain internal pipeline
    depth (~2057 cycles, chirp-agnostic).
  tb/tb_rxb_fullchain_latency.v (new): full-chain autocorrelation
    verification — drives ddc with the same chirp samples the loader
    serves as ref, finds peak position and peak/mean.
  tb/tb_matched_filter_processing_chain.v: wait timeouts bumped
    50000 -> 500000 cycles to accommodate production FFT pipeline.

MCU
  main.cpp checkSystemHealthStatus: latch system_emergency_state on
    the error_count > 10 path so the SAFE-MODE blink loop in main()
    actually engages (was bypassed because predicate was false).
  main.cpp: move FPGA reset BEFORE the if(PowerAmplifier) block so
    adar_tr_x is driven LOW (RX commanded externally) before PA Vdd
    reaches 22 V. Old reset block at the original location removed.
  main.cpp MX_GPIO_Init: add GPIO_PIN_12 (FPGA reset) to the
    explicit WritePin(LOW) list so the safe initial state is no
    longer implicit.
  main.cpp checkSystemHealth: rate-limit ADAR1000
    verifyDeviceCommunication (HAL_Delay 1ms x 4 devices = 4 ms
    blocking SPI burst per main-loop iteration) from every-loop to
    every 2 s. readTemperature stays per-loop so over-temp
    detection latency is unchanged.
  USBHandler.cpp processSettingsData: dispatch threshold bumped
    74 -> 82 (matches parser minimum); buffer drained after parse
    attempt (slide remaining bytes left) so a false END find no
    longer sticks the buffer until 256-byte overflow.
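
The "slide remaining bytes left" drain in the USBHandler item above can
be sketched as follows. The function name and signature are
hypothetical; the real code operates on the handler's own buffer state.

```c
#include <stddef.h>
#include <string.h>

/* Drops the first `consumed` bytes of buf and slides the remainder to
 * the front. Returns the number of bytes still buffered. */
size_t sketch_drain(unsigned char *buf, size_t len, size_t consumed)
{
    if (consumed >= len) {
        return 0;                      /* everything was consumed */
    }
    /* memmove, not memcpy: source and destination overlap. */
    memmove(buf, buf + consumed, len - consumed);
    return len - consumed;
}
```

Draining after every parse attempt means a false END match costs only
the bytes it covered instead of blocking the buffer until overflow.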

GUI
  radar_protocol.py: NUM_RANGE_BINS 64 -> 512 (matches FPGA
    RP_NUM_RANGE_BINS); NUM_CELLS 2048 -> 16384.
  radar_protocol.py _ingest_sample: honor FPGA frame_start bit for
    resync after a USB drop; capture range_profile[rbin] once per
    range bin at dbin == 0 (FPGA emits the same range_i/range_q for
    all 32 Doppler cells of a given range bin; previous accumulator
    inflated the profile 32x).
  v7/models.py RadarSettings: range_resolution 24 -> 6 m (matches
    c/(2*100MHz)*4); max_distance and coverage_radius 1536 -> 3072 m;
    map_size 2000 -> 4000.
  v7/models.py WaveformConfig: n_range_bins 64 -> 512, fft_size
    1024 -> 2048, decimation_factor 16 -> 4.
  GUI_V65_Tk.py: _RANGE_PER_BIN math and stale "~24 m / ~1536 m"
    comments updated.
  test_v7.py: assertion values updated to match new defaults.
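
The GUI numbers above follow from simple arithmetic. Reading 100 MHz as
the sample rate f_s and 4 as the decimation factor D is an
interpretation of the commit text, and c is rounded to 3e8 m/s:

```latex
\Delta R = \frac{c}{2 f_s}\, D
         = \frac{3\times 10^{8}\ \mathrm{m/s}}{2 \cdot 100\times 10^{6}\ \mathrm{Hz}} \cdot 4
         = 1.5\ \mathrm{m} \cdot 4 = 6\ \mathrm{m},
\qquad
R_{\max} = N_{\mathrm{bins}}\, \Delta R = 512 \cdot 6\ \mathrm{m} = 3072\ \mathrm{m}.
```

Under the same reading, the old defaults (D = 16, 64 bins) give
1.5 m x 16 = 24 m per bin and 64 x 24 m = 1536 m, matching the stale
values being replaced.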

Tests
  test_ddc_cosim_fuzz.py: remove unused os/tempfile imports, wrap
    three long lines for ruff E501 compliance.
2026-04-23 05:56:52 +05:45
Jason 25a280c200 refactor(mcu): remove redundant ADAR1000 T/R SPI paths (FPGA-owned)
Per-chirp T/R switching is owned by the FPGA plfm_chirp_controller
driving adar_tr_x pins (TR_SOURCE=1 in REG_SW_CONTROL, already set by
initializeSingleDevice). The MCU's SPI RMW path via fastTXMode/
fastRXMode/pulseTXMode/pulseRXMode/setADTR1107Control was:
  (a) architecturally redundant — raced the FPGA-driven TR line,
  (b) toggled the wrong bit (TR_SOURCE instead of TR_SPI),
  (c) in setFastSwitchMode(true) bundled a datasheet-violating
      PA+LNA-simultaneously-biased side effect.

Removed methods and their backing state (fast_switch_mode_,
switch_settling_time_us_). Call sites in executeChirpSequence /
runRadarPulseSequence updated to rely on the FPGA chirp FSM (GPIOD_8
new_chirp trigger unchanged).

Tests: adds CMSIS-Core DWT/CoreDebug/SystemCoreClock stubs to
stm32_hal_mock so F-4.7's DWT-based delayUs() compiles under the host
mock build. SystemCoreClock=0 makes the busy-wait exit immediately.
2026-04-21 01:09:38 +05:45
Jason 356acea314 fix(adar): F-4.1 lower broadcast writes to per-device unicast loop
The `broadcast=1` path on adarWrite() emitted the 0x08 broadcast opcode
but setChipSelect() only asserts one device's CS line, so only the single
selected chip ever saw the frame. The opcode path has also never been
validated on silicon. Until a HIL test confirms multi-CS semantics, route
broadcast=1 through a unicast loop over all devices so caller intent
(all four take the write) is preserved and the dead opcode path becomes
unreachable. Logs a DIAG_WARN on entry for visibility.
2026-04-20 15:48:34 +05:45
Jason 675b1c0015 fix(pre-bringup): second-batch P1/P2/P3 audit findings
Addresses the remaining actionable items from
docs/DEVELOP_AUDIT_2026-04-19.md after commit 3f47d1e.

XDC (dead waivers — F-0.4, F-0.5, F-0.6, F-0.7):
- ft_clkout_IBUF CLOCK_DEDICATED_ROUTE now uses hierarchical filter;
  flat net name did not exist post-synth.
- reset_sync_reg[*] false-path rewritten to walk hierarchy and filter
  on CLR/PRE pins.
- adc_clk_mmcm.xdc ft601_clk_in references replaced with foreach-loop
  over real USB clock names, gated on -quiet existence.
- MMCM LOCKED waiver uses REF_PIN_NAME filter instead of the
  previously-missing u_core/ literal path.

CDC (F-1.1, F-1.2, F-1.3):
- Documented the quasi-static-bus stability invariant above the
  FT601 cmd_valid toggle block.
- cdc_adc_to_processing gains an `overrun` output; the two CIC->FIR
  instances feed a sticky cdc_cic_fir_overrun flag surfaced on
  gpio_dig5 so silent sample drops become visible to the MCU.
- Removed the dead mixers_enable synchronizer in ddc_400m.v; the _sync
  output was unused and every caller ties the port to 1'b1.

Diagnostics (F-6.4):
- range_bin_decimator watchdog_timeout plumbed through receiver
  and top-level, OR'd into gpio_dig5.

ADAR (F-4.7):
- delayUs() replaced with DWT cycle counter; self-initialising
  TRCENA/CYCCNTENA, overflow-safe unsigned subtraction.
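
The overflow-safe unsigned subtraction mentioned for F-4.7 is a general
idiom for free-running 32-bit counters. A minimal sketch, with a
hypothetical helper name and the counter values passed in rather than
read from DWT:

```c
#include <stdbool.h>
#include <stdint.h>

/* True once at least `ticks` counts have elapsed since `start` on a
 * free-running 32-bit counter. The unsigned difference stays correct
 * across a single wrap of the counter. */
bool sketch_elapsed(uint32_t start, uint32_t now, uint32_t ticks)
{
    return (uint32_t)(now - start) >= ticks;
}
```

Comparing `now >= start + ticks` instead would fail when the sum wraps;
the subtraction form does not have that failure mode.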

Regression: tb_cdc_modules.v 57/57 passes under iverilog after
the cdc_modules.v change. Remote Vivado verification in progress.
2026-04-20 14:28:22 +05:45
Jason 3f47d1ef71 fix(pre-bringup): resolve P0 + quick-win P1 findings from 2026-04-19 audit
Addresses findings from docs/DEVELOP_AUDIT_2026-04-19.md:

P0 source-level:
- F-4.3 ADAR1000_Manager::adarSetTxPhase now writes REG_LOAD_WORKING
  with LD_WRK_REGS_LDTX_OVERRIDE (0x02) instead of 0x01. Previous value
  toggled the LDRX latch on a TX-phase write, so host TX phase updates
  never reached the working registers.
- F-6.1 DDC mixer_saturation / filter_overflow / diagnostics were deleted
  at the receiver boundary. Now plumbed to new outputs on
  radar_receiver_final (ddc_overflow_any, ddc_saturation_count) and
  aggregated into gpio_dig5 in radar_system_top. Added mark_debug
  attributes for ILA visibility. Test/debug inputs tied low explicitly.
- F-0.8 adc_clk_mmcm.xdc set_clock_uncertainty: removed invalid -add
  flag (Vivado silently rejected it, applying zero guardband). Now uses
  absolute 0.150 ns which covers 53 ps jitter + ~100 ps PVT margin.

P1:
- F-4.2 adarSetBit / adarResetBit reject broadcast=ON — the RMW sampled
  a single device but wrote to all four, clobbering the other three's
  state.
- F-4.4 initializeSingleDevice returns false and leaves initialized=false
  when scratchpad verification fails; previously marked the device
  initialized anyway so downstream PA enable could drive a dead bus.
- F-6.2 FIR I/Q filter_overflow ports, previously unconnected, now OR'd
  into the module-level filter_overflow output.
- F-6.3 mti_canceller exposes 8-bit saturation counter. Saturation
  produces spurious Doppler harmonics and was previously invisible.

Verification:
- 27/27 iverilog testbenches pass
- 228/228 pytest pass (cross-layer contract + cosim)
- MCU unit tests 51/51 + 24/24 pass
- Remote Vivado 2025.2 build: bitstream writes; 400 MHz mixer pipeline
  now shows WNS -0.109 ns which MATCHES the audit's F-0.9 prediction
  that the design only closed because F-0.8's guardband was silently
  dropped. ft_clkout F-0.9 remains a show-stopper (requires MRCC pin
  move), tracked separately.

Not addressed in this PR (larger scope, follow-up tickets):
F-0.4, F-0.5, F-0.6, F-0.7, F-0.9, F-1.1, F-1.2, F-2.2, F-3.2, F-4.1,
F-4.7, F-6.4, F-6.5.
2026-04-20 13:48:36 +05:45
Jason 2539d46d93 merge: resolve conflicts with develop (superseded by PR #89 / #107)
Three conflicts — all resolved in favor of develop, which has a more
refined version of the same work this branch introduced:

- radar_system_top.v: develop's cleaner USB_MODE=1 comment (same value).
- run_regression.sh: develop's ${SYSTEM_RTL[@]} refactor + added
  USB_MODE=1 test variants.
- tb/radar_system_tb.v: develop's ifdef USB_MODE_1 to dump the correct
  USB instance based on mode.

The 400 MHz reset fan-out fix (nco_400m_enhanced, cic_decimator_4x_enhanced,
ddc_400m) and ADAR1000 channel-indexing fix remain intact on this branch.
2026-04-19 16:28:07 +05:45
Jason 582476fa0d fix(adar1000): correct 1-based channel indexing in setters (issue #90)
The four channel-indexed ADAR1000 setters (adarSetRxPhase, adarSetTxPhase,
adarSetRxVgaGain, adarSetTxVgaGain) computed their register offset as
`(channel & 0x03) * stride`, which silently aliased CH4 (channel=4 ->
mask=0) onto CH1 and shifted CH1..CH3 by one. The API contract (1-based
CH1..CH4) is documented in ADAR1000_AGC.cpp:76 and matches the ADI
datasheet; every existing caller already passes `ch + 1`.

Fix: subtract 1 before masking -- `((channel - 1) & 0x03) * stride` --
and reject `channel < 1 || channel > 4` early with a DIAG message so a
future stale 0-based caller fails loudly instead of writing to CH4.
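
The offset arithmetic before and after the fix, as a standalone sketch.
The function names are hypothetical, and returning -1 stands in for the
DIAG message and early reject described above.

```c
/* Fixed form: 1-based CH1..CH4 maps to offsets 0..3 times stride.
 * Out-of-range channels are rejected instead of aliasing. */
int sketch_channel_offset(int channel, int stride)
{
    if (channel < 1 || channel > 4) {
        return -1;
    }
    return ((channel - 1) & 0x03) * stride;
}

/* Pre-fix form, kept only to show the aliasing: channel 4 masks to 0
 * (lands on CH1) and channels 1..3 are shifted up by one slot. */
int sketch_channel_offset_buggy(int channel, int stride)
{
    return (channel & 0x03) * stride;
}
```

With stride 2, the fixed form maps CH1 to 0 and CH4 to 6, while the
pre-fix form maps CH1 to 2 and CH4 to 0.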

Adds TestTier1Adar1000ChannelRegisterRoundTrip (9 tests) which closes
the loop independently of the driver:
  - parses the ADI register map directly from ADAR1000_Manager.h,
  - verifies the datasheet stride invariants (gain=1, phase=2),
  - auto-discovers every C++ TU under MCU_LIB_DIR / MCU_CODE_DIR so a
    new caller cannot silently escape the round-trip check,
  - asserts every caller's channel argument evaluates to {1,2,3,4} for
    ch in {0,1,2,3} (catches bare 0-based or literal-0 callers at CI
    time before the runtime bounds-check would silently drop them),
  - round-trips each (caller, ch) through the helper arithmetic and
    checks the final address equals REG_CH{ch+1}_*.

Adversarially validated: reverting any one helper, all four helpers,
corrupting the parsed register map, injecting a bare-ch caller, and
auto-discovering a literal-0 caller in a fresh TU each cause the
expected (and only the expected) test to fail.

Stacked on fix/adar1000-vm-tables (PR #107).
2026-04-18 06:39:07 +05:45
NawfalMotii79 d3476139e3 Merge pull request #89 from NawfalMotii79/feat/ft2232h-default-ft601-option
feat: make FT2232H default USB interface, add FT601 premium option, deprecate GUI V6
2026-04-17 22:21:58 +01:00
Jason 7c91a3e0b9 fix(adar1000): populate VM_I/VM_Q phase tables; remove dead VM_GAIN
The ADAR1000 vector-modulator I/Q lookup tables VM_I[128] and VM_Q[128]
were declared but defined as empty initialiser lists since the first
commit (5fbe97f). Every call to adarSetRxPhase / adarSetTxPhase therefore
wrote (I=0x00, Q=0x00) to registers 0x21/0x23 (Rx) and 0x32/0x34 (Tx)
regardless of the requested phase state, leaving beam steering completely
non-functional in firmware.

This commit:

* Populates VM_I[128] and VM_Q[128] from ADAR1000 datasheet Rev. B
  Tables 13-16 (p.34) on a uniform 2.8125 deg grid (360 / 128 states).
  Byte format: bits[7:6] reserved 0, bit[5] polarity (1 = positive
  lobe), bits[4:0] 5-bit unsigned magnitude - exactly as specified.
* Removes VM_GAIN[128] declaration and (empty) definition. The
  ADAR1000 has no separate VM gain register; per-channel VGA gain is
  set via CHx_RX_GAIN (0x10-0x13) / CHx_TX_GAIN (0x1C-0x1F) by
  adarSetRxVgaGain / adarSetTxVgaGain. VM_GAIN was never populated,
  never read anywhere in the firmware, and its presence falsely
  suggested a missing scaling step in the signal path.
* Adds 9_Firmware/tests/cross_layer/adar1000_vm_reference.py: an
  independently-derived ground-truth module containing the full
  datasheet table plus byte-format / uniform-grid / quadrant-symmetry
  / cardinal-point invariant checkers and a tolerant C array parser.
* Adds TestTier2Adar1000VmTableGroundTruth (9 tests) to
  test_cross_layer_contract.py, including a tokenising C/C++
  comment+string stripper used by the VM_GAIN reintroduction guard,
  and an adversarial self-test that corrupts one byte and asserts
  the comparison detects it (defends against silent bypass via
  future fixture/parser refactors).
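
The byte format and grid spacing given in the first bullet can be
sketched as below. The helper names are hypothetical, and no datasheet
table values are reproduced here; only the bit layout and the 360/128
step from the text are used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packs one VM_I/VM_Q byte: bits[7:6] = 0, bit[5] = polarity
 * (1 = positive), bits[4:0] = 5-bit unsigned magnitude. */
uint8_t sketch_vm_pack(bool positive, uint8_t magnitude)
{
    return (uint8_t)(((positive ? 1u : 0u) << 5) | (magnitude & 0x1Fu));
}

/* Byte-format invariant: the two reserved top bits must be zero. */
bool sketch_vm_byte_valid(uint8_t b)
{
    return (b & 0xC0u) == 0;
}

/* Nominal phase of state 0..127 on the uniform 2.8125 degree grid. */
double sketch_vm_state_deg(unsigned state)
{
    return (double)(state & 0x7Fu) * (360.0 / 128.0);
}
```

An invariant check of this kind is cheap to run over all 128 entries
and catches a corrupted byte that sets either reserved bit.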

Adversarially validated: removing the firmware definitions, flipping
a single byte, or reintroducing VM_GAIN as code each cause the suite
to fail; restoring causes it to pass. VM_GAIN appearing inside string
literals or comments correctly does NOT trip the guard.

Closes the empty-table half of the ADAR1000 phase-control bug class.
The separate channel-rotation issue (#90) will be addressed in a
follow-up PR.

Refs: 7_Components Datasheets and Application notes/ADAR1000.pdf
      Rev. B Tables 13-16 p.34
2026-04-18 02:02:07 +05:45
Jason c3db8a9122 Merge pull request #96 from joyshmitz/chore/remove-dead-adar1000-c-api
chore(mcu): remove dead C-style adar1000 driver
2026-04-16 23:51:22 +03:00
Serhii 8e1b3f22d2 chore(mcu): remove dead C-style adar1000 driver
The firmware uses the C++ ADAR1000_Manager class exclusively. The C-style
driver pair (adar1000.c, 693 LoC; adar1000.h, 294 LoC) has no external
call sites:

  grep -rn "Adar_Set|Adar_Read|Adar_Write|Adar_Soft" 9_Firmware
  grep -rn "AdarDevice|AdarBiasCurrents|AdarDeviceInfo" 9_Firmware

Both return hits only inside adar1000.c/h themselves. ADAR1000_Manager.h
has its own copies of REG_CH1_*, REG_INTERFACE_CONFIG_A, etc. and does
not include adar1000.h. main.cpp had a lone #include "adar1000.h" but
referenced no symbols from it; the REG_* macros it uses resolve through
ADAR1000_Manager.h on the next line.

No behaviour change: the deleted code was unreachable.

Side note on #90: adar1000.c contained a second copy of the
REG_CH1_* + (channel & 0x03) channel-rotation pattern tracked in #90
(lines 349, 397-398, 472, 520-521). This commit does not fix #90 --
the live path in ADAR1000_Manager.cpp still needs the channel-index
fix -- but it removes the dormant copy so the bug has one less place
to hide.

Verification:
- 9_Firmware/9_1_Microcontroller/tests: make clean && make -> all passing
  (51/51 UM982 GPS, 24/24 driver, 13/13 ADAR1000_AGC, bugs #1-15, Gap-3
  fixes 1-5, safety fixes)
- 9_Firmware/tests/cross_layer: 29 passed
- grep -rn "adar1000\.h|adar1000\.c|Adar_|AdarDevice" 9_Firmware: 0 hits
2026-04-16 22:12:23 +03:00
Jason 658752abb7 fix: propagate FPGA AGC enable to MCU outer loop via DIG_6 GPIO
Resolve cross-layer AGC control mismatch where opcode 0x28 only
controlled the FPGA inner-loop AGC but the STM32 outer-loop AGC
(ADAR1000_AGC) ran independently with its own enable state.

FPGA: Drive gpio_dig6 from host_agc_enable instead of tied low,
making the FPGA register the single source of truth for AGC state.

MCU: Change ADAR1000_AGC constructor default from enabled(true) to
enabled(false) so boot state matches FPGA reset default (AGC off).
Read DIG_6 GPIO every frame with 2-frame confirmation debounce to
sync outerAgc.enabled — prevents single-sample glitch from causing
spurious AGC state transitions.
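
One way to read the 2-frame confirmation debounce, as a sketch. The
struct and function names are hypothetical; the assumption is that a
change is accepted only after two consecutive frames disagree with the
current state.

```c
#include <stdbool.h>

typedef struct {
    bool enabled;   /* confirmed state, mirrors outerAgc.enabled */
    int  count;     /* consecutive frames disagreeing with `enabled` */
} sketch_debounce_t;

/* Call once per frame with the raw DIG_6 sample. A new level must be
 * seen on two consecutive frames before the state follows it. */
bool sketch_debounce_update(sketch_debounce_t *d, bool raw)
{
    if (raw == d->enabled) {
        d->count = 0;                 /* agreement resets the counter */
    } else if (++d->count >= 2) {
        d->enabled = raw;             /* confirmed on second frame */
        d->count = 0;
    }
    return d->enabled;
}
```

A single-frame glitch resets the counter on the next matching sample,
so it never reaches the confirmation threshold.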

Tests: Update MCU unit tests for new default, add 6 cross-layer
contract tests verifying the FPGA-MCU-GUI AGC invariant chain.
2026-04-17 00:04:37 +05:45
Jason f393e96d69 feat(fpga): make FT2232H default USB interface, rewrite FT601 write FSM, add clock-loss watchdog
- Set USB_MODE default to 1 (FT2232H) in radar_system_top.v; 200T build
  overrides to USB_MODE=0 via build_200t.tcl generic property
- Rewrite FT601 write FSM: 4-state architecture with 3-word packed data,
  pending-flag gating, and frame sync counter
- Add FT2232H read FSM rd_cmd_complete flag, stream field zeroing, and
  range_data_ready 1-cycle pipeline delay in both USB modules
- Implement clock-loss watchdog: ft_heartbeat toggle + 16-bit timeout
  counter drives ft_clk_lost, feeding ft_effective_reset_n via 2-stage
  ASYNC_REG synchronizer chain
- Fix sample_counter reset literal width (11'd0 -> 12'd0)
- Add FT2232H I/O timing constraints to 50T XDC; fix dac_clk comments
- Document vestigial ft601_txe_n/rxf_n ports (needed for 200T XDC)
- Tie off AGC ports on TE0713 dev wrapper
- Rewrite tb_usb_data_interface.v for new 4-state FSM (89 checks)
- Add USB_MODE=1 regression runs; remove dead CHECK 5/6 loop
- Update diag_log.h USB interface comment
2026-04-16 16:18:52 +05:45
copilot-swe-agent[bot] df875bdf4d Merge origin/develop into feat/um982-gps-driver
Co-authored-by: JJassonn69 <83615043+JJassonn69@users.noreply.github.com>
2026-04-16 06:23:05 +00:00
Jason bcbbfabbdb harden error_strings[] safety and update .gitignore
- Add ERROR_COUNT sentinel to SystemError_t enum
- Change error_strings[] to static const char* const
- Add static_assert to enforce enum/array sync at compile time
- Add runtime bounds check with fallback for invalid error codes
- Add all missing test binary names to .gitignore
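
The sentinel, compile-time sync check, and bounds-checked lookup from
the list above, sketched in C11. The enum members and strings are
placeholders, not the real SystemError_t contents, and the "UNKNOWN"
fallback text is an assumption.

```c
#include <stddef.h>

typedef enum {
    SKETCH_ERROR_NONE,
    SKETCH_ERROR_AD9523_CLOCK,
    SKETCH_ERROR_FPGA_COMM,
    SKETCH_ERROR_COUNT          /* sentinel: must stay last */
} sketch_error_t;

static const char *const sketch_error_strings[] = {
    "NONE", "AD9523_CLOCK", "FPGA_COMM",
};

/* Fails the build if an enum member is added without a string. */
_Static_assert(sizeof(sketch_error_strings) / sizeof(sketch_error_strings[0])
                   == (size_t)SKETCH_ERROR_COUNT,
               "error strings out of sync with enum");

/* Runtime guard for codes that did not come from the enum. */
const char *sketch_error_name(int code)
{
    if (code < 0 || code >= SKETCH_ERROR_COUNT) {
        return "UNKNOWN";
    }
    return sketch_error_strings[code];
}
```

The compile-time check covers drift between enum and table; the runtime
check covers corrupted or out-of-range codes.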
2026-04-16 02:12:37 +05:45
Jason b9c36dcca5 fix(ci): remove macOS test binaries from git, update .gitignore
The gap3, agc, and gps test binaries (Mach-O executables compiled on macOS)
were accidentally tracked. CI runs on Linux and fails with 'Exec format error'.
Removed from index and added to .gitignore.
2026-04-16 00:45:52 +05:45
Jason db4e73577e fix: use authoritative tx frame signal for frame sync, consistent ad9523 error path
FPGA-001: The previous fix derived frame boundaries from chirp_counter==0,
but that counter comes from plfm_chirp_controller_enhanced which overflows
to N (not wrapping at chirps_per_elev). This caused frame pulses only on
6-bit rollover (every 64 chirps) instead of every N chirps. Now wires the
CDC-synchronized tx_new_chirp_frame_sync signal from the transmitter into
radar_receiver_final, giving correct per-frame timing for any N.

STM32-004: Changed ad9523_init() failure path from Error_Handler() to
return -1, matching the pattern used by ad9523_setup() and ad9523_status()
in the same function. Both halt the system, but return -1 keeps IRQs
enabled for diagnostic output.
2026-04-16 00:33:27 +05:45
3aLaee 35539ea934 fix(mcu): harden checkSystemHealth() watchdog against cold-start + stale-ts
checkSystemHealth()'s internal watchdog (pre-fix step 9) had two linked
defects that, combined with the previous commit's escalation of
ERROR_WATCHDOG_TIMEOUT to Emergency_Stop(), would false-latch AERIS-10:

  1. Cold-start false trip:
       static uint32_t last_health_check = 0;
       if (HAL_GetTick() - last_health_check > 60000) { trip; }
     On the first call, last_health_check == 0, so the subtraction
     against a seeded-zero sentinel exceeds 60 000 ms as soon as the MCU
     has been up >60 s -- normal after the ADAR1000 / AD9523 / ADF4382
     init sequence -- and the watchdog trips spuriously.

  2. Stale timestamp after early returns:
       last_health_check = HAL_GetTick();   // at END of function
     Every earlier sub-check (IMU, BMP180, GPS, PA Idq, temperature) has
     an `if (fault) return current_error;` path that skips the update.
     After ~60 s of transient faults, the next clean call compares
     against a long-stale last_health_check and trips.

With ERROR_WATCHDOG_TIMEOUT now escalating to Emergency_Stop(), either
failure mode would cut the RF rails on a perfectly healthy system.

Fix: move the watchdog check to function ENTRY. A dedicated cold-start
branch seeds the timestamp on the first call without checking. On every
subsequent call, the elapsed delta is captured first and
last_health_check is updated BEFORE any sub-check runs, so early returns
no longer leave a stale value. 32-bit tick-wrap semantics are preserved
because the subtraction remains on uint32_t.

Add test_gap3_health_watchdog_cold_start.c covering cold-start, paced
main-loop, stall detection, boundary (exactly 60 000 ms), recovery
after trip, and 32-bit HAL_GetTick() wrap -- wired into tests/Makefile
alongside the existing gap-3 safety tests.
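
The entry-side watchdog described in the fix can be sketched as below.
The struct and function names are hypothetical and the tick is passed
in rather than read from HAL_GetTick(); the 60 000 ms limit and the
strict `>` comparison follow the snippet quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEALTH_TIMEOUT_MS 60000u

typedef struct {
    bool     seeded;    /* false until the first call has run */
    uint32_t last_ms;   /* tick captured at the previous entry */
} sketch_wd_t;

/* Call at function ENTRY. Returns true when the watchdog trips. */
bool sketch_wd_check(sketch_wd_t *wd, uint32_t now_ms)
{
    if (!wd->seeded) {
        wd->seeded = true;          /* cold start: seed, do not check */
        wd->last_ms = now_ms;
        return false;
    }
    uint32_t elapsed = now_ms - wd->last_ms;  /* wrap-safe on uint32_t */
    wd->last_ms = now_ms;           /* updated before any early return */
    return elapsed > HEALTH_TIMEOUT_MS;
}
```

Updating the timestamp before the sub-checks run is what removes the
stale-timestamp failure: an early return can no longer skip it.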
2026-04-15 20:36:19 +02:00
Jason 8187771ab0 fix: resolve 3 deferred issues (STM32-006, STM32-004, FPGA-001)
STM32-006: Remove blocking do-while loop that waited for legacy GUI start
flag — production V7 PyQt GUI never sends it, hanging the MCU at boot.

STM32-004: Check ad9523_init() return code and call Error_Handler() on
failure, matching the pattern used by all other hardware init calls.

FPGA-001: Simplify frame boundary detection to only trigger on
chirp_counter wrap-to-zero. Previous conditions checking == N and == 2N
were unreachable dead code (counter wraps at N-1). Now correct for any
chirps_per_elev value.
2026-04-16 00:13:45 +05:45
Jason b0e5b298fe feat(gps): add UM982 GPS driver replacing broken TinyGPS++
Implement a complete UM982 GNSS driver (um982_gps.h/.c) with:
- NMEA parser for GGA, RMC, THS, VTG with multi-talker support (GP/GN/GL/GA/GB)
- Correct coordinate parsing using decimal-point-based degree detection
  (fixes PR #68 bug: 3-digit longitude degrees)
- Checksum verification on all incoming sentences
- Non-blocking line assembler with ring buffer
- Init sequence: UNLOG, HEADING FIXLENGTH, baseline config, NMEA enables,
  VERSIONA handshake (no SAVECONFIG to avoid NVM wear)
- Validity/age checks with configurable timeouts

Integration into main.cpp:
- Replace TinyGPSPlus with UM982_GPS_t, UART5 baud 9600->115200
- Non-blocking um982_process() in main loop (single-byte UART reads)
- GPS heading override with magnetometer fallback
- Health check using um982_position_age()

Test infrastructure:
- 49 unit tests covering checksums, coordinate parsing, all sentence types,
  talker IDs, feed/assembly, validity, init sequence, edge cases
- Mock HAL_UART_Receive with per-UART ring buffer for integration tests
- All 72 MCU tests passing (23 existing + 49 new)

Fixes all 12 bugs identified in PR #68 analysis (5 compile errors + 7 functional).
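
The checksum verification and decimal-point-based degree detection
listed above can be sketched as follows. Both function names are
hypothetical and this is not the um982_gps.c implementation; it only
illustrates the standard NMEA XOR checksum and the idea of locating
the minutes from the decimal point.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the XOR of the characters between '$' and '*' matches
 * the two hex digits after '*', else 0. */
int sketch_nmea_checksum_ok(const char *s)
{
    if (s[0] != '$') return 0;
    unsigned char sum = 0;
    const char *p = s + 1;
    while (*p && *p != '*') sum ^= (unsigned char)*p++;
    if (*p != '*') return 0;
    if (!isxdigit((unsigned char)p[1]) || !isxdigit((unsigned char)p[2]))
        return 0;
    char hex[3] = { p[1], p[2], '\0' };
    return (unsigned long)sum == strtoul(hex, NULL, 16);
}

/* Converts ddmm.mmmm or dddmm.mmmm to decimal degrees. The two digits
 * before the decimal point are minutes; everything before them is
 * degrees, so 2-digit and 3-digit degree fields both work.
 * Returns 0 on success, -1 on a malformed field. */
int sketch_nmea_to_degrees(const char *field, double *out)
{
    const char *dot = strchr(field, '.');
    size_t int_len = dot ? (size_t)(dot - field) : strlen(field);
    if (int_len < 3 || int_len > 5) return -1;
    size_t deg_len = int_len - 2;
    double deg = 0.0;
    for (size_t i = 0; i < deg_len; ++i) {
        if (!isdigit((unsigned char)field[i])) return -1;
        deg = deg * 10.0 + (field[i] - '0');
    }
    *out = deg + strtod(field + deg_len, NULL) / 60.0;
    return 0;
}
```

Hemisphere handling (N/S, E/W sign) is left out of the sketch and would
be applied by the caller.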
2026-04-15 17:46:21 +05:45
Jason f67440ee9a Merge pull request #74 from NawfalMotii79/revert-68-feature/add-um982-gps-driver
Revert "Add UM982 GPS driver (um982_gps.h/.cpp) for NMEA sentence parsing
2026-04-15 12:51:47 +03:00
Jason 513e0b9a69 Merge pull request #69 from 3aLaee/fix/overtemp-emergency-stop
Escalate overtemp and watchdog-timeout faults to Emergency_Stop()
2026-04-15 12:51:22 +03:00
Jason 78dff2fd3d Revert "Add UM982 GPS driver (um982_gps.h/.cpp) for NMEA sentence parsing and…"
2026-04-15 11:35:36 +03:00
Jason 0b25db08b5 fix(test): align emergency_state_ordering test with overtemp/watchdog fix
- Rename ERROR_STEPPER_FAULT → ERROR_STEPPER_MOTOR to match main.cpp enum
- Update critical-error predicate to include ERROR_TEMPERATURE_HIGH and
  ERROR_WATCHDOG_TIMEOUT (was testing stale pre-fix logic)
- Test 4 now asserts overtemp DOES trigger e-stop (previously asserted opposite)
- Add Test 5 (watchdog triggers e-stop) and Test 6 (memory alloc does not)
- Add ERROR_MEMORY_ALLOC and ERROR_WATCHDOG_TIMEOUT to local enum
- 7 tests, all pass
2026-04-15 13:18:07 +05:45
3aLaee 4900282042 fix(mcu-tests): strip stray literal backslash-r in Makefile continuations
The previous commit accidentally introduced the literal 2-byte sequence
'\r' at the end of two backslash-continuation lines (TESTS_STANDALONE
and the .PHONY list). GNU make on Linux treats that as text rather than
a line continuation, which orphans the following line with leading
spaces and aborts CI with:

  Makefile:68: *** missing separator (did you mean TAB instead of 8 spaces?)

Strip the extraneous 'r' so each continuation ends with a real backslash
+ LF.
2026-04-15 09:16:03 +02:00