Commit Graph

30 Commits

Author SHA1 Message Date
Jason db6b220f92 ci(fpga): PR-M.3 — wire T-6 drift cosim into regression + CI deps
Adds the T-6 independent reference drift cosim (PR-M.1, c30be89) as a
gated regression check so any future hand-edit drift in NCO_SINE_LUT,
fft_twiddle_*.mem, or DOPPLER_WINDOW_COEFF surfaces on every run.

run_regression.sh: new "Independent Reference Drift (T-6)" check after
the RX-B autocorrelation block in Phase 3. Plain `python3` (no path
sniffing). Distinguishes three states from the script's exit code +
markers:
  rc=0,  PASS markers -> PASS (counts toward `passed`)
  rc=2,  no markers   -> SKIP (counts toward `skipped`)
  rc!=0, FAIL markers -> FAIL (gates the regression)

compare_independent.py: detects missing numpy/scipy at startup and exits
with code 2 plus a [SKIP] marker pointing at `uv sync --group dev`.
Without that, an environment without scipy crashed mid-script and the
regression captured a partial 3-of-13 PASS count.

pyproject.toml: scipy>=1.13 added to the dev dependency group (used by
fpga_reference.doppler_window_ideal() for analytical Cheby ground truth).

.github/workflows/ci-tests.yml: fpga-regression now installs Python
3.12, sets up uv, runs `uv sync --group dev`, and activates the
resulting .venv before bash run_regression.sh. Without the activate
line the runner's system python3 (no scipy) would resolve first and
the drift check would [SKIP] in CI.

Verified locally:
  with venv:    Drift PASS (13 checks), Tests: 43 passed / 0 / 0
  no scipy:     Drift SKIP (msg points at install cmd), 42p / 0f / 1s
2026-05-01 18:53:09 +05:45
Jason 237e74ceba test(realdata): PR-K — synthetic regen of doppler/fullchain realdata fixtures
Replaces the legacy ADI CN0566 .npy capture flow with a synthetic radar
scene generated by tb/cosim/real_data/gen_realdata_hex.py via the
existing radar_scene + fpga_model bit-accurate Python models.

Dimensions now match production radar_params.vh:
  RP_FFT_SIZE=2048, RP_DECIMATION_FACTOR=4, RP_NUM_RANGE_BINS=512,
  CHIRPS_PER_FRAME=48, NUM_DOPPLER_BINS=48 (3 sub-frames x 16-pt FFT).

Previously both TBs were pinned to legacy 32-chirp / 2-subframe / 1024->64
DECIM=16 dimensions. range_bin_decimator.v's 2-bit comparisons against
DECIMATION_FACTOR/2 only behave correctly for small DECIM, so the old
DECIM=16 path no longer worked even though the TBs compiled — that is
why Full-Chain Real-Data was reporting pass=0/fail=3.

Changes:
  tb/cosim/real_data/gen_realdata_hex.py  (new) - synthesises 6 fixture
    files from a 2-target scene via DopplerProcessor (3-subframe) and
    RangeBinDecimator (peak, 2048->512). Reproducible (fixed seed 42).

  tb/cosim/real_data/golden_reference.py  (deleted, 1436 lines) - the
    legacy generator depended on out-of-tree ADI .npy captures and
    modelled only the 2-subframe / 32-chirp path.

  tb/cosim/real_data/hex/  - 43 orphan artifacts deleted (CFAR / MTI /
    notched / detection / range-FFT debug dumps that nothing in the
    active TB or regression was loading); 6 fixtures regenerated at
    production dimensions:
      doppler_input_realdata.hex     24576 packed lines (was 2048)
      doppler_ref_{i,q}.hex          24576 lines each   (was 2048)
      fullchain_range_input.hex      98304 packed lines (was 32768)
      fullchain_doppler_ref_{i,q}.hex 24576 lines each  (was 2048)

  tb/tb_doppler_realdata.v          - CHIRPS 32->48, RANGE_BINS 64->512,
                                       DOPPLER_FFT 32->48, MAX_CYCLES bumped.
  tb/tb_fullchain_realdata.v        - same + INPUT_BINS 1024->2048,
                                       DECIM_FACTOR 16->4, fixed
                                       decim_bin_index width to
                                       RP_RANGE_BIN_WIDTH_MAX, fixed
                                       start_bin width 10->11.

  run_regression.sh                 - "Doppler Real-Data" label updated
                                       (no longer "ADI CN0566"); both
                                       realdata tests get explicit
                                       --timeout values (300 / 600 s).

Standalone results:
  tb_doppler_realdata    - 24584/24584 PASS (3.36 s sim, ~50 s wall)
  tb_fullchain_realdata  - 24585/24585 PASS (4.10 s sim, ~5 min wall)

Full regression now: 41 passed / 1 failed (only remaining FAIL is
FFT Engine, pre-existing pre-PR-K regex-reveal — unrelated).
2026-05-01 14:26:54 +05:45
Jason 81d6f210cb test(integration): PR-I.4 — wire new TBs into regression, retire tb_system_e2e
run_regression.sh replaces "System E2E (tb_system_e2e)" + "System E2E
USB_MODE=1 (FT2232H)" with the three PR-I subsuites (tb_system_opcodes,
tb_system_mechanics, tb_system_dataflow). SKIP count for --quick mode
bumped 5 -> 6 to match. "System Top USB_MODE=1 (FT2232H)" via
radar_system_tb.v is kept as a structural smoke test.

Dataflow gets --timeout=600 (vs 300 default). Its 18 ms sim takes
~430-450 s wall on this host; the 300 s default killed it at ~12.4 ms,
before the test logic block ran, yielding UNKNOWN. With 600 s, the TB
finishes cleanly and G2.2/G4.1/G4.2 all pass (3/3). The
matched_filter_multi_segment ST_WAIT_FFT hang documented in the TB
header still affects deeper coverage (G4.4 doppler, G5.x USB egress,
G9.x reset recovery), which remain deferred to PR-J.

tb_system_e2e.v removed (1294 lines) — coverage is fully replaced by
the focused subsuites; its USB_MODE=1 BFM was structurally broken
(wired only the FT601 ports, leaving the FT2232H DUT ports dangling),
which is why a USB_MODE=1 variant could "pass" without exercising the
production FT2232H path.

tb_usb_protocol_v2.v comment updated to point at tb_system_opcodes
for opcode-dispatch integration coverage.
2026-05-01 13:37:16 +05:45
Jason 65f1e02766 fix(regression): allow leading whitespace in [PASS]/[FAIL] anchors
Three regex sites (run_test, run_mf_cosim, run_doppler_cosim) anchored
at column 0 with `^\[PASS|^\[FAIL`, but most TBs emit `  [PASS]` /
`  [FAIL]` from `task check;` formatting. Anchors silently matched
zero markers, the fallback "did anything reach $finish" path
reported PASS, and 48 real failures across tb_system_e2e (×2 modes),
tb_fft_engine, and tb_fullchain_realdata went unnoticed across PR-D..G.

Switch all three anchors to `^[[:space:]]*\[PASS|^[[:space:]]*\[FAIL`.
No RTL change. Surfaces the truth — does not fix the underlying
test failures (tracked separately as T-2..T-10 in PR-Tests-1 / PR-I).
2026-05-01 10:45:15 +05:45
Jason a1a8fa7107 chirp-v2 PR-E: plfm_chirp_controller_v2 + scheduler-driven TX via async-FIFO
Replaces plfm_chirp_controller_enhanced (5-state FSM with hardcoded
LONG/SHORT timings + 60-entry inline short LUT) with plfm_chirp_controller_v2,
a pure DAC playback driver: IDLE -> CHIRP -> IDLE keyed off a 1-cycle
dst_chirp_valid pulse, with sample count selected by dst_wave_sel
(SHORT=120 / MEDIUM=600 / LONG=3600). Inter-chirp timing (LISTEN, GUARD,
frame boundaries) is now owned exclusively by chirp_scheduler.

Scheduler -> TX bridge: cdc_async_fifo (Cummings style #2, WIDTH=2 DEPTH=4)
crosses {wave_sel} from clk_100m to clk_120m_dac, with chirp_pulse as
src_valid. frame_pulse rides a separate toggle CDC for chirp_counter
clear and the new_chirp_frame status output. mixers_enable now also gates
the scheduler so it stays in S_IDLE while the radar is "off" — without
this gate the first chirp_pulse fires at reset and gets dropped before
mixers come up.

Files:
- NEW  plfm_chirp_controller_v2.v      DAC playback driver (3 LUTs, FSM)
- DEL  plfm_chirp_controller.v         legacy controller (382 lines)
- DEL  long_chirp_lut.mem              legacy LUT (3600 lines), replaced
                                       by tx_long_lut.mem from PR-B
- chirp_scheduler.v       + mixers_enable input (master quiesce)
- radar_receiver_final.v  + sched_*_out output ports + mixers_enable_100m
- radar_system_top.v      wire sched_*_out -> tx_inst.sched_*; pass
                          stm32_mixers_enable_100m to rx_inst
- radar_transmitter.v     full rewrite: drop new_chirp edge detector +
                          toggle CDC, instantiate cdc_async_fifo for
                          {wave_sel}, toggle CDC for frame_pulse,
                          plfm_chirp_controller_v2 in place of _enhanced
- tb/tb_chirp_controller.v  + tb/tb_chirp_contract.v  rewritten for v2
                          contract (43/43 unit + 10/10 contract green)
- tb/tb_radar_receiver_final.v  + .mixers_enable_100m(1'b1) pin
- run_regression.sh, scripts/200t/build_200t.tcl  file-list bumped

Test summary:
- tb_chirp_controller_v2:   43/43 PASS
- tb_chirp_contract:        10/10 contracts upheld
- tb_rxb_fullchain:         peak 24033 ~80x (parity with PR-D)
- tb_mti_canceller:         43/43 PASS
- tb_system_e2e:            33/49 (1 new vs 34/49 PR-D baseline: G2.2
                            new_chirp_frame, intentional v2 frame-pulse
                            semantics — fires once per Doppler frame
                            instead of once per stm32 chirp toggle.
                            TB needs widening in PR-H to wait the full
                            frame.)
2026-04-30 21:51:46 +05:45
Jason 8e8f3e60c4 chirp-v2 PR-D: chirp_scheduler replaces radar_mode_controller; MF/MTI wave_sel-native
Single 100 MHz scheduler emits wave_sel[1:0] and chirp_pulse natively. Modes
00 (STM32 pass-through), 01 (auto-scan over SHORT/MEDIUM/LONG sub-frames),
10 (single-chirp debug), 11 (track dwell with watchdog scan-fallback after
RP_DEF_TRACK_WATCHDOG_FRAMES=5 idle frames). Sub-frame mask lets ops drop a
waveform without recompiling.

Drops the receiver_final wave_sel shim added in PR-C: wave_sel comes
straight from the scheduler; chirp_pulse replaces the old mc_new_chirp
toggle + XOR edge converter. matched_filter_multi_segment and mti_canceller
take wave_sel[1:0] and chirp_pulse directly — no parallel paths.

multi_segment also bumped: SHORT_CHIRP_SAMPLES 50 -> 100 (V2 1 us SHORT)
and MEDIUM_CHIRP_SAMPLES = 500 (5 us). LONG path unchanged. Dead
mc_new_elevation/azimuth XOR converters removed.

Deletes radar_mode_controller.v, formal/fv_radar_mode_controller.v, and
tb/tb_radar_mode_controller.v. Build manifests (run_regression.sh,
scripts/200t/build_200t.tcl) updated. Receiver_final pins medium/track/
subframe_enable inputs to RP_DEF_* defaults until PR-G plumbs USB opcodes.

Verification:
- tb_rxb_fullchain_latency: peak |I|+|Q|=24033 at bin 0, ~80x peak/mean
  (up from PR-C's 15115 since matched filter now uses full 100 SHORT samples)
- tb_mti_canceller: 43/43 PASS with new wave_sel[1:0] input
- tb_radar_receiver_final: 8/8 PASS, ALL TESTS PASSED
- tb_system_e2e: 34/49 PASS - identical to pre-PR-D baseline (15 failures
  are pre-existing matched-filter cycle-budget skips); G8.2/G8.3 chirp_scheduler
  probes PASS
- tb_multiseg_cosim: 16/32 - same as pre-PR-D baseline
2026-04-30 20:52:32 +05:45
Jason 4238eb1b99 chirp-v2 PR-C: chirp_reference_rom replaces chirp_memory_loader_param
Drop the chirp-v1 1-bit use_long_chirp memory loader and its 6 .mem files;
introduce chirp_reference_rom — wave_sel-native, single 8192x16 BRAM array
per Q15 lane, 4-region init (SHORT, MEDIUM, LONG seg0/seg1) loaded from the
PR-B mem files. Same 1-clk read latency as the legacy module so the RX-B
autocorrelation alignment fix carries through unchanged.

Receiver-side wave_sel shim added in radar_receiver_final.v:
  wire [1:0] wave_sel = use_long_chirp ? RP_WAVE_LONG : RP_WAVE_SHORT;
This is a 1-line transitional bridge while radar_mode_controller still
emits 1-bit use_long_chirp; PR-D deletes the shim and wires chirp_scheduler
straight through. MEDIUM is loaded into the ROM but unreachable through
the production path until PR-D.

BRAM cost: 8 RAMB18 (was 6 in chirp-v1). +2 BRAM is the cost of adding
MEDIUM to the waveform set; not avoidable.

Files added:
  - chirp_reference_rom.v
Files removed:
  - chirp_memory_loader_param.v
  - long_chirp_seg{0,1}_{i,q}.mem (4 files)
  - short_chirp_{i,q}.mem (2 files)
  - tb/cosim/validate_mem_files.py (legacy file-set validator; replaced by
    gen_chirp_mem.py's internal verify_phase_match)
  - tb/cosim/analyze_short_chirp_mismatch.py (one-shot tool from the
    chirp-v1 TX-I investigation; finding incorporated, references the
    deleted short_chirp_*.mem files)
Files updated for module rename:
  - radar_receiver_final.v        — instance, comments, wave_sel shim
  - radar_mode_controller.v       — header comment
  - matched_filter_processing_chain.v — header comment
  - scripts/200t/build_200t.tcl   — explicit RTL list
  - run_regression.sh             — 5 spots
  - tb/tb_rxb_fullchain_latency.v — instance, wave_sel shim, mem filenames,
                                    SHORT_LEN 50 → 100 (1 µs at 100 MHz)
  - tb/tb_system_e2e.v            — header comment

Verification:
  - chirp_reference_rom standalone iverilog compile: clean
  - Full receiver chain compile (21 RTL files): clean
  - tb_rxb_fullchain_latency runs end-to-end with new ROM + new mem files
    + 100-sample SHORT chirp; autocorrelation peak at bin 0, peak |I|+|Q|
    = 15115. Confirms 1-clk ROM read latency is preserved and the RX-B
    direct-wire-with-1-FF alignment still holds.
  - 50T build script (scripts/50t/build_50t.tcl) uses glob *.v — no edit
    needed; it picks up the new file automatically.
2026-04-30 19:37:43 +05:45
Jason 58d2e1ba10 AUDIT-C11: replace Gray-CDC at CIC→FIR with home-grown async FIFO
cdc_adc_to_processing carries multi-bit data across 400→100 MHz via
TWO independent synchronizer chains (data Gray-encoded + a separate
2-bit toggle). Under metastability, the chains can resolve on
different cycles, letting the destination latch a half-resolved Gray
word that decodes to an arbitrary value. Audit C-11. Practical MTBF
is years per event but the design is non-conformant for arbitrary
multi-bit data — Gray code's single-bit-flip protection only holds
for ±1 transitions, not for CIC samples that can change by hundreds
of LSBs.

Replace with cdc_async_fifo, a Cummings SNUG-2002 style #2 async
FIFO. Data does NOT cross domains; it sits in dual-clock distRAM
(write port src_clk, read port dst_clk). Only the read/write
Gray-coded POINTERS cross — and pointers genuinely change ±1 per
increment, so Gray code's protection is correct by construction.
Home-grown rather than XPM_FIFO_ASYNC: vendor-neutral (iverilog can
simulate it directly, no SIM stub), keeps the project's existing
home-grown CDC convention (3 sibling primitives in cdc_modules.v),
and avoids XPM library version skew.

Port shape is preserved (same WIDTH=18, same dst_data/dst_valid/
overrun semantics — 1-cycle pulse per read in steady state) so the
swap is local to two instantiations in ddc_400m.v. Sticky-overrun
aggregation downstream is unchanged.

XDC: project already has blanket set_false_path on
clk_100m ↔ adc_dco_p, which covers both new pointer crossings.
Synchronizer FFs carry ASYNC_REG="TRUE" for placement-aware MTBF.
No XDC change needed.

New TB tb_cdc_async_fifo.v exercises 7 groups (28 checks): reset,
single-sample passthrough, multi-Gray-bit-flip (0x00000 ↔ 0x3FFFF —
audit's recommended coverage point, asserts NO intermediate values
appear at dst_data), matched-rate continuous stream, sustained-burst
overrun, drain-to-empty, and mid-stream reset.

Resource: 8 LUTRAMs per instance × 2 instances = 16 LUTRAMs (~0.05%
of XC7A50T budget).

Verified: full FPGA regression 42/42 PASS (was 41/41; +1 new test,
0 regressions in DDC Chain / Doppler Co-Sim / Full-Chain Real-Data
/ Receiver Integration / System Top / System E2E / MF Co-Sim — all
of which exercise the swap path through the production signal
chain). 0 lint errors.
2026-04-30 10:47:31 +05:45
Jason 9bed35287a AUDIT-C16: parameterize NUM_CELLS + sample_counter width for 200T
Pre-fix usb_data_interface.v hardcoded `localparam [14:0] NUM_CELLS =
15'd16384` for the 50T 512-range x 32-doppler layout. On 200T builds
with SUPPORT_LONG_RANGE defined, RP_MAX_OUTPUT_BINS=4096 makes a real
frame 131072 cells, so the fixed value caused two distinct defects:

  (a) value: counter wrapped 8x per real frame; bit-7 frame-start
      marker fired 8x at incorrect host-frame offsets, silently
      desyncing the GUI parser
  (b) width: 15 bits could not represent 131072 (needs 17 bits)

Fix: derive NUM_CELLS = RP_MAX_OUTPUT_BINS * RP_NUM_DOPPLER_BINS and
counter width = RP_DOPPLER_MEM_ADDR_W (14 on 50T, 17 on 200T) from
radar_params.vh, so both scale together with the build define.

Tests:
- tb_audit_c16_num_cells.v: standalone counter-block exerciser (T1
  reset, T2 increment, T3 wrap at NUM_CELLS-1, T4 exactly 2 markers
  across 2*NUM_CELLS ticks, T5 top-bit observability) -- 6/6 PASS at
  both 50T (NUM_CELLS=16384, CTR_W=14) and 200T (131072, 17).
- tb_usb_data_interface.v: existing test 7-8 retargeted from the old
  hardcoded `>=15` / `==15'd16384` invariant to the new parameterized
  one (`==RP_DOPPLER_MEM_ADDR_W` / `==RP_MAX_OUTPUT_BINS*RP_NUM_DOPPLER_BINS`).

Regression: 41/41 PASS (+2 new entries: 50T default + 200T
`+define+SUPPORT_LONG_RANGE`).
2026-04-29 23:01:41 +05:45
Jason 58154a6bf1 fpga: split gpio_dig5/dig7 by fault class (AUDIT-S10)
gpio_dig5 (PD13) previously OR'd six flags — four signal-saturation
classes (AGC, DDC overflow, DDC saturation, MTI saturation) and two
control-fault classes (range-decimator watchdog from F-6.4, CIC->FIR
CDC overrun from F-1.2). The MCU outer-loop AGC reduces RF gain on
PD13 assertion, which is the wrong response to a watchdog or CDC
stall — it just hides the stall behind a quiet receive chain. gpio_dig7
(PD15) was tied 1'b0 as "reserved".

Split:
  gpio_dig5 = signal-saturation only (AGC continues to react correctly)
  gpio_dig7 = control-fault classes

Telemetry: status_words[5][6:5] now exposes the two control-fault
classes in BOTH legacy (FT601) and FT2232H USB variants, with 2-FF
level CDC sync from clk_100m to ft601_clk_in / ft_clk. Bit [7] is
reserved. AUDIT-C12's frame_drop_count at [31:25] is preserved.

50T XDC H12 -> gpio_dig7 pin already assigned (audit AUDIT-C15-era);
no XDC change.

Test: tb/tb_audit_s10_gpio_split.v 17/17 PASS — exercises both the
combinational GPIO split and the CDC status-word packing path.
Regression: 39/39 PASS (was 34/34).
2026-04-29 20:06:52 +05:45
Jason 59f3c82fbb fpga: wire AD9484 PWDN to host opcode 0x32 (AUDIT-S25)
`radar_receiver_final.v:246` had `assign adc_pwdn = 1'b0;` -- the AD9484
PWDN pin was hard-tied LOW with no path for the host or MCU to assert
it. Combined with AUDIT-C13 (CSB hard-tied HIGH on the production board,
no SPI access to the AD9484), the ADC was fully un-recoverable from a
stuck state without dropping main power -- which also drops the
VBAT-backed BKPSRAM persistence (MCU-A4 OCXO warmup, MCU-A7 emergency
flag) and forces a 180 s warmup soak.

Opcode 0x32 was reserved during the AUDIT-C3 fix (commit 24ef5e7) for
exactly this purpose. Wire it through:

  - `radar_system_top.v` adds `reg host_adc_pwdn` next to `host_adc_format`,
    resets to 1'b0 (matches historical hard-tied state -- preserves
    bringup behavior), latches `usb_cmd_value[0]` on opcode 0x32, drives
    the new receiver input port.
  - `radar_receiver_final.v` adds `input wire host_adc_pwdn`, replaces the
    hard-coded `assign adc_pwdn = 1'b0` with `assign adc_pwdn = host_adc_pwdn`.
  - No CDC: `host_adc_pwdn` is a stable single-bit level driven from the
    clk_100m register straight to the I/O pad. AD9484 PWDN is asynchronous
    w.r.t. the ADC clock; the chip re-acquires its DLL on PWDN deassert.

XDC pin assignments were already in place from AUDIT-C15 (50T:T5,
200T:P20, both LVCMOS25 driving the AD9484 PWDN net via the R36/R37
divider on the Main Board).

Verification:
  - new tb/tb_adc_pwdn_opcode.v, 15/15 PASS:
      T1 reset -> host_adc_pwdn=0, adc_pwdn pin=0 (ADC powered up)
      T2 opcode 0x32 val=1 -> host_adc_pwdn=1, pin=1 (PWDN asserted)
      T3 opcode 0x32 val=0 -> cleared
      T4 only bit[0] consumed (upper bits ignored)
      T5 unrelated opcodes (0x33, 0x01) don't disturb host_adc_pwdn
      T6 cmd_valid_100m gating works
  - Quick regression 33/33 PASS (was 32/32; +1 new test, 0 regressions)
  - Lint: 0 errors
2026-04-29 19:37:37 +05:45
Jason ea2615ef84 doppler: gate S_IDLE→S_ACCUMULATE on frame_start_pulse (AUDIT-S3)
Pre-fix S_IDLE had two independent if-branches: one for frame_start_pulse
(resets pointers) and one for data_valid (transitions to S_ACCUMULATE).
A data_valid arriving before frame_start_pulse would advance the FSM with
whatever pointers happened to be live, and the BRAM write block would write
the sample into mem_write_addr = (write_chirp_index*RANGE_BINS) + 0.

In current operation the race is benign — end-of-S_ACCUMULATE always zeros
write_chirp_index/write_range_bin (line 287-288) and the MF pipeline latency
(~165 µs) is millions of cycles longer than the frame_start CDC latency
(~50 ns), so frame_start always arrives first. But the FSM relies on an
undocumented system-level invariant; a future code path that leaves
pointers stale on entry to S_IDLE would silently corrupt the first sample.

Fix: add a `frame_armed` register set when frame_start_pulse arrives in
S_IDLE, cleared on transition to S_ACCUMULATE. Both the FSM transition and
the BRAM write block gate on `(frame_start_pulse || frame_armed)`. The OR
admits the same-cycle case where both arrive together (write to addr 0
still resolves correctly because both blocks use the same gate).

Verification: tb_doppler_frame_start_gate 21/21 PASS, quick regression
32/32 PASS (was 31/31; +1 new test, 0 regressions). tb_doppler_realdata
(full FFT pipeline) still passes — gate transparent to normal operation.
2026-04-29 18:36:31 +05:45
Jason e67368d621 ft2232h: add frame drop counter (AUDIT-C12) + cfar RMW cadence guard (AUDIT-S22)
AUDIT-C12: usb_data_interface_ft2232h had a misleading single-buffer comment
that overstated the timing slack and referenced a frame_ack_toggle CDC that
was never implemented. Re-verified actual numbers: at 178 fps the slack is
1.14 ms (20%), not "much shorter than gap". No data corruption today (write
order matches read order, addresses don't collide), but frame_complete
firing while WR_FSM is still draining the previous frame causes silent
frame drops via the missed frame_ready_toggle edge.

Fix is instrumentation, not architectural rework: add wr_done_toggle
(ft_clk -> clk CDC) on WR_DONE -> WR_IDLE, track frame_pending in clk
domain, count drops in 7-bit saturating frame_drop_count, surface in
unused upper 7 bits of status_words[5]. Host now has visibility into the
failure mode if margin ever shrinks (faster frame rate or USB bandwidth
shortfall). Replaced misleading comment with corrected timing breakdown.

AUDIT-S22: cfar_ca emits one detection per 3 cycles (THR/MUL/CMP); the
detection RMW takes 3 cycles. Match by construction today, fragile against
any CFAR speedup. Added a header comment in cfar_ca.v documenting the
dependency, and a SIMULATION-only assertion in usb_data_interface_ft2232h.v
that fires [ASSERT FAIL] AUDIT-S22 if cfar_valid arrives while RMW busy.
Catches silent-drop regressions in the test suite.

Verification: new tb_ft2232h_frame_drop.v with 5 scenarios (no drops /
stalled drops / multi-drop / recovery / saturation at 127) - 10/10 PASS.
Quick regression 31/31 PASS (was 30/30; +1 new test, 0 regressions).
2026-04-29 17:51:30 +05:45
Jason 0c82de54a2 fft_engine_axi_bridge: respect axi_din_tready with 1-deep skid buffer
Bug: bridge advanced in_count and asserted tlast on din_valid alone,
ignoring the IP's tready handshake. With LogiCORE FFT v9.1 in
nonrealtime throttle mode (per .xci), tready can deassert briefly
during BFP normalization or pipeline events, silently dropping input
samples and shifting tlast off-by-N.

Fix: add 1-deep skid buffer + AXI-correct handshake. Phase 1 drains
the active beat when the IP accepts it (and shifts skid up); Phase 2
loads new upstream samples respecting post-handshake slot availability.
Track accept_count separately from in_count to drive the S_FEED->S_DRAIN
transition on the Nth accepted beat. Sustained 2+ cycle backpressure
exhausts the skid and sets overflow_sticky for debug visibility.

Audit cross-refs (AUDIT-C10):
- "tready ignored" - CONFIRMED, fixed here
- "SCALE_SCH unset" - REFUTED (BFP mode uses tuser, not cfg_tdata)
- "output ordering not configured" - REFUTED (.xci natural_order)

Verification: new tb_fft_engine_axi_bridge.v with stub xfft_2048
exercises 4 backpressure patterns (none / dip-at-3 / dip-at-100 /
3-cycle sustained). Quick regression 30/30 PASS.
2026-04-29 17:24:21 +05:45
Jason 4f0b82de6e test(fpga): receiver-integration — fix tb wiring + skip-guard XSim-only checks
tb_radar_receiver_final had three pre-existing issues that all surfaced as
fails in regression (32 passed, 2 failed before; 34 passed, 0 after):

1. host_range_mode was undriven (floating 2'bzz); rmc log confirmed
   "Auto-scan starting, range_mode=z". Add explicit 2'b01 (long-range
   dual-chirp) for the test scenario.

2. DDC_MAX_ENERGY threshold (2^56) was sized for an unspecified earlier
   stimulus; the test feeds a deliberately-loud 120 MHz sawtooth that
   produces ~1.27e17 energy over 2M samples. Raised to 2^60 (~10x
   observed) so B1b catches true overflow without false-firing.

3. The 9 doppler-frame-dependent checks (S4-S9, G1, B2a, B3, B4) need
   ~108 ms simulated time to fill a 32-chirp Doppler frame because the
   in-house fft_engine takes ~340 K cycles per multi-segment chirp
   (RX-NEW-3, commit 5c8cc8c). Iverilog can't elaborate the Xilinx FFT IP
   that would make this tractable. Guard those checks behind
   `ifdef FFT_USE_XILINX_IP` so iverilog cleanly SKIPs them with an
   explanatory line; XSim with the IP runs them normally.

Also tightens run_regression.sh's pass/fail regex from
^\[(PASS|FAIL)([^]]*)\] to ^\[(PASS|FAIL)( [0-9]+)?\] so informational
tags like [FAIL-INFO] (used to document the known RX-NEW-1 fft_engine
bin-shift in tb_matched_filter_processing_chain.v) no longer false-fire
as real failures. The Matched Filter Chain test goes from FAIL (40 pass,
2 false-fails) to PASS (40 checks).

Regression: 34 passed, 0 failed.
2026-04-29 11:41:40 +05:45
Jason b7ac2de1a4 chore: delete dead latency_buffer; doc cleanup for two stale comments
latency_buffer.v has had zero non-tb instantiations since RX-B (2026-04-23)
replaced its hookup in radar_receiver_final with a 1-FF alignment register.
The module was being kept "for potential future use" — exactly the kind of
dead weight the codebase does not need. Deleted, along with all build /
test infrastructure that dragged it along:

  - 9_Firmware/9_2_FPGA/latency_buffer.v
  - 9_Firmware/9_2_FPGA/tb/tb_latency_buffer.v
  - run_regression.sh: removed from RTL_FILES and RECEIVER_RTL
  - scripts/200t/build_200t.tcl: removed from synthesis source list
  - tb/tb_system_e2e.v: removed from header compile-string example
  - tb/cosim/validate_mem_files.py: deleted test_latency_buffer() (~75 lines),
    its call site, and the corresponding entry in the module docstring

Historical RX-B comments referencing latency_buffer in radar_receiver_final.v,
tb_rxb_fullchain_latency.v, and tb_rxb_latency_measure.v are kept — they
explain WHY the module was removed, which is still useful design archaeology.

Two doc-only housekeeping touches bundled in:

  - plfm_chirp_controller.v: replaced two empty "CRITICAL FIX: Generate
    valid signal" labels at LONG_CHIRP and SHORT_CHIRP with one shared
    chirp_valid policy comment block above LONG_CHIRP that explains the
    actual rationale (downstream FIFO underrun on trailing samples).

  - v7/models.py: replaced the "range_resolution and velocity_resolution
    should be calibrated" docstring (sounded like an open TODO but was a
    documented placeholder) with a clear pointer to the GUI-C3 fix in
    workers.py:RadarDataWorker so future readers know the live path
    derives correct values from WaveformConfig.

FPGA quick regression unchanged: 28/29 (1 fail is the unrelated iverilog/
Xilinx-IP RX-NEW-3 gap). GUI suite 180/180. Ruff clean.
2026-04-28 12:52:13 +05:45
Jason 5c8cc8c96a feat(fpga): swap matched-filter chain to Xilinx LogiCORE FFT v9.1 IP
Replaces the in-house iterative fft_engine.v in the matched-filter chain
with the Pipelined Streaming Xilinx FFT IP, closing RX-NEW-3 (FFT chain
~11x too slow vs PRI budget).

Components:
  * ip/xfft_2048_ip/xfft_2048_ip.xci — committed IP definition
    (16-bit fixed point, BFP scaling, convergent rounding, natural order,
    pipelined-streaming, BRAM data/reorder/phase factors). Vivado
    regenerates .dcp / sim-netlist from this on each build.
  * scripts/50t/gen_xfft_2048_ip.tcl — IP-Catalog generation script
  * scripts/50t/run_xfft_xsim.sh — XSim batch runner for tb_xfft_2048_xsim
  * xfft_2048.v — AXI-Stream wrapper. FFT_USE_XILINX_IP define routes to
    real LogiCORE for synth/XSim; falls back to fft_engine batched
    one-shot for iverilog (unit coverage only).
  * fft_engine_axi_bridge.v — exposes legacy fft_engine port surface on
    top of the xfft_2048 AXI wrapper, so the chain swap is a 1-line
    module-name change.
  * matched_filter_processing_chain.v — fft_engine -> fft_engine_axi_bridge
  * scripts/50t/build_50t.tcl — read_ip + generate_target + synth_ip;
    adds FFT_USE_XILINX_IP to verilog defines.
  * tb/tb_xfft_2048_xsim.v — XSim verification (DC, impulse, tone bin 128).
    All 5 assertions PASS on remote with the real IP; tuser=0x0a (BLK_EXP=10)
    confirms BFP scaling working.

Local iverilog regression: 32/34 PASS — identical to baseline. Same two
RX-NEW-3 failures (Receiver Integration, Matched Filter Chain) — these
only resolve in remote XSim with the real IP, since iverilog uses the
fft_engine fallback inside xfft_2048 (~150K cycles/pass, not the
~2200-cycle Pipelined Streaming throughput). MF cosim 4/4 PASS confirms
bridge bit-exact in fallback mode.

Pending: remote XSim of tb_radar_receiver_final to demonstrate Doppler
frames produced within PRI budget; remote synth to confirm DSP/timing
post-IP.
2026-04-23 12:39:33 +05:45
Jason f1f69ca623 ci(fpga): wire RX-B latency tests; fix downstream compile after inline-FFT removal
- run_regression.sh: add frequency_matched_filter.v to PROD_RTL and RECEIVER_RTL
  compile groups (was implicitly required after inline behavioural FFT in
  matched_filter_processing_chain.v was removed); empty EXTRA_RTL with set -u
  guards; bump Matched Filter Chain timeout to 600s.
- run_regression.sh: add two PHASE 3 tests — tb_rxb_latency_measure (chain
  pipeline depth) and tb_rxb_fullchain_latency (multi-segment + chain).
- radar_receiver_final.v: replace dangling delayed_ref_i/q references (left
  over from latency_buffer removal) with ref_chirp_real/imag.
- tb/tb_radar_receiver_final.v: chain-state debug uses production
  collect_count/out_count signals instead of the deleted SIMULATION-only
  fwd_in_count.
- tb/tb_rxb_latency_measure.v: add explicit [PASS]/[FAIL] markers around the
  2007..2107 cycle expected-latency window.
2026-04-23 06:34:05 +05:45
Jason 5f3002a4d1 merge(wave2): manual resolution of 6 shared files — fft-2048 × p0 audit
Hand-merged files modified on both fix/pre-bringup-audit-p0 and
feat/fft-2048-upgrade. Wave 1 (commit 60e49c7) took 20 files from fft
verbatim; this wave resolves the overlap.

- run_regression.sh: 3-way merge. Adopts fft's ${RECEIVER_RTL[@]} array
  refactor and drops the self-blessing golden pair from p0. Skip count
  bumped to 5.

- usb_data_interface.v (FT601/200T): p0 FSM + clock-loss watchdog kept
  wholesale; widened stream_control 3 -> 6 bits to carry fft's extended
  mode bits through the CDC sync chain and the 0xFF status word.

- mti_canceller.v: fft's BRAM-inferred 512-range-bin implementation as
  the base, with p0's F-6.3 saturation counter grafted onto the d1
  pipeline stage. Overflow detection uses the top-two-bits disagreement
  on diff_{i,q}_full (DATA_WIDTH+1 signed).

- radar_receiver_final.v: fft's 2048-pt / 512-bin structure + p0
  diagnostic plumbing (ADC overrange sticky+CDC, DDC diagnostics,
  tx_frame_start edge detector replacing chirp_counter frame sync,
  mti_saturation_count, range_decim_watchdog).

- radar_system_top.v: clean 3-way merge, orthogonal regions
  (+38 / -27).

- usb_data_interface_ft2232h.v (FT2232H/50T): fft's per-frame bulk BRAM
  rewrite kept wholesale. Ported two p0 items that are orthogonal to
  the write FSM:
    * ft_clk-loss watchdog (heartbeat + 2FF ASYNC_REG sync + 16-bit
      timeout) ORed into a 2FF sync'd ft_effective_reset_n for the FSM.
    * rd_cmd_complete flag so RD_DEASSERT can distinguish a legitimate
      3-byte completion from an ft_rxf_n abort that also zeros
      rd_byte_cnt.

Deliberately NOT taken from 2401f5f: cic_decimator_4x_enhanced.v and
ddc_400m.v reset-strategy changes. Those conflict with p0's shipped
registered-sync-reset + max_fanout=25 distribution, which is already
timing-clean on the production build.
2026-04-21 02:12:04 +05:45
Jason 7a35f42e61 refactor(fpga): deduplicate RTL file lists in run_regression.sh
Extract RECEIVER_RTL and SYSTEM_RTL shared arrays to replace 6
near-identical file lists. New modules now only need adding once.
2026-04-16 17:07:01 +05:45
Jason f393e96d69 feat(fpga): make FT2232H default USB interface, rewrite FT601 write FSM, add clock-loss watchdog
- Set USB_MODE default to 1 (FT2232H) in radar_system_top.v; 200T build
  overrides to USB_MODE=0 via build_200t.tcl generic property
- Rewrite FT601 write FSM: 4-state architecture with 3-word packed data,
  pending-flag gating, and frame sync counter
- Add FT2232H read FSM rd_cmd_complete flag, stream field zeroing, and
  range_data_ready 1-cycle pipeline delay in both USB modules
- Implement clock-loss watchdog: ft_heartbeat toggle + 16-bit timeout
  counter drives ft_clk_lost, feeding ft_effective_reset_n via 2-stage
  ASYNC_REG synchronizer chain
- Fix sample_counter reset literal width (11'd0 -> 12'd0)
- Add FT2232H I/O timing constraints to 50T XDC; fix dac_clk comments
- Document vestigial ft601_txe_n/rxf_n ports (needed for 200T XDC)
- Tie off AGC ports on TE0713 dev wrapper
- Rewrite tb_usb_data_interface.v for new 4-state FSM (89 checks)
- Add USB_MODE=1 regression runs; remove dead CHECK 5/6 loop
- Update diag_log.h USB interface comment
2026-04-16 16:18:52 +05:45
Jason 519c95f452 fix: regenerate golden hex for dual-16pt Doppler and add real-data TBs to regression
Regenerate all real-data golden reference hex files against the current
dual 16-point FFT Doppler architecture (staggered-PRI sub-frames).
The old hex files were generated against the previous 32-point single-FFT
architecture and caused 2048/2048 mismatches in both strict real-data TBs.

Changes:
- Regenerate doppler_ref_i/q.hex, fullchain_doppler_ref_i/q.hex, and all
  downstream golden files (MTI, DC notch, CFAR) via golden_reference.py
- Add tb_doppler_realdata (exact-match, ADI CN0566 data) to regression
- Add tb_fullchain_realdata (exact-match, decim->Doppler chain) to regression
- Both TBs now pass: 2048/2048 bins exact match, MAX_ERROR=0
- Update CI comment: 23 -> 25 testbenches
- Fill in STALE_NOTICE.md with regeneration instructions

Regression: 25/25 pass, 0 fail, 0 skip. ruff check: 0 errors.
2026-04-09 02:36:14 +03:00
Jason 1e284767cd fix(test,docs): remove dead xfft_32 files, update test infra for dual-16 FFT, add regression guide
- Remove xfft_32.v, tb_xfft_32.v, and fft_twiddle_32.mem (dead code
  since PR #33 moved Doppler to dual 16-pt FFT architecture)
- Update run_regression.sh: xfft_16 in PROD_RTL, remove xfft_32 from
  EXTRA_RTL and all compile commands
- Update tb_fft_engine.v to test with N=16 / fft_twiddle_16.mem
- Update validate_mem_files.py: validate fft_twiddle_16.mem instead of 32
- Update testbenches and golden data from main_cleanup branch to match
  dual-16 architecture (tb_doppler_cosim, tb_doppler_realdata,
  tb_fullchain_realdata, tb_fullchain_mti_cfar_realdata, tb_system_e2e,
  radar_receiver_final, golden_doppler.mem)
- Update CONTRIBUTING.md with full regression test instructions covering
  FPGA, MCU, GUI, co-simulation, and formal verification

Regression: 23/23 FPGA, 20/20 MCU, 57/58 GUI, 56/56 mem validation,
all co-sim scenarios PASS.
2026-04-07 02:51:48 +03:00
Jason 4985eccbae Wire self-test results (0x31) to USB status readback path, add fpga_self_test to regression
- usb_data_interface.v: Add 3 self-test status inputs, expand status packet
  from 7 words (header + 5 data + footer) to 8 words (header + 6 data + footer).
  New status_words[5] carries {busy, detail[7:0], flags[4:0]}.
- radar_system_top.v: Wire self_test_flags_latched, self_test_detail_latched,
  self_test_busy to usb_data_interface ports. Add opcode 0x31 as status
  readback alias so host can read self-test results.
- tb_usb_data_interface.v: Add self-test port connections, verify word 5 in
  Group 16, add Group 18 (busy flag + partial failure variant). 81 checks pass.
- run_regression.sh: Add fpga_self_test.v to PROD_RTL lint list and system-
  level compile lists. Add tb_fpga_self_test as Phase 1 unit test.
- 24/24 regression tests pass, lint clean (0 errors, 4 advisory warnings).
2026-03-20 20:03:11 +02:00
Jason ed629e7559 Integrate MTI canceller and DC notch filter for ground clutter removal
MTI canceller (2-pulse, H(z)=1-z^{-1}) between range decimator and
Doppler processor. Subtracts previous chirp from current, nulling DC
Doppler (stationary clutter). Pass-through when host_mti_enable=0.

DC notch filter (post-Doppler, pre-CFAR) zeros bins within
+/-host_dc_notch_width of DC. Complements MTI for residual clutter.

New host registers: 0x26 (mti_enable), 0x27 (dc_notch_width).
Both default to 0 (disabled) - fully backward-compatible.

Verification: 23/23 regression, 29/29 MTI standalone, 3/3 real-data
co-sim (5137/5137 exact match) all PASS.
2026-03-20 16:39:17 +02:00
Jason f71923b67d Integrate CA-CFAR detector: replace fixed-threshold comparator with adaptive sliding-window CFAR engine (22/22 regression PASS)
- Add cfar_ca.v: CA/GO/SO-CFAR with BRAM magnitude buffer, host-configurable
  guard cells, training cells, alpha multiplier, and mode selection
- Replace old threshold detector block in radar_system_top.v with cfar_ca
  instantiation; backward-compatible (cfar_enable defaults to 0)
- Add 5 new host registers: guard (0x21), train (0x22), alpha (0x23),
  mode (0x24), enable (0x25)
- Expose doppler_frame_done_out from radar_receiver_final for CFAR frame sync
- Add tb_cfar_ca.v standalone testbench (14 tests, 24 checks)
- Add Group 14 E2E tests: 13 checks covering range-mode (0x20) and all
  CFAR config registers (0x21-0x25) through full USB command path
- Update run_regression.sh with CFAR in lint, Phase 1, and integration compiles
2026-03-20 04:57:34 +02:00
Jason e93bc33c6c Production fixes 1-7: detection bugs, cfar→threshold rename, digital gain control, Doppler mismatch protection, decimator watchdog, bypass_mode dead code removal, range-mode register (21/21 regression PASS)
Fix 1: Combinational magnitude + non-sticky detection flag (tb: 23/23)
Fix 2: Rename all cfar_* signals to detect_*/threshold_* (honest naming)
Fix 3: New rx_gain_control.v between DDC and FFT, opcode 0x16 (tb: 33/33)
Fix 4: Clamp host_chirps_per_elev to DOPPLER_FFT_SIZE, error flag (E2E: 54/54)
Fix 5: Decimator watchdog timeout, 256-cycle limit (tb: 63/63)
Fix 6: Remove bypass_mode dead code from ddc_400m.v (DDC tb: 21/21)
Fix 7: Range-mode register 0x20 with status readback (USB tb: 77/77)
2026-03-20 04:38:35 +02:00
Jason 0773001708 E2E integration test + RTL fixes: mixer sequencing, USB data-pending flags, receiver toggle wiring (19/19 FPGA)
RTL fixes discovered via new end-to-end testbench:
- plfm_chirp_controller: TX/RX mixer enables now mutually exclusive
  by FSM state (Fix #4), preventing simultaneous TX+RX activation
- usb_data_interface: stream control reset default 3'b001 (range-only),
  added doppler/cfar data_pending sticky flags, write FSM triggers on
  range_valid only — eliminates startup deadlock (Fix #5)
- radar_receiver_final: STM32 toggle signals wired through for mode-00
  pass-through, dynamic frame detection via host_chirps_per_elev
- radar_system_top: STM32 toggle signal wiring to receiver instance
- chirp_memory_loader_param: explicit readmemh range for short chirp

Test infrastructure:
- New tb_system_e2e.v: 46 checks across 12 groups (reset, TX, safety,
  RX, USB R/W, CDC, beam scanning, reset recovery, stream control,
  latency budgets, watchdog)
- tb_usb_data_interface: Tests 21/22/56 updated for data_pending
  architecture (preload flags, verify consumption instead of state)
- tb_chirp_controller: mixer tests T7.1/T7.2 updated for Fix #4
- run_regression.sh: PASS/FAIL regex fixed to match only [PASS]/[FAIL]
  markers, added E2E test entry
- Updated rx_final_doppler_out.csv golden data
2026-03-20 01:45:00 +02:00
Jason 94ffdb8f77 Add Phase 0 Vivado-style lint to regression runner, update golden data
Adds two-layer lint pass (iverilog -Wall + custom static checks) that
catches part-select OOB errors and case-without-default warnings before
pushing to remote Vivado. Catches the exact Synth 8-524 class error that
broke Build 18 initial attempt. Lint errors abort regression; warnings
are advisory. Regenerated golden data for BRAM-migrated matched filter.
2026-03-19 21:19:07 +02:00
Jason 463ebef554 CIC comb pipeline registers, BUFG sim guard, system TB fix, regression runner
- cic_decimator_4x_enhanced.v: Add integrator_sampled_comb and
  data_valid_comb_pipe pipeline stages between integrator sampling and
  comb computation to break the critical path (matches remote 40cda0f)
- radar_system_top.v: Wrap 3 BUFG instances in ifdef SIMULATION guard
  with pass-through assigns for iverilog compatibility
- radar_system_tb.v: Convert generate_radar_echo function to task and
  move sin_lut declaration before task (iverilog declaration-order fix),
  add modular index clamping to prevent LUT out-of-bounds
- run_regression.sh: Automated regression runner for all 18 FPGA
  testbenches with --quick mode. Results: 17 pass, 1 pre-existing fail
- .gitignore: Exclude *.vvp, *.vcd simulation artifacts
2026-03-19 11:31:46 +02:00