fix(fpga): PR-Z A6 — usb cfar dense bug end-to-end fix + e2e test

The PR-Z A6 e2e test (tb_e2e_dsp_to_host) exposed that the wire-format
cfar_dense map emitted by usb_data_interface_ft2232h was all-zero for
our deterministic single-target stimulus, even though cfar_ca's
in-flight outputs showed CONFIRMED at the expected cells (verified via
in-TB capture, E5/E6 PASS).

Deep instrumented debug (BRAM-WRITE, BRAM-READ, EGRESS-CAP probes)
revealed THREE independent bugs that combined to produce the all-zero
wire output. Each bug alone would have been visible; the way they
compounded made the symptom look like a single coarse failure.

Bug A — stale write address (radar_system_top.v):

  usb_inst.range_bin_in/doppler_bin_in were tied to notched_*_bin
  (= rx_*_bin = doppler_processor outputs). After doppler returns to
  S_IDLE its `output reg`s hold their last-driven values (511, 47).
  cfar_ca's CMP-phase emit (cycles ~520..73520 after frame_complete)
  fires cfar_valid with detect_range/detect_doppler set to its own
  per-cell scan counters, but those outputs were left unconnected, so
  usb's RMW saw doppler's stale (511, 47) and sent every cfar write
  to byte_addr {511, 47[5:2]} = bram[8187], past the 6144-byte wire
  range entirely.

  Fix: register cfar_detect_range/doppler in lockstep with the existing
  rx_detect_valid/rx_detect_class registration block (clk_100m_buf
  domain), then mux them into usb_inst.range_bin_in/doppler_bin_in on
  rx_detect_valid. doppler-magnitude write path is unaffected because
  doppler_valid and rx_detect_valid are mutually exclusive (BUFFER vs
  CMP phases of cfar_ca).
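
  The address arithmetic and the post-fix mux can be checked with a
  short Python sketch. It assumes the RMW byte address is the
  concatenation {range_bin[8:0], doppler_bin[5:2]}, which is what the
  bram[8187] figure implies; the function names are illustrative and
  do not appear in the RTL.

```python
def detect_byte_addr(range_bin: int, doppler_bin: int) -> int:
    """Byte address {range_bin[8:0], doppler_bin[5:2]} (assumed 9+4-bit layout)."""
    return ((range_bin & 0x1FF) << 4) | ((doppler_bin >> 2) & 0xF)


def usb_bin_inputs(rx_detect_valid: bool, cfar_bins, doppler_bins):
    """Post-fix mux: the registered cfar bins are selected while rx_detect_valid is high."""
    return cfar_bins if rx_detect_valid else doppler_bins


STALE = (511, 47)   # values doppler_processor holds after returning to S_IDLE
TARGET = (67, 2)    # one lit cell from the test stimulus

# Pre-fix: every cfar write landed on the stale address, beyond the wire range.
stale_addr = detect_byte_addr(*STALE)
print(stale_addr, stale_addr >= 6144)

# Post-fix: cfar's own scan counters reach the USB block during CMP.
print(usb_bin_inputs(True, TARGET, STALE))
# Outside CMP the doppler-magnitude path still sees doppler's bins.
print(usb_bin_inputs(False, TARGET, STALE))
```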

Bug B — BRAM read pipeline lag (usb_data_interface_ft2232h.v):

  The detect_rd_data <= detect_bram[detect_rd_addr] BRAM read port has
  1-cycle latency. WR_DETECT_DATA's emit FSM advanced detect_rd_addr
  and read detect_rd_data in the SAME edge — so cycle K read bram[K-2]
  (the addr from cycle K-1's commit) instead of bram[K-1]. Result:
  every cfar wire byte = bram[N-1] instead of bram[N], shifting the
  entire 6144-byte detect section +1 byte = +4 doppler bins. Doppler
  hides this naturally because its 2-byte-per-cell rhythm gives BRAM a
  free settling cycle between addr-set and emit-read.

  Fix: pre-load detect_rd_addr <= 1 and det_doppler_byte_idx <= 1 at
  every WR_DETECT_DATA entry transition (HDR direct, RANGE direct,
  DOPPLER → DETECT). BRAM produces bram[0] for the first emit cycle
  (settled since reset because detect_rd_addr was 0 throughout the
  preceding section) while the addr advance schedules bram[1] for the
  second emit cycle — and from then on the FSM's natural advance
  pattern keeps the pipeline aligned, including across the per-range
  boundary (det_doppler_byte_idx == DET_BYTE_LAST_PER_RANGE).
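
  A simplified behavioral model of the read port shows why the
  pre-load realigns the stream. This is a sketch under assumptions:
  one loop iteration stands for one clock edge, the read register is
  taken to hold bram[0] on entry, and the per-range boundary handling
  is not modeled. emit_section and its parameters are hypothetical
  names.

```python
def emit_section(mem, n_bytes, preload_addr):
    """Emit n_bytes through a BRAM read port with 1-cycle latency.

    On each edge the FSM emits the value already in the read register,
    the BRAM latches mem[rd_addr] into that register, and rd_addr advances.
    """
    rd_data = mem[0]         # settled: the address sat at 0 before entry
    rd_addr = preload_addr   # 0 models the pre-fix entry, 1 the post-fix pre-load
    wire = []
    for _ in range(n_bytes):
        wire.append(rd_data)                # emit the pre-edge register value
        rd_data = mem[rd_addr % len(mem)]   # data for the NEXT emit cycle
        rd_addr += 1
    return wire


mem = list(range(0x10, 0x18))
buggy = emit_section(mem, 6, preload_addr=0)
fixed = emit_section(mem, 6, preload_addr=1)

print([hex(b) for b in buggy])   # first byte repeats, the rest lag by one
print([hex(b) for b in fixed])   # matches mem[0:6]
```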

Bug C — detect_clearing window overlaps cfar's first 4 columns:

  detect_clearing fired 1 cycle after frame_complete and ran for 8192
  clk cycles (1 byte/cycle). cfar_valid writes were gated on
  `!detect_clearing` (line 512). cfar's CMP-phase emits start at
  frame_complete + ~520 cycles and run for ~73000 cycles, so the
  first ~7672 cycles (≈ 4 doppler columns) of cfar pulses were
  silently dropped. Test stimulus lit (67, 2/3) for sub-frame 0, all
  inside the clearing window → bytes lost. (67, 18/19) and (67, 34/35)
  for SF1/SF2 fell after clearing → captured correctly. Visible as
  one-byte mismatch (0x0A expected, 0x00 captured) at offset 49965
  (= cfar byte 804 = range 67, doppler 0..3) once Bugs A and B were
  fixed.
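
  The window overlap and the byte offset can be reproduced with a few
  lines of Python. The sketch uses the approximate cycle counts quoted
  above, ignores the 1-cycle trigger delay, and assumes 512 range bins
  in the 6144-byte section (12 bytes per range, 4 doppler cells per
  byte); the helper names are illustrative.

```python
def dropped_cycles(clear_start, clear_len, cfar_start, cfar_len):
    """Cycles during which cfar emits while detect_clearing is still active."""
    lo = max(clear_start, cfar_start)
    hi = min(clear_start + clear_len, cfar_start + cfar_len)
    return max(0, hi - lo)


def cfar_wire_byte(range_bin, doppler_bin, bytes_per_range=6144 // 512):
    """Byte index inside the detect section (assumed 12 bytes per range)."""
    return range_bin * bytes_per_range + doppler_bin // 4


# Pre-fix: clearing starts at frame_complete and overlaps the start of CMP.
print(dropped_cycles(0, 8192, 520, 73_000))

# The sub-frame 0 target cells (67, 2) and (67, 3) share one wire byte.
print(cfar_wire_byte(67, 2), cfar_wire_byte(67, 3))
```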

  Fix: move detect_clearing trigger from "1 cycle after frame_complete"
  to wr_done_pulse (USB-transfer-complete edge already CDC'd into clk
  via the AUDIT-C12 wr_done_sync chain). Clearing now runs in the dead
  zone after USB has finished reading frame N's BRAM, well before
  frame N+1's cfar starts CMP (~480k cycles of margin at 178 fps).
  First frame after reset relies on BRAM init=0 — added explicit
  initial block under `ifdef SIMULATION so iverilog matches Vivado's
  synthesis default.
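
  A rough cycle budget behind the ~480k figure, as a sketch only: it
  assumes a 100 MHz clock (inferred from the clk_100m_buf name) and
  ignores the USB transfer time, which is not stated here.
  clearing_margin is a hypothetical helper.

```python
def clearing_margin(clk_hz, fps, cmp_cycles, clear_cycles):
    """Idle cycles left per frame after the CMP phase and one clearing pass."""
    frame_cycles = clk_hz // fps
    return frame_cycles - cmp_cycles - clear_cycles


CLK_HZ = 100_000_000   # assumption, see lead-in
margin = clearing_margin(CLK_HZ, fps=178, cmp_cycles=73_000, clear_cycles=8_192)
print(margin)          # on the order of 480k cycles
```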

Test infrastructure:

  - tb/tb_e2e_dsp_to_host.v new — deterministic single-target stimulus
    fed through the back-half of the radar pipeline (range_decim → MTI
    → doppler → DC-notch → cfar → registered sync → usb), 16 in-TB
    asserts + bit-exact byte capture.
  - tb/cosim/gen_e2e_stimulus.py / gen_e2e_expected.py new — Python
    deterministic stim + bit-exact frame golden.
  - tb/cosim/tb_e2e_dsp_to_host_parse.py new — parses captured frame
    via radar_protocol, runs 12 strict-bit-equality checks plus 16
    semantic checks (target == CONFIRMED, neighbors == NONE,
    DC-notched bins == NONE, etc).
  - run_regression.sh — A6 hookup + retired the two zero-assertion
    radar_system_tb USB_MODE=0/1 smoke runs and the 3-liveness-only
    tb_system_dataflow (subsumed by A6's stronger checks). Saves
    ~7 min wall.
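
  The pass criterion used by the A6 hookup in run_regression.sh (exit
  code 0, no [FAIL] lines, at least one [OVERALL PASS] line) can be
  mirrored in Python as below. This is a sketch of the same gating
  logic, not code from the repository; the sample log text is made up
  for illustration.

```python
import re

# Same line patterns the shell hookup passes to grep -Ec.
PASS_RE = re.compile(r"^[ \t]*\[PASS( [0-9]+)?\]", re.MULTILINE)
FAIL_RE = re.compile(r"^[ \t]*\[FAIL( [0-9]+)?\]", re.MULTILINE)
OVERALL_RE = re.compile(r"^\[OVERALL PASS\]", re.MULTILINE)


def count(pattern, text):
    """Count matching lines, like grep -c with an anchored pattern."""
    return sum(1 for _ in pattern.finditer(text))


def a6_verdict(log_text, rc):
    """Return True only if rc is 0, no FAIL lines exist, and the parser passed."""
    return (
        rc == 0
        and count(FAIL_RE, log_text) == 0
        and count(OVERALL_RE, log_text) >= 1
    )


sample_log = "  [PASS 1] frame header\n[PASS] egress length\n[OVERALL PASS]\n"
print(count(PASS_RE, sample_log), a6_verdict(sample_log, rc=0))
```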

Verification:

  - Local iverilog: in-TB 16/16 PASS, parser strict 28/28 PASS.
  - Remote Vivado 2025.2 xsim (Artix-7 target): in-TB 16/16 PASS,
    parser strict 28/28 PASS.
  - Full regression: 41 / 0 / 0.

The MODEL_USB_CFAR_BUG bug-model flag (used to keep the regression
green during development against buggy production) is removed — the
test is now strict bit-exact against the post-fix wire format.
Jason
2026-05-06 01:20:19 +05:45
parent ce869e9e20
commit 9c231d85db
10 changed files with 1774 additions and 984 deletions
+45 -18
@@ -616,10 +616,45 @@ if [[ "$QUICK" -eq 0 ]]; then
 tb/tb_rx_final_reg.vvp \
 tb/tb_radar_receiver_final.v "${RECEIVER_RTL[@]}"
-# Full system top (monitoring-only, legacy)
-run_test "System Top (radar_system_tb)" \
-tb/tb_system_reg.vvp \
-tb/radar_system_tb.v "${SYSTEM_RTL[@]}"
+# A6 end-to-end DSP -> host test (PR-Z). Replaces the two zero-assertion
+# `radar_system_tb` smoke runs (USB_MODE=0 + USB_MODE=1) that this PR
+# supersedes. Four stages:
+# 1. gen_e2e_stimulus.py - deterministic single-target stimulus
+# 2. gen_e2e_expected.py - bit-exact Python golden (fpga_model)
+# 3. tb_e2e_dsp_to_host.v - production-faithful chain
+# (range_decim -> mti -> doppler -> dc_notch
+# -> cfar -> sync -> usb_data_interface_ft2232h)
+# 4. tb_e2e_dsp_to_host_parse.py - radar_protocol round-trip + section asserts
+printf " %-45s " "E2E DSP-to-Host (PR-Z A6)"
+set +e
+a6_log=/tmp/a6_e2e_$$.log
+{
+python3 tb/cosim/gen_e2e_stimulus.py && \
+python3 tb/cosim/gen_e2e_expected.py && \
+iverilog -g2001 -DSIMULATION -o tb/tb_e2e_dsp_to_host.vvp \
+tb/tb_e2e_dsp_to_host.v mti_canceller.v doppler_processor.v \
+xfft_16.v fft_engine.v cfar_ca.v usb_data_interface_ft2232h.v \
+edge_detector.v && \
+timeout 300 vvp tb/tb_e2e_dsp_to_host.vvp && \
+python3 tb/cosim/tb_e2e_dsp_to_host_parse.py
+} > "$a6_log" 2>&1
+a6_rc=$?
+set -e
+rm -f tb/tb_e2e_dsp_to_host.vvp
+a6_tb_pass=$(grep -Ec '^[[:space:]]*\[PASS( [0-9]+)?\]' "$a6_log" || true)
+a6_tb_fail=$(grep -Ec '^[[:space:]]*\[FAIL( [0-9]+)?\]' "$a6_log" || true)
+a6_parse_overall_pass=$(grep -Ec '^\[OVERALL PASS\]' "$a6_log" || true)
+if [[ "$a6_rc" -eq 0 && "$a6_tb_fail" -eq 0 && "$a6_parse_overall_pass" -ge 1 ]]; then
+echo -e "${GREEN}PASS${NC} (TB pass=$a6_tb_pass + parse OVERALL PASS)"
+PASS=$((PASS + 1))
+else
+echo -e "${RED}FAIL${NC} (rc=$a6_rc, TB pass=$a6_tb_pass fail=$a6_tb_fail, parse=$a6_parse_overall_pass)"
+ERRORS="$ERRORS\n E2E DSP-to-Host: rc=$a6_rc"
+echo " ---- A6 last 30 lines of log ----"
+tail -30 "$a6_log" | sed 's/^/ /'
+FAIL=$((FAIL + 1))
+fi
+rm -f "$a6_log"
 # PR-I subsuites (replace tb_system_e2e). Each TB instantiates
 # radar_system_top with USB_MODE=1 (production FT2232H path) and
@@ -627,8 +662,10 @@ if [[ "$QUICK" -eq 0 ]]; then
 # cover all at once:
 # tb_system_opcodes - opcode dispatch via FT2232H send_cmd (fast)
 # tb_system_mechanics - reset/RF/safety/CDC mechanics (fast)
-# tb_system_dataflow - shallow TX + range-pipeline integration
-# (slow; 18 ms sim, ~430-450 s wall on this host).
+# Note: tb_system_dataflow was retired in PR-Z — its 3 liveness-only
+# asserts (chirp_frames>0, range_valid>0, range_valid>=100) are now
+# dominated by A6's stronger in-TB checks (egress-byte exact, doppler
+# bit-exact vs Python golden, cfar class). ~7 min wall reclaimed.
 run_test "System Opcodes (tb_system_opcodes)" \
 tb/tb_system_opcodes_reg.vvp \
 tb/tb_system_opcodes.v "${SYSTEM_RTL[@]}"
@@ -636,19 +673,9 @@ if [[ "$QUICK" -eq 0 ]]; then
 run_test "System Mechanics (tb_system_mechanics)" \
 tb/tb_system_mechanics_reg.vvp \
 tb/tb_system_mechanics.v "${SYSTEM_RTL[@]}"
-run_test --timeout=600 "System Dataflow (tb_system_dataflow)" \
-tb/tb_system_dataflow_reg.vvp \
-tb/tb_system_dataflow.v "${SYSTEM_RTL[@]}"
-# USB_MODE=1 system top — different TB, kept as a structural smoke test.
-run_test "System Top USB_MODE=1 (FT2232H)" \
-tb/tb_system_ft2232h_reg.vvp \
--DUSB_MODE_1 \
-tb/radar_system_tb.v "${SYSTEM_RTL[@]}"
 else
-echo " (skipped receiver integration + system top + opcodes/mechanics/dataflow + USB_MODE=1 — use without --quick)"
-SKIP=$((SKIP + 6))
+echo " (skipped receiver integration + e2e dsp-to-host + opcodes/mechanics — use without --quick)"
+SKIP=$((SKIP + 4))
 fi
 echo ""