mirror of
https://github.com/NawfalMotii79/PLFM_RADAR.git
synced 2026-06-08 22:47:16 +00:00
8b6f2ec8ec
Standalone diagnostic TB that drives a single chirp (SHORT/MEDIUM/LONG
selectable via +WAVE=N plusarg) through the production matched_filter
stack — chirp_reference_rom -> matched_filter_multi_segment ->
matched_filter_processing_chain (xfft_2048 + frequency_matched_filter)
— and logs every state transition of:
ms_state, ch_state, mem_request/mem_ready, segment_request,
current_segment, pc_valid, ms_status
Used to localise the LONG-chirp hang surfaced by tb_system_dataflow.
Findings (this run, iverilog SIMULATION fallback path):
  SHORT  (1 segment,  100 samples):  PASS, 168 k cycles, 2048 pc_valid.
  MEDIUM (1 segment,  500 samples):  PASS, 168 k cycles, 2048 pc_valid.
  LONG   (2 segments, 3000 samples):
    segment 0: COMPLETES. Chain 0->1..10->0, 2048 pc_valid pulses;
               ms_state walks ST_OUTPUT (6) -> ST_NEXT_SEGMENT (7) ->
               ST_OVERLAP_COPY (8) -> ST_COLLECT_DATA (1) with
               curr_seg = 1.
    segment 1: HANGS in ST_COLLECT_DATA forever.
Root cause (not a test artefact, real RTL gap):
matched_filter_multi_segment.v ST_COLLECT_DATA increments
chirp_samples_collected and buffer_write_ptr only when ddc_valid is
high in that state. After ST_OVERLAP_COPY copies the 128 tail samples
of segment 0 into buffer[0..127], the FSM re-enters ST_COLLECT_DATA
and waits for buffer_write_ptr to reach 2048 (or
chirp_samples_collected >= LONG_CHIRP_SAMPLES = 3000) — both gated
on fresh ddc_valid pulses.
But the LONG chirp's tail samples (2048..2999 of the 3000-sample
ramp) arrive before the ~30 us chirp ends, while ms_state is still
busy in ST_PROCESSING / ST_WAIT_FFT / ST_OUTPUT for segment 0. The
module has no side-channel ingestion, so those samples are dropped;
segment 1 never gets the data it needs and ST_COLLECT_DATA blocks
indefinitely.
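The hazard can be reproduced with a toy cycle-level model. This is only a sketch in Python, not the RTL: the state names are collapsed to COLLECT/PROCESS, and PROCESS_CYCLES, MAX_CYCLES, the run() helper and the one-sample-per-cycle input rate are illustrative assumptions. Zero-padding of short chirps is not modelled.

```python
# Toy model of the capture/processing hazard. Buffer size, overlap and
# chirp length come from this message; everything else is assumed.
BUFFER_SIZE = 2048
OVERLAP = 128
LONG_CHIRP_SAMPLES = 3000
PROCESS_CYCLES = 7000   # assumed cost of processing one segment
MAX_CYCLES = 50_000     # watchdog so the model always terminates


def run(chirp_samples):
    """Return (outcome, segments_done, samples_dropped)."""
    state = "COLLECT"
    write_ptr = 0
    collected = 0
    dropped = 0
    busy = 0
    segments_done = 0
    for cycle in range(MAX_CYCLES):
        # Assumed input: one valid sample per cycle, then silence.
        ddc_valid = cycle < chirp_samples
        if state == "COLLECT":
            # Counters advance only on ddc_valid in this state.
            if ddc_valid:
                write_ptr += 1
                collected += 1
            if write_ptr >= BUFFER_SIZE or collected >= chirp_samples:
                state = "PROCESS"
                busy = PROCESS_CYCLES
        else:  # PROCESS: no ingestion path, valid samples are lost
            if ddc_valid:
                dropped += 1
            busy -= 1
            if busy == 0:
                segments_done += 1
                if collected >= chirp_samples:
                    return ("done", segments_done, dropped)
                # Overlap copy refills the head; the rest must come
                # from fresh ddc_valid pulses that never arrive.
                write_ptr = OVERLAP
                state = "COLLECT"
    return ("hang", segments_done, dropped)


if __name__ == "__main__":
    print("SHORT:", run(100))
    print("LONG: ", run(LONG_CHIRP_SAMPLES))
```

Under these assumptions the LONG case finishes segment 0, loses the 952 tail samples, and then sits in COLLECT until the watchdog expires, which matches the behaviour logged by the TB.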
Even on production xfft_2048 timing (~2200 cycles per FFT pass,
~7 k cycles per chain pass), segment 0 processing (~70 us) outlasts
the 30 us chirp duration. The bug is structural, not iverilog-only.
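The timing argument can be checked with a few lines of arithmetic. The sketch below assumes a 100 MHz sample clock with one sample per cycle, which is inferred here from 3000 samples spanning a ~30 us chirp and is not stated explicitly above; cycles_to_us is a hypothetical helper.

```python
# Back-of-envelope check of the production timing claim.
# ASSUMPTION: 100 MHz clock, one DDC sample per cycle.
CLOCK_HZ = 100_000_000


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds."""
    return cycles * 1_000_000 / clock_hz


chirp_end_us = cycles_to_us(3000)      # last LONG sample arrives
seg0_full_us = cycles_to_us(2048)      # buffer full, processing starts
seg0_busy_us = cycles_to_us(7000)      # ~7 k cycles per chain pass
seg0_done_us = seg0_full_us + seg0_busy_us

# Segment 0 is still being processed when the chirp ends, so every
# tail sample falls inside the window where nothing is ingested.
tail_is_lost = seg0_done_us > chirp_end_us

if __name__ == "__main__":
    print(f"chirp ends at      {chirp_end_us:.2f} us")
    print(f"segment 0 busy     {seg0_full_us:.2f} .. {seg0_done_us:.2f} us")
    print(f"tail samples lost: {tail_is_lost}")
```

With these numbers, processing would have to finish in under roughly 10 us for any tail sample to be captured, so a faster FFT core alone does not close the gap.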
PR-J.2 will fix this. Three candidate approaches, in order of
implementation cost:
  C) Defer segment processing until chirp is fully collected. Small
     FSM tweak; adds latency.
  A) Extend the input BRAM to 4096 entries to hold the full LONG
     chirp; segments slide over a stable buffer post-collection. ~1
     extra BRAM, simplest data-flow.
  B) Parallel ingestion FSM + ping-pong buffer that decouples capture
     from processing. Keeps segment 0 latency optimal but is the most
     RTL surface change.
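For approach A, the segment windows over a fully collected chirp can be sketched as below. This is an illustration, not the planned RTL: segment_windows is a hypothetical helper, and the 1920-sample stride is derived from the 2048-entry segment and 128-sample overlap mentioned above. Zero-padding of the final partial segment is not shown.

```python
# Sketch of approach A: collect the whole chirp first, then slide
# overlapping segment windows over the stable buffer.
def segment_windows(total_samples, buffer_size=2048, overlap=128):
    """Return [(start, end), ...] sample ranges, end exclusive."""
    stride = buffer_size - overlap   # 1920 new samples per segment
    windows = []
    start = 0
    while True:
        end = min(start + buffer_size, total_samples)
        windows.append((start, end))
        if end >= total_samples:
            return windows
        start += stride


if __name__ == "__main__":
    for name, n in (("SHORT", 100), ("MEDIUM", 500), ("LONG", 3000)):
        print(name, segment_windows(n))
```

For the LONG chirp this yields two windows, with segment 1 starting at sample 1920 so that its first 128 entries repeat the tail of segment 0. All 3000 samples fit in the proposed 4096-entry buffer.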
This TB stays out of run_regression.sh until PR-J.2 lands the fix —
LONG would deterministically FAIL today.