fix(fpga): PR-O.8 — cfg_tdata 24->16 for Pipelined Streaming I/O

PR-O in 8541443 packed cfg_tdata using PG109 Burst I/O semantics (22-bit
SCALE_SCH, 24-bit total). The xfft_2048 IP we instantiate is Pipelined
Streaming I/O; for that arch, with NFFT_MAX = log2(2048) = 11, SCALE_SCH
width = 2*ceil(NFFT_MAX/2) = 12 bits and cfg_tdata = 16 bits. The mismatch
surfaced when the Vivado-regenerated .xci reported
C_S_AXIS_CONFIG_TDATA_WIDTH=16. This commit realigns the wrappers and TBs.
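
The width arithmetic above can be checked with a small standalone sketch.
The module and localparam names are illustrative (not from the repo), and
the round-up-to-a-byte padding rule is an assumption inferred from the
24-bit and 16-bit totals quoted in this message.

```verilog
// Sketch only: names are hypothetical, not taken from the repo.
module cfg_width_sketch;
  localparam integer NFFT_MAX = 11;  // log2(2048)

  // Old layout: one 2-bit field per radix-2 stage.
  localparam integer OLD_SCH_W = 2 * NFFT_MAX;
  // New layout: one 2-bit field per pair of stages, 2 * ceil(NFFT_MAX / 2).
  localparam integer NEW_SCH_W = 2 * ((NFFT_MAX + 1) / 2);

  // Add the FWD/INV bit, then round up to a whole number of bytes (assumed).
  localparam integer OLD_CFG_W = 8 * ((1 + OLD_SCH_W + 7) / 8);
  localparam integer NEW_CFG_W = 8 * ((1 + NEW_SCH_W + 7) / 8);

  initial begin
    $display("old: SCALE_SCH=%0d cfg_tdata=%0d", OLD_SCH_W, OLD_CFG_W);
    $display("new: SCALE_SCH=%0d cfg_tdata=%0d", NEW_SCH_W, NEW_CFG_W);
    if (OLD_SCH_W != 22 || OLD_CFG_W != 24 ||
        NEW_SCH_W != 12 || NEW_CFG_W != 16)
      $display("FAIL: widths do not match the commit message");
    else
      $display("PASS");
    $finish;
  end
endmodule
```

Running this under iverilog should reproduce the 24 -> 16 change in the
subject line.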

Total /N scaling preserved: 22'h155555 (/N as 11 stages of >>1) becomes
12'hAA9 (stage 1 alone >>1 + stages 2-11 grouped as 5 pairs of >>2 each).
Iverilog fft_engine.v fallback unchanged — applies fixed >>>1 per stage.
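
A minimal self-checking sketch of the schedule arithmetic, runnable under
iverilog. All names are hypothetical. The field order (lone stage in the two
LSBs) and the {padding, SCALE_SCH, FWD/INV} packing are taken from this
commit's description, not verified here; confirm both against PG109 for the
target architecture.

```verilog
// Sketch only: names are hypothetical, not taken from the repo.
module scale_sch_sketch;
  localparam [21:0] OLD_SCH = 22'h155555;  // eleven 2-bit fields of 2'b01
  localparam [11:0] NEW_SCH = 12'hAA9;     // 2'b01, then five fields of 2'b10

  // 16-bit cfg_tdata as described in this commit (assumed layout):
  // bits[15:13] padding, bits[12:1] SCALE_SCH, bit 0 FWD/INV.
  localparam [15:0] CFG_FWD = {3'b000, NEW_SCH, 1'b1};
  localparam [15:0] CFG_INV = {3'b000, NEW_SCH, 1'b0};

  // Sum the right-shift amounts encoded in the low n_fields 2-bit fields.
  function integer total_shift;
    input [21:0] sch;
    input integer n_fields;
    integer i;
    begin
      total_shift = 0;
      for (i = 0; i < n_fields; i = i + 1)
        total_shift = total_shift + sch[2*i +: 2];
    end
  endfunction

  initial begin
    $display("old total shift = %0d", total_shift(OLD_SCH, 11));
    $display("new total shift = %0d", total_shift({10'b0, NEW_SCH}, 6));
    $display("cfg fwd = 16'h%h, cfg inv = 16'h%h", CFG_FWD, CFG_INV);
    if (total_shift(OLD_SCH, 11) != 11 ||
        total_shift({10'b0, NEW_SCH}, 6) != 11)
      $display("FAIL: schedules are not /N-equivalent");
    else
      $display("PASS: both schedules shift by 11 bits in total (/2048)");
    $finish;
  end
endmodule
```

Both schedules sum to an 11-bit shift, which is what "total /N scaling
preserved" means here; the sketch says nothing about per-stage overflow
behaviour, which differs between the two groupings.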

Verified: tb_fft_engine_axi_bridge 4/4, tb_matched_filter_processing_chain
40/40. Vivado .dcp / .veo regenerated from .xci; gitignored as usual.
Jason
2026-05-02 10:08:00 +05:45
parent 8541443c64
commit af64b0952e
6 changed files with 62 additions and 69 deletions
+26 -11
@@ -95,18 +95,33 @@
 // chain output (FFT·conj(FFT)·IFFT) is /N², predictable and per-frame
 // constant, so CFAR alpha calibrated in iverilog matches silicon counts.
 //
-// cfg_tdata layout per PG109 (1 channel, no CP, fixed NFFT, scaled):
-// bit 0 = FWD/INV (1 = forward, 0 = inverse)
-// bits[22:1] = SCALE_SCH (22 bits)
-// bit 23 = byte-align padding (0)
-// Total cfg_tdata width = 24 bits.
+// cfg_tdata layout per PG109 (1 channel, no CP, fixed NFFT, scaled,
+// Pipelined Streaming I/O architecture). The IP groups radix-2 stages
+// into radix-4-style pairs for scheduling — each 2-bit field covers a
+// pair of stages, so SCALE_SCH width = 2 * ceil(NFFT_MAX/2) = 12 bits
+// for NFFT_MAX=11. (PR-O.2 originally used the 22-bit Burst-I/O
+// layout — wrong for our Pipelined Streaming arch; corrected in
+// PR-O.8 commit after Vivado IP regen reported cfg_tdata=16.)
 //
-// The same schedule is replicated in fft_engine.v (iverilog fallback) by
-// applying convergent-rounding >>>1 at every BF_WRITE stage so absolute
-// counts agree between sim and silicon.
-`define RP_FFT_CFG_TDATA_W 24
-`define RP_FFT_SCALE_SCH_W 22
-`define RP_FFT_SCALE_SCH 22'h155555 // [01,01,01,01,01,01,01,01,01,01,01]
+// bit 0 = FWD/INV (1 = forward, 0 = inverse)
+// bits[12:1] = SCALE_SCH (12 bits, LSB = stage 1 alone, then 5 pairs)
+// bits[15:13] = byte-align padding (0)
+// Total cfg_tdata width = 16 bits.
+//
+// SCALE_SCH = 12'hAA9 = 12'b10_10_10_10_10_01:
+// stage 1 alone bits[1:0] = 2'b01 → >>1
+// stages 2..3 bits[3:2] = 2'b10 → >>2 (/4 across pair)
+// stages 4..5 bits[5:4] = 2'b10
+// stages 6..7 bits[7:6] = 2'b10
+// stages 8..9 bits[9:8] = 2'b10
+// stages 10..11 bits[11:10] = 2'b10
+// Total shift = 1 + 5*2 = 11 = /N. The iverilog fft_engine.v fallback
+// applies >>>1 at every BF_WRITE (= /N total too) so absolute output
+// magnitudes match between sim and silicon for any /N-equivalent
+// schedule.
+`define RP_FFT_CFG_TDATA_W 16
+`define RP_FFT_SCALE_SCH_W 12
+`define RP_FFT_SCALE_SCH 12'hAA9
 // 3-ladder waveform identity (replaces 1-bit use_long_chirp rail in PR-C onward)
 // `define RP_WAVE_<NAME> values are 2-bit waveform selectors carried on