The header had two claims that "valid samples arrive every ~4 cycles" at
the FIR boundary. That is false in the production wiring: the CIC `_4x`
decimator turns clk_400m into a 100 M-pulse-per-second stream, then
cdc_adc_to_processing crosses that into clk_100m where dst_valid asserts
every cycle in steady state. The 4:1 ratio applies between the two clock
domains, not as further sub-sampling inside clk_100m.
This matters because the 32-tap coefficients were designed for the
25 MSPS rate the wrong comment described, but the FIR is actually being
driven at 100 MSPS. The cutoff sits 4x higher than intended; existing
tests pass because the 36-bit accumulator silently wraps on large
sustained inputs (see RX-NEW-3 in the project ledger).
Comment-only commit. No RTL behaviour change. Any future DSP-saving
rework — symmetric pre-adder, 4:1 fold, Xilinx FIR Compiler — needs a
designer call on whether to redesign coefficients for 100 MSPS, add a
real decimation stage to hit 25 MSPS, or keep the current accidental
behaviour.
Add (* USE_DSP = "no" *) attribute to FIR lowpass adder tree registers
(add_l1, add_l2, add_l3, accumulator_reg) to prevent Vivado from
inferring DSP48E1 slices for pure addition operations.
Each fir_lowpass_parallel_enhanced instance was using 47 DSPs (32 for
multiply + 15 for the adder tree). The 15 adder-tree DSPs per instance
(30 total for I/Q pair) performed only PCIN+A:B additions with no
multiplier usage. On the XC7A50T with only 120 DSP48E1 slices, this
caused 100% DSP utilization and forced FFT butterfly complex multipliers
to spill into 18-level fabric carry chains (WNS=-1.103ns).
Moving these 36-bit additions to fabric CARRY4 chains (~9 CARRY4 per
add, ~2ns propagation) is well within the 10ns clock period and frees
~30 DSPs for the FFT engine to use native DSP48E1 multipliers.
Regression: 23/23 FPGA tests PASS (attribute is synthesis-only).
P0-1: nco_400m_enhanced.v — DSP48E1 OPMODE corrected from PCIN to P
feedback (was routing stale cascade data into accumulator)
P0-2: radar_receiver_final.v — removed same-clock CDC that corrupted
ADC data path between ad9484_interface and DDC
P1-5: fir_lowpass.v — fixed zero replication count in coefficient
symmetric extension ({0{1'b0}} is empty, now uses explicit 0)
Also updates .gitignore to exclude debug/scratch artifacts.
All 30+ testbenches pass (unit, co-sim, integration).