The coefficient ROM has a deliberate positive DC pre-emphasis. The 32
signed coefficients sum to 231,944; with the output slice at
accumulator[34:17] (effective Q17), DC gain = 231944 / 2^17 = 1.7696,
i.e. +4.96 dB. This is bit-exact against the in-header golden-model
line (DC=5000 → 8847).
The +4.96 dB pre-emphasis compensates the upstream 4-stage CIC's
~3-4 dB passband droop. Without this note in the header, a future
engineer rebuilding the filter from a clean FIR design tool would
silently lose the pre-emphasis; AGC/saturation budgets in downstream
stages must also account for the +4.96 dB rather than assume 0 dB.
The audit's original "+7 dB" estimate was directionally correct but
quantitatively wrong: no Q-format reconciles to +7 dB (Q15 → +17 dB,
Q16 → +11 dB, Q17 → +4.96 dB). The header now documents the verified
+4.96 dB.
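
The gain figures above can be re-derived with a few lines of Python. This is
a minimal sketch, not the project's golden model: the function names are
illustrative, and it assumes the output slice behaves like an arithmetic
right shift by the number of fractional bits (true only at non-saturating
levels).

```python
import math

COEFF_SUM = 231_944  # sum of the 32 signed coefficients, as quoted in this note


def dc_gain_db(frac_bits: int) -> float:
    """DC gain in dB if the output is read with `frac_bits` fractional bits."""
    return 20.0 * math.log10(COEFF_SUM / (1 << frac_bits))


def dc_output(dc_in: int, frac_bits: int = 17) -> int:
    """Steady-state output for a constant input (assumes no saturation)."""
    return (dc_in * COEFF_SUM) >> frac_bits


if __name__ == "__main__":
    for q in (15, 16, 17):
        print(f"Q{q}: {dc_gain_db(q):+.2f} dB")
    print("DC=5000 ->", dc_output(5000))
```

Only the Q17 reading reproduces both the +4.96 dB figure and the
DC=5000 → 8847 golden-model line.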
No coefficient or RTL change. Verified: full FPGA regression
41/41 PASS, 0 lint errors (FIR Lowpass: 13 checks PASS).

Re-group the 32-tap symmetric lowpass into 16 (D+A)*B operations using
the DSP48E1 pre-adder, exploiting coeff[k] == coeff[31-k]. DSP usage on
the production part (XC7A50T) drops from 112/120 (93.3%) to 80/120
(66.7%), freeing the budget needed for the matched-filter FFT swap
(RX-NEW-3).
Bit-exact contract preserved at non-saturating signal levels: DC=5000
→ 8847 and 45 MHz tone → ±16 LSB match the unfolded design and the
Python golden model. Throughput unchanged (1 sample/cycle, 100 MSPS);
latency +2 cycles for the pre-adder stage.
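
The folding identity can be checked outside the RTL. The sketch below is
illustrative only: the tap values are made-up symmetric integers, not the
ROM contents, and the function names are hypothetical. It shows that
pre-adding mirrored samples and multiplying once gives the same integer
result as the direct 32-multiplier sum.

```python
def fir_direct(window, coeffs):
    """One output sample as a plain sum of N products (N multipliers)."""
    return sum(x * c for x, c in zip(window, coeffs))


def fir_folded(window, coeffs):
    """Same output with N/2 products: pre-add mirrored samples, multiply once."""
    n = len(coeffs)
    half = n // 2
    # Folding is only valid for an even-length, symmetric coefficient set.
    assert n % 2 == 0
    assert all(coeffs[k] == coeffs[n - 1 - k] for k in range(half))
    return sum((window[k] + window[n - 1 - k]) * coeffs[k] for k in range(half))


# Hypothetical symmetric taps for illustration only (NOT the ROM contents).
HALF = [-12, 7, 31, -44, -90, 15, 210, 388,
        -95, -610, -402, 1290, 3105, 5870, 9011, 11240]
COEFFS = HALF + HALF[::-1]  # 32 taps with COEFFS[k] == COEFFS[31 - k]

# Arbitrary signed sample window, one value per tap.
WINDOW = [((k * 7919) % 4001) - 2000 for k in range(32)]

if __name__ == "__main__":
    print(fir_direct(WINDOW, COEFFS) == fir_folded(WINDOW, COEFFS))
```

Because the arithmetic is exact in integers, the two forms agree for every
input; any mismatch in hardware would come from width or saturation
handling, not from the regrouping itself.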
Saturation thresholds rebuilt via bit concatenation to dodge the
Verilog 32-bit-literal trap (1 <<< 34 silently wraps to 0, which
made the earlier symmetric draft assert positive saturation on all
non-negative accumulator values).
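
The failure mode can be modelled in Python. This is a sketch under stated
assumptions: shl_32bit models a shift evaluated at 32-bit width (as
described above for the unsized literal), and the `acc >= threshold`
comparison is an assumed form of the saturation check, not a copy of the
RTL.

```python
MASK32 = (1 << 32) - 1


def shl_32bit(value: int, shift: int) -> int:
    """Left shift evaluated at 32-bit width: bits pushed past bit 31 are lost."""
    return (value << shift) & MASK32


def saturates_positive(acc: int, threshold: int) -> bool:
    """Assumed form of the positive-saturation check (illustrative only)."""
    return acc >= threshold


BAD_THRESHOLD = shl_32bit(1, 34)  # wraps to 0 at 32-bit width
GOOD_THRESHOLD = 1 << 34          # full-width value, as a concatenation would give

if __name__ == "__main__":
    for acc in (0, 5000, (1 << 34) - 1, 1 << 34):
        print(acc,
              saturates_positive(acc, BAD_THRESHOLD),
              saturates_positive(acc, GOOD_THRESHOLD))
```

With the wrapped threshold every non-negative accumulator value is flagged;
with the full-width threshold only values at or above 2^34 are.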
Local regression: 32/34 PASS, same as baseline. The two failures
(Receiver Integration, Matched Filter Chain) are the pre-existing
RX-NEW-3 issue (FFT throughput) and are unaffected by this change.

The header claimed in two places that "valid samples arrive every ~4
cycles" at the FIR boundary. That is false in the production wiring: the
CIC `_4x` decimator turns the clk_400m stream into 100 M valid pulses per
second, then cdc_adc_to_processing crosses that into clk_100m, where
dst_valid asserts every cycle in steady state. The 4:1 ratio applies
between the two clock domains, not as further sub-sampling inside
clk_100m.
This matters because the 32-tap coefficients were designed for the
25 MSPS rate the wrong comment described, but the FIR is actually being
driven at 100 MSPS. The cutoff sits 4x higher than intended; existing
tests pass because the 36-bit accumulator silently wraps on large
sustained inputs (see RX-NEW-3 in the project ledger).
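
Why the cutoff moves: a fixed coefficient set fixes the cutoff only as a
fraction of the sample rate, so the cutoff in Hz scales with whatever rate
the filter is clocked at. The sketch below uses a hypothetical normalized
cutoff (0.2 cycles/sample); the real value is not stated in this note.

```python
def cutoff_hz(normalized_cutoff: float, sample_rate_hz: float) -> float:
    """Cutoff in Hz for a FIR whose taps fix the cutoff in cycles/sample."""
    return normalized_cutoff * sample_rate_hz


FC_NORM = 0.2  # hypothetical normalized cutoff, for illustration only

designed = cutoff_hz(FC_NORM, 25e6)   # rate the coefficients were designed for
actual = cutoff_hz(FC_NORM, 100e6)    # rate the FIR is actually driven at

if __name__ == "__main__":
    print(f"designed: {designed / 1e6:.1f} MHz")
    print(f"actual:   {actual / 1e6:.1f} MHz")
    print(f"ratio:    {actual / designed:.1f}x")
```

The ratio is 4x regardless of the normalized cutoff chosen, which is the
point made above.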
Comment-only commit. No RTL behaviour change. Any future DSP-saving
rework (symmetric pre-adder, 4:1 fold, Xilinx FIR Compiler) needs a
designer call on whether to redesign the coefficients for 100 MSPS, add
a real decimation stage to hit 25 MSPS, or keep the current accidental
behaviour.

Add the (* USE_DSP = "no" *) attribute to the FIR lowpass adder-tree
registers (add_l1, add_l2, add_l3, accumulator_reg) to prevent Vivado
from inferring DSP48E1 slices for pure addition operations.
Each fir_lowpass_parallel_enhanced instance was using 47 DSPs (32 for
multiply + 15 for the adder tree). The 15 adder-tree DSPs per instance
(30 total for I/Q pair) performed only PCIN+A:B additions with no
multiplier usage. On the XC7A50T with only 120 DSP48E1 slices, this
caused 100% DSP utilization and forced FFT butterfly complex multipliers
to spill into 18-level fabric carry chains (WNS=-1.103ns).
Moving these 36-bit additions to fabric CARRY4 chains (~9 CARRY4 per
add, ~2ns propagation) is well within the 10ns clock period and frees
~30 DSPs for the FFT engine to use native DSP48E1 multipliers.
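
The budget arithmetic above, as a small Python sketch. All inputs are the
figures quoted in this message; the constant names are illustrative, and
the CARRY4 estimate assumes 4 bits per CARRY4 primitive.

```python
import math

DSP_TOTAL = 120          # DSP48E1 slices on the XC7A50T
MULT_DSPS_PER_FIR = 32   # one per tap multiply
ADDER_DSPS_PER_FIR = 15  # adder-tree DSPs inferred by Vivado
FIR_INSTANCES = 2        # I and Q
ADDER_WIDTH_BITS = 36
BITS_PER_CARRY4 = 4      # assumption: each CARRY4 covers 4 bits

fir_dsps_before = (MULT_DSPS_PER_FIR + ADDER_DSPS_PER_FIR) * FIR_INSTANCES
dsps_freed = ADDER_DSPS_PER_FIR * FIR_INSTANCES
fir_dsps_after = fir_dsps_before - dsps_freed
carry4_per_add = math.ceil(ADDER_WIDTH_BITS / BITS_PER_CARRY4)

if __name__ == "__main__":
    print(f"FIR DSPs before: {fir_dsps_before}/{DSP_TOTAL}")
    print(f"FIR DSPs after:  {fir_dsps_after}/{DSP_TOTAL}")
    print(f"DSPs freed:      {dsps_freed}")
    print(f"CARRY4 per add:  {carry4_per_add}")
```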
Regression: 23/23 FPGA tests PASS (attribute is synthesis-only).

P0-1: nco_400m_enhanced.v - DSP48E1 OPMODE corrected from PCIN to P
      feedback (was routing stale cascade data into accumulator)
P0-2: radar_receiver_final.v - removed same-clock CDC that corrupted
      ADC data path between ad9484_interface and DDC
P1-5: fir_lowpass.v - fixed zero replication count in coefficient
      symmetric extension ({0{1'b0}} is empty, now uses explicit 0)
Also updates .gitignore to exclude debug/scratch artifacts.
All 30+ testbenches pass (unit, co-sim, integration).