mirror of
https://github.com/NawfalMotii79/PLFM_RADAR.git
synced 2026-06-09 15:07:14 +00:00
perf(fpga): move CIC comb stages to fabric — 80→70 DSPs (-10)
Strip the explicit DSP48E1 instance from comb stage 0 and the (* use_dsp = "yes" *) attribute from comb stages 1-4. The combs are gated by data_valid_comb_pipe, which fires once every 4 clk_400m cycles after decimation. A multicycle path of 4 (-setup) / 3 (-hold), scoped to the comb registers in xc7a50t_ftg256.xdc, therefore gives STA a 10 ns setup budget instead of 2.5 ns. That is enough for the fabric carry chains to close the 28-bit subtracts comfortably.

Pipeline depth and bit widths are unchanged. The new fabric logic mirrors the prior CREG+AREG+BREG+PREG structure exactly, so data_valid_comb_0_out alignment is preserved and downstream stages 1-4 see bit-identical samples. The CIC behavioral simulation model now lives outside the SIMULATION ifdef branch and is used unconditionally, since there is no longer a synthesis-only DSP48E1 for it to replace.

50T post-impl results (Vivado 2025.2):
- DSPs: 80 → 70 / 120 (66.7% → 58.3%, freed 10)
- LUTs: 22114 / 32600 (67.8%)
- BRAM: 55.5 / 75 (74.0%, unchanged)
- adc_dco_p WNS: +0.022 ns → +0.906 ns (margin improved)
All clocks meet timing, 0 failing endpoints.

Local regression: 32/34 PASS, same as baseline. The two failures (Receiver Integration, Matched Filter Chain) are the pre-existing RX-NEW-3 issue (FFT throughput) and are unaffected by this change. Bit-exactness was verified through the DDC chain (NCO→CIC→FIR) and the MF cosim.

Cumulative DSP savings today: 112 → 70 (freed 42). This leaves enough headroom for the Xilinx LogiCORE FFT Pipelined Streaming swap (~33 DSPs for the 3-instance matched-filter chain) with 17 DSPs to spare.
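The comb section described above can be pictured with a small behavioral model. The Python sketch below is hypothetical. It is not the project's RTL or its CIC simulation model, and it rests on these assumptions:
- N = 5 integrator and comb stages (matching comb stages 0-4).
- Decimation R = 4 and differential delay M = 1.
- 28-bit two's-complement wraparound on every register.
- The names `wrap` and `cic_decimate` are illustrative only.

It shows two properties the commit relies on. First, the comb cascade updates only once per R input samples. Second, wraparound subtracts still recover the exact DC gain (R·M)^N = 1024 once the start-up transient has passed.

```python
# Hypothetical behavioral sketch of a 4x CIC decimator, NOT the project's RTL.
# Assumptions: N = 5 stages, R = 4, M = 1, every register 28 bits wide with
# two's-complement wraparound (modular arithmetic, as in Hogenauer's CIC).
from typing import List, Sequence

WIDTH = 28
MASK = (1 << WIDTH) - 1
N_STAGES = 5
R = 4


def wrap(value: int) -> int:
    """Reduce value to a signed WIDTH-bit two's-complement integer."""
    value &= MASK
    return value - (1 << WIDTH) if value >> (WIDTH - 1) else value


def cic_decimate(samples: Sequence[int]) -> List[int]:
    integrators = [0] * N_STAGES
    comb_delay = [0] * N_STAGES  # one delay register per comb stage (M = 1)
    out: List[int] = []
    for n, x in enumerate(samples):
        # Integrator cascade: runs on every input sample (full clock rate).
        acc = x
        for i in range(N_STAGES):
            integrators[i] = wrap(integrators[i] + acc)
            acc = integrators[i]
        # Comb cascade: runs only on every R-th sample (the decimated rate).
        # In hardware this is the enable that gives the comb registers an
        # R-cycle timing budget.
        if n % R == R - 1:
            y = acc
            for k in range(N_STAGES):
                prev = comb_delay[k]
                comb_delay[k] = y
                y = wrap(y - prev)  # 28-bit subtract with wraparound
            out.append(y)
    return out


if __name__ == "__main__":
    # 2000 samples overflow the 28-bit integrators many times over, yet the
    # comb outputs settle to 1000 * 4**5 after the first three outputs.
    result = cic_decimate([1000] * 2000)
    print(result[3:6])  # [1024000, 1024000, 1024000]
```

Wraparound is harmless here because every operation is arithmetic modulo 2^28. As long as the true output fits in 28 signed bits, the final comb result is exact, even though the integrators themselves overflow.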
@@ -457,6 +457,33 @@ set_false_path -from [get_cells -hierarchical -filter {NAME =~ *reset_sync*_reg*
set_false_path -from [get_clocks clk_100m] -to [get_clocks adc_dco_p]
set_false_path -from [get_clocks adc_dco_p] -to [get_clocks clk_100m]

# --------------------------------------------------------------------------
# CIC comb stages — multicycle path (4-cycle setup / 3-cycle hold)
# --------------------------------------------------------------------------
# Comb registers (cic_*/comb_reg[*], cic_*/comb_delay_reg[*][*],
# cic_*/comb_0_c_reg, cic_*/comb_0_ab_reg, cic_*/comb_0_p_reg) are clocked at
# adc_dco_p (400 MHz) but their CE pins are driven by data_valid_comb_pipe /
# data_valid_comb_0_out, which fire once every 4 cycles after the 4× decimator.
# Effective throughput is 100 MHz, so STA can allow a 4·2.5 ns = 10 ns setup
# requirement instead of 2.5 ns. This is what allows the DSP48E1s these stages
# previously occupied (5 per channel × 2 channels = 10 DSPs) to be freed, and
# it lets the fabric carry-chain subtracts close timing comfortably. See the
# cic_decimator_4x_enhanced.v header comment on the comb array declaration.
set_multicycle_path 4 -setup \
    -from [get_cells -hierarchical -filter {NAME =~ *cic_*/comb_*reg*}] \
    -to [get_cells -hierarchical -filter {NAME =~ *cic_*/comb_*reg*}]
set_multicycle_path 3 -hold \
    -from [get_cells -hierarchical -filter {NAME =~ *cic_*/comb_*reg*}] \
    -to [get_cells -hierarchical -filter {NAME =~ *cic_*/comb_*reg*}]
# Also relax the launch path from integrator_sampled_comb (fed by the
# integrator_4 DSP48E1 at the decimated rate) into comb_0_c_reg.
set_multicycle_path 4 -setup \
    -from [get_cells -hierarchical -filter {NAME =~ *cic_*/integrator_sampled_comb_reg*}] \
    -to [get_cells -hierarchical -filter {NAME =~ *cic_*/comb_*reg*}]
set_multicycle_path 3 -hold \
    -from [get_cells -hierarchical -filter {NAME =~ *cic_*/integrator_sampled_comb_reg*}] \
    -to [get_cells -hierarchical -filter {NAME =~ *cic_*/comb_*reg*}]

# clk_100m ↔ clk_120m_dac: CDC via synchronizers in radar_system_top
set_false_path -from [get_clocks clk_100m] -to [get_clocks clk_120m_dac]
set_false_path -from [get_clocks clk_120m_dac] -to [get_clocks clk_100m]
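The 4 -setup / 3 -hold pair in the constraints above follows the usual same-clock multicycle pattern. The Python sketch below is a simplified, hypothetical model of where the setup and hold checks land for the 2.5 ns adc_dco_p period. The function name `check_edges` is illustrative. The model ignores clock skew, uncertainty, cell setup/hold times, and the real data-path delays Vivado reports.

```python
"""Simplified model of where Vivado places the setup and hold checks for a
same-clock multicycle path. Hypothetical helper for illustration only:
ignores skew, uncertainty, and cell setup/hold times."""
from typing import Tuple

PERIOD_NS = 2.5  # adc_dco_p period at 400 MHz, per the XDC comment


def check_edges(setup_mult: int = 1, hold_mult: int = 0,
                period_ns: float = PERIOD_NS) -> Tuple[float, float]:
    """Return (setup_edge_ns, hold_edge_ns) relative to a launch edge at 0 ns.

    -setup N places the setup capture edge N periods after launch. The
    default hold check sits one period before that setup edge, and -hold H
    moves the hold check H further periods back toward the launch edge.
    """
    setup_edge = setup_mult * period_ns
    hold_edge = (setup_mult - 1 - hold_mult) * period_ns
    return setup_edge, hold_edge


if __name__ == "__main__":
    print(check_edges())      # single-cycle default: (2.5, 0.0)
    print(check_edges(4))     # -setup 4 alone: (10.0, 7.5)
    print(check_edges(4, 3))  # -setup 4 / -hold 3: (10.0, 0.0)
```

With `-setup 4` alone, the hold check would move to 7.5 ns after launch, and a short fabric path would fail it. The matching `-hold 3` pulls the hold check back to the launch edge. That is why each `-setup 4` line in this XDC is paired with a `-hold 3` line over the same `-from`/`-to` cells.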