NawfalMotii79-PLFM_RADAR

mirror of https://github.com/NawfalMotii79/PLFM_RADAR.git synced 2026-06-09 15:07:14 +00:00

Author	SHA1	Message	Date
Jason	bf63d64533	AUDIT-S17: document fir_lowpass +4.96 dB DC gain and CIC-droop comp The coefficient ROM has a deliberate positive DC pre-emphasis. Sum of 32 signed coefficients = 231,944; with the output slice at accumulator[34:17] (effective Q17), DC gain = 231944 / 2^17 = 1.7696 = +4.96 dB. Bit-exact against the in-header golden-model line (DC=5000 → 8847). The +4.96 dB pre-emphasis compensates the upstream 4-stage CIC's ~3-4 dB passband droop. Without this note in the header, a future engineer rebuilding the filter from a clean FIR design tool would silently lose the pre-emphasis; AGC/saturation budgets in downstream stages must also account for the +4.96 dB rather than assume 0 dB. Audit's original "+7 dB" estimate was directionally correct but quantitatively wrong (no Q-format reconciles to +7 dB; Q15 → +17 dB, Q16 → +11 dB, Q17 → +4.96 dB). Documented at the verified +4.96 dB. No coefficient or RTL change. Verified: full FPGA regression 41/41 PASS, 0 lint errors (FIR Lowpass: 13 checks PASS).	2026-04-30 10:08:34 +05:45
Jason	0b2f75620e	perf(fpga): symmetric pre-adder FIR — 32→16 DSPs/channel (-32 total) Re-group the 32-tap symmetric lowpass into 16 (D+A)*B operations using the DSP48E1 pre-adder, exploiting coeff[k] == coeff[31-k]. Production silicon (XC7A50T) drops from 112/120 DSPs (93.3%) to 80/120 (66.7%), freeing the budget needed for the matched-filter FFT swap (RX-NEW-3). Bit-exact contract preserved at non-saturating signal levels: DC=5000 → 8847 and 45 MHz tone → ±16 LSB match the unfolded design and the Python golden model. Throughput unchanged (1 sample/cycle, 100 MSPS); latency +2 cycles for the pre-adder stage. Saturation thresholds rebuilt via bit concatenation to dodge the Verilog 32-bit-literal trap (1 <<< 34 silently wraps to 0, which made the earlier symmetric draft assert positive saturation on all non-negative accumulator values). Local regression: 32/34 PASS — same as baseline; the two failures (Receiver Integration, Matched Filter Chain) are pre-existing RX-NEW-3 (FFT throughput) and unaffected by this change.	2026-04-23 10:08:19 +05:45
Jason	977434a5f6	docs(fpga): correct fir_lowpass.v rate comment + flag rate/coeff mismatch The header had two claims that "valid samples arrive every ~4 cycles" at the FIR boundary. That is false in the production wiring: the CIC `_4x` decimator turns clk_400m into a 100 M-pulse-per-second stream, then cdc_adc_to_processing crosses that into clk_100m where dst_valid asserts every cycle in steady state. The 4:1 ratio applies between the two clock domains, not as further sub-sampling inside clk_100m. This matters because the 32-tap coefficients were designed for the 25 MSPS rate the wrong comment described, but the FIR is actually being driven at 100 MSPS. The cutoff sits 4x higher than intended; existing tests pass because the 36-bit accumulator silently wraps on large sustained inputs (see RX-NEW-3 in the project ledger). Comment-only commit. No RTL behaviour change. Any future DSP-saving rework — symmetric pre-adder, 4:1 fold, Xilinx FIR Compiler — needs a designer call on whether to redesign coefficients for 100 MSPS, add a real decimation stage to hit 25 MSPS, or keep the current accidental behaviour.	2026-04-23 09:26:23 +05:45
Jason	8d7b6e04a0	fix(rtl): force FIR adder tree to fabric to free 30 DSPs for FFT butterfly on 50T Add (* USE_DSP = "no" *) attribute to FIR lowpass adder tree registers (add_l1, add_l2, add_l3, accumulator_reg) to prevent Vivado from inferring DSP48E1 slices for pure addition operations. Each fir_lowpass_parallel_enhanced instance was using 47 DSPs (32 for multiply + 15 for the adder tree). The 15 adder-tree DSPs per instance (30 total for I/Q pair) performed only PCIN+A:B additions with no multiplier usage. On the XC7A50T with only 120 DSP48E1 slices, this caused 100% DSP utilization and forced FFT butterfly complex multipliers to spill into 18-level fabric carry chains (WNS=-1.103ns). Moving these 36-bit additions to fabric CARRY4 chains (~9 CARRY4 per add, ~2ns propagation) is well within the 10ns clock period and frees ~30 DSPs for the FFT engine to use native DSP48E1 multipliers. Regression: 23/23 FPGA tests PASS (attribute is synthesis-only).	2026-04-07 14:45:47 +03:00
Jason	ed6f79c6d3	FIR DSP48 pipelining (BREG+MREG) + matched filter BRAM migration with overlap cache FIR: Add coeff_reg/mult_reg pipeline stages to fix 68 DPIP-1 + 35 DPOP-2 DRC warnings. Valid pipeline widened 7→9 bits (+2 cycle latency). Matched filter: Migrate input_buffer_i/q from register arrays to BRAM (~33K FF savings). Overlap-save uses register cache captured during ST_PROCESSING to avoid BRAM read/write conflicts during overlap copy. New ST_OVERLAP_COPY state writes cached tail samples back sequentially. Both changes pass 18/18 FPGA regression. Golden data regenerated for +2 FIR latency baseline.	2026-03-19 20:39:01 +02:00
Jason	d8a8532097	Convert CIC comb + FIR delay_line to sync reset for DSP48 absorption (Build 10) CIC: async→sync reset on decimation control, valid pipeline, and comb section. Added (* use_dsp = "yes" *) on comb[] to force DSP48E1 absorption of 28-bit subtracts (was 7-deep CARRY4, Build 9 critical path at WNS +0.128ns). Targets ~10 additional DSP48E1s. FIR: async→sync reset on delay_line block, enabling DSP48E1 AREG/BREG absorption. Targets elimination of ~2,522 DPIR-1 methodology warnings. 13/13 regression suites pass. Integration golden: 2048/2048 exact match.	2026-03-17 20:56:42 +02:00
Jason	1558f17d05	Convert async→sync reset on DSP/BRAM datapath registers for timing closure P1-CRITICAL: doppler_processor.v — split FSM into control (async reset) and BRAM/DSP datapath (sync reset) blocks. Fixes REQP-1839/1840 BRAM address register corruption risk; enables DSP48 absorption of window multipliers (mult_i/q). P1-CRITICAL: frequency_matched_filter.v — convert all 4 pipeline stages (input capture, multiply, add, saturate) from async to sync reset. Enables DSP48E1 absorption of complex multiplier registers. P1-HIGH: fir_lowpass.v — convert adder tree (L0-L4), output stage, and valid pipeline from async to sync reset. Fixes 856 DPOR-1 warnings (428 per FIR instance × 2 I/Q channels), enabling DSP48 absorption of the entire pipelined adder tree. Expected impact: eliminate ~1000 DRC warnings, improve WNS from +0.019ns by enabling Vivado to absorb hundreds of registers into DSP48E1/BRAM hard blocks. Full regression: 13/13 test suites pass (257+ assertions).	2026-03-17 20:11:13 +02:00
Jason	fd6094ee9e	Fix P0/P1 RTL bugs found during pre-hardware audit P0-1: nco_400m_enhanced.v — DSP48E1 OPMODE corrected from PCIN to P feedback (was routing stale cascade data into accumulator) P0-2: radar_receiver_final.v — removed same-clock CDC that corrupted ADC data path between ad9484_interface and DDC P1-5: fir_lowpass.v — fixed zero replication count in coefficient symmetric extension ({0{1'b0}} is empty, now uses explicit 0) Also updates .gitignore to exclude debug/scratch artifacts. All 30+ testbenches pass (unit, co-sim, integration).	2026-03-16 22:24:06 +02:00
Jason	00fbab6c9d	Achieve full timing closure on xc7a100tcsg324-1 at 400 MHz (0 violations). Complete FPGA timing closure across all clock domains after 9 iterative Vivado builds. WNS improved from -48.325ns to +0.018ns (107,886 endpoints). RTL fixes for 400 MHz timing. NCO: 6-stage pipeline with DSP48E1 phase accumulator, registered LUT index (Fix D splits address decode from ROM read), distributed RAM; CIC: explicit DSP48E1 PCOUT->PCIN cascade for 5 integrator stages, CREG=1 on integrator_0 to eliminate fabric->DSP setup violation; DDC: 400 MHz reset synchronizer (async-assert/sync-deassert), active-high reset register for DSP48E1 RST ports, posedge output stage; FIR: 5-stage binary adder tree pipeline (7-cycle latency); FFT: 5-cycle butterfly pipeline with registered twiddle index, XPM_MEMORY_TDPRAM for data storage; XDC: CDC false paths, async reset false paths, CIC comb multicycle paths. Final Build 9 timing (all MET): adc_dco_p (400 MHz) WNS = +0.278ns; clk_100m (100 MHz) WNS = +0.018ns; clk_120m_dac (120 MHz) WNS = +0.992ns; ft601_clk_in (100 MHz) WNS = +5.229ns; cross-domain (adc_dco_p->clk_100m) WNS = +7.105ns.	2026-03-16 15:02:35 +02:00
NawfalMotii79	5fbe97fa5f	Add files via upload	2026-03-09 00:17:39 +00:00
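
The DC-gain arithmetic quoted in the AUDIT-S17 commit (bf63d64533) can be re-derived with a few lines of Python. This is only a sketch of the arithmetic stated in the commit message, not the repository's golden model: the function names are hypothetical, and truncation (floor) at the output slice is an assumption, chosen because it reproduces the quoted DC=5000 → 8847 line.

```python
import math

# Figures taken from the AUDIT-S17 commit message.
COEFF_SUM = 231_944   # sum of the 32 signed coefficients
FRAC_BITS = 17        # output slice accumulator[34:17] drops 17 LSBs (effective Q17)


def dc_gain(coeff_sum: int, frac_bits: int) -> float:
    """Linear DC gain of a FIR whose output drops `frac_bits` LSBs."""
    return coeff_sum / (1 << frac_bits)


def to_db(gain: float) -> float:
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


def dc_response(sample: int, coeff_sum: int = COEFF_SUM,
                frac_bits: int = FRAC_BITS) -> int:
    """Steady-state output for a constant input.

    Assumption: the slice truncates (arithmetic shift right), it does not round.
    """
    return (sample * coeff_sum) >> frac_bits


if __name__ == "__main__":
    for q in (15, 16, 17):
        g = dc_gain(COEFF_SUM, q)
        print(f"Q{q}: gain = {g:.4f} ({to_db(g):+.2f} dB)")
    print("DC=5000 ->", dc_response(5000))
```

Under these assumptions only the Q17 interpretation lands near +4.96 dB, which is consistent with the commit's remark that no Q-format reconciles to the audit's original "+7 dB" estimate.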

10 Commits
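
The symmetric pre-adder rework in commit 0b2f75620e relies on the identity coeff[k] == coeff[31-k], which lets each pair of taps share one multiplier. The sketch below checks that identity numerically in Python. The coefficient and sample values are made up for illustration (the real coefficient ROM is not reproduced here), the function names are hypothetical, and the model ignores pipelining and saturation.

```python
def direct_fir(window, coeffs):
    """Unfolded form: one multiply per tap (32 multiplies for 32 taps)."""
    return sum(c * x for c, x in zip(coeffs, window))


def folded_fir(window, coeffs):
    """Folded form: add the mirrored sample pair first, then multiply once.

    Valid only when coeffs[k] == coeffs[n-1-k]; this mirrors the (D+A)*B
    grouping described in the commit message.
    """
    n = len(coeffs)
    return sum(coeffs[k] * (window[k] + window[n - 1 - k]) for k in range(n // 2))


# Hypothetical symmetric 32-tap coefficient set (NOT the project's ROM values).
half = [37 * k - 200 for k in range(16)]
coeffs = half + half[::-1]

# Hypothetical signed input window, deterministic so the check is repeatable.
window = [(i * 7919) % 4096 - 2048 for i in range(32)]

direct = direct_fir(window, coeffs)
folded = folded_fir(window, coeffs)
print("direct == folded:", direct == folded)
print("multiplies per output:", len(coeffs), "->", len(coeffs) // 2)
```

With 16 multiplies per channel instead of 32, two channels (I and Q) save 32 multipliers in total, which matches the drop from 112 to 80 DSPs reported in the commit.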