fix(fpga): C-4 — replace IDDR DDR demux with negedge IFF for AD9484 SDR

The AD9484 is SDR LVDS — datasheet p.5 lists "Output (LVDS—SDR)" as the
only output mode and p.16 confirms "data outputs are valid on the rising
edge of DCO." DCO runs at fs (400 MHz), one new sample per period, held
stable across the period. There is no DDR mode and no SPI access (CSB is
tied to +1V8 on the production board, RADAR_Main_Board.sch:46719).

ad9484_interface_400m.v previously instantiated an IDDR per data bit and
alternated Q1/Q2 via a `dco_phase` FSM, expecting to demux a "DDR" stream
into 400 MSPS. Because the chip is SDR, both Q1 and Q2 represent the same
sample, and the alternation produced approximately
  [s_{-1}, s_1, s_1, s_3, s_3, s_5, …]
— odd-sample duplication with even-sample loss, equivalent to
decimate-by-2 followed by ZOH-upsample-by-2. In the frequency domain
that's a fold around fs/4 = 100 MHz; our 120-150 MHz IF lands at
50-80 MHz, so the DDC's 120 MHz NCO mixes the wrong frequency and the
matched filter sees baseband 40-70 MHz off where it expects.

The bug was hidden by tb/ad9484_interface_400m_stub.v, which has always
done single-rising-edge SDR-correct capture, so all iverilog regression
ran against the correct semantics — only the synthesizable Xilinx-
primitive path was wrong. This bug only fires on real silicon.

Fix:
- ad9484_interface_400m.v: drop IDDR + dco_phase; capture each data bit
  with a single (* IOB = "TRUE" *) negedge-clocked IFF on adc_dco_bufio.
  Falling DCO sits 1.25 ns inside AD9484's stable window, giving ~0.4 ns
  setup margin against tPD = 0.85 ns. Same pattern on the OR (overrange)
  path. Output FSM now emits one Q per BUFG cycle = clean 400 MSPS.
- tb_ad9484_xsim.v: add Test Group 8 (AUDIT-C4) that drives a 64-sample
  counter ramp synchronously with rising DCO, captures the output, and
  asserts (a) consecutive deltas equal +1 for ≥ (captured-6) of the
  stream, (b) zero duplicate samples (catches DDR-style demux), (c) zero
  unexpected jumps (catches DDR-style sample drops). This locks in SDR
  semantics so any future regression that reintroduces a DDR demux on
  this chip fails loudly.
- ad9484_interface_400m_stub.v: comment-only update — the stub already
  does correct SDR capture; document AUDIT-C4 + why iverilog regression
  was silent on the synth-path bug.
- xc7a200t_fbg484.xdc: fix stale "DDR class" comment near the OR pair
  (now "SDR LVDS").

Verification: bash run_regression.sh — 42 passed, 0 failed, 1 skipped
(the skip is the T-6 drift cosim, which needs scipy from the dev group;
CI installs it via uv sync --group dev). Test Group 8 in the xsim TB
runs against the real UNISIM primitives and is exercised separately on
the Vivado host (run_xfft_xsim.sh-style flow).
This commit is contained in:
Jason
2026-05-01 23:12:55 +05:45
parent abde60dd7e
commit 6f5ff792fa
4 changed files with 174 additions and 91 deletions
+65 -84
View File
@@ -4,9 +4,10 @@ module ad9484_interface_400m (
input wire [7:0] adc_d_n, // ADC Data N
input wire adc_dco_p, // Data Clock Output P (400MHz)
input wire adc_dco_n, // Data Clock Output N (400MHz)
// Audit F-0.1: AD9484 OR (overrange) LVDS pair, DDR like data.
// Routed on the 50T main board to bank 14 pins M6/N6. Asserts for any
// sample whose absolute value exceeds full-scale.
// Audit F-0.1: AD9484 OR (overrange) LVDS pair (SDR, like data — see
// AUDIT-C4 note below; an earlier comment incorrectly described this as
// DDR). Routed on the 50T main board to bank 14 pins M6/N6. Asserts for
// any sample whose absolute value exceeds full-scale.
input wire adc_or_p,
input wire adc_or_n,
@@ -59,14 +60,16 @@ IBUFDS #(
// ============================================================================
// Clock buffering strategy for source-synchronous ADC interface:
//
// BUFIO: Near-zero insertion delay, can only drive IOB primitives (IDDR).
// Used for IDDR clocking to match the data path delay through IBUFDS.
// This eliminates the hold violation caused by BUFG insertion delay.
// BUFIO: Near-zero insertion delay, drives IOB primitives only (in this
// module: the IOB-packed input FF on each data bit and the OR pair).
// Eliminates the hold violation that a BUFG-clocked capture would
// suffer from BUFG insertion delay.
//
// BUFG: Global clock buffer for fabric logic (downstream processing).
// Has ~4 ns insertion delay but that's fine for fabric-to-fabric paths.
// BUFG: Global clock buffer for fabric logic (downstream processing and
// the BUFIO→BUFG re-register stage). ~4 ns insertion delay, fine
// for fabric-to-fabric paths.
// ============================================================================
wire adc_dco_bufio; // Near-zero delay — drives IDDR only
wire adc_dco_bufio; // Near-zero delay — drives IOB IFFs only
wire adc_dco_buffered; // BUFG output — drives fabric logic
BUFIO bufio_dco (
@@ -87,57 +90,58 @@ adc_clk_mmcm mmcm_inst (
);
assign adc_dco_bufg = adc_dco_buffered;
// IDDR for capturing DDR data
wire [7:0] adc_data_rise; // Data on rising edge (BUFIO domain)
wire [7:0] adc_data_fall; // Data on falling edge (BUFIO domain)
// AUDIT-C4 (2026-05-01): AD9484 outputs SDR LVDS (datasheet p.5: "Output
// (LVDS—SDR)"; p.16: "data outputs are valid on the rising edge of DCO").
// One new sample per DCO period (DCO=fs=400 MHz); data is held stable for
// the full period. The chip has no DDR mode and no SPI access (CSB tied
// high on the production board, see Main_Board.sch:46719) so no register
// can change this.
//
// Capture on the FALLING DCO edge — 1.25 ns inside AD9484's stable data
// window. The rising DCO edge coincides with the AD9484 data transition
// (tSKEW = ±70 ps vs typical IFF setup ~150 ps), which would be
// functionally metastable; the falling edge has ~0.4 ns of setup margin
// against tPD = 0.85 ns. IOB=TRUE forces the FF into the input I/O block,
// giving near-zero clock-to-Q insertion delay matched to BUFIO.
//
// Previous (broken) behaviour: an IDDR captured both edges and a `dco_phase`
// FSM alternated Q1/Q2 in an attempt to demux a "DDR" stream. Because the
// chip is SDR, both edges represent the same sample, and the alternation
// produced approximately [s_{-1}, s_1, s_1, s_3, s_3, …] — odd-sample
// duplication with even-sample loss, equivalent to decimate-by-2 +
// ZOH upsample-by-2. The downstream 120 MHz IF then folded to ~80 MHz
// and corrupted the DDC. See git log for the audit trail.
(* IOB = "TRUE" *) reg [7:0] adc_data_iff;
genvar j;
generate
for (j = 0; j < 8; j = j + 1) begin : iddr_gen
IDDR #(
.DDR_CLK_EDGE("SAME_EDGE_PIPELINED"),
.INIT_Q1(1'b0),
.INIT_Q2(1'b0),
.SRTYPE("SYNC")
) iddr_inst (
.Q1(adc_data_rise[j]), // Rising edge data
.Q2(adc_data_fall[j]), // Falling edge data
.C(adc_dco_bufio), // BUFIO clock (near-zero insertion delay)
.CE(1'b1),
.D(adc_data[j]),
.R(1'b0),
.S(1'b0)
);
end
endgenerate
always @(negedge adc_dco_bufio) begin
adc_data_iff <= adc_data;
end
// ============================================================================
// Re-register IDDR outputs into BUFG domain
// IDDR with SAME_EDGE_PIPELINED produces outputs stable for a full clock cycle.
// Re-register the IFF output into the BUFG domain
// adc_data_iff is stable from one falling DCO edge to the next (full DCO
// period of validity), so the rising-edge BUFG capture has ample margin.
// BUFIO and BUFG are derived from the same source (adc_dco), so they are
// frequency-matched. This single register stage transfers from IOB (BUFIO)
// to fabric (BUFG) with guaranteed timing.
// frequency-matched.
// ============================================================================
// Timing on the BUFIO→BUFG CDC edge is governed by a 3.000 ns
// set_max_delay in constraints/adc_clk_mmcm.xdc (1.2× the 2.500 ns period),
// which leaves the placer free and still fits inside the ADC data-valid
// window. IOB=TRUE and a pblock around the IDDR column were both tried
// and rejected: IOB packing fails because the BUFG clock on these
// capture FFs can't share the ILOGIC clock mux with the BUFIO-clocked
// IDDR, and the pblock pulled fanout logic into the I/O region and
// triggered router congestion on 51 unrelated paths.
reg [7:0] adc_data_rise_bufg;
reg [7:0] adc_data_fall_bufg;
// window. The fabric BUFG-clocked re-register intentionally lives outside
// the IOB — packing it back into the IOB column was tried in earlier
// builds and rejected (the BUFG clock can't share the ILOGIC clock mux
// with a BUFIO-domain capture, and a pblock around the IDDR column
// pulled fanout into the I/O region and caused router congestion on 51
// unrelated paths).
reg [7:0] adc_data_iff_bufg;
always @(posedge adc_dco_buffered) begin
adc_data_rise_bufg <= adc_data_rise;
adc_data_fall_bufg <= adc_data_fall;
adc_data_iff_bufg <= adc_data_iff;
end
// Combine rising and falling edge data to get 400MSPS stream
// SDR output: one sample per BUFG cycle (400 MHz BUFG = 400 MSPS)
reg [7:0] adc_data_400m_reg;
reg adc_data_valid_400m_reg;
reg dco_phase;
// ── Reset synchronizer ────────────────────────────────────────
// reset_n comes from the 100 MHz sys_clk domain. Assertion (going low)
@@ -178,19 +182,9 @@ always @(posedge adc_dco_buffered or negedge reset_n_400m) begin
if (!reset_n_400m) begin
adc_data_400m_reg <= 8'b0;
adc_data_valid_400m_reg <= 1'b0;
dco_phase <= 1'b0;
end else begin
dco_phase <= ~dco_phase;
if (dco_phase) begin
// Output falling edge data (completes the 400MSPS stream)
adc_data_400m_reg <= adc_data_fall_bufg;
end else begin
// Output rising edge data
adc_data_400m_reg <= adc_data_rise_bufg;
end
adc_data_valid_400m_reg <= 1'b1; // Always valid when ADC is running
adc_data_400m_reg <= adc_data_iff_bufg;
adc_data_valid_400m_reg <= 1'b1;
end
end
@@ -198,11 +192,12 @@ assign adc_data_400m = adc_data_400m_reg;
assign adc_data_valid_400m = adc_data_valid_400m_reg;
// ============================================================================
// Audit F-0.1: AD9484 OR (overrange) capture
// OR is a DDR LVDS pair (same as data). Buffer it, capture both edges with an
// IDDR in the BUFIO domain, then OR the two phases into a single clk_400m
// flag. Register once for stability. No latching downstream is expected to
// stickify in its own domain.
// Audit F-0.1 / AUDIT-C4: AD9484 OR (overrange) capture
// OR is an SDR LVDS pair (per AD9484 datasheet — the chip outputs SDR,
// not DDR; comment in earlier revision was wrong). One assertion per DCO
// period, valid on the rising DCO edge, held stable for the rest of the
// period. Captured at the falling DCO edge in the same IFF style as the
// data path. Downstream stickifies in its own domain.
// ============================================================================
wire adc_or_raw;
IBUFDS #(
@@ -214,28 +209,14 @@ IBUFDS #(
.IB(adc_or_n)
);
wire adc_or_rise;
wire adc_or_fall;
IDDR #(
.DDR_CLK_EDGE("SAME_EDGE_PIPELINED"),
.INIT_Q1(1'b0),
.INIT_Q2(1'b0),
.SRTYPE("SYNC")
) iddr_or (
.Q1(adc_or_rise),
.Q2(adc_or_fall),
.C(adc_dco_bufio),
.CE(1'b1),
.D(adc_or_raw),
.R(1'b0),
.S(1'b0)
);
(* IOB = "TRUE" *) reg adc_or_iff;
always @(negedge adc_dco_bufio) begin
adc_or_iff <= adc_or_raw;
end
reg adc_or_rise_bufg;
reg adc_or_fall_bufg;
reg adc_or_iff_bufg;
always @(posedge adc_dco_buffered) begin
adc_or_rise_bufg <= adc_or_rise;
adc_or_fall_bufg <= adc_or_fall;
adc_or_iff_bufg <= adc_or_iff;
end
reg adc_overrange_r;
@@ -243,7 +224,7 @@ always @(posedge adc_dco_buffered or negedge reset_n_400m) begin
if (!reset_n_400m)
adc_overrange_r <= 1'b0;
else
adc_overrange_r <= adc_or_rise_bufg | adc_or_fall_bufg;
adc_overrange_r <= adc_or_iff_bufg;
end
assign adc_overrange_400m = adc_overrange_r;
@@ -137,13 +137,15 @@ set_property DIFF_TERM TRUE [get_ports {adc_d_p[*]}]
# --------------------------------------------------------------------------
# Audit F-0.1 / C-15: AD9484 OR (overrange) LVDS pair
# Pins: U20/V20 = IO_L11P/L11N_T1_SRCC_14 (same T1 clock tile as adc_dco on
# L12_MRCC at W19/W20). Co-locating OR in T1 keeps the IBUFDS→BUFIO→IDDR
# capture topology identical to adc_d_p[*] (data is the same DDR class).
# L12_MRCC at W19/W20). Co-locating OR in T1 keeps the IBUFDS→BUFIO→IFF
# capture topology identical to adc_d_p[*] (both are AD9484 SDR LVDS — see
# AUDIT-C4 in ad9484_interface_400m.v; an earlier comment incorrectly
# called this DDR).
# This is the FPGA-team RECOMMENDATION for the production PCB (NEW design);
# the PCB designer must route AD9484 OR+ → U20 and OR → V20.
# AD9484 datasheet: pin 23 = OR+ (LVDS_OUTPUT_OR_P), pin 24 = OR (LVDS_OUTPUT_OR_N).
# (50T board uses M6/N6 = L19_T3 on FTG256 — different package, different
# tile, but same end-to-end IBUFDS→BUFIO→IDDR topology in RTL.
# tile, but same end-to-end IBUFDS→BUFIO→IFF topology in RTL.
# Hold false_path on adc_or_p is applied in adc_clk_mmcm.xdc.)
# --------------------------------------------------------------------------
set_property PACKAGE_PIN U20 [get_ports {adc_or_p}]
@@ -5,12 +5,20 @@
// Replaces the real ad9484_interface_400m which uses Xilinx primitives
// (IBUFDS, BUFG, IDDR) that cannot compile in iverilog.
//
// AUDIT-C4 (2026-05-01): AD9484 is SDR LVDS — one new sample per rising
// DCO edge, data stable across the full DCO period. This stub captures
// adc_d_p on the rising edge of adc_dco_p, which already matches the
// chip's SDR semantics. The synthesizable path was rewritten in the same
// audit to route IDDR Q2 only (falling-edge stable capture) instead of
// alternating Q1/Q2 — the previous behaviour produced a 2× duplicated
// output stream that masqueraded as 400 MSPS DDR.
//
// Convention for testbench use:
// - Drive adc_d_p[7:0] with single-ended 8-bit ADC data
// - Drive adc_dco_p with the 400MHz clock (testbench-generated)
// - Drive adc_dco_p with the 400 MHz clock (testbench-generated)
// - adc_d_n and adc_dco_n are ignored
// - adc_dco_bufg = adc_dco_p (pass-through, no BUFG)
// - 1-cycle pipeline latency on data, same as real IDDR+register path
// - 1-cycle pipeline latency on data
// ============================================================================
module ad9484_interface_400m (
+94 -2
View File
@@ -8,10 +8,13 @@
//
// Key things tested:
// 1. Differential LVDS data capture (IBUFDS)
// 2. DDR data capture (IDDR, SAME_EDGE_PIPELINED mode)
// 2. IDDR Q2 (falling-edge) data capture in SAME_EDGE_PIPELINED mode
// 3. Reset synchronizer (P1-7 fix: async assert, sync de-assert)
// 4. Data integrity through full pipeline
// 5. Phase interleaving (rising/falling edge multiplexing)
// 5. AUDIT-C4: SDR-correctness — every rising-DCO sample appears exactly
// once in the output stream (no duplications, no drops). Catches any
// future regression that reintroduces a DDR-style Q1/Q2 demux on this
// SDR-only chip.
// ============================================================================
module tb_ad9484_xsim;
@@ -293,6 +296,95 @@ module tb_ad9484_xsim;
// adc_pwdn is not part of this module (it's in radar_system_top)
// Just verify the module port list is complete
// ════════════════════════════════════════════════════════
// TEST GROUP 8: AUDIT-C4 — SDR-correctness counter ramp
// ════════════════════════════════════════════════════════
// The AD9484 is SDR LVDS: one new sample per rising DCO edge,
// held stable across the full DCO period. We model that by
// launching a new counter value just after each rising DCO
// edge (~tPD = 0.85 ns) and holding it stable. The interface
// captures Q2 of the IDDR (falling-edge, in the stable window)
// and re-registers into BUFG → output.
//
// Correctness invariant: once the pipeline fills, the output
// stream must increment by exactly 1 per BUFG cycle — no
// duplications, no skips. A DDR-style Q1/Q2 demux (the previous
// bug) would emit each sample twice and drop alternating ones,
// which this test catches.
$display("\n--- Test Group 8: SDR Counter Ramp (AUDIT-C4) ---");
reset_n = 0;
adc_d_p = 8'h00;
#100;
reset_n = 1;
repeat (8) @(posedge adc_dco_p);
begin : sdr_ramp
reg [7:0] ramp_value;
reg [7:0] last_out;
reg have_last;
integer captured;
integer diff_one_count;
integer diff_zero_count;
integer diff_other_count;
ramp_value = 8'h10; // start away from 0
have_last = 0;
last_out = 8'h00;
captured = 0;
diff_one_count = 0;
diff_zero_count = 0;
diff_other_count = 0;
// Drive 64 samples in SDR style: launch new value just after
// each rising DCO edge, hold stable. Wrap-around in 8-bit is
// fine (consecutive +1 across 0xFF→0x00 still passes the
// diff-of-1 check).
for (i = 0; i < 64; i = i + 1) begin
@(posedge adc_dco_p);
#0.5; // ≈ AD9484 tPD launch delay
adc_d_p = ramp_value;
ramp_value = ramp_value + 8'd1;
// Sample the output near the next rising edge, after
// the BUFG-domain pipeline has settled.
if (adc_data_valid_400m) begin
if (have_last) begin
case (adc_data_400m - last_out)
8'd1: diff_one_count = diff_one_count + 1;
8'd0: diff_zero_count = diff_zero_count + 1;
default: diff_other_count = diff_other_count + 1;
endcase
end
last_out = adc_data_400m;
have_last = 1;
captured = captured + 1;
end
end
$display(" SDR ramp: captured=%0d diff=+1: %0d diff=0 (dup): %0d other: %0d",
captured, diff_one_count, diff_zero_count, diff_other_count);
// Need a meaningful number of samples to draw conclusions.
check(captured >= 32, "SDR ramp captured >= 32 output samples");
// Most consecutive output deltas must be +1. Allow a small
// slack window for pipeline-startup transients.
check(diff_one_count >= (captured - 6),
"SDR ramp: consecutive output samples increment by +1");
// Zero deltas would indicate the broken DDR-style Q1/Q2
// duplication. With Q2-only routing this should be 0 (allow
// 1 for the very first comparison if pipeline-startup races).
check(diff_zero_count <= 1,
"SDR ramp: no sample duplication (would indicate DDR demux bug)");
// No skips either — the broken demux dropped odd-indexed
// samples (delta == 2). Anything beyond +1/0 is a fail.
check(diff_other_count == 0,
"SDR ramp: no sample skips or unexpected jumps");
end
// ════════════════════════════════════════════════════════
// Summary
// ════════════════════════════════════════════════════════