fix(radar): RX chain corrections, GUI bin alignment, MCU boot ordering

FPGA: RX chain

  matched_filter_multi_segment.v: drop the gratuitous /4 scaling on the
  sign-extended DDC input (was ddc_i[17:2] + ddc_i[1]); use ddc_i[15:0]
  directly. fft_engine has INTERNAL_W=32 with saturating 16-bit output, so
  full 16-bit input is safe. Restores ~12 dB of MF input dynamic range.

  radar_receiver_final.v: remove latency_buffer (a count-N-pulses-then-prime
  FIFO that left frame 1 with an all-zero ref). Replaced with a single-FF
  alignment register on ref_i/ref_q that matches the 1-FF stage
  multi_segment ST_PROCESSING uses on adc_data. Verified by
  tb/tb_rxb_fullchain_latency.v: autocorrelation peak at bin 0 with
  peak/mean ~88x.

  doppler_processor.v / mti_canceller.v / cfar_ca.v / range_bin_decimator.v /
  radar_receiver_final.v / radar_system_top.v / usb_data_interface_ft2232h.v:
  switch port and parameter widths from RP_NUM_RANGE_BINS / RP_RANGE_BIN_BITS
  (always 512 / 9-bit) to RP_MAX_OUTPUT_BINS / RP_RANGE_BIN_WIDTH_MAX
  (auto-scales: 50T 512 / 9-bit, 200T 4096 / 12-bit). Unblocks 200T 20 km
  mode at the RX module boundary; USB wire-protocol extension still pending.

  radar_receiver_final.v: doppler_frame_done_prev reset value 0 -> 1 to
  prevent a false done pulse on cycle 1 when the level signal is HIGH at
  reset.

  matched_filter_processing_chain.v: delete the broken `ifdef SIMULATION
  inline behavioural FFT (482 lines removed). It produced wrong-bin peaks and
  100-1000x weak magnitudes. Chain now uses production fft_engine.v +
  frequency_matched_filter.v in both iverilog and Vivado. Iverilog tests are
  ~38x slower per chain pass but produce correct results. Misleading
  "OK with Xilinx IP" comments at three test sites updated since the FFT is
  in-house, not an IP placeholder.

FPGA: testbenches

  tb/tb_rxb_latency_measure.v (new): measures chain internal pipeline depth
  (~2057 cycles, chirp-agnostic).

  tb/tb_rxb_fullchain_latency.v (new): full-chain autocorrelation
  verification. Drives ddc with the same chirp samples the loader serves as
  ref, finds peak position and peak/mean.

  tb/tb_matched_filter_processing_chain.v: wait timeouts bumped
  50000 -> 500000 cycles to accommodate the production FFT pipeline.

MCU

  main.cpp checkSystemHealthStatus: latch system_emergency_state on the
  error_count > 10 path so the SAFE-MODE blink loop in main() actually
  engages (was bypassed because predicate was false).

  main.cpp: move FPGA reset BEFORE the if(PowerAmplifier) block so adar_tr_x
  is driven LOW (RX commanded externally) before PA Vdd reaches 22 V. Old
  reset block at the original location removed.

  main.cpp MX_GPIO_Init: add GPIO_PIN_12 (FPGA reset) to the explicit
  WritePin(LOW) list so the safe initial state is no longer implicit.

  main.cpp checkSystemHealth: rate-limit ADAR1000 verifyDeviceCommunication
  (HAL_Delay 1ms x 4 devices = 4 ms blocking SPI burst per main-loop
  iteration) from every-loop to every 2 s. readTemperature stays per-loop so
  over-temp detection latency is unchanged.

  USBHandler.cpp processSettingsData: dispatch threshold bumped 74 -> 82
  (matches parser minimum); buffer drained after parse attempt (slide
  remaining bytes left) so a false END find no longer sticks the buffer until
  256-byte overflow.

GUI

  radar_protocol.py: NUM_RANGE_BINS 64 -> 512 (matches FPGA
  RP_NUM_RANGE_BINS); NUM_CELLS 2048 -> 16384.

  radar_protocol.py _ingest_sample: honor FPGA frame_start bit for resync
  after a USB drop; capture range_profile[rbin] once per range bin at
  dbin == 0 (FPGA emits the same range_i/range_q for all 32 Doppler cells of
  a given range bin; previous accumulator inflated the profile 32x).

  v7/models.py RadarSettings: range_resolution 24 -> 6 m (matches
  c/(2*100MHz)*4); max_distance and coverage_radius 1536 -> 3072 m; map_size
  2000 -> 4000.

  v7/models.py WaveformConfig: n_range_bins 64 -> 512, fft_size 1024 -> 2048,
  decimation_factor 16 -> 4.

  GUI_V65_Tk.py: _RANGE_PER_BIN math and stale "~24 m / ~1536 m" comments
  updated.

  test_v7.py: assertion values updated to match new defaults.

Tests

  test_ddc_cosim_fuzz.py: remove unused os/tempfile imports, wrap three long
  lines for ruff E501 compliance.
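
The GUI defaults quoted in the message can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical and are not taken from radar_protocol.py or v7/models.py. It assumes the 100 MHz chirp bandwidth and the 32 Doppler cells per range bin stated in the message.

```python
# Illustrative cross-check of the GUI defaults; all names are hypothetical.
C_M_PER_S = 299_792_458      # speed of light
BANDWIDTH_HZ = 100e6         # chirp bandwidth quoted in the commit message
DOPPLER_CELLS = 32           # Doppler cells per range bin (per the message)


def range_per_bin_m(decimation: int) -> float:
    """Range covered by one output bin: c / (2 * B) * decimation."""
    return C_M_PER_S / (2 * BANDWIDTH_HZ) * decimation


def defaults(n_range_bins: int, decimation: int) -> dict:
    """Derive resolution, max distance and cell count from the waveform."""
    res_m = round(range_per_bin_m(decimation))
    return {
        "range_resolution_m": res_m,
        "max_distance_m": n_range_bins * res_m,
        "num_cells": n_range_bins * DOPPLER_CELLS,
    }


old = defaults(n_range_bins=64, decimation=16)   # values before this commit
new = defaults(n_range_bins=512, decimation=4)   # values after this commit
print(old)
print(new)
```

Under these assumptions the old settings give 24 m / 1536 m / 2048 cells and the new ones give 6 m / 3072 m / 16384 cells, which matches the numbers in the message.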
2026-06-11 07:51:17 +00:00 · 2026-04-23 05:56:52 +05:45
parent 27c9c22ad2
commit 9d1eb4b11c
19 changed files with 752 additions and 635 deletions
@@ -77,20 +77,24 @@ void USBHandler::processSettingsData(const uint8_t* data, uint32_t length) {
    DIAG("USB", "  settings buffer: +%lu bytes, total=%lu/%u", (unsigned long)bytes_to_copy, (unsigned long)buffer_index, MAX_BUFFER_SIZE);
    
    // Check if we have a complete settings packet (contains "SET" and "END")
-    if (buffer_index >= 74) {  // Minimum size for valid settings packet
+    // Minimum valid packet is "SET" + 9 doubles + 1 uint32 + "END" = 82 bytes
+    // (matches RadarSettings::parseFromUSB length check).
+    if (buffer_index >= 82) {
        // Look for "SET" at beginning and "END" somewhere in the packet
        bool has_set = (memcmp(usb_buffer, "SET", 3) == 0);
        bool has_end = false;
+        uint32_t packet_len = 0;

        DIAG_BOOL("USB", "  packet starts with SET", has_set);

        for (uint32_t i = 3; i <= buffer_index - 3; i++) {
            if (memcmp(usb_buffer + i, "END", 3) == 0) {
                has_end = true;
-                DIAG("USB", "  END marker found at offset %lu, packet_len=%lu", (unsigned long)i, (unsigned long)(i + 3));
+                packet_len = i + 3;
+                DIAG("USB", "  END marker found at offset %lu, packet_len=%lu", (unsigned long)i, (unsigned long)packet_len);

                // Parse the complete packet up to "END"
-                if (has_set && current_settings.parseFromUSB(usb_buffer, i + 3)) {
+                if (has_set && current_settings.parseFromUSB(usb_buffer, packet_len)) {
                    current_state = USBState::READY_FOR_DATA;
                    DIAG("USB", "  Settings parsed OK, state -> READY_FOR_DATA");
                } else {
@@ -100,10 +104,18 @@ void USBHandler::processSettingsData(const uint8_t* data, uint32_t length) {
            }
        }

-        // If we didn't find a valid packet but buffer is full, reset
-        if (buffer_index >= MAX_BUFFER_SIZE && !has_end) {
+        // [MCU-N9 FIX] Drain the consumed packet bytes (or false-positive END)
+        // so a parse failure doesn't leave the buffer stuck on the same bytes
+        // until MAX_BUFFER_SIZE overflow. Slide any remaining bytes left.
+        if (has_end && packet_len > 0) {
+            uint32_t remaining = buffer_index - packet_len;
+            if (remaining > 0) {
+                memmove(usb_buffer, usb_buffer + packet_len, remaining);
+            }
+            buffer_index = remaining;
+        } else if (buffer_index >= MAX_BUFFER_SIZE) {
            DIAG_WARN("USB", "  Buffer full (%u) without END marker -- resetting", MAX_BUFFER_SIZE);
-            buffer_index = 0;  // Reset buffer to avoid overflow
+            buffer_index = 0;
        }
    }
 }
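
The drain behaviour introduced in the hunk above can be modelled on the host side. This is a minimal Python sketch, not the firmware: `drain_after_parse` and the `parse` callback are hypothetical stand-ins for `processSettingsData` and `RadarSettings::parseFromUSB`, and it assumes the search stops at the first "END" found at offset 3 or later (the loop exit is not visible in the hunk).

```python
MIN_PACKET_LEN = 82      # "SET" + 9 doubles (72) + 1 uint32 (4) + "END"
MAX_BUFFER_SIZE = 256    # overflow threshold mentioned in the commit message


def drain_after_parse(buf: bytearray, parse) -> bool:
    """Try to parse one settings packet, then drop the bytes it covered.

    Returns True only if a packet starting with "SET" parsed successfully.
    The consumed span is removed even when parsing fails, so a false END
    match cannot pin the buffer on the same bytes.
    """
    if len(buf) < MIN_PACKET_LEN:
        return False                       # not enough data to dispatch yet

    has_set = buf[:3] == b"SET"
    end = buf.find(b"END", 3)              # first END at offset >= 3
    if end >= 0:
        packet_len = end + 3
        ok = has_set and parse(bytes(buf[:packet_len]))
        del buf[:packet_len]               # slide remaining bytes left
        return bool(ok)

    if len(buf) >= MAX_BUFFER_SIZE:
        buf.clear()                        # full buffer without END: reset
    return False


def length_check(packet: bytes) -> bool:
    """Hypothetical parser: accepts any packet of at least 82 bytes."""
    return len(packet) >= MIN_PACKET_LEN


good = bytearray(b"SET" + bytes(76) + b"END" + b"SETxx")
print(drain_after_parse(good, length_check), bytes(good))

false_end = bytearray(b"SET" + b"END" + bytes(80))
print(drain_after_parse(false_end, length_check), len(false_end))
```

In the second call the early "END" yields a 6-byte candidate that fails the length check; the model discards those 6 bytes and keeps the remaining 80, which is the behaviour the fix is meant to provide.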
@@ -685,12 +685,22 @@ SystemError_t checkSystemHealth(void) {
    }

    // 3. Check ADAR1000 Communication and Temperature
+    // [MCU-N7 FIX] verifyDeviceCommunication() writes the SCRATCHPAD register
+    // and HAL_Delay(1) per device. Across 4 devices that is >=4 ms of
+    // blocking SPI per main-loop iteration (chirp jitter source). Rate-limit
+    // the comm check to every 2 s (matches the clock-check pattern at line
+    // 658). readTemperature() is single-register SPI read with no HAL_Delay,
+    // so it stays per-loop to keep PA over-temperature detection responsive.
+    static uint32_t last_adar_comm_check = 0;
+    bool run_comm_check = (HAL_GetTick() - last_adar_comm_check > 2000);
    for (int i = 0; i < 4; i++) {
+        if (run_comm_check) {
            if (!adarManager.verifyDeviceCommunication(i)) {
                current_error = ERROR_ADAR1000_COMM;
                DIAG_ERR("BF", "Health check: ADAR1000 #%d comm FAILED", i);
                return current_error;
            }
+        }

        float temp = adarManager.readTemperature(i);
        if (temp > 85.0f) {
@@ -699,6 +709,9 @@ SystemError_t checkSystemHealth(void) {
            return current_error;
        }
    }
+    if (run_comm_check) {
+        last_adar_comm_check = HAL_GetTick();
+    }

    // 4. Check IMU Communication
    static uint32_t last_imu_check = 0;
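
The tick-based rate limit in the hunk above depends on unsigned 32-bit subtraction, which stays correct when the millisecond counter wraps. The following Python model is a sketch under assumptions: `RateLimiter` and `elapsed_ms` are hypothetical names, and the model updates its timestamp immediately, whereas the firmware updates `last_adar_comm_check` only after the device loop completes without an early return.

```python
TICK_MASK = 0xFFFFFFFF  # models a 32-bit millisecond tick counter


def elapsed_ms(now: int, last: int) -> int:
    """Unsigned 32-bit difference; valid across a single counter wrap."""
    return (now - last) & TICK_MASK


class RateLimiter:
    """Reports 'due' at most once per period, like the 2 s comm check."""

    def __init__(self, period_ms: int) -> None:
        self.period_ms = period_ms
        self.last = 0                      # mirrors the static initialised to 0

    def due(self, now: int) -> bool:
        if elapsed_ms(now, self.last) > self.period_ms:
            self.last = now
            return True
        return False


limiter = RateLimiter(period_ms=2000)
for tick in (1000, 2001, 3000, 4002):
    print(tick, limiter.due(tick))
```

One consequence visible in the model: because the timestamp starts at 0, the first check does not run until the tick counter exceeds 2000 ms.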
@@ -949,10 +962,19 @@ bool checkSystemHealthStatus(void) {
        DIAG_ERR("SYS", "checkSystemHealthStatus: error detected (code %d), calling handleSystemError()", error);
        handleSystemError(error);

-        // If we're in emergency state or too many errors, shutdown
+        // If we're in emergency state or too many errors, shutdown.
+        // [MCU-N1 FIX] Latch system_emergency_state=true on the error_count>10
+        // path too — otherwise the SAFE-MODE blink loop in main() exits in one
+        // pass (its predicate is `while(system_emergency_state)`) and the main
+        // loop continues running with PA rails already cut by
+        // systemPowerDownSequence(), still toggling new_chirp via PD8.
        if (system_emergency_state || error_count > 10) {
-            DIAG_ERR("SYS", "checkSystemHealthStatus returning FALSE (emergency=%s error_count=%lu)",
-                     system_emergency_state ? "true" : "false", error_count);
+            if (!system_emergency_state) {
+                system_emergency_state = true;
+                DIAG_ERR("SYS", "Latching system_emergency_state due to error_count > 10");
+            }
+            DIAG_ERR("SYS", "checkSystemHealthStatus returning FALSE (emergency=true error_count=%lu)",
+                     error_count);
            return false;
        }
    }
@@ -1834,6 +1856,24 @@ int main(void)
   * the MCU at boot indefinitely.  The USB settings handshake (if ever
   * re-enabled) should be handled non-blocking in the main loop. */

+  /***************************************************************/
+  /************ FPGA reset (BEFORE PA Vdd enable) ****************/
+  /***************************************************************/
+  /* [MCU-N2/N11 FIX] Reset FPGA early — before any PA-rail enables —
+   * so `adar_tr_x` is driven LOW (RX commanded externally) when the PA Vdd
+   * rail later comes up to 22 V. Without this, PA could be energised while
+   * the FPGA is still in its implicit reset and `adar_tr_x` is undefined,
+   * with the ADAR1000 already commanded to TX (TR_SOURCE=1) — a glitch
+   * could key the PA into an undefined antenna load. Kept outside the
+   * `if (PowerAmplifier)` block so the FPGA always boots cleanly even when
+   * the PA path is disabled for bench testing. TX mixer enable (PD11) is
+   * still LOW (set by MX_GPIO_Init), so no chirps fire. */
+  DIAG("FPGA", "Resetting FPGA (GPIOD pin 12: LOW -> 10ms -> HIGH)");
+  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_12, GPIO_PIN_RESET);
+  HAL_Delay(10);
+  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_12, GPIO_PIN_SET);
+  DIAG("FPGA", "FPGA reset complete -- adar_tr_x driven LOW (RX commanded)");
+
  /***************************************************************/
  /************RF Power Amplifier Powering up sequence************/
  /***************************************************************/
@@ -1891,6 +1931,10 @@ int main(void)
 	  HAL_GPIO_WritePin(DAC_2_VG_LDAC_GPIO_Port, DAC_2_VG_LDAC_Pin, GPIO_PIN_SET);

 	  //Enable RF Power Amplifier VDD = 22V
+	  /* [MCU-N2/N11] FPGA has already been reset earlier (before this PA block)
+	   * so `adar_tr_x` is now driven LOW (RX commanded). Safe to bring PA Vdd
+	   * up to 22 V here. TX mixer enable (PD11) is still LOW until later,
+	   * gating any FPGA-driven chirps. */
 	  DIAG("PA", "Enabling RFPA VDD=22V (EN_DIS_RFPA_VDD HIGH)");
 	  HAL_GPIO_WritePin(EN_DIS_RFPA_VDD_GPIO_Port, EN_DIS_RFPA_VDD_Pin, GPIO_PIN_SET);

@@ -1971,12 +2015,10 @@ int main(void)
 	  DIAG("PA", "PA IDQ calibration sequence COMPLETE");
  }

-  //RESET FPGA
-  DIAG("FPGA", "Resetting FPGA (GPIOD pin 12: LOW -> 10ms -> HIGH)");
-  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_12, GPIO_PIN_RESET);
-  HAL_Delay(10);
-  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_12, GPIO_PIN_SET);
-  DIAG("FPGA", "FPGA reset complete");
+  /* [MCU-N2/N11] FPGA was already reset earlier in the boot sequence,
+   * before PA Vdd was energised, to avoid an undefined `adar_tr_x` window.
+   * No further reset needed here. Leaving the comment so future readers
+   * understand why this block looks like it should be present. */



@@ -2730,7 +2772,7 @@ static void MX_GPIO_Init(void)
                          |EN_P_3V3_VDD_SW_Pin, GPIO_PIN_RESET);

  /*Configure GPIO pin Output Level */
-  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_8|GPIO_PIN_9|GPIO_PIN_10|GPIO_PIN_11
+  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_8|GPIO_PIN_9|GPIO_PIN_10|GPIO_PIN_11|GPIO_PIN_12
                          |STEPPER_CW_P_Pin|STEPPER_CLK_P_Pin|EN_DIS_RFPA_VDD_Pin|EN_DIS_COOLING_Pin, GPIO_PIN_RESET);

  /*Configure GPIO pin Output Level */
@@ -61,8 +61,11 @@

 `include "radar_params.vh"

+// [RX-D FIX] NUM_RANGE_BINS and range_bin port widths now scale with
+// `RP_MAX_OUTPUT_BINS / `RP_RANGE_BIN_WIDTH_MAX (50T: 512/9, 200T: 4096/12).
+// CFAR magnitude BRAM depth uses `RP_CFAR_MAG_DEPTH which already scales.
 module cfar_ca #(
-    parameter NUM_RANGE_BINS   = `RP_NUM_RANGE_BINS,    // 512
+    parameter NUM_RANGE_BINS   = `RP_MAX_OUTPUT_BINS,   // 512 (50T) / 4096 (200T)
    parameter NUM_DOPPLER_BINS = `RP_NUM_DOPPLER_BINS,  // 32
    parameter MAG_WIDTH        = 17,
    parameter ALPHA_WIDTH      = 8,
@@ -76,7 +79,7 @@ module cfar_ca #(
    input wire [31:0] doppler_data,
    input wire        doppler_valid,
    input wire [4:0]  doppler_bin_in,
-    input wire [`RP_RANGE_BIN_BITS-1:0] range_bin_in,  // 9-bit
+    input wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin_in,  // 9-bit (50T) / 12-bit (200T)
    input wire        frame_complete,

    // ========== CONFIGURATION ==========
@@ -90,7 +93,7 @@ module cfar_ca #(
    // ========== DETECTION OUTPUTS ==========
    output reg        detect_flag,
    output reg        detect_valid,
-    output reg [`RP_RANGE_BIN_BITS-1:0] detect_range,  // 9-bit
+    output reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] detect_range,  // 9-bit (50T) / 12-bit (200T)
    output reg [4:0]  detect_doppler,
    output reg [MAG_WIDTH-1:0] detect_magnitude,
    output reg [MAG_WIDTH-1:0] detect_threshold,
@@ -105,10 +108,10 @@ module cfar_ca #(
 // INTERNAL PARAMETERS
 // ============================================================================
 localparam TOTAL_CELLS = NUM_RANGE_BINS * NUM_DOPPLER_BINS;
-localparam ADDR_WIDTH  = `RP_CFAR_MAG_ADDR_W;  // 14
+localparam ADDR_WIDTH  = `RP_CFAR_MAG_ADDR_W;          // 14 (50T) / 17 (200T)
 localparam COL_BITS    = 5;
-localparam ROW_BITS    = `RP_RANGE_BIN_BITS;    // 9
-localparam SUM_WIDTH   = MAG_WIDTH + ROW_BITS;  // 26 bits: sum of up to 512 magnitudes
+localparam ROW_BITS    = `RP_RANGE_BIN_WIDTH_MAX;      // 9 (50T) / 12 (200T)
+localparam SUM_WIDTH   = MAG_WIDTH + ROW_BITS;         // 26 (50T) / 29 (200T)
 localparam PROD_WIDTH  = SUM_WIDTH + ALPHA_WIDTH;  // 34 bits
 localparam ALPHA_FRAC_BITS = 4;  // Q4.4

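
The width comments in the cfar_ca hunk above can be recomputed from the bin counts. This sketch assumes that `RP_RANGE_BIN_WIDTH_MAX` equals ceil(log2(RP_MAX_OUTPUT_BINS)) and that `RP_CFAR_MAG_ADDR_W` equals ceil(log2(range bins x Doppler bins)); both are consistent with the comments in the diff, but radar_params.vh is not shown here, so treat them as assumptions. `clog2` and `cfar_widths` are hypothetical helper names.

```python
MAG_WIDTH = 17          # from the cfar_ca parameter list
ALPHA_WIDTH = 8         # from the cfar_ca parameter list
NUM_DOPPLER_BINS = 32   # from the cfar_ca parameter list


def clog2(n: int) -> int:
    """Integer ceil(log2(n)) for n >= 1, like Verilog's $clog2."""
    return (n - 1).bit_length()


def cfar_widths(num_range_bins: int) -> dict:
    """Recompute the derived localparams for a given range-bin count."""
    row_bits = clog2(num_range_bins)
    sum_width = MAG_WIDTH + row_bits
    return {
        "ROW_BITS": row_bits,
        "ADDR_WIDTH": clog2(num_range_bins * NUM_DOPPLER_BINS),
        "SUM_WIDTH": sum_width,
        "PROD_WIDTH": sum_width + ALPHA_WIDTH,
    }


print("50T ", cfar_widths(512))
print("200T", cfar_widths(4096))
```

With these assumptions PROD_WIDTH works out to 34 bits on the 50T and 37 bits on the 200T, so the unchanged "34 bits" comment on that localparam describes the 50T case only.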
@@ -35,21 +35,17 @@
 `include "radar_params.vh"

 // ----------------------------------------------------------------------------
-// !!! 200T 20 km MODE BROKEN — FIX BEFORE 200T BRING-UP !!!
-// RANGE_BINS and the range_bin output port default to `RP_NUM_RANGE_BINS
-// (512) / `RP_RANGE_BIN_BITS (9). In 20 km mode the upstream pipeline
-// emits `RP_OUTPUT_RANGE_BINS_20KM = 4096 bins/chirp, which the internal
-// range-bin BRAMs and address counters here cannot represent — bins
-// 512..4095 alias onto bins 0..511 and the Doppler FFT collects a
-// scrambled slow-time vector per aliased range cell.
-// Latent on XC7A50T (SUPPORT_LONG_RANGE undefined → 3 km only); will
-// corrupt all 20 km output on XC7A200T. Before 200T bring-up: scale
-// RANGE_BINS with `RP_MAX_OUTPUT_BINS, widen range_bin, and resize the
-// per-range chirp buffers, or route 20 km mode around this block.
+// [RX-D FIX] RANGE_BINS and range_bin port now scale with `RP_MAX_OUTPUT_BINS
+// and `RP_RANGE_BIN_WIDTH_MAX (auto-conditional on SUPPORT_LONG_RANGE).
+//   50T  (no SUPPORT_LONG_RANGE): 512 bins / 9-bit  — 3 km only
+//   200T (SUPPORT_LONG_RANGE):    4096 bins / 12-bit — 3 km and 20 km
+// In 3 km mode the upstream produces 512 bins (uses bins 0..511 only on 200T).
+// In 20 km mode the upstream produces 4096 bins, which the BRAMs and counters
+// can now represent without aliasing.
 // ----------------------------------------------------------------------------
 module doppler_processor_optimized #(
    parameter DOPPLER_FFT_SIZE   = `RP_DOPPLER_FFT_SIZE,    // 16
-    parameter RANGE_BINS         = `RP_NUM_RANGE_BINS,      // 512
+    parameter RANGE_BINS         = `RP_MAX_OUTPUT_BINS,     // 512 (50T) / 4096 (200T)
    parameter CHIRPS_PER_FRAME   = `RP_CHIRPS_PER_FRAME,    // 32
    parameter CHIRPS_PER_SUBFRAME = `RP_CHIRPS_PER_SUBFRAME, // 16
    parameter WINDOW_TYPE        = 0,      // 0=Hamming, 1=Rectangular
@@ -63,7 +59,7 @@ module doppler_processor_optimized #(
    output reg [31:0] doppler_output,
    output reg doppler_valid,
    output reg [4:0] doppler_bin,      // {sub_frame, bin[3:0]}
-    output reg [`RP_RANGE_BIN_BITS-1:0] range_bin,  // 9-bit
+    output reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin,  // 9-bit (50T) / 12-bit (200T)
    output reg sub_frame,              // 0=long PRI, 1=short PRI
    output wire processing_active,
    output wire frame_complete,
@@ -74,9 +70,9 @@ module doppler_processor_optimized #(
    output wire [2:0]  fv_state,
    output wire [`RP_DOPPLER_MEM_ADDR_W-1:0] fv_mem_write_addr,
    output wire [`RP_DOPPLER_MEM_ADDR_W-1:0] fv_mem_read_addr,
-    output wire [`RP_RANGE_BIN_BITS-1:0]     fv_write_range_bin,
+    output wire [`RP_RANGE_BIN_WIDTH_MAX-1:0]     fv_write_range_bin,
    output wire [4:0]  fv_write_chirp_index,
-    output wire [`RP_RANGE_BIN_BITS-1:0]     fv_read_range_bin,
+    output wire [`RP_RANGE_BIN_WIDTH_MAX-1:0]     fv_read_range_bin,
    output wire [4:0]  fv_read_doppler_index,
    output wire [9:0]  fv_processing_timeout,
    output wire        fv_frame_buffer_full,
@@ -130,9 +126,9 @@ localparam MEM_DEPTH = RANGE_BINS * CHIRPS_PER_FRAME;
 // ==============================================
 // Control Registers
 // ==============================================
-reg [`RP_RANGE_BIN_BITS-1:0] write_range_bin;
+reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] write_range_bin;
 reg [4:0] write_chirp_index;
-reg [`RP_RANGE_BIN_BITS-1:0] read_range_bin;
+reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] read_range_bin;
 reg [4:0] read_doppler_index;
 reg frame_buffer_full;
 reg [9:0] chirps_received;
@@ -248,8 +248,13 @@ always @(posedge clk or negedge reset_n) begin
                    // Store in buffer via BRAM write port
                    buf_we <= 1;
                    buf_waddr <= buffer_write_ptr[10:0];
-                    buf_wdata_i <= ddc_i[17:2] + ddc_i[1];
-                    buf_wdata_q <= ddc_q[17:2] + ddc_q[1];
+                    // [RX-A FIX] ddc_i = {{2{gc_i[15]}}, gc_i} — top 2 bits are
+                    // sign-extension. The previous `ddc_i[17:2] + ddc_i[1]`
+                    // was a gratuitous /4 scaling (~12 dB dynamic-range loss).
+                    // fft_engine has INTERNAL_W=32 with saturating 16-bit output,
+                    // so full 16-bit input is safe (no bit-growth overflow risk).
+                    buf_wdata_i <= ddc_i[15:0];
+                    buf_wdata_q <= ddc_q[15:0];
                    
                    buffer_write_ptr <= buffer_write_ptr + 1;
                    chirp_samples_collected <= chirp_samples_collected + 1;
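
The effect of the bit-slice change in the hunk above can be reproduced with integer arithmetic. The sketch below models an 18-bit `ddc_i` built by sign-extending a 16-bit value, as the comment in the diff describes; `old_path`, `new_path` and `to_signed16` are hypothetical helper names, and the model does not cover anything downstream of the buffer write.

```python
from math import log10


def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as two's complement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def old_path(gc: int) -> int:
    """ddc_i[17:2] + ddc_i[1]: divide by 4 with round-half-up."""
    ddc = gc & 0x3FFFF                     # 18-bit sign-extended sample
    return to_signed16((ddc >> 2) + ((ddc >> 1) & 1))


def new_path(gc: int) -> int:
    """ddc_i[15:0]: the original 16-bit sample, unchanged."""
    return to_signed16(gc & 0x3FFFF)


for sample in (1000, -1000, 3, 32767):
    print(sample, old_path(sample), new_path(sample))

# Amplitude ratio of 4 expressed in dB
print(round(20 * log10(4), 2))
```

A factor of 4 in amplitude corresponds to about 12.04 dB, which is where the "~12 dB" figure in the comment comes from.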
@@ -6,10 +6,14 @@
 * Pulse compression processing chain for AERIS-10 FMCW radar.
 * Implements: FFT(signal) → FFT(reference) → Conjugate multiply → IFFT
 *
- * This is a SIMULATION-COMPATIBLE implementation that replaces the Xilinx
- * FFT IP cores (FFT_enhanced) with behavioral Radix-2 DIT FFT engines.
- * For synthesis, replace the behavioral FFT instances with the actual
- * Xilinx xfft IP blocks.
+ * Uses the in-house fft_engine.v (Radix-2 DIT, BRAM-backed) instantiated
+ * once and reused 3 times per frame, plus frequency_matched_filter.v for
+ * the pipelined conjugate multiply. Same code path runs in iverilog
+ * simulation and Vivado synthesis.
+ *
+ * (An earlier `ifdef SIMULATION inline behavioural FFT was removed in
+ *  RX-NEW-1 fix 2026-04-23 — it produced wrong-bin peaks and weak
+ *  magnitudes that masked real correctness checks. See git history.)
 *
 * Interface contract (from matched_filter_multi_segment.v line 361):
 *   .clk, .reset_n
@@ -64,475 +68,8 @@ module matched_filter_processing_chain (
    output wire [3:0] chain_state
 );

-`ifdef SIMULATION
 // ============================================================================
-// PARAMETERS
-// ============================================================================
-localparam FFT_SIZE   = `RP_FFT_SIZE;    // 2048
-localparam ADDR_BITS  = `RP_LOG2_FFT_SIZE; // log2(2048) = 11
-
-// State encoding (4-bit, up to 16 states)
-localparam [3:0] ST_IDLE           = 4'd0;
-localparam [3:0] ST_FWD_FFT        = 4'd1;   // Collect samples + bit-reverse
-localparam [3:0] ST_FWD_BUTTERFLY  = 4'd2;   // Signal FFT butterflies
-localparam [3:0] ST_REF_BITREV     = 4'd3;   // Bit-reverse copy reference
-localparam [3:0] ST_REF_BUTTERFLY  = 4'd4;   // Reference FFT butterflies
-localparam [3:0] ST_MULTIPLY       = 4'd5;   // Conjugate multiply
-localparam [3:0] ST_INV_BITREV     = 4'd6;   // Bit-reverse copy product
-localparam [3:0] ST_INV_BUTTERFLY  = 4'd7;   // IFFT butterflies + scale
-localparam [3:0] ST_OUTPUT         = 4'd8;   // Stream results
-localparam [3:0] ST_DONE           = 4'd9;   // Return to idle
-
-reg [3:0] state;
-
-// ============================================================================
-// SIGNAL BUFFERS
-// ============================================================================
-// Input sample counter
-reg [ADDR_BITS:0] fwd_in_count;     // 0..FFT_SIZE
-reg fwd_frame_done;                  // All FFT_SIZE samples received
-
-// Signal time-domain buffer
-reg signed [15:0] fwd_buf_i [0:FFT_SIZE-1];
-reg signed [15:0] fwd_buf_q [0:FFT_SIZE-1];
-
-// Signal FFT output (frequency domain)
-reg signed [15:0] fwd_out_i [0:FFT_SIZE-1];
-reg signed [15:0] fwd_out_q [0:FFT_SIZE-1];
-reg fwd_out_valid;
-
-// Reference time-domain buffer
-reg signed [15:0] ref_buf_i [0:FFT_SIZE-1];
-reg signed [15:0] ref_buf_q [0:FFT_SIZE-1];
-
-// Reference FFT output (frequency domain)
-reg signed [15:0] ref_fft_i [0:FFT_SIZE-1];
-reg signed [15:0] ref_fft_q [0:FFT_SIZE-1];
-
-// ============================================================================
-// CONJUGATE MULTIPLY OUTPUT
-// ============================================================================
-reg signed [15:0] mult_out_i [0:FFT_SIZE-1];
-reg signed [15:0] mult_out_q [0:FFT_SIZE-1];
-reg mult_done;
-
-// ============================================================================
-// INVERSE FFT OUTPUT
-// ============================================================================
-reg signed [15:0] ifft_out_i [0:FFT_SIZE-1];
-reg signed [15:0] ifft_out_q [0:FFT_SIZE-1];
-reg ifft_done;
-
-// Output streaming
-reg [ADDR_BITS:0] out_count;
-reg out_valid_reg;
-reg signed [15:0] out_i_reg, out_q_reg;
-
-// ============================================================================
-// BEHAVIORAL RADIX-2 DIT FFT (simulation only)
-// ============================================================================
-// Working arrays for FFT computation (shared between fwd, ref, and inv FFTs)
-reg signed [31:0] work_re [0:FFT_SIZE-1];
-reg signed [31:0] work_im [0:FFT_SIZE-1];
-
-// Bit-reverse function
-function [ADDR_BITS-1:0] bit_reverse;
-    input [ADDR_BITS-1:0] val;
-    integer b;
-    begin
-        bit_reverse = 0;
-        for (b = 0; b < ADDR_BITS; b = b + 1)
-            bit_reverse[ADDR_BITS-1-b] = val[b];
-    end
-endfunction
-
-// FFT computation variables
-integer fft_stage, fft_k, fft_j, fft_half, fft_span;
-integer fft_idx_even, fft_idx_odd;
-reg signed [31:0] tw_re, tw_im;
-reg signed [31:0] t_re, t_im;
-reg signed [31:0] u_re, u_im;
-real tw_angle;
-
-// ============================================================================
-// MAIN STATE MACHINE
-// ============================================================================
-integer i;
-
-always @(posedge clk or negedge reset_n) begin
-    if (!reset_n) begin
-        state          <= ST_IDLE;
-        fwd_in_count   <= 0;
-        fwd_frame_done <= 0;
-        fwd_out_valid  <= 0;
-        mult_done      <= 0;
-        ifft_done      <= 0;
-        out_count      <= 0;
-        out_valid_reg  <= 0;
-        out_i_reg      <= 16'd0;
-        out_q_reg      <= 16'd0;
-    end else begin
-        // Defaults
-        out_valid_reg <= 1'b0;
-
-        case (state)
-        // ================================================================
-        // IDLE: Wait for valid ADC data, start collecting 2048 samples
-        // ================================================================
-        ST_IDLE: begin
-            fwd_in_count   <= 0;
-            fwd_frame_done <= 0;
-            fwd_out_valid  <= 0;
-            mult_done      <= 0;
-            ifft_done      <= 0;
-            out_count      <= 0;
-
-            if (adc_valid) begin
-                // Store first sample (signal + reference)
-                fwd_buf_i[0] <= $signed(adc_data_i);
-                fwd_buf_q[0] <= $signed(adc_data_q);
-                ref_buf_i[0] <= $signed(ref_chirp_real);
-                ref_buf_q[0] <= $signed(ref_chirp_imag);
-                fwd_in_count <= 1;
-                state        <= ST_FWD_FFT;
-            end
-        end
-
-        // ================================================================
-        // FWD_FFT: Collect remaining samples, then bit-reverse copy signal
-        // (2048 samples total)
-        // ================================================================
-        ST_FWD_FFT: begin
-            if (!fwd_frame_done) begin
-                // Still collecting samples
-                if (adc_valid && fwd_in_count < FFT_SIZE) begin
-                    fwd_buf_i[fwd_in_count] <= $signed(adc_data_i);
-                    fwd_buf_q[fwd_in_count] <= $signed(adc_data_q);
-                    ref_buf_i[fwd_in_count] <= $signed(ref_chirp_real);
-                    ref_buf_q[fwd_in_count] <= $signed(ref_chirp_imag);
-                    fwd_in_count <= fwd_in_count + 1;
-                end
-
-                if (fwd_in_count == FFT_SIZE) begin
-                    fwd_frame_done <= 1;
-
-                    // Bit-reverse copy SIGNAL into work arrays (via <=)
-                    for (i = 0; i < FFT_SIZE; i = i + 1) begin
-                        work_re[bit_reverse(i[ADDR_BITS-1:0])] <= {{16{fwd_buf_i[i][15]}}, fwd_buf_i[i]};
-                        work_im[bit_reverse(i[ADDR_BITS-1:0])] <= {{16{fwd_buf_q[i][15]}}, fwd_buf_q[i]};
-                    end
-                end
-            end else begin
-                // Bit-reverse copy settled on previous clock.
-                // Now transition to butterfly computation.
-                state <= ST_FWD_BUTTERFLY;
-            end
-        end
-
-        // ================================================================
-        // FWD_BUTTERFLY: Forward FFT of signal (all stages, simulation only)
-        // ================================================================
-        ST_FWD_BUTTERFLY: begin
-            // In-place radix-2 DIT butterflies (blocking assignments)
-            for (fft_stage = 0; fft_stage < ADDR_BITS; fft_stage = fft_stage + 1) begin
-                fft_half = 1 << fft_stage;
-                fft_span = fft_half << 1;
-                for (fft_k = 0; fft_k < FFT_SIZE; fft_k = fft_k + fft_span) begin
-                    for (fft_j = 0; fft_j < fft_half; fft_j = fft_j + 1) begin
-                        fft_idx_even = fft_k + fft_j;
-                        fft_idx_odd  = fft_idx_even + fft_half;
-
-                        tw_angle = -2.0 * 3.14159265358979 * fft_j / (fft_span * 1.0);
-                        tw_re = $rtoi($cos(tw_angle) * 32767.0);
-                        tw_im = $rtoi($sin(tw_angle) * 32767.0);
-
-                        t_re = (work_re[fft_idx_odd] * tw_re - work_im[fft_idx_odd] * tw_im) >>> 15;
-                        t_im = (work_re[fft_idx_odd] * tw_im + work_im[fft_idx_odd] * tw_re) >>> 15;
-
-                        u_re = work_re[fft_idx_even];
-                        u_im = work_im[fft_idx_even];
-
-                        work_re[fft_idx_even] = u_re + t_re;
-                        work_im[fft_idx_even] = u_im + t_im;
-                        work_re[fft_idx_odd]  = u_re - t_re;
-                        work_im[fft_idx_odd]  = u_im - t_im;
-                    end
-                end
-            end
-
-            // Copy signal FFT results to fwd_out (saturate to 16-bit)
-            for (i = 0; i < FFT_SIZE; i = i + 1) begin
-                if (work_re[i] > 32767)
-                    fwd_out_i[i] <= 16'sh7FFF;
-                else if (work_re[i] < -32768)
-                    fwd_out_i[i] <= 16'sh8000;
-                else
-                    fwd_out_i[i] <= work_re[i][15:0];
-
-                if (work_im[i] > 32767)
-                    fwd_out_q[i] <= 16'sh7FFF;
-                else if (work_im[i] < -32768)
-                    fwd_out_q[i] <= 16'sh8000;
-                else
-                    fwd_out_q[i] <= work_im[i][15:0];
-            end
-
-            fwd_out_valid <= 1;
-            state <= ST_REF_BITREV;
-
-            `ifdef SIMULATION
-            $display("[MF_CHAIN] Forward FFT complete");
-            `endif
-        end
-
-        // ================================================================
-        // REF_BITREV: Bit-reverse copy reference into work arrays
-        // ================================================================
-        ST_REF_BITREV: begin
-            for (i = 0; i < FFT_SIZE; i = i + 1) begin
-                work_re[bit_reverse(i[ADDR_BITS-1:0])] <= {{16{ref_buf_i[i][15]}}, ref_buf_i[i]};
-                work_im[bit_reverse(i[ADDR_BITS-1:0])] <= {{16{ref_buf_q[i][15]}}, ref_buf_q[i]};
-            end
-            state <= ST_REF_BUTTERFLY;
-        end
-
-        // ================================================================
-        // REF_BUTTERFLY: Forward FFT of reference (same algorithm as signal)
-        // ================================================================
-        ST_REF_BUTTERFLY: begin
-            for (fft_stage = 0; fft_stage < ADDR_BITS; fft_stage = fft_stage + 1) begin
-                fft_half = 1 << fft_stage;
-                fft_span = fft_half << 1;
-                for (fft_k = 0; fft_k < FFT_SIZE; fft_k = fft_k + fft_span) begin
-                    for (fft_j = 0; fft_j < fft_half; fft_j = fft_j + 1) begin
-                        fft_idx_even = fft_k + fft_j;
-                        fft_idx_odd  = fft_idx_even + fft_half;
-
-                        tw_angle = -2.0 * 3.14159265358979 * fft_j / (fft_span * 1.0);
-                        tw_re = $rtoi($cos(tw_angle) * 32767.0);
-                        tw_im = $rtoi($sin(tw_angle) * 32767.0);
-
-                        t_re = (work_re[fft_idx_odd] * tw_re - work_im[fft_idx_odd] * tw_im) >>> 15;
-                        t_im = (work_re[fft_idx_odd] * tw_im + work_im[fft_idx_odd] * tw_re) >>> 15;
-
-                        u_re = work_re[fft_idx_even];
-                        u_im = work_im[fft_idx_even];
-
-                        work_re[fft_idx_even] = u_re + t_re;
-                        work_im[fft_idx_even] = u_im + t_im;
-                        work_re[fft_idx_odd]  = u_re - t_re;
-                        work_im[fft_idx_odd]  = u_im - t_im;
-                    end
-                end
-            end
-
-            // Copy reference FFT results to ref_fft (saturate to 16-bit)
-            for (i = 0; i < FFT_SIZE; i = i + 1) begin
-                if (work_re[i] > 32767)
-                    ref_fft_i[i] <= 16'sh7FFF;
-                else if (work_re[i] < -32768)
-                    ref_fft_i[i] <= 16'sh8000;
-                else
-                    ref_fft_i[i] <= work_re[i][15:0];
-
-                if (work_im[i] > 32767)
-                    ref_fft_q[i] <= 16'sh7FFF;
-                else if (work_im[i] < -32768)
-                    ref_fft_q[i] <= 16'sh8000;
-                else
-                    ref_fft_q[i] <= work_im[i][15:0];
-            end
-
-            state <= ST_MULTIPLY;
-
-            `ifdef SIMULATION
-            $display("[MF_CHAIN] Reference FFT complete");
-            `endif
-        end
-
-        // ================================================================
-        // MULTIPLY: Conjugate multiply FFT(signal) x conj(FFT(reference))
-        // (a+jb)(c-jd) = (ac+bd) + j(bc-ad)
-        // Uses fwd_out (signal FFT) and ref_fft (reference FFT)
-        // ================================================================
-        ST_MULTIPLY: begin
-            for (i = 0; i < FFT_SIZE; i = i + 1) begin : mult_loop
-                reg signed [31:0] a, b, c, d;
-                reg signed [31:0] ac, bd, bc, ad;
-                reg signed [31:0] re_result, im_result;
-                
-                a = {{16{fwd_out_i[i][15]}}, fwd_out_i[i]};
-                b = {{16{fwd_out_q[i][15]}}, fwd_out_q[i]};
-                c = {{16{ref_fft_i[i][15]}}, ref_fft_i[i]};
-                d = {{16{ref_fft_q[i][15]}}, ref_fft_q[i]};
-
-                ac = (a * c) >>> 15;
-                bd = (b * d) >>> 15;
-                bc = (b * c) >>> 15;
-                ad = (a * d) >>> 15;
-
-                re_result = ac + bd;
-                im_result = bc - ad;
-
-                // Saturate
-                if (re_result > 32767)
-                    mult_out_i[i] <= 16'sh7FFF;
-                else if (re_result < -32768)
-                    mult_out_i[i] <= 16'sh8000;
-                else
-                    mult_out_i[i] <= re_result[15:0];
-
-                if (im_result > 32767)
-                    mult_out_q[i] <= 16'sh7FFF;
-                else if (im_result < -32768)
-                    mult_out_q[i] <= 16'sh8000;
-                else
-                    mult_out_q[i] <= im_result[15:0];
-            end
-
-            mult_done <= 1;
-            state     <= ST_INV_BITREV;
-
-            `ifdef SIMULATION
-            $display("[MF_CHAIN] Conjugate multiply complete");
-            `endif
-        end
-
-        // ================================================================
-        // INV_BITREV: Bit-reverse copy conjugate-multiply product
-        // ================================================================
-        ST_INV_BITREV: begin
-            for (i = 0; i < FFT_SIZE; i = i + 1) begin
-                work_re[bit_reverse(i[ADDR_BITS-1:0])] <= {{16{mult_out_i[i][15]}}, mult_out_i[i]};
-                work_im[bit_reverse(i[ADDR_BITS-1:0])] <= {{16{mult_out_q[i][15]}}, mult_out_q[i]};
-            end
-            state <= ST_INV_BUTTERFLY;
-        end
-
-        // ================================================================
-        // INV_BUTTERFLY: IFFT butterflies (positive twiddle) + 1/N scaling
-        // ================================================================
-        ST_INV_BUTTERFLY: begin
-            for (fft_stage = 0; fft_stage < ADDR_BITS; fft_stage = fft_stage + 1) begin
-                fft_half = 1 << fft_stage;
-                fft_span = fft_half << 1;
-                for (fft_k = 0; fft_k < FFT_SIZE; fft_k = fft_k + fft_span) begin
-                    for (fft_j = 0; fft_j < fft_half; fft_j = fft_j + 1) begin
-                        fft_idx_even = fft_k + fft_j;
-                        fft_idx_odd  = fft_idx_even + fft_half;
-
-                        // IFFT twiddle: +2*pi (positive exponent for inverse)
-                        tw_angle = +2.0 * 3.14159265358979 * fft_j / (fft_span * 1.0);
-                        tw_re = $rtoi($cos(tw_angle) * 32767.0);
-                        tw_im = $rtoi($sin(tw_angle) * 32767.0);
-
-                        t_re = (work_re[fft_idx_odd] * tw_re - work_im[fft_idx_odd] * tw_im) >>> 15;
-                        t_im = (work_re[fft_idx_odd] * tw_im + work_im[fft_idx_odd] * tw_re) >>> 15;
-
-                        u_re = work_re[fft_idx_even];
-                        u_im = work_im[fft_idx_even];
-
-                        work_re[fft_idx_even] = u_re + t_re;
-                        work_im[fft_idx_even] = u_im + t_im;
-                        work_re[fft_idx_odd]  = u_re - t_re;
-                        work_im[fft_idx_odd]  = u_im - t_im;
-                    end
-                end
-            end
-
-            // Scale by 1/N (right shift by log2(2048) = 11) and store
-            for (i = 0; i < FFT_SIZE; i = i + 1) begin : ifft_scale
-                reg signed [31:0] scaled_re, scaled_im;
-                scaled_re = work_re[i] >>> ADDR_BITS;
-                scaled_im = work_im[i] >>> ADDR_BITS;
-
-                if (scaled_re > 32767)
-                    ifft_out_i[i] <= 16'sh7FFF;
-                else if (scaled_re < -32768)
-                    ifft_out_i[i] <= 16'sh8000;
-                else
-                    ifft_out_i[i] <= scaled_re[15:0];
-
-                if (scaled_im > 32767)
-                    ifft_out_q[i] <= 16'sh7FFF;
-                else if (scaled_im < -32768)
-                    ifft_out_q[i] <= 16'sh8000;
-                else
-                    ifft_out_q[i] <= scaled_im[15:0];
-            end
-
-            ifft_done <= 1;
-            state     <= ST_OUTPUT;
-
-            `ifdef SIMULATION
-            $display("[MF_CHAIN] Inverse FFT complete — range profile ready");
-            `endif
-        end
-
-        // ================================================================
-        // OUTPUT: Stream out 2048 range profile samples, one per clock
-        // ================================================================
-        ST_OUTPUT: begin
-            if (out_count < FFT_SIZE) begin
-                out_i_reg     <= ifft_out_i[out_count];
-                out_q_reg     <= ifft_out_q[out_count];
-                out_valid_reg <= 1'b1;
-                out_count     <= out_count + 1;
-            end else begin
-                state <= ST_DONE;
-            end
-        end
-
-        // ================================================================
-        // DONE: Return to idle, ready for next frame
-        // ================================================================
-        ST_DONE: begin
-            state <= ST_IDLE;
-
-            `ifdef SIMULATION
-            $display("[MF_CHAIN] Frame complete, returning to IDLE");
-            `endif
-        end
-
-        default: state <= ST_IDLE;
-        endcase
-    end
-end
-
-// ============================================================================
-// OUTPUT ASSIGNMENTS
-// ============================================================================
-assign range_profile_i     = out_i_reg;
-assign range_profile_q     = out_q_reg;
-assign range_profile_valid = out_valid_reg;
-assign chain_state         = state;
-
-// ============================================================================
-// BUFFER INITIALIZATION (simulation)
-// ============================================================================
-integer init_idx;
-initial begin
-    for (init_idx = 0; init_idx < FFT_SIZE; init_idx = init_idx + 1) begin
-        fwd_buf_i[init_idx]  = 16'd0;
-        fwd_buf_q[init_idx]  = 16'd0;
-        fwd_out_i[init_idx]  = 16'd0;
-        fwd_out_q[init_idx]  = 16'd0;
-        ref_buf_i[init_idx]  = 16'd0;
-        ref_buf_q[init_idx]  = 16'd0;
-        ref_fft_i[init_idx]  = 16'd0;
-        ref_fft_q[init_idx]  = 16'd0;
-        mult_out_i[init_idx] = 16'd0;
-        mult_out_q[init_idx] = 16'd0;
-        ifft_out_i[init_idx] = 16'd0;
-        ifft_out_q[init_idx] = 16'd0;
-        work_re[init_idx]    = 32'd0;
-        work_im[init_idx]    = 32'd0;
-    end
-end
-
-`else
-// ============================================================================
-// SYNTHESIS IMPLEMENTATION — Radix-2 DIT FFT via fft_engine
+// IMPLEMENTATION — Radix-2 DIT FFT via fft_engine
 // ============================================================================
 // Uses a single fft_engine instance (2048-pt) reused 3 times:
 //   1. Forward FFT of signal
@@ -1245,6 +782,5 @@ initial begin
    end
 end

-`endif

 endmodule
@@ -44,20 +44,16 @@
 `include "radar_params.vh"

 // ----------------------------------------------------------------------------
-// !!! 200T 20 km MODE BROKEN — FIX BEFORE 200T BRING-UP !!!
-// The prev-chirp BRAM buffer is sized to NUM_RANGE_BINS (512) and the
-// range_bin_in port is 9 bits (`RP_RANGE_BIN_BITS). In 20 km mode the
-// upstream range_bin_decimator emits `RP_OUTPUT_RANGE_BINS_20KM = 4096
-// bins per chirp (8 segments × 512 decimated bins), which aliases into
-// the 9-bit address space and collapses bins 512..4095 onto bins 0..511.
-// On XC7A50T this is latent (SUPPORT_LONG_RANGE undefined → 3 km only),
-// but on XC7A200T with SUPPORT_LONG_RANGE the 20 km data path will
-// silently corrupt every range cell above 3 km.
-// Fix before 200T bring-up: scale NUM_RANGE_BINS/range_bin width with
-// `RP_MAX_OUTPUT_BINS, or gate MTI off entirely in 20 km mode.
+// [RX-D FIX] NUM_RANGE_BINS and range_bin port widths now scale with
+// `RP_MAX_OUTPUT_BINS and `RP_RANGE_BIN_WIDTH_MAX (conditional on
+// SUPPORT_LONG_RANGE):
+//   50T  (no SUPPORT_LONG_RANGE): 512 bins / 9-bit  — 3 km only
+//   200T (SUPPORT_LONG_RANGE):    4096 bins / 12-bit — supports 20 km mode
+// The prev-chirp BRAM buffer auto-resizes accordingly; in 20 km mode all
+// 4096 range cells are stored without aliasing.
 // ----------------------------------------------------------------------------
 module mti_canceller #(
-    parameter NUM_RANGE_BINS = `RP_NUM_RANGE_BINS,    // 512
+    parameter NUM_RANGE_BINS = `RP_MAX_OUTPUT_BINS,   // 512 (50T) / 4096 (200T)
    parameter DATA_WIDTH     = `RP_DATA_WIDTH         // 16
 ) (
    input wire clk,
@@ -67,13 +63,13 @@ module mti_canceller #(
    input wire signed [DATA_WIDTH-1:0] range_i_in,
    input wire signed [DATA_WIDTH-1:0] range_q_in,
    input wire                         range_valid_in,
-    input wire [`RP_RANGE_BIN_BITS-1:0] range_bin_in,   // 9-bit
+    input wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin_in,   // 9-bit (50T) / 12-bit (200T)

    // ========== OUTPUT (to Doppler processor) ==========
    output reg signed [DATA_WIDTH-1:0] range_i_out,
    output reg signed [DATA_WIDTH-1:0] range_q_out,
    output reg                         range_valid_out,
-    output reg [`RP_RANGE_BIN_BITS-1:0] range_bin_out,   // 9-bit
+    output reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin_out,  // 9-bit (50T) / 12-bit (200T)

    // ========== CONFIGURATION ==========
    input wire mti_enable,   // 1=MTI active, 0=pass-through
@@ -111,7 +107,7 @@ module mti_canceller #(

 reg signed [DATA_WIDTH-1:0] range_i_d1, range_q_d1;
 reg                         range_valid_d1;
-reg [`RP_RANGE_BIN_BITS-1:0] range_bin_d1;
+reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin_d1;
 reg                         mti_enable_d1;
 reg                         use_long_chirp_d1;

@@ -120,7 +116,7 @@ always @(posedge clk or negedge reset_n) begin
        range_i_d1        <= {DATA_WIDTH{1'b0}};
        range_q_d1        <= {DATA_WIDTH{1'b0}};
        range_valid_d1    <= 1'b0;
-        range_bin_d1      <= {`RP_RANGE_BIN_BITS{1'b0}};
+        range_bin_d1      <= {`RP_RANGE_BIN_WIDTH_MAX{1'b0}};
        mti_enable_d1     <= 1'b0;
        use_long_chirp_d1 <= 1'b0;
    end else begin
@@ -211,7 +207,7 @@ always @(posedge clk or negedge reset_n) begin
        range_i_out          <= {DATA_WIDTH{1'b0}};
        range_q_out          <= {DATA_WIDTH{1'b0}};
        range_valid_out      <= 1'b0;
-        range_bin_out        <= {`RP_RANGE_BIN_BITS{1'b0}};
+        range_bin_out        <= {`RP_RANGE_BIN_WIDTH_MAX{1'b0}};
        has_previous         <= 1'b0;
        mti_first_chirp      <= 1'b1;
        prev_chirp_was_long  <= 1'b0;
@@ -24,7 +24,7 @@ module radar_receiver_final (
    output wire [31:0] doppler_output,
    output wire doppler_valid,
    output wire [4:0] doppler_bin,
-    output wire [`RP_RANGE_BIN_BITS-1:0] range_bin,  // 9-bit
+    output wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin,  // 9-bit (50T) / 12-bit (200T)
    
    // Raw matched-filter output (debug/bring-up)
    output wire signed [15:0] range_profile_i_out,
@@ -158,9 +158,15 @@ wire doppler_frame_done_level;  // raw level from doppler_processor
 reg  doppler_frame_done_prev;
 wire doppler_frame_done;        // rising-edge pulse (1 clk cycle)

+// [RX-E FIX] doppler_frame_done_level is HIGH at reset (state==S_IDLE,
+// frame_buffer_full==0). Initializing prev to 1'b0 produces a spurious
+// rising-edge pulse on cycle 1, before any real frame has been processed,
+// which causes a stale AGC gain update and a phantom CFAR tick. Initialize
+// prev to 1'b1 so the first edge fires only after the doppler processor
+// actually exits idle for a real frame and returns.
 always @(posedge clk or negedge reset_n) begin
    if (!reset_n)
-        doppler_frame_done_prev <= 1'b0;
+        doppler_frame_done_prev <= 1'b1;
    else
        doppler_frame_done_prev <= doppler_frame_done_level;
 end
@@ -172,13 +178,13 @@ assign doppler_frame_done_out = doppler_frame_done;
 wire signed [15:0] decimated_range_i;
 wire signed [15:0] decimated_range_q;
 wire decimated_range_valid;
-wire [`RP_RANGE_BIN_BITS-1:0] decimated_range_bin;  // 9-bit
+wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] decimated_range_bin;  // 9-bit (50T) / 12-bit (200T)

 // ========== MTI CANCELLER SIGNALS ==========
 wire signed [15:0] mti_range_i;
 wire signed [15:0] mti_range_q;
 wire mti_range_valid;
-wire [`RP_RANGE_BIN_BITS-1:0] mti_range_bin;  // 9-bit
+wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] mti_range_bin;  // 9-bit (50T) / 12-bit (200T)
 wire mti_first_chirp;

 // ========== RADAR MODE CONTROLLER SIGNALS ==========
@@ -383,28 +389,32 @@ chirp_memory_loader_param chirp_mem (
    .mem_ready(mem_ready)
 );

-// 4. CRITICAL: Reference Chirp Latency Buffer
-// This aligns reference data with FFT output (3187 cycle delay)
-// TODO: verify empirically during hardware bring-up with correlation test
-wire [15:0] delayed_ref_i, delayed_ref_q;
-wire mem_ready_delayed;
-
-latency_buffer #(
-    .DATA_WIDTH(32),  // 16-bit I + 16-bit Q
-	.LATENCY(3187)
-) ref_latency_buffer (
-    .clk(clk),
-    .reset_n(reset_n),
-    .data_in({ref_i, ref_q}),
-    .valid_in(mem_request),
-    .data_out({delayed_ref_i, delayed_ref_q}),
-    .valid_out(mem_ready_delayed)
-);
-
-// Assign delayed reference signals (single pair — chirp_memory_loader_param
-// selects long/short reference upstream via use_long_chirp)
-assign ref_chirp_real = delayed_ref_i;
-assign ref_chirp_imag = delayed_ref_q;
+// 4. [RX-B FIX, Option A 2026-04-23] Reference chirp wired to MF chain with
+// a single-FF alignment delay. Previously ran through `latency_buffer` with
+// LATENCY=3187 — that module is a count-N-valid-pulses-then-prime FIFO,
+// not a true cycle delay. It needed ~2 frames of mem_request pulses before
+// any ref reached the chain (so frame 1 saw all-zero ref → noise output).
+// Removed in favour of a direct-wire path with one FF.
+//
+// Why the 1-FF stage: multi_segment ST_PROCESSING latches `adc_data` through
+// one register stage (`fft_input_i <= buf_rdata_i`) before it reaches the
+// chain. The ref path from chirp_memory_loader is combinational into the
+// chain. Without compensation, ref leads sig by 1 cycle → autocorrelation
+// peak at bin 1 instead of bin 0 (verified in tb/tb_rxb_fullchain_latency.v
+// against fft_engine.v synthesis path: peak/mean ratio ~80× confirms clean
+// correlation; peak position fixed to bin 0 by this register stage).
+reg [15:0] ref_chirp_real_d, ref_chirp_imag_d;
+always @(posedge clk or negedge reset_n) begin
+    if (!reset_n) begin
+        ref_chirp_real_d <= 16'd0;
+        ref_chirp_imag_d <= 16'd0;
+    end else begin
+        ref_chirp_real_d <= ref_i;
+        ref_chirp_imag_d <= ref_q;
+    end
+end
+assign ref_chirp_real = ref_chirp_real_d;
+assign ref_chirp_imag = ref_chirp_imag_d;

 // 5. Dual Chirp Matched Filter

@@ -449,7 +459,7 @@ matched_filter_multi_segment mf_dual (
 // Convert 2048 range bins to 512 bins for Doppler
 range_bin_decimator #(
    .INPUT_BINS(`RP_FFT_SIZE),              // 2048
-    .OUTPUT_BINS(`RP_NUM_RANGE_BINS),       // 512
+    .OUTPUT_BINS(`RP_MAX_OUTPUT_BINS),      // 512 (50T) / 4096 (200T)  [RX-D]
    .DECIMATION_FACTOR(`RP_DECIMATION_FACTOR)  // 4
 ) range_decim (
    .clk(clk),
@@ -471,7 +481,7 @@ range_bin_decimator #(
 // H(z) = 1 - z^{-1} → null at DC Doppler, removes stationary clutter.
 // When host_mti_enable=0: transparent pass-through.
 mti_canceller #(
-    .NUM_RANGE_BINS(`RP_NUM_RANGE_BINS),    // 512
+    .NUM_RANGE_BINS(`RP_MAX_OUTPUT_BINS),   // 512 (50T) / 4096 (200T)  [RX-D]
    .DATA_WIDTH(`RP_DATA_WIDTH)             // 16
 ) mti_inst (
    .clk(clk),
@@ -528,7 +538,7 @@ assign range_data_valid = mti_range_valid;
 // ========== DOPPLER PROCESSOR ==========
 doppler_processor_optimized #(
    .DOPPLER_FFT_SIZE(`RP_DOPPLER_FFT_SIZE),        // 16
-    .RANGE_BINS(`RP_NUM_RANGE_BINS),                // 512
+    .RANGE_BINS(`RP_MAX_OUTPUT_BINS),               // 512 (50T) / 4096 (200T)  [RX-D]
    .CHIRPS_PER_FRAME(`RP_CHIRPS_PER_FRAME),        // 32
    .CHIRPS_PER_SUBFRAME(`RP_CHIRPS_PER_SUBFRAME)   // 16
 ) doppler_proc (
@@ -127,7 +127,7 @@ module radar_system_top (
    output wire [31:0] dbg_doppler_data,
    output wire dbg_doppler_valid,
    output wire [4:0] dbg_doppler_bin,
-    output wire [`RP_RANGE_BIN_BITS-1:0] dbg_range_bin,
+    output wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] dbg_range_bin,
    
    // System status
    output wire [3:0] system_status,
@@ -181,7 +181,7 @@ wire tx_current_chirp_sync_valid;
 wire [31:0] rx_doppler_output;
 wire rx_doppler_valid;
 wire [4:0] rx_doppler_bin;
-wire [`RP_RANGE_BIN_BITS-1:0] rx_range_bin;
+wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] rx_range_bin;
 wire [31:0] rx_range_profile;
 wire rx_range_valid;
 wire [15:0] rx_range_profile_decimated;
@@ -629,7 +629,7 @@ assign dc_notch_active = (host_dc_notch_width != 3'd0) &&
 wire [31:0] notched_doppler_data  = dc_notch_active ? 32'd0 : rx_doppler_output;
 wire        notched_doppler_valid = rx_doppler_valid;
 wire [4:0]  notched_doppler_bin   = rx_doppler_bin;
-wire [`RP_RANGE_BIN_BITS-1:0]  notched_range_bin     = rx_range_bin;
+wire [`RP_RANGE_BIN_WIDTH_MAX-1:0]  notched_range_bin     = rx_range_bin;

 // ============================================================================
 // CFAR DETECTOR (replaces simple threshold detector)
@@ -640,7 +640,7 @@ wire [`RP_RANGE_BIN_BITS-1:0]  notched_range_bin     = rx_range_bin;

 wire cfar_detect_flag;
 wire cfar_detect_valid;
-wire [`RP_RANGE_BIN_BITS-1:0]  cfar_detect_range;
+wire [`RP_RANGE_BIN_WIDTH_MAX-1:0]  cfar_detect_range;
 wire [4:0]  cfar_detect_doppler;
 wire [16:0] cfar_detect_magnitude;
 wire [16:0] cfar_detect_threshold;
@@ -32,9 +32,13 @@

 `include "radar_params.vh"

+// [RX-D FIX] OUTPUT_BINS and range_bin_index now scale with
+// `RP_MAX_OUTPUT_BINS / `RP_RANGE_BIN_WIDTH_MAX so 20 km mode gets the
+// full 4096-bin range axis (8 segments × 512 decimated bins per segment).
+// 50T: 512 / 9-bit. 200T: 4096 / 12-bit.
 module range_bin_decimator #(
    parameter INPUT_BINS        = `RP_FFT_SIZE,          // 2048
-    parameter OUTPUT_BINS       = `RP_NUM_RANGE_BINS,    // 512
+    parameter OUTPUT_BINS       = `RP_MAX_OUTPUT_BINS,   // 512 (50T) / 4096 (200T)
    parameter DECIMATION_FACTOR = `RP_DECIMATION_FACTOR  // 4
 ) (
    input wire clk,
@@ -49,7 +53,7 @@ module range_bin_decimator #(
    output reg signed [15:0] range_i_out,
    output reg signed [15:0] range_q_out,
    output reg range_valid_out,
-    output reg [`RP_RANGE_BIN_BITS-1:0] range_bin_index,  // 9-bit
+    output reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin_index,  // 9-bit (50T) / 12-bit (200T)

    // Configuration
    input wire [1:0] decimation_mode,  // 00=decimate, 01=peak, 10=average
@@ -82,7 +86,7 @@ reg [10:0] in_bin_count;

 // Group tracking
 reg [1:0] group_sample_count;   // 0..3 within current group of 4
-reg [8:0] output_bin_count;     // 0..511 output bin index
+reg [`RP_RANGE_BIN_WIDTH_MAX-1:0] output_bin_count;  // 0..OUTPUT_BINS-1

 // State machine
 reg [2:0] state;
@@ -146,7 +150,7 @@ always @(posedge clk or negedge reset_n) begin
        range_valid_out   <= 1'b0;
        range_i_out       <= 16'd0;
        range_q_out       <= 16'd0;
-        range_bin_index   <= {`RP_RANGE_BIN_BITS{1'b0}};
+        range_bin_index   <= {`RP_RANGE_BIN_WIDTH_MAX{1'b0}};
        peak_i            <= 16'd0;
        peak_q            <= 16'd0;
        peak_mag          <= 17'd0;
@@ -195,7 +195,7 @@ module tb_matched_filter_processing_chain;
        integer wait_count;
        begin
            wait_count = 0;
-            while (chain_state != target_state && wait_count < 50000) begin
+            while (chain_state != target_state && wait_count < 500000) begin
                @(posedge clk);
                wait_count = wait_count + 1;
            end
@@ -208,7 +208,7 @@ module tb_matched_filter_processing_chain;
        integer wait_count;
        begin
            wait_count = 0;
-            while (chain_state != ST_IDLE && wait_count < 50000) begin
+            while (chain_state != ST_IDLE && wait_count < 500000) begin
                @(posedge clk);
                wait_count = wait_count + 1;
            end
@@ -332,7 +332,11 @@ module tb_matched_filter_processing_chain;
        // noise that scatters energy far from bin 0. Xilinx IP uses full internal
        // precision and passes this correctly in hardware.
        if (!(cap_peak_bin <= 128 || cap_peak_bin >= FFT_SIZE - 128)) begin
-            $display("[WARN] Autocorrelation peak at bin %0d (expected near 0) - behavioral FFT noise, OK with Xilinx IP", cap_peak_bin);
+            // [RX-NEW-1] fft_engine.v is in-house — it IS the production FFT, not
+            // a behavioural model that gets swapped for Xilinx IP. Wrong-bin peak
+            // is therefore a real bug in fft_engine / frequency_matched_filter,
+            // not "behavioral noise". See project memory ledger entry RX-NEW-1.
+            $display("[FAIL-INFO] Autocorrelation peak at bin %0d (expected 0) — fft_engine bug, see RX-NEW-1", cap_peak_bin);
        end
        // Behavioral Q15 FFT scatters the peak, so we cannot assert bin
        // location — but the peak MUST dominate the mean magnitude. This
@@ -496,7 +500,7 @@ module tb_matched_filter_processing_chain;

        check(cap_count == FFT_SIZE, "Case 1: Got 2048 output samples");
        if (!(cap_peak_bin <= 128 || cap_peak_bin >= FFT_SIZE - 128)) begin
-            $display("[WARN] Case 1: peak at bin %0d (expected near 0) - behavioral FFT noise", cap_peak_bin);
+            $display("[FAIL-INFO] Case 1: peak at bin %0d (expected 0) — fft_engine bug, see RX-NEW-1", cap_peak_bin);
        end
        begin : p2m_case1
            integer k, sum_abs, mean_abs;
@@ -538,7 +542,7 @@ module tb_matched_filter_processing_chain;

        check(cap_count == FFT_SIZE, "Case 2: Got 2048 output samples");
        if (!(cap_peak_bin <= 128 || cap_peak_bin >= FFT_SIZE - 128)) begin
-            $display("[WARN] Case 2: peak at bin %0d (expected near 0) - behavioral FFT noise", cap_peak_bin);
+            $display("[FAIL-INFO] Case 2: peak at bin %0d (expected near 0) — fft_engine bug, see RX-NEW-1", cap_peak_bin);
        end
        begin : p2m_case2
            integer k, sum_abs, mean_abs;
@@ -0,0 +1,309 @@
+`timescale 1ns/1ps
+`include "radar_params.vh"
+
+// ============================================================================
+// tb_rxb_fullchain_latency.v
+//
+// RX-B verification, Option A (latency_buffer removed; ref passes through a 1-FF alignment register).
+//
+// Production wiring this TB mirrors:
+//   ddc_i/q (test stimulus) -> matched_filter_multi_segment -> chain
+//   chirp_memory_loader --1-FF alignment register-----------> chain ref
+//
+// Tests:
+//   1) Pipeline timing: report cycle counts (first ddc_valid -> first
+//      pc_valid).  Confirms FSM advances and produces output.
+//   2) Autocorrelation peak position: drive ddc with the SAME short-chirp
+//      samples that the loader serves up as ref. Output is the chirp
+//      autocorrelation. Peak should be at bin 0 if ref/signal are aligned
+//      at the chain. Any shift indicates an alignment error of N cycles.
+// ============================================================================
+
+module tb_rxb_fullchain_latency;
+
+    localparam CLK_PERIOD = 10.0;       // 100 MHz
+    localparam FFT_SIZE   = `RP_FFT_SIZE; // 2048
+    localparam SHORT_LEN  = 50;          // matches RP_SHORT_CHIRP_SAMPLES
+
+    reg                 clk;
+    reg                 reset_n;
+
+    // multi_segment inputs
+    reg  signed [17:0]  ddc_i;
+    reg  signed [17:0]  ddc_q;
+    reg                 ddc_valid;
+    reg                 use_long_chirp;
+    reg  [5:0]          chirp_counter;
+    reg                 mc_new_chirp;
+    reg                 mc_new_elevation;
+    reg                 mc_new_azimuth;
+
+    // multi_segment <-> memory loader interconnect
+    wire [1:0]          segment_request;
+    wire [10:0]         sample_addr_out;
+    wire                mem_request;
+    wire                mem_ready_loader;       // direct from loader
+
+    // Loader outputs (registered once below, then fed to the chain via multi_segment ports)
+    wire [15:0]         ref_i_raw;
+    wire [15:0]         ref_q_raw;
+
+    // multi_segment outputs
+    wire signed [15:0]  pc_i;
+    wire signed [15:0]  pc_q;
+    wire                pc_valid;
+    wire [3:0]          ms_status;
+
+    // ----- Memory loader -----
+    chirp_memory_loader_param #(
+        .DEBUG(0)
+    ) chirp_mem (
+        .clk            (clk),
+        .reset_n        (reset_n),
+        .segment_select (segment_request),
+        .mem_request    (mem_request),
+        .use_long_chirp (use_long_chirp),
+        .sample_addr    (sample_addr_out),
+        .ref_i          (ref_i_raw),
+        .ref_q          (ref_q_raw),
+        .mem_ready      (mem_ready_loader)
+    );
+
+    // ----- 1-FF alignment register (mirrors radar_receiver_final.v) -----
+    // multi_segment ST_PROCESSING latches adc_data through one register
+    // stage; ref path needs the same to align at chain inputs.
+    reg [15:0] ref_i_d, ref_q_d;
+    always @(posedge clk or negedge reset_n) begin
+        if (!reset_n) begin
+            ref_i_d <= 16'd0;
+            ref_q_d <= 16'd0;
+        end else begin
+            ref_i_d <= ref_i_raw;
+            ref_q_d <= ref_q_raw;
+        end
+    end
+
+    // ----- multi_segment (drives chain internally) -----
+    matched_filter_multi_segment ms_dut (
+        .clk              (clk),
+        .reset_n          (reset_n),
+        .ddc_i            (ddc_i),
+        .ddc_q            (ddc_q),
+        .ddc_valid        (ddc_valid),
+        .use_long_chirp   (use_long_chirp),
+        .chirp_counter    (chirp_counter),
+        .mc_new_chirp     (mc_new_chirp),
+        .mc_new_elevation (mc_new_elevation),
+        .mc_new_azimuth   (mc_new_azimuth),
+        .ref_chirp_real   (ref_i_d),
+        .ref_chirp_imag   (ref_q_d),
+        .segment_request  (segment_request),
+        .sample_addr_out  (sample_addr_out),
+        .mem_request      (mem_request),
+        .mem_ready        (mem_ready_loader),
+        .pc_i_w           (pc_i),
+        .pc_q_w           (pc_q),
+        .pc_valid_w       (pc_valid),
+        .status           (ms_status)
+    );
+
+    always #(CLK_PERIOD/2.0) clk = ~clk;
+
+    // -------- Cycle counter + first-event capture --------
+    integer cycle_count;
+    integer first_ddc_cycle;
+    integer first_mem_request_cycle;
+    integer first_pc_valid_cycle;
+    integer pc_out_count;
+    reg     saw_ddc, saw_mem_req, saw_pc;
+
+    // -------- Output capture for peak detection --------
+    reg signed [15:0] cap_i [0:FFT_SIZE-1];
+    reg signed [15:0] cap_q [0:FFT_SIZE-1];
+
+    always @(posedge clk) begin
+        if (!reset_n) begin
+            cycle_count             <= 0;
+            saw_ddc                 <= 0;
+            saw_mem_req             <= 0;
+            saw_pc                  <= 0;
+            pc_out_count            <= 0;
+            first_ddc_cycle         <= 0;
+            first_mem_request_cycle <= 0;
+            first_pc_valid_cycle    <= 0;
+        end else begin
+            cycle_count <= cycle_count + 1;
+
+            if (ddc_valid && !saw_ddc) begin
+                first_ddc_cycle <= cycle_count;
+                saw_ddc         <= 1;
+                $display("[T=%0t] FIRST ddc_valid at cycle %0d", $time, cycle_count);
+            end
+            if (mem_request && !saw_mem_req) begin
+                first_mem_request_cycle <= cycle_count;
+                saw_mem_req             <= 1;
+                $display("[T=%0t] FIRST mem_request at cycle %0d", $time, cycle_count);
+            end
+            if (pc_valid) begin
+                if (!saw_pc) begin
+                    first_pc_valid_cycle <= cycle_count;
+                    saw_pc               <= 1;
+                    $display("[T=%0t] FIRST pc_valid at cycle %0d", $time, cycle_count);
+                end
+                if (pc_out_count < FFT_SIZE) begin
+                    cap_i[pc_out_count] <= pc_i;
+                    cap_q[pc_out_count] <= pc_q;
+                    pc_out_count <= pc_out_count + 1;
+                end
+            end
+        end
+    end
+
+    // -------- Stimulus arrays — load same short-chirp values that loader will serve --------
+    reg [15:0] stim_chirp_i [0:SHORT_LEN-1];
+    reg [15:0] stim_chirp_q [0:SHORT_LEN-1];
+
+    integer k;
+
+    task feed_short_chirp_signal;
+        // Drive ddc with the chirp samples (autocorrelation: signal == ref).
+        // Multi_segment will buffer them and zero-pad to FFT_SIZE.
+        integer j;
+        begin
+            for (j = 0; j < SHORT_LEN; j = j + 1) begin
+                ddc_i     <= {{2{stim_chirp_i[j][15]}}, stim_chirp_i[j]};  // sign-ext to 18b
+                ddc_q     <= {{2{stim_chirp_q[j][15]}}, stim_chirp_q[j]};
+                ddc_valid <= 1'b1;
+                @(posedge clk);
+            end
+            ddc_valid <= 1'b0;
+        end
+    endtask
+
+    // -------- Peak finding --------
+    integer peak_bin;
+    integer peak_abs;
+    integer mean_abs;
+    integer abs_val;
+    integer total_abs;
+
+    task find_peak;
+        integer kk;
+        integer val_i, val_q;
+        begin
+            peak_bin = 0;
+            peak_abs = 0;
+            total_abs = 0;
+            for (kk = 0; kk < FFT_SIZE; kk = kk + 1) begin
+                val_i   = $signed(cap_i[kk]);
+                val_q   = $signed(cap_q[kk]);
+                abs_val = (val_i < 0 ? -val_i : val_i)
+                        + (val_q < 0 ? -val_q : val_q);
+                total_abs = total_abs + abs_val;
+                if (abs_val > peak_abs) begin
+                    peak_abs = abs_val;
+                    peak_bin = kk;
+                end
+            end
+            mean_abs = total_abs / FFT_SIZE;
+        end
+    endtask
+
+    initial begin
+        $dumpfile("tb_rxb_fullchain_latency.vcd");
+        $dumpvars(0, tb_rxb_fullchain_latency);
+
+        clk              = 0;
+        reset_n          = 0;
+        ddc_i            = 0;
+        ddc_q            = 0;
+        ddc_valid        = 0;
+        use_long_chirp   = 1'b0;   // use SHORT chirp path so loader uses short_chirp_*.mem
+        chirp_counter    = 6'd0;
+        mc_new_chirp     = 1'b0;
+        mc_new_elevation = 1'b0;
+        mc_new_azimuth   = 1'b0;
+
+        // Load the same short-chirp samples the loader will serve as ref,
+        // so signal == ref → autocorrelation. Peak should be at bin 0 if
+        // ref/signal alignment is correct.
+        $readmemh("short_chirp_i.mem", stim_chirp_i, 0, SHORT_LEN-1);
+        $readmemh("short_chirp_q.mem", stim_chirp_q, 0, SHORT_LEN-1);
+        $display("[TB] Loaded %0d short-chirp samples for stimulus", SHORT_LEN);
+
+        repeat (8) @(posedge clk);
+        reset_n = 1;
+        repeat (8) @(posedge clk);
+
+        $display("\n=== RX-B Option A verification ===");
+        $display("Configuration: latency_buffer REMOVED, ref via 1-FF alignment register");
+        $display("Path: chirp_memory_loader.ref_i --[1 FF]--> multi_segment.ref_chirp_real");
+        $display("FFT_SIZE: %0d, SHORT_LEN: %0d", FFT_SIZE, SHORT_LEN);
+        $display("");
+
+        // Pulse mc_new_chirp
+        $display("[T=%0t] Pulsing mc_new_chirp HIGH...", $time);
+        @(posedge clk);
+        #1 mc_new_chirp = 1'b1;
+        repeat (4) @(posedge clk);
+        #1 mc_new_chirp = 1'b0;
+
+        // Feed signal samples (same as ref → autocorrelation)
+        feed_short_chirp_signal;
+
+        // Wait for FFT_SIZE outputs (or timeout)
+        for (k = 0; k < 200000; k = k + 1) begin
+            @(posedge clk);
+            if (pc_out_count >= FFT_SIZE) k = 200001;
+        end
+
+        $display("\n=== TIMING ===");
+        if (saw_ddc)        $display("First ddc_valid    : cycle %0d", first_ddc_cycle);
+        if (saw_mem_req)    $display("First mem_request  : cycle %0d", first_mem_request_cycle);
+        if (saw_pc)         $display("First pc_valid     : cycle %0d", first_pc_valid_cycle);
+        $display("pc outputs captured: %0d / %0d", pc_out_count, FFT_SIZE);
+
+        if (pc_out_count >= FFT_SIZE) begin
+            find_peak;
+            $display("\n=== AUTOCORRELATION RESULT ===");
+            $display("Peak bin           : %0d", peak_bin);
+            $display("Peak |I|+|Q|       : %0d", peak_abs);
+            $display("Mean |I|+|Q|       : %0d", mean_abs);
+            $display("Peak / mean ratio  : ~%0dx",
+                     (mean_abs > 0) ? (peak_abs / mean_abs) : 0);
+            $display("");
+            // The chain uses the production fft_engine.v in both iverilog
+            // and Vivado (the inline `ifdef SIMULATION behavioural FFT was
+            // removed from matched_filter_processing_chain.v), so no
+            // +define+SIMULATION switch is involved. For the autocorrelation
+            // case the peak should be exactly at bin 0 with peak/mean > 50x.
+            // Note that the PASS check below only requires peak/mean > 2x;
+            // read the printed ratio to judge correlation quality.
+            if (pc_out_count >= FFT_SIZE && peak_abs > 2 * mean_abs && peak_bin == 0) begin
+                $display("[PASS] Frame 1 produces output, peak at bin 0, peak/mean ~%0dx",
+                         (mean_abs > 0) ? (peak_abs / mean_abs) : 0);
+                $display("       RX-B fully fixed — latency_buffer removed + 1-FF align register.");
+            end else if (pc_out_count >= FFT_SIZE && peak_abs > 2 * mean_abs) begin
+                $display("[NEAR] Output present, peak/mean OK, but peak at bin %0d (not 0).",
+                         peak_bin);
+                $display("       The chain uses the production fft_engine.v, so a non-zero");
+                $display("       peak bin points to a ref/signal alignment error.");
+            end else if (pc_out_count >= FFT_SIZE) begin
+                $display("[FAIL] Output present but peak/mean too low — no real correlation.");
+            end
+        end else begin
+            $display("\n=== TIMEOUT — chain did not produce all outputs ===");
+            $display("ms_status=%b", ms_status);
+        end
+
+        repeat (1000) @(posedge clk);
+        $finish;
+    end
+
+    initial begin
+        #100000000;  // 100 ms hard timeout
+        $display("[ERROR] Hard simulation timeout");
+        $finish;
+    end
+
+endmodule
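
The pass criteria of tb_rxb_fullchain_latency.v (peak at bin 0, peak well above the mean of |I|+|Q|) can be sketched in Python. This is an illustration only, not part of the change: `circular_autocorr`, the 16-sample chirp and the 64-bin size are hypothetical, and a direct O(N^2) circular correlation stands in for the FFT-based matched filter.

```python
import cmath
import math


def circular_autocorr(x, n):
    """Zero-pad x to length n and return its circular autocorrelation."""
    padded = list(x) + [0j] * (n - len(x))
    return [
        sum(padded[(m + lag) % n] * padded[m].conjugate() for m in range(n))
        for lag in range(n)
    ]


def find_peak(samples):
    """Mirror the testbench metric: |I| + |Q| per bin, peak bin, integer mean."""
    mags = [abs(int(s.real)) + abs(int(s.imag)) for s in samples]
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin, mags[peak_bin], sum(mags) // len(mags)


# Hypothetical 16-sample linear-FM chirp, zero-padded to 64 bins.
chirp = [1000 * cmath.exp(1j * math.pi * k * k / 16) for k in range(16)]
peak_bin, peak_abs, mean_abs = find_peak(circular_autocorr(chirp, 64))
print(peak_bin, peak_abs > 2 * mean_abs)
```

When signal and reference are the same sequence, the correlation peak sits at lag 0; a misaligned reference would shift it, which is what the testbench's bin-0 check looks for.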
@@ -0,0 +1,181 @@
+`timescale 1ns/1ps
+`include "radar_params.vh"
+
+// ============================================================================
+// tb_rxb_latency_measure.v
+//
+// Purpose: empirically measure the pipeline latency of
+// matched_filter_processing_chain: cycles between the first ADC sample in
+// and the first range_profile_valid out.
+//
+// The measured latency is the value LATENCY in latency_buffer would have
+// to compensate for so that ref_chirp_real/imag arrive at the chain in the
+// SAME cycle as the corresponding adc_data_i/q. latency_buffer is removed
+// from radar_receiver_final.v by this change, so the 3187-cycle figure
+// printed below is kept for comparison only.
+//
+// Note: matched_filter_multi_segment buffers BUFFER_SIZE=2048 samples
+// before emitting to the chain regardless of how many active samples are in
+// the chirp (it zero-pads short chirps). Both the long-chirp and the
+// short-chirp path therefore feed the chain FFT_SIZE samples, so a single
+// run with one FFT_SIZE-sample frame covers both.
+// ============================================================================
+
+module tb_rxb_latency_measure;
+
+    localparam CLK_PERIOD = 10.0;       // 100 MHz
+    localparam FFT_SIZE   = `RP_FFT_SIZE; // 2048
+
+    reg                clk;
+    reg                reset_n;
+    reg  signed [15:0] adc_data_i;
+    reg  signed [15:0] adc_data_q;
+    reg                adc_valid;
+    reg  [5:0]         chirp_counter;
+    reg  signed [15:0] ref_chirp_real;
+    reg  signed [15:0] ref_chirp_imag;
+    wire signed [15:0] range_profile_i;
+    wire signed [15:0] range_profile_q;
+    wire               range_profile_valid;
+    wire [3:0]         chain_state;
+
+    matched_filter_processing_chain dut (
+        .clk                 (clk),
+        .reset_n             (reset_n),
+        .adc_data_i          (adc_data_i),
+        .adc_data_q          (adc_data_q),
+        .adc_valid           (adc_valid),
+        .chirp_counter       (chirp_counter),
+        .ref_chirp_real      (ref_chirp_real),
+        .ref_chirp_imag      (ref_chirp_imag),
+        .range_profile_i     (range_profile_i),
+        .range_profile_q     (range_profile_q),
+        .range_profile_valid (range_profile_valid),
+        .chain_state         (chain_state)
+    );
+
+    always #(CLK_PERIOD/2.0) clk = ~clk;
+
+    // Measurement state
+    integer cycle_in_first;     // cycle when first adc_valid pulse went HIGH
+    integer cycle_out_first;    // cycle when first range_profile_valid went HIGH
+    integer cycle_count;
+    reg     saw_first_in;
+    reg     saw_first_out;
+
+    always @(posedge clk) begin
+        if (!reset_n) begin
+            cycle_count   <= 0;
+            saw_first_in  <= 0;
+            saw_first_out <= 0;
+        end else begin
+            cycle_count <= cycle_count + 1;
+            if (adc_valid && !saw_first_in) begin
+                cycle_in_first <= cycle_count;
+                saw_first_in   <= 1;
+                $display("[T=%0t] FIRST adc_valid=1 at cycle %0d", $time, cycle_count);
+            end
+            if (range_profile_valid && !saw_first_out) begin
+                cycle_out_first <= cycle_count;
+                saw_first_out   <= 1;
+                $display("[T=%0t] FIRST range_profile_valid=1 at cycle %0d", $time, cycle_count);
+            end
+        end
+    end
+
+    // Stimulus
+    integer k;
+    integer pipeline_latency;
+
+    task feed_unit_chirp(input integer n_active_samples);
+        // Feed FFT_SIZE samples forming a unit impulse (non-zero at sample 0,
+        // zero elsewhere). n_active_samples is currently unused: the chain always
+        // sees FFT_SIZE samples. adc and ref get the same impulse (autocorrelation).
+        integer j;
+        begin
+            for (j = 0; j < FFT_SIZE; j = j + 1) begin
+                if (j == 0) begin
+                    adc_data_i     <= 16'sd16384;  // ~half full-scale
+                    adc_data_q     <= 16'sd0;
+                    ref_chirp_real <= 16'sd16384;
+                    ref_chirp_imag <= 16'sd0;
+                end else begin
+                    adc_data_i     <= 16'sd0;
+                    adc_data_q     <= 16'sd0;
+                    ref_chirp_real <= 16'sd0;
+                    ref_chirp_imag <= 16'sd0;
+                end
+                adc_valid <= 1'b1;
+                @(posedge clk);
+            end
+            adc_valid <= 1'b0;
+        end
+    endtask
+
+    initial begin
+        $dumpfile("tb_rxb_latency_measure.vcd");
+        $dumpvars(0, tb_rxb_latency_measure);
+
+        clk            = 0;
+        reset_n        = 0;
+        adc_data_i     = 0;
+        adc_data_q     = 0;
+        adc_valid      = 0;
+        chirp_counter  = 6'd0;
+        ref_chirp_real = 0;
+        ref_chirp_imag = 0;
+
+        repeat (4) @(posedge clk);
+        reset_n = 1;
+        repeat (4) @(posedge clk);
+
+        $display("\n=== RX-B latency measurement: chain pipeline depth ===");
+        $display("FFT_SIZE = %0d", FFT_SIZE);
+        $display("Feeding 2048-sample unit-impulse autocorrelation frame...");
+
+        // Single run. The chain itself is chirp-agnostic (it always processes
+        // FFT_SIZE=2048 samples; multi_segment upstream zero-pads), so the
+        // short-chirp (50 active) and long-chirp (3000 active) cases should
+        // give identical chain latency. Checks whether the prior review's claim
+        // of "different LATENCY for short chirp" is real or a misconception.
+        feed_unit_chirp(50);  // argument is informational; the task ignores it
+
+        // Wait for output to start (poll every cycle, abort if too long)
+        for (k = 0; k < 60000; k = k + 1) begin
+            @(posedge clk);
+            if (saw_first_out) k = 60001;  // exit
+        end
+
+        if (saw_first_out) begin
+            pipeline_latency = cycle_out_first - cycle_in_first;
+            $display("\n=== RESULT ===");
+            $display("First adc_valid     : cycle %0d", cycle_in_first);
+            $display("First valid output  : cycle %0d", cycle_out_first);
+            $display("Pipeline latency    : %0d cycles", pipeline_latency);
+            $display("Former LATENCY in latency_buffer: 3187 cycles");
+            $display("Delta (measured - former setting): %0d cycles", pipeline_latency - 3187);
+            $display("");
+            $display("Interpretation:");
+            $display("  - latency_buffer has been removed; the delta is informational only.");
+            $display("  - Note: this measures only the chain's internal pipeline,");
+            $display("    not the upstream multi_segment buffer fill.");
+        end else begin
+            $display("\n=== TIMEOUT ===");
+            $display("range_profile_valid never asserted within 60000 cycles");
+            $display("(fft_engine.v is the in-house production FFT, not a Xilinx IP");
+            $display(" placeholder; chain_state=%b at timeout)", chain_state);
+        end
+
+        // Wait a bit more to see if we get full 2048 outputs
+        repeat (5000) @(posedge clk);
+        $finish;
+    end
+
+    // Safety timeout
+    initial begin
+        #10000000;  // 10 ms simulated time
+        $display("[ERROR] Simulation timeout at 10 ms");
+        $finish;
+    end
+
+endmodule
@@ -88,7 +88,12 @@ module usb_data_interface_ft2232h (
    input wire cfar_valid,

    // New inputs for bulk frame protocol (clk domain)
-    input wire [`RP_RANGE_BIN_BITS-1:0] range_bin_in,   // 9-bit range bin index
+    // [RX-D] Widened to RP_RANGE_BIN_WIDTH_MAX (9-bit on 50T, 12-bit on 200T)
+    // to match upstream pipeline. In 3 km mode only bins 0..511 are exercised
+    // and the frame wire protocol still emits 512×32=16384 cells. 20 km mode
+    // (4096 bins, 131072 cells) requires a wire-protocol extension before
+    // bins 512..4095 can be transported to the host.
+    input wire [`RP_RANGE_BIN_WIDTH_MAX-1:0] range_bin_in,
    input wire [4:0]                         doppler_bin_in,  // 5-bit doppler bin index
    input wire                               frame_complete,  // 1-cycle pulse from radar_receiver_final edge detector

@@ -98,10 +98,10 @@ class DemoTarget:

    __slots__ = ("azimuth", "classification", "id", "range_m", "snr", "velocity")

-    # Physical range grid: 64 bins x ~24 m/bin = ~1536 m max
-    # Bin spacing = c / (2 * Fs) * decimation, where Fs = 100 MHz DDC output.
-    _RANGE_PER_BIN: float = (3e8 / (2 * 100e6)) * 16  # ~24 m
-    _MAX_RANGE: float = _RANGE_PER_BIN * NUM_RANGE_BINS  # ~1536 m
+    # Physical range grid: 512 bins x ~6 m/bin = ~3072 m max (3 km mode)
+    # Bin spacing = c / (2 * Fs) * decimation, where Fs = 100 MHz DDC output, decim = 4.
+    _RANGE_PER_BIN: float = (3e8 / (2 * 100e6)) * 4  # ~6 m
+    _MAX_RANGE: float = _RANGE_PER_BIN * NUM_RANGE_BINS  # ~3072 m

    def __init__(self, tid: int):
        self.id = tid
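
The grid arithmetic behind the old and new constants can be checked with a short sketch. The helper name `range_grid` is hypothetical; it uses the rounded c = 3e8 m/s from the comment above and the 32 Doppler bins from NUM_DOPPLER_BINS.

```python
C_APPROX = 3e8          # m/s, rounded value used in the DemoTarget comment
FS_HZ = 100e6           # DDC output sample rate
NUM_DOPPLER_BINS = 32


def range_grid(decimation, num_range_bins):
    """Return (metres per bin, max range in metres, cells per frame)."""
    per_bin = C_APPROX / (2 * FS_HZ) * decimation
    return per_bin, per_bin * num_range_bins, num_range_bins * NUM_DOPPLER_BINS


old = range_grid(decimation=16, num_range_bins=64)
new = range_grid(decimation=4, num_range_bins=512)
print(old)  # previous GUI assumption
print(new)  # 3 km mode after this change
```

With the exact speed of light the per-bin spacing is about 5.996 m rather than 6.0 m, which is the value the WaveformConfig tests compare against.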
@@ -43,9 +43,9 @@ STATUS_HEADER_BYTE = 0xBB
 DATA_PACKET_SIZE = 11               # 1 + 4 + 2 + 2 + 1 + 1
 STATUS_PACKET_SIZE = 26              # 1 + 24 + 1

-NUM_RANGE_BINS = 64
+NUM_RANGE_BINS = 512
 NUM_DOPPLER_BINS = 32
-NUM_CELLS = NUM_RANGE_BINS * NUM_DOPPLER_BINS  # 2048
+NUM_CELLS = NUM_RANGE_BINS * NUM_DOPPLER_BINS  # 16384

 WATERFALL_DEPTH = 64

@@ -777,6 +777,13 @@ class RadarAcquisition(threading.Thread):

    def _ingest_sample(self, sample: dict):
        """Place sample into current frame and emit when complete."""
+        # [GUI-C2 FIX] Use FPGA frame_start bit as the authoritative sync token.
+        # If FPGA flags frame_start mid-stream (after a USB drop or any glitch),
+        # finalize whatever we have and re-align to bin (0, 0). Without this the
+        # count-only sync stays permanently misaligned after a single dropped byte.
+        if sample.get("frame_start", 0) and self._sample_idx > 0:
+            self._finalize_frame()  # resets _sample_idx to 0 and starts a new frame
+
        rbin = self._sample_idx // NUM_DOPPLER_BINS
        dbin = self._sample_idx % NUM_DOPPLER_BINS

@@ -788,12 +795,15 @@ class RadarAcquisition(threading.Thread):
            if sample.get("detection", 0):
                self._frame.detections[rbin, dbin] = 1
                self._frame.detection_count += 1
-            # Accumulate FPGA range profile data (matched-filter output)
-            # Each sample carries the range_i/range_q for this range bin.
-            # Accumulate magnitude across Doppler bins for the range profile.
+            # [GUI-C4 FIX] FPGA emits the same range_i/range_q for all 32 Doppler
+            # bins of a given range bin (it's the matched-filter range output,
+            # repeated per Doppler cell). Accumulating across all 32 inflates
+            # the profile 32x. Capture once per range bin at the first Doppler
+            # cell instead.
+            if dbin == 0:
                ri = int(sample.get("range_i", 0))
                rq = int(sample.get("range_q", 0))
-            self._frame.range_profile[rbin] += abs(ri) + abs(rq)
+                self._frame.range_profile[rbin] = abs(ri) + abs(rq)

        self._sample_idx += 1
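
The effect of the frame_start resync can be sketched on a tiny grid. Everything here is hypothetical apart from the resync condition itself: `FrameAssembler`, `frame_stream` and the 4 x 2 grid are illustrative stand-ins for RadarAcquisition and the 512 x 32 frame.

```python
NUM_RANGE_BINS = 4      # tiny hypothetical grid; the GUI uses 512 x 32
NUM_DOPPLER_BINS = 2
NUM_CELLS = NUM_RANGE_BINS * NUM_DOPPLER_BINS


class FrameAssembler:
    """Hypothetical stand-in for the count-based ingest path."""

    def __init__(self):
        self.frames = []        # finalized frames, each a list of (rbin, dbin)
        self._cells = []
        self._sample_idx = 0

    def _finalize_frame(self):
        self.frames.append(self._cells)
        self._cells = []
        self._sample_idx = 0

    def ingest(self, sample):
        # A frame_start flag seen mid-frame means samples were lost:
        # close the partial frame and re-align to bin (0, 0).
        if sample.get("frame_start", 0) and self._sample_idx > 0:
            self._finalize_frame()
        rbin = self._sample_idx // NUM_DOPPLER_BINS
        dbin = self._sample_idx % NUM_DOPPLER_BINS
        self._cells.append((rbin, dbin))
        self._sample_idx += 1
        if self._sample_idx >= NUM_CELLS:
            self._finalize_frame()


def frame_stream(drop_index=None):
    """Yield two frames of samples, optionally dropping one sample."""
    for i in range(2 * NUM_CELLS):
        if i != drop_index:
            yield {"frame_start": 1 if i % NUM_CELLS == 0 else 0}


asm = FrameAssembler()
for s in frame_stream(drop_index=3):
    asm.ingest(s)
print([len(f) for f in asm.frames])
print(asm.frames[1][0])
```

In this sketch the frame that lost a sample is emitted short, and the next frame starts again at bin (0, 0) instead of staying one cell out of step.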

@@ -66,8 +66,8 @@ class TestRadarSettings(unittest.TestCase):
    def test_defaults(self):
        s = _models().RadarSettings()
        self.assertEqual(s.system_frequency, 10.5e9)
-        self.assertEqual(s.coverage_radius, 1536)
-        self.assertEqual(s.max_distance, 1536)
+        self.assertEqual(s.coverage_radius, 3072)
+        self.assertEqual(s.max_distance, 3072)


 class TestGPSData(unittest.TestCase):
@@ -430,17 +430,17 @@ class TestWaveformConfig(unittest.TestCase):
        self.assertEqual(wc.chirp_duration_s, 30e-6)
        self.assertEqual(wc.pri_s, 167e-6)
        self.assertEqual(wc.center_freq_hz, 10.5e9)
-        self.assertEqual(wc.n_range_bins, 64)
+        self.assertEqual(wc.n_range_bins, 512)
        self.assertEqual(wc.n_doppler_bins, 32)
        self.assertEqual(wc.chirps_per_subframe, 16)
-        self.assertEqual(wc.fft_size, 1024)
-        self.assertEqual(wc.decimation_factor, 16)
+        self.assertEqual(wc.fft_size, 2048)
+        self.assertEqual(wc.decimation_factor, 4)

    def test_range_resolution(self):
-        """range_resolution_m should be ~23.98 m/bin (matched filter, 100 MSPS)."""
+        """range_resolution_m should be ~6.0 m/bin (matched filter, 100 MSPS, decim 4)."""
        from v7.models import WaveformConfig
        wc = WaveformConfig()
-        self.assertAlmostEqual(wc.range_resolution_m, 23.983, places=1)
+        self.assertAlmostEqual(wc.range_resolution_m, 5.996, places=2)

    def test_velocity_resolution(self):
        """velocity_resolution_mps should be ~5.34 m/s/bin (PRI=167us, 16 chirps)."""
@@ -452,7 +452,7 @@ class TestWaveformConfig(unittest.TestCase):
        """max_range_m = range_resolution * n_range_bins."""
        from v7.models import WaveformConfig
        wc = WaveformConfig()
-        self.assertAlmostEqual(wc.max_range_m, wc.range_resolution_m * 64, places=1)
+        self.assertAlmostEqual(wc.max_range_m, wc.range_resolution_m * 512, places=1)

    def test_max_velocity(self):
        """max_velocity_mps = velocity_resolution * n_doppler_bins / 2."""
@@ -927,9 +927,9 @@ class TestExtractTargetsFromFrame(unittest.TestCase):
        """Detection at range bin 10 → range = 10 * range_resolution."""
        from v7.processing import extract_targets_from_frame
        frame = self._make_frame(det_cells=[(10, 16)])  # dbin=16 = center → vel=0
-        targets = extract_targets_from_frame(frame, range_resolution=23.983)
+        targets = extract_targets_from_frame(frame, range_resolution=5.996)
        self.assertEqual(len(targets), 1)
-        self.assertAlmostEqual(targets[0].range, 10 * 23.983, places=1)
+        self.assertAlmostEqual(targets[0].range, 10 * 5.996, places=1)
        self.assertAlmostEqual(targets[0].velocity, 0.0, places=2)

    def test_velocity_sign(self):
@@ -109,11 +109,11 @@ class RadarSettings:
    the actual waveform parameters.
    """
    system_frequency: float = 10.5e9    # Hz (carrier, used for velocity calc)
-    range_resolution: float = 24.0       # Meters per range bin (c/(2*Fs)*decim)
+    range_resolution: float = 6.0        # Meters per range bin (c/(2*Fs)*decim = 1.5*4)
    velocity_resolution: float = 1.0     # m/s per Doppler bin (calibrate to waveform)
-    max_distance: float = 1536           # Max detection range (m)
-    map_size: float = 2000               # Map display size (m)
-    coverage_radius: float = 1536        # Map coverage radius (m)
+    max_distance: float = 3072           # Max detection range (m), 3 km mode
+    map_size: float = 4000               # Map display size (m)
+    coverage_radius: float = 3072        # Map coverage radius (m), 3 km mode


@dataclass
@@ -211,11 +211,11 @@ class WaveformConfig:
    chirp_duration_s: float = 30e-6      # Long chirp ramp time
    pri_s: float = 167e-6               # Pulse repetition interval (chirp + listen)
    center_freq_hz: float = 10.5e9       # Carrier frequency (radar_scene.py: F_CARRIER)
-    n_range_bins: int = 64               # After decimation
+    n_range_bins: int = 512              # After decimation (3 km mode; 4096 in 20 km)
    n_doppler_bins: int = 32             # Total Doppler bins (2 sub-frames x 16)
    chirps_per_subframe: int = 16        # Chirps in one Doppler sub-frame
-    fft_size: int = 1024                 # Pre-decimation FFT length
-    decimation_factor: int = 16          # 1024 → 64
+    fft_size: int = 2048                 # Pre-decimation FFT length
+    decimation_factor: int = 4           # 2048 → 512

    @property
    def range_resolution_m(self) -> float:
@@ -31,11 +31,9 @@ minutes.
 """
 from __future__ import annotations

-import os
 import random
 import subprocess
 import sys
-import tempfile
 from pathlib import Path

 import pytest
@@ -131,8 +129,12 @@ def _run_seed(seed: int, vvp: Path, work: Path) -> tuple[int, list[tuple[int, in
        f"+csv={csv_path}",
        f"+tag=seed{seed:04d}",
    ]
-    res = subprocess.run(cmd, cwd=FPGA_DIR, capture_output=True, text=True, check=False, timeout=120)
-    assert res.returncode == 0, f"vvp exit={res.returncode}\nstdout:\n{res.stdout}\nstderr:\n{res.stderr}"
+    res = subprocess.run(
+        cmd, cwd=FPGA_DIR, capture_output=True, text=True, check=False, timeout=120,
+    )
+    assert res.returncode == 0, (
+        f"vvp exit={res.returncode}\nstdout:\n{res.stdout}\nstderr:\n{res.stderr}"
+    )
    assert csv_path.exists(), (
        f"vvp completed rc=0 but CSV was not produced at {csv_path}\n"
        f"cmd: {cmd}\nstdout:\n{res.stdout[-2000:]}\nstderr:\n{res.stderr[-500:]}"
@@ -141,7 +143,9 @@ def _run_seed(seed: int, vvp: Path, work: Path) -> tuple[int, list[tuple[int, in
    rows = []
    with csv_path.open() as fh:
        header = fh.readline()
-        assert "baseband_i" in header and "baseband_q" in header, f"unexpected CSV header: {header!r}"
+        assert "baseband_i" in header and "baseband_q" in header, (
+            f"unexpected CSV header: {header!r}"
+        )
        for line in fh:
            parts = line.strip().split(",")
            if len(parts) != 3: