How U.S. Payments Really Work Part 20

Handling ACH Returns with Missing or Broken Data

Prevent wrong-customer reversals by forcing every return through: immutable raw evidence → tolerant parsing → confidence scoring → conservative matching.

Author: Suma Manjunath
Published on: November 05, 2025

Handling returns with missing or broken data

One-line rule: When a return is ambiguous, don’t guess—preserve immutable raw evidence and route it to manual resolution. Auto-actions are allowed only when identity is unique and high-confidence.

Audience: Payments engineers, backend engineers, platform architects.
Reading time: 12–16 minutes.
Prerequisites: ACH returns basics, ability to run a single Ruby script.
Why now: The first time a bank portal export drops trace numbers, your “exact match or bust” system turns into guesswork—and guesswork reverses the wrong customer.

TL;DR: Preserve immutable raw evidence for every return, parse tolerantly, score match confidence, and auto-act only on a unique, high-confidence candidate. Everything else goes to manual review.

⚠️ Disclaimer: All scenarios, accounts, names, and data used in examples are not real. They are realistic scenarios provided only for educational and illustrative purposes.


Problem Definition

ACH returns are supposed to be trackable. In reality, your “return feed” might be true NACHA, a processor’s transformed JSON, a bank portal CSV export, or an “exceptions report” that silently drops fields. That’s how you end up with returns missing:

  - customer names
  - trace numbers or other payment identifiers
  - batch headers (company ID, SEC code, effective date)
  - in the worst case, any payment info at all (fully stripped)

When the feed is fully stripped, you’re no longer safely “processing ACH returns.” You’re processing a failure notice that may not be attributable to any specific payment without additional correlation.

The ugly truth: “last4_account_number + amount” fails at scale (and it fails earlier than people think)

Warning: At 10K+ monthly transactions, you will absolutely see collisions on last4_account_number + amount (and even last4_account_number + amount + company_id if you originate repetitive amounts). If your system auto-matches on these heuristics without guardrails, you are statistically guaranteed to reverse the wrong customer.

Rule: If a heuristic yields multiple candidates, you must never auto-pick.

When do you need this pattern?

Use this pattern if any of these are true:

  - Your return feed arrives in more than one format (true NACHA, processor JSON, bank portal CSV, “exceptions reports”).
  - Any of those feeds drops or strips fields (trace numbers, names, batch headers).
  - You originate roughly 10K+ monthly transactions, where last4 + amount collisions become routine.
  - You originate recurring payments with repetitive amounts.

If you’re early-stage: the manual review lane and the immutable raw store still apply from day one (you cannot retroactively capture evidence); the tiered matching engine can wait until hand-matching becomes painful.

Cost of getting it wrong (concrete, practical)

Wrong-customer reversals are expensive because they trigger cross-team investigation and reconciliation. The cost is typically hours of staff time plus any customer remediation.

ℹ️ Note: The exact number varies. The point is that “small” matching mistakes become expensive fast when your inbound return data isn’t clean.


Solution Implementation

This pattern has four non-negotiables:

  1. Immutable Raw Return System of Record: Store the raw return payload exactly as received and never mutate it.
  2. Transmission System of Record: Maintain immutable evidence of what you sent: file identifiers, hashes, batch/entry metadata, and the identifiers you generated at send-time.
  3. Confidence-Based Matching: Matching is probabilistic. You must produce a confidence score and rationale.
  4. Manual Review Lane: If identity is ambiguous, the system must say: “I don’t know.”

Flow

flowchart TD
  A["Inbound Return Feed (NACHA File / Portal Export / Processor JSON)"] --> B["Immutable Raw Return System of Record (Store Raw Payload + Delivery Metadata)"]
  B --> C["Tolerant Parser (Best-Effort Normalization + Parse Error Capture)"]
  C --> D["Return Case (Nullable Fields + Parse Errors + Confidence Score)"]
  D --> E["Matching Engine (Exact Keys → Strong Heuristics → Weak Heuristics → Candidate Ranking)"]
  E --> F["Ledger Action Gate (Idempotency for Ledger Actions + No Double-Reversal Rules)"]
  E --> G["Manual Review Queue (Low Confidence / Multiple Candidates / Stripped Feed)"]
  G --> H["Ops Resolution (Attach to Customer + Notes + Finalize)"]
  F --> I["Accounting Ledger (Apply Reversal / Adjust Balance / Notify)"]
  J["Outbound Transmission System of Record (File Hashes + Batch/Entry Evidence + Trace Numbers at Send-Time)"] --> E

The Three Risks You Must Treat as “Production Stoppers”

1) Last-4 collision math (don’t bury this)

Warning: last4_account_number is a weak identifier. At scale, last4_account_number + amount is effectively a “bucket,” not an identity key. If you auto-match on it, you are committing to wrong-customer actions.

Rule: Multiple candidates → manual review.
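To make the collision math concrete, here is a rough, self-contained birthday-style simulation. The parameters are hypothetical (10,000 monthly transactions, amounts clustered on 50 common price points), not data from any real program; tune them to your own volume and amount mix.

```ruby
#!/usr/bin/env ruby
# collision_sim.rb: rough simulation of last4 + amount "bucket" collisions.
# Hypothetical parameters; this is an illustration, not production math.
require 'set'

def collision_count(n_txns:, n_amount_buckets:, seed: 42)
  rng = Random.new(seed)
  seen = Set.new
  collisions = 0
  n_txns.times do
    # A "key" is last4 (10,000 values) x amount bucket: a bucket, not an identity.
    key = [rng.rand(10_000), rng.rand(n_amount_buckets)]
    collisions += 1 unless seen.add?(key) # Set#add? returns nil on duplicates
  end
  collisions
end

if __FILE__ == $PROGRAM_NAME
  puts collision_count(n_txns: 10_000, n_amount_buckets: 50)
end
```

With 500,000 possible keys and 10,000 entries, the expected number of colliding pairs is roughly n^2 / (2 * keys), on the order of 100 here. Every one of those is a potential wrong-customer reversal if you auto-match.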

2) Idempotency must exist in two places (and you need both)

Warning: Ingest idempotency prevents duplicate events. Ledger-action idempotency prevents duplicate financial actions. You need both.

If you only do ingest idempotency: reprocessing can still double-reverse. If you only do ledger-action idempotency: you can still lose audit evidence and re-ingest noise incorrectly.
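A minimal sketch of the second layer, ledger-action idempotency. Names like apply_reversal are illustrative assumptions; your ledger API will differ, and in production the key belongs behind a unique database index, not an in-memory Hash.

```ruby
# ledger_idempotency.rb: sketch of ledger-action idempotency, separate from
# ingest idempotency. Back this with a unique DB index in production.
$applied_ledger_actions = {}

def apply_reversal(payment_id:, amount_cents:)
  # Deterministic action key: reprocessing the same return yields the same key.
  action_key = "reversal|#{payment_id}"
  return { status: :already_applied, action_key: action_key } if $applied_ledger_actions.key?(action_key)

  # ... post the reversal to the accounting ledger here ...
  result = { status: :applied, payment_id: payment_id, amount_cents: amount_cents }
  $applied_ledger_actions[action_key] = result
  result
end
```

Even if ingest idempotency fails and the same return is processed twice, the second apply_reversal call is a no-op instead of a double-reversal.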

3) The “fix corrupt data” trap (engineers WILL try this)

NEVER DO THIS (put this in your code review checklist)

Warning:

  - Never “repair” a corrupt trace number by padding, truncating, or guessing digits.
  - Never infer a missing amount or account from the closest-looking outbound entry.
  - Never mutate or normalize the stored raw payload in place.
  - Never auto-pick among multiple candidates because one “looks right.”

When data is corrupt, your job is to preserve evidence and reduce harm, not to fabricate certainty.


Identity Quality: How to behave when the feed is stripped

When you’ve seen missing names, missing payment identifiers, missing batch headers, and “any payment info fully stripped,” treat identity as a first-class field:

Suggested mapping:

  - strong: a payment identifier is present (trace number, internal payment ID)
  - medium: no payment identifier, but a batch identifier (file_id + batch_id) is present
  - weak: only entry evidence (account last4 + amount + company ID)
  - none: none of the above

💡 Tip: If identity_quality = none, your best “keys” often come from delivery context (SFTP path, file naming, timestamps) plus your transmission system of record (what you sent) and any provider/bank “detail view” you can query.
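One way to sketch that delivery-context fallback (field names here are illustrative assumptions):

```ruby
# delivery_context.rb: when identity_quality is :none, capture feed-level
# context. These keys identify the feed, not the payment: they narrow the
# candidate search but never justify an auto-action on their own.
def delivery_context_key(sftp_path:, filename:, received_at:)
  {
    source_path: sftp_path,
    date_hint: filename[/\d{8}/], # e.g. RETURNS_20251029.ndjson -> "20251029"
    received_at: received_at
  }
end
```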


Matching Engine: Tiered + Conservative + Recurrence Guardrail

Matching is tiered and conservative. We first match on a payment identifier (trace, internal payment ID). If missing, we use a batch identifier to narrow the search space to a known transmission. If that’s also missing, we reconstruct candidates from batch header context (SEC code, company ID, effective date window) and match using entry evidence (amount, account signal, discretionary data, name). Heuristic combinations are only allowed when they produce exactly one candidate. For recurring payments, we apply a 7–10 banking-day window to avoid misattributing a return to the wrong cycle. When in doubt, the system routes the case to manual review.


Design for Resilience

Discretionary Data as a Correlation Handle

If you populate discretionary_data or addenda fields with a stable internal reference (payment ID suffix, short hash), it becomes a semi-identifier that banks won’t strip. Future-you will thank past-you for embedding correlation handles in fields that survive export transformations.
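A sketch of generating such a handle at send-time. The format mirrors the demo’s PAY_9f12 style; available field widths vary by record type, so check your format spec and what your bank’s exports actually preserve before relying on any particular field.

```ruby
require 'digest'

# correlation_handle.rb: derive a short, stable handle from your payment ID
# at send-time. Four hex chars give only 65,536 buckets, so this is a
# disambiguator to combine with other entry evidence, never an identity key.
def correlation_handle(payment_id)
  "PAY_#{Digest::SHA256.hexdigest(payment_id.to_s)[0, 4]}"
end
```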

Core Pattern in One Runnable Ruby File

This is the pattern without infrastructure distractions. Copy-paste into ach_return_matcher.rb and run.

#!/usr/bin/env ruby
# ach_return_matcher.rb
# Demonstrates: raw return system of record, tolerant parsing, confidence-based matching
# Run: ruby ach_return_matcher.rb

require 'json'
require 'digest'
require 'date'

# ============================================================================
# DATA STRUCTURES (in-memory for demo)
# ============================================================================

# Simulates your outbound transmission system of record (what you sent)
# Added: file_id, batch_id, discretionary_data, is_recurring
OUTBOUND_TRANSMISSIONS_SOR = [
  {
    id: 1,
    file_id: 'FILE_20240817_A',
    batch_id: 'BATCH_0007',
    trace_number: '061000050001234',
    routing_number: '061000052',
    account_last4: '6789',
    amount_cents: 12_500,
    effective_date: '20240817',
    company_id: 'ACMEPAY001',
    discretionary_data: 'PAY_9f12', # correlation handle you embed at send-time
    is_recurring: true
  },
  # Another entry in the same batch to show narrowing behavior
  {
    id: 2,
    file_id: 'FILE_20240817_A',
    batch_id: 'BATCH_0007',
    trace_number: '061000050009999',
    routing_number: '061000052',
    account_last4: '1111',
    amount_cents: 9_900,
    effective_date: '20240817',
    company_id: 'ACMEPAY001',
    discretionary_data: 'PAY_2a88',
    is_recurring: true
  }
].freeze

# Raw returns system of record (immutable)
$raw_return_events = []

# Return cases (normalized, mutable)
$return_cases = []

# ============================================================================
# CORE PATTERN: Immutable Raw Return System of Record
# ============================================================================

def store_raw_return_event(source:, filename:, payload:)
  # Ingest idempotency key
  idempotency_key = Digest::SHA256.hexdigest("#{source}|#{filename}|#{payload}")

  return nil if $raw_return_events.any? { |e| e[:idempotency_key] == idempotency_key }

  event = {
    id: $raw_return_events.size + 1,
    source: source,
    filename: filename,
    received_at: Time.now,
    idempotency_key: idempotency_key,
    raw_payload: payload # NEVER modify this
  }

  $raw_return_events << event
  event[:id]
end

# ============================================================================
# TOLERANT PARSING: Accept broken data, record errors
# ============================================================================

def parse_return(raw_payload)
  errors = []

  begin
    data = JSON.parse(raw_payload)
  rescue JSON::ParserError
    return { normalized: {}, batch_context: {}, errors: ['invalid_json'] }
  end

  normalized = {
    return_code: data['return_reason_code']&.strip,
    trace_number: data['original_trace_number']&.strip,
    routing: data['routing_number']&.strip,
    account_last4: data['account_number_last4']&.strip,
    amount_cents: data['amount_cents'],
    settlement_date: data['settlement_date']&.strip,
    company_id: data['company_id']&.strip,
    discretionary_data: data['discretionary_data']&.strip
  }

  # Batch context can come from a provider envelope, filename mapping, or extra fields in transformed feeds.
  batch_context = {
    file_id: data['file_id']&.strip,
    batch_id: data['batch_id']&.strip
  }

  if normalized[:trace_number] && !normalized[:trace_number].match?(/^\d{15}$/)
    errors << 'invalid_trace_number'
    normalized[:trace_number] = nil
  end

  if normalized[:routing] && !normalized[:routing].match?(/^\d{9}$/)
    errors << 'invalid_routing'
    normalized[:routing] = nil
  end

  if normalized[:account_last4] && !normalized[:account_last4].match?(/^\d{4}$/)
    errors << 'invalid_last4'
    normalized[:account_last4] = nil
  end

  if normalized[:amount_cents] && (!normalized[:amount_cents].is_a?(Integer) || normalized[:amount_cents] < 0)
    errors << 'invalid_amount'
    normalized[:amount_cents] = nil
  end

  { normalized: normalized, batch_context: batch_context, errors: errors }
end

# ============================================================================
# CONFIDENCE-BASED MATCHING (Corrected): Tier 0/1/2 + Recurrence guardrail
# ============================================================================

# Tier 0: Primary identifier (payment_id, trace, processor txn id)
# Tier 1: Batch identifier (file_id+batch_id narrows search space)
# Tier 2: Batch header + entry evidence (when identifiers are missing)
# Recurrence check: 7-10 banking day window prevents wrong-cycle matches
#
# NOTE: This demo approximates "banking days" using calendar days for simplicity.
# In production, compute true business days using your bank holiday calendar.

def match_return(normalized, batch_context = {})
  # Tier 0: payment identifier (confidence = 1.0)
  if normalized[:trace_number]
    exact = OUTBOUND_TRANSMISSIONS_SOR.find { |e| e[:trace_number] == normalized[:trace_number] }
    return { status: :matched, entry_id: exact[:id], confidence: 1.0, rationale: 'payment_identifier' } if exact
  end

  # Tier 1: batch identifier (confidence = 0.95, narrows search space)
  if batch_context[:batch_id]
    candidates = OUTBOUND_TRANSMISSIONS_SOR.select { |e| e[:batch_id] == batch_context[:batch_id] }

    # Now match within batch using entry evidence
    if candidates.size == 1 && normalized[:amount_cents] && candidates.first[:amount_cents] == normalized[:amount_cents]
      return { status: :matched, entry_id: candidates.first[:id], confidence: 0.95, rationale: 'batch_identifier' }
    end

    # If multiple in batch, use additional entry evidence to disambiguate (still conservative)
    if candidates.size > 1
      narrowed = candidates

      # Strongest low-friction signals first (amount + account_last4 + discretionary_data)
      if normalized[:amount_cents]
        narrowed = narrowed.select { |e| e[:amount_cents] == normalized[:amount_cents] }
      end
      if normalized[:account_last4]
        narrowed = narrowed.select { |e| e[:account_last4] == normalized[:account_last4] }
      end
      if normalized[:discretionary_data]
        narrowed = narrowed.select { |e| e[:discretionary_data] == normalized[:discretionary_data] }
      end

      if narrowed.size == 1 && normalized[:amount_cents]
        return { status: :matched, entry_id: narrowed.first[:id], confidence: 0.95, rationale: 'batch_identifier_with_entry_evidence' }
      elsif narrowed.size == 1
        # Narrowed to one candidate without amount evidence; too weak to auto-match.
        return { status: :needs_review, confidence: 0.6, rationale: 'insufficient_entry_evidence', candidates: narrowed.map { |c| c[:id] } }
      elsif narrowed.size > 1
        return { status: :needs_review, confidence: 0.6, rationale: 'multiple_candidates_in_batch', candidates: narrowed.map { |c| c[:id] } }
      end
    end
  end

  # Tier 2: Batch header context + entry evidence (confidence = 0.85)
  if normalized[:account_last4] && normalized[:amount_cents] && normalized[:company_id]
    candidates = OUTBOUND_TRANSMISSIONS_SOR.select do |e|
      e[:account_last4] == normalized[:account_last4] &&
      e[:amount_cents] == normalized[:amount_cents] &&
      e[:company_id] == normalized[:company_id]
    end

    # Optional extra disambiguator if present
    if normalized[:discretionary_data]
      candidates = candidates.select { |e| e[:discretionary_data] == normalized[:discretionary_data] } if candidates.size > 1
    end

    # Recurrence guardrail: prevent wrong-cycle matches
    if candidates.size == 1
      candidate = candidates.first
      days_since_payment = (Date.today - Date.strptime(candidate[:effective_date], '%Y%m%d')).to_i

      # Concrete guardrail: recurring payments inside a 7–10 banking-day window should not auto-match
      if candidate[:is_recurring] && days_since_payment < 10
        return { status: :needs_review, confidence: 0.6, rationale: 'recurrence_cooldown_window' }
      end

      return { status: :matched, entry_id: candidate[:id], confidence: 0.85, rationale: 'batch_header_entry_evidence' }
    elsif candidates.size > 1
      return { status: :needs_review, confidence: 0.6, rationale: 'multiple_candidates', candidates: candidates.map { |c| c[:id] } }
    end
  end

  { status: :needs_review, confidence: 0.0, rationale: 'insufficient_identity' }
end

# ============================================================================
# RETURN CASE: Normalized view with match state
# ============================================================================

def create_return_case(raw_event_id:, normalized:, batch_context:, errors:, match_result:)
  identity_quality =
    if normalized[:trace_number]
      :strong
    elsif batch_context[:batch_id]
      :medium
    elsif normalized[:account_last4] && normalized[:amount_cents] && normalized[:company_id]
      :weak
    else
      :none
    end

  {
    id: $return_cases.size + 1,
    raw_event_id: raw_event_id,
    batch_id: batch_context[:batch_id],
    file_id: batch_context[:file_id],
    **normalized,
    parse_errors: errors,
    identity_quality: identity_quality,
    match_confidence: match_result[:confidence],
    matched_entry_id: match_result[:entry_id],
    status: match_result[:status],
    rationale: match_result[:rationale],
    candidates: match_result[:candidates],
    created_at: Time.now
  }.tap { |rc| $return_cases << rc }
end

# ============================================================================
# END-TO-END PROCESSOR
# ============================================================================

def process_return_file(source:, filename:, lines:)
  results = { processed: 0, matched: 0, needs_review: 0, duplicates: 0 }

  lines.each do |line|
    next if line.strip.empty?

    raw_event_id = store_raw_return_event(source: source, filename: filename, payload: line)
    if raw_event_id.nil?
      results[:duplicates] += 1
      next
    end

    parsed = parse_return(line)
    match_result = match_return(parsed[:normalized], parsed[:batch_context])

    create_return_case(
      raw_event_id: raw_event_id,
      normalized: parsed[:normalized],
      batch_context: parsed[:batch_context],
      errors: parsed[:errors],
      match_result: match_result
    )

    results[:processed] += 1
    match_result[:status] == :matched ? results[:matched] += 1 : results[:needs_review] += 1
  end

  results
end

# ============================================================================
# DEMO
# ============================================================================

sample_returns = [
  # Perfect match via trace (Tier 0)
  '{"return_reason_code":"R01","original_trace_number":"061000050001234","routing_number":"061000052","account_number_last4":"6789","amount_cents":12500,"settlement_date":"20251029","company_id":"ACMEPAY001","file_id":"FILE_20240817_A","batch_id":"BATCH_0007","discretionary_data":"PAY_9f12"}',

  # Missing trace, match via batch_id narrowing + entry evidence (Tier 1)
  '{"return_reason_code":"R03","original_trace_number":"","routing_number":"061000052","account_number_last4":"6789","amount_cents":12500,"settlement_date":"20251029","company_id":"ACMEPAY001","file_id":"FILE_20240817_A","batch_id":"BATCH_0007"}',

  # Corrupt trace + missing amount = needs review (insufficient entry evidence)
  '{"return_reason_code":"R19","original_trace_number":"06100005000123X","routing_number":"061000052","account_number_last4":"6789","amount_cents":null,"settlement_date":"20251029","company_id":"ACMEPAY001","batch_id":"BATCH_0007"}',

  # Fully stripped (no usable identity) = needs review (identity_quality :none)
  '{"return_reason_code":"R03"}',

  # Duplicate of first (should skip)
  '{"return_reason_code":"R01","original_trace_number":"061000050001234","routing_number":"061000052","account_number_last4":"6789","amount_cents":12500,"settlement_date":"20251029","company_id":"ACMEPAY001","file_id":"FILE_20240817_A","batch_id":"BATCH_0007","discretionary_data":"PAY_9f12"}'
]

puts "Processing ACH returns...\n\n"

results = process_return_file(
  source: 'SFTP_BANK_X',
  filename: 'RETURNS_20251029.ndjson',
  lines: sample_returns
)

puts "Results:"
puts "  Processed: #{results[:processed]}"
puts "  Matched: #{results[:matched]}"
puts "  Needs Review: #{results[:needs_review]}"
puts "  Duplicates Skipped: #{results[:duplicates]}"
puts "\n"

puts "Return Cases:\n"
$return_cases.each do |rc|
  puts "  Case ##{rc[:id]}:"
  puts "    Status: #{rc[:status]}"
  puts "    Rationale: #{rc[:rationale]}"
  puts "    Identity Quality: #{rc[:identity_quality]}"
  puts "    Confidence: #{rc[:match_confidence]}"
  puts "    Matched Entry: #{rc[:matched_entry_id] || 'none'}"
  puts "    Candidates: #{rc[:candidates] ? rc[:candidates].join(', ') : 'n/a'}"
  puts "    Parse Errors: #{rc[:parse_errors].empty? ? 'none' : rc[:parse_errors].join(', ')}"
  puts "    File ID: #{rc[:file_id] || 'MISSING'}"
  puts "    Batch ID: #{rc[:batch_id] || 'MISSING'}"
  puts "    Trace: #{rc[:trace_number] || 'MISSING'}"
  puts "    Last4: #{rc[:account_last4] || 'MISSING'}"
  puts "    Amount: #{rc[:amount_cents] || 'MISSING'}"
  puts "    Discretionary: #{rc[:discretionary_data] || 'MISSING'}"
  puts ""
end

Run it

ruby ach_return_matcher.rb

Expected outcomes:

  - Return 1: matched at Tier 0 via trace number (confidence 1.0).
  - Return 2: matched at Tier 1 via batch narrowing plus amount/last4 evidence (confidence 0.95).
  - Return 3: needs review (corrupt trace, missing amount).
  - Return 4: needs review with identity_quality :none.
  - Return 5: skipped as a duplicate by the ingest idempotency key.


Practical Thresholds & Defaults (so engineers don’t improvise)

Suggested auto-action threshold

  - Confidence ≥ 0.95 with exactly one candidate: auto-action allowed.
  - Confidence 0.85 (Tier 2): auto-action only outside the recurrence window; otherwise review.
  - Anything lower, multiple candidates, or parse errors on identity fields: manual review.

7–10 banking-day recurrence guardrail (make it policy, not folklore)

If a payment is marked recurring (subscriptions/tuition/payroll-like patterns), do not auto-match a return to a candidate whose effective date is within 7–10 banking days of “today” (or within your normal settlement/return latency window). Route to review unless you have a Tier 0 identifier.
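The demo approximates banking days with calendar days. A sketch of the production-shaped replacement, with the holiday list injected (source it from your real bank holiday calendar):

```ruby
require 'date'

# banking_days.rb: count banking days in [from, to), excluding weekends and
# an injected holiday set. Holiday sourcing is left to your calendar data.
def banking_days_between(from, to, holidays: [])
  (from...to).count { |d| !d.saturday? && !d.sunday? && !holidays.include?(d) }
end

def within_recurrence_window?(effective_date, today, window_days: 10, holidays: [])
  banking_days_between(effective_date, today, holidays: holidays) < window_days
end
```

Swapping this into the matcher replaces the demo’s days_since_payment calendar-day check with a true banking-day count.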


Validation & Monitoring

Minimal validation checks

  - Every return case links to exactly one immutable raw event.
  - Every return case carries an identity_quality, a confidence score, and a rationale.
  - No ledger action exists without a match at or above the auto-action threshold.
  - Raw payload hashes are stable over time (the raw store was never mutated).

Minimal metrics

  - identity_quality distribution per source/feed
  - parse error rate per source/feed
  - match confidence distribution and needs_review rate
  - manual review queue depth and age

Warning: Alert on sudden spikes in identity_quality=none or parse errors—this often indicates a provider/export format change.


Takeaways

  - Store every raw return exactly as received; it is your only durable evidence.
  - Matching is probabilistic: score confidence, record rationale, and never auto-pick among multiple candidates.
  - Idempotency belongs in two places: ingest and ledger action.
  - Embed correlation handles at send-time so returns survive stripped feeds.

Next steps

  1. Add an immutable raw return store (even if it’s just an append-only table/S3 bucket + pointer).
  2. Ensure you have a transmission system of record (file hashes + entry evidence at send-time).
  3. Implement Tier 0/1/2 matching + recurrence guardrail.
  4. Ship a manual review lane before enabling auto-actions.
  5. Start embedding a discretionary/addenda correlation handle in outbound entries.

Acronyms & Definitions

  - ACH: Automated Clearing House, the U.S. batch payment network.
  - Nacha: the organization that governs the ACH network’s operating rules.
  - SEC code: Standard Entry Class code, set in the batch header to identify the entry type.
  - R01 / R03 / R19: return reason codes (insufficient funds; no account / unable to locate account; amount field error).
  - SFTP: SSH File Transfer Protocol, a common delivery channel for bank files.
  - System of record: the authoritative, immutable store for a class of evidence.


