Handling ACH Returns with Missing or Broken Data
Prevent wrong-customer reversals by forcing every return through: immutable raw evidence → tolerant parsing → confidence scoring → conservative matching.

One-line summary: When a return is ambiguous, don't guess: preserve immutable raw evidence and route it to manual resolution. Auto-actions are allowed only when identity is unique and high-confidence.
Audience: Payments engineers, backend engineers, platform architects.
Reading time: 12–16 minutes.
Prerequisites: ACH returns basics, ability to run a single Ruby script.
Why now: The first time a bank portal export drops trace numbers, your “exact match or bust” system turns into guesswork—and guesswork reverses the wrong customer.
TL;DR:
- Store every inbound return as an immutable raw return system of record (even if parsing fails).
- Maintain a transmission system of record for what you sent (file/batch/entry evidence + hashes).
- Normalize into a Return Case with nullable fields + parse errors.
- Match with tiers (exact → strong → weak → manual review) and output a confidence score.
- Never auto-pick when there are multiple candidates.
- You need two idempotency layers: ingest and ledger actions.
⚠️ Disclaimer: All scenarios, accounts, names, and data used in examples are not real. They are realistic scenarios provided only for educational and illustrative purposes.
Problem Definition
ACH returns are supposed to be trackable. In reality, your “return feed” might be true NACHA, a processor’s transformed JSON, a bank portal CSV export, or an “exceptions report” that silently drops fields. That’s how you end up with returns missing:
- original trace number
- names
- payment identifiers
- batch headers / file headers
- amount (null/zero/mismatched)
- addenda context
When the feed is fully stripped, you’re no longer safely “processing ACH returns.” You’re processing a failure notice that may not be attributable to any specific payment without additional correlation.
The ugly truth: “last4_account_number + amount” fails at scale (and it fails earlier than people think)
❗ Warning: At 10K+ monthly transactions, you will absolutely see collisions on last4_account_number + amount (and even last4_account_number + amount + company_id if you originate repetitive amounts). If your system auto-matches on these heuristics without guardrails, you are statistically guaranteed to reverse the wrong customer.
Rule: If a heuristic yields multiple candidates, you must never auto-pick.
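To make the collision math concrete, here is a back-of-envelope simulation (a sketch; the volume and the four repetitive amounts are illustrative assumptions, not real data):
# collision_sketch.rb -- illustrative simulation, not real customer data
rng = Random.new(42)

transactions = Array.new(10_000) do
  {
    last4: format('%04d', rng.rand(10_000)),                      # 10,000 possible last4 values
    amount_cents: [999, 2_900, 4_900, 12_500].sample(random: rng) # repetitive amounts
  }
end

buckets = transactions.group_by { |t| [t[:last4], t[:amount_cents]] }
colliding = buckets.count { |_, txns| txns.size > 1 }
puts "Buckets where 2+ transactions share last4+amount: #{colliding}"
# ~40,000 possible keys for 10,000 transactions: expect on the order of a
# thousand colliding buckets. Each one is a wrong-customer reversal waiting
# to happen if you auto-match on it.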
When do you need this pattern?
Use this pattern if any of these are true:
- You’re a regulated entity (or audited like one) and need defensible evidence trails
- You process hundreds of returns per month
- Your return feed is not guaranteed to preserve trace numbers (portal exports, transformed files)
- You originate repetitive amounts at scale (subscriptions, tuition, payroll-like patterns)
If you’re early-stage:
- If you have <100 returns/month, it can be rational to manual-review everything (still store immutable raw events). You’re buying correctness while you learn your provider’s quirks.
Cost of getting it wrong (concrete, practical)
Wrong-customer reversals are expensive because they trigger cross-team investigation and reconciliation. The cost is typically hours of staff time plus any customer remediation.
ℹ️ Note: The exact number varies. The point is that “small” matching mistakes become expensive fast when your inbound return data isn’t clean.
Solution Implementation
This pattern has four non-negotiables:
- Immutable Raw Return System of Record: store the raw return payload exactly as received and never mutate it.
- Transmission System of Record (what you sent): maintain immutable evidence of outbound transmissions (file identifiers, hashes, batch/entry metadata, and the identifiers you generated at send-time).
- Confidence-Based Matching: matching is probabilistic; you must produce a confidence score and a rationale.
- Manual Review Lane: if identity is ambiguous, the system must say "I don't know."
Flow
flowchart TD
A["Inbound Return Feed (NACHA File / Portal Export / Processor JSON)"] --> B["Immutable Raw Return System of Record (Store Raw Payload + Delivery Metadata)"]
B --> C["Tolerant Parser (Best-Effort Normalization + Parse Error Capture)"]
C --> D["Return Case (Nullable Fields + Parse Errors + Confidence Score)"]
D --> E["Matching Engine (Exact Keys → Strong Heuristics → Weak Heuristics → Candidate Ranking)"]
E --> F["Ledger Action Gate (Idempotency for Ledger Actions + No Double-Reversal Rules)"]
E --> G["Manual Review Queue (Low Confidence / Multiple Candidates / Stripped Feed)"]
G --> H["Ops Resolution (Attach to Customer + Notes + Finalize)"]
F --> I["Accounting Ledger (Apply Reversal / Adjust Balance / Notify)"]
J["Outbound Transmission System of Record (File Hashes + Batch/Entry Evidence + Trace Numbers at Send-Time)"] --> E
The Three Risks You Must Treat as “Production Stoppers”
1) Last-4 collision math (don’t bury this)
❗ Warning: last4_account_number is a weak identifier. At scale, last4_account_number + amount is effectively a “bucket,” not an identity key. If you auto-match on it, you are committing to wrong-customer actions.
Rule: Multiple candidates → manual review.
2) Idempotency must exist in two places (and you need both)
❗ Warning: Ingest idempotency prevents duplicate events. Ledger-action idempotency prevents duplicate financial actions. You need both.
- Ingest idempotency: “Have I already recorded this raw return payload?”
- Ledger-action idempotency: “Have I already applied a reversal/adjustment for this Return Case?”
If you only do ingest idempotency: reprocessing can still double-reverse. If you only do ledger-action idempotency: you can still lose audit evidence and re-ingest noise incorrectly.
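A minimal sketch of the two layers (the key formats and in-memory store are illustrative assumptions; in production the ledger-action layer is usually a unique constraint in your ledger database):
require 'digest'

# Layer 1: ingest idempotency, keyed on what arrived
def ingest_key(source, filename, payload)
  Digest::SHA256.hexdigest("#{source}|#{filename}|#{payload}")
end

# Layer 2: ledger-action idempotency, keyed on what you are about to do.
# Derived from the Return Case and action type, NOT the raw payload:
# a re-delivered file must never produce a second reversal.
def ledger_action_key(return_case_id, action)
  "#{action}|return_case|#{return_case_id}"
end

APPLIED_LEDGER_ACTIONS = {} # stand-in for a unique index in the ledger DB

def apply_reversal(return_case_id)
  key = ledger_action_key(return_case_id, :reversal)
  return :already_applied if APPLIED_LEDGER_ACTIONS.key?(key)

  APPLIED_LEDGER_ACTIONS[key] = Time.now # record, then post to the ledger
  :applied
end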
3) The “fix corrupt data” trap (engineers WILL try this)
NEVER DO THIS (put this in your code review checklist)
❗ Warning:
- ❌ Never “fix” a corrupt trace number by guessing characters.
- ❌ Never auto-match if your heuristic returns multiple candidates.
- ❌ Never delete raw return events after parsing (“we normalized it already”).
- ❌ Never disable idempotency because “it’s just a backfill/test run.”
When data is corrupt, your job is to preserve evidence and reduce harm, not to fabricate certainty.
Identity Quality: How to behave when the feed is stripped
When you’ve seen missing names, missing payment identifiers, missing batch headers, and “any payment info fully stripped,” treat identity as a first-class field:
identity_quality = strong | medium | weak | none
Suggested mapping:
- strong: valid original trace number (or other immutable payment identifier) → safe to auto-action
- medium: unique match from strong heuristic (e.g., last4 + amount + company_id) → auto-action only if exactly one candidate
- weak: last4 + amount only → collisions likely → manual review unless you have additional constraints
- none: stripped (no trace, no amount, no account signal) → manual review only
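As a pure function, that mapping might look like the sketch below (field names are illustrative; uniqueness still has to be enforced by the matcher, not by this classifier):
def identity_quality(trace:, last4:, amount_cents:, company_id:)
  return :strong if trace                                # immutable payment identifier
  return :medium if last4 && amount_cents && company_id  # strong heuristic (must still be unique)
  return :weak   if last4 && amount_cents                # collision-prone bucket
  :none                                                  # stripped feed
end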
💡 Tip: If identity_quality = none, your best “keys” often come from delivery context (SFTP path, file naming, timestamps) plus your transmission system of record (what you sent) and any provider/bank “detail view” you can query.
Matching Engine: Tiered + Conservative + Recurrence Guardrail
Matching is tiered and conservative. We first match on a payment identifier (trace, internal payment ID). If missing, we use a batch identifier to narrow the search space to a known transmission. If that’s also missing, we reconstruct candidates from batch header context (SEC code, company ID, effective date window) and match using entry evidence (amount, account signal, discretionary data, name). Heuristic combinations are only allowed when they produce exactly one candidate. For recurring payments, we apply a 7–10 banking-day window to avoid misattributing a return to the wrong cycle. When in doubt, the system routes the case to manual review.
Design for Resilience
Discretionary Data as a Correlation Handle
If you populate discretionary_data or addenda fields with a stable internal reference (payment ID suffix, short hash), it becomes a semi-identifier that banks won’t strip. Future-you will thank past-you for embedding correlation handles in fields that survive export transformations.
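For example, you might derive the handle deterministically at send-time (a sketch; note that the raw NACHA entry-detail discretionary data field is only two characters, so longer handles usually belong in addenda or individual-ID fields; verify what your processor and bank exports actually preserve):
require 'digest'

# Deterministic: the same payment always yields the same handle, so you can
# recompute it during matching without a lookup table.
def correlation_handle(payment_id)
  "PAY_#{Digest::SHA256.hexdigest(payment_id.to_s)[0, 4]}"
end

correlation_handle('pmt_8f3a2c') # => "PAY_...." (stable for this payment ID)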
Core Pattern in One Runnable Ruby File
This is the pattern without infrastructure distractions. Copy-paste into ach_return_matcher.rb and run.
#!/usr/bin/env ruby
# ach_return_matcher.rb
# Demonstrates: raw return system of record, tolerant parsing, confidence-based matching
# Run: ruby ach_return_matcher.rb

require 'json'
require 'digest'
require 'date'

# ============================================================================
# DATA STRUCTURES (in-memory for demo)
# ============================================================================

# Simulates your outbound transmission system of record (what you sent)
# Added: file_id, batch_id, discretionary_data, is_recurring
OUTBOUND_TRANSMISSIONS_SOR = [
  {
    id: 1,
    file_id: 'FILE_20240817_A',
    batch_id: 'BATCH_0007',
    trace_number: '061000050001234',
    routing_number: '061000052',
    account_last4: '6789',
    amount_cents: 12_500,
    effective_date: '20240817',
    company_id: 'ACMEPAY001',
    discretionary_data: 'PAY_9f12', # correlation handle you embed at send-time
    is_recurring: true
  },
  # Another entry in the same batch to show narrowing behavior
  {
    id: 2,
    file_id: 'FILE_20240817_A',
    batch_id: 'BATCH_0007',
    trace_number: '061000050009999',
    routing_number: '061000052',
    account_last4: '1111',
    amount_cents: 9_900,
    effective_date: '20240817',
    company_id: 'ACMEPAY001',
    discretionary_data: 'PAY_2a88',
    is_recurring: true
  }
].freeze

# Raw returns system of record (immutable)
$raw_return_events = []

# Return cases (normalized, mutable)
$return_cases = []

# ============================================================================
# CORE PATTERN: Immutable Raw Return System of Record
# ============================================================================
def store_raw_return_event(source:, filename:, payload:)
  # Ingest idempotency key
  idempotency_key = Digest::SHA256.hexdigest("#{source}|#{filename}|#{payload}")
  return nil if $raw_return_events.any? { |e| e[:idempotency_key] == idempotency_key }

  event = {
    id: $raw_return_events.size + 1,
    source: source,
    filename: filename,
    received_at: Time.now,
    idempotency_key: idempotency_key,
    raw_payload: payload # NEVER modify this
  }
  $raw_return_events << event
  event[:id]
end
# ============================================================================
# TOLERANT PARSING: Accept broken data, record errors
# ============================================================================

# Treat empty strings as missing so "" doesn't masquerade as a real value
def presence(value)
  v = value&.strip
  v.nil? || v.empty? ? nil : v
end

def parse_return(raw_payload)
  errors = []
  begin
    data = JSON.parse(raw_payload)
  rescue JSON::ParserError
    return { normalized: {}, batch_context: {}, errors: ['invalid_json'] }
  end

  normalized = {
    return_code: presence(data['return_reason_code']),
    trace_number: presence(data['original_trace_number']),
    routing: presence(data['routing_number']),
    account_last4: presence(data['account_number_last4']),
    amount_cents: data['amount_cents'],
    settlement_date: presence(data['settlement_date']),
    company_id: presence(data['company_id']),
    discretionary_data: presence(data['discretionary_data'])
  }

  # Batch context can come from a provider envelope, filename mapping, or
  # extra fields in transformed feeds.
  batch_context = {
    file_id: presence(data['file_id']),
    batch_id: presence(data['batch_id'])
  }

  if normalized[:trace_number] && !normalized[:trace_number].match?(/^\d{15}$/)
    errors << 'invalid_trace_number'
    normalized[:trace_number] = nil
  end
  if normalized[:routing] && !normalized[:routing].match?(/^\d{9}$/)
    errors << 'invalid_routing'
    normalized[:routing] = nil
  end
  if normalized[:account_last4] && !normalized[:account_last4].match?(/^\d{4}$/)
    errors << 'invalid_last4'
    normalized[:account_last4] = nil
  end
  if normalized[:amount_cents] && (!normalized[:amount_cents].is_a?(Integer) || normalized[:amount_cents] < 0)
    errors << 'invalid_amount'
    normalized[:amount_cents] = nil
  end

  { normalized: normalized, batch_context: batch_context, errors: errors }
end
# ============================================================================
# CONFIDENCE-BASED MATCHING: Tier 0/1/2 + recurrence guardrail
# ============================================================================
# Tier 0: Primary identifier (payment_id, trace, processor txn id)
# Tier 1: Batch identifier (file_id + batch_id narrows search space)
# Tier 2: Batch header + entry evidence (when identifiers are missing)
# Recurrence check: 7-10 banking day window prevents wrong-cycle matches
#
# NOTE: This demo approximates "banking days" using calendar days for simplicity.
# In production, compute true business days using your bank holiday calendar.
def match_return(normalized, batch_context = {})
  # Tier 0: payment identifier (confidence = 1.0)
  if normalized[:trace_number]
    exact = OUTBOUND_TRANSMISSIONS_SOR.find { |e| e[:trace_number] == normalized[:trace_number] }
    return { status: :matched, entry_id: exact[:id], confidence: 1.0, rationale: 'payment_identifier' } if exact
  end

  # Tier 1: batch identifier (confidence = 0.95, narrows search space)
  if batch_context[:batch_id]
    candidates = OUTBOUND_TRANSMISSIONS_SOR.select { |e| e[:batch_id] == batch_context[:batch_id] }

    # Single entry in batch: confirm with amount before matching
    if candidates.size == 1 && normalized[:amount_cents] && candidates.first[:amount_cents] == normalized[:amount_cents]
      return { status: :matched, entry_id: candidates.first[:id], confidence: 0.95, rationale: 'batch_identifier' }
    end

    # If multiple in batch, use additional entry evidence to disambiguate (still conservative)
    if candidates.size > 1
      # Require amount evidence before narrowing: inside a batch, last4 alone
      # is still a bucket, and a nulled amount usually means corrupt entry evidence.
      unless normalized[:amount_cents]
        return { status: :needs_review, confidence: 0.6, rationale: 'missing_amount_in_batch', candidates: candidates.map { |c| c[:id] } }
      end

      # Strongest low-friction signals first (amount + account_last4 + discretionary_data)
      narrowed = candidates.select { |e| e[:amount_cents] == normalized[:amount_cents] }
      if normalized[:account_last4]
        narrowed = narrowed.select { |e| e[:account_last4] == normalized[:account_last4] }
      end
      if normalized[:discretionary_data]
        narrowed = narrowed.select { |e| e[:discretionary_data] == normalized[:discretionary_data] }
      end

      if narrowed.size == 1
        return { status: :matched, entry_id: narrowed.first[:id], confidence: 0.95, rationale: 'batch_identifier_with_entry_evidence' }
      elsif narrowed.size > 1
        return { status: :needs_review, confidence: 0.6, rationale: 'multiple_candidates_in_batch', candidates: narrowed.map { |c| c[:id] } }
      end
    end
  end

  # Tier 2: Batch header context + entry evidence (confidence = 0.85)
  if normalized[:account_last4] && normalized[:amount_cents] && normalized[:company_id]
    candidates = OUTBOUND_TRANSMISSIONS_SOR.select do |e|
      e[:account_last4] == normalized[:account_last4] &&
        e[:amount_cents] == normalized[:amount_cents] &&
        e[:company_id] == normalized[:company_id]
    end

    # Optional extra disambiguator if present
    if normalized[:discretionary_data]
      candidates = candidates.select { |e| e[:discretionary_data] == normalized[:discretionary_data] } if candidates.size > 1
    end

    # Recurrence guardrail: prevent wrong-cycle matches
    if candidates.size == 1
      candidate = candidates.first
      days_since_payment = (Date.today - Date.strptime(candidate[:effective_date], '%Y%m%d')).to_i
      # Concrete guardrail: recurring payments inside a 7-10 banking-day window should not auto-match
      if candidate[:is_recurring] && days_since_payment < 10
        return { status: :needs_review, confidence: 0.6, rationale: 'recurrence_cooldown_window' }
      end
      return { status: :matched, entry_id: candidate[:id], confidence: 0.85, rationale: 'batch_header_entry_evidence' }
    elsif candidates.size > 1
      return { status: :needs_review, confidence: 0.6, rationale: 'multiple_candidates', candidates: candidates.map { |c| c[:id] } }
    end
  end

  { status: :needs_review, confidence: 0.0, rationale: 'insufficient_identity' }
end
# ============================================================================
# RETURN CASE: Normalized view with match state
# ============================================================================
def create_return_case(raw_event_id:, normalized:, batch_context:, errors:, match_result:)
  identity_quality =
    if normalized[:trace_number]
      :strong
    elsif batch_context[:batch_id]
      :medium
    elsif normalized[:account_last4] && normalized[:amount_cents] && normalized[:company_id]
      :weak
    else
      :none
    end

  {
    id: $return_cases.size + 1,
    raw_event_id: raw_event_id,
    batch_id: batch_context[:batch_id],
    file_id: batch_context[:file_id],
    **normalized,
    parse_errors: errors,
    identity_quality: identity_quality,
    match_confidence: match_result[:confidence],
    matched_entry_id: match_result[:entry_id],
    status: match_result[:status],
    rationale: match_result[:rationale],
    candidates: match_result[:candidates],
    created_at: Time.now
  }.tap { |rc| $return_cases << rc }
end
# ============================================================================
# END-TO-END PROCESSOR
# ============================================================================
def process_return_file(source:, filename:, lines:)
  results = { processed: 0, matched: 0, needs_review: 0, duplicates: 0 }
  lines.each do |line|
    next if line.strip.empty?

    raw_event_id = store_raw_return_event(source: source, filename: filename, payload: line)
    if raw_event_id.nil?
      results[:duplicates] += 1
      next
    end

    parsed = parse_return(line)
    match_result = match_return(parsed[:normalized], parsed[:batch_context])
    create_return_case(
      raw_event_id: raw_event_id,
      normalized: parsed[:normalized],
      batch_context: parsed[:batch_context],
      errors: parsed[:errors],
      match_result: match_result
    )
    results[:processed] += 1
    match_result[:status] == :matched ? results[:matched] += 1 : results[:needs_review] += 1
  end
  results
end
# ============================================================================
# DEMO
# ============================================================================
sample_returns = [
  # Perfect match via trace (Tier 0)
  '{"return_reason_code":"R01","original_trace_number":"061000050001234","routing_number":"061000052","account_number_last4":"6789","amount_cents":12500,"settlement_date":"20251029","company_id":"ACMEPAY001","file_id":"FILE_20240817_A","batch_id":"BATCH_0007","discretionary_data":"PAY_9f12"}',
  # Missing trace, match via batch_id narrowing + entry evidence (Tier 1)
  '{"return_reason_code":"R03","original_trace_number":"","routing_number":"061000052","account_number_last4":"6789","amount_cents":12500,"settlement_date":"20251029","company_id":"ACMEPAY001","file_id":"FILE_20240817_A","batch_id":"BATCH_0007"}',
  # Corrupt trace + missing amount = needs review (insufficient entry evidence)
  '{"return_reason_code":"R19","original_trace_number":"06100005000123X","routing_number":"061000052","account_number_last4":"6789","amount_cents":null,"settlement_date":"20251029","company_id":"ACMEPAY001","batch_id":"BATCH_0007"}',
  # Fully stripped (no usable identity) = needs review (identity_quality :none)
  '{"return_reason_code":"R03"}',
  # Duplicate of first (should skip)
  '{"return_reason_code":"R01","original_trace_number":"061000050001234","routing_number":"061000052","account_number_last4":"6789","amount_cents":12500,"settlement_date":"20251029","company_id":"ACMEPAY001","file_id":"FILE_20240817_A","batch_id":"BATCH_0007","discretionary_data":"PAY_9f12"}'
]

puts "Processing ACH returns...\n\n"

results = process_return_file(
  source: 'SFTP_BANK_X',
  filename: 'RETURNS_20251029.ndjson',
  lines: sample_returns
)

puts "Results:"
puts " Processed: #{results[:processed]}"
puts " Matched: #{results[:matched]}"
puts " Needs Review: #{results[:needs_review]}"
puts " Duplicates Skipped: #{results[:duplicates]}"
puts "\n"

puts "Return Cases:\n"
$return_cases.each do |rc|
  puts " Case ##{rc[:id]}:"
  puts " Status: #{rc[:status]}"
  puts " Rationale: #{rc[:rationale]}"
  puts " Identity Quality: #{rc[:identity_quality]}"
  puts " Confidence: #{rc[:match_confidence]}"
  puts " Matched Entry: #{rc[:matched_entry_id] || 'none'}"
  puts " Candidates: #{rc[:candidates] ? rc[:candidates].join(', ') : 'n/a'}"
  puts " Parse Errors: #{rc[:parse_errors].empty? ? 'none' : rc[:parse_errors].join(', ')}"
  puts " File ID: #{rc[:file_id] || 'MISSING'}"
  puts " Batch ID: #{rc[:batch_id] || 'MISSING'}"
  puts " Trace: #{rc[:trace_number] || 'MISSING'}"
  puts " Last4: #{rc[:account_last4] || 'MISSING'}"
  puts " Amount: #{rc[:amount_cents] || 'MISSING'}"
  puts " Discretionary: #{rc[:discretionary_data] || 'MISSING'}"
  puts ""
end
Run it
ruby ach_return_matcher.rb
Expected outcomes:
- Exact trace match → matched (confidence 1.0, identity_quality strong)
- batch_id narrowing + entry evidence → matched (confidence 0.95, identity_quality medium)
- corrupt trace + missing amount → needs_review (identity_quality medium/weak depending on batch context)
- fully stripped return → needs_review (identity_quality none)
- duplicate payload → skipped (ingest idempotency)
Practical Thresholds & Defaults (so engineers don’t improvise)
Suggested auto-action threshold
- Auto-action allowed:
  - Tier 0 (payment identifier): confidence = 1.0 (strong)
  - Tier 1 (batch identifier + unique within batch): confidence ≥ 0.95 (medium)
  - Tier 2 (batch header + entry evidence): confidence ≥ 0.85, only if exactly one candidate and the recurrence guardrail passes
- Manual review required: confidence < 0.85, or more than one candidate, or identity_quality weak/none, or recurrence guardrail triggered
7–10 banking-day recurrence guardrail (make it policy, not folklore)
If a payment is marked recurring (subscriptions/tuition/payroll-like patterns), do not auto-match a return to a candidate whose effective date is within 7–10 banking days of “today” (or within your normal settlement/return latency window). Route to review unless you have a Tier 0 identifier.
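The demo approximates banking days with calendar days; in production, compute true banking days against your bank's holiday calendar. A minimal sketch (the holiday list is a placeholder assumption):
require 'date'

# Placeholder; substitute your bank's published holiday calendar
BANK_HOLIDAYS = [Date.new(2025, 1, 1), Date.new(2025, 7, 4)].freeze

def banking_days_between(from, to, holidays: BANK_HOLIDAYS)
  ((from + 1)..to).count do |d|
    !d.saturday? && !d.sunday? && !holidays.include?(d)
  end
end

# Recurring payment inside the window => do not auto-match, route to review
def recurrence_cooldown?(effective_date, today: Date.today, window_days: 10)
  banking_days_between(effective_date, today) < window_days
end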
Validation & Monitoring
Minimal validation checks
- Duplicate ingest: same payload twice → only one raw event, only one case
- Corrupt JSON → case created with invalid_json
- Invalid trace → trace nulled + parse error recorded
- Multiple candidates → must be needs_review
- Stripped feed → identity_quality none → must be needs_review
- Recurring within 7–10 banking days → must be needs_review unless Tier 0 match
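You can encode these checks as a throwaway script run against the demo file (a sketch using plain assertions; adapt to your test framework):
# validate_matcher.rb -- loads and runs the demo, then checks invariants
require_relative 'ach_return_matcher'

def check(label, cond)
  puts "#{cond ? 'PASS' : 'FAIL'}: #{label}"
end

check 'duplicate payload produced exactly one raw event',
      $raw_return_events.size == 4 # 5 inputs, 1 duplicate
check 'invalid trace was nulled and recorded as a parse error',
      $return_cases.any? { |rc| rc[:parse_errors].include?('invalid_trace_number') }
check 'stripped feed routed to review with identity_quality none',
      $return_cases.any? { |rc| rc[:identity_quality] == :none && rc[:status] == :needs_review }
check 'no auto-match ever carried multiple candidates',
      $return_cases.none? { |rc| rc[:status] == :matched && rc[:candidates] }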
Minimal metrics
- % matched at confidence ≥ 0.95 (Tier 1+) and % matched at 1.0 (Tier 0)
- % needs_review and % identity_quality=none
- % recurrence_cooldown_window
- duplicates skipped
- top parse error codes
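A sketch of deriving these from the demo's in-memory cases (in production, emit counters to your metrics pipeline instead):
total = $return_cases.size.to_f
pct = ->(n) { format('%.1f%%', 100 * n / total) }

puts "matched @ 1.0 (Tier 0):    #{pct.call($return_cases.count { |rc| rc[:match_confidence] == 1.0 })}"
puts "matched >= 0.95 (Tier 1+): #{pct.call($return_cases.count { |rc| rc[:status] == :matched && rc[:match_confidence] >= 0.95 })}"
puts "needs_review:              #{pct.call($return_cases.count { |rc| rc[:status] == :needs_review })}"
puts "identity_quality = none:   #{pct.call($return_cases.count { |rc| rc[:identity_quality] == :none })}"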
❗ Warning: Alert on sudden spikes in identity_quality=none or parse errors; this often indicates a provider/export format change.
Takeaways
- Raw evidence is sacred. If you can’t prove what you received, you can’t debug or defend it.
- The transmission system of record matters. If the return feed is stripped, what you sent is often the only reliable starting point.
- Heuristics are not identity. At scale, last4 collisions are guaranteed.
- Idempotency is two-layer. Ingest and ledger actions are different problems.
- Never “fix” corruption. Your system should become more conservative as data quality declines.
Next steps
- Add an immutable raw return store (even if it’s just an append-only table/S3 bucket + pointer).
- Ensure you have a transmission system of record (file hashes + entry evidence at send-time).
- Implement Tier 0/1/2 matching + recurrence guardrail.
- Ship a manual review lane before enabling auto-actions.
- Start embedding a discretionary/addenda correlation handle in outbound entries.
Acronyms & Definitions
- ACH : Automated Clearing House
- ODFI : Originating Depository Financial Institution
- RDFI : Receiving Depository Financial Institution
- System of record (SoR) : The authoritative, immutable evidence store for “what we received” and “what we sent”
- Accounting ledger : The financial book of record where reversals/adjustments are applied