Systems Series Part 5

Why Apps Fail in Production (and What ACH Teaches Us)

Whether you’re using Rails, Django, or Spring Boot, production failures rarely come from syntax errors—they come from timeouts, retries, duplicates, and slow dependencies that integration tests miss.

Author: Suma Manjunath
Published on: August 30, 2025


Audience: Backend engineers, Rails developers, fintech architects
Reading Time: 14 minutes
Prerequisites: Rails app in production, basic ACH/payment knowledge
Why now: ACH volumes keep growing (29.1B payments in 2023). Rails teams building financial systems face outages not from syntax errors, but from distributed system realities.

TL;DR:

⚠️ Disclaimer: All scenarios, accounts, names, and data used in examples are not real. They are realistic scenarios provided only for educational and illustrative purposes.

Problem Definition

The challenge: Rails fintech apps pass tests, but fail in production when ACH payments introduce timeouts, duplicates, and slow dependencies.

Who faces this: Rails teams integrating payment flows at banks, lenders, payroll providers, and utilities.

Cost of inaction: Customer trust erosion, regulatory exposure, financial loss. Example: a payroll system resubmits its payroll credit file, employees are paid twice, and compliance auditors get involved.

Why current solutions fail: Testing focuses on correctness of code, not resilience against network partitions, retries, and eventual consistency.

Reliability Fundamentals (Quick Primer)

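Consistency, availability, and partition tolerance are the three properties of the CAP theorem. During a network partition, a distributed system must give up one of consistency or availability. ACH systems favor availability and partition tolerance over strict consistency. Example: users see “Payment Submitted” while actual settlement occurs hours or days later.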

ℹ️ Note: These aren’t just theory — they are written into contracts, regulations, and customer trust.
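Because users see “Payment Submitted” long before settlement, it helps to model the payment lifecycle explicitly. Late results from the ACH network should only be able to move a payment forward. The sketch below is a minimal, framework-free illustration. The `PaymentStatus` class, the status names, and the allowed transitions are assumptions for this example, not a NACHA-defined model.

```ruby
# Minimal sketch of an eventually consistent payment lifecycle.
# Status names and allowed transitions are illustrative assumptions.
class PaymentStatus
  TRANSITIONS = {
    "submitted" => %w[pending failed],
    "pending"   => %w[settled returned],
    "settled"   => %w[returned], # a settled entry can still come back as a return
    "returned"  => [],
    "failed"    => []
  }.freeze

  attr_reader :current

  def initialize
    @current = "submitted"
  end

  # Apply a status update from the provider; updates that would move backward raise.
  def advance!(next_status)
    return current if next_status == current # duplicate notification: no-op
    allowed = TRANSITIONS.fetch(current)
    unless allowed.include?(next_status)
      raise ArgumentError, "illegal transition #{current} -> #{next_status}"
    end
    @current = next_status
  end
end

status = PaymentStatus.new
status.advance!("pending")
status.advance!("settled")
puts status.current # prints "settled"
```

Treating a repeated status as a no-op makes the model tolerate webhooks or polling results that arrive more than once. Rejecting backward moves stops a delayed “pending” message from overwriting a newer “settled” state.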

Common Failure Modes in Rails ACH Apps

1. Network Instability

Warning: A single timeout on ACH API calls can block payroll for thousands of employees.

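require 'net/http'
require 'uri'
require 'json'
require 'securerandom'

uri = URI("https://ach-provider.example.com/debits")
payload = { routing_number: "061000052", account_number: "123456789", amount_cents: 12500 }.to_json
# Generate the idempotency key ONCE, outside the retry loop, so every attempt is the same debit
# (assumes the provider deduplicates on an Idempotency-Key header)
headers = { "Content-Type" => "application/json", "Idempotency-Key" => SecureRandom.uuid }

max_attempts = 3
attempts = 0
begin
  attempts += 1
  # Explicit timeouts: Net::HTTP otherwise waits up to 60s to connect and 60s to read
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true, open_timeout: 2, read_timeout: 5) do |http|
    http.post(uri.path, payload, headers)
  end
  raise "Non-2xx response: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  puts "✅ Success: #{response.body}"
rescue StandardError => e
  if attempts < max_attempts
    sleep(2**attempts) # exponential backoff: 2s, then 4s (no sleep after the final attempt)
    retry
  end
  puts "❌ Failure after #{attempts} attempts: #{e.message}"
end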
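The loop above retries every error the same way. A common refinement is to retry only failures that are plausibly transient (timeouts, HTTP 429, and 5xx responses). Another is to add random jitter so that many workers do not retry in lockstep. The sketch below shows both. The status-code policy, `retryable?`, and `backoff_delay` are our own illustrative assumptions, not taken from any ACH provider's documentation.

```ruby
require 'net/http'

# Hypothetical policy: which failures are worth retrying for a debit request.
RETRYABLE_STATUS = [429, 500, 502, 503, 504].freeze

def retryable?(error: nil, status: nil)
  # Net::OpenTimeout and Net::ReadTimeout are raised by Net::HTTP on slow connects/reads
  return true if error.is_a?(Net::OpenTimeout) || error.is_a?(Net::ReadTimeout)
  return RETRYABLE_STATUS.include?(status) if status
  false # validation errors (e.g. 422) will fail the same way again
end

# "Full jitter" backoff: a random delay between 0 and base * 2**attempt, capped.
def backoff_delay(attempt, base: 1.0, cap: 30.0, rng: Random.new)
  ceiling = [cap, base * (2**attempt)].min
  rng.rand * ceiling
end

puts retryable?(status: 503)    # prints "true"
puts retryable?(status: 422)    # prints "false"
puts backoff_delay(3).round(2)  # some value between 0 and 8
```

Retrying a timed-out POST is only safe when the provider deduplicates requests by idempotency key. A timeout does not prove that the debit failed; it may have gone through.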

2. Duplicate Submissions

💡 Tip: Always use idempotency keys or database uniqueness constraints.

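# Derive the key from the business event (payroll_run_id and employee_id are hypothetical),
# not a fresh UUID per attempt, so a resubmission maps to the same key
idempotency_key = "payroll-#{payroll_run_id}-employee-#{employee_id}"

# Rails 6+: inserts the row, or on a unique-index violation returns the existing one.
# Requires: add_index :payments, :idempotency_key, unique: true
payment = Payment.create_or_find_by!(idempotency_key: idempotency_key) do |p|
  p.routing_number = "061000052"
  p.account_number = "123456789"
  p.amount_cents = 12500
  p.status = "submitted"
end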
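An idempotency key only prevents duplicates if a retry reuses the same key. It should therefore come from the business event (the payroll run and the employee, for example), not from a fresh UUID generated per attempt. The effect is easiest to see without Rails. In this sketch a Hash stands in for a table with a unique index. `IdempotentSubmitter` and the key format are hypothetical.

```ruby
# Minimal sketch: a Hash stands in for a table with a unique index on idempotency_key.
class IdempotentSubmitter
  attr_reader :submitted_count

  def initialize
    @results = {}
    @submitted_count = 0
  end

  # Returns the stored result if this key was already processed; otherwise
  # "submits" the payment once and records the result under the key.
  def submit(idempotency_key, amount_cents)
    @results.fetch(idempotency_key) do
      @submitted_count += 1
      @results[idempotency_key] = { amount_cents: amount_cents, attempt: @submitted_count }
    end
  end
end

submitter = IdempotentSubmitter.new
key = "payroll-2025-08-29-employee-42" # hypothetical: derived from payroll run + employee
first  = submitter.submit(key, 12_500)
second = submitter.submit(key, 12_500) # a retry after a timeout
puts first.equal?(second)       # prints "true"
puts submitter.submitted_count  # prints "1"
```

This in-memory version is not safe under concurrency. Two simultaneous requests could both miss the Hash, which is why a real system relies on the database's unique index rather than on a check-then-insert in application code.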

3. Slow Dependencies

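One slow bank API can tie up all of your Puma threads. Every request waiting on it holds a thread, so unrelated requests queue behind it.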

Solution: Circuit breakers + connection pool limits.

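require 'net/http'
require 'circuitbox'

# uri and payload as defined in the Network Instability example.
# The circuit opens after repeated timeouts and rejects calls for 60s (sleep_window).
circuit = Circuitbox.circuit(:ach_api, exceptions: [Net::OpenTimeout, Net::ReadTimeout], sleep_window: 60)

# Circuitbox 2.x: run(exception: false) returns nil instead of raising
response = circuit.run(exception: false) do
  # Short timeouts stop a slow bank from holding a Puma thread for Net::HTTP's 60s default
  Net::HTTP.start(uri.host, uri.port, use_ssl: true, open_timeout: 2, read_timeout: 5) do |http|
    http.post(uri.path, payload, "Content-Type" => "application/json")
  end
end

if response.nil?
  puts "❌ ACH service unavailable (circuit open or request timed out)"
end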
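The “connection pool limits” half of the solution is often called a bulkhead. It caps how many threads may wait on the ACH provider at once and rejects the rest immediately, so that other endpoints keep their Puma threads. Below is a minimal sketch that uses Ruby's built-in `SizedQueue` as a pool of permits. The `Bulkhead` class and the limit of 5 are assumptions for illustration.

```ruby
# Minimal bulkhead sketch: a SizedQueue pre-filled with permits caps concurrency.
class Bulkhead
  class Full < StandardError; end

  def initialize(max_concurrent)
    @permits = SizedQueue.new(max_concurrent)
    max_concurrent.times { @permits.push(:permit) }
  end

  # Runs the block if a permit is free; raises Full instead of waiting for one.
  def run
    permit = begin
      @permits.pop(true) # non-blocking pop raises ThreadError when no permit is left
    rescue ThreadError
      raise Full, "ACH bulkhead full, rejecting request"
    end
    begin
      yield
    ensure
      @permits.push(permit) # always return the permit, even if the call raised
    end
  end
end

ach_bulkhead = Bulkhead.new(5)
result = ach_bulkhead.run { "called ACH provider" } # e.g. wrap the circuit.run call here
puts result # prints "called ACH provider"
```

Rejecting with `Bulkhead::Full` instead of blocking is deliberate. A request that cannot get a permit fails fast and can be queued for a background job, rather than occupying a web thread.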

4. Poor Error Messaging

ACH return codes are specific. Don’t show “Something went wrong.”

Map codes → actionable messages.

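# NACHA return codes mapped to messages a customer can act on
RETURN_CODES = {
  "R01" => "Insufficient funds. Please add funds and try again.",
  "R02" => "This bank account is closed. Please link a different account.",
  "R03" => "No account found at this bank. Please check your account details.",
  "R29" => "Your bank reports this corporate debit as not authorized. Please contact your bank."
}.freeze

def user_message(code)
  RETURN_CODES.fetch(code, "Unexpected error. Please contact support.")
end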
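Return codes also decide what the system does next, not just what the user sees. Under the NACHA Operating Rules, an entry returned for insufficient or uncollected funds (R01, R09) may be reinitiated at most twice, within 180 days of the original entry's settlement date. Most other returns need the customer to fix something first. The sketch below encodes that split. The action names, the `next_action` helper, and the choice of which codes go to manual review are hypothetical.

```ruby
require 'date'

# Hypothetical policy built on the NACHA reinitiation limits for R01/R09.
REINITIABLE = %w[R01 R09].freeze   # insufficient / uncollected funds
MAX_REINITIATIONS = 2
REINITIATION_WINDOW_DAYS = 180

# Returns :reinitiate, :ask_customer, or :manual_review for a returned entry.
def next_action(code, reinitiations:, settled_on:, today: Date.today)
  if REINITIABLE.include?(code)
    within_window = (today - settled_on).to_i <= REINITIATION_WINDOW_DAYS
    return :reinitiate if reinitiations < MAX_REINITIATIONS && within_window
    return :ask_customer
  end
  # Account problems the customer must fix; everything else goes to a person
  %w[R02 R03 R04].include?(code) ? :ask_customer : :manual_review
end

puts next_action("R01", reinitiations: 0,
                 settled_on: Date.new(2025, 8, 1), today: Date.new(2025, 8, 5)) # prints "reinitiate"
```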

5. Operational Blind Spots

ℹ️ Note: If you don’t track retries, queue length, and settlement times, you are flying blind.

Use Prometheus/Grafana:
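Here is a sketch of the three signals named above (retries, queue length, settlement time). It is dependency-free and renders them in the Prometheus text exposition format. In a production Rails app you would more likely use the `prometheus-client` gem and a `/metrics` endpoint. The `AchMetrics` class and the metric names are hypothetical.

```ruby
# Dependency-free sketch of ACH metrics in Prometheus text exposition format.
class AchMetrics
  def initialize
    @retries = Hash.new(0)  # counter, labeled by provider
    @queue_length = 0       # gauge
    @settlement_sum = 0.0   # summary: total seconds observed
    @settlement_count = 0   # summary: number of observations
  end

  def record_retry(provider)
    @retries[provider] += 1
  end

  def queue_length=(n)
    @queue_length = n
  end

  def observe_settlement(seconds)
    @settlement_sum += seconds
    @settlement_count += 1
  end

  def to_prometheus
    lines = ["# TYPE ach_retries_total counter"]
    @retries.each { |provider, n| lines << %(ach_retries_total{provider="#{provider}"} #{n}) }
    lines << "# TYPE ach_queue_length gauge"
    lines << "ach_queue_length #{@queue_length}"
    lines << "# TYPE ach_settlement_seconds summary"
    lines << "ach_settlement_seconds_sum #{@settlement_sum}"
    lines << "ach_settlement_seconds_count #{@settlement_count}"
    lines.join("\n") + "\n"
  end
end

metrics = AchMetrics.new
metrics.record_retry("acme_bank")
metrics.queue_length = 17
metrics.observe_settlement(86_400) # one payment that took a day to settle
puts metrics.to_prometheus
```

From the sum and count, Grafana can chart average settlement time as `rate(ach_settlement_seconds_sum[1h]) / rate(ach_settlement_seconds_count[1h])`.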

Example: Rails ACH Payment Flow

flowchart TD
  U["User (Customer Submits Payment)"] --> R["Rails App (Payment Controller)"]
  R --> A["ACH Provider (Third-Party API Gateway)"]
  A --> B["Bank (ODFI - Originating Depository Financial Institution)"]
  B --> N["ACH Network (Clearing & Settlement)"]
  N --> B
  B --> A
  A --> R
  R --> U2["User (Receives Payment Status)"]

  %% Failure Points
  R -. Timeout/Error .-> A
  A -. Retry/Duplicate Risk .-> B
  B -. Settlement Delay (1-2 days) .-> N

Recovery Procedures (When Things Go Wrong)

1. ACH File Corruption

Warning: Corrupted ACH files are rejected at the ODFI level.

Recovery: Validate the file's checksum before submission, and regenerate the file only on a mismatch.

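require 'digest'

file_path = "/tmp/ach_batch.ach"
# expected_checksum: hypothetical value stored when the file was generated
checksum = Digest::SHA256.file(file_path).hexdigest

if checksum != expected_checksum
  puts "❌ File corruption detected, regenerating batch..."
  regenerate_ach_file(file_path) # hypothetical helper that rebuilds the file from source records
else
  puts "✅ ACH file validated, safe to submit."
end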
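A SHA-256 digest only tells you that the bytes changed. NACHA files also carry their own integrity fields. Each batch control (“8”) record repeats four values:

- the entry/addenda count;
- an entry hash, which is the sum of the 8-digit RDFI routing prefixes of the batch's entry detail (“6”) records, truncated to the rightmost 10 digits;
- the total debit amount;
- the total credit amount.

The sketch below recomputes those fields before submission. It assumes well-formed 94-character records, a single batch, and no addenda records. It only recognizes the four basic checking/savings transaction codes, so it is not a full NACHA parser.

```ruby
DEBIT_CODES  = %w[27 37].freeze # checking / savings debits
CREDIT_CODES = %w[22 32].freeze # checking / savings credits

# Entry detail records start with "6"
def entry_records(lines)
  lines.select { |line| line.start_with?("6") }
end

# Sum of the 8-digit RDFI routing prefixes (positions 4-11), rightmost 10 digits kept
def entry_hash(lines)
  entry_records(lines).sum { |line| line[3, 8].to_i } % 10**10
end

# Total amount in cents (positions 30-39) for entries with the given transaction codes
def total_cents(lines, codes)
  entry_records(lines)
    .select { |line| codes.include?(line[1, 2]) }
    .sum { |line| line[29, 10].to_i }
end

# Compares recomputed values with the batch control ("8") record:
# count at positions 5-10, entry hash 11-20, total debits 21-32, total credits 33-44
def batch_control_matches?(lines, control)
  control[4, 6].to_i == entry_records(lines).size &&
    control[10, 10].to_i == entry_hash(lines) &&
    control[20, 12].to_i == total_cents(lines, DEBIT_CODES) &&
    control[32, 12].to_i == total_cents(lines, CREDIT_CODES)
end
```

To use it, pass the batch's lines (for example from `File.readlines(path, chomp: true)`) together with its “8” record. A mismatch means the file is internally inconsistent and should be regenerated rather than submitted.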

2. Partial Batch Failures

ℹ️ Note: A 1,000-entry file can partially fail (950 succeed, 50 rejected).

Recovery: Parse the return codes, retry only the failed subset, and track each entry's status in a per-entry queue.
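A per-entry approach can be sketched in three steps:

1. Key each entry by its trace number.
2. Look up its return code, if any, from the return file.
3. Split the batch into entries that need no action, entries to retry, and entries that need a person.

The data shapes, the `triage` helper, and the retryable set (only funds-related returns) are assumptions for illustration.

```ruby
# Minimal sketch: triage a partially failed batch by trace number.
RETRYABLE_CODES = %w[R01 R09].freeze # assumption: only funds-related returns are retried

Entry = Struct.new(:trace_number, :amount_cents)

# returns: Hash of trace_number => return code, parsed from the return file
def triage(entries, returns)
  result = { ok: [], retry: [], review: [] }
  entries.each do |entry|
    code = returns[entry.trace_number]
    bucket =
      if code.nil? then :ok
      elsif RETRYABLE_CODES.include?(code) then :retry
      else :review
      end
    result[bucket] << entry
  end
  result
end

# 15-digit trace numbers: 8-digit ODFI routing prefix + 7-digit sequence
entries = (1..1_000).map { |i| Entry.new(format("06100005%07d", i), 12_500) }
returns = { entries[0].trace_number => "R01", entries[1].trace_number => "R02" }
buckets = triage(entries, returns)
puts buckets.transform_values(&:size).inspect
```

Only the `:retry` bucket is resubmitted. The 998 unaffected entries are never touched again, which is what keeps a partial failure from turning into a duplicate submission.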

3. Duplicate Submissions

💡 Tip: Never roll back completed ACH entries. Instead:

Compliance Considerations (Reliability in a Regulated Context)

1. Audit Trails

Every ACH submission must have a complete log entry (trace number, retry count, user ID).

AuditLog.create!(
  event: "ACH_SUBMISSION",
  trace_number: payment.trace_number,
  user_id: payment.user_id,
  retry_count: payment.retries,
  timestamp: Time.now.utc
)

2. Data Retention

ACH records must be stored 2+ years, often in immutable storage.
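Immutable storage (for example, write-once object storage) is the primary control. A complementary technique makes the log itself tamper-evident by chaining each entry to the hash of the previous one, so editing any past record breaks every later hash. Below is a minimal sketch using Ruby's standard `digest` and `json` libraries. The `AuditChain` class and its fields are hypothetical.

```ruby
require 'digest'
require 'json'

# Minimal sketch of a tamper-evident, append-only audit log (hash chain).
class AuditChain
  GENESIS = "0" * 64

  attr_reader :entries

  def initialize
    @entries = []
  end

  def append(record)
    prev_hash = @entries.empty? ? GENESIS : @entries.last[:hash]
    body = JSON.generate(record)
    @entries << { body: body, prev_hash: prev_hash,
                  hash: Digest::SHA256.hexdigest(prev_hash + body) }
  end

  # Recomputes every hash; returns false if any stored body or link was changed.
  # (Dropping the newest entries is only detectable if the latest hash is stored elsewhere.)
  def valid?
    prev_hash = GENESIS
    @entries.all? do |entry|
      ok = entry[:prev_hash] == prev_hash &&
           entry[:hash] == Digest::SHA256.hexdigest(prev_hash + entry[:body])
      prev_hash = entry[:hash]
      ok
    end
  end
end

log = AuditChain.new
log.append({ event: "ACH_SUBMISSION", trace_number: "061000050000001", user_id: 42, retry_count: 0 })
log.append({ event: "ACH_RETURN", trace_number: "061000050000001", code: "R01" })
puts log.valid? # prints "true"
```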

3. Error Reporting

Material incidents (delays, duplicates) may require reporting to ODFI/NACHA within 24 hours.

Warning: Noncompliance risks fines, audits, and loss of ACH privileges.

Lessons From Release It! Applied to ACH

Validation & Monitoring

Test cases:

📊 Success metrics:

Failure modes:

Key Takeaways

Next Steps

  1. Implement circuit breakers in your Rails payment flows.
  2. Add recovery handling for corrupted/partial batches.
  3. Build immutable audit logs and compliance runbooks.
  4. Review SLAs and error budgets with your compliance officer.
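When you review error budgets, the arithmetic is simple: the budget is the fraction of a time window that your SLO allows to fail. For example, a 99.9% availability SLO over a 30-day window allows 0.1% of 43,200 minutes, which is 43.2 minutes. Here is a small helper; the SLO values and the 30-day default are assumptions for the example.

```ruby
# Error budget: the share of a time window an SLO allows to fail, in minutes.
def error_budget_minutes(slo, window_days: 30)
  window_minutes = window_days * 24 * 60
  ((1 - slo) * window_minutes).round(1)
end

puts error_budget_minutes(0.999)  # prints "43.2"
puts error_budget_minutes(0.9999) # prints "4.3"
```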

📖 Acronyms & Terms

📚 References

