Arun Sunderraj Legolasan

👋 Hi, I'm Arun Sundararajan

Product-Oriented Technical Operations Leader with 10+ years operating at the intersection of Support, Engineering, and Product within cloud-based data platforms.

I translate production failure patterns and enterprise escalations into structured product improvements. I care about how systems actually behave in the real world, not just how they look in architecture diagrams.

🎯 What Drives Me

I operate in the "messy middle" between customers and code:

🔍 Pattern Recognition – Spotting recurring failure modes across 1,000+ customers and converting them into reliability initiatives
🛠 Product Thinking – Translating support escalations into actionable roadmap items that prevent future issues
📊 Data-Driven Decisions – Using production telemetry, MTTR trends, and customer impact data to prioritize engineering work
🌉 Bridge Building – Aligning Support, SRE, Product, and Engineering teams around shared reliability goals

TL;DR: I turn "why did this break?" into "how do we make sure it never breaks again?"

🔧 What I Work On

🏗 Data Engineering & ELT Platforms

Distributed data pipelines (ingestion, transformation, destinations)
Source connector reliability (PostgreSQL, MySQL, SaaS APIs)
CDC, schema evolution, offset management
Handling real-world edge cases: rate limits, network partitions, zombie connections

🚨 Support Operations & Incident Management

24x7 global support operations for cloud ELT systems
P0/P1 production incident command & resolution
SLA/MTTR optimization through automation and process improvements
Enterprise escalation management

🧰 Automation & Tooling

AI-assisted ticket classification & RCA extraction
Operational dashboards & observability improvements
Knowledge base deflection strategies
Self-service diagnostic tools

🎓 Knowledge Sharing

Documentation, code snippets, utilities
Real-world debugging scenarios
Lessons from production incidents

🧠 Technical Toolkit

Languages & Scripting

Python – automation, APIs, data processing, PySpark
SQL – PostgreSQL, MySQL, Snowflake, Redshift, BigQuery
Bash – scripting, operational glue, incident response

Data Infrastructure

Databases: PostgreSQL, MySQL, Snowflake, Redshift
Streaming: Kafka, Debezium, CDC patterns
Cloud Platforms: AWS, GCP, Azure environments
ELT Tools: Experience debugging distributed ingestion systems

APIs & Integrations

REST APIs, OAuth flows, webhook systems
Rate limiting, pagination, retry strategies
Third-party connector troubleshooting (20+ integrations)

Operations

Docker, CI/CD pipelines
Incident management frameworks
RCA documentation & post-mortem culture
Observability & monitoring strategies

🛠 Selected Work & Experiments

📊 Support Operations Analytics

Automated ticket classification using LLMs
RCA pattern extraction across 10,000+ production incidents
Integration with Google Sheets for stakeholder reporting

🔄 ELT Pipeline Debugging

Source connector edge case handling (auth failures, schema drift, CDC lag)
Data consistency validation across sources and destinations
API rate limit & network timeout resilience patterns
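
One resilience pattern of this kind, retry with exponential backoff and full jitter, can be sketched as follows. This is a minimal illustration, not code from any specific production system: the function name `retry_with_backoff`, its parameters, and the default values are all assumptions chosen for the example.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retryable=(TimeoutError, ConnectionError),
                       sleep=time.sleep, rng=random.random):
    """Run call() until it succeeds or max_attempts is reached.

    All names and defaults here are hypothetical, for illustration only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Upper bound doubles each attempt, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random fraction of the cap so many
            # clients do not retry in lockstep (a "retry storm").
            sleep(cap * rng())
```

The `sleep` and `rng` parameters are injectable so the delay schedule can be checked in tests without actually waiting. Only exceptions listed in `retryable` are retried; anything else propagates immediately.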

🤖 AI-Powered Tooling

Ticket summarization & issue categorization
Focus on deterministic outputs and guardrails (not "magic")
Reducing support engineer toil through intelligent automation

🏗 Infrastructure Reliability

Production debugging: timeouts, data loss, retry storms
Root-cause analysis over symptom firefighting
Implementing preventive measures based on failure patterns

🧭 Engineering Philosophy

Simple > Clever
Observability before optimization
Evidence over hype
Root causes over symptoms
Pragmatism over perfection

Core Beliefs:

Systems fail in ways you didn't anticipate. Plan for it.
The best feature is the one that prevents customer pain.
Support engineers see patterns product teams don't. Listen to them.
Reliability is a product feature, not just an SRE concern.
Good documentation prevents more incidents than good code.

📬 Connect With Me

🌐 Website: legolasan.in
💼 LinkedIn: linkedin.com/in/arunsunderraj
📧 Email: arunsunderraj@outlook.com
📍 Location: Bengaluru, India

"The best code is the code that never had to be written because you fixed the root cause."
