Open to new opportunities

Aaryaman Katoch

Building resilient infrastructure that never sleeps.

Cloud Security & Site Reliability Engineer who turns infrastructure into a competitive advantage.
4x Google Cloud Certified. Multi-cloud. Equal parts architect, advisor, and operator.

5+ Years in Production
4x GCP Certified
99% Threats Stopped
30-40% Cloud Costs Saved

The engineer behind the uptime

I'm the person teams call when production needs to be bulletproof. Over the past 5+ years, I've built and hardened cloud platforms across GCP, AWS, and Azure for enterprises that can't afford outages: healthcare systems under HIPAA, financial services needing 99.99% availability, and fast-scaling startups breaking through traffic ceilings.

My day-to-day sits at the intersection of security, reliability, and velocity. I architect infrastructure that self-heals, write pipelines that ship code safely in minutes, and build observability stacks that catch issues before customers notice. When something goes wrong at 3 AM, I'm the one writing the blameless postmortem the next morning — and making sure it never happens again.

I'm as comfortable in a customer-facing room as I am in a terminal. I've led technical discovery workshops, built and presented proof-of-concept architectures to C-suite stakeholders, and translated business pain points into cloud solutions that actually ship. I've directly contributed to closing pre-sales engagements by demonstrating architecture viability and quantifying ROI, including cloud spend optimization that saved customers 30-40% on annual infrastructure costs through rightsizing, committed use discounts, and workload scheduling.

I hold a Master's in Computer Science from Stevens Institute of Technology, backed by four Google Cloud Professional certifications spanning Architecture, DevOps, Data Engineering, and Network Engineering. I've also mentored and grown an entire cohort of junior engineers into confident L2 contributors.

Security as Code

Compliance isn't a checkbox — it's woven into every Terraform module, every pipeline gate, and every IAM policy I write.

Toil Killer

If I do it twice, I automate it. GitOps workflows, SOAR playbooks, self-healing infra — manual is the enemy of reliable.

Force Multiplier

Grew 6 junior engineers into promoted L2s through pairing, code reviews, and blameless RCA culture. Teams I touch ship faster.

Cloud FinOps

Rightsizing, committed use discounts, workload scheduling, and billing analysis. I've cut customer cloud spend by 30-40% without sacrificing performance.

Trusted Technical Advisor

Led discovery workshops, presented POC architectures to C-suite, and translated business pain into cloud solutions that close deals and deliver results.

How I think about reliability

01

SLOs Over Gut Feelings

Every system I own has clearly defined Service Level Objectives. Error budgets drive release decisions, not hunches. When the budget is healthy, we ship aggressively. When it's burning, we pause and stabilize.
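As a rough illustration of how an error budget can gate releases, here is a minimal sketch. The function names, the request-based SLI, and the 25% freeze threshold are assumptions chosen for the example, not a description of any specific production policy.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no budget used, 0.0 means fully spent, negative means overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def release_decision(remaining: float, freeze_below: float = 0.25) -> str:
    """Hypothetical policy: keep shipping while budget is healthy, otherwise stabilize."""
    return "ship" if remaining > freeze_below else "stabilize"


if __name__ == "__main__":
    # A 99.9% SLO over 1,000,000 requests tolerates roughly 1,000 failures.
    healthy = error_budget_remaining(0.999, 1_000_000, 200)
    burning = error_budget_remaining(0.999, 1_000_000, 900)
    print(release_decision(healthy), release_decision(burning))
```

In practice the window, the SLI, and the threshold would come from the service's own SLO definition; the point is only that the decision is computed from measured failures rather than judged by feel.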

02

Observability is Non-Negotiable

You can't fix what you can't see. I instrument everything (metrics, traces, structured logs) and build dashboards that tell a story. The goal is to detect anomalies before they become incidents, and resolve incidents before they become outages.

03

Automate the Boring, Own the Hard

Toil is the tax on your team's creativity. I relentlessly automate repetitive operational work — deployments, certificate rotations, scaling decisions, remediation — so engineers can focus on building, not babysitting.

04

Security Shifted Left, Not Bolted On

Security controls belong in the CI pipeline, not in a quarterly review. Policy-as-code, least-privilege IAM, encrypted-by-default, vulnerability scanning at build time — hardening happens before merge, not after breach.

05

Blameless Postmortems, Always

Incidents are learning opportunities, not blame games. Every outage gets a structured RCA focused on systemic causes — what controls failed, what monitoring gaps existed, and what changes prevent recurrence.

06

Infrastructure as Code or It Doesn't Exist

If it's not in a Terraform plan or a Helm chart, it's a liability. Reproducible, version-controlled, peer-reviewed infrastructure is the only kind I trust in production.

07

Every Dollar Should Earn Its Keep

Cloud spend without visibility is just waste. I build FinOps practices into every engagement — rightsizing underutilized resources, leveraging committed use discounts, scheduling non-critical workloads, and setting up billing alerts that keep stakeholders informed before costs spiral.

08

Start with the Customer's Problem

The best architecture starts with listening, not building. I invest time understanding a customer's business constraints, compliance landscape, and growth trajectory before touching a single config file. Technical excellence means nothing if it doesn't solve the real problem.

Building reliability from the ground up

Senior Cloud Security & Reliability Engineer

Searce Inc

Houston, TX Jul 2025 — Present
  • Architected a Gen AI-powered migration engine that reads existing AWS, Azure, and on-prem workloads and generates production-ready Terraform for Google Cloud — cutting migration timelines on multi-million-dollar engagements and tightening security posture by 40%.
  • Owned HIPAA compliance end-to-end for a healthcare enterprise: mapped regulatory requirements to technical controls, locked down IAM policies, encrypted data paths, and delivered a clean audit with zero findings.
  • Stood up a full GitOps delivery platform — Jenkins builds, Argo CD reconciliation, Kubernetes rollouts — that took release cycles from days to minutes and eliminated deployment-related incidents.
  • Led cloud cost optimization reviews for enterprise clients — identified idle resources, recommended committed use discounts, and implemented workload scheduling policies that cut monthly GCP spend by 30-40% across multiple accounts.
  • Served as the primary technical point of contact for key customer engagements, running architecture workshops, presenting solution designs to executive stakeholders, and driving technical decisions from discovery through production go-live.
Gen AI, Terraform, HIPAA, Jenkins, Argo CD, Kubernetes, GitOps, FinOps, Customer Advisory

Cloud Reliability Engineer

Searce Inc

Houston, TX May 2023 — Jun 2025
  • Stood up a cloud-native IPS layer with next-gen firewalls, threat intel feeds, and auto-remediation workflows — all defined in Terraform. Result: 99% of malicious traffic blocked with a zero false-positive rate across production.
  • Took ownership of a 6-person junior cohort: ran pairing sessions, led code reviews, and facilitated blameless RCAs. All six were promoted to L2 within the year, and the team's throughput jumped 20%.
IPS/IDS, NGFW, Terraform, Policy-as-Code, Mentoring, RCA

Cloud Engineer

Searce Cosourcing Services Pvt. Ltd.

Pune, India Jan 2021 — Jul 2022
  • Ran 10+ proof-of-concept builds on AWS and Azure that directly converted into 5 signed client engagements — presenting technical feasibility to C-suite stakeholders, fielding architecture questions, and building confidence that drove deal closure.
  • Conducted technical discovery sessions with prospective customers to map their existing workloads, identify migration blockers, and scope solution architectures — acting as the bridge between sales teams and engineering delivery.
  • Built tag-based CI pipelines with GitHub Actions that automated build, test, and deploy workflows — slashing manual deployment effort by 70% and piping status updates straight to Slack.
  • Joined the on-call rotation early in my career, triaging production incidents via PagerDuty and sharpening my instincts for incident response under pressure.
AWS, Azure, GitHub Actions, PagerDuty, Pre-Sales, Solution Architecture, On-Call

Things I've built and shipped

Side projects & academic exploration

Search Engine with Elasticsearch

Built a full-text search engine backed by Elasticsearch and Kibana, with custom analyzers, relevance tuning, and a query interface. Applied text mining techniques to optimize result ranking.

Elasticsearch, Kibana, Text Mining

NLP Tag Ranker (BERT)

Fine-tuned a BERT-based language model to automatically rank and predict relevant tags for programming questions — turning unstructured text into structured, searchable metadata.

BERT, NLP, Python, Transformers

Topic Discovery with pLSA

Implemented Probabilistic Latent Semantic Analysis with the EM algorithm to identify underlying programming languages in synthetic code snippets — a probabilistic approach to code classification.

pLSA, EM Algorithm, NLP, Python

TF-IDF vs BM25 Benchmark

Compared TF-IDF and BM25 retrieval algorithms on the large-scale LinkSO community Q&A dataset, analyzing precision, recall, and ranking quality for question-answer similarity matching.

Python, Jupyter, Information Retrieval
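For context on the comparison above, here is a minimal sketch of how the two scoring functions differ. It is not the benchmark code: the function names, the shared smoothed IDF, and the `k1`/`b` defaults are assumptions for illustration. Documents and queries are lists of tokens, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def idf(term, docs):
    """Smoothed inverse document frequency (the always-positive BM25-style variant)."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def tfidf_score(query, doc, docs):
    """Raw term frequency times IDF: the score grows linearly with repetition."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)


def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """BM25: term frequency saturates, and long documents are normalized down."""
    tf = Counter(doc)
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query:
        f = tf[t]
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(t, docs) * f * (k1 + 1) / (f + length_norm)
    return score


if __name__ == "__main__":
    corpus = [
        ["python", "list", "sort"],
        ["python"] * 10 + ["sort"],  # repeats the query term many times
        ["java", "list"],
    ]
    for doc in corpus:
        print(tfidf_score(["python"], doc, corpus), bm25_score(["python"], doc, corpus))
```

The document that repeats "python" ten times scores ten times higher under raw TF-IDF, while its BM25 contribution stays below `idf * (k1 + 1)`. That saturation is one reason the two can rank the same candidates differently.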

Airline Booking Platform

Full-stack flight booking web app with search, seat selection, and reservation management. Built with vanilla JavaScript on the frontend and MongoDB for persistent storage.

JavaScript, MongoDB, Bootstrap, Node.js

Web Music Player

Browser-based music player with playlist management, playback controls, and a responsive UI — a clean exercise in DOM manipulation and audio API integration.

HTML, CSS, JavaScript

Flight Departure Widget

Real-time flight departure board powered by a third-party API from RapidAPI. Displays live departure data with a clean, airport-style interface.

Node.js, REST API, HTML/CSS

Student Results DBMS

Database-driven web app for managing student records and academic results — featuring CRUD operations, search, and reporting. Built with Java, MySQL, and a web frontend.

Java, MySQL, JavaScript, HTML/CSS

Technologies in my daily rotation

Cloud & Infrastructure

Google Cloud (GCP), AWS, Azure, Terraform, Ansible, Docker, Kubernetes, Helm, Istio, GKE, Cloud Functions, Pub/Sub, VMware, Linux

CI/CD & DevOps

Jenkins, Argo CD, GitHub Actions, GitLab CI, JFrog Artifactory, Skaffold, Bitrise CI, SonarQube, Git, Jira

Security & Compliance

SIEM / SOAR, IDS / IPS, NGFW, IAM, HIPAA, SOC2, YARA-L, Policy-as-Code, Security Command Center, BindPlane, Zero Trust, Vulnerability Scanning, OpenVPN, Wireshark

Observability & Monitoring

Grafana, Prometheus, Elasticsearch, Logstash, Kibana, PagerDuty, Cloud Monitoring, Looker, Alertmanager

Databases & Storage

PostgreSQL, MySQL, MongoDB, Redis, Cloud SQL, Firestore, BigQuery, Cloud Storage

Languages & Scripting

Python, Go, JavaScript, TypeScript, Bash / Shell, HCL (Terraform), YAML, SQL

Networking

TCP/IP, UDP, DNS, HTTP/HTTPS, TLS / mTLS, Load Balancing, VPN, CDN, VPC, Firewall Rules

Web & Frameworks

React, Node.js, Express.js, REST API, Spring Boot, HTML5, CSS3, Bootstrap, NPM, Tomcat

4x Google Cloud Professional Certified

Professional Cloud Architect

Google Cloud

Designing and governing scalable, resilient, and secure cloud solutions end-to-end

Professional DevOps Engineer

Google Cloud

Building and operating continuous delivery systems with site reliability best practices

Professional Data Engineer

Google Cloud

Architecting data pipelines, processing systems, and machine learning workflows

Professional Network Engineer

Google Cloud

Engineering robust, secure, and high-performance network architectures at scale

Where it started

M.S.

Master of Science in Computer Science

Stevens Institute of Technology

Hoboken, NJ

Class of 2024

Cloud Computing, Database Systems, Text Mining & NLP, Web Development

B.E.

Bachelor of Engineering in Computer Science

RV College of Engineering

Bengaluru, India

Class of 2021

Operating Systems, Algorithms, UNIX, Parallel Programming, Database Design

Let's talk

Looking for my next challenge in SRE, DevSecOps, Customer Engineering, Platform Engineering, or Solutions Architecture. I love helping customers solve hard infrastructure problems, whether that's in a pre-sales room or a production war room.