Open to new opportunities

Aaryaman Katoch

Building resilient infrastructure that never sleeps.

Infrastructure, Site Reliability & DevSecOps Engineer who turns infrastructure into a competitive advantage.
4x Google Cloud Certified. Multi-cloud. Equal parts architect, advisor, and operator.

5+ Years in Production
4x GCP Certified
99% Threats Stopped
30-40% Cloud Costs Saved

The engineer behind the uptime

I'm the person teams call when production needs to be bulletproof. Over the past 5+ years, I've built and hardened cloud platforms across GCP, AWS, and Azure for enterprises that can't afford outages: healthcare systems under HIPAA, financial services needing 99.99% availability, and fast-scaling startups breaking through traffic ceilings.

My day-to-day sits at the intersection of security, reliability, and velocity. I architect infrastructure that self-heals, write pipelines that ship code safely in minutes, and build observability stacks that catch issues before customers notice. When something goes wrong at 3 AM, I'm the one writing the blameless postmortem the next morning — and making sure it never happens again.

I'm as comfortable in a customer-facing room as I am in a terminal. I've led technical discovery workshops, designed reference architectures for C-suite stakeholders, and translated business pain points into cloud solutions that actually ship. I've directly driven $5M+ in signed engagements from proof-of-concept through production go-live, including FinOps initiatives that cut $2M+ in annual cloud spend by 30-40% through rightsizing, committed use discounts, spot/preemptible workloads, and idle resource cleanup.

I hold a Master's in Computer Science from Stevens Institute of Technology, backed by four Google Cloud Professional certifications spanning Architecture, DevOps, Data Engineering, and Network Engineering. I've also mentored a cohort of six junior engineers into promoted L2 contributors.

Security as Code

Compliance isn't a checkbox — it's woven into every Terraform module, every pipeline gate, and every IAM policy I write.

Toil Killer

If I do it twice, I automate it. GitOps workflows, SOAR playbooks, self-healing infra — manual is the enemy of reliable.

Force Multiplier

Grew 6 junior engineers into promoted L2s through pairing, code reviews, and blameless RCA culture. Teams I touch ship faster.

Cloud FinOps

Optimized $2M+ in annual cloud spend through rightsizing, committed use discounts, spot/preemptible workloads, and idle resource cleanup — delivering 30-40% cost reduction without SLA degradation.

Trusted Technical Advisor

Led discovery workshops, designed reference architectures for C-suite stakeholders, and drove $5M+ in signed engagements from proof-of-concept through production go-live.

How I think about reliability

01

SLOs Over Gut Feelings

Every system I own has clearly defined Service Level Objectives. Error budgets drive release decisions, not hunches. When the budget is healthy, we ship aggressively. When it's burning, we pause and stabilize.
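As an illustration of how an error budget can gate releases, here is a minimal sketch. The class name, the request-based SLI, and the `freeze_below` threshold are assumptions made for the example, not a description of any specific production tooling.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests served in the SLO window
    failed_requests: int   # requests that violated the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means an untouched budget; 0.0 or below means it is spent.
        if self.allowed_failures == 0:
            return 0.0
        return 1.0 - self.failed_requests / self.allowed_failures


def release_decision(budget: ErrorBudget, freeze_below: float = 0.25) -> str:
    # freeze_below is a hypothetical policy threshold, not a standard value.
    return "ship" if budget.remaining_fraction > freeze_below else "stabilize"
```

With a 99.9% target over 1,000,000 requests, roughly 1,000 failures are allowed, so 200 failures leaves about 80% of the budget and the decision is "ship"; 900 failures leaves about 10% and the decision flips to "stabilize".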

02

Observability is Non-Negotiable

You can't fix what you can't see. I instrument everything (metrics, traces, structured logs) and build dashboards that tell a story. The goal is to detect anomalies before they become incidents, and resolve incidents before they become outages.

03

Automate the Boring, Own the Hard

Toil is the tax on your team's creativity. I relentlessly automate repetitive operational work — deployments, certificate rotations, scaling decisions, remediation — so engineers can focus on building, not babysitting.

04

Security Shifted Left, Not Bolted On

Security controls belong in the CI pipeline, not in a quarterly review. Policy-as-code, least-privilege IAM, encrypted-by-default, vulnerability scanning at build time — hardening happens before merge, not after breach.

05

Blameless Postmortems, Always

Incidents are learning opportunities, not blame games. Every outage gets a structured RCA focused on systemic causes — what controls failed, what monitoring gaps existed, and what changes prevent recurrence.

06

Infrastructure as Code or It Doesn't Exist

If it's not in a Terraform plan or a Helm chart, it's a liability. Reproducible, version-controlled, peer-reviewed infrastructure is the only kind I trust in production.

07

Every Dollar Should Earn Its Keep

Cloud spend without visibility is just waste. I build FinOps practices into every engagement — rightsizing underutilized resources, leveraging committed use discounts, scheduling non-critical workloads, and setting up billing alerts that keep stakeholders informed before costs spiral.

08

Start with the Customer's Problem

The best architecture starts with listening, not building. I invest time understanding a customer's business constraints, compliance landscape, and growth trajectory before touching a single config file. Technical excellence means nothing if it doesn't solve the real problem.

Building reliability from the ground up

Senior Cloud Security & Reliability Engineer

Searce Inc

Houston, TX Jul 2025 — Present
  • Architected an Agentic AI-powered cloud migration platform that autonomously discovers, assesses, and migrates AWS, Azure, and on-prem workloads to Google Cloud, generating production-ready Terraform IaC and reducing multi-million-dollar migration timelines by 50% while hardening security posture and compliance readiness by 40%.
  • Spearheaded enterprise-wide HIPAA and SOC 2 compliance transformation for healthcare clients — implementing Zero Trust architecture, centralized IAM with least-privilege RBAC, encryption at rest and in transit, and automated audit logging. Achieved 100% audit readiness with zero critical findings across three consecutive annual assessments.
  • Engineered fully automated CI/CD pipelines (Jenkins, Argo CD, SonarQube, JFrog Artifactory) with SAST/DAST security gates, container image signing, and GitOps-driven promotion across dev/staging/prod Kubernetes clusters — cutting release cycles from 2 weeks to 2 days with zero-downtime deployments.
  • Optimized $2M+ annual cloud spend across enterprise accounts through FinOps practices — rightsizing, committed use discounts, spot/preemptible workloads, idle resource cleanup, and real-time billing alerts — delivering sustained 30–40% cost reduction without SLA degradation.
  • Championed pre-sales and solution architecture as the primary technical advisor to C-suite stakeholders, leading discovery workshops, designing reference architectures, and driving $5M+ in signed engagements from initial proof-of-concept through production go-live and ongoing optimization.
Agentic AI, Terraform, HIPAA, SOC 2, Zero Trust, Jenkins, Argo CD, Kubernetes, GitOps, FinOps, SAST/DAST

Cloud Reliability Engineer

Searce Inc

Houston, TX May 2023 — Jun 2025
  • Deployed a cloud-native intrusion prevention system (IPS) using NGFW appliances and Terraform IaC, integrating threat intelligence feeds, auto-remediation workflows, and policy-as-code enforcement to block 99% of malicious traffic with zero false positives in production.
  • Established a blameless postmortem culture and authored incident response runbooks aligned with SRE best practices, reducing MTTR by 35% and improving system availability to 99.95% through structured root cause analysis and preventive action tracking.
  • Mentored six junior engineers in cloud architecture, security best practices, and Terraform module development through pairing sessions and code reviews — promoting all to L2 within 12 months and improving team delivery velocity by 20%.
IPS/IDS, NGFW, Terraform, Policy-as-Code, SRE, Postmortems, Mentoring

Cloud Engineer

Searce Cosourcing Services Pvt. Ltd.

Pune, India Jan 2021 — Jul 2022
  • Designed and delivered 10+ proof-of-concept architectures on AWS & Azure, presenting technical feasibility and cost analysis to C-suite stakeholders — converting 5 prospects into signed engagements and growing the sales pipeline by 10%.
  • Conducted technical discovery sessions with prospective customers to map existing workloads, identify migration blockers, and scope end-to-end solution architectures — bridging the gap between sales and engineering delivery teams.
  • Configured continuous integration via GitHub Actions for tag-based and scheduled deployments with automated testing and artifact publishing, improving deployment efficiency by 70% and integrating real-time status notifications to Slack.
  • Triaged production incidents through PagerDuty on-call rotation, performing real-time diagnostics with log correlation and metric analysis, coordinating cross-team response, and documenting root causes to prevent recurrence.
AWS, Azure, GitHub Actions, PagerDuty, Pre-Sales, Solution Architecture, On-Call

Things I've built and shipped

Side projects & academic exploration

Search Engine with Elasticsearch

Built a full-text search engine backed by Elasticsearch and Kibana, with custom analyzers, relevance tuning, and a query interface. Applied text mining techniques to optimize result ranking.

Elasticsearch, Kibana, Text Mining

NLP Tag Ranker (BERT)

Fine-tuned a BERT-based language model to automatically rank and predict relevant tags for programming questions — turning unstructured text into structured, searchable metadata.

BERT, NLP, Python, Transformers

Topic Discovery with pLSA

Implemented Probabilistic Latent Semantic Analysis with the EM algorithm to identify underlying programming languages in synthetic code snippets — a probabilistic approach to code classification.

pLSA, EM Algorithm, NLP, Python
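For readers unfamiliar with pLSA, the following is a minimal pure-Python sketch of the EM updates, assuming the input is a document-term count matrix and that each latent topic stands in for one programming language. The function name, parameters, and defaults are illustrative and are not taken from the original project.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-term count matrix (list of lists)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        if total == 0:
            # Degenerate row (e.g. an empty document): fall back to uniform.
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random initialisation of P(z|d) and P(w|z).
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []  # log-likelihood of the parameters at the start of each iteration

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                for z in range(n_topics):
                    resp = n * joint[z] / p_w_d
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalise expected counts into probabilities.
        p_z_d = [normalized(row) for row in new_z_d]
        p_w_z = [normalized(row) for row in new_w_z]
        history.append(log_lik)
    return p_z_d, p_w_z, history
```

The returned `history` should be non-decreasing, which is a quick sanity check that the E-step and M-step are wired correctly; the dominant entry of each row of `p_z_d` can then be read as the predicted language for that snippet.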

TF-IDF vs BM25 Benchmark

Compared TF-IDF and BM25 retrieval algorithms on the large-scale LinkSO community Q&A dataset, analyzing precision, recall, and ranking quality for question-answer similarity matching.

Python, Jupyter, Information Retrieval
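To make the difference between the two scorers concrete, here is a small sketch, not the benchmark code itself. It assumes whitespace tokenization, shares one smoothed IDF between both scorers so that only the term-frequency handling differs, and uses the commonly cited defaults `k1=1.5` and `b=0.75`; all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer, chosen only to keep the example short.
    return text.lower().split()


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed IDF (BM25 style), reused by both scorers in this sketch.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}


def tfidf_score(query: list[str], doc: list[str], idf: dict[str, float]) -> float:
    # Raw term frequency times IDF: grows linearly with repetition.
    tf = Counter(doc)
    return sum(tf[t] * idf.get(t, 0.0) for t in query)


def bm25_score(query: list[str], doc: list[str], idf: dict[str, float],
               avg_len: float, k1: float = 1.5, b: float = 0.75) -> float:
    # Term frequency saturates (bounded by k1 + 1) and is length-normalised.
    tf = Counter(doc)
    norm = k1 * (1 - b + b * len(doc) / avg_len)
    return sum(idf.get(t, 0.0) * tf[t] * (k1 + 1) / (tf[t] + norm) for t in query)
```

A document that repeats a query term five times scores five times higher than a single mention under raw TF-IDF, while under BM25 the per-term contribution can never exceed `(k1 + 1) * idf`, which is why keyword-stuffed answers are penalised less crudely.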

Airline Booking Platform

Full-stack flight booking web app with search, seat selection, and reservation management. Built with vanilla JavaScript on the frontend and MongoDB for persistent storage.

JavaScript, MongoDB, Bootstrap, Node.js

Web Music Player

Browser-based music player with playlist management, playback controls, and a responsive UI — a clean exercise in DOM manipulation and audio API integration.

HTML, CSS, JavaScript

Flight Departure Widget

Real-time flight departure board powered by a third-party API from RapidAPI. Displays live departure data with a clean, airport-style interface.

Node.js, REST API, HTML/CSS

Student Results DBMS

Database-driven web app for managing student records and academic results — featuring CRUD operations, search, and reporting. Built with Java, MySQL, and a web frontend.

Java, MySQL, JavaScript, HTML/CSS

Technologies in my daily rotation

Cloud & Infrastructure

GCP, AWS, Azure, Terraform, Ansible, Docker, Podman, Kubernetes, Helm, Istio, VMware, Linux, Bare Metal, Hybrid Cloud, GPU Clusters, Vagrant, Packer, PXE Provisioning

Security

SIEM / SOAR, NGFW, IDS / IPS, UEBA, EDR, XDR, WAF, DLP, Nessus, Qualys, Wireshark, Zero Trust, RBAC, LDAP / SSO, Secrets Management, API Key Governance, Network Segmentation, MITRE ATT&CK, OWASP

Compliance & Governance

HIPAA, SOC 2, NIST, PCI-DSS, ISO 27001, GDPR, YARA-L, Policy-as-Code, CIS Benchmarks, IAM, ITIL, Change Management

CI/CD & Automation

Jenkins, Argo CD, GitHub Actions, GitLab CI, Bitrise CI, Tekton, SonarQube, Skaffold, JFrog Artifactory, MLOps

Observability & Monitoring

Prometheus, Grafana, ELK Stack, Loki, Fluentd, PagerDuty, Datadog, Splunk, Alertmanager, Looker, OpenTelemetry, SLOs / SLIs, Capacity Planning

Databases & Storage

PostgreSQL, MySQL, MongoDB, Redis, Cassandra, DynamoDB, Neo4j, InfluxDB, Elasticsearch

Languages & Scripting

Python, Go, JavaScript, TypeScript, Bash / Shell, PowerShell, HCL (Terraform), Groovy, YAML, SQL

Networking

TCP/IP, UDP, DNS, DHCP, HTTP/HTTPS, TLS / mTLS, Load Balancing, VPN, CDN, VPC, NAT, Service Mesh, BGP, OSPF, RoCEv2, InfiniBand, Firewall Rules

Web & Frameworks

React, Node.js, Express.js, REST API, Spring Boot, HTML5, CSS3, Bootstrap, NPM, Tomcat

AI/ML & Platform Ops

AI Platforms, GPU Compute, NVIDIA / CUDA, MCP Servers, Model Serving, Inference Services, MLflow, Kubeflow, Runbooks, Playbooks

4x Google Cloud Professional Certified

Professional Cloud Architect

Google Cloud

Designing and governing scalable, resilient, and secure cloud solutions end-to-end

Professional DevOps Engineer

Google Cloud

Building and operating continuous delivery systems with site reliability best practices

Professional Data Engineer

Google Cloud

Architecting data pipelines, processing systems, and machine learning workflows

Professional Network Engineer

Google Cloud

Engineering robust, secure, and high-performance network architectures at scale

Where it started

M.S.

Master of Science in Computer Science

Stevens Institute of Technology

Hoboken, NJ

Class of 2024

Cloud Computing, Database Systems, Text Mining & NLP, Web Development

B.E.

Bachelor of Engineering in Computer Science

RV College of Engineering

Bengaluru, India

Class of 2021

Operating Systems, Algorithms, UNIX, Parallel Programming, Database Design

Let's talk

Looking for my next challenge in SRE, DevSecOps, Customer Engineering, Platform Engineering, or Solutions Architecture. I love helping customers solve hard infrastructure problems, whether that's in a pre-sales room or a production war room.