#1 SRE PODCAST

Season 3 – Now Streaming

Episode 1 is live! Join our new host, Jim Hirschauer, as we explore the future of AI-native incident response. Listen on all your favourite platforms.

Episodes

S3.e1 (New)

The Zenduty Journey, AI-Native Response, and a New Host

Reliability is about fixing things, not just resolving them. In this season premiere, we take a trip down memory lane with Vishwa to uncover the story behind Zenduty and how the "Incidentally Reliable" podcast began. Jim and Vishwa discuss the transition to Xurrent, the "needle in the haystack" problem in modern observability, and why culture—not just code—is the key to true reliability.

S2.e4 (New)

Once an SRE, always an SRE

In this episode, Sudarshan shares his experience leading high-performing SRE and infrastructure teams at Rippling, Twilio, Walmart, and Epsilon. He talks about reducing CI/CD costs by 60 percent, cutting on-call alerts by 65 percent, and the mindset required to build resilient systems.

S2.e3

CTRL + ALT + Scale: Building More Than Just Code

In this episode, Madhu Rawat (CTO, Xurrent) sits down with Sakshi — Co-founder and Head of Engineering at Kapstan, with leadership experience at Sumo Logic and UpGrad. They discuss the evolution of observability, building for scale, the role of AI in incident management, and what it means to lead engineering teams through change.

S2.e2

Redefining ITxM with Zenduty x Xurrent

In this episode, Phil (CPO) and Madhu (CTO) from Xurrent sit down with Vishwa and Ankur from Zenduty to talk about ITxM, building for reliability across teams, and how product and platform thinking come together in real-world incident workflows.

S2.e1

From Cart Failures to Satellite Footprints

In this episode, we speak with Deepak Rajanna, CPO at SatSure and ex-Amazon, Flipkart, xto10x, about pricing failures at scale, war room lessons from Big Billion Days, and building satellite-powered systems with SRE principles at their core.

S01.e06

The Show must go on - with Piyush Varma

What reliability means to the modern consumer, why SREs make excellent decision-makers, and the current state of observability, with the Co-Founder and CTO of Last Nine.

S01.e05

Tech is Easy, People are Hard — with Suresh Kumar Khemka

Platform engineering and balancing bureaucracy and velocity at startups, with the Head of Platform and Infra at Apna.

S01.e04

BookMyShow's Cinematic Product Journey — with Viraj Patel

Category creation, product innovation, and empowering SREs to do more, with the previous Lead SRE at Flipkart and BookMyShow.

S01.e01

Evolution of Site Reliability - with Manoj Sebastian

The evolution of SRE over 20 years, post-incident culture at Big Tech, and the future of reliability with AI, with the previous Lead SRE at Flipkart, Intuit, Atlassian, and Amazon.

Incidentally Reliable Blogs

Byte-sized content from the front lines of Site Reliability.

The Definitive Guide to AI in Service & Operations
