#1 SRE PODCAST

Season 3 – Now Streaming

Episode 1 is live! Join our new host, Jim Hirschauer, as we explore the future of AI-native incident response. Listen on all your favourite platforms.


Episodes

S3.e1 · New

The Zenduty Journey, AI-Native Response, and a New Host

Reliability is about fixing things, not just resolving them. In this season premiere, we take a trip down memory lane with Vishwa to uncover the story behind Zenduty and how the "Incidentally Reliable" podcast began. Jim and Vishwa discuss the transition to Xurrent, the "needle in the haystack" problem in modern observability, and why culture—not just code—is the key to true reliability.

S2.e4 · New

Once an SRE, always an SRE

In this episode, Sudarshan shares his experience leading high-performing SRE and infrastructure teams at Rippling, Twilio, Walmart, and Epsilon. He talks about reducing CI/CD costs by 60 percent, cutting on-call alerts by 65 percent, and the mindset required to build resilient systems.

S2.e3

CTRL + ALT + Scale: Building More Than Just Code

In this episode, Madhu Rawat (CTO, Xurrent) sits down with Sakshi — Co-founder and Head of Engineering at Kapstan, with leadership experience at Sumo Logic and UpGrad. They discuss the evolution of observability, building for scale, the role of AI in incident management, and what it means to lead engineering teams through change.

S2.e2

Redefining ITxM with Zenduty x Xurrent

In this episode, Phil (CPO) and Madhu (CTO) from Xurrent sit down with Vishwa and Ankur from Zenduty to talk about ITxM, building for reliability across teams, and how product and platform thinking come together in real-world incident workflows.

S2.e1

From Cart Failures to Satellite Footprints

In this episode, we speak with Deepak Rajanna, CPO at SatSure and formerly of Amazon, Flipkart, and xto10x, about pricing failures at scale, war room lessons from Big Billion Days, and building satellite-powered systems with SRE principles at their core.

S01.e014

GoDaddy's Journey to Hosting Reliability — Incidentally Reliable Podcast with Amit Rindhe

In this episode of Incidentally Reliable, we sit down with Amit Rhinde, Head of Engineering at GoDaddy, to uncover the secrets behind building resilient systems, scaling global operations, and ensuring uptime for millions of users.

S01.e013

Press Start to Scale: SRE in Gaming - Incidentally Reliable with Denys Pashutynski

In our latest episode, we speak with Denys Pashutynski, Senior Engineering Manager of Site Reliability at Roblox, about the formidable challenges of sustaining a global gaming platform. Drawing from his tenure at Twitter, AWS, and eBay, Denys delves into managing traffic surges, latency optimization, and strategic change management.

S01.e012

Battle-Tested Reliability Strategies with Abhishek Ghosh

We dive into the trenches with Abhishek Ghosh, a veteran who has led SRE teams at Pinterest and is now at Cribl. He shares gripping war room stories from Pinterest, strategies for maintaining uptime, insights into the role of AI in observability, and more! Discover the future of SRE and learn how to navigate the challenges of digital reliability. Tune in to gain valuable lessons from one of the industry's leading experts.

S01.e011

The Science of Building Cloud Native DevTools

Catch Ramiro Berrelleza, Founder and CEO at Okteto, as he talks about how impactful DevTool startups are built, the importance of investing in Developer Experience, and the emerging issues in the Cloud Native ecosystem.
