Posts

Showing posts from May, 2026

Principal SRE - Interview Question

Sharing some of the interview questions for the Principal SRE role at Apple:

1) Tell me about a production outage you’ve handled that had ambiguous symptoms. How did you narrow it down?
2) How do you decide what belongs in an SLO, and how do you avoid overengineering it?
3) You inherit a platform with strong uptime but high operational toil. What do you change first?
4) How do you handle a disagreement with product leadership when reliability work competes with feature delivery?
5) A service is scaling rapidly and latency is degrading under load. Walk me through your approach.
6) What does “good observability” mean to you in practice?
7) Describe how you would lead a major incident.
8) What’s your philosophy on automation in production operations?
9) How do you evaluate whether an architecture is resilient enough?
10) You’re the on-call SRE for a globally used service. At 2:00 AM, error rates jump from 0.2% to 8%, latency doubles, and one region is still healthy wh...