AI Insights
Snowflake

Software Engineer - Production Engineering

Snowflake · Bozeman, Montana, US
full-time · mid (3-6 yrs) · Posted 71d ago
Software Engineering · IC2 · IC · On-site
Stack: Go · Kubernetes · Linux · AWS · Azure · GCP · Distributed Systems · Observability · Incident Response · SLOs · SLAs · Containers · Cloud Infrastructure · Automation · Monitoring · Postmortem Analysis

Summary

A Production Engineering Software Engineer role at Snowflake focused on reliability engineering, SLO ownership, observability, and large-scale distributed systems. The team drives proactive reliability culture including incident management, blameless postmortems, and automation-first operations across Snowflake's cloud infrastructure.

About the role

At Snowflake, we are powering the era of the agentic enterprise. To usher in this new era, we seek AI-native thinkers across every function who are energized by the opportunity to reinvent how they work. You don’t just use tools; you possess an innate curiosity, treating AI as a high-trust collaborator that is core to how you solve problems and accelerate your impact. We look for low-ego individuals who thrive in dynamic and fast-moving environments and move with an experimental mindset — who rapidly test emerging capabilities to discover simpler, more powerful ways to deliver results. At Snowflake, your role isn't just to execute a function, but to help redefine the future of how work gets done.

The Production Engineering Team at Snowflake is responsible for driving the reliability tools and processes that ensure Snowflake consistently delivers a top-tier experience for its customers. This includes championing Service Level Objectives (SLOs) across all of Engineering, building the infrastructure necessary for rapid detection of reliability issues, and deeply engaging in system health verification after releases. We think about production reliability end-to-end: how do we proactively prevent issues, quickly detect and diagnose problems when they arise, and efficiently resolve them to minimize impact. We drive the culture of learning from every incident.

RESPONSIBILITIES:

  • Improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.

  • Scale systems sustainably through automation; participate in changes that improve reliability and velocity.

  • Establish and practice low-noise incident response rotations and blameless postmortems to prevent problem recurrence.

  • Write and review code. Develop documentation and capacity plans, and debug the hardest problems on large distributed systems.

  • Collaborate with software engineers to establish, maintain, and optimize functional and performance SLOs.

  • Participate in a 12x7 on-call rotation.

MINIMUM QUALIFICATIONS:

  • Bachelor's degree in Computer Science, a related technical field involving software engineering, or equivalent practical experience.

  • Proficient in at least one modern programming language, preferably Golang.

  • Systematic problem-solving methods and effective communication skills.

PREFERRED QUALIFICATIONS:

  • 3+ years of industry experience building and supporting large-scale systems in production.

  • Experience with modern observability tools and production monitoring practices.

  • Experience with containers and container orchestration systems such as Kubernetes.

  • Experience deploying, managing, and operating scalable and fault-tolerant Linux infrastructure.

  • Hands-on experience with one or more public cloud providers (AWS, Azure, or GCP).

Snowflake is growing fast, and we’re scaling our team to help enable and accelerate our growth. We are looking for people who share our values, challenge ordinary thinking, and push the pace of innovation while building a future for themselves and Snowflake.

How do you want to make your impact?

For jobs located in the United States, please visit the job posting on the Snowflake Careers Site for salary and benefits information: careers.snowflake.com

What you'll do

1. Improve the full lifecycle of services from inception and design through deployment, operation, and refinement
2. Scale systems sustainably through automation and participate in reliability and velocity improvements
3. Establish and lead low-noise incident response rotations and blameless postmortems to prevent recurrence
4. Write and review code, develop documentation and capacity plans, and debug complex distributed systems issues
5. Collaborate with software engineers to establish, maintain, and optimize functional and performance SLOs
6. Participate in a 12x7 on-call rotation

Requirements

Proficiency in at least one modern programming language, preferably Go, to build reliability tooling and automation
Systematic problem-solving approach to debug and resolve issues in large-scale distributed systems
Strong communication skills to collaborate cross-functionally with software engineers on SLO design and capacity planning
Experience or strong aptitude for reliability engineering practices including on-call rotations, incident response, and blameless postmortems

Nice to have

Go
Kubernetes
AWS
Azure
GCP
Linux
Prometheus
Grafana
Datadog
OpenTelemetry
Terraform
Docker

Role overview

Role family: Software Engineering
Level: IC2 (devops_sre)
Experience: 3–6 years
Type: Individual Contributor
Remote policy: On-site
Visa sponsorship: Not offered

Tech stack analysis

LANGUAGES: Go
INFRASTRUCTURE: Kubernetes · Docker · AWS · Azure · GCP · Linux
TOOLS: Observability tooling (unspecified) · Production monitoring platforms

Salary estimate

$140K – $185K
AI-estimated salary range
Confidence: 72%
Reasoning

Snowflake is a high-growth, publicly traded cloud data company known for competitive above-market compensation. For a mid-level SWE in Production/SRE at a tier-1 tech company, total cash (base + bonus) typically ranges $140K–$185K USD. Location in Bozeman, MT may slightly compress base vs. SF/NYC, but Snowflake historically pays competitively regardless of office location. RSU grants would be substantial on top of base. Estimate excludes equity.

Green flags

5 items
Blameless postmortem culture is explicitly called out, which signals a psychologically safe and learning-oriented engineering environment [culture]

Hiring insights

JD quality: 7/10
Urgency: medium
Autonomy: medium
Team size: medium (5-15)

Red flags

3 items
12x7 on-call rotation is significant: it is more demanding than a typical 1-week-per-month on-call schedule and could impact work-life balance [work-life balance]

Interview insights

Rounds: 5
Duration: 4 wks
Difficulty: hard
Take-home: Yes

Career path

Next roles: Senior Software Engineer - SRE · Staff Production Engineer · Engineering Manager - Reliability

About the company

Snowflake is the cloud data platform that enables organizations to consolidate data, power analytics, and build data applications at massive scale. Processing exabytes of data for over 9,800 customers including Capital One, Adobe, and AT&T, Snowflake's unique architecture separates compute and storage across AWS, Azure, and GCP.

HQ: Bozeman, MT, USA
Interview difficulty: hard
Build vs Maintain: both
Cross-functional: Yes