Systems Reliability Engineer

Cloudflare · Austin, Texas, US
Full-time · Mid (3-8 yrs) · Posted 140d ago
DevOps / Site Reliability Engineering · IC3 · IC · On-site
Stack: Go, Python, Rust, Docker, Linux, Kubernetes, Kafka, Terraform, Vault, Consul, Temporal, Git, Prometheus, Alertmanager, eBPF, strace, Salt, CI/CD, DNS, IP Networking, Load Balancing, StatefulSets, Persistent Volume Claims, Ingress, CRDs, Bare Metal Operations, Distributed Systems

Summary

A Production Engineering role at Cloudflare focused on building and operating the internal private cloud platform—including Kubernetes, Kafka, CI/CD, and developer tooling—that enables Cloudflare's engineering teams to deploy services reliably and safely at global scale.

About the role

About Us

At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world’s largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine’s Top Company Cultures list and ranked among the World’s Most Innovative Companies by Fast Company. 

At Cloudflare, we’re not looking for people who wait for a polished roadmap; we’re looking for the builders who see the cracks in the Internet that everyone else has simply learned to live with. We value candidates who have the instinct to spot a "normalized" problem and the AI-native curiosity to create a solution using the latest tools. Our culture is built on iteration, leveraging AI to ship faster today and make it better tomorrow, while ensuring that every improvement, no matter how small, is shared across the team to lift everyone up. If you’re the type of person who values curiosity over bureaucracy and sees AI as a partner in solving tough problems to keep the Internet moving forward, you’ll fit right in.

Available Locations: Austin

About the role

As an engineer on one of our Production Engineering teams, you'll be building the tools to help engineers deploy and operate the services that make Cloudflare work. Our mission is to provide a reliable, yet flexible, platform to help product teams release new software efficiently and safely. You’ll be building the private cloud that Cloudflare developers leverage to build Cloudflare itself. Core platforms we operate at Cloudflare include:

  • Kubernetes
  • Kafka 
  • Developer tools, CI, and CD systems
  • Vault, Consul
  • Terraform
  • Temporal Workflows
  • Cloudflare Developer Platform

What You'll Do

  • Build software that automates the operation of large, highly available distributed systems
  • Ensure platform security and guide security best practices
  • Document your work and guide fellow developers towards optimal solutions
  • Contribute back to the open source community
  • Leave code better than we found it

What You'll Need

  • Recent career experience with Go or Python and at least 3 years of experience as a full-time software engineer (any language). Rust is an added bonus.
  • Experience with deploying and managing services using Docker on Linux
  • A firm grasp of IP networking, load balancing and DNS
  • Excellent debugging skills in a distributed systems environment
  • Source control experience including branching, merging and rebasing (we use git)
  • The ability to break down complex problems and drive towards a solution
  • A passion for improving user experience

Bonus Points

  • Experience with Deployments, StatefulSets, Persistent Volume Claims, Ingresses, and CRDs on Kubernetes
  • Operational experience deploying and managing large systems on bare metal
  • Experience as a Site Reliability Engineer (SRE) for a large-scale company
  • Practical knowledge of web and systems performance, and extensive use of tracing tools like eBPF and strace
  • Alerting and monitoring (Prometheus/Alertmanager) and configuration management (Salt)

 

What Makes Cloudflare Special?

We’re not just a highly ambitious, large-scale technology company. We’re a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.

Project Galileo: Since 2014, we've equipped more than 2,400 journalism and civil society organizations in 111 countries with powerful tools to defend themselves against attacks that would otherwise censor their work. This is the same technology already used by Cloudflare’s enterprise customers, provided at no cost.

Athenian Project: In 2017, we created the Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration. Since the project began, we've provided services to more than 425 local government election websites in 33 states.

1.1.1.1: We released 1.1.1.1 to help fix the foundation of the Internet by building a faster, more secure, and privacy-centric public DNS resolver. It is publicly available for everyone to use, and it is the first consumer-focused service Cloudflare has ever released. Here’s the deal: we never, ever store client IP addresses. We will continue to abide by our privacy commitment and ensure that no user data is sold to advertisers or used to target consumers.

Sound like something you’d like to be a part of? We’d love to hear from you!

This position may require access to information protected under U.S. export control laws, including the U.S. Export Administration Regulations. Please note that any offer of employment may be conditioned on your authorization to receive software or technology controlled under these U.S. export laws without sponsorship for an export license.

Cloudflare is proud to be an equal opportunity employer.  We are committed to providing equal employment opportunity for all people and place great value in both diversity and inclusiveness.  All qualified applicants will be considered for employment without regard to their, or any other person's, perceived or actual race, color, religion, sex, gender, gender identity, gender expression, sexual orientation, national origin, ancestry, citizenship, age, physical or mental disability, medical condition, family care status, or any other basis protected by law. We are an AA/Veterans/Disabled Employer.

Cloudflare provides reasonable accommodations to qualified individuals with disabilities.  Please tell us if you require a reasonable accommodation to apply for a job. Examples of reasonable accommodations include, but are not limited to, changing the application process, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment.  If you require a reasonable accommodation to apply for a job, please contact us via e-mail at [email protected] or via mail at 101 Townsend St. San Francisco, CA 94107.

What you'll do

1. Build software that automates the operation of large, highly available distributed systems
2. Ensure platform security and guide security best practices across engineering teams
3. Document work and guide fellow developers towards optimal solutions
4. Contribute back to the open source community
5. Maintain and improve internal platforms: Kubernetes, Kafka, CI/CD, Vault, Consul, Terraform, and Temporal Workflows
6. Leave code better than you found it: continuously improve internal developer tooling and the platform

Requirements

3+ years of full-time software engineering experience (any language), with recent experience in Go or Python (Rust is a bonus)
Experience deploying and managing containerized services with Docker on Linux
Strong understanding of IP networking, DNS, and load balancing fundamentals
Proven debugging skills in distributed systems environments
Ability to break down complex problems and drive end-to-end solutions autonomously

Nice to have

Rust
Kubernetes (Deployments, StatefulSets, PVCs, Ingresses, CRDs)
Bare metal infrastructure operations at scale
Site Reliability Engineering at a large-scale company
eBPF
strace
Prometheus
Alertmanager
Salt configuration management
Temporal Workflows
Cloudflare Developer Platform

Role overview

Role family: DevOps / Site Reliability Engineering
Level: IC3 (devops_sre)
Experience: 3–8 years
Type: Individual Contributor
Remote policy: On-site
Visa sponsorship: Not offered

Tech stack analysis

LANGUAGES: Go, Python, Rust
FRAMEWORKS: Temporal Workflows, Cloudflare Developer Platform
DATABASES: Kafka
INFRASTRUCTURE: Kubernetes, Docker, Linux, Terraform, Vault, Consul, Prometheus, Alertmanager, Salt, CI/CD systems
TOOLS: Git, eBPF, strace, Prometheus, Alertmanager, Salt

Salary estimate

$155K – $210K
AI-estimated salary range
Confidence: 78%
Reasoning

Cloudflare is a publicly traded, large-scale tech company (NASDAQ: NET) based in San Francisco with an Austin office. For a mid-level SRE/Production Engineer with 3+ years of experience in a high-cost Texas tech market, Cloudflare's known compensation bands (from Levels.fyi and public data) for IC3-level engineers typically range from $155K to $210K in total compensation, including base, equity, and bonus. Base salary is likely $135K–$165K, with equity making up the remainder.

Green flags

5 items
AI-native culture with explicit encouragement to use AI tooling to ship faster, which signals modern engineering practices (culture)

Hiring insights

JD quality: 7/10
Urgency: medium
Autonomy: high
Team size: medium (5-15)

Red flags

4 items
No salary or compensation range disclosed in the posting, which falls below the transparency standard now required in many US states (compensation)

Interview insights

Rounds: 5
Duration: 4 wks
Difficulty: hard
Take-home: Yes

Career path

Next roles: Senior Production Engineer, Staff SRE, Engineering Manager – Platform

About the company

Cloudflare operates one of the world's largest edge networks, spanning 300+ cities in 100+ countries. It provides web security, CDN, DNS, and zero-trust services that protect and accelerate over 20% of all websites on the internet. Cloudflare processes over 57 million HTTP requests per second on average.

HQ: San Francisco, CA, USA
Interview difficulty: hard
Build vs Maintain: both
Cross-functional: Yes