OpenAI

Machine Learning Engineer, Distributed Data Systems

OpenAI · San Francisco, California, US
Full-time · Senior (5-10 yrs) · Posted 82d ago
Software Engineering · IC3 · IC · Hybrid (3d) · Visa Sponsored · Relocation
Stack: Distributed Systems · Data Pipelines · Data Orchestration · Distributed Storage · Streaming Infrastructure · Machine Learning Infrastructure · Distributed Compute · Python · Large-Scale Data Processing · System Design · Infrastructure Reliability · Scalability Engineering

Summary

Research Engineer role on OpenAI's Sora team focused on designing, scaling, and hardening distributed data infrastructure that powers large-scale multimodal model training and evaluation.

About the role

About the Team

The Sora team is pioneering multimodal capabilities for OpenAI’s foundation models. We’re a hybrid research and product team focused on integrating multimodal functionalities into our AI products, ensuring they are reliable, user-friendly, and aligned with our mission of broad societal benefit.

About the Role

As a Research Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal training and evaluation at OpenAI. You’ll manage distributed data pipelines, collaborate closely with researchers to translate requirements into robust systems, and harden pipelines that serve as the backbone for Sora’s rapid iteration cycles.

We’re looking for engineers who are detail-oriented, have strong experience with distributed systems, and excel at building reliable infrastructure in high-stakes environments.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

  • Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security.

  • Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient.

  • Partner with researchers to deeply understand requirements and translate them into production-ready systems.

  • Harden, optimize, and maintain critical data infrastructure systems that power multimodal training and evaluation.

You might thrive in this role if you:

  • Have strong experience with distributed systems and large-scale infrastructure, along with a strong interest in data.

  • Are detail-oriented and bring rigor to building and maintaining reliable systems.

  • Demonstrate excellent software engineering fundamentals and organizational skills.

  • Are comfortable with ambiguity and rapid change.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

What you'll do

1. Design, build, and maintain data infrastructure including distributed compute, data orchestration, distributed storage, streaming, and ML infrastructure
2. Ensure the data platform can scale by orders of magnitude while remaining reliable and efficient
3. Partner with researchers to deeply understand requirements and translate them into production-ready systems
4. Harden, optimize, and maintain critical data infrastructure that powers multimodal training and evaluation pipelines
5. Ensure scalability, reliability, and security across all data infrastructure systems

Requirements

Strong hands-on experience with distributed systems and large-scale data infrastructure
Ability to design and build production-grade data pipelines including orchestration, distributed storage, and streaming systems
Strong software engineering fundamentals with rigor in building reliable, secure, and efficient infrastructure
Experience partnering with ML researchers to translate research requirements into robust, scalable production systems
Comfort operating in fast-moving, ambiguous environments with rapidly changing requirements

Nice to have

Kubernetes
Apache Spark
Apache Kafka
Ray
Airflow
Prefect
HDFS
S3
gRPC
Go
Rust
C++
MLflow
Terraform
Prometheus
Multimodal AI

Role overview

Role family: Software Engineering
Level: IC3 (data_engineering)
Experience: 5–10 years
Type: Individual Contributor
Remote policy: Hybrid (3 days)
Visa sponsorship: Available

Tech stack analysis

LANGUAGES
Python
FRAMEWORKS
Ray · Apache Spark · Apache Kafka · Airflow · Prefect · Dagster
DATABASES
Distributed File Systems (HDFS) · Object Storage (S3, GCS) · Data Lakes
INFRASTRUCTURE
Kubernetes · Distributed Compute Clusters · Streaming Infrastructure · Cloud (AWS/GCP/Azure) · Terraform
TOOLS
MLflow · Prometheus · Grafana · gRPC

Salary estimate

$220K – $370K
AI-estimated salary range
Confidence: 75%
Reasoning

OpenAI is one of the highest-paying AI companies globally. Senior ML/infrastructure engineers at OpenAI in San Francisco typically earn $220K–$300K base salary, with total compensation (including equity/RSUs and bonuses) ranging from $300K to $370K+. Glassdoor, Levels.fyi, and public offer disclosures corroborate this range for senior research engineers at OpenAI.

Green flags

5 items
Role is on the Sora team, one of OpenAI's most high-profile frontier AI projects, offering exceptional career visibility and impact. (growth)

Hiring insights

JD quality: 6/10
Urgency: medium
Autonomy: high
Team size: medium (5-15)

Red flags

4 items
No salary, equity, or total compensation range disclosed, which is atypical for a CA-based role given California pay transparency norms. (compensation)

Interview insights

Rounds: 5
Duration: 4 wks
Difficulty: very hard
Take-home: Yes

Career path

Next roles: Staff ML Engineer / Distributed Systems · Technical Lead, Data Infrastructure · Principal Engineer, AI Infrastructure

About the company

OpenAI is the AI research laboratory behind GPT-4, ChatGPT, DALL-E, and the Codex API. With over 200 million weekly active ChatGPT users, OpenAI is at the forefront of large language model development and deployment. The company pursues a mission of building safe artificial general intelligence that benefits all of humanity.

HQ: San Francisco, CA, USA
Interview difficulty: very hard
Build vs Maintain: both
Cross-functional: Yes