Site Reliability Engineer job in Austin, TX

Why We Need You:

We are looking for an experienced Site Reliability Engineer to join our team and help us build reliable, robust software for our customers. As a Site Reliability Engineer, you will be responsible for monitoring system performance, troubleshooting technical issues, deploying code changes, and collaborating with other teams to ensure the best possible customer experience.

The ideal candidate has extensive experience with cloud infrastructure, distributed systems engineering, and scripting and automation, including containerization and orchestration tools such as Docker and Kubernetes. You should also be able to manage service outages and maintain system availability by writing scalable software.

What You Will Do:

Monitor system health metrics to proactively identify potential bottlenecks or errors
Develop strategies for resolving performance issues and identify areas of improvement
Manage monitoring tools such as Grafana and Prometheus, including deploying them and optimizing their use
Deploy releases of applications and services in collaboration with developers
Troubleshoot production outages and implement fault tolerance solutions
Maintain documentation related to system operation procedures
Document and run game-day scenarios
Develop and support automation that allows for continuous testing of software created by the team
Design and assist in the setup and maintenance of application monitoring and alerting
Assist in designing and deploying HA/DR architecture for mission-critical workloads
Collaborate with other teams to ensure optimal performance of systems and their dependent resources
Participate in on-call duty rotation

Requirements

What You Will Need:

Bachelor's degree in Computer Science or relevant field preferred
5+ years of experience in SRE/DevOps roles
Strong communicator, able to articulate complex issues and technologies clearly
Expertise in Linux server administration and scripting languages (Python)
Knowledge of containerization technologies like Docker & Kubernetes
Proficiency in a modern programming language such as Go or Python
Deep understanding of modern microservices-based architectures and their operation
Experience with defensive coding practices and high-availability patterns
Familiarity with configuration management tools
Excellent problem-solving skills and strong collaboration abilities
Comfortable working in a fast-paced agile environment where requirements change quickly and the team must constantly adapt to moving targets

Benefits

How we will support your growth and success:

Partner with executives, leadership, and cross-functional organizations, including engineering, marketing, and business operations
Professional development opportunities to further your skills and knowledge
Discover the exciting world of monitoring, observability, and SRE while becoming an advocate and driving innovation in the industry
A supportive team of passionate and dedicated individuals all focused on building the best monitoring service in the world.

Health Care Plan (Medical, Dental & Vision)
Paid Time Off (Vacation, Sick & Public Holidays)
Family Leave (Maternity, Paternity)
Training & Development
Work From Home

Uptime.com
