Site Reliability Engineer
Location: US-TX-Austin

Why We Need You:

We are looking for an experienced Site Reliability Engineer to join (link removed) and help us build reliable, robust software solutions for our customers. As a Site Reliability Engineer, you will be responsible for monitoring system performance, troubleshooting technical issues, deploying code changes, and collaborating with other teams to ensure the best possible customer experience.

The ideal candidate has extensive experience with cloud infrastructure, distributed systems engineering, scripting, and container tooling such as Docker and Kubernetes. You should also have the skills to manage service outages and to keep systems available by writing scalable software.


What You Will Do:

  • Monitor system health metrics to proactively identify potential bottlenecks or errors
  • Develop strategies for resolving performance issues and identify areas of improvement
  • Manage monitoring tools such as Grafana and Prometheus, including deploying them and optimizing their use
  • Deploy releases of applications and services in collaboration with developers
  • Troubleshoot production outages and implement fault tolerance solutions
  • Maintain documentation related to system operation procedures
  • Document and test game-day scenarios
  • Develop and support automation that allows for continuous testing of software created by the team
  • Design and assist in the setup and maintenance of application monitoring and alerting
  • Assist in designing and deploying HA/DR architecture for mission critical workloads
  • Collaborate with other teams to ensure optimal performance of system and dependent resources
  • Participate in on-call duty rotation

Requirements

What You Will Need:

  • Bachelor's degree in Computer Science or relevant field preferred
  • 5+ years of experience in SRE/DevOps roles
  • Good communicator, able to clearly articulate complex issues and technologies
  • Expertise in Linux server administration and scripting languages (Python)
  • Knowledge of containerization technologies like Docker & Kubernetes
  • Proficient in a modern programming language such as Go or Python
  • Deep understanding of modern microservices-based architectures and operations
  • Experience in defensive coding practices and patterns for high availability
  • Familiarity with configuration management tools
  • Excellent problem-solving skills and strong collaboration abilities
  • Comfortable working in a fast-paced agile environment where requirements change quickly and the team constantly adapts to moving targets

Benefits

How we will support your growth and success:

  • Partner with executives, leadership, and cross-functional teams, including engineering, marketing, and business operations
  • Professional development opportunities to further skills and knowledge
  • Discover the exciting world of monitoring, observability, and SRE while becoming an advocate and driving innovation in the industry
  • A supportive team of passionate and dedicated individuals, all focused on building the best monitoring service in the world
  • Health Care Plan (Medical, Dental & Vision)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Training & Development
  • Work From Home

Uptime.com
