Site Reliability Engineer

Kinetic Data seeks a Site Reliability Engineer to design and maintain our cloud infrastructure and ensure the reliability of our platform. This remote position requires deep expertise in AWS, Kubernetes, and database management, combined with a focus on automation and security. The ideal candidate brings 5+ years of experience in SRE or DevOps roles, along with strong programming skills and a proven track record of implementing scalable cloud solutions. You'll work with our engineering team to optimize performance, manage Kubernetes clusters, and develop robust monitoring strategies.

PUBLISHED ON

Feb 12, 2025

LOCATION

Remote

TIME

Full-Time

About this position

Site Reliability Engineers at Kinetic Data are responsible for designing, implementing,
and maintaining highly available, scalable, and secure cloud infrastructure. The role
requires expertise in cloud-based offerings such as AWS or GCP and the
management and monitoring of database services such as Postgres and Cassandra.
All product deployments will require in-depth knowledge of Kubernetes, Helm, and
other tools related to the implementation and management of Kubernetes clusters.

Duties and Responsibilities

  • Design, implement, and maintain scalable and resilient cloud infrastructure on
    AWS
  • Monitor cloud resource utilization and optimize for performance and cost
    efficiency
  • Design and implement backup, recovery, and disaster recovery strategies for
    critical data systems
  • Deploy and manage Kubernetes clusters for hosting containerized
    applications
  • Implement and manage monitoring tools such as Prometheus, Grafana,
    CloudWatch, and ELK Stack to track system health
  • Automate deployment pipelines using tools like GitHub Actions
  • Enforce security best practices across infrastructure, databases, and
    applications
  • Maintain comprehensive documentation for infrastructure, processes, and
    troubleshooting guidelines
  • Perform other related duties as assigned

Required Skills and Abilities

  • Advanced expertise in AWS services such as EC2, RDS, S3, and EKS
  • Expertise in deploying and managing applications on Kubernetes
  • Experience with PostgreSQL and Cassandra, including query optimization,
    replication, and cluster management
  • Familiarity with tools like Prometheus, Grafana, ELK Stack, and AWS
    CloudWatch
  • Strong programming skills in Python, Go, or Bash for automation tasks
  • Excellent verbal and written communication skills
  • Excellent interpersonal and customer service skills
  • Excellent organizational skills and attention to detail
  • Excellent time management skills with a proven ability to meet deadlines
  • Strong analytical and problem-solving skills

Education and Experience

  • Bachelor’s degree in computer science or a related field, or equivalent
    experience; a high school diploma or equivalent is required
  • 5+ years of experience in SRE, DevOps, or infrastructure engineering roles
  • Certifications such as:
    • AWS Certified Solutions Architect or DevOps Engineer
    • Certified Kubernetes Administrator (CKA)

Physical Requirements

  • Prolonged periods sitting at a desk and working on a computer
  • Must be able to lift up to 15 pounds at times

This position reports to the Head of Engineering.

How can I apply?

If you are interested in this position:

  • Email hr@kineticdata.com with a copy of your resume, and note the job title "Site Reliability Engineer" in the subject line.