Site Reliability Engineer

(This position has been filled)

Kinetic Data seeks a Site Reliability Engineer to design and maintain our cloud infrastructure and ensure the reliability of our platform. This remote position requires deep expertise in AWS, Kubernetes, and database management, combined with a focus on automation and security. The ideal candidate brings 5+ years of experience in SRE or DevOps roles, along with strong programming skills and a proven track record of implementing scalable cloud solutions. You'll work with our engineering team to optimize performance, manage Kubernetes clusters, and develop robust monitoring strategies.

PUBLISHED ON

Feb 12, 2025

LOCATION

Remote

TIME

Full-Time

About this position

Site Reliability Engineers at Kinetic Data are responsible for designing, implementing,
and maintaining highly available, scalable, and secure cloud infrastructure. The role
requires expertise in cloud-based offerings such as AWS or GCP and the
management and monitoring of database services such as Postgres and Cassandra.
All product deployments will require in-depth knowledge of Kubernetes, Helm, and
other tools related to the implementation and management of Kubernetes clusters.

Duties and Responsibilities

Design, implement, and maintain scalable and resilient cloud infrastructure on AWS
Monitor cloud resource utilization and optimize for performance and cost efficiency
Design and implement backup, recovery, and disaster recovery strategies for critical data systems
Deploy and manage Kubernetes clusters for hosting containerized applications
Implement and manage monitoring tools such as Prometheus, Grafana, CloudWatch, and ELK Stack to track system health
Automate deployment pipelines using tools like GitHub Actions
Enforce security best practices across infrastructure, databases, and applications
Maintain comprehensive documentation for infrastructure, processes, and troubleshooting guidelines
Perform other related duties as assigned

Required Skills and Abilities

Advanced expertise in AWS services such as EC2, RDS, S3, and EKS
Expertise in deploying and managing applications on Kubernetes
Experience with PostgreSQL and Cassandra, including query optimization, replication, and cluster management
Familiarity with tools like Prometheus, Grafana, ELK Stack, and AWS CloudWatch
Strong programming skills in Python, Go, or Bash for automation tasks
Excellent verbal and written communication skills
Excellent interpersonal and customer service skills
Excellent organizational skills and attention to detail
Excellent time management skills with a proven ability to meet deadlines
Strong analytical and problem-solving skills

Education and Experience

Bachelor’s degree in computer science or a related field, or equivalent experience; a high school diploma or equivalent is required
5+ years of experience in SRE, DevOps, or infrastructure engineering roles
Certifications such as:
- AWS Certified Solutions Architect or DevOps Engineer
- Certified Kubernetes Administrator (CKA)

Physical Requirements

Prolonged periods of sitting at a desk and working on a computer
Must be able to lift up to 15 pounds at times

This position reports to the Head of Engineering

How can I apply?

If you are interested in this position:

Email hr@kineticdata.com with a copy of your resume and note the job title "Site Reliability Engineer" in the subject line.