Site Reliability Engineer
Kinetic Data seeks a Site Reliability Engineer to design and maintain our cloud infrastructure and ensure the reliability of our platform. This remote position requires deep expertise in AWS, Kubernetes, and database management, combined with a focus on automation and security. The ideal candidate brings 5+ years of experience in SRE or DevOps roles, along with strong programming skills and a proven track record of implementing scalable cloud solutions. You'll work with our engineering team to optimize performance, manage Kubernetes clusters, and develop robust monitoring strategies.
PUBLISHED ON
Feb 12, 2025
LOCATION
Remote
TIME
Full-Time
About this position
Site Reliability Engineers at Kinetic Data are responsible for designing, implementing,
and maintaining highly available, scalable, and secure cloud infrastructure. The role
requires expertise in cloud-based offerings such as AWS or GCP and the
management and monitoring of database services such as Postgres and Cassandra.
All product deployments will require in-depth knowledge of Kubernetes, Helm, and
other tools related to the implementation and management of Kubernetes clusters.
Duties and Responsibilities
- Design, implement, and maintain scalable and resilient cloud infrastructure on AWS
- Monitor cloud resource utilization and optimize for performance and cost efficiency
- Design and implement backup, recovery, and disaster recovery strategies for critical data systems
- Deploy and manage Kubernetes clusters for hosting containerized applications
- Implement and manage monitoring tools such as Prometheus, Grafana, CloudWatch, and ELK Stack to track system health
- Automate deployment pipelines using tools like GitHub Actions
- Enforce security best practices across infrastructure, databases, and applications
- Maintain comprehensive documentation for infrastructure, processes, and troubleshooting guidelines
- Perform other related duties as assigned
Required Skills and Abilities
- Advanced expertise in AWS services such as EC2, RDS, S3, and EKS
- Expertise in deploying and managing applications on Kubernetes
- Experience with PostgreSQL and Cassandra, including query optimization, replication, and cluster management
- Familiarity with tools like Prometheus, Grafana, ELK Stack, and AWS CloudWatch
- Strong programming skills in Python, Go, or Bash for automation tasks
- Excellent verbal and written communication skills
- Excellent interpersonal and customer service skills
- Excellent organizational skills and attention to detail
- Excellent time management skills with a proven ability to meet deadlines
- Strong analytical and problem-solving skills
Education and Experience
- Bachelor’s degree in computer science or related field or equivalent experience; high school diploma or equivalent is required
- 5+ years of experience in SRE, DevOps, or infrastructure engineering roles
- Certifications such as:
  - AWS Certified Solutions Architect or DevOps Engineer
  - Certified Kubernetes Administrator (CKA)
Physical Requirements
- Prolonged periods sitting at a desk and working on a computer
- Must be able to lift up to 15 pounds at times
This position reports to the Head of Engineering.
How can I apply?
If you are interested in this position:
- Email hr@kineticdata.com with a copy of your resume and note the job title "Site Reliability Engineer" in the subject line.