A Virginia-based services company is seeking an experienced and motivated AWS Cloud Site Reliability Engineer (SRE) to join its dynamic team. As an AWS Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of the company’s cloud infrastructure on Amazon Web Services (AWS).
Responsibilities:
- Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform or Helm Charts to automate deployment and scaling processes
- Implement robust monitoring and alerting systems to proactively identify and address potential issues before they impact system performance
- Conduct performance analysis and optimization of AWS infrastructure components to enhance system efficiency and reduce latency
- Participate in on-call rotations to respond to and resolve incidents promptly
- Conduct post-incident reviews to identify root causes and implement preventive measures
- Work closely with the Security teams to implement and enforce best practices for securing AWS environments
- Facilitate clear communication across teams, providing updates on release status, known issues, and any potential impact on stakeholders
- Develop and maintain automated deployment pipelines using industry-standard tools such as AWS CI/CD, GitLab CI/CD, Jenkins, or similar
- Proactively identify areas for process improvement within the release management lifecycle
- Collaborate with QA teams to establish and execute release validation procedures
- Perform other duties, as needed
Qualifications:
- Proven experience as a Site Reliability Engineer or in a similar role
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience)
- In-depth knowledge of AWS services and expertise in managing cloud infrastructure
- Proficiency in scripting languages (e.g., Python, Bash) for automation tasks
- Strong understanding of DevOps principles and continuous integration/continuous deployment (CI/CD) pipelines
- Proficiency in CI/CD tools such as AWS CI/CD, GitLab CI/CD, or others
- Familiarity with infrastructure as code (IaC) tools like CloudFormation, Terraform, Helm Charts, Morpheus, or similar technologies
- Hands-on experience with version control systems (AWS CodeCommit, Git, SVN) and branching strategies
- Experience with containerization and orchestration tools (e.g., Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Docker, Kubernetes)
- Familiarity with monitoring tools (e.g., CloudWatch, Prometheus) and log analysis
- Solid understanding of Agile methodologies and their application in release management
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills
Desired Skills:
- Relevant certifications in DevOps or related fields