Senior SRE/DevOps Engineer - Remote Trading Infrastructure | AWS & Kubernetes

Remote
Full-time
Part-time

We are seeking an experienced Site Reliability Engineer (SRE)/DevOps specialist to join our advanced trading platform team. In this pivotal role, you will architect, implement, and maintain robust cloud infrastructure that powers mission-critical trading operations. Your expertise in AWS, Kubernetes, and modern DevOps practices will ensure our systems meet the reliability, performance, and security standards that financial trading environments require.


Key Responsibilities

- Monitor and troubleshoot production trading systems, identifying performance bottlenecks and service interaction issues, and creating actionable tickets for development teams.

- Resolve incidents with thorough root cause analysis and detailed reporting, collaborating across teams to solve complex infrastructure challenges.

- Design and implement sophisticated monitoring solutions using Zabbix, Grafana, Dynatrace, and the ELK stack to provide real-time visibility into system health.

- Establish and maintain robust build, release, and configuration management processes that ensure consistency and reliability across environments.

- Deploy, automate, and manage AWS cloud-based production systems with emphasis on high availability, performance optimization, scalability, and enterprise-grade security.

- Implement and orchestrate Kubernetes clusters (v1.28+) for containerized applications with focus on resilience and horizontal scalability.

- Develop and refine CI/CD pipelines using Jenkins 2.x and GitLab CI to streamline deployment workflows and reduce time-to-production.

- Configure and manage Infrastructure as Code using Terraform and Terragrunt for consistent, version-controlled infrastructure deployments.

- Administer development, QA, and production environments throughout the entire software development lifecycle.


Required Skills

- Strong knowledge of Linux/Unix systems (Ubuntu 22.04, CentOS 8, or equivalent) with demonstrated troubleshooting expertise.

- At least 3 years of hands-on experience with containerization technologies, including Docker and Kubernetes orchestration.

- Demonstrated proficiency with Infrastructure as Code and configuration management tools such as Terraform, Ansible, Chef, or Puppet.

- Comprehensive understanding of web server architecture and configuration, especially Nginx and reverse proxy implementations.

- In-depth knowledge of the HTTP protocol and RESTful service architectures.

- Practical experience implementing and maintaining CI/CD pipelines using Jenkins and/or GitLab CI.

- Advanced proficiency with Git version control and GitOps workflows.

- Working knowledge of SQL and experience with PostgreSQL (v14+) or MySQL (v8+).

- Fundamental understanding of networking concepts including TCP/IP, DNS, load balancing, and security principles.


Nice to Have

- Experience with event streaming platforms, particularly Apache Kafka 3.x.

- Knowledge of service mesh technologies such as Istio or Linkerd.

- Familiarity with HashiCorp tools (Consul, Vault) for service discovery and secrets management.

- Background in high-frequency trading platforms or financial systems.

- AWS Certified DevOps Engineer - Professional, CKA, or CKAD certification.

- Scripting proficiency in Python, Go, or Bash for automation tasks.

- Experience implementing SRE practices including SLIs, SLOs, and error budgets.


Tech Stack

- Operating Systems: Linux distributions (Ubuntu 22.04, CentOS 8)

- Cloud: AWS (EC2, EKS, S3, CloudWatch, Lambda)

- Containerization: Docker, Kubernetes (v1.28+)

- Monitoring: ELK Stack 8.x, Zabbix 6.x, Grafana 10.x, Dynatrace

- Version Control: Git, GitHub/GitLab

- CI/CD: Jenkins 2.x, GitLab CI

- Infrastructure as Code: Terraform, Terragrunt

- Configuration Management: Ansible, Puppet, Chef

- Databases: PostgreSQL 15.x

- Messaging: Apache Kafka 3.x

- Service Discovery & Secrets: Consul, Vault


Why Join Us

Join our forward-thinking team and work on sophisticated technology that powers complex trading operations. We offer a flexible remote work environment with optional relocation to Montenegro. You'll tackle challenging infrastructure problems using modern technologies while contributing to mission-critical systems that demand the highest levels of reliability and performance. Our collaborative engineering culture emphasizes continuous improvement, knowledge sharing, and professional development in the rapidly evolving DevOps landscape. This position presents an exceptional opportunity to enhance your expertise in cloud-native technologies while making significant contributions to a high-impact trading platform.