Senior Incident Manager for Trading Platform | Remote Role
Are you a technical troubleshooter with exceptional analytical skills and a passion for maintaining high-availability trading systems? We're seeking a dedicated Senior Incident Manager to oversee our sophisticated trading platform operations and ensure 24/7 availability across all environments. Your expertise in identifying, resolving, and preventing incidents will be crucial for maintaining our system's reliability and performance.
Key Responsibilities:
- Monitor production reporting systems closely, resolving problems in real time while maintaining 99.9%+ uptime for business-critical trading applications.
- Identify and resolve incidents through comprehensive log analysis, performance metric evaluation, and service interaction assessment, then coordinate with development teams to implement permanent solutions.
- Manage the entire build, release, and configuration lifecycle for production applications using modern CI/CD pipelines and version control systems.
- Deploy, automate, and maintain AWS cloud-based infrastructure, optimizing for availability, performance, scalability, and security to support trading operations.
- Oversee development and QA environments to ensure consistency and reliability across the entire technology ecosystem.
- Analyze system metrics and application performance patterns, creating detailed reports and actionable recommendations for technology improvements.
- Collaborate with cross-functional teams to enhance system reliability and systematically reduce incident frequency through preventative measures.
Required Skills and Experience:
- 1+ years of experience designing, analyzing, troubleshooting, and resolving issues in multi-tiered application architectures.
- Demonstrated expertise supporting service-oriented and microservices architectures that demand 24/7 availability.
- Proficiency with SQL queries and advanced database troubleshooting techniques.
- Working knowledge of Oracle Database (19c or newer, including PL/SQL) and/or PostgreSQL (version 14+).
- Linux fundamentals, including Bash, command-line utilities (awk, sed, cat, grep), and system monitoring tools.
- Practical experience with AWS services including VPC, EC2, ECS, Route53, S3, and related cloud infrastructure.
- Proficiency with Git version control and enterprise branching strategies.
- Strong understanding of networking concepts, protocols, and systematic troubleshooting methodologies.
- Exceptional analytical skills with demonstrated ability to identify root causes in complex, interconnected systems.
- Outstanding communication skills to coordinate incident response across multiple technical teams.
Nice to Have:
- Advanced Linux system administration and web server (Nginx, Tomcat) configuration experience.
- Hands-on experience with modern DevOps tools such as Docker, Kubernetes, Jenkins, GitLab CI, and Terraform.
- Understanding of JVM configuration parameters and runtime optimization techniques.
- Knowledge of modern API technologies, including RESTful services, GraphQL, and gRPC.
- Background in high-load application implementation, scaling, and performance optimization.
- Software engineering experience, particularly in the financial markets, Forex trading, or gaming industries.
- Proficiency with Jira for incident tracking, workflow management, and cross-team coordination.
- Experience with the ELK stack (Elasticsearch 7.x+, Logstash, Kibana) for comprehensive log management.
- Familiarity with enterprise monitoring tools such as Zabbix, Prometheus with Grafana, or similar platforms.
- Working knowledge of message brokers, including Apache Kafka and Amazon SQS/SNS, and of enterprise service bus (ESB) implementations.
- Scripting abilities in Bash, Python 3.x, or other automation languages to automate routine tasks.
Why Join Our Team:
Working with us means taking ownership of mission-critical systems that power sophisticated trading operations worldwide. You'll expand your technical expertise across multiple domains while working in a flexible, remote environment with a team of dedicated professionals. We offer competitive compensation, continuous professional development opportunities, and the chance to make a significant impact on a high-performance financial technology platform that processes millions of transactions daily.