We are seeking an experienced OpenShift Platform Lead to own and manage our OpenShift-based virtualization platform, which delivers enterprise VM hosting services. This role is responsible for the complete lifecycle management of the platform, including design, architecture, business-as-usual (BAU) operations, patching, upgrades, incident response, and stability improvements.
You will lead the platform implementation, work closely with site reliability engineering (SRE) and operations teams, and enable seamless VM migration from legacy infrastructure. This is a hands-on technical leadership role that requires deep OpenShift expertise and the ability to balance operational excellence with strategic platform evolution.
This is a full-time, hybrid role.
To be successful in the role, you will need to be a good team player with excellent communication skills, be able to manage your own workload, and work well both on your own initiative and under direction.
Platform Leadership & Strategy
- Own the technical strategy and roadmap for the OpenShift Virtualization platform
- Define platform architecture, design patterns, and technical standards
- Lead platform lifecycle management including major/minor upgrades and Red Hat CoreOS updates
- Drive platform stability improvements and performance optimization initiatives
- Establish platform governance, compliance, and security policies
- Build relationships with Red Hat Support and make effective use of the Red Hat Technical Account Manager (TAM)
Lifecycle & Operations Management
- Manage the complete platform lifecycle, from installation through upgrades to decommissioning
- Plan and execute OpenShift platform upgrades (4.x releases) with zero or minimal downtime
- Coordinate monthly or quarterly Red Hat CoreOS (RHCOS) patching cycles
- Oversee OpenShift Virtualization operator upgrades and feature enablement
- Maintain platform health through proactive monitoring and capacity planning
- Ensure the platform meets defined SLAs and availability targets (99.9% or higher)
Incident & Event Management
- Lead major incident response for platform-level issues (Sev 1/Sev 2)
- Perform root cause analysis (RCA) and implement preventive measures
- Collaborate with SRE team on incident postmortems and improvement plans
- Manage platform-related events, including maintenance windows
- Coordinate emergency changes and rollback procedures
- Participate in on-call rotation for critical platform escalations
Change Implementation & Release Management
- Review and approve platform changes through Change Advisory Board (CAB)
- Plan and execute complex platform changes with risk assessment
- Implement infrastructure-as-code (IaC) practices using Ansible and Terraform
- Drive GitOps adoption for platform configuration management
- Coordinate release windows for platform updates with business stakeholders
- Keep change documentation and runbooks accurate and up to date
VM Migration & Workload Onboarding
- Lead VM migration strategy from VMware/legacy platforms to OpenShift Virtualization
- Design VM migration runbooks and automation workflows
- Create and maintain VM templates, golden images, and standardized configurations
- Enable self-service VM provisioning for application teams
- Troubleshoot VM performance, networking, and storage issues
- Optimize VM placement, resource allocation, and cluster balancing
Platform Stability & Performance
- Define and monitor key performance indicators (KPIs) for platform health
- Implement chaos engineering practices to validate platform resilience
- Tune OpenShift control plane and worker node performance
- Optimize storage performance (ODF/Ceph) for VM workloads
- Configure network policies and OVN-Kubernetes for optimal VM networking
- Drive continuous improvement initiatives based on operational metrics
Must-Have Skills & Experience
Experience Requirements:
- 8-12 years of overall IT infrastructure experience
- 5+ years of hands-on experience with Red Hat OpenShift Container Platform (4.x)
- 3+ years of experience with OpenShift Virtualization (KubeVirt) or similar VM hosting platforms
- 3+ years of experience in platform/infrastructure leadership roles
- 2+ years of experience with Red Hat Enterprise Linux (RHEL 7/8/9) and Red Hat CoreOS (RHCOS)
Technical Skills:
- Expert-level OpenShift administration (oc CLI, Web Console, API)
- Advanced OpenShift Virtualization knowledge (VMs, DataVolumes, CDI, live migration)
- Advanced Red Hat CoreOS and Machine Config Operator (MCO) experience
- Advanced Linux administration and troubleshooting (RHEL-based)
- Advanced storage management (ODF/Ceph, StorageClasses, PV/PVC, CSI drivers)
- Advanced networking (OVN-Kubernetes, Multus, Network Policies, SDN concepts)
- Advanced automation skills (Ansible, Bash scripting, Python)
- Intermediate knowledge of Kubernetes concepts (Operators, Custom Resources, Pod lifecycle)
- Intermediate Infrastructure-as-Code skills (Terraform, GitOps tools such as Argo CD/Flux)
- Intermediate experience with observability platforms (Prometheus, Grafana, Alertmanager)
Platform Operations:
- Proven experience managing platform lifecycle (installation, upgrades, patching)
- Strong incident management and major incident response experience
- Experience with change management processes and release coordination
- Demonstrated ability to perform root cause analysis and implement preventive measures
- Experience with capacity planning and performance tuning
- Track record of driving platform stability improvements
Certifications Required (one or more):
- Red Hat Certified Engineer (RHCE)
- Red Hat Certified Specialist in OpenShift Administration
OR equivalent demonstrable experience