Maintain availability of cloud & physical Linux servers that power the Palantir platform in air-gapped production environments.
Design, deploy, and operate infrastructure to support customer & product requirements via modern orchestration & monitoring platforms.
Collaborate closely with product teams on requirements & SLOs for deploying software into air-gapped environments.
Identify, troubleshoot, and resolve network & systems issues.
Write scripts to automate routine operational tasks.
Provide technical troubleshooting support for production issues, ensuring timely resolution and minimal impact on operations. Participate in a support on-call schedule.
What We Value
Confidence in troubleshooting complex systems issues independently using stack traces and observability & systems tools
Comfort managing large-scale production systems, including configuration management, load balancing, monitoring & alerting infrastructure, and container orchestration
Demonstrated ability to continuously learn and work independently, making decisions with minimal supervision while working in secure facilities
Experience with containers (Docker/Podman) and orchestration (OpenShift/Kubernetes) at scale is a plus
Preferred Certifications: DoD 8570 IAT Level II or greater (e.g., CISSP, Security+), Unix/Linux Computing Environment (e.g., Linux+, RHCE)
What We Require
Active security clearance
4+ years of experience with Linux system administration (RHEL or equivalent preferred)
Experience with cloud-based hosting platforms like AWS, Azure, or GCP and/or experience with hardware-based environments
Familiarity with monitoring systems using tools like Prometheus, and with writing health checks
Proficiency with at least one programming language, such as Java, Go, Python, JavaScript, or Bash
Strong engineering background, preferably in a field such as Computer Science, Mathematics, Software Engineering, Physics, or Data Science