Acacia Communications is an optical networking technology company that empowers cloud and content providers to connect at the speed of light, enabling them to meet rapidly increasing consumer demand for data.
Senior DevOps Systems Administrator
This position is a member of a fast-paced Information Technology team responsible for leading the efforts to support the Engineering and computer platform side of the infrastructure. The role acts as the subject-matter expert (SME) on Linux VM deployment, automation, and configuration, and serves as a technical resource and liaison between end users, vendors, engineers, and the help desk on a variety of projects and support tickets.
Key Essential Functions
- Lead bare metal installation, configuration, and deployment of new Linux servers. Hardware Support.
- Lead Linux Virtual Machine deployment, configuration, monitoring, right-sizing
- Documentation: Develop procedures and repeatable process flows, build configuration and automation, diagramming
- Act as SME on Build Automation, Configuration Management, Kickstart deployment, bootstrap configuration deployment (Cobbler, Ansible, BigFix, bash/shell scripting)
- Collaborate and liaise with external vendors and engineering team leads on storage, compute, capacity planning, infrastructure optimization, server purchases, GPUs, and maintenance scheduling. Support Product Development and Operational Lab teams. Improve existing processes, increase automation, and achieve product development goals.
- Administration and Monitoring: Shared FlexLM licensing support, Grafana, Prometheus, NetApp Harvest, SNMP, SolarWinds, and external vendor monitoring. Administer ISC DHCP, DNS, NIS, and Active Directory security group membership. Linux package management and internal package repositories. Installation of EDA tools to central storage.
- Troubleshooting: NIS, NFS Cross-Mounts, Load Averages, Performance, Active Directory, Centrify Authentication, NetApp Integration
- Project implementation: antivirus and other agent deployments, policy and configuration enforcement
- Security Awareness and proactivity: Manage centralized local iptables, firewalls, user and group permissions, file system permissions
- Grid Compute platform support: Bring new compute nodes online, support remote desktops, troubleshoot grid issues with LSF/OpenLava
- Work with end users (engineers) to review and resolve technical and infrastructure-related issues
- Monitor storage growth, identify elastic workloads, capacity planning
- Occasional racking and cabling of new servers, blades, and compute chassis
- Contribute to cloud initiatives and data access
- Some light database maintenance as needed
- Participate in On-Call rotation
- Perform other duties as assigned
Minimum Qualifications, Experience, Skills, Education and Certifications
- Bachelor’s Degree (or higher) in computer science or similar field, or the equivalent qualification in training and experience
- 5+ years of experience in a Development Operations role or similar experience
- 8+ years overall experience in a Linux Systems Administration or Lead role
- Qualified to work with the following: Linux, VMware vSphere platform, RCS, NetApp, NFS, NIS, CIFS, SSSD, SNMP, SMTP (Postfix), DHCP, DNS, Ansible, Active Directory, Firewalls (iptables), RHEL/CentOS (6, 7), Perl, Bash, sed, awk, TCP/IP, Subnetting, VLANs, Routing, tcpdump, Wireshark, Graphite, PostgreSQL, MySQL, WhisperDB, LevelDB, Carbon, Prometheus, Exceed TurboX, VNC, Centrify, and Resource Isolation/hardening
- Grid platform experience with grid software such as LSF, OpenLava, Slurm, etc.
- Strong oral and written communication skills, and ability to work independently and collaborate with peers as needed
- Amazon Web Services, Perforce, Perl, Python, CUDA, Docker
- EDA Tools Experience
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.