Site Reliability Engineer
ShareStream Education is a leader in online video and media management solutions for academic institutions. Our team is passionate about building a great, continually evolving product and providing a service that allows our customers to realize the vast potential of streaming media for education.
ShareStream Education is deeply committed to client success and to building strong relationships with its clients, whom we regard as our partners.
Join us and contribute to changing the way online education takes place through the use of streaming media!
The Site Reliability Engineer will work remotely, or in ShareStream’s office in Reston, VA, if based in the Washington, D.C. metropolitan area. ShareStream Education will not accept resumes from recruiters for this position.
ShareStream is seeking a multitalented, dedicated Site Reliability Engineer who excels at automating engineering operations and building high-availability and fault-tolerant systems. The Site Reliability Engineer will:
- Enhance and operate the continuous integration and continuous delivery (CI/CD) pipeline for multiple applications
- Operate the container-orchestration platform and perform day-to-day monitoring and maintenance
- Automate upgrades, scaling, and other operational needs as required
- Deploy new releases across multiple SaaS customer environments
- Implement and operate a central logging solution as well as a central metrics solution
- Develop operational playbooks and dashboards to monitor production SaaS environments
- Contribute to managing AWS cost and resource usage
- Work with the Software Engineering team to implement new technologies, including Minio, GlusterFS, ELK, and Hadoop
- Assist with software engineering on ShareStream’s core applications
Qualifications:
- BS and/or MS degree in Computer Science or a related field
- Extensive experience building and operating distributed systems in Amazon Web Services (AWS)
- Expert-level Linux skills (CentOS and Ubuntu)
- Extensive experience with container-based software development and management using OpenShift, Docker, and Kubernetes (or another container-orchestration platform)
- Extensive experience with Jenkins or another automation server
- Intermediate-level software-development skills using Java or another object-oriented programming language
- Expert-level skills in at least one scripting language, preferably Bash or Perl
- Experience managing backups and participating in disaster-recovery planning and testing is a strong plus
- Experience working in a fast-moving startup environment is a strong plus