Software Reliability Expert - 6-Month Contract

Raising the Floor - US

Type of Position: Contractor, Full-Time, 6-Month Contract

Anytime, Anywhere, Any Computer Access. We’re an international coalition of individuals and organizations dedicated to ensuring that the Internet, and everything available through it, is accessible to people who face accessibility barriers due to disability, literacy, digital literacy, or aging, regardless of their economic resources. Our vision is to revolutionize the landscape of assistive technology by creating the Global Public Inclusive Infrastructure (GPII): an infrastructure that facilitates the development, distribution, and support of a wide range of affordable accessibility solutions around the world.
You will help a team of bright and talented developers, located across continents, who are passionate about our vision of radically improving access to technology. How? By helping to develop an associated system that supports the “portability” of user preferences across any platform or device, making it easier for anyone to have the technology they encounter automatically change into a form they can understand and use.
Responsibilities:
  • Work with the Global Public Inclusive Infrastructure (GPII) architects and subject-matter experts (SMEs) to understand the infrastructure components and define the reliability metrics that need to be implemented and monitored.
  • Plan large-scale stress testing that checks the stability of the GPII through automated test cases mimicking the very heavy loads, across different user profiles, created when many users simultaneously access the GPII's cloud component.
  • Design and document a reliability plan that includes a test approach, strategy, and scenarios.
  • Implement the instrumentation required to collect data for analysis.
  • Recommend and document best practices.
  • Perform data analysis to detect reliability issues.
  • Provide recommendations to the GPII developers on how to optimize the system to improve reliability.
  • Integrate the reliability test cases into release processes, automate them in the GPII’s Continuous Integration environment, store results using technologies such as Elasticsearch, and provide dashboards to team members.
  • Work with infrastructure developers to plan application deployments on Kubernetes clusters for reliability testing.
  • Debug and resolve issues relating to the automated test scripts.
Qualifications:
  • 10+ years of hands-on experience designing and writing reliability test plans.
  • Experience with modern, containerized cloud infrastructure (in particular, Docker and Kubernetes), and the reliability techniques best suited to this style of architecture.
  • Knowledge of and hands-on experience with open source testing tools.
  • An Agile mindset and a team-player attitude, with experience contributing to open source communities using collaborative environments such as GitHub.
  • Development background with the ability to review code and write automation scripts.
  • In-depth experience with debugging tools for Node.js (e.g. node-inspector, Chrome DevTools, heapdump, N|Solid), and experience using these tools to identify the source of failures.
  • Ability to understand deployment topologies, identify problem areas, simulate failures, and recommend improvements.
  • Experience with networking protocols and one or more programming languages (JavaScript, Go, Python, Ruby).
  • Experience using JIRA to report issues.
  • Experience working in a distributed environment.