Scraping Best Practices Investigator - Python
Scrapinghub
About the Job:
Your key objective will be to advance Scrapinghub’s knowledge of web technologies and web scraping best practices.
This is not a production role. Instead, you’ll be given the time and resources to test hypotheses iteratively and with scientific rigor, and to produce a research-backed knowledge base for other developers at Scrapinghub.
Although you won’t work on specific customer projects, your work will help fuel growth across all of Scrapinghub’s Data business (Professional Services & Data on Demand). Your success will be measured by your ability to iterate quickly and produce assets that are useful to other Shubbers.
Job Responsibilities:
- Create and execute well-designed experiments (repeatable, with multiple treatments, testable variables, controls, and replication) to learn how best to complete web scraping projects
- Produce well-written, indexed reports of your findings (similar to publishing in an academic journal, though not nearly as lengthy)
- Propose new experiments to run
- Work with the Team Lead to prioritize the backlog of experiments
- Maintain best practice guides for other Shubbers who will be implementing client solutions based on your findings
- Propose changes to Scrapinghub’s other products (Crawlera, Scrapy Cloud, etc.) or to Scrapy itself based on your findings
Job Requirements:
- Excellent communication in written English.
- A strong understanding of the scientific method and the ability to consistently apply it with rigor.
- A logical, measurement-backed approach to prioritizing projects, and enjoyment of working with others who do the same.
- Familiarity with techniques and tools for crawling, extracting, and processing data, and with asynchronous communication and distributed systems.
- Strong knowledge of Python and a broad general programming background; excellent problem-solving skills.
- Enjoyment of working across several teams and communicating with your end customers (other Shubbers).