Lin Taylor


Data Scientist

DueDil Jan 2017 - present
  • Core responsibilities include scoping out, building and testing data pipelines and data products. Have worked both independently and in teams on a variety of core and experimental projects.
  • Onboard and maintain datasets from third-party providers (up to ~2M records per batch), from raw data collection through to exposure on the site. Use the in-house data pipeline framework (Python, Spark, Hadoop, Postgres, Jenkins) to automate daily or weekly batch ETL jobs and continuous integration of data. Build automated data validation frameworks to control data quality.
  • Used a machine learning model (word2vec) to expand existing keywords by finding their nearest neighbours in vector space. This added keywords to the site for 22 million companies in 10 countries.
  • Scope out MVPs for exploratory projects and new features using open-source packages, IPython notebooks (ipynb), etc.

Scientific software intern

Enthought Sept 2016 - Jan 2017
  • Built a plugin for Enthought's Canopy Geoscience application (a research platform for analysing geological data) that adds functionality for how a user can analyse data
  • Wrote and deployed Python code to production using Enthought's open source packages (Traits, Envisage)
  • Presented work at internal seminars

Data science fellow

Data Science for Social Good May - Aug 2016
  • 14-week fellowship; worked in a team of four to build an early intervention system for the Metro Nashville Police Department that predicts police officers at high risk of adverse incidents. Our work has featured in the Economist, New Scientist, and the Chicago Tribune
  • Designed and built a database system that can accommodate data from diverse sources and multiple police departments; this formed the starting point for a machine learning pipeline in Python and Luigi
  • Using machine learning, we were able to correctly predict 80% of officers who would go on to have an adverse incident, while flagging only a third of officers in the department. Our system flagged fewer than half the number of officers needed for this level of accuracy under current systems


Recurse Center Nov 2015 - Feb 2016
  • 12-week educational retreat for programmers; improved core programming and CS knowledge by writing open-source software, pair programming, and attending workshops

Data science fellow

S2DS Aug - Sept 2015
  • 5-week bootcamp for STEM PhD holders moving into the data science industry; worked in a team of three to perform proof-of-concept machine learning analyses for the financial technology company Intelliflo
  • Performed data analyses on a dataset of >9 million records in Python, Pandas and scikit-learn; delivered a set of recommendations for data storage and usage, as well as Python scripts for automated data cleaning

Graduate researcher

University of Cambridge 2011 - 2015
  • Studied plant gene evolution; used bioinformatic, molecular and microscopy techniques to understand changes in gene function over time
  • Used statistical methods to build evolutionary trees to infer the ancestral traits of a group of flowering plants. Presented work frequently at international conferences and internal seminars

Data analyst and researcher

RBG Kew 2009 - 2010
  • Estimated plants' endangerment by approximating their geographic ranges using statistical analysis of plant sighting data
  • Created recommendations for how to use these types of data to make conservation assessments; this work resulted in two publications


Workshop organiser

Cambridge 2014 - 2015
  • Organised a monthly workshop for women and LGBTQ individuals to learn programming, with the aim of increasing their representation in the tech industry
  • During my involvement we grew by 60 new students and coaches (from ~50 to 110)

Selected open source projects


  • A biological name-matching game using open data from the Encyclopedia of Life; collaborated extensively with a colleague using a git branching and merging workflow
  • Received 1200 unique visitors in its first week
  • Python, Flask, HTML/CSS, Heroku, Redis, git

IPython notebooks


PhD molecular genetics

BSc (Hons) Biology with a work experience year