Recruiting a Top-Notch ETL Developer
In order to build a great data team, you need great data engineers. Here’s how to hire them.
First, if you are looking for an ETL developer, you should actually be looking for a data engineer. ETL is a term dating back to the 1970s, when data pipelines were mostly file- or batch-oriented and consisted of separate extraction, transformation, and loading steps. Fast forward 40 years, and the data landscape has grown so complex that harnessing it requires a broad skill set and years of experience.
Developing an ETL process may seem easy to an inexperienced engineer, but it can quickly escalate into a nightmare of spaghetti pipelines and endless edge cases, even with a good team in place. To break through your organization’s glass ceiling of data, you need a team of great data engineers, and they are not easy to find or recruit. We are lucky enough to work closely with many amazing data engineers – our users – and we see the difference they make in their organizations. This post is about finding the right people to help your organization put its data to use.
Data Engineer Toolkit
So, what should you look for? First, technical chops. In the past, you would have seen technical skills such as Informatica, TIBCO, Oracle data warehouse, {My,Postgre,MS}SQL, Perl (pun intended 😉), Bash, PL/SQL, and {WebSphere,Microsoft,Rabbit}MQ. Some ETL developers also call themselves DBAs and may know a lot about database design and query optimization.
Today’s data engineer résumé expands on the above with more modern technologies and skills:
- Big data stores and processing frameworks: Hadoop, Spark, MongoDB, Cassandra, Elasticsearch, Redshift, BigQuery, Vertica, Snowflake
- A more extensive set of programming languages: Python (including Pandas and SciPy), R, Scala, Java (including MapReduce)
- Pipeline orchestration tools (a plus): Luigi, Celery, Airflow
- Log collection and distribution tools (a nice addition as well): Kafka, Flume, Fluentd, Logstash, Filebeat, ELK, Splunk
Beware: if you see all of these on a single CV, you should be somewhat suspicious! Let’s be honest, not everyone is a unicorn. Pick the skills that matter for your data team.
Selecting the Right Candidate
Once you have shortlisted candidates based on their CVs, the next step is to meet them for an interview. Besides verifying their actual experience, you should also look for several personality traits and soft skills:
- Attention to detail and a mild obsession with order – keep in mind that data engineers deal with the bits and bytes of billions of events coming from dozens of sources every day
- Dedication and willingness to be available off-hours – if 24/7 real-time data is crucial for your business, set expectations with your data engineers from day one
- Patience and perseverance with “dirty work” – data pipelines break, and recovering lost data takes countless hours of tracking events and offsets
And of course, make sure they are a good culture fit! The key is to find someone who is service-oriented, passionate about data, and loves helping people. A great data engineer will dramatically change how your company runs.