Towards data science: learning to walk before you run


It is undeniable that the accelerating pace of technology requires firms to revisit the core of their business. Success stories enabled by data analytics and machine learning are becoming public at an accelerating pace as firms try to position themselves as digital leaders. This creates urgency across all industries to evaluate automation as a new source of growth. However, companies that rush into sophisticated AI initiatives without a clear understanding of their data and of its "proprietary" control might end up paralyzed. Such ventures demand a change not only in internal culture but also in the core economics of the business.

How NOT to hire your first Data Scientist

Let's outline a typical bad scenario. Company executives want to boost their business with machine learning. They hire a data scientist, often a fresh graduate, and tell her she will build machine learning models to gain insights and automate their business. The scientist starts her job and soon finds out that there is no infrastructure and no ETL, and the data is a mess because nobody owns it or focuses on it yet. Engineers are busy with their sprints, and putting things in place for data science is not a priority. Everything takes forever and patience wears thin. Everybody is frustrated. The scientist is frustrated because she cannot do the job she was hired for. The engineers are frustrated because they have to leave their own tasks to do "grunt work", pulling data from production for a scientist who, as they see it, sits around and does nothing useful. The executives are frustrated because the data scientist has not even produced a decent dashboard, much less the magic machine learning everybody keeps talking about. In the end, the scientist quits and the company is back at square one, now with a sense of failure that will weigh on its future data science initiatives.

The difficult hunt and unclear expectations

The recent Stack Overflow survey of 64,000 developers shows that data scientists typically "spend 1–2 hours a week looking for a new job". Machine learning specialists and data scientists lead the ranks of developers currently looking to change jobs, at around 14 per cent.

This is happening at the same time as companies are experiencing huge difficulties hiring data scientists. I believe this is partly a self-created problem. If you look at open data science job postings, most of them ask data scientists to be experts in computer science, statistics, communication and data visualization, and to have extensive domain expertise. They are often also expected to be adept at an ever-expanding list of tools. However, data scientists come from different disciplines and backgrounds, so their skill sets vary widely. Because data science is a fairly new profession, it is difficult to find candidates with many years' experience, let alone experience within a specific domain. All this lowers the odds of a successful hire.

The truth is that in many hiring processes the list of expected qualifications could be cut in half, but there is a catch. Doing so requires the hiring company to be well informed about the state of its data and surrounding infrastructure, so that it can translate that knowledge into specific data science needs. Few companies are capable of doing that, but it might be the defining factor in building a strong data science ecosystem. Having a clear direction and clear goals for your data team will allow you to find the right profiles and to treat data initiatives as an integral part of your business, as opposed to an isolated unit working towards vaguely defined goals.

Matching business value with data science

In addition to lacking the raw material needed to obtain results, namely data, many data scientists complain about a lack of clear questions to answer. Companies may sense the opportunity, but they are often unsure how to connect this highly technical field to the actual business needs they have.

The successful application of any technology, and of data science and machine learning in particular, places new demands on the competence of traditional functional and strategic professionals. Without such competence, data scientists are left alone to find the cases to pursue, often without any direction based on the business challenges at hand. With an estimated 85% of all big data projects failing, identifying the core business problems and opportunities that your data can help solve is the critical step companies need to get right. Connecting data science initiatives to specific business cases will also be decisive in getting the necessary buy-in and support from company executives, without which the chances of success are minimal.

Learning from others

Although the application of data science and machine learning in business is still fairly new and uncharted territory for many, the companies that are succeeding seem to have some traits in common. Here are some steps you can take to increase your chances of building a data-driven company while mitigating the risks:

  1. Before you decide to hire an in-house data team, make sure you have a high-level understanding of what machine learning is and which use cases in your business it can help you solve.

  2. Invest in data literacy throughout the company to make business stakeholders more aware of the benefits of becoming a data-driven company.

  3. Remember that data scientists will need easy access to information, so assessing the state of your data and taking the steps needed to make it available should come first.

  4. Data scientists will need to work together with key people, possibly from different departments (not just IT!). Those people will need to support the scientists with business knowledge and make themselves available to help the project progress. Ideally and in time, data science should be transversal to the whole organization, just like the accounting department is.

  5. Even if a data science project starts as an experiment, there should always be an end business goal that actually matters and that the project in question can influence.

  6. The success criterion of a data science project should not be a given accuracy percentage for a machine learning model, but a business metric. Defining relevant performance indicators will also help you get the needed support from the executive team.

  7. Hiring and building a quality in-house team will prove difficult for nearly every company, unless your company is highly competitive both in salaries and in the challenging problems it offers. Especially when timing is important and you want to improve your business metrics sooner rather than later, outsourcing can be a good alternative that gives you access to people with the right data science skills as well as domain knowledge.


Author: Julija Pauriene (julija.pauriene@avoconsulting.no)

Julija is Head of Artificial Intelligence at AVO Consulting. Her current areas of work are early-stage business, data analytics, machine learning and artificial intelligence.