Johnson Poh, Head Data Science / Practice Lead, DBS Bank
The explosion of big data, coupled with advances in cloud computing technologies, has enabled companies to leverage data for business insights. It is no surprise that across industries, companies have begun adopting a data-driven approach by charting out long-term roadmaps and setting up data science teams within their organisations.
Yet bringing technology, process, and people together is easier said than done. How much do you need to invest in infrastructure? How do you ensure consistency, continuity, and scalability in a data science setup? How do you acquire the right composition of talent in a big data team?
As more companies transition towards a data-driven approach, we are beginning to better understand how to tackle the operational challenges that come with managing technology, people, and process in practice.
Technology – Setting Your Architecture and Capabilities Right
Mapping out a three- to five-year data-driven roadmap gives companies a sound foundation for acquiring the data capabilities that suit their stage of data-driven development. It also helps companies keep costs in check and purchase only the data assets that the business needs. In this regard, the old adage holds true: you can’t run before you learn how to walk.
For example, companies just starting out may focus on deploying a data science workbench while taking a modular approach to developing their data architecture. Open-source frameworks such as Hadoop and Spark are a low-cost way to handle big data processing and analysis. They also enable companies to run quick experiments to understand the requirements for the next stage of growth in their data capabilities.
Companies that are more advanced in their data-driven strategy will find that they need to scale up in terms of data capacity and complexity. They are at the stage of evolving from simple analysis to predictive modelling and growing a larger data repository. These companies may customise software, invest in their own R&D, and design data applications to improve their workflow and productivity.
Process – An Underrated and Often Overlooked Necessity
The data science practice is very much an exercise in continuous research, experimentation, and discovery. There are multiple data-science lifecycle frameworks to aid collaboration between business units and data science teams.
These frameworks include the commonly used Microsoft Team Data Science Process (TDSP), the Cross Industry Standard Process for Data Mining (CRISP-DM), and the Knowledge Discovery in Databases (KDD) process. Some companies have gone further and developed their own customised blueprint by adopting only parts of these frameworks.
The key takeaway is that these frameworks adopt an agile methodology in which close collaboration and iteration are required between the business units and the data science team. For example, the task of setting objectives is traditionally led by business units. However, data science teams that take it upon themselves to build an intimate understanding of the business’s challenges, regulatory policies, and the dynamics among upstream and downstream business partners will have an advantage when working with the business unit.
In the same vein, developing models is a task that is often heavily centered on data science teams. Strong collaboration between business units and data science teams at this stage can lay the groundwork for a robust model with sound business justifications. For example, feature engineering involves finding the most informative data variables, which requires a combination of domain expertise from the business units and technical analysis from the data scientists.
While project management structures and frameworks are not new, they are an often overlooked factor that is critical to running a sustainable data science operation and managing different stakeholders in the organization. They not only ensure consistency and business continuity but also help companies advance much faster and more efficiently in their data-driven endeavor.
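As a rough illustration of the technical half of that collaboration, the sketch below ranks candidate variables by the absolute value of their Pearson correlation with an outcome. This is only one simple proxy for "informative" and is not a method prescribed by the frameworks above. The variable names and toy values are hypothetical, standing in for candidates that a business unit might propose.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(candidates, target):
    """Return (name, |correlation|) pairs, strongest linear signal first."""
    scores = {name: abs(pearson(values, target))
              for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: candidate variables suggested by domain experts.
candidates = {
    "monthly_transactions": [2, 4, 6, 8, 10],
    "account_age_years": [5, 1, 4, 2, 3],
    "branch_code": [7, 7, 7, 7, 7],
}
target = [1, 2, 3, 4, 5]  # hypothetical outcome the model should predict

ranking = rank_features(candidates, target)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A ranking like this is only a starting point for discussion. Domain experts may still keep a weakly correlated variable for regulatory or business reasons, or discard a strongly correlated one that would not be available when the model is used.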
People – Getting the Right Expertise and Building the Right Expertise
Job functions in data science are not homogeneous. On the contrary, these job functions span a variety of roles, which include data analysts, data engineers, data architects, data scientists, and, more recently, machine learning engineers.
Different roles contribute to different segments of the data pipeline and require specific expertise. Getting the right composition of data professionals is critical for delivering a full-scale data product.
For instance, companies at the onset of building their data-driven foundation will require more data engineers, who implement and manage big data platforms and set up repositories. As companies advance along their roadmap, more data scientists and machine learning engineers will be required to build production-grade data analytics and predictive algorithms. Finally, to enable business users to leverage data analytics capabilities for business decisions, software engineers are necessary for developing front-end dashboards and data visualization tools.
It is important to note that these skillsets are typically not interchangeable across roles. A data engineer would not necessarily have the skillsets needed to perform the role of a data scientist or data architect. As such, HR practitioners supporting data science talent acquisition must be cognizant of how the different roles and expertise contribute to the end-to-end data pipeline. This will help organisations match individuals to the right job functions and build strong data science teams.
The professional field of data science has cemented its relevance across industries, with many companies embarking on their own data-driven strategies. While the data science field will continue to evolve, we are beginning to crystallise the lessons learnt in tackling the operational challenges of managing technology, processes, and people, so that we can help more companies move ahead more efficiently in their data-driven development.