The Data Science Revolution
How the new richness and accessibility of data, and advances in data science, are enhancing both quantitative and traditional fundamental investment research—and sparking a revolution in active management.
So-called “big data”—the residue of information that we all leave behind as we buy things, sell things, browse the high street and the internet, use our smartphones and generally live our modern lives—is proliferating. At the same time, advances in cloud computing, machine learning and artificial intelligence allow us to extract coherent, strategic insights from these digital residues. Combined under the banner of data science, they have the potential to provide a richly enhanced source of information about our world: information that is deeper and more detailed than we have ever had before, yet also broader and more comprehensive.
Spreading from its origins in technology and retail, data science is now making waves in the information- and research-driven world of finance and investment.
Here, we look at what big data means in this new context. With real-world examples, we explore why data science informs rather than replaces traditional investment research. We describe the importance of information flows between quantitative and traditional fundamental research and how they can be enhanced still further with data science. In fact, we argue that the true power of data science lies in enhancing those flows, which has implications for the way a modern active management business should structure its research efforts.
Big data is spreading everywhere in the modern, networked world, and advances in cloud computing, machine learning and artificial intelligence are enabling us to extract coherent, strategic insights from it.
Think about how we used to learn about a flu outbreak, for example. Sick people would go to the local doctor, who would notice an uptick in flu sufferers and speak with colleagues who were seeing the same thing. Today, clusters of Google searches can reveal the same dynamics days or weeks before they would become evident in the old way. Similarly, in the analog world we had an inkling that people were more likely to go shopping when the weather was nice. In the big data world of credit card transaction records, we can find out what was bought, where it was bought and how much it cost.
Importantly, data science can give us a deeper and more detailed picture of the world without giving up any breadth or comprehensiveness. A wide range of alternative data is potentially useful in the investment context. It includes “hard” data such as supplier payments, prescriptions and healthcare insurance claims, and laboratory results. But there is also rich information in “softer” data such as news and social media keywords or satellite imagery.
While rarely decisive, these data can provide another corroborative layer to the insights gleaned from traditional data sources and fieldwork. We find that the following five data sources—which can include numbers, written text, spoken words, digital information and pictures—are the most prominent:
Alternative data become most useful when they are processed with data science techniques. A swatch of credit card transactions is just a crowd of individuals buying things: it takes a lot of computing power, and often machine-learning techniques, to identify coherent subsets of that sample that are tractable to market and security analysis.
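To make the credit card example concrete, here is a minimal Python sketch of one such processing step: turning raw transactions into a per-company, per-quarter spending proxy. Everything in it is an assumption for illustration. The record fields, the merchant-to-ticker table and the toy transactions are hypothetical, and restricting the sample to a fixed panel of cardholders is one common cleaning choice, not a description of Neuberger Berman's own process.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    card_id: str    # anonymized cardholder ID (hypothetical field)
    merchant: str   # raw merchant descriptor as it appears on the statement
    day: date
    amount: float


# Hypothetical lookup from raw merchant descriptors to listed tickers.
MERCHANT_TO_TICKER = {
    "ACME STORES #12": "ACME",
    "ACME ONLINE": "ACME",
    "BOLT CAFE": "BOLT",
}


def quarter(d: date) -> str:
    """Calendar quarter label, e.g. '2023Q2'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def stable_panel(txns, quarters):
    """Cards active in every requested quarter, so that changes in spend
    are not merely changes in who happens to be in the sample."""
    active = defaultdict(set)
    for t in txns:
        active[t.card_id].add(quarter(t.day))
    return {card for card, qs in active.items() if set(quarters) <= qs}


def revenue_proxy(txns, quarters):
    """Total spend per (ticker, quarter), counting only the stable panel
    and only merchants that map to a listed company."""
    panel = stable_panel(txns, quarters)
    totals = defaultdict(float)
    for t in txns:
        ticker = MERCHANT_TO_TICKER.get(t.merchant)
        q = quarter(t.day)
        if ticker is None or t.card_id not in panel or q not in quarters:
            continue
        totals[(ticker, q)] += t.amount
    return dict(totals)


# Toy data: card c2 drops out after Q1, so it is excluded from the panel;
# c3's unmapped merchant still counts toward activity but not toward totals.
txns = [
    Transaction("c1", "ACME STORES #12", date(2023, 1, 5), 40.0),
    Transaction("c1", "ACME ONLINE", date(2023, 4, 9), 60.0),
    Transaction("c2", "BOLT CAFE", date(2023, 2, 1), 5.0),
    Transaction("c3", "UNKNOWN SHOP", date(2023, 1, 2), 9.0),
    Transaction("c3", "BOLT CAFE", date(2023, 5, 2), 7.0),
]
proxy = revenue_proxy(txns, ["2023Q1", "2023Q2"])
# proxy == {('ACME', '2023Q1'): 40.0, ('ACME', '2023Q2'): 60.0, ('BOLT', '2023Q2'): 7.0}
```

The fixed-panel filter matters because card datasets gain and lose users over time; without it, a jump in a company's "sales" could simply reflect more cardholders entering the sample.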
Data scientists need to determine which data have the potential to be useful. A dataset may simply be of irretrievably poor quality. It may be very interesting but immaterial to any investment question. It may represent excellent value for a competitor with a different investment focus and style, but not for us. Having the expertise and the infrastructure to clean and interpret data is vital. So is being an engaged, responsive and demanding consumer of data, rather than a passive one. Neuberger Berman’s data scientists estimate that only 5% of the swatches they see have the potential to inform our research effort.
It is also important to develop clear research hypotheses to inform the selection and preparation of the raw data. This is where existing investment teams have a critical role to play in getting actionable results from alternative data.
We believe that making the most out of alternative data is not just a matter of investing in data science, but also of integrating data science more fully into the traditional investment management research process.
It helps to think about the characteristics of alternative data alongside the characteristics of traditional, bottom-up security analysis and the characteristics of factor- or risk premia-based quantitative investing.
Fundamental analysts’ knowledge is very deep but comparatively narrow: they know a lot about a small number of companies. Quantitative investors’ knowledge is very broad, but comparatively shallow: they know a few key things about every listed company in their investment universe.
As we have seen already, data science has the power to give us a deeper and more detailed picture of the world without giving up any breadth or comprehensiveness. Data on the performance of every product from every competitor company in the marketplace can bring extra breadth to fundamental research. Data that corroborates or challenges signals from traditional metrics and ratios, or that fills gaps in company reporting, can bring extra depth to quantitative investing.
In short, to get the most out of data science, practitioners should recognize that it is not a replacement for traditional investment research but a complement to it; and not a technology support function for investment professionals but an extension of what they already do.
“In asset management, data science and large-scale data engineering make it possible to understand and evaluate businesses both at a high level and in minute detail. In our view, this has never been more important than in the current business environment…"
Kai Cui, Head of Equity Data Science
As we described, data science can bring new breadth to fundamental analysts’ research. The starting point for this work is often a research hypothesis originating among the analysts themselves.
Most hypotheses are concerned with how well certain products are competing, or the immediate results achieved by new strategic initiatives.
In simple terms, the data science insights that feed into our fundamental research aim to enhance our understanding of economy-wide, sector-wide and company fundamentals so that we are better able to pick out who is going to win in the marketplace. We want to pick the next winners in a particular sector before anyone else is aware they exist; and we want to identify threats to incumbent businesses before they make their impact.
Overall, we find that many of our data science research hypotheses fall into one of the following six analytical categories:
At Neuberger Berman, quantitative investing focuses mainly on factor- or risk premia-based approaches, which try to identify risks that, over long time periods, have systematically been rewarded with returns in excess of the market.
Once we have identified these risks, we then try to identify stocks that map closely onto them. That involves gathering data about those stocks. These are primarily traditional fundamental metrics and ratios, but alternative data have at least three roles to play.
First, they can provide entirely new inputs directly into the mapping process. Second, they can fill gaps that companies leave in their standard reporting. Third, and most commonly, they simply enrich our quantitative team’s data on, and knowledge of, a company or sector. Potentially, they can identify risks that are not yet priced in before those risks exert their impact, as well as idiosyncratic drivers of return in excess of those predicted by exposure to systematic risk.
“In quant investing you make a hypothesis about the metrics you want to forecast as a key component of the performance of a company, and try to find evidence that those metrics have some predictive value. Data is what you need to test the metrics.”
Ray Carroll, CIO, Neuberger Berman Breton Hill
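The idea of testing whether a metric has predictive value can be illustrated with a widely used diagnostic, the rank information coefficient (IC): the Spearman rank correlation between a metric observed at one date and stock returns over the following period, computed across stocks and then averaged over time. The minimal Python sketch below is a generic illustration under that assumption, not Neuberger Berman Breton Hill's methodology. The function names and toy numbers are hypothetical, and a real test would also need to control for known factor exposures, look-ahead bias and data revisions.

```python
from statistics import mean, pstdev, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def pearson(x, y):
    """Population Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def rank_ic(metric, fwd_returns):
    """Spearman correlation between a metric and next-period returns
    across the stocks in one cross-section."""
    return pearson(ranks(metric), ranks(fwd_returns))


def ic_summary(panel):
    """panel: list of (metric_values, forward_returns), one pair per period.
    Returns the mean IC and a simple t-statistic (needs >= 2 periods)."""
    ics = [rank_ic(m, r) for m, r in panel]
    avg = mean(ics)
    se = stdev(ics) / len(ics) ** 0.5
    return avg, (avg / se if se > 0 else float("nan"))


# Toy panel of three periods, four stocks each (hypothetical numbers).
panel = [
    ([1.0, 2.0, 3.0, 4.0], [0.01, 0.02, 0.03, 0.04]),  # metric ranks match returns
    ([1.0, 2.0, 3.0, 4.0], [0.04, 0.03, 0.02, 0.01]),  # metric ranks inverted
    ([1.0, 2.0, 3.0, 4.0], [0.00, 0.01, 0.02, 0.05]),  # metric ranks match returns
]
avg_ic, t_stat = ic_summary(panel)  # mean IC of about 0.33, t-statistic of about 0.5
```

Ranks rather than raw values are used so that a handful of extreme returns or outlying metric readings cannot dominate the result; a metric whose mean IC is small relative to its period-to-period variability, as in this toy panel, would not count as strong evidence of predictive value.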
Information flows both ways between our fundamental and quantitative research teams.
When these two functions are fully integrated, quantitative screens can suggest entirely new avenues for bottom-up research, or risks that may be hidden from the analyst’s eyes. The quantitative team may see something at the (broad) macro level that is not being picked up by a fundamental analyst’s (deep) company- and sector-level view, for example.
Fundamental analysts’ insights can also inform quantitative investing. They can often indicate whether signals from fundamental metrics and ratios are borne out by deeper human insight into individual companies.
When both processes are enhanced by collaboration with a data science team, their mutual dialogue is also enhanced—and, for that reason, we believe full integration of fundamental and quantitative processes extracts the most actionable value from alternative data.
A centralized research platform, feeding into both fundamental and quantitative strategies, will get the most out of a data science capability
This has important governance implications.
A centralized research capability—as opposed to siloed teams of analysts supporting individual strategies, groups of strategies, or portfolio management teams—is arguably best for facilitating the flow of information that data science stimulates.
Data science is a new tool in traditional investment research efforts, not a replacement for them. Data science has to ask the right questions. That requires investment professionals to generate research hypotheses that are both relevant to investment objectives and within the scope of data science and its datasets. We believe that achieving a common language with which to nurture that information flow is much easier when a firm maintains a centralized research program that feeds the full range of its fundamental and quantitative strategies.
Big data is out there, waiting to change our expectations of what investment research can achieve. To harness it, active managers need to embrace data science, but also structure their research efforts to exploit its full potential.
Poorly selected swatches of data, acquired passively and uncritically, compromised by gaps and anomalies and hosted on inappropriate platforms, will only take investors so far—and probably not far enough to justify the costs. For most practitioners, therefore, we believe disillusionment will follow and the hype will burn itself out.
Those left to benefit will be the firms that not only make the investments in data science expertise and technology infrastructure necessary to glean strategic insights from the datasets, but also recognize the structural and governance implications of data science. These are the practitioners who will take the data science revolution beyond the hype.