It’s said that 90% of all existing data has been created in the last two years. As people migrate online and become inseparable from their cell phones, their browsing activity and movements, along with satellite data, are being measured and analyzed, offering valuable insights to businesses, and—increasingly—to portfolio managers. In my role as a data scientist, I’ve looked closely at these trends, and share some thoughts below.
Data in Action
Before I came to finance, I worked as a professor teaching neuroscience and computer science, and helped students start businesses. One startup analyzed bank transactions at large financial institutions to help identify potential money-laundering activity. There were many transactions, but it turns out that accounts owned by terrorists and other criminals don’t look like normal bank accounts. A lack of check writing, cash-only transactions, receiving a wire from a charity—these were some of the traits that analysis of large data sets could use to help identify the bad guys.
Another venture dealt with online browsing activity—the “shoes” that follow you around the internet. In 2009, there were only static ads on all websites. But by 2014, the majority of advertisements on websites became dynamic and personalized to the user. The advertisements are selected in an auction that takes place as the web page is loading. Worldwide, users visit a new web page a million times per second, and we had a tenth of a second to decide, based on all the information we had at our disposal, what ad to show them and how much to bid for it.
These are two examples of the power of big data. The field of advertising has developed considerable infrastructure around such tasks, and other industries, including Wall Street firms and portfolio managers, are becoming more involved.
And why not? If you know what ad to show someone, you know what they’re interested in. If you know what they’re interested in across all products and across the whole population, you know who’s winning in the marketplace. It’s not a stretch to argue that such information can provide a leg up when it comes to investing.
Big Data and Fundamental Investing
Equity analysts spend a great deal of time analyzing companies—talking to CEOs, reading prospectuses and annual reports, and doing other proprietary research to get an edge. But in the end, there’s still a lot that’s unknown—which is often reflected in the earnings surprises seen in the market after quarterly reports. A stock price can swing a great deal after the facts are announced, after which it can move over the subsequent three months due to a variety of influences (including momentum) until the next set of concrete facts comes out.
What big data can do, among other things, is to provide a new level of precision regarding what is actually happening on the ground to a business, to help analysts and portfolio managers make choices. In theory, this means that the time between announcements becomes less of a mystery to those who are applying the science.
A simple way of thinking about this is to look at Zillow, the real estate valuation service. Zillow provides price discovery in the real estate space, breaking your house into component parts—bedrooms, bathrooms, square feet, style, age and so on—and then using this information and the price of nearby properties to generate a valuation. Today, research analysts and portfolio managers go through a similar exercise regarding companies that is largely manual in nature. With big data, the notion is that this can be done continuously and for everything traded in a market. It then becomes a question of the advantage provided by having more information about the value of a business.
Let’s use the hypothetical of a consumer tech company. Credit card data can show purchases across its businesses—whether tablets, mobile phones, music or laptops—and then each business can be assessed based on growth, geography and demographics. In turn, this information can be aggregated to show a “CEO’s dashboard” revealing how the whole company is faring.
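To make the dashboard idea concrete, here is a minimal sketch of rolling card transactions up by business line and then to the company level. The records, the business line names, the quarter labels and the `build_dashboard` function are all hypothetical, and the sketch covers only growth by business line, not geography or demographics.

```python
from collections import defaultdict

# Hypothetical card transactions: (business line, quarter, amount in USD).
# Every record here is invented for illustration.
transactions = [
    ("phones", "Q1", 500.0), ("phones", "Q2", 550.0),
    ("tablets", "Q1", 200.0), ("tablets", "Q2", 180.0),
    ("laptops", "Q1", 300.0), ("laptops", "Q2", 330.0),
    ("music", "Q1", 100.0), ("music", "Q2", 120.0),
]


def growth(before, after):
    """Fractional change from before to after, or None if there is no base."""
    return (after - before) / before if before else None


def build_dashboard(records, prior, current):
    """Sum sales per business line and quarter, then compute growth rates."""
    totals = defaultdict(lambda: defaultdict(float))
    for line, quarter, amount in records:
        totals[line][quarter] += amount

    # Per-business view: current sales and growth versus the prior quarter.
    by_line = {
        line: {"sales": q[current], "growth": growth(q[prior], q[current])}
        for line, q in totals.items()
    }

    # Company view: the same calculation on the summed totals.
    company_prior = sum(q[prior] for q in totals.values())
    company_current = sum(q[current] for q in totals.values())
    company = {
        "sales": company_current,
        "growth": growth(company_prior, company_current),
    }
    return by_line, company


by_line, company = build_dashboard(transactions, "Q1", "Q2")
for line, stats in by_line.items():
    print(f"{line:8s} sales={stats['sales']:7.1f} growth={stats['growth']:+.1%}")
print(f"company  sales={company['sales']:7.1f} growth={company['growth']:+.1%}")
```

In this invented data the company grows by roughly 7%, but the per-line view shows that tablets are shrinking while music grows fastest. That is the kind of detail a single headline number hides.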
Relevant in Surprising Places
Thus far, the sweet spot of big data has been consumer businesses, but it is increasingly relevant in areas like energy and health care. It’s worth bringing in a couple more examples to show how data can enrich analysis:
Energy: Some 20,000 to 50,000 Americans lease their land to petroleum fracking companies, and a number of trusts manage this process, providing royalties to individuals in proportion to oil extracted. By analyzing the aggregate trust deposits into their bank accounts, it’s possible to figure out the statistical relationship between the payments and oil extraction generally, thus creating a real-time estimate of U.S. oil production.
Health care: For the last eight years or so, Google has been constructing flu maps based on user web searches for flu symptoms and the geographic clusters they reveal. Search terms can also reflect drug side effects, and the degree to which a side effect is searched can tell you its frequency and level of severity. Under the normal reporting process, if doctors receive enough complaints from patients, they will tell the FDA, which will then do a study before adding a warning to the drug label—reportedly as long as 18 – 24 months after Google data shows evidence for the side effect.
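The "statistical relationship" in the energy example can be illustrated with an ordinary least-squares line. This is only a sketch under stated assumptions: the deposit and production figures are invented, `fit_line` and `estimate_production` are hypothetical names, and a real analysis would likely need to account for other drivers of royalty payments, such as the oil price.

```python
# Hypothetical monthly history. Deposits are aggregate royalty payments seen
# in bank data (USD millions); production is the officially reported figure
# for the same months (million barrels per day). All numbers are invented.
deposits = [10.0, 12.0, 14.0, 16.0, 18.0]
production = [8.0, 9.0, 10.0, 11.0, 12.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(deposits, production)


def estimate_production(latest_deposits):
    """Estimate current production from the latest observed deposits."""
    return intercept + slope * latest_deposits


# Deposits arrive before official production figures are published, so the
# fitted line gives an early estimate for the current month.
print(estimate_production(20.0))
```

The fit is calibrated on months where both series are known, then applied to the newest deposit figure, which is available before the official production number.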
Beyond transaction, browsing and phone location data, satellites are also useful in assessing industrial and other activity. For example, you can look at satellite photos of strip mines in central Australia and see how much ore was removed, and then you can look at the heat signature from a smelting plant, or the cars in the parking lot, or job postings to find out how much work the plant is doing. Then, you can look at what’s stacked outside to see how much material has ultimately been processed.
Such information can help you figure out what’s really happening to a business. Let’s say a company is experiencing 10% sales growth. How is it getting that growth? Is it adding new customers or squeezing existing ones? Is it offering discounts that will hurt its earnings down the road? The data that provides the answers is available to those with the technical ability plus insight to reveal and use it.
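As a rough illustration of the "new customers or existing ones" question, the sketch below splits a 10% sales increase into contributions from new, lost and retained customers. The customer names, the spending figures and the `decompose_growth` function are hypothetical. Real transaction data would be anonymized and far larger.

```python
# Hypothetical spending per customer in two consecutive periods (USD).
prior = {"alice": 100.0, "bob": 200.0, "carol": 100.0}
current = {"alice": 120.0, "bob": 200.0, "dave": 120.0}


def decompose_growth(prior, current):
    """Split sales growth into parts, each as a fraction of prior sales."""
    prior_total = sum(prior.values())
    current_total = sum(current.values())

    # Sales from customers who did not buy in the prior period.
    new = sum(v for k, v in current.items() if k not in prior)
    # Sales lost from customers who stopped buying.
    lost = -sum(v for k, v in prior.items() if k not in current)
    # Change in spending by customers present in both periods.
    existing = sum(current[k] - prior[k] for k in prior if k in current)

    return {
        "new_customers": new / prior_total,
        "lost_customers": lost / prior_total,
        "existing_customers": existing / prior_total,
        "total": (current_total - prior_total) / prior_total,
    }


parts = decompose_growth(prior, current)
for name, value in parts.items():
    print(f"{name:20s} {value:+.1%}")
```

In this invented case, the 10% headline growth combines a 30% gain from new customers, a 25% loss from departed ones and a 5% gain from existing customers spending more. That is a very different picture from steady growth across a stable customer base.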
Of course, you have to be very careful about false, misleading or useless data. In one instance, a large store in California appeared to be getting peak customer traffic at 8 a.m., two hours before it actually opened for the day. So, we called the company that provided the data and suggested that they were mistakenly including cars traveling on Route 101 (right next door) in their numbers. They denied it at first, saying that they had excluded vehicles going over 20 mph, but it turns out that, in rush hour, Route 101 slows to a crawl, enough to place all of its traffic in the data set.
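A simple sanity check of the kind that would catch this problem is to compare the busiest hour in the data against the store's opening hours. The sketch below assumes hypothetical hourly vehicle counts and a hypothetical 9 p.m. closing time; only the 10 a.m. opening and the 8 a.m. peak come from the example above.

```python
# Hypothetical vehicle counts by hour of day (24-hour clock) from a traffic
# data provider. The numbers are invented for illustration.
hourly_counts = {7: 300, 8: 900, 9: 400, 10: 250, 12: 500, 17: 600, 20: 150}


def peak_outside_opening(counts, open_hour, close_hour):
    """Return the busiest hour and whether it falls outside opening hours."""
    peak_hour = max(counts, key=counts.get)
    is_open = open_hour <= peak_hour < close_hour
    return peak_hour, not is_open


# The store opens at 10 a.m.; the 9 p.m. closing time is an assumption.
peak_hour, flagged = peak_outside_opening(hourly_counts, open_hour=10, close_hour=21)
if flagged:
    print(f"Peak traffic at {peak_hour}:00 is outside opening hours; check the data.")
```

A check like this does not explain why the data is wrong, but it flags series that deserve a call to the provider before they feed into any investment analysis.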
Looking Ahead
Where is all of this going? At the moment, big data represents an enticing but underused resource, and many folks don’t yet know how to get their arms around it.
On Wall Street, a key issue is cultural. Portfolio managers are used to thinking about tech workers as providing infrastructure, securing information and, if you are lucky, creating cost savings, rather than as a source of investment insight. Another challenge is structural. Many firms want their new systems to be compatible with existing ones to save money. But they probably should take a lesson from internet companies, which are dependent on computing advances for survival and thus are constantly rebuilding their software.
As author William Gibson has said, “The future is here already—it just isn’t very evenly distributed.” Big data is open for business, and some investors are starting to take advantage of it while others are not—in my view, to their detriment. The idea is not that data will take the place of judgment, skill or industry knowledge. Rather, it is something that can help managers make better decisions, and potentially enhance their ability to generate performance for clients over the long term.
World of Data
[Figure: User Snapshot: Internet, Social, Mobile. Source: Hootsuite.]
[Figure: The Wired Landscape Continues to Expand. Connected Devices Globally (bn). Source: Ericsson, 2017. *Internet of things.]