How new a trend is it for social media platforms to be used as a research source and how prevalent is this practice within the financial services sector? Social media analytics for investing and trading is relatively new. Its slower acceptance is striking when compared to social media data mining for marketing, advertising, or brand sentiment discovery purposes. This is due to a perception that investors would be unwilling to share information about their trades publicly, and the difficulty of deriving automated signals from the idiosyncratic language of finance. Social media for speedy information discovery has existed as long as micro-blogging has – approximately a decade – but the more sophisticated uses of social media analytics have only been around for half that time.
The foundational paper claiming to use tweets to predict market mood – ‘Twitter mood predicts the stock market’, authored by Johan Bollen, Huina Mao and Xiao-Jun Zeng – came out in 2011. The study finds that if you filter all tweets through the Google Profile of Mood States, the 'calm' mood state seems to be predictive of the Dow Jones Industrial Average, with a two to six-day lag. When people are calmer markets will rise, and when they are more anxious markets will fall. A hedge fund was launched using this methodology. The study was later picked apart by an anonymous blogger, and the fund failed, but the notion of finding market sentiment in unstructured social media data persisted. The relevance of social media to investors was reinforced in 2013 when Bloomberg added tweets to its terminal. Several of the ‘investor sentiment’ data miners were founded around that time. They quickly reached a very high level of sophistication and now most, if not all, quantitative funds incorporate social media signals.
In terms of the volume of data that is out there, how much could be relevant to investors? The volume of data is immense – between 300 and 500 million tweets are sent every day. Only about one hundredth of 1% of these carry a cashtag (a ‘$’ symbol placed before a stock ticker on Twitter or StockTwits, which makes the ticker clickable and searchable for all tweets about that stock), although many others without cashtags contain useful company data.
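To make the cashtag idea concrete, here is a minimal sketch of pulling cashtags out of raw post text. The regular expression and the function name `extract_cashtags` are illustrative assumptions, not the matching rules that Twitter or StockTwits actually apply; in particular, the one-to-five-letter ticker limit is a simplification.

```python
import re

# Assumed pattern: a '$' that does not follow a letter, digit, '_' or '$',
# then 1-5 letters, not followed by another letter or digit.
# Real platforms may use different rules (e.g. class suffixes like BRK.B).
CASHTAG_RE = re.compile(r"(?<![A-Za-z0-9_$])\$([A-Za-z]{1,5})(?![A-Za-z0-9])")


def extract_cashtags(text):
    """Return the tickers mentioned as cashtags, upper-cased, in order of appearance."""
    return [ticker.upper() for ticker in CASHTAG_RE.findall(text)]


if __name__ == "__main__":
    sample = "Long $AAPL and $msft, paid $20 for lunch"
    print(extract_cashtags(sample))
```

Dollar amounts such as ‘$20’ are skipped because the pattern requires letters after the ‘$’ symbol. Posts that discuss a company without a cashtag would be missed entirely by this kind of filter.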
The overwhelming majority of this content is totally irrelevant to investors. But social media mining extends beyond Twitter, of course. Innovation in the space comes from finding novel data sets to analyse – beyond tracking social media posts, this could include chat logs, metadata, page views and trending searches. So the relevant data set keeps expanding, and once one feed is closely tracked, the advantages from following it decrease, although better natural language processing (NLP) – the ability of a computer program to understand human language as it is written or spoken – can provide an edge.
Which social media platforms are the best ones to monitor for investment-related data? I would venture that it is worth classifying social media into two categories: general social networks, and social for finance. You have to approach them with different attitudes. There is a fundamental difference in usefulness for the institutional investor between social networks where investors share tips or portfolio strategies, and general social networks where immediate market signals can be found. The former, such as StockTwits, Covestor, eToro, Collective2, or any of the investing forums, are of limited usefulness to serious investors – they tend to be elaborate herding mechanisms. They are more useful for retail investors who seek security in the crowd, or justification for a given strategy. It is also more difficult to extract data from those social networks.
The absolute gold standard – the one that everybody pays attention to – is Twitter. Stock information on Twitter has great depth and variety, because you can use it to acquire real-time market sentiment about virtually any stock in the S&P 500, and find a diversity of opinion too. The cashtags enable easy sorting. From a statistical perspective, you have a greater chance of approximating actual market sentiment, since your sample is so large. With the high volume comes a large quantity of irrelevant or low-quality information, though, so scraping Twitter necessarily requires a significant commitment to filtering and cleaning. Gaining access to the Twitter firehose – which pushes data to end users in near real-time, guaranteeing delivery of all of the tweets that match the desired criteria – is also fairly expensive. To access the firehose you must go through Gnip (a social media aggregation service that allows users to enter search terms into an interface and retrieve relevant data). The pricing is variable and non-public, depends on volume and bandwidth, and is generally prohibitively expensive for all but the biggest players. The pre-2015 model listed the decahose (access to a tenth of all tweets) at $60,000 a year, and the halfhose (half of all tweets) at $360,000 a year. These prices have increased since Twitter consolidated its analytics partners and took the service in-house.
For traditional assets, such as equities, there exists a formal and mature research ecosystem that ensures prices are generally efficient and somewhat reflective of fundamental values. For some novel assets, however, these research markets barely exist and social media data is a necessary component of one’s analysis – there is some evidence that price formation is driven by social media activity. For cryptocurrencies, such as bitcoin, early access to news was only possible through relatively obscure information centres such as the bitcointalk forum, Reddit, or 4chan – but the potential for manipulation is high. As cryptocurrencies, especially bitcoin, are of growing interest to institutional investors as an alternative asset class, it is important to be aware of the mutualistic relationship these coins have with their constituent forums. Furthermore, since these markets are still inefficient, trading on social media data (even with approaches as simple as tracking the volume of searches for a given asset) can be profitable.
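As an illustration of what ‘tracking the volume of searches’ might look like in its simplest form, the sketch below compares the latest search count with a trailing window and expresses the difference as a z-score. The function name `volume_zscore`, the sample figures and any alert threshold are hypothetical; nothing here is a tested trading rule, and the source data would have to come from whichever search or forum feed you have access to.

```python
from statistics import mean, stdev


def volume_zscore(history, latest):
    """Z-score of the latest count against a trailing window of earlier counts.

    history: earlier daily search counts (at least two values).
    latest:  the most recent daily count.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    spread = stdev(history)  # sample standard deviation
    if spread == 0:
        return 0.0  # flat history: no meaningful scale to compare against
    return (latest - mean(history)) / spread


if __name__ == "__main__":
    # Made-up daily search counts for a hypothetical asset.
    trailing = [100, 110, 90, 105, 95]
    today = 140
    print(round(volume_zscore(trailing, today), 2))
```

A large positive value only says that attention is unusually high relative to the recent past; whether that attention precedes a price move is an empirical question for each asset.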
Where should professionals start in terms of gathering and understanding this data to start to build an investment recommendation? There is the ‘DIY’ method and the ‘all-in-one’ method. Generally, doing it yourself involves gaining access to the Twitter firehose via Gnip (or to any other chatroom or social media entity you would like to follow), deciding which sort of companies or assets you want signals on, developing a filtering methodology, and applying an NLP framework to extract sentiment from the data. This requires a serious commitment of time and resources, and NLP for finance is still in its infancy, so reliable comprehension remains a sophisticated task. Others might outsource this job to financial analytics providers, which may include: RavenPack, Social Market Analytics (SMA), Dataminr, PsychSignal, Estimize, Contix, iSentium, Selerity and MarketPsych. The established information retailers also offer coverage for social media data, through Bloomberg Social Velocity and Thomson Reuters’ Eikon. These services all have somewhat different value propositions, and should each be looked into individually. Because of the expense and nature of the data, these firms generally cater to larger investors, especially hedge funds and high-frequency traders.
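The DIY steps described here (choose the assets, filter the posts, extract sentiment) can be sketched in a few lines. This is a toy illustration under stated assumptions: the posts are already in memory rather than streamed from Gnip, the word lists `POSITIVE` and `NEGATIVE` are invented for the example, and simple word counting stands in for a real NLP framework.

```python
import re
from collections import defaultdict

# Assumed cashtag pattern: '$' followed by 1-5 letters.
CASHTAG_RE = re.compile(r"\$([A-Za-z]{1,5})\b")

# Hypothetical mini-lexicon; a production system would need far more than this.
POSITIVE = {"beat", "bullish", "upgrade", "rally"}
NEGATIVE = {"miss", "bearish", "downgrade", "selloff"}


def score_text(text):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_by_ticker(posts, watchlist):
    """Average the score of every post that cashtags a ticker on the watchlist."""
    scores = defaultdict(list)
    for post in posts:
        tickers = {t.upper() for t in CASHTAG_RE.findall(post)} & watchlist
        if not tickers:
            continue  # filtering step: drop posts about nothing we follow
        score = score_text(post)
        for ticker in tickers:
            scores[ticker].append(score)
    return {t: sum(v) / len(v) for t, v in scores.items()}


if __name__ == "__main__":
    posts = [
        "Bullish on $AAPL after the beat",
        "$AAPL downgrade incoming",
        "$TSLA rally",
        "lunch was great",
    ]
    print(sentiment_by_ticker(posts, {"AAPL", "TSLA"}))
```

A word-count scorer like this has no notion of context or tone, which is one reason reliable comprehension of financial language is described as a sophisticated task.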
What are the challenges around this type of social media analysis, and how can these be tackled? The chief issue with social media data is filtering out spam and irrelevant or misleading data. The immediate and dramatic power of social media – especially Twitter – makes it a frequent target of scammers. These schemes usually involve hacking popular social media accounts, or imitating reputable news organisations, in order to crash the price of a stock. Different means of tackling this issue exist. Approaches range from red-flagging problematic or ‘spammy’ accounts, to identifying ‘fake news’ by running articles through grammar screens.
Deriving genuine sentiment from unstructured data remains a challenge. As social media analysis becomes more widespread, the edge attainable by following real-time alerts about asset price fluctuations on Twitter disappears. Then investors using social signals to trade must differentiate themselves with more sophisticated language processing algorithms or by exploring obscure social networks. Getting algorithms to understand the subtleties of the ways humans express themselves online is a significant difficulty, and traditional NLP frameworks tend to stumble when exposed to the jargon of finance. Automated sentiment extraction can backfire when algorithms are confronted with sarcasm, for instance.
For investment professionals who do not adapt to using these new tools as part of their arsenal, what will their clients be missing out on? There is a significant body of evidence that aggregate social media data can help predict market movements. This goes beyond simply finding impactful tweets or gaining access to public information before anyone else. Even for institutional and slower-moving investors, real-time, bottom-up measures of market sentiment have value. At present, many common indicators of sentiment (volatility indices and bullishness percentages) are top-down and insufficiently granular: they describe the market in aggregate rather than at the level of individual securities. Filtered social media data is an alternative research product that tells you precisely how the market feels about an individual equity or the market as a whole.
About the expert
Nic Carter, who is studying for a master's degree in finance and investment at the University of Edinburgh Business School, created the award-winning documentary Social sentiment analysis: a needle in the haystack along with partners Aristeidis Georgopoulos, Dennis Jakob and Wenrong Lu.
Since many of the research providers only serve hedge funds, their methodologies are largely private, so assessing their validity is difficult. However, several of these players, such as SMA and RavenPack, have transparently issued white papers detailing their methodology. SMA has launched, via the Chicago Board Options Exchange (CBOE), two indices tracking portfolios generated by social media-derived sentiment signals. These are proofs of concept, and their returns bolster SMA’s claim that incorporating social media sentiment has value for institutional investors, not simply high-frequency traders.
How do you expect this aspect of investment research to evolve in the coming years? In the future, we will see social media analytics firms extend their offerings to more asset classes. That said, it is tricky, if not impossible, to gauge investor sentiment on obscure commodities from social media posts. The language parsing will continue to improve, and clearer pictures will emerge of market sentiment at a given time.
As unstructured information is translated into investment signals with greater efficiency, advantages from this technology will likely be captured by the larger players and quantitative funds, which can afford subscriptions to the best analytics firms. Much like brokerages and exchanges, big funds and analytics firms will increasingly seek close and preferential access to social media networks. Privileged access to the data stream should see the field become more consolidated and difficult for outsiders to penetrate. At present, the social media information market is sufficiently efficient that retail investors have little to gain by probing these market sentiment strategies.
Further reading
Bitcoin spread prediction using social and web search media
The digital traces of bubbles: feedback cycles between socio-economic signals in the bitcoin economy