Making Sense of ESG Data with Machine Learning

How’s this for stating the obvious: Understanding risks and analyzing opportunities requires data. It’s true for every business, and applies to financial risks and opportunities, product and market risks and opportunities, and ESG-related risks and opportunities.

Another obvious statement: data is often difficult to gather, organize, standardize, and actually put to use in decision-making. Again, that applies to financial, product, marketing, and ESG-related decisions. In fact, we would argue that much of what falls under the umbrella term of “ESG data” has a direct impact on financial, product, and/or marketing decisions. For example, if data show that consumers want less wasteful packaging, that’s finance, product, marketing, and ESG, all in one.

ESG Data is Tricky 

But even though ESG issues are now widely recognized as an important part of the “risks and opportunities” landscape, dealing with ESG data is particularly challenging. That is partly because definitions are still evolving. Even when we think we can do a decent job of defining something, such as Scope 1, Scope 2 and Scope 3 carbon emissions, what data is available (at a reasonable cost) to measure those things? It’s not entirely clear, and companies have to make assumptions and estimates when direct measurement is not possible.

It is even more difficult to know what ESG data to gather to measure less quantifiable things such as employee engagement. By the way, that one can be a big deal. If, as recent estimates suggest, roughly half of the employees at a given company are doing the minimum required to hold onto their jobs, that’s a big problem for productivity and customer satisfaction. Companies want to analyze this, and we can collect a lot of data about employees: what they do during work hours, how they feel about the company and their jobs, and so on. But then what?

This is one big reason that studies on whether ESG factors affect financial performance and stock returns are often inconclusive: different groups of researchers use incomplete data to measure various things that lack standard definitions. For example, a study that uses one ESG vendor’s scores to form portfolios of “Good” and “Bad” ESG stocks and then compares their returns over time doesn’t say much about whether ESG is important to stock returns. It says much more about how well (or poorly) those scores reflected actual ESG impacts over the period analyzed.

Is More Data Always Better?

Of course, as appealing as it seems to collect more and more data, more is not always helpful. In a recent article titled “Using AI to Tackle the ESG Data Challenge”, WorldQuant poses this important question: as the range of ESG data increases, how can we know what adds incremental value (better insights for an investment analyst, a marketing team, financial or HR decisions, etc.) beyond the data we are already using? If two data elements have the same (or different) names, that doesn’t mean they measure the same (or different) things. This increases subjectivity and noise.
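
To make the “incremental value” question concrete, here is a minimal Python sketch using partial correlation, one simple linear check and not necessarily what WorldQuant uses. It asks how strongly a candidate data element tracks a target once an input we already use has been accounted for. The function names, the single control variable, and the toy numbers are illustrative assumptions; real ESG datasets would call for many controls and non-linear methods.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def incremental_correlation(target, candidate, existing):
    """Partial correlation of `candidate` with `target`, controlling for `existing`.

    Values near 0 suggest the candidate adds little beyond the input we already use.
    (Hypothetical helper for illustration.)
    """
    r_tc = pearson(target, candidate)
    r_te = pearson(target, existing)
    r_ce = pearson(candidate, existing)
    # If the candidate is a linear copy of the existing input (or the existing
    # input already explains the target perfectly), there is nothing left to add.
    if abs(r_ce) > 1 - 1e-9 or abs(r_te) > 1 - 1e-9:
        return 0.0
    return (r_tc - r_te * r_ce) / sqrt((1 - r_te ** 2) * (1 - r_ce ** 2))


# Hypothetical example: the same disclosure reported under a different name and scale.
existing = [1, 2, 3, 4, 5, 6]
relabelled = [2 * v + 1 for v in existing]
target = [1, 3, 2, 5, 4, 6]
print(incremental_correlation(target, relabelled, existing))  # 0.0
```

A candidate that is just a relabelled or rescaled version of an existing input scores 0.0, which is the “same thing under a different name” case described above.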

ESG Data and Machine Learning 

Questions about how to make sense of lots of data point to machine learning as a solution, and that is definitely happening in the ESG space. Machine learning models can use the data we gather to figure out which data is most useful, creating a positive feedback loop.
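
One common way to let a trained model tell us which inputs matter is permutation importance: shuffle one input column at a time and measure how much the model’s error rises. The plain-Python sketch below uses hypothetical names, and `predict` stands in for whatever model has already been trained; scikit-learn offers a fuller version as `sklearn.inspection.permutation_importance`.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average rise in MSE when each feature column is shuffled.

    predict: a (hypothetical) trained model mapping one row (list of floats) to a number.
    X: list of rows, each a list of feature values; y: list of targets.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the target depends only on feature 0; feature 1 is irrelevant.
X = [[float(i), float(i % 3)] for i in range(10)]
y = [2.0 * row[0] for row in X]
# The second value is 0.0: shuffling feature 1 never changes a prediction.
print(permutation_importance(lambda row: 2.0 * row[0], X, y))
```

Inputs whose shuffling barely moves the error are candidates to drop or down-weight, and that judgment can feed back into which data is worth gathering next.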

Among the three pillars of E, S, and G (and we must note that the boundaries between them are often very fuzzy), it is easiest to obtain standardized data for Environmental issues. WorldQuant notes that, as a result, the relationship between environmental risks and opportunities and market performance is likely to strengthen in the near term.

Data for the Social and Governance pillars are less quantifiable and standardized. As noted above, although employee engagement has very real implications for profitability, it is difficult to measure. The same applies to questions such as whether a company takes the right steps to protect customer data, avoids unacceptable labor practices in its supply chain, actually cares about diversity, equity and inclusion (which clearly affects the quality of decision-making in the workplace), and so on.

The WorldQuant article describes ESG data as “noisy and incomplete, not fully standardized, not integrated and not transparent.” It is mostly “unstructured,” often involving text that requires context to understand. It is also voluminous. But that content can be very useful! 

  • Industry reports, surveys and third-party analyses include information about ESG-related actions a company would not voluntarily disclose. 
  • Media platforms around the world publish news articles, interviews, commentaries and reports that contain information that is relevant to ESG-related activities, whether or not the published content is specifically focused on ESG.
  • Content published in other countries (often not in English!) can reveal that companies are engaged in activities that violate basic obligations defined by the United Nations Global Compact, or (on the bright side) support various Sustainable Development Goals.

Analyzing ESG Data with Natural Language Processing

Natural language processing (NLP) can be used to comb through all types of publications, from regulatory filings to industry reports, news stories, research, and social media postings, to collect not just words and phrases that relate to ESG topics but the context and sentiment that give them meaning. NLP algorithms can categorize information and determine whether the related sentiment is positive or negative.
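
As a toy illustration of the two tasks just described, tagging text by ESG pillar and scoring its sentiment, the Python sketch below uses tiny hand-picked keyword lists and a one-word negation rule. The word lists and function names are hypothetical, and the sketch is not meant to represent OWL ESG’s models: production NLP relies on trained language models that capture far more context than a preceding “not.”

```python
import re

# Hypothetical, deliberately tiny keyword lists for illustration only.
PILLAR_KEYWORDS = {
    "E": {"emissions", "carbon", "spill", "waste", "renewable"},
    "S": {"labor", "strike", "privacy", "diversity", "safety"},
    "G": {"board", "bribery", "audit", "fraud", "executive"},
}
POSITIVE = {"reduced", "improved", "renewable", "award", "praised"}
NEGATIVE = {"spill", "fraud", "bribery", "strike", "breach", "violation", "fined"}
NEGATORS = {"no", "not", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(text):
    """Return the ESG pillars mentioned and a crude sentiment label."""
    tokens = tokenize(text)
    token_set = set(tokens)
    pillars = sorted(p for p, words in PILLAR_KEYWORDS.items() if words & token_set)
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1 or 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # minimal "context": flip after a negator
        score += polarity
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"pillars": pillars, "score": score, "label": label}


print(analyze("The company was fined after an oil spill."))
# {'pillars': ['E'], 'score': -2, 'label': 'negative'}
```

Even this crude rule shows why context matters: “no violation” flips from negative to positive only because the preceding word is taken into account.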

To toot our own horn here for a moment, OWL ESG was an early adopter of NLP in the ESG space, and we use it to collect and perform sentiment analyses on millions of pieces of content. Machine learning models generally become more accurate as they are trained on more data, and we now have a great deal of “learning” under our belts. Our clients can use these analyses to construct indicators for assessing risks at individual companies. Corporations can use the information to discover things about their competitors, and even about themselves.
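
Turning many document-level scores into a company-level indicator can be as simple as a time-weighted average. The sketch below assumes each analyzed item has already been reduced to a (company, date, sentiment score) record and weights recent items more heavily using a half-life. The record layout, the function name, and the 30-day default are illustrative assumptions, not a description of any OWL ESG product.

```python
from collections import defaultdict
from datetime import date


def sentiment_indicator(records, as_of: date, half_life_days: float = 30.0):
    """Time-decayed average sentiment per company (hypothetical helper).

    records: iterable of (company, date, score) tuples, e.g. score in [-1, 1].
    An item `half_life_days` old counts half as much as one dated `as_of`.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for company, day, score in records:
        age = (as_of - day).days
        if age < 0:
            continue  # skip items published after the as-of date
        weight = 0.5 ** (age / half_life_days)
        weighted_sum[company] += weight * score
        total_weight[company] += weight
    return {c: weighted_sum[c] / total_weight[c] for c in total_weight}


records = [
    ("Company A", date(2024, 1, 31), -1.0),  # recent negative item, full weight
    ("Company A", date(2024, 1, 1), 1.0),    # 30 days older, half weight
    ("Company B", date(2024, 1, 31), 0.5),
]
print(sentiment_indicator(records, as_of=date(2024, 1, 31)))
```

Skipping items dated after `as_of` keeps the indicator limited to information that was actually available on that date, which matters when such indicators are backtested.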

Experiment: ESG Data for Emerging Market Financial Institution Issuers Using NLP 

In 2021, the International Finance Corporation (IFC), a member of the World Bank Group, and Amundi Asset Management published a report on artificial intelligence, ESG, and emerging markets. It covers a lot of territory, but here we highlight an experiment in which they used NLP to detect sentiment about financial institutions issuing emerging market debt. While based on a fairly small sample, the results (summarized in the graphs below) were interesting.

In short, the experiment showed that the NLP algorithm detected far more negative sentiment in documents not published by a given issuer than in the issuer’s own company reports.
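
Once each document carries a sentiment label and a source type, a comparison like this one comes down to a simple tabulation. The sketch below is hypothetical, with made-up labels and a hypothetical function name; it does not reproduce the IFC and Amundi methodology or data.

```python
from collections import Counter


def negative_share_by_source(docs):
    """Fraction of documents labelled "negative" for each source type.

    docs: iterable of (source_type, sentiment_label) pairs, e.g. ("issuer", "neutral").
    """
    totals = Counter()
    negatives = Counter()
    for source_type, label in docs:
        totals[source_type] += 1
        if label == "negative":
            negatives[source_type] += 1
    return {source: negatives[source] / totals[source] for source in totals}


# Made-up labels for illustration only.
docs = [
    ("issuer", "positive"), ("issuer", "neutral"),
    ("issuer", "negative"), ("issuer", "neutral"),
    ("third_party", "negative"), ("third_party", "negative"),
    ("third_party", "negative"), ("third_party", "positive"),
]
print(negative_share_by_source(docs))  # {'issuer': 0.25, 'third_party': 0.75}
```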

Machine learning is far from perfect, but it can be quite good at determining when a new input can lead to a better decision. NLP, deep learning, and other techniques can combine data obtained from different datasets and reduce noise while retaining most of the “signal” (the valuable information). For those who want to do a deep dive, the WorldQuant article discusses other types of machine learning algorithms that could be used in this space.
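
A far simpler relative of those techniques shows why combining sources can help: if several vendors score the same companies with independent errors, standardizing each vendor’s scores and averaging them tends to cancel out part of the noise. This is plain z-score averaging, not deep learning. The sketch assumes every vendor covers the same companies in the same order; the vendor and function names are hypothetical, and real data would need handling for missing coverage.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one vendor's scores so vendors on different scales are comparable."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant score carries no ranking information
    return [(v - mu) / sd for v in values]


def composite_score(vendor_scores):
    """Average of per-vendor z-scores, company by company.

    vendor_scores: dict mapping a vendor name to a list of scores, with every list
    covering the same companies in the same order (an assumption of this sketch).
    """
    standardized = [zscores(scores) for scores in vendor_scores.values()]
    return [mean(company) for company in zip(*standardized)]


# Hypothetical vendors scoring the same three companies on different scales.
print(composite_score({"vendor_a": [1, 2, 3], "vendor_b": [10, 20, 30]}))
```

When two vendors rank companies in opposite orders, the composite collapses toward zero, which is itself a useful warning that the sources may be measuring different things.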

At OWL ESG, we have been using machine learning for years, and continue to emphasize the importance of using a wide range of inputs to analyze ESG risks and opportunities. Contact us to learn more about how our data and analytics could help your firm.