Investing using words

Published 16 April, 2025

A recent article in The Journal of Finance and Data Science introduces an innovative method for constructing investment instruments directly from financial reports — without the need for human intervention.

This novel approach employs dynamic topic modeling (DTM), an extension of Latent Dirichlet Allocation (LDA) that tracks how topics change over time, to analyze companies' annual and quarterly reports, uncovering hidden risk factors and translating them into tradable indices.

“The beauty of this method lies in its simplicity and transparency; it combines several established algorithms to achieve what was not previously possible,” says co-author Marcel Lee. “By automating the process, we eliminate biases and provide a cost-effective alternative to traditional index construction.”

This unsupervised technique automatically selects optimal parameters, discovering implicit risk factors through the semantic analysis of corporate publications, thereby creating a new class of investment instruments — thematic indices.
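To make the step from topics to an index concrete, here is a minimal sketch in Python. It assumes a topic model has already assigned each company a loading on a single topic, and it turns those loadings into index weights by simple normalisation. The company names, the loadings, the `min_loading` threshold and the `topic_index_weights` helper are all hypothetical, and the weighting rule is an illustrative assumption, not the scheme used in the paper.

```python
def topic_index_weights(topic_loadings, min_loading=0.0):
    """Normalise per-company loadings on one topic into weights that sum to 1.

    Illustrative assumption only: the paper's actual weighting scheme
    is not described in this article.
    """
    # Keep only companies whose loading exceeds the (hypothetical) threshold.
    kept = {name: w for name, w in topic_loadings.items() if w > min_loading}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no company loads on this topic")
    return {name: w / total for name, w in kept.items()}


# Hypothetical loadings of four made-up companies on one topic.
loadings = {"AlphaCo": 0.6, "BetaCorp": 0.3, "GammaLtd": 0.1, "DeltaInc": 0.0}
weights = topic_index_weights(loadings, min_loading=0.05)

for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.2f}")
```

In this toy setup a company that never mentions the theme drops out of the index, and the remaining weights are proportional to how strongly each company's reports load on the topic.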

The study describes the model's capacity to dynamically track economic and industrial trends, illustrating that sectors considered static are in reality constantly evolving. This method captures the fluid nature of industries more accurately than traditional static classifications like GICS or ICB.

"We're observing the industrial landscape through a much sharper and multicoloured lens, enabling investors to tap into nuanced market themes and risk factors previously inaccessible," adds co-author Alan Spark.

In several cases, the research demonstrated that these newly created thematic indices closely mimic established indices, yet are derived without the predefined biases of manual classification systems. “This not only paves the way for a more unbiased benchmarking tool but also reveals industry trends and vocabulary shifts over time, offering a fresh perspective on sectoral dynamics,” says Lee.

One notable limitation acknowledged by the researchers is the approach’s reliance on a ‘bag-of-words’ model, which, while efficient for processing large collections of text, treats each document as an unordered set of word counts and so overlooks word order and the nuanced relationships between words. “Future iterations of this work aim to incorporate more complex models that capture these subtleties, potentially enhancing the predictive power of thematic indices on corporate actions and industry shifts,” shares Spark.
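The bag-of-words limitation can be illustrated with a short Python sketch. The two sentences below are invented examples, and `bag_of_words` is a hypothetical helper, not code from the paper. They use the same words in a different order, so a bag-of-words representation cannot tell them apart even though they say opposite things about cause and effect.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace and count tokens; word order is discarded."""
    return Counter(text.lower().split())


# Two made-up sentences with the same words but different meanings.
sentence_a = "rising rates hurt bank margins"
sentence_b = "bank margins hurt rising rates"

# Identical word counts mean the two sentences look the same to the model.
indistinguishable = bag_of_words(sentence_a) == bag_of_words(sentence_b)
print(indistinguishable)
```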

Figure: A schematic representation of the topic index algorithm.

Contact author:

Marcel Alexander Lee, Birkbeck Business School, marcel.lee@alumni.lse.ac.uk

Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work in this paper.

See the article: Marcel Lee, Alan Spark, Unsupervised generation of tradable topic indices through textual analysis, The Journal of Finance and Data Science, Volume 11, 2025, 100149, ISSN 2405-9188, https://doi.org/10.1016/j.jfds.2025.100149

 
