SeaMNF vs. LDA: Unveiling the Power of Short Text Mining in Financial Markets

Authors

  • Qian Zhang, Tencent Inc., China
  • Jiarui Rao, Uber Technologies Inc., USA
  • Jiaqi Hong, China Academy of Art, China

DOI:

https://doi.org/10.5281/zenodo.14061466

Keywords:

PSO-SVR Hybrid Model, Machine Learning, Uncertainty Sentiment, Empirical Asset Pricing

Abstract

This study constructs a time series forecasting framework that incorporates textual features. Using text mining techniques, we extract topic and sentiment information from a large collection of futures-related news headlines and use these text-derived features as exogenous variables for prediction. The paper addresses two questions: why headlines rather than full articles, and why futures news rather than gold news. Headlines serve as summaries of the full articles and capture most of the essential information. This choice also follows Li et al. [1,2,3,4,5], who used news headlines to extract topic and sentiment information. We choose futures news over gold news because crude oil news is scarce and because the prices of futures such as gold, natural gas, and crude oil are known to be correlated in complex ways. Sujit & Kumar (2011) suggest that gold price fluctuations can affect the WTI index, and that countries' dependence on crude oil can influence their currency exchange rates, which in turn affects the purchasing power of gold. Villar & Joutz (2006) report that a 20% temporary shock to WTI has a 5% contemporaneous impact on natural gas prices [6,7,8,9].

We construct a daily topic strength index following the SeaMNF approach, which gives the probability that each headline belongs to each topic. The number of topics is chosen by Pointwise Mutual Information (PMI) score. Because media outlets publish a large number of news articles every day, we take the average news weight as the topic strength for the day. The topic strength index for day t is defined as the sum of the weights of the first topic across all news articles published on that day [10,11,12,13,14,15].
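
The daily aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `daily_topic_strength`, the input layout (one `(day, topic_weights)` pair per headline), and the sample weights are all hypothetical. Because the abstract mentions both an average and a sum, the sketch exposes both through an `aggregate` argument.

```python
from collections import defaultdict
from datetime import date


def daily_topic_strength(headlines, topic=0, aggregate="mean"):
    """Aggregate per-headline topic weights into one value per day.

    headlines: iterable of (day, topic_weights) pairs, where topic_weights
               is a sequence of per-topic weights for a single headline
               (assumed to come from a fitted topic model).
    topic:     index of the topic to track (0 = the first topic).
    aggregate: "mean" for the daily average, "sum" for the daily total.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, weights in headlines:
        totals[day] += weights[topic]
        counts[day] += 1

    if aggregate == "sum":
        return dict(totals)
    if aggregate == "mean":
        return {day: totals[day] / counts[day] for day in totals}
    raise ValueError("aggregate must be 'mean' or 'sum'")


# Hypothetical weights for three headlines over two days (not from the paper).
sample = [
    (date(2024, 1, 2), [0.7, 0.2, 0.1]),
    (date(2024, 1, 2), [0.1, 0.6, 0.3]),
    (date(2024, 1, 3), [0.5, 0.4, 0.1]),
]

strength_sum = daily_topic_strength(sample, topic=0, aggregate="sum")
strength_mean = daily_topic_strength(sample, topic=0, aggregate="mean")
```

The resulting per-day values form a series that could be aligned with a price series by date and passed to a forecasting model as an exogenous variable.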

Published

2024-10-30

How to Cite

Qian Zhang, Jiarui Rao, & Jiaqi Hong. (2024). SeaMNF vs. LDA: Unveiling the Power of Short Text Mining in Financial Markets. International Journal of Engineering and Management Research, 14(5), 76–82. https://doi.org/10.5281/zenodo.14061466
