
Abstract

In this research project, our team analyzed a Glassdoor dataset comprising 3.4 million textual reviews and numerical ratings of US-listed firms across various industries. Because Glassdoor is a US-based website where current and former employees anonymously review companies, we aim to extract trading signals by applying a machine learning-based sentiment analysis framework to this domain-specific dataset. We first describe the structure of our dataset tables and the findings of our exploratory data analysis. We then propose a BERT-based model, coupled with thematic sentiment analysis, to mine alpha-predictive factors. Each generated factor is passed through specific mathematical transformations to produce signals for various trading strategies, which we test on our in-house backtesting platform. Finally, we discuss the computing hardware and model training time needed to reproduce our study.

Acknowledgments
  • We thank Dr. Yang You (HKU), Mr. Kevin Li (Tower Research Capital), and the teaching team for their guidance in this project.
  • We also thank the HKU HPC Laboratory for providing access to computing resources, which were essential to our project.
  • Finally, I’m very grateful to my groupmates, Jadon, Rhenald, and Charles.