

Sunday Articles

Leveraging Machine Learning for Predicting Market Movements Using Historical Data

TJEF Tapmi, 3 September 2023, 10 min read

Abstract

Financial markets are complex and volatile, so predicting stock market movements has always been difficult. This study builds a prediction model that uses the Random Forest Classifier algorithm to anticipate the upward trend of the Nifty50 index. The model takes two widely used technical indicators as input features: the Simple Moving Average (SMA) and the Exponential Moving Average (EMA), each expressed as a ratio to the closing price. We train the model on historical Nifty50 data with these SMA and EMA ratios as inputs. The indicators were chosen to represent the underlying trends and patterns that may influence the index's upward trajectory. The results show that the model's upward-trend predictions are correct about 66% of the time. Paired with SMA and EMA ratios, the Random Forest Classifier therefore appears able to offer useful insights into market movements and potential investment opportunities, and SMA and EMA ratios appear able to capture key market movements as input features. Investors, financial institutions, and market analysts may use these findings to make better-informed decisions and refine their investment strategies.

Keywords: Random Forest Classifier, Nifty50, upward trend, prediction, SMA, EMA, technical indicators, stock market.

Introduction

Investors, traders, and financial analysts who want to maximize returns have long taken a strong interest in forecasting stock market movements. Accurately predicting market fluctuations, especially the upward trend of stock indexes, can offer valuable insights and trading opportunities. In this study, we present a predictive model that forecasts the upward trend of the Nifty50 index, a widely followed Indian stock market benchmark. The model is based on the Random Forest Classifier algorithm.

The Nifty50 index represents 50 of the largest companies listed on the National Stock Exchange of India (NSE) and serves as a barometer for the market as a whole. However, financial markets are inherently complex and uncertain, which makes forecasting the future direction of such an index difficult (Antonacci, 2013).

We use the Random Forest Classifier, which can handle high-dimensional data and capture intricate relationships between variables. This ensemble learning technique combines many decision trees to produce robust and accurate forecasts. We train the model on past Nifty50 data so that it can exploit the patterns and relationships in that data to forecast future upward moves more accurately.

Figure 1: Nifty50 index (source: Google Finance)

This study has two main objectives: to assess how well the model forecasts the upward trend of the Nifty50 index, and to evaluate the effectiveness of the proposed strategy. By achieving a reasonable level of accuracy, we aim to show that machine learning has real potential as a practical tool for market forecasting. This simple model could also be strengthened by including other suitably tuned technical indicators as inputs (Mehtab and Sen, 2019).

The results of this study may have significant implications for traders, financial institutions, and investors. Anticipating the upward trend of the Nifty50 index with reasonable accuracy can support better-informed investment decisions, portfolio management, and broader market analysis. In the following sections, we describe the data collection and preprocessing methods, discuss the experimental setup, and present the findings and analysis. Finally, we discuss the implications of our findings and possible directions for further stock market forecasting research.

Methodology

We take a systematic approach, using 25 years of historical data, to construct and assess a model that predicts the upward trend of the Nifty50 index. The dataset contains daily OHLC (open, high, low, close) prices. To simplify training, testing, and validation, we split the dataset chronologically into several subsets. We first trained the model on the first 10 years of data so that it could learn the underlying patterns and relationships in the Nifty50 index (Coqueret, 2021). We then forecast the upward trend of the index over the following (11th) year and compared these forecasts with the actual observed values to assess the model. The data used to train and test the model came from Google Finance and was retrieved using the Google Sheets API.

Figure 2: Illustration of the concept of chronological generalization for machine learning

The model is then retrained on the next 10-year window. This advances the analysis by one year and lets the model adapt to changing market conditions. The retrained model forecasts the upward trend of the Nifty50 index for the 12th year, and we again evaluate its accuracy by comparing the predicted values with the corresponding actual values. We use the Random Forest Classifier throughout the training and testing phases because it handles high-dimensional data reliably and can identify complex relationships between variables. The model uses two popular technical indicators as features: Simple Moving Average (SMA) and Exponential Moving Average (EMA) ratios. To help the model recognize patterns and probable price reversals, we express the SMA and EMA values as ratios relative to the closing price. The formulas for calculating SMA and EMA are as follows:

1. Simple Moving Average (SMA):

The SMA is a straightforward calculation that provides the average price of a security over a specific time period.

SMA = (Sum of closing prices over ‘n’ periods) / ‘n’

Where:

‘n’ represents the number of periods (e.g., days, weeks, months) over which the average is calculated.
The sum of closing prices is obtained by adding the closing prices of the security for ‘n’ periods.

2. Exponential Moving Average (EMA):

The EMA places more weight on recent prices compared to older prices, making it more responsive to recent market movements.

EMA = (Closing price – EMA(previous day)) * (2 / (n + 1)) + EMA(previous day)

Where:

‘n’ represents the number of periods used in the EMA calculation.
The initial EMA value is usually set to the first closing price available or the SMA for ‘n’ periods.
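The two formulas above translate directly into code. What follows is a minimal pure-Python sketch, not the author's implementation. The function names are hypothetical, and both EMA initialisations mentioned in the text are available through a `seed` argument. The article does not state which way the ratio is taken, so the sketch assumes close divided by the moving average.

```python
from __future__ import annotations


def simple_moving_average(closes: list[float], n: int) -> list[float | None]:
    """n-period SMA at each position; None until n closes are available."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    sma: list[float | None] = []
    window_sum = 0.0
    for i, close in enumerate(closes):
        window_sum += close
        if i >= n:
            window_sum -= closes[i - n]  # drop the close that has left the window
        sma.append(window_sum / n if i >= n - 1 else None)
    return sma


def exponential_moving_average(closes: list[float], n: int, seed: str = "first") -> list[float | None]:
    """n-period EMA using EMA = (close - previous EMA) * 2/(n+1) + previous EMA.

    seed="first" starts from the first close; seed="sma" starts from the SMA
    of the first n closes (the two initialisations described in the text).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 2.0 / (n + 1)
    if seed == "first":
        if not closes:
            return []
        start, prev = 1, closes[0]
        ema: list[float | None] = [prev]
    elif seed == "sma":
        if len(closes) < n:
            return [None] * len(closes)
        start, prev = n, sum(closes[:n]) / n
        ema = [None] * (n - 1) + [prev]
    else:
        raise ValueError("seed must be 'first' or 'sma'")
    for close in closes[start:]:
        prev = (close - prev) * alpha + prev
        ema.append(prev)
    return ema


def ratio_to_close(closes: list[float], averages: list[float | None]) -> list[float | None]:
    """Close divided by the moving average (the ratio direction is an assumption)."""
    return [c / a if a is not None else None for c, a in zip(closes, averages)]


closes = [10.0, 11.0, 12.0, 13.0]
print(simple_moving_average(closes, 3))              # [None, None, 11.0, 12.0]
print(exponential_moving_average(closes, 3))         # [10.0, 10.5, 11.25, 12.125]
print(exponential_moving_average(closes, 3, "sma"))  # [None, None, 11.0, 12.0]
```

The two seeding choices converge as more data arrives, but they differ noticeably in the first few values. Any warm-up rows are therefore usually discarded before the ratios are used as features.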

We chose time periods of 2, 5, 60, 250, and 1000 to reflect different intervals for analysing the historical data of the Nifty50 index. After investigation, we found that the periods chosen for the SMA and EMA ratios had a substantial impact on how well the model predicted the upward trend of the index. Shorter periods, such as 2 and 5, let the model capture immediate price changes and react quickly to short-term trends. These shorter-term indicators are also more susceptible to noise and volatility, which can make forecasts less precise. Longer periods, such as 60, 250, and 1000, smooth out short-term swings and give a broader view of the general trend, better capturing the index's long-term patterns and probable reversals. Combining several periods for the SMA and EMA ratios let the model consider both short- and long-term market tendencies. This reduced the influence of transient price movements and improved the model's predictions.
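Applying both indicators at each of the five periods yields ten ratio features. The minimal pandas sketch below assumes a DataFrame with a `Close` column. The column name, the feature names, and the close-divided-by-average direction of the ratio are assumptions, not details taken from the study. In pandas, `rolling(...).mean()` gives the SMA, and `ewm(span=n, adjust=False)` gives the recursive EMA with smoothing factor 2/(n+1), seeded with the first close.

```python
import numpy as np
import pandas as pd

# Periods from the study; with daily OHLC data these are trading days.
PERIODS = (2, 5, 60, 250, 1000)


def add_ratio_features(prices: pd.DataFrame, periods=PERIODS) -> pd.DataFrame:
    """Add close/SMA and close/EMA ratio columns for each period.

    The "Close" column name, the feature names and the ratio direction are assumptions.
    """
    out = prices.copy()
    close = out["Close"]
    for n in periods:
        sma = close.rolling(window=n).mean()
        # adjust=False gives the recursive EMA with alpha = 2 / (n + 1), seeded with the
        # first close; min_periods=n masks the first n - 1 values while it warms up.
        ema = close.ewm(span=n, adjust=False, min_periods=n).mean()
        out[f"close_sma_ratio_{n}"] = close / sma
        out[f"close_ema_ratio_{n}"] = close / ema
    return out


# Synthetic, steadily rising prices purely for illustration (not Nifty50 data).
dates = pd.bdate_range("2015-01-01", periods=1200)
prices = pd.DataFrame({"Close": np.linspace(100.0, 200.0, 1200)}, index=dates)
features = add_ratio_features(prices).dropna()
print(features.shape)  # (201, 11)
```

On real daily data, the 1000-period features are undefined for roughly the first four years of history, so those rows drop out before training.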

We used performance measures such as accuracy and precision to evaluate how well the model predicts future movements. These metrics quantify how well the model identified the upward trend of the Nifty50 index. Our aim with this approach is a robust model for anticipating the index's upward trend. Because the model is repeatedly trained on historical data and evaluated on the following year, we can assess both its adaptability and its accuracy in forecasting future market movements.
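The iterative procedure described above is a rolling walk-forward split: train on ten years, forecast the next year, then advance the window. The sketch below shows one way to generate such splits with pandas. It assumes a daily `DatetimeIndex`, a one-year step, and windows measured from the first date in the data. The function name `walk_forward_splits` is hypothetical, not the author's code.

```python
import pandas as pd


def walk_forward_splits(index: pd.DatetimeIndex, train_years: int = 10, test_years: int = 1):
    """Yield (train_mask, test_mask) boolean arrays for rolling chronological splits.

    Each split trains on `train_years` of data and tests on the following
    `test_years`; the window then advances by `test_years`, so test periods
    never overlap and the model never sees future data during training.
    """
    train_start = index.min()
    last_date = index.max()
    while True:
        train_end = train_start + pd.DateOffset(years=train_years)
        if train_end > last_date:
            break  # no data left to test on
        test_end = train_end + pd.DateOffset(years=test_years)
        train_mask = (index >= train_start) & (index < train_end)
        test_mask = (index >= train_end) & (index < test_end)
        yield train_mask, test_mask
        train_start = train_start + pd.DateOffset(years=test_years)


dates = pd.bdate_range("2000-01-03", "2012-12-31")
for train_mask, test_mask in walk_forward_splits(dates):
    print(dates[train_mask][0].date(), dates[train_mask][-1].date(), "->",
          dates[test_mask][0].date(), dates[test_mask][-1].date())
```

A rolling window keeps the training span fixed at ten years. An expanding window, which keeps all earlier data, is the common alternative if older history is considered still relevant.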

Random Forest Classifier

The Random Forest Classifier is an ensemble learning technique that combines many decision trees to make predictions. Each decision tree in the forest is built from a random sample of the training data and a random selection of features. The final prediction is made by aggregating the predictions of the individual trees (Mehtab and Sen, 2019).

Unlike the SMA or EMA, the Random Forest Classifier is not defined by a single mathematical formula. It combines several key ideas and methods in the following main steps:

Building Decision Trees: Each decision tree in the random forest is built from a portion of the training data, usually chosen by random sampling with replacement (bootstrapping). Each tree is then grown on its subset using a decision tree algorithm such as CART (Classification and Regression Trees).
Random Feature Selection: At each split, a tree considers only a random subset of the features. This introduces randomness and reduces the correlation between trees, which increases the diversity and robustness of the forest.
Voting or Averaging Predictions: Once all the trees are built, each one makes a prediction for new, unseen data points. For classification problems, the forest uses majority voting: each tree casts a vote, and the class with the most votes becomes the final prediction. For regression tasks, the forest averages the values predicted by all the trees.
Ensemble Learning: Combining the predictions of the individual trees into a single forecast improves the model's ability to generalize and reduces overfitting.
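The four steps can be illustrated with a deliberately small forest built by hand from scikit-learn decision trees. The forest uses bootstrap sampling, considers a random feature subset at each split (`max_features="sqrt"`), and combines the trees by majority vote. This is a teaching sketch on synthetic data, not the study's model. In practice, `sklearn.ensemble.RandomForestClassifier` does all of this internally, and it aggregates by averaging the trees' class probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X: np.ndarray, y: np.ndarray, n_trees: int = 25, seed: int = 0) -> list:
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (rows drawn with replacement)
        rows = rng.integers(0, len(X), size=len(X))
        # Step 2: each split considers a random subset of sqrt(n_features) features
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[rows], y[rows]))
    return trees


def predict_majority(trees: list, X: np.ndarray) -> np.ndarray:
    # Step 3: every tree votes; with 0/1 labels, class 1 wins if more than half vote for it
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


# Synthetic data: only feature 0 matters, the other three are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = (X[:, 0] > 0).astype(int)
trees = fit_forest(X[:300], y[:300])
# Step 4: the combined forecast on held-out rows
accuracy = (predict_majority(trees, X[300:]) == y[300:]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

An odd number of trees avoids ties in the binary vote.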

The Random Forest Classifier is the main predictive model used to forecast the Nifty50 index's upward trend. The algorithm is applied to the dataset, which consists of historical data and other financial indicators, to train the model and generate predictions.

Specifically, the Random Forest Classifier was used in this study as follows:

Building the Model: The Random Forest Classifier is trained on a portion of the historical Nifty50 data. This training data contains features such as the SMA and EMA ratios, along with other relevant financial indicators. The algorithm builds multiple decision trees using different subsets of the training data and randomly chosen features.
Predicting the Upward Trend: Once trained, the model forecasts the Nifty50 index's upward trend. It receives unseen data points from periods whose actual outcome was not available during training. The random forest ensemble then combines the individual trees' forecasts into a final forecast of the upward trend.
Evaluating Accuracy: The model's performance is assessed by comparing its predictions with the actual observed values. The reported measure is the percentage of upward-trend forecasts that proved correct, which is the precision of the upward-trend predictions.
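The evaluation depends on which ratio is reported. As an illustration with made-up labels (1 = upward day, 0 = not), scikit-learn's metric functions distinguish three measures. Precision is the share of upward forecasts that were correct. Recall is the share of actual upward days that the model caught. Accuracy is the share of all days, up or not, that were classified correctly.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels for ten sessions: 1 = upward day, 0 = not upward.
actual = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

precision = precision_score(actual, predicted)  # correct "up" calls / all "up" calls = 4 / 5
recall = recall_score(actual, predicted)        # correct "up" calls / all actual up days = 4 / 6
accuracy = accuracy_score(actual, predicted)    # correct calls of either class / all days = 7 / 10
print(round(precision, 3), round(recall, 3), round(accuracy, 3))  # 0.8 0.667 0.7
```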

The purpose of using the Random Forest Classifier algorithm in this research is to take advantage of its capacity to recognize complicated associations and manage large datasets. To provide more reliable and precise forecasts about the increasing trend of the Nifty50 index, the model makes use of the ensemble of decision trees.

The parameters used for this model are:

n_estimators: 200

This parameter specifies the number of decision trees to be created in the random forest. In this case, the random forest will consist of 200 decision trees.

min_samples_split: 50

This parameter sets the minimum number of samples required to split an internal node during the construction of each decision tree in the forest. If the number of samples at a node is less than min_samples_split, the node will not be split further, and it will become a leaf node.

random_state: 1

This parameter sets the random seed for reproducibility. Setting a fixed value (here, 1) ensures that the classifier produces the same results when trained and tested multiple times on the same input data with the same settings.
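The parameter names above match scikit-learn's `RandomForestClassifier`. Assuming that library was used (the article does not name it), the configuration can be written as below. The feature matrix is synthetic and only stands in for the ten SMA/EMA ratio columns. Fitting twice with `random_state=1` illustrates the reproducibility point, and the train/test split is chronological (no shuffling), in keeping with the study's approach.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_model() -> RandomForestClassifier:
    # Parameter values as listed in the article.
    return RandomForestClassifier(n_estimators=200, min_samples_split=50, random_state=1)


# Synthetic stand-in for ten ratio features and a binary "upward" label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

# Chronological split: earlier rows for training, later rows for testing.
X_train, y_train, X_test = X[:400], y[:400], X[400:]
preds_a = make_model().fit(X_train, y_train).predict(X_test)
preds_b = make_model().fit(X_train, y_train).predict(X_test)
print((preds_a == preds_b).all())  # True
```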

Results and Discussion

Following the approach described above, the model produced forecasts from October 2013 to June 2023 and was retrained as the analysis advanced. This period covers roughly ten years and 2,411 trading sessions, and the Nifty50 index showed an upward trend in about 53.96% of them. The model correctly identified 61.95% of these upward sessions. When it forecast an upward trend, that forecast was correct 66.1% of the time.

Note that the model identified only 61.95% of the days with an upward trend, yet the upward predictions it did make were correct 66.1% of the time.
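Simple arithmetic can cross-check the reported figures. Assume that 2,411 is the total number of sessions in the forecast period and that the upward trend is a binary label per session. Under those assumptions, the percentages imply the approximate counts below; the variable names are illustrative. The final comparison uses the base rate as a yardstick: always forecasting "up" would be correct in 53.96% of sessions, so the 66.1% figure is best read against that baseline.

```python
sessions = 2411      # trading sessions, October 2013 to June 2023 (assumed total)
up_share = 0.5396    # share of sessions with an upward trend
recall = 0.6195      # share of upward sessions the model identified
precision = 0.661    # share of upward forecasts that were correct

up_sessions = round(sessions * up_share)    # actual upward sessions
true_pos = round(up_sessions * recall)      # upward sessions the model caught
predicted_up = round(true_pos / precision)  # total upward forecasts implied by the precision
false_pos = predicted_up - true_pos         # upward forecasts that were wrong
missed_up = up_sessions - true_pos          # upward sessions the model missed
lift = precision - up_share                 # edge over always forecasting "up"

print(up_sessions, true_pos, predicted_up, false_pos, missed_up)  # 1301 806 1219 413 495
print(f"precision lift over base rate: {lift:.4f}")               # precision lift over base rate: 0.1214
```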

Figure 3: Comparison of actual values and the model's predictions

These results suggest that the model can provide useful information to analysts and investors who make decisions based on the indicated upward trends. The model's accuracy is encouraging, but its scope is constrained by the limited information it currently uses. Further refinement and development are therefore needed.

One way to address this limitation is to increase the number of technical indicators the model uses. Adding and adapting new technical indicators could significantly improve its ability to draw meaningful inferences. Integrating current market news through a news API could help the model reach its full potential. Real-time market information would enrich the model's knowledge base and keep its predictions based on the most recent data. After thorough retraining that incorporates these developments, the model's forecasts could help market participants optimize their investment strategies, take advantage of opportunities, and respond more quickly and confidently to changing market conditions.

In conclusion, the model's demonstrated performance provides a solid platform for further improvements in predictive analysis. Several developments remain: refining the model, expanding its set of technical indicators, adding real-time market data via a news API, and completing extensive retraining. Once these are in place, its predictions could have a significant influence on market participants. This improved approach could reshape investment methods and support better outcomes in a constantly changing financial landscape by revealing new opportunities and facilitating strategic decision-making.

References

Antonacci, Gary, Absolute Momentum: A Simple Rule-Based Strategy and Universal Trend-Following Overlay (April 2013).
Arnott, Robert D., Harvey, Campbell R. and Markowitz, Harry, A Backtesting Protocol in the Era of Machine Learning (November 21, 2018).
Burgess, Nicholas, Machine Earning – Algorithmic Trading Strategies for Superior Growth, Outperformance and Competitive Advantage (March 29, 2021). International Journal of Artificial Intelligence and Machine Learning.
Coqueret, Guillaume, Persistence in Factor-Based Supervised Learning Models (November 1, 2021). Journal of Finance and Data Science.
Mehtab, Sidra and Sen, Jaydip, A Robust Predictive Model for Stock Price Prediction Using Deep Learning and Natural Language Processing (December 1, 2019). Proceedings of the 2019 International Conference on Business Analytics and Intelligence (ICBAI).

Honorary Mention in Journal Volume 8, Issue 1

Author – Sundhara Pandiyan R

TAPMI