This project uses k-means clustering to refine the selection of stocks used to replicate the risk and cash flow profile of the S&P 500. The main objective is to harness clustering to obtain a minimum-tracking-error portfolio with a parsimonious number of positions, thereby reducing transaction costs, since portfolio replication typically involves frequent rebalancing. We also consider the problem of modeling the portfolio return using a Monte Carlo approach.
Dataset
The dataset comprises historical price data obtained from the ARPM database, covering a diverse range of equities across various sectors.
Main goals
The project is designed to achieve the following key objectives:
Data loading and preprocessing: Load and preprocess historical price data to ensure data consistency and integrity for subsequent analyses.
k-Means clustering: Compute an initial clustering from the correlation of log returns and apply k-means clustering. Select stocks from each sector based on their distance from the cluster center.
Portfolio weights: Use convex programming to calculate the portfolio weights that minimize the tracking error. We constrain the weights to be long-only (no shorting) and ignore transaction costs. We then compute the tracking error and compare the expected returns from the last few observations.
Extract invariants from the portfolio risk driver: We identify the portfolio risk driver and assume it follows a GARCH(1,1) process. We use the ARPM ‘arch_model’ to fit our data and, using the estimated parameters, extract the invariants. We then perform the ellipsoid invariance test on the results to assess the goodness of fit.
Estimation: We compute the conditional flexible probabilities and utilize the copula-marginal approach for estimating the joint distribution of the invariants.
Projection: We generate 10,000 scenarios for the future shocks and, from these, compute the future risk driver scenarios using the GARCH(1,1) equations. From these scenarios we compute and display the portfolio forecast distribution.
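The clustering and portfolio-weight steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the log returns are synthetic, the "index" is an equally weighted average of all synthetic stocks standing in for the S&P 500, k-means is a plain numpy version of Lloyd's algorithm, and a projected-gradient loop is used in place of a convex-programming solver. Tracking error is taken here to be the root mean squared return difference, and all function names are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance of every point to every center.
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the centers that are returned.
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

def select_representatives(points, labels, centers):
    """Index of the point closest to each non-empty cluster center."""
    picks = []
    for j, center in enumerate(centers):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        d = np.linalg.norm(points[members] - center, axis=1)
        picks.append(int(members[d.argmin()]))
    return picks

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def mse(weights, asset_returns, index_returns):
    """Mean squared difference between portfolio and index returns."""
    return float(np.mean((asset_returns @ weights - index_returns) ** 2))

def min_tracking_error_weights(asset_returns, index_returns, n_iter=5000):
    """Long-only, fully invested weights via projected gradient descent."""
    n_obs, n_assets = asset_returns.shape
    w = np.full(n_assets, 1.0 / n_assets)
    # Step size 1/L, with L the Lipschitz constant of the gradient.
    lip = 2.0 * np.linalg.norm(asset_returns, 2) ** 2 / n_obs
    for _ in range(n_iter):
        grad = 2.0 * asset_returns.T @ (asset_returns @ w - index_returns) / n_obs
        w = project_to_simplex(w - grad / lip)
    return w

# Synthetic log returns: 3 latent groups with 4 stocks each (illustrative only).
rng = np.random.default_rng(42)
n_obs, n_groups, per_group = 500, 3, 4
factors = rng.normal(0.0, 0.01, size=(n_obs, n_groups))
noise = rng.normal(0.0, 0.003, size=(n_obs, n_groups * per_group))
log_returns = np.repeat(factors, per_group, axis=1) + noise
index_returns = log_returns.mean(axis=1)  # stand-in for the benchmark

# Cluster the rows of the correlation matrix, then keep one stock per cluster.
corr = np.corrcoef(log_returns, rowvar=False)
labels, centers = kmeans(corr, k=n_groups)
selected = select_representatives(corr, labels, centers)

# Weight the selected stocks to track the index.
sub_returns = log_returns[:, selected]
weights = min_tracking_error_weights(sub_returns, index_returns)
equal = np.full(len(selected), 1.0 / len(selected))
tracking_error = np.sqrt(mse(weights, sub_returns, index_returns))
print(selected, weights.round(3), tracking_error)
```

Clustering the rows of the correlation matrix groups stocks with similar co-movement patterns, so one representative per cluster keeps the position count low. A dedicated convex solver would normally replace the projected-gradient loop; the loop is used here only to keep the sketch dependency-free.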
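The invariant-extraction and projection steps above can be illustrated with a small numpy sketch. It assumes the GARCH(1,1) recursion is applied to the increments of the risk driver, written `dx_t = mu + sigma_t * eps_t` with `sigma_t^2 = c + a * (dx_{t-1} - mu)^2 + b * sigma_{t-1}^2`. The parameter values are hard-coded for illustration rather than estimated, the input series is simulated, and future shocks are drawn by an equally weighted bootstrap of the extracted invariants instead of the flexible-probability, copula-marginal estimate described above. Function names are hypothetical.

```python
import numpy as np

def garch_invariants(dx, mu, c, a, b, sig2_0):
    """Return invariants eps_t = (dx_t - mu) / sigma_t and the variance path."""
    sig2 = np.empty(len(dx))
    sig2[0] = sig2_0
    for t in range(1, len(dx)):
        sig2[t] = c + a * (dx[t - 1] - mu) ** 2 + b * sig2[t - 1]
    eps = (dx - mu) / np.sqrt(sig2)
    return eps, sig2

def garch_project(eps_hist, dx_last, sig2_last, mu, c, a, b,
                  horizon, n_scen, seed=0):
    """Bootstrap shocks and push them through the GARCH(1,1) recursion.

    Returns the cumulative risk-driver increment over the horizon,
    one value per scenario.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.choice(eps_hist, size=(n_scen, horizon))
    dx_prev = np.full(n_scen, dx_last)
    sig2_prev = np.full(n_scen, sig2_last)
    total = np.zeros(n_scen)
    for h in range(horizon):
        sig2 = c + a * (dx_prev - mu) ** 2 + b * sig2_prev
        dx_new = mu + np.sqrt(sig2) * shocks[:, h]
        total += dx_new
        dx_prev, sig2_prev = dx_new, sig2
    return total

# Illustrative parameters (assumed, not fitted to any data).
mu, c, a, b = 0.0005, 1e-6, 0.08, 0.90
sig2_0 = c / (1.0 - a - b)  # unconditional variance as the starting value

# Simulate a series with known shocks so the extraction can be checked.
rng = np.random.default_rng(1)
n_obs = 1000
true_eps = rng.standard_normal(n_obs)
dx = np.empty(n_obs)
s2 = sig2_0
for t in range(n_obs):
    if t > 0:
        s2 = c + a * (dx[t - 1] - mu) ** 2 + b * s2
    dx[t] = mu + np.sqrt(s2) * true_eps[t]

eps, sig2 = garch_invariants(dx, mu, c, a, b, sig2_0)
scenarios = garch_project(eps, dx[-1], sig2[-1], mu, c, a, b,
                          horizon=5, n_scen=10_000)
print(scenarios.mean(), scenarios.std())
```

Because the series is simulated from known shocks, the extracted invariants should match them, which gives a quick sanity check on the recursion before it is applied to real data with estimated parameters.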
Conclusion
This project tackles several challenges inherent in financial data analysis, with k-means clustering playing a pivotal role in portfolio construction. The approach is especially relevant where execution costs have a significant impact on strategy performance, because replicating the index with fewer positions keeps rebalancing costs down.