Implementing machine learning primitives for data science application in investment banking

Advisor: Prof. Debayan Gupta

Student: Aaryan Talreja

Machine Learning Financial Analysis

Abstract

This project presents an AI/ML-based deal valuation intelligence system for mergers and acquisitions in India's pharmaceutical and healthcare sector. We developed a systematic framework to classify 150 M&A transactions (totaling $69.5 billion) as undervalued, fairly valued, or overvalued. A weighted z-score framework incorporating private company discounts serves as the domain-expert baseline. We benchmarked 17 implementations across three paradigms: 11 library methods (including XGBoost, Decision Tree, and Naive Bayes, which achieved 96.7-100% accuracy), 5 from-scratch implementations (demonstrating algorithmic understanding, with Random Forest at 75% accuracy), and 1 transformer model (ChatGPT, which achieved 87% agreement with the baseline). XGBoost regression achieved R² = 0.553, 38% above the upper end of the 0.35-0.40 target range. Perfect test-set accuracy from multiple methods indicates that the framework captures genuine valuation signals suitable for investment decision-making.

Project Description

The mergers and acquisitions landscape in India’s pharmaceutical and healthcare sector represents a significant investment opportunity, with 150 analyzed transactions totaling $69.5 billion over the 2020-2025 period. However, identifying mispriced deals remains challenging because of information asymmetry in private companies, complex valuation drivers including control premiums and buyer types, heterogeneous deal structures, and private company discounts that reflect limited liquidity. Traditional valuation multiples such as EV/EBITDA, EV/Revenue, and P/E ratios provide a starting point but fail to account for deal-specific factors, creating opportunities for sophisticated investors who can systematically identify mispriced transactions.

This project develops a systematic framework to classify M&A deals into three categories: undervalued deals representing attractive investment opportunities trading below fair value, fairly valued transactions priced appropriately given their characteristics and market conditions, and overvalued deals trading at premiums unjustified by fundamentals. The ultimate goal is to create a production-ready intelligence system that can identify potentially mispriced transactions for investment decision-making, supporting deal sourcing and screening, valuation benchmarking, investment committee presentations, and portfolio monitoring.

The project developed a dataset of 150 M&A deals with 19 engineered features per transaction and implemented a weighted z-score baseline framework grounded in finance theory, with four adjustment factors: control premium, strategic versus financial buyer premium, size-growth interaction, and private company discount. It then benchmarked 17 model implementations across three paradigms: library methods, from-scratch implementations, and transformer models. Classification performance exceeded targets, with 100% test accuracy from XGBoost and Decision Tree and 96.7% accuracy from four additional methods (CatBoost, Random Forest, Naive Bayes, LightGBM). Regression reached R² = 0.553, 38% above the upper end of the 0.35-0.40 target range. From-scratch implementations demonstrated algorithmic understanding, with Random Forest achieving 75% accuracy, while the ChatGPT transformer model achieved 87% agreement with the z-score framework.

Methodology

The methodology comprises four key components: dataset development, weighted z-score framework, model implementations, and evaluation. The dataset was compiled from three primary sources (PrivateCircle, VCCEdge, and VC Circle) with quality controls that included cross-checking each deal across multiple sources and standardizing amounts to USD millions. Each transaction includes 19 engineered features spanning deal identifiers, financial metrics (investment amount, valuations, debt, revenue, EBITDA, net profit), and valuation multiples (EV/EBITDA, EV/Revenue, P/E ratio).

The weighted z-score framework builds on four theoretically grounded adjustment factors derived from the finance literature. The control premium adjustment is +20% for majority control, +10% for significant influence, and 0% for minority stakes, based on empirical studies documenting historical premiums of 20-40%. The strategic buyer premium adds +12% when the acquirer is a strategic rather than a financial buyer, reflecting synergy realization potential and longer investment horizons. The size-growth interaction adjustment adds +18% for small-cap high-growth targets and +10% for medium-cap high-growth targets, based on research showing that small-cap firms are 2.3x more likely to be acquired. The private company discount applies -20% to private companies, reflecting the liquidity constraints and information asymmetry documented in empirical studies that report discounts of 13-40%.
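As a rough illustration of how the four factors might combine, the sketch below sums them into one total adjustment. The percentages come from the paragraph above. The additive combination, the category labels, and the function name total_adjustment are assumptions made for this example, since the cut-offs that define majority control, small-cap, or high-growth are not stated here.

```python
# Percentages are taken from the text; the category labels are hypothetical.
CONTROL_PREMIUM = {"majority": 0.20, "significant": 0.10, "minority": 0.00}
BUYER_PREMIUM = {"strategic": 0.12, "financial": 0.00}
SIZE_GROWTH_PREMIUM = {
    "small_high_growth": 0.18,
    "medium_high_growth": 0.10,
    "other": 0.00,
}
PRIVATE_DISCOUNT = -0.20


def total_adjustment(control: str, buyer: str, size_growth: str, is_private: bool) -> float:
    """Sum the four adjustment factors (assumed additive) into one fraction."""
    adjustment = CONTROL_PREMIUM[control]
    adjustment += BUYER_PREMIUM[buyer]
    adjustment += SIZE_GROWTH_PREMIUM[size_growth]
    if is_private:
        adjustment += PRIVATE_DISCOUNT
    return adjustment


if __name__ == "__main__":
    # A private, small-cap, high-growth target acquired outright by a strategic buyer.
    print(round(total_adjustment("majority", "strategic", "small_high_growth", True), 2))
```

Under these assumptions, the premiums and the private company discount partly offset each other, so a deal's net adjustment can be positive, negative, or zero.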

Due to data availability constraints at implementation time, a simplified version was applied using only the private company discount (-20%), with other factors set to 0% pending additional data collection. The z-score calculation follows a systematic sequence: calculate total adjustment, adjust industry benchmark, calculate z-score as (deal multiple - adjusted benchmark) / industry standard deviation, and classify deals based on thresholds (undervalued if z < -0.5, fairly valued if -0.5 ≤ z ≤ 0.5, overvalued if z > 0.5).
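The sequence above can be sketched for a single valuation multiple as follows. This is a minimal illustration, not the project's code. It assumes the benchmark is scaled multiplicatively by (1 + total adjustment), which the text does not spell out, and it leaves aside how z-scores for several multiples are weighted together. The function name classify_deal and the numbers in the usage lines are hypothetical.

```python
from typing import Tuple


def classify_deal(deal_multiple: float, benchmark: float, industry_std: float,
                  total_adjustment: float = -0.20) -> Tuple[float, str]:
    """Return (z_score, label) for one deal multiple.

    Assumes the benchmark is scaled by (1 + total_adjustment). The default
    of -0.20 is the simplified case that uses only the private company discount.
    """
    if industry_std <= 0:
        raise ValueError("industry_std must be positive")
    adjusted_benchmark = benchmark * (1.0 + total_adjustment)
    z = (deal_multiple - adjusted_benchmark) / industry_std
    if z < -0.5:
        label = "undervalued"
    elif z > 0.5:
        label = "overvalued"
    else:
        label = "fairly valued"
    return z, label


if __name__ == "__main__":
    # Hypothetical EV/EBITDA inputs: benchmark 15.0x, standard deviation 4.0x.
    for multiple in (9.0, 13.0, 16.0):
        print(multiple, classify_deal(multiple, benchmark=15.0, industry_std=4.0))
```

With these made-up inputs the adjusted benchmark is 12.0x, so a 9.0x deal falls below the -0.5 threshold, a 13.0x deal sits inside the fairly valued band, and a 16.0x deal exceeds +0.5.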

The project implemented 17 methods across three paradigms. Library implementations (11 methods) covered the gradient boosting family (XGBoost, CatBoost, LightGBM), ensemble methods (Random Forest), tree-based models (Decision Tree), probabilistic models (Naive Bayes), support vector machines (SVM), instance-based learning (K-Nearest Neighbors), neural networks (Multi-Layer Perceptron), clustering (K-Means), and linear models (Multiple Linear Regression). From-scratch implementations (5 methods) demonstrated algorithmic understanding by implementing Random Forest, Decision Tree, Naive Bayes, XGBoost, and CatBoost using only NumPy. The transformer implementation used ChatGPT/GPT-4 with zero-shot classification prompts incorporating deal-specific data and industry benchmarks.
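To give a sense of what a from-scratch classifier involves, here is a minimal Gaussian Naive Bayes sketch. It is not the project's implementation: the project's versions use NumPy, while this one uses only the Python standard library to stay dependency-free. The Gaussian likelihood is an assumption, and the class name and toy data are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numeric features (illustrative only)."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.stats = {}
        for label, rows in rows_by_class.items():
            log_prior = math.log(len(rows) / len(X))
            params = []
            for column in zip(*rows):  # iterate feature by feature
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                params.append((mean, var + 1e-9))  # small floor avoids division by zero
            self.stats[label] = (log_prior, params)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            best_label, best_score = None, -math.inf
            for label, (log_prior, params) in self.stats.items():
                # Log prior plus the sum of per-feature Gaussian log-likelihoods.
                score = log_prior
                for value, (mean, var) in zip(row, params):
                    score += -0.5 * math.log(2 * math.pi * var)
                    score -= (value - mean) ** 2 / (2 * var)
                if score > best_score:
                    best_label, best_score = label, score
            predictions.append(best_label)
        return predictions


if __name__ == "__main__":
    # Toy two-feature data, invented purely for illustration.
    X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 8.0], [5.2, 7.9], [4.8, 8.1]]
    y = ["undervalued"] * 3 + ["overvalued"] * 3
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict([[1.1, 2.0], [5.1, 8.0]]))
```

Working in log space avoids underflow when many small likelihoods are multiplied, and the variance floor is one example of the numerical-stability details that library implementations handle more carefully.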

Evaluation used a fixed 80/20 train/test split with 120 training deals and 30 held-out test deals, maintaining stratification for class distribution. Classification metrics included accuracy, precision, recall, and F1-score, while regression metrics included R-squared, RMSE, and MAE.
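The metrics named above can be computed as in the following sketch, which uses only the standard library. It reports precision, recall, and F1 per class because the text does not say whether scores were macro- or weighted-averaged. The function names and the toy inputs used in the checks are hypothetical.

```python
import math


def classification_report(y_true, y_pred):
    """Return accuracy plus per-class precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


def regression_report(y_true, y_pred):
    """Return R-squared, RMSE, and MAE for paired actual and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }
```

On a 30-deal test set each misclassification moves accuracy by about 3.3 percentage points, so the per-class figures are worth reading alongside the headline number.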

Key Outcomes

The project achieved strong performance across multiple dimensions, with perfect test-set accuracy from multiple methods indicating that the z-score framework captures genuine valuation signals. XGBoost and Decision Tree both achieved 100% accuracy on the test set, so the result holds across two different algorithm types, gradient boosting and a single tree. Four additional methods (CatBoost, Random Forest, Naive Bayes, and LightGBM) achieved 96.7% accuracy, each misclassifying only one of the 30 test deals, which shows consistency across diverse paradigms.

Regression analysis exceeded targets significantly: XGBoost regression achieved R² = 0.553, which is 38% above the upper end of the 0.35-0.40 target range. This places the model above the range typically reported for M&A valuation models in the academic literature, where R² values run from 0.25 to 0.45. Feature importance analysis identified EV/Revenue (8.8%), EBITDA (8.3%), P/E Ratio (8.3%), Revenue (7.9%), and Enterprise Value (7.5%) as the top five predictive features.

From-scratch implementations demonstrated algorithmic understanding, with Random Forest achieving 75% accuracy (strong tier), Decision Tree 66.7% (moderate tier), and Naive Bayes 62.5% (moderate tier). The much lower accuracy of the from-scratch gradient boosting methods (XGBoost at 33.3% and CatBoost at 41.7%) highlights the complexity of production-grade optimization, regularization, and numerical stability techniques, while the implementations still demonstrate conceptual understanding.

The transformer model (ChatGPT/GPT-4) achieved 87% agreement with the z-score framework, showing a more conservative valuation stance with higher “Fairly Valued” classifications (46.7% vs 36.7% for z-score) and lower “Overvalued” classifications (25.3% vs 49.2% for z-score).
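A comparison of this kind can be sketched as below. The sketch assumes that "agreement" means the simple share of deals given the same label by both methods. The text does not define the measure, so a chance-corrected statistic such as Cohen's kappa would be an alternative. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of deals on which two labelings assign the same class."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def class_shares(labels):
    """Share of each class within one labeling, as fractions that sum to 1."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    zscore = ["overvalued", "overvalued", "fairly valued", "undervalued"]
    llm = ["overvalued", "fairly valued", "fairly valued", "undervalued"]
    print(agreement_rate(zscore, llm))
    print(class_shares(llm))
```

Reporting class shares next to the agreement rate shows the direction of any disagreement, for example a shift from "Overvalued" toward "Fairly Valued".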

Practical Implications

The system has significant practical applications in investment banking. For deal sourcing and screening, it can systematically screen large deal flows, flag undervalued opportunities (14.2% of training set), identify overvalued deals requiring caution (49.2%), and prioritize analyst time on highest-probability targets. For valuation benchmarking, it provides context-aware benchmarks adjusted for private company discount and capable of incorporating control premiums and strategic versus financial buyer considerations in future iterations.

The system supports investment committees by offering data-driven validation with 96.7-100% agreement across top methods, regression model predictions of fair value (R²=0.553), and feature importance analysis identifying key valuation drivers. For portfolio monitoring, it enables ongoing assessment by reclassifying portfolio companies as markets evolve, identifying exit opportunities through valuation transitions, and monitoring relative positioning versus industry benchmarks.

Challenges and Future Directions

Current limitations start with data availability: the simplified framework uses only the private company discount, pending collection of stake percentages, buyer type classifications, and size/growth metrics. The small test set (N=30) creates wide confidence intervals (±18% at 95% confidence), which calls for k-fold cross-validation and expanded test sets. From-scratch gradient boosting implementations trail their library versions by 55 to 67 percentage points in accuracy, highlighting the sophistication required for production-grade implementations.
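The ±18% figure appears consistent with the worst-case normal-approximation margin for a proportion at N=30, which the sketch below reproduces. The Wilson score interval is added here as an alternative that behaves better when observed accuracy is close to 100%. It is not something the project reports, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def worst_case_margin(n: int, z: float = Z_95) -> float:
    """Normal-approximation half-width at p = 0.5, where the margin is largest."""
    return z * math.sqrt(0.25 / n)


def wilson_interval(correct: int, n: int, z: float = Z_95):
    """Wilson score interval for an observed accuracy of correct / n."""
    p = correct / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    print(round(worst_case_margin(30), 3))  # margin for a 30-deal test set
    print(wilson_interval(29, 30))          # one misclassification out of 30
```

Because the margin shrinks with the square root of N, quadrupling the test set only halves the worst-case margin.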

Short-term future work (next 3 months) includes completing data collection for the full 4-factor framework, debugging from-scratch implementations to improve numerical stability, and enhancing statistical rigor through 5-fold cross-validation and confidence interval calculations. Medium-term work (3-6 months) involves expanding transformer analysis with documented model versions and multiple prompting strategies, advanced feature engineering with interaction terms and temporal features, and implementing SHAP analysis for feature importance and uncertainty quantification.

Long-term development (6-12 months) aims to build a production system with REST API deployment, web interface for deal assessment, automated reporting pipeline, and model monitoring with retraining. Generalization efforts will test the framework across other sectors (technology, financial services) and geographies (US, Europe, Asia), developing sector-specific adjustment factors. Advanced analytics will include time-series analysis of valuation trends, peer group identification via clustering, and deal outcome prediction for post-close success or failure.

References

The project builds on established M&A valuation theory and machine learning literature. Key references include Damodaran (2024) on control premiums and acquisition valuations, Dietz (2014) on estimating the pure control premium, Chang & Galvez (2013) on strategic and financial buyers in M&A transactions, Easterwood et al. (2024) on takeover likelihood and firm size published in the Journal of Financial and Quantitative Analysis, Goetz (2021) on private company discounts in business valuations, Van den Cruijce (2023) on valuation discounts for private companies, and Chen & Guestrin (2016) on XGBoost as a scalable tree boosting system from the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.