Application of Supervised Machine Learning to the Search for New Physics in ATLAS data: A Study of Ordinary Dense, Parametrized and Ensemble Networks and their Application to High Energy Physics

William Hirst:

This thesis explores a diverse array of Machine Learning (ML) models applied to the search for chargino-neutralino pair production in three-lepton final states with missing transverse momentum. The study is based on a data set of sqrt(s) = 13 TeV proton-proton collisions recorded with the ATLAS detector at the LHC, corresponding to an integrated luminosity of 139 fb⁻¹. The ML models applied in the study were three variants of Deep Neural Networks (DNN) and Boosted Decision Trees (BDT). The DNN variants included an ordinary dense Neural Network (NN), a Parameterized Neural Network (PNN), and ensemble models utilizing pattern-specific pathways created by competing neurons. Among the ensemble models I included a novel layer introduced in this thesis, the Stochastic-Channel-Out (SCO).

Time and place: May 31, 2023 4:00 PM – 6:00 PM, Center for Computing in Science Education (Physics building 4th floor, eastern wing)

The study included an analysis of how each model attained sensitivity when trained on a diverse data set including several orthogonal Beyond Standard Model (BSM) variants, specifically different masses for the chargino and neutralino. I also studied individual attributes of each model, for example the sparse pathways of the ensemble methods and the effect of the choice of parameters in the PNN.

In my studies I found that including multiple signal variants can be beneficial when training an ML model, provided the variants exhibit overlapping feature distributions. This is especially true if the model displays a strong long-term memory, as the models utilizing sparse pathways were found to do.

When comparing the models' ability to attain sensitivity, I found that the PNN exhibited a preference towards high-statistics signal, which allowed it to attain impressive sensitivity in low mass regions.

In contrast, the ensemble methods, which did not attain the same level of sensitivity on low mass signals, were able to achieve a far more balanced sensitivity for signal in both high and low mass regions.

I found that performing a Principal Component Analysis (PCA) on the data set led to an improved sensitivity of the ensemble methods and the PNN for a majority of the mass combinations. When comparing the expected sensitivity of the models to that achieved by ATLAS, I found that none of the models were able to extend the established exclusion limit on the masses of the chargino or neutralino.

Further improvements to the results could be achieved by more extensively studying the output from each model, especially the ensemble networks, which showed good sensitivity in non-excluded regions.

-------

As you might know, some of our master's students are about to defend their theses. Until then, we are arranging a series of 8 weekly open sessions where they can practice their presentations and share their research with the rest of the department.

Each presentation follows the typical format: a 30-minute exposition followed by a round of questions of around 15 minutes. It would be valuable to have a good number of PhDs and individuals interested in the field in attendance, so that the question round is engaging and serves as good practice for the student. Make sure to attend if the topic seems interesting.

Pizza will be served.

Published June 5, 2023 1:37 PM - Last modified June 5, 2023 1:40 PM