Traditional deep-learning models such as CNNs and RNNs face significant challenges in representing jets effectively. Image representations often struggle to incorporate particle identity, which limits performance improvements [10]. Similarly, sequence [23] and tree [25] representations impose an artificial ordering on jet particles, which inherently possess no sequential structure. Treating a jet as an unordered collection of its constituent particles leads to a more natural representation: this format not only facilitates the inclusion of particle-specific features but also guarantees permutation invariance. Among models that adopt this perspective, ParticleNet describes jets as "particle clouds," analogous to the point-cloud techniques used for 3D shape analysis in computer vision. ParticleNet uses the DGCNN architecture, whose EdgeConv operations effectively exploit the local spatial structure of particle clouds to achieve notably higher performance.
ParT, a transformer variant based on the Class-Attention in Image Transformers (CaiT) framework [38], integrates interaction variables as a secondary input. The self-attention mechanism of this architecture attends to all positions within the input sequence, capturing long-range dependencies efficiently while remaining invariant to particle order. By modifying the Multi-Head Attention (MHA) mechanism to include particle interaction variables, ParT not only outperforms traditional transformer models, but also sets a new benchmark in jet tagging. These modifications position ParT as the leading model in jet tagging.
Based on the ParT framework, we developed MIParT to enhance the input of interaction data, as depicted in Fig. 1. MIParT adopts the input formats of ParT and processes jet data with two distinct inputs:
Figure 1. (color online) Schematic of the More-Interaction Particle Transformer (MIParT) architecture. The particle features ${\boldsymbol{x}}_1$ are processed sequentially through K MI particle attention blocks and L particle attention blocks. The interaction features ${\boldsymbol{U}}_1$ are first fed to the K MI particle attention blocks, then dimensionally reduced by a 1D pointwise convolution to ${\boldsymbol{U}}_2$, and then fed to the L particle attention blocks. The MIParT architecture ends with the application of the Class-Attention in Image Transformers (CaiT) methodology, which uses a class token ${\boldsymbol{x}}_{\rm{class}}$ to systematically extract and summarize information from ${\boldsymbol{x}}_3$ in the class attention blocks.

● Particle input ${\boldsymbol{x}}_1$: a list of C features per particle, arranged into an array of shape (N, C), where N represents the number of particles within a jet.

● Interaction input ${\boldsymbol{U}}_1$: a matrix of $ C' $ features for each particle pair, formatted as an array of shape (N, N, C').

The particle input is first transformed by a Multilayer Perceptron (MLP) that projects the feature dimension to $ D_1 $, resulting in an array ${\boldsymbol{x}}_1$ of shape $ (N,D_{1}) $. Similarly, the interaction input undergoes pointwise 1D convolution, yielding ${\boldsymbol{U}}_1$ of shape $ (N,N,D_1) $. ${\boldsymbol{x}}_1$ then passes through K MI-particle attention blocks to generate ${\boldsymbol{x}}_2$ of the same shape. In each of these blocks, ${\boldsymbol{U}}_1$ serves as an additional input; it is subsequently reduced by a pointwise 1D convolution to ${\boldsymbol{U}}_2$, of shape $ (N,N,D_2) $.

Following the structural framework of ParT, ${\boldsymbol{x}}_2$ progresses through L particle attention blocks, each augmented with ${\boldsymbol{U}}_2$, to produce ${\boldsymbol{x}}_3$. Subsequently, using the CaiT methodology, a class token ${\boldsymbol{x}}_{\rm{class}}$ systematically extracts and summarizes information from ${\boldsymbol{x}}_3$ in the class attention blocks. Finally, this summarized information forms a single vector that is passed through an MLP and a softmax function to produce the classification scores.
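To make the data flow concrete, the following PyTorch-style sketch traces the tensor shapes through the pipeline described above. The single Linear and Conv1d layers are placeholders for the actual embedding MLP and pointwise convolutions, and the attention blocks are only indicated by comments, so none of the names correspond to the released MIParT code.

```python
import torch
import torch.nn as nn

# Shape-level sketch of the data flow in Fig. 1 (placeholders, not the actual model).
N, C, Cp = 100, 17, 4       # particles per jet, per-particle and per-pair features
D1, D2 = 64, 8              # embedding dimensions

x = torch.randn(1, N, C)        # particle input x1, shape (batch, N, C)
U = torch.randn(1, Cp, N, N)    # interaction input U1, shape (batch, C', N, N)

x1 = nn.Linear(C, D1)(x)                                      # (1, N, D1)
U1 = nn.Conv1d(Cp, D1, 1)(U.flatten(2)).view(1, D1, N, N)     # (1, D1, N, N)
# x1 and U1 pass through K MI-particle attention blocks, giving x2 of shape (1, N, D1);
# U1 is then reduced by a pointwise convolution to U2 for the L particle attention blocks.
U2 = nn.Conv1d(D1, D2, 1)(U1.flatten(2)).view(1, D2, N, N)    # (1, D2, N, N)
# x2 and U2 pass through L particle attention blocks to give x3, which the class
# attention blocks summarize via the class token before the final MLP + softmax.
```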
The Particle Attention Block, a crucial element of the ParT framework, was seamlessly integrated into our MIParT model. The architecture of this block is based on the NormFormer design [39], specifically using Layer Normalization instead of Batch Normalization. Layer Normalization normalizes the features of each sample independently at every layer, enhancing model stability and overall performance across diverse datasets. The architecture of the Particle Attention Block is shown in Fig. 2. Furthermore, in this configuration, the traditional Multi-Head Attention (MHA) is substituted by Particle Multi-Head Attention (P-MHA). This modification incorporates particle interaction features directly into the attention mechanism, enriching the capability of the model to capture complex particle dynamics. The P-MHA mechanism, which is key to the Particle Attention Block, is mathematically expressed as
Figure 2. (color online) Schematic of the More-Interaction Attention (MIA) architecture. The shape of U is (N, N, C), while both the input x and the output x' have shape (N, C). MIA maintains a one-to-one correspondence between the feature dimensions of U and x and the number of MHA heads, all denoted by C.
$ {\rm{P}}\text{-}{\rm{MHA}}(Q, K, V) = {\rm{SoftMax}}\left(\frac{QK^T}{\sqrt{d_k}} + {\boldsymbol{U}}\right)V, $ (1)

where Q, K, and V are the linear projections of the particle embedding x, and U represents the interaction embedding. The dimensions of U are precisely aligned with the attention heads in the MHA mechanism, thereby facilitating the integration of particle interaction features. The specific implementation of P-MHA can be found in Ref. [36]. This integration significantly enhances the ability of the model to capture complex particle interactions, which is crucial in particle physics applications.
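As an illustration of Eq. (1), the following minimal PyTorch sketch adds the interaction embedding U as a bias to the attention logits inside scaled dot-product attention. It is a schematic rendering of the formula, not the P-MHA implementation of Ref. [36]; the (batch, heads, N, d_k) tensor layout is an assumption.

```python
import math
import torch

def p_mha(Q, K, V, U):
    """Sketch of Eq. (1): scaled dot-product attention with the interaction
    embedding U added to the attention logits before the softmax.
    Q, K, V: (batch, heads, N, d_k);  U: (batch, heads, N, N)."""
    d_k = Q.size(-1)
    logits = Q @ K.transpose(-2, -1) / math.sqrt(d_k) + U   # (batch, heads, N, N)
    return torch.softmax(logits, dim=-1) @ V                # (batch, heads, N, d_k)
```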
In the original P-MHA mechanism, the feature dimensions of U align one-to-one with the heads of the MHA, both denoted by C. Increasing the feature dimensions of U therefore requires a proportional increase in the number of attention heads, which significantly increases the complexity of the model. To mitigate this issue, we introduce More-Interaction Attention (MIA) and the MI-Particle Attention Block. These components replace P-MHA with MIA, as shown in Fig. 2 (MIA architecture) and Fig. 3 (MI-Particle Attention Block / Particle Attention Block architecture). The MI-Particle Attention Block incorporates Layer Normalization and the Gaussian Error Linear Unit (GELU) activation function. When the red block in Fig. 3 uses MIA, it forms the MI-Particle Attention Block; when it uses P-MHA, it forms the Particle Attention Block. This approach allows the model to effectively use the interaction inputs without significantly increasing complexity. MIA is calculated using the following formula:
Figure 3. (color online) Schematic of the MI-Particle Attention Block / Particle Attention Block architecture. Here, LN represents Layer Normalization, and GELU represents the Gaussian Error Linear Unit activation function. The block forms the MI-Particle Attention Block when using MIA and the Particle Attention Block when using P-MHA.
$ {\rm{MIA}}({\boldsymbol{U}}, V) = {\rm{SoftMax}}({\boldsymbol{U}})V, $ (2)

where V is a linear projection of the particle embedding x. In MIA, each feature dimension of U and x, as well as each head, is denoted by C, ensuring a one-to-one correspondence.
By increasing the feature dimensions of U, MIA effectively exploits the interaction inputs without significantly increasing the complexity of the model. Moreover, the MI-Particle Attention Block, which incorporates self-attention on x, acts as a supplement placed in front of the Particle Attention Blocks rather than as a replacement for them.
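A minimal sketch of Eq. (2) is shown below: the attention weights are computed solely from the interaction embedding U, and V is a projection of the particle embedding x. The (batch, heads, N, d) layout and the per-head dimension d are illustrative assumptions, not the actual MIParT implementation.

```python
import torch

def mia(U, V):
    """Sketch of Eq. (2): attention weights come solely from the interaction
    embedding U; no query/key projections of the particle embedding are used.
    U: (batch, heads, N, N);  V: (batch, heads, N, d)."""
    weights = torch.softmax(U, dim=-1)    # normalize over the second particle index
    return weights @ V                    # (batch, heads, N, d)
```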
We incorporated the Class Attention Block from the ParT framework, inspired by the CaiT architecture. This block uses a class token ${\boldsymbol{x}}_{\rm{class}}$ to efficiently extract information through attention mechanisms, as depicted in Fig. 4. The Multi-Head Attention inputs are defined as follows:

Figure 4. (color online) Schematic of the Class Attention Block architecture. Here, LN represents Layer Normalization, GELU represents the Gaussian Error Linear Unit activation function, and MHA stands for the Multi-Head Attention block.
$ Q = W_{q}{\boldsymbol{x}}_{\rm{class}}+b_{q}, $ (3)

$ K = W_{k}{\boldsymbol{z}}+b_{k}, $ (4)

$ V = W_{v}{\boldsymbol{z}}+b_{v}, $ (5)

where ${\boldsymbol{z}}=[{\boldsymbol{x}}_{\rm{class}},{\boldsymbol{x}}]$, and W and b represent learnable parameters. Because only the class token forms the query, this design ensures a low computational overhead for the Class Attention mechanism while the concatenated vector ${\boldsymbol{z}}$ supplies the keys and values.

The Class Attention Block significantly enhances feature extraction from the input x by capitalizing on the class token, thereby improving the focus of the model on essential aspects of the data. This enhancement also improves jet classification performance significantly, making the Class Attention Block a crucial component within the ParT framework.
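The following single-head PyTorch sketch illustrates Eqs. (3)-(5): the query is built from the class token only, while the keys and values come from the concatenated vector z. It is a schematic example, not the multi-head implementation used in ParT or MIParT.

```python
import torch
import torch.nn as nn

class ClassAttentionSketch(nn.Module):
    """Single-head sketch of Eqs. (3)-(5): the query comes from the class token
    only, while keys and values are built from z = [x_class, x]."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x_class, x):
        # x_class: (batch, 1, dim), x: (batch, N, dim)
        z = torch.cat([x_class, x], dim=1)               # (batch, N + 1, dim)
        Q, K, V = self.q(x_class), self.k(z), self.v(z)
        attn = torch.softmax(Q @ K.transpose(-2, -1) / K.size(-1) ** 0.5, dim=-1)
        return attn @ V                                   # (batch, 1, dim): updated class token
```

Since Q contains a single row, the attention cost grows only linearly with the number of particles, which is the source of the low overhead noted above.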
The architecture of our MIParT model includes K = 5 MI-particle attention blocks, L = 5 particle attention blocks, and 2 class attention blocks. The choice of these hyperparameters balances complexity and accuracy: accuracy increases with additional layers, but so does complexity, so we limited the total number of attention blocks to ten. The choice of two class attention blocks follows the CaiT framework [38], which recommends this configuration for efficient classification.

For the particle embeddings ${\boldsymbol{x}}_1$, a three-layer Multi-Layer Perceptron (MLP) is used, with the layers containing 128, 512, and 64 neurons, respectively. This configuration yields embeddings with dimensionality $ D_1 = 64 $. The decision to reduce the embedding dimension relative to the ParT model was motivated by the addition of the MIA module; this adjustment keeps the complexity of the model reasonable while maintaining its efficiency, optimizing the trade-off between performance and computational load. Each layer uses GELU as the activation function together with Layer Normalization. Additionally, a three-layer, 64-channel pointwise 1D convolution is used for the interaction embeddings ${\boldsymbol{U}}_1$, performing convolutions only along the feature dimension. The ${\boldsymbol{U}}_1$ embeddings are further processed by a single-layer, 8-channel pointwise 1D convolution to generate ${\boldsymbol{U}}_2$, with dimensionality $ D_2 = 8 $. This choice maintains consistency with the ParT model, ensuring alignment with established architectural standards and facilitating comparative analysis. The MI-particle attention blocks implement MIA with 64 heads, while the P-MHA and class Multi-Head Attention in the particle and class attention blocks use 8 heads each. A dropout rate of 0.1 is applied in all MI-particle and particle attention blocks, whereas the class attention blocks use no dropout.

For very large datasets, increasing the embedding dimension significantly enhances model performance. Therefore, for such datasets, we double the dimension of the particle embeddings to $ D_1 = 128 $. This adjustment is straightforward, requiring only a change of the neuron configuration of the three-layer MLP to 128, 512, and 128. Consequently, the dimensions of x and U in MIA are no longer identical; this discrepancy is acceptable as long as the dimension of x is an integer multiple of the dimension of U. We refer to this modified model as MIParT-Large (MIParT-L).
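A configuration-level sketch of these embedding layers is given below in PyTorch. The widths and channel counts follow the numbers quoted above, but the exact ordering of LayerNorm, Linear, and GELU inside each layer is an assumption rather than a description of the released code.

```python
import torch.nn as nn

D1, D2 = 64, 8   # particle- and pair-embedding dimensions (D1 = 128 for MIParT-L)

def particle_embedding(in_dim, widths=(128, 512, D1)):
    """Three-layer MLP (128, 512, 64 neurons) with LayerNorm + GELU per layer (assumed ordering)."""
    layers = []
    for w in widths:
        layers += [nn.LayerNorm(in_dim), nn.Linear(in_dim, w), nn.GELU()]
        in_dim = w
    return nn.Sequential(*layers)

def pair_embedding(in_dim, channels=(64, 64, 64)):
    """Three-layer, 64-channel pointwise 1D convolution along the feature dimension."""
    layers = []
    for c in channels:
        layers += [nn.Conv1d(in_dim, c, kernel_size=1), nn.GELU()]
        in_dim = c
    return nn.Sequential(*layers)

reduce_to_U2 = nn.Conv1d(D1, D2, kernel_size=1)   # single-layer, 8-channel pointwise conv
```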
We developed the MIParT model using the PyTorch framework [40]. Its implementation is based on $\textsf{Weaver}$¹ and follows the implementation of ParT².

We initially evaluated the MIParT model on two widely used jet tagging benchmark datasets, namely the top tagging [16] and quark-gluon [32] datasets. The model was trained on an NVIDIA RTX 4090 GPU, using a learning rate of 0.001 and a batch size of 256. Training was limited to 15 epochs to prevent overfitting. Both datasets incorporate kinematic variables as particle input features, with particle identification information included only in the quark-gluon dataset. All input features for the two datasets are shown in Table 1.
| Category | Variable | TOP | QG | JC |
|---|---|---|---|---|
| Kinematics | $ \Delta\eta $ | * | * | * |
| | $ \Delta\phi $ | * | * | * |
| | $ \log p_{\rm{T}} $ | * | * | * |
| | $ \log E $ | * | * | * |
| | $ \log {p_{\rm{T}}}/{p_{\rm{T}}{\rm(jet)}} $ | * | * | * |
| | $ \log { E}/{ E{\rm(jet)}} $ | * | * | * |
| | $ \Delta R $ | * | * | * |
| Particle identification | charge | | * | * |
| | Electron | | * | * |
| | Muon | | * | * |
| | Photon | | * | * |
| | Charged Hadron | | * | * |
| | Neutral Hadron | | * | * |
| Trajectory displacement | $ \tanh d_0 $ | | | * |
| | $ \tanh d_z $ | | | * |
| | $ \sigma_{d_0} $ | | | * |
| | $ \sigma_{d_z} $ | | | * |

Table 1. Summary of kinematic and particle identification variables included in the top tagging (TOP), quark-gluon (QG), and JetClass (JC) datasets. Variables present in each dataset are indicated by a star symbol (*). The table includes seven kinematic variables describing the physical characteristics of particles relative to the jet axis, six particle identification variables categorizing particles by type and charge, and four trajectory displacement features, which provide detailed information on particle trajectories.
We then pre-trained our larger model variant, MIParT-L, on the JetClass dataset containing 100M samples [36]. This model was pre-trained on dual NVIDIA RTX 3090 GPUs using a learning rate of 0.0008 and a batch size of 384, with pre-training limited to 50 epochs to avoid overfitting. After pre-training, MIParT-L was fine-tuned on the top tagging and quark-gluon datasets. Note that the pre-training on the JetClass dataset used only kinematic features when the target was the top tagging dataset, whereas both kinematic and particle identification features were used when the target was the quark-gluon dataset.
For fine-tuning, we replaced the last MLP for classification with a newly initialized MLP having two output nodes. All weights were then fine-tuned across the datasets for 20 epochs. We used a learning rate of 0.00016 for the pre-trained weights and 0.008 for the new MLP.
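The fine-tuning recipe can be expressed as two optimizer parameter groups, as in the hypothetical sketch below. `backbone` and `new_head` are placeholder modules standing in for the pre-trained MIParT-L body and the re-initialized classification MLP, and the choice of AdamW is illustrative, since the text does not specify the optimizer.

```python
import torch
import torch.nn as nn

# Sketch of the fine-tuning setup: pre-trained weights and the newly initialized
# two-node classification MLP are trained with different learning rates.
backbone = nn.Sequential(nn.Linear(128, 128), nn.GELU())  # placeholder for pre-trained blocks
new_head = nn.Linear(128, 2)                              # re-initialized MLP, two output nodes

optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 0.00016},  # pre-trained weights
    {"params": new_head.parameters(), "lr": 0.008},    # new classification MLP
])
# All weights are then fine-tuned for 20 epochs on the target dataset.
```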
The seven kinematic input features are as follows:

● $ \Delta\eta $: difference in pseudorapidity $ \eta $ between the particle and jet axis;
● $ \Delta\phi $: difference in azimuthal angle $ \phi $ between the particle and jet axis;
● $ \log p_{\rm{T}} $: logarithm of the particle transverse momentum $ p_{\rm{T}} $;
● $ \log E $: logarithm of the particle energy;
● $ \log {p_{\rm{T}}}/{p_{\rm{T}}{\rm(jet)}} $: logarithm of the particle $ p_{\rm{T}} $ relative to the jet $ p_{\rm{T}} $;
● $ \log {E}/{E{\rm(jet)}} $: logarithm of the particle energy relative to the jet energy;
● $ \Delta R $: angular separation between the particle and jet axis.

The six particle identification features are as follows:

● "Charge": electric charge of the particle;
● "Electron": whether the particle is an electron;
● "Muon": whether the particle is a muon;
● "Photon": whether the particle is a photon;
● "Charged Hadron": whether the particle is a charged hadron;
● "Neutral Hadron": whether the particle is a neutral hadron.

The four trajectory displacement features in the JetClass dataset are as follows:

● $ \tanh d_0 $: hyperbolic tangent of the transverse impact parameter value;
● $ \tanh d_z $: hyperbolic tangent of the longitudinal impact parameter value;
● $ \sigma_{d_0} $: error of the measured transverse impact parameter;
● $ \sigma_{d_z} $: error of the measured longitudinal impact parameter.
For particle interaction features, we consider four logarithmic characteristics $ (\ln \Delta, \ln k_T, \ln z, \ln m^2) $ derived from the energy-momentum four-vector $ p = (E, p_x, p_y, p_z) $ [41]. These features are defined as follows:

$ \Delta= \sqrt{(y_a -y_b)^2 +(\phi_a -\phi_b)^2}, $ (6)

$ k_T = {\min}(p_{{\rm{T}},a},p_{{\rm{T}},b})\Delta, $ (7)

$ z ={\min}(p_{{\rm{T}},a},p_{{\rm{T}},b})/(p_{{\rm{T}},a} + p_{{\rm{T}},b}), $ (8)

$ m^2 =(E_a+E_b)^2 - |{{\boldsymbol{p}}_a+{\boldsymbol{p}}_b}|^2 \, , $ (9)

where $ y_i $ is the rapidity, $ \phi_i $ is the azimuthal angle, $ p_{{\rm{T}},i} $ is the transverse momentum, and ${\boldsymbol{p}}_i$ is the momentum three-vector of the particle $ i=a,b $. The motivation for selecting these variables comes from their widespread adoption in several advanced neural networks [34, 36].
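For reference, a NumPy sketch of Eqs. (6)-(9) for a single particle pair is shown below; wrapping the azimuthal difference into (−π, π] is a standard convention assumed here, and the function name is purely illustrative.

```python
import numpy as np

def pair_features(p_a, p_b):
    """Interaction features of Eqs. (6)-(9) for two particles given as
    (E, px, py, pz) arrays; returns (ln Delta, ln kT, ln z, ln m^2)."""
    def pt_y_phi(p):
        E, px, py, pz = p
        pt = np.hypot(px, py)                          # transverse momentum
        y = 0.5 * np.log((E + pz) / (E - pz))          # rapidity
        phi = np.arctan2(py, px)                       # azimuthal angle
        return pt, y, phi

    pt_a, y_a, phi_a = pt_y_phi(np.asarray(p_a, dtype=float))
    pt_b, y_b, phi_b = pt_y_phi(np.asarray(p_b, dtype=float))

    dphi = np.arctan2(np.sin(phi_a - phi_b), np.cos(phi_a - phi_b))  # wrap to (-pi, pi]
    delta = np.hypot(y_a - y_b, dphi)                  # Eq. (6)
    kt = min(pt_a, pt_b) * delta                       # Eq. (7)
    z = min(pt_a, pt_b) / (pt_a + pt_b)                # Eq. (8)
    p_sum = np.asarray(p_a, dtype=float) + np.asarray(p_b, dtype=float)
    m2 = p_sum[0] ** 2 - np.sum(p_sum[1:] ** 2)        # Eq. (9)
    return np.log(delta), np.log(kt), np.log(z), np.log(m2)
```

In practice, these features are computed for every particle pair in a jet and stacked into the (N, N, C') interaction input.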
To evaluate the performance of the MIParT model, we conducted comparative evaluations with several popular models using the top tagging and quark-gluon datasets. Our evaluation focused on several key metrics:

● Accuracy: This metric quantifies the proportion of correct predictions made by the model, including both true positive and true negative identifications. Mathematically, accuracy is defined as follows:
$ {\rm{Accuracy}} = \frac{TP+TN}{TP+TN+FN+FP} \, , $ (10)

where TP stands for true positives, TN stands for true negatives, FN stands for false negatives, and FP stands for false positives.
● AUC (Area Under the Curve): AUC is a comprehensive metric to assess model performance across all classification thresholds. It is derived from the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) for various thresholds, illustrating the trade-off between sensitivity and specificity. AUC values range from 0.5, which indicates no discriminatory ability (equivalent to random guessing), to 1.0, which represents perfect discrimination between classes.
● Background Rejection at a Certain Signal Efficiency, RejX%: This metric calculates the inverse of the false positive rate (FPR) when the true positive rate (TPR) is fixed at a certain percentage, commonly referred to as RejX%. It is mathematically expressed as follows:
$ {\rm{Rej}}_{X{\text{%}}} = \frac{1}{\rm{FPR}}\bigg|_{{\rm{TPR}} = X{\text{%}}}. $ (11)

For example, a Rej30% value of 2500 indicates that at a TPR of 30%, the inverse of the FPR is 2500. This implies a single false positive per 2500 negative instances, highlighting the exceptional specificity and minimal error rate of the model at this level.
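A simple way to compute this metric from classifier scores is sketched below. Interpolating the ROC curve at the requested signal efficiency is an assumption of this sketch; published results may instead quote the value at the nearest threshold.

```python
import numpy as np
from sklearn.metrics import roc_curve

def background_rejection(y_true, y_score, signal_eff):
    """Rej_X% of Eq. (11): inverse of the false-positive rate at a fixed
    true-positive rate (signal efficiency), e.g. signal_eff = 0.3 for Rej30%."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    fpr_at_eff = np.interp(signal_eff, tpr, fpr)   # tpr from roc_curve is non-decreasing
    return 1.0 / fpr_at_eff
```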
Top tagging is a critical task in jet tagging and is often used in searches for new physics at the LHC. For this study, we used a top tagging dataset [16] consisting of 2M jets, with $ t \to bqq' $ as the signal and q/g jets as the background. This dataset provides only the energy-momentum four-vectors (kinematic features) of each particle.

Figure 5 shows the performance of our MIParT model compared with other popular models on the top tagging dataset. The MIParT model achieves accuracy and AUC metrics nearly identical to those of LorentzNet [42], and its Rej50% and Rej30% metrics are comparable to those of LorentzNet within uncertainties. Note that a series of Lorentz-equivariant methods demonstrate performance similar to that of LorentzNet, such as Clifford Group Equivariant Neural Networks (CGENN) [43], the permutation equivariant and Lorentz invariant or covariant aggregator network (PELICAN) [44], and Lorentz-Equivariant Geometric Algebra Transformers (L-GATr) [45]. Moreover, MIParT, LorentzNet, and the other Lorentz-equivariant models significantly outperform the remaining models, including the Particle Flow Network (PFN) [32], the Particle-level Convolutional Neural Network (P-CNN) [34], ParticleNet [34], the Point Cloud Transformer (PCT) [46], and ParT [36]; the corresponding metrics are extracted from published results. For the fine-tuned MIParT-L model pre-trained on the 100M JetClass dataset, a 39% enhancement in background rejection performance was achieved, comparable to that of fine-tuned ParT. Detailed comparison results are presented in Table 2. The MIParT model significantly outperforms ParT on the top tagging benchmark, with approximately 25% better background rejection at a 30% signal efficiency. Among the evaluated models, MIParT, along with LorentzNet and the other Lorentz-equivariant models, ranks in the top tier, exhibiting robustness and high performance.
Figure 5. (color online) Comparison of MIParT performance metrics with those of other models on the top tagging dataset. This figure shows the Accuracy, AUC, Rej50%, and Rej30% metrics for the MIParT model alongside those of the Particle Flow Network (PFN) [32], Particle-level Convolutional Neural Network (P-CNN), Point Cloud Transformer (PCT) [46], Clifford Group Equivariant Neural Networks (CGENN) [43], permutation equivariant and Lorentz invariant or covariant aggregator network (PELICAN) [44], Lorentz-Equivariant Geometric Algebra Transformers (L-GATr) [45], LorentzNet [42], ParticleNet [34], and ParT [36]. Metrics of other models are extracted from published results. Detailed outcomes are provided in Table 2. Bars without slashes indicate original models without fine-tuning, while bars with slashes indicate models with fine-tuning. The gray dashed line represents the results for MIParT, whereas the red dashed line represents the results for fine-tuned MIParT-L (MIParT-L f.t.).
| Model | Accuracy | AUC | Rej50% | Rej30% |
|---|---|---|---|---|
| PFN | — | 0.9819 | 247±3 | 888±17 |
| P-CNN | 0.930 | 0.9803 | 201±4 | 759±24 |
| PCT | 0.940 | 0.9855 | 392±7 | 1533±101 |
| CGENN | 0.942 | 0.9869 | 500 | 2172 |
| PELICAN | 0.9426 | 0.9870 | — | — |
| L-GATr | 0.9417 | 0.9868 | 548±26 | 2148±106 |
| LorentzNet | 0.942 | 0.9868 | 498±18 | 2195±173 |
| ParticleNet | 0.940 | 0.9858 | 397±7 | 1615±93 |
| ParT | 0.940 | 0.9858 | 413±16 | 1602±81 |
| MIParT (ours) | 0.942 | 0.9868 | 505±8 | 2010±97 |
| ParT f.t. | 0.944 | 0.9877 | 691±15 | 2766±130 |
| MIParT-L f.t. (ours) | 0.944 | 0.9878 | 640±10 | 2789±133 |

Table 2. Performance comparison of various models on the top tagging dataset. This table lists the results for the MIParT model alongside those of other prominent models such as the Particle Flow Network (PFN) [32], Particle-level Convolutional Neural Network (P-CNN), Point Cloud Transformer (PCT) [46], Clifford Group Equivariant Neural Networks (CGENN) [43], permutation equivariant and Lorentz invariant or covariant aggregator network (PELICAN) [44], Lorentz-Equivariant Geometric Algebra Transformers (L-GATr) [45], LorentzNet [42], ParticleNet [34], and ParT [36]. Metrics of other models are extracted from published results. The results for the fine-tuned version of our model, MIParT-L f.t., are shown at the bottom of the table for comparison with those of the fine-tuned ParT model, that is, ParT f.t.
Quark-gluon tagging is another crucial jet tagging task. Unlike the top tagging dataset, the quark-gluon dataset [32] includes not only the kinematic features of each particle, but also particle identification information. This dataset allows for a more detailed categorization of particles, including specific distinctions among electrically charged and neutral hadrons, such as pions, kaons, and protons. Additionally, similar to the top tagging dataset, the quark-gluon dataset contains 2M jets, with quarks and gluons designated as the signal and background, respectively.
Figure 6 shows the performance of our MIParT model compared with other popular models on the quark-gluon dataset. On this dataset, the MIParT model significantly outperforms LorentzNet and the other models across all metrics, including accuracy, AUC, Rej50%, and Rej30%. Only the ParT model approaches the performance of our model in some metrics, but MIParT still maintains an overall lead over ParT. Compared with other models, such as PFN [32], ABCNet [47], and PCT [46], MIParT demonstrates a substantial lead; the corresponding metrics are extracted from published results. For the fine-tuned MIParT-L model pre-trained on the 100M JetClass dataset, a 6% enhancement in background rejection performance is achieved, outperforming fine-tuned ParT. Detailed comparison results on the quark-gluon dataset are presented in Table 3. MIParT achieves the best performance across all evaluation metrics, improving background rejection power by approximately 3% compared with ParT. At the same time, the background rejection of the fine-tuned MIParT-L model is improved by approximately 2% compared with that of fine-tuned ParT.
Figure 6. (color online) Comparison of MIParT performance metrics with those of other models on the quark-gluon dataset. This figure shows the Accuracy, AUC, Rej50%, and Rej30% metrics for the MIParT model alongside those of the Particle Flow Network (PFN), attention-based Cloud Net (ABCNet) [47], Point Cloud Transformer (PCT) [46], LorentzNet [42], and ParT [36]. Metrics of other models are extracted from published results. Detailed outcomes are provided in Table 3. Bars without slashes indicate original models without fine-tuning, while bars with slashes indicate models with fine-tuning. The gray dashed line indicates the results for MIParT, whereas the red dashed line shows the results for fine-tuned MIParT-L (MIParT-L f.t.).
| Model | Accuracy | AUC | Rej50% | Rej30% |
|---|---|---|---|---|
| PFN | — | 0.9052 | 37.4±0.7 | — |
| ABCNet | 0.840 | 0.9126 | 42.6±0.4 | 118.4±1.5 |
| PCT | 0.841 | 0.9140 | 43.2±0.7 | 118.0±2.2 |
| LorentzNet | 0.844 | 0.9156 | 42.4±0.4 | 110.2±1.3 |
| ParT | 0.849 | 0.9203 | 47.9±0.5 | 129.5±0.9 |
| MIParT (ours) | 0.851 | 0.9215 | 49.3±0.4 | 133.9±1.4 |
| ParT f.t. | 0.852 | 0.9230 | 50.6±0.2 | 138.7±1.3 |
| MIParT-L f.t. (ours) | 0.853 | 0.9237 | 51.9±0.5 | 141.4±1.5 |

Table 3. Performance comparison of various models on the quark-gluon dataset. This table lists the results for the MIParT model along with other significant models, including the Particle Flow Network (PFN) [32], attention-based Cloud Net (ABCNet) [47], Point Cloud Transformer (PCT) [46], LorentzNet [42], and ParT [36]. Metrics of other models are extracted from published results. The fine-tuned version of our model, MIParT-L f.t., is shown at the bottom of the table for comparison with the fine-tuned ParT model, that is, ParT f.t.
Given that MIParT shares many components with ParT and differs only in the addition of the MIA blocks, the comparative results between these two models highlight the effectiveness of the MIA block. Specifically, MIParT consists of five MIA blocks, five particle attention blocks, and two class attention blocks, whereas ParT consists of eight particle attention blocks and two class attention blocks. Thus, from the results tested on the top tagging and quark-gluon datasets, it is evident that MIParT outperforms ParT, stressing the significant role played by the MIA block. Furthermore, the effectiveness of the particle attention blocks was already established in the ParT paper [36], and the impact of the class attention blocks was tested in the CaiT framework [38].
Regarding the impact of hyperparameter choices on model performance, we found that MIParT is not overly sensitive to hyperparameter settings, but is more influenced by the overall network architecture. In particular, increasing the number of MIA blocks and particle attention blocks generally leads to better performance at the cost of increased complexity. Architectural modifications show that placing MIA blocks before particle attention blocks is optimal. Placing MIA blocks after particle attention blocks or alternating them significantly reduces effectiveness, sometimes to the point of performing worse than ParT. We consider that MIA blocks function similarly to embeddings, allowing better integration of interaction information into the jets for improved information fusion and classification.
Table 4 presents the parameters, FLOPs (floating-point operations), and accuracy of various models on the top tagging and quark-gluon datasets. Parameters denote the number of trainable elements within a model, which indicates its capacity to learn; more parameters generally imply a more complex model. FLOPs measure the computational cost of processing data through the model. Reducing the number of parameters typically reduces the FLOPs, simplifying the model and making it more computationally efficient.
| Model | Accuracy (TOP) | Accuracy (QG) | Params | FLOPs |
|---|---|---|---|---|
| PFN | — | — | 86.1k | 4.62M |
| P-CNN | 0.930 | — | 354k | 15.5M |
| ParticleNet | 0.940 | — | 370k | 540M |
| ParT | 0.940 | 0.849 | 2.14M | 340M |
| MIParT (ours) | 0.942 | 0.851 | 720.9k | 180M |
| MIParT-L f.t. (ours) | 0.944 | 0.853 | 2.38M | 368M |

Table 4. Parameters, FLOPs, and accuracy for various models on the top tagging (TOP) and quark-gluon (QG) datasets. Parameters refer to the number of trainable elements within a model, while FLOPs (floating-point operations) measure the computational complexity involved in processing data through the model.
However, reducing the number of parameters to reduce FLOPs usually results in lower accuracy. In contrast, our MIParT model has only 30% of the parameters and 53% of the FLOPs of the ParT model, significantly reducing model complexity. Despite this reduction, there is no compromise in accuracy; in fact, accuracy improves on both top tagging and quark-gluon datasets. For the fine-tuned version of MIParT-L, the parameters and FLOPs are comparable to those of the ParT model, but with a slight improvement in accuracy.
Table 5 presents a performance comparison of various models on different sizes of the JetClass dataset. We display the results for the MIParT-L model alongside those of ParticleNet [34] and ParT [36] across the 2M, 10M, and 100M JetClass datasets. Note that the performance of the models improves as the dataset size increases. Specifically, MIParT-L and ParT exhibit nearly identical effectiveness on very large datasets, outperforming ParticleNet. In addition, our evaluation on the JetClass dataset serves to test the ability of MIParT to generalize across different classification tasks. The JetClass dataset represents a more complex classification challenge, which includes identifying Higgs boson decays to charm quarks. Our MIParT model shows remarkable stability on this task, highlighting its generalization capabilities.
| Model | Accuracy (all classes) | AUC (all classes) | $ H\to b \bar{b} $ Rej50% | $ H\to c \bar{c} $ Rej50% | $ H\to g g $ Rej50% | $ H\to 4 q $ Rej50% | $ H\to \ell \nu q q' $ Rej99% | $ t\to b q q' $ Rej50% | $ t\to b \ell \nu $ Rej99.5% | $ W\to q q' $ Rej50% | $ Z\to q q' $ Rej50% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ParticleNet (2 M) | 0.828 | 0.9820 | 5540 | 1681 | 90 | 662 | 1654 | 4049 | 4673 | 260 | 215 |
| ParticleNet (10 M) | 0.837 | 0.9837 | 5848 | 2070 | 96 | 770 | 2350 | 5495 | 6803 | 307 | 253 |
| **ParticleNet (100 M)** | 0.844 | 0.9849 | 7634 | 2475 | 104 | 954 | 3339 | 10526 | 11173 | 347 | 283 |
| ParT (2 M) | 0.836 | 0.9834 | 5587 | 1982 | 93 | 761 | 1609 | 6061 | 4474 | 307 | 236 |
| ParT (10 M) | 0.850 | 0.9860 | 8734 | 3040 | 110 | 1274 | 3257 | 12579 | 8969 | 431 | 324 |
| **ParT (100 M)** | 0.861 | 0.9877 | 10638 | 4149 | 123 | 1864 | 5479 | 32787 | 15873 | 543 | 402 |
| MIParT-L (2 M) | 0.837 | 0.9836 | 5495 | 1940 | 95 | 819 | 1778 | 6192 | 4515 | 311 | 242 |
| MIParT-L (10 M) | 0.850 | 0.9861 | 8000 | 3003 | 112 | 1281 | 3650 | 16529 | 9852 | 440 | 336 |
| **MIParT-L (100 M)** | 0.861 | 0.9878 | 10753 | 4202 | 123 | 1927 | 5450 | 31250 | 16807 | 542 | 402 |

Table 5. Performance comparison of various models on different sizes of the JetClass dataset. This table lists the results for the MIParT-L model alongside ParticleNet [34] and ParT [36] across the 2M, 10M, and 100M JetClass datasets. Metrics of other models are extracted from published results. Models trained using the full 100M training dataset are highlighted in bold text.
Here, we discuss the improvements attributed to pre-training on the JetClass dataset and the subsequent performance gains observed on the top tagging and quark-gluon datasets. These three jet tagging tasks differ in their objectives: the JetClass dataset focuses on identifying Lorentz-boosted W, Z, and Higgs bosons and top quarks, the top tagging dataset aims to identify top quarks, and the quark-gluon dataset aims to distinguish between quark and gluon jets. The improvements across such diverse tasks suggest that MIParT learned more generalized jet properties during the pre-training phase. These characteristics are effectively transferable to other tasks, demonstrating the robustness of the model and its adaptability to different jet identification challenges. This capability highlights the potential of pre-trained models to improve performance in a wide range of applications by capturing and exploiting general features applicable to multiple scenarios.
Regarding the interpretability of MIParT, it is important to acknowledge that, as a transformer-based neural network, its interpretability remains limited, as is the case for many neural networks currently in use. Despite these challenges, the CMS collaboration has successfully used the graph neural network ParticleNet [34], another model that lacks full interpretability, to search for Higgs boson decays to charm quarks [48]. This success underscores that the lack of interpretability does not prevent the use of neural network models in particle physics experiments. In fact, ParticleNet, which functions as a non-interpretable "black box" model, is already playing a significant role in particle physics experiments, demonstrating that the non-interpretable nature of these models should not be a barrier to their use in advancing scientific discovery.
Jet tagging with more-interaction particle transformer
- Received Date: 2024-08-14
- Available Online: 2025-01-15
Abstract: In this paper, we introduce the More-Interaction Particle Transformer (MIParT), a novel deep-learning neural network designed for jet tagging. This framework incorporates our own design, the More-Interaction Attention (MIA) mechanism, which increases the dimensionality of particle interaction embeddings. We tested MIParT using the top tagging and quark-gluon datasets. Our results show that MIParT not only matches the accuracy and AUC of LorentzNet and a series of Lorentz-equivariant methods, but also significantly outperforms the ParT model in background rejection. Specifically, it improves background rejection by approximately 25% with a signal efficiency of 30% on the top tagging dataset and by 3% on the quark-gluon dataset. Additionally, MIParT requires only 30% of the parameters and 53% of the computational complexity needed by ParT, proving that high performance can be achieved with reduced model complexity. For very large datasets, we double the dimension of particle embeddings, referring to this variant as MIParT-Large (MIParT-L). We found that MIParT-L can further capitalize on the knowledge from large datasets. From a model pre-trained on the 100M JetClass dataset, the background rejection performance of fine-tuned MIParT-L improves by 39% on the top tagging dataset and by 6% on the quark-gluon dataset, surpassing that of fine-tuned ParT. Specifically, the background rejection of fine-tuned MIParT-L improves by an additional 2% compared to that of fine-tuned ParT. These results suggest that MIParT has the potential to increase the efficiency of benchmarks for jet tagging and event identification in particle physics.