-
The purely leptonic decays of charmed mesons,
$ D^+\to l^+\nu_l\ (l=e, \ \mu, \ \tau) $ , offer an ideal laboratory for testing the Standard Model. These decays also offer one of the simplest and best understood probes of the$ c\to d $ quark flavour-changing transition [1−4]. However, these decays are subject to helicity suppression, with decay rates proportional to the square of the lepton mass. Consequently, the$ D^+\to e^+\nu_e $ decay rate is highly suppressed compared to its muonic [5, 6] and tauonic [5, 7] counterparts, with a branching fraction (BF) expected to be approximately$ 10^{-8} $ , beyond the current experimental sensitivity [8, 9].In the radiative leptonic decay
$ D^+\to\gamma e^+\nu_e $ , photon emission mitigates helicity suppression, with the strong interaction engaged solely in a single hadronic external state. Theoretically, its differential decay width can be expressed as a function of the photon energy$ E_\gamma $ in the rest frame of the$ D^+ $ meson [10] via$ \begin{aligned}[b] \frac{{\rm{d}}\Gamma}{{\rm{d}}E_\gamma}(D^+\to \gamma l^+\nu_l)=\;&\frac{\alpha_{em}G^2_{\rm F} \left[F_V^2(E_\gamma)+F_A^2(E_\gamma)\right]}{6\pi^2}\\ &\times\left|V_{cd}\right|^2E_\gamma^3m_{D^+}\left(1-\frac{2E_\gamma}{m_{D^+}}\right), \end{aligned} $
(1) where
$ \alpha_{em} $ is the fine-structure constant,$G_{\rm F}$ is the Fermi coupling constant,$ \left|V_{cd}\right| $ is the Cabibbo-Kobayashi-Maskawa matrix element [11, 12],$ m_{D^+} $ is the nominal mass of the$ D^+ $ meson [13], and$ F_V $ and$ F_A $ are the form factors characterizing the non-perturbative quantum chromodynamics (QCD) dynamics. Various theoretical models have been proposed to investigate the$ D^+\to\gamma e^+\nu_e $ decay, including quark models [14−16], perturbative QCD [17], lattice QCD [18], and QCD factorization approaches [10, 19, 20]. Most theoretical predictions on its BF,$ {\cal{B}}(D^+\to\gamma e^+\nu_e) $ , are on the order of 10-5, as summarized in Table 1.Model ${\cal{B}}\ (\times10^{-5})$ Light front quark model [14] 0.69 Non-relativistic quark model [15] 0.46 Relativistic quark model [16] 3.34 Perturbative QCD [17] $8.2\pm6.5$ Lattice QCD [18] $0.09\pm 0.04$ QCD factorization [19] 2.81 QCD factorization [20] 1.92 QCD factorization [10] $\left(1.88^{+0.36}_{-0.29}, \, 2.31^{+0.65}_{-0.54}\right)$ BESIII 2017 [21] <3.0 Table 1. Theoretical predictions and experimental result for
${\cal{B}}(D^+\to\gamma e^+\nu_e)$ .The only experimental result on
$ D^+\to\gamma e^+\nu_e $ was reported from the BESIII experiment [21], based on 2.9$ {\rm{fb}}^{-1} $ of$ e^+e^- $ annihilation data taken at the center-of-mass energy$ \sqrt{s}=3.773\; \, {\rm{GeV}} $ . The upper limit on the partial BF of$ D^+\to\gamma e^+\nu_e $ with$ E_\gamma>10 $ MeV was determined to be$ 3.0\times10^{-5} $ at 90% confidence level (C.L.), approaching the range of theoretical predictions [10, 16−20]. A larger dataset comprising 20.3$ {\rm{fb}}^{-1} $ of$ e^+e^- $ annihilation data recently collected at BESIII [22−24], which includes the previous 2.9$ {\rm{ fb}}^{-1} $ data, provides an excellent opportunity to further constrain these theoretical models.The major challenge in searching for
$ D^+\to\gamma e^+\nu_e $ is the substantial background contamination from$ D^+ $ semileptonic decays, such as$ D^+\to\pi^0e^+\nu_e $ ,$ D^+\to K^0_Le^+\nu_e $ , and$ D^+\to K^0_S(\to\pi^0\pi^0)e^+\nu_e $ . Photons originating from$ \pi^0 $ decays or long-lived neutral hadrons can deposit energy in the electromagnetic calorimeter, potentially mimicking the radiative photon signal. Conventional cut-based methods have struggled to establish effective discriminators to suppress these backgrounds, whose yields are two orders of magnitude greater than the potential signal.An unprecedentedly powerful signal identification tool can be accomplished using deep learning, utilizing a deep neural network (DNN), to distinguish signals from backgrounds. Leveraging flexible data representation and modern algorithms, this method exhibits powerful capabilities in recognizing and interpreting underlying relationships and hidden patterns within the topological structure of
$ e^+e^- $ annihilation events.In this paper, we present an improved search for
$ D^+\to\gamma e^+\nu_e $ based on the analysis of the aforementioned 20.3$ {\rm{ fb}}^{-1} $ data, enhancing signal sensitivity through a deep-learning-based signal identification method. Throughout this paper, charge conjugation is implied by default. -
The BESIII detector [25] records symmetric
$ e^+e^- $ collisions provided by the BEPCII storage ring [26] in the center-of-mass energy range from 1.84 to 4.95 GeV, with a peak luminosity of$ 1.1 \times 10^{33}\;{\rm{cm}}^{-2}{\rm{s}}^{-1} $ achieved at$ \sqrt{s} = 3.773\;{\rm{GeV}} $ . BESIII has collected large data samples in this energy region [27−29]. The cylindrical core of the BESIII detector covers 93% of the full solid angle and consists of a helium-based multilayer drift chamber (MDC), a time-of-flight system (TOF), and a CsI(Tl) electromagnetic calorimeter (EMC), which are all enclosed in a superconducting solenoidal magnet providing a 1.0 T magnetic field. The solenoid is supported by an octagonal flux-return yoke with resistive plate counter muon identification modules interleaved with steel. The charged-particle momentum resolution at 1 GeV/c is 0.5%, and the dE/dx resolution is 6% for electrons from Bhabha scattering. The EMC measures photon energies with a resolution of 2.5% (5%) at 1 GeV in the barrel (end cap) region. The time resolution in the plastic scintillator TOF barrel region is 68 ps, while that in the end cap region is 110 ps. The end cap TOF system was upgraded in 2015 using multigap resistive plate chamber technology, currently providing a time resolution of 60 ps, which benefits 85% of the data used in this analysis [30−32].Monte Carlo (MC) simulated data samples produced with a
${\small{\rm{GEANT}}}$ 4-based [33] software package, which includes the geometric description of the BESIII detector and its response, were used in this study to determine detection efficiencies and estimate backgrounds. A signal MC sample of the decay$ D^+\to\gamma e^+\nu_e $ was simulated according to Eq. (1), where the form factors were taken from Ref. [20]. The minimum energy of the radiative photon was set at 10 MeV to avoid the infrared divergence for soft photons. In the simulation, we modeled the beam energy spread and initial state radiation (ISR) in the$ e^+e^- $ annihilations with the generator${\small{\rm{KKMC}}}$ [34, 35]. The inclusive MC sample encompassed the production of$ D\bar{D} $ pairs (including quantum coherence for the neutral D channels), non-$ D\bar{D} $ decays of$ \psi(3770) $ , ISR production of the$ J/\psi $ and$ \psi(3686) $ states, and continuum processes incorporated in${\small{\rm{KKMC}}}$ [34, 35]. All particle decays were modeled with${\small{\rm{EVTGEN}}}$ [36, 37] using BFs either taken from the Particle Data Group [13], when available, or otherwise estimated with${\small{\rm{LUNDCHARM}}}$ [38, 39]. Final state radiation from charged final state particles was incorporated using the${\small{\rm{PHOTOS}}}$ package [40]. -
Taking advantage of the
$ D^+D^- $ pair production in$ e^+e^- $ annihilation at$ \sqrt{s}=3.773 $ GeV, we employed a double-tag (DT) analysis method [41] to constrain the kinematics of$ D^+\to\gamma e^+\nu_e $ and suppress non-$ D^+D^- $ backgrounds. The$ D^- $ meson was first reconstructed through six specific hadronic decay modes:$ K^+\pi^-\pi^- $ ,$ K^+\pi^-\pi^-\pi^0 $ ,$ K^0_S\pi^- $ ,$ K^0_S\pi^-\pi^0 $ ,$ K^0_S\pi^+\pi^-\pi^- $ , and$ K^+K^-\pi^- $ . Here,$ K^0_S $ was reconstructed with a$ \pi^+\pi^- $ pair, and$ \pi^0 $ was reconstructed from two photons. These decay modes are referred to as tag modes, and the selected events are called single-tag (ST) events. In the presence of the ST$ D^- $ meson, the signal decay$ D^+\to \gamma e^+\nu_e $ was searched for within its recoil system, with selected events referred to as DT events. The BF of the signal decay was determined from$ {\cal{B}}_{\rm{sig}}=\frac{N_{\rm{DT}}}{\sum_i\left(N_{\rm{ST}}^{i}\epsilon_{\rm{DT}}^i/\epsilon^i_{\rm{ST}}\right)}, $
(2) where
$ N_{\rm{DT}} $ represents the sum of DT yields for all tag modes, and$ N^i_{\rm{ST}} $ ,$ \epsilon^i_{\rm{ST}} $ , and$ \epsilon^i_{\rm{DT}} $ are the ST yield, ST efficiency, and DT efficiency for the tag mode i, respectively.For reconstructing the ST
$ D^- $ meson, charged tracks detected in the MDC were required to be within a polar angle (θ) range of$ |\cos\theta|<0.93 $ , where θ is defined with respect to the z-axis, which is the symmetry axis of the MDC. For charged tracks not originating from$ K_S^0 $ decays, the distance of closest approach to the interaction point (IP) was required to be less than 10 cm along the z-axis,$ V_{z} $ , and less than 1 cm in the transverse plane,$ V_{xy} $ . Particle identification (PID) for charged tracks combines measurements of the energy loss in the MDC (dE/dx) and the flight time in the TOF system to form likelihoods$ {\cal{L}}(h)\; (h=K, \pi) $ for each hadron h hypothesis. The charged kaons or pions not originating from$ K_S^0 $ were required to satisfy$ {\cal{L}}(K)>{\cal{L}}(\pi) $ or$ {\cal{L}}(\pi)>{\cal{L}}(K) $ , respectively.The
$ K_{S}^0 $ candidates were reconstructed from two oppositely charged tracks satisfying$ V_{z}< $ 20 cm. The two charged tracks were assigned as$ \pi^+\pi^- $ without imposing further PID criteria. They were constrained to originate from a common vertex and required to have an invariant mass in the range of (487, 511) MeV/c2 corresponding to approximately$ \pm3\sigma $ of the detection resolution. The decay length of the$ K^0_S $ candidate was required to be greater than twice the vertex resolution away from the IP.Photon candidates were identified using showers in the EMC. The deposited energy of each shower was required to exceed 25 MeV in the barrel region (
$|\cos \theta| < 0.80$ ) and 50 MeV in the end cap region ($ 0.86 <|\cos \theta|< 0.92 $ ). To exclude showers that originated from charged tracks, the angle subtended by the EMC shower and the position of the closest charged track at the EMC were required to exceed 10 degrees as measured from the IP. To suppress electronic noise and showers unrelated to the event, the difference between the EMC time and the event start time was required to be within [0, 700] ns. The$ \pi^0 $ mesons were reconstructed using a kinematic fit from all combinations of two photon candidates by constraining their invariant mass to the nominal$ \pi^0 $ mass [13]. The invariant mass of two photons before the kinematic fit was required to be in the range of (115, 150) MeV/c2. The combinations with both photons from end cap EMC regions were rejected because of poor resolution.After selecting the component particles, we reconstructed the ST
$ D^- $ meson using their combinations. Two kinematic variables, the energy difference$ \Delta E $ , and the beam-constrained mass$ M_{\rm{BC}} $ were calculated for signal extraction as follows:$ \begin{aligned} \Delta E&=E_{D^-}-E_{\rm{beam}}, \end{aligned} $
(3) $ \begin{aligned} M_{\rm{BC}}&=\sqrt{E^2_{\rm{beam}}/c^4-\left|\vec{p}_{D^-}\right|^2/c^2}, \end{aligned} $
(4) where
$ E_{\rm{beam}} $ is the beam energy, and$ \vec{p}_{D^-} $ and$ E_{D^-} $ are the momentum vector and energy of the$ D^- $ candidate in the$ e^+e^- $ rest frame, respectively. If multiple combinations were present in an event, the combination with the smallest$ |\Delta E| $ value was retained for further analysis for each tag mode and charge. To suppress combinatorial backgrounds, mode-dependent requirements on$ \Delta E $ were imposed on the accepted candidates within an approximate range of$ \pm3.5\sigma $ for the detection resolution, as listed in Table 2.Tag mode $\Delta E$ /MeV$N_{\rm{ST}}^{i}$ (×103)$\epsilon_{\rm{ST}}^{i}$ (%)$\epsilon_{\rm{DT}}^{i}$ (%)$K^+\pi^-\pi^-$ (−25, 24) $5552.8\pm2.5$ 51.10 10.78 $K^0_S\pi^-$ (−25, 26) $656.5\pm0.8$ 51.42 10.69 $K^+\pi^-\pi^-\pi^0$ (−57, 46) $1723.7\pm1.8$ 24.40 5.42 $K^0_S\pi^-\pi^0$ (−62, 49) $1442.4\pm1.5$ 26.45 5.57 $K^0_S\pi^-\pi^-\pi^+$ (−28, 27) $790.2\pm1.1$ 29.59 6.62 $K^+K^-\pi^-$ (−24, 23) $481.4\pm0.9$ 40.91 8.96 Table 2.
$\Delta E$ requirements, ST yields in data, and ST and DT efficiencies for each tag mode in data. Note that the efficiencies do not include the BFs of subsequent$\pi^0$ and$K^0_S$ decays.The ST yields were obtained through maximum likelihood fits to the
$ M_{\rm{BC}} $ distributions after applying the aforementioned requirements, as shown in Fig. 1. In each fit, the signal was described by the shape obtained from MC simulation, convolved with a double-Gaussian function to take into account resolution differences between data and MC simulation results. The background was described by an ARGUS function [42] with the endpoint fixed at the beam energy and other parameters left free. A signal region$ M_{\rm{BC}}\in(1.863, \ 1.877)$ MeV/c2, corresponding to approximately$ \pm3\sigma $ of the detection resolution, was set to further reject the backgrounds. The ST efficiencies were estimated by analyzing the inclusive MC sample with the same method as in data, dividing the fitted yields by the numbers of generated events. Table 2 summarizes the ST yields in data and ST efficiencies for each tag mode.Figure 1. (color online) Fits to the
$M_{\rm{BC}}$ distributions of ST$D^-$ candidates for each tag mode in data. The black dots with error bars represent data. The blue curves are the fit results. The red dotted curves are the fitted combinatorial background shapes. The yellow histograms are the MC simulated background scaled to the luminosity of data. The red arrows indicate the signal region.The signal candidates for
$ D^+\to\gamma e^+\nu_e $ were selected from the remaining charged tracks and showers, excluding those associated with the ST$ D^- $ meson. We required only one track left with charge opposite to that of the ST$ D^- $ meson, following the same selection criteria as those in the tag side. This track was required to be identified as a positron combining the PID information from dE/dx, the TOF system, and the EMC, satisfying$ {\cal{L}}(e)>0.001 $ and$ {\cal{L}}(e)/[{\cal{L}}(e)+{\cal{L}}(\pi)+ {\cal{L}}(K)]>0.8 $ for the combined confidence levels. We also required that at least one photon candidate satisfied the same selection criteria used in the tag side. If multiple photon candidates passed the selection, the most energetic one was chosen as the signal radiative photon. To improve the kinematic resolution of the positron, final-state-radiation recovery was performed by adding the momenta of any remaining photon candidates back to the reconstructed momentum of positron candidate if their opening angles with respect to the IP were less than$ 5^\circ $ .To derive information about the missing neutrino, a kinematic observable
$ U_{\rm{miss}} $ was defined as$ U_{\rm{miss}}=E_{\rm{miss}}-\left|\vec{p}_{\rm{miss}}\right|c, $
(5) where
$ E_{\rm{miss}} $ and$ \vec{p}_{\rm{miss}} $ represent the total energy and momentum vector of all missing particles in the event. The missing energy was calculated as$ E_{\rm{miss}}=E_{\rm{beam}}-E_{e^+}-E_{\gamma} $ , where$ E_{e^+} $ and$ E_{\gamma} $ are the energies of signal positron and photon, respectively. The missing momentum was calculated as$ \vec{p}_{\rm{miss}}=\vec{p}_{D^+}-\vec{p}_{e^+}-\vec{p}_{\gamma} $ , where$ \vec{p}_{D^+} $ ,$ \vec{p}_{e^+} $ , and$ \vec{p}_{\gamma} $ are the momenta of signal$ D^+ $ meson, positron, and photon, respectively. The$ D^+ $ momentum was further constrained as$ \vec{p}_{D^+}=-\dfrac{\vec{p}_{\rm{ST}}}{\left|\vec{p}_{\rm{ST}}\right|}\sqrt{E^2_{\rm{beam}}-m^2_{D^+}} $ , where$ \vec{p}_{\rm{ST}} $ is the momentum of the ST$ D^- $ meson. The consequent analysis was performed within$ U_{\rm{miss}}\in(-0.2, \ 0.2) $ GeV.We also explored
$ D^+\to\pi^0e^+\nu_e $ as a validation channel in this study. For reconstructing$ D^+\to\pi^0e^+\nu_e $ events, additional DT selection criteria were applied after those mentioned above. At least one$ \pi^0 $ candidate was required, and the one with minimum$ \chi^2 $ from the kinematic fit was chosen as the signal$ \pi^0 $ candidate. Another kinematic observable,$ U'_{\rm{miss}} $ , was defined as$ U'_{\rm{miss}}=E'_{\rm{miss}}-\left|\vec{p'}_{\rm{miss}}\right|c, $
(6) where
$ E'_{\rm{miss}}=E_{\rm{beam}}-E_{e^+}-E_{\pi^0} $ and$ \vec{p'}_{\rm{miss}}=\vec{p}_{D^+}-\vec{p}_{e^+}-\vec{p}_{\pi^0} $ were calculated using the energy and momentum of the signal$ \pi^0 $ candidate, respectively.Figure 2(a) presents the
$ U_{\rm{miss}} $ distributions for both data and inclusive MC samples after applying the aforementioned selections. The major backgrounds are from$ D^+\to\pi^0e^+\nu_e $ ,$ D^+\to K^0_Le^+\nu_e $ , and$ D^+\to K^0_S(\to\pi^0\pi^0)e^+\nu_e $ . Notably, the$ D^+\to\pi^0e^+\nu_e $ background exhibits a non-trivial shape within the signal region. The high background level makes it difficult to reliably extract the signal yield from the$ U_{\rm{miss}} $ distribution with adequate sensitivity, thereby requiring further signal identification treatments, as explained below.Figure 2. (color online) (a)
$U_{\rm{miss}}$ distribution of the signal decay$D^+\to\gamma e^+\nu_e$ and (b)$U'_{\rm{miss}}$ distribution of the validation channel$D^+\to\pi^0 e^+\nu_e$ after DT selection. The black dots with error bars represent data. The histograms with filled colors represent different background components in the inclusive MC sample. The red hollow histogram indicates an arbitrarily scaled signal shape. -
The main challenge in distinguishing signals from massive backgrounds lies in extracting distinctive features from the data. Notably, the shower patterns observed in the EMC provide key information for this task. Background events involving a
$ \pi^0 $ typically result in a higher number of photon candidates than the signal, whereas$ K^0_L $ -induced showers often exhibit more dispersed cluster shapes compared to those from photons [43]. Additionally, the kinematics of the particles, influenced by the unique phase spaces and dynamics of their decays, offer another aspect of differentiation. Conventional cut-based methods characterize such distinctions via hand-crafted variables derived from practical understanding of the involved physics, such as the number of$ \pi^0 $ -like photon combinations, lateral moment of photon candidates [21], and$ \chi^2 $ values from kinematic fits under specific final-state hypotheses [44]. In contrast, deep learning methods aim to process minimally refined data at the level of fundamental detector responses to the penetrations of the final state particles. A DNN is expected to automatically explore underlying correlations and recognize distinctive information beyond prior knowledge.In this study, the dataset for DNN training was constructed by randomly shuffling signal and background events sourced from relevant MC samples. The events were categorized into three groups with equal statistics:
$ D^+\to\gamma e^+\nu_e $ signal,$ D^+\to\pi^0 e^+\nu_e $ background, and other inclusive backgrounds. The training dataset contained approximate 3.3 million events in total, with 80% used for training and the remaining 20% reserved for validation. The DNN input information from an event included all charged tracks reconstructed in the MDC and isolated showers clustered in the EMC. These particles were organized as a point cloud [45] structure, i.e., an unordered and variable-sized set of points in a high-dimensional feature space. For each charged track, the features comprised the azimuthal and polar angles in the center-of-mass frame, charge, magnitude of momentum, and parameters characterizing its helical trajectory in the MDC. Additionally, low-level measurements from the MDC, TOF, and EMC were incorporated as implicit information for particle identification and reconstruction quality. For each shower, the features comprised the azimuthal and polar angles, energy deposition, count of fired crystals, time measurement, and parameters describing its expansion scope among nearby crystals. A full list of input features is provided in Table 3; these features were found to be consistent between data and MC simulation results.Variable Definition Track Shower θ azimuthal angle √ √ ϕ polar angle √ √ p momentum magnitude √ – $ {\mathtt{charge}} $ electric charge √ – $ d_{\rho} $ distance in the $ x-y $ plane from the point of closest approach (POCA) of the track to the IP√ – $ d_{Z} $ distance along z-axis from the POCA of the track to the IP √ – $ \phi_0 $ azimuthal angle of the track at the POCA √ – κ scaled inverse of the momentum of the track √ – $ \tan\lambda $ tangent value of the angle of the track at the POCA with respect to the $ x-y $ plane√ – $ \chi_{{\rm{d}}E/{\rm{d}}x}(e) $ deviation in $ {\rm{d}}E/{\rm{d}}x $ measurement of the track from the anticipated positron value√ – $ \chi_{{\rm{d}}E/{\rm{d}}x}(\pi) $ deviation in $ {\rm{d}}E/{\rm{d}}x $ measurement of the track from the anticipated pion value√ – $ \chi_{{\rm{d}}E/{\rm{d}}x}(K) $ deviation in $ {\rm{d}}E/{\rm{d}}x $ measurement of the track from the anticipated kaon value√ – $ \chi_{\rm{TOF}}(\pi) $ deviation in TOF measurement of the track from the anticipated pion value √ – $ \chi_{\rm{TOF}}(K) $ deviation in TOF measurement of the track from the anticipated kaon value √ – $ Q_{\rm{TOF}} $ pulse height of the track in TOF corresponding to its deposited energy √ – $ \beta_{\rm{TOF}} $ velocity of the track measured by TOF √ – $ E_{\rm{raw}} $ total deposited energy in EMC √ √ $ E_{\rm{seed}} $ deposited energy of the crystal at the shower center √ √ $ E_{3\times3} $ sum of energies of $ 3\times3 $ crystal matrices surrounding the shower center√ √ $ E_{5\times5} $ sum of energies of $ 5\times5 $ crystal matrices surrounding the shower center√ √ $ N_{\rm{hits}} $ count of fired crystals in the shower √ √ $ {\mathtt{time}} $ EMC time measurement √ √ $ {\mathtt{secondary}} $ $ {\mathtt{moment}} $ shower expansion parameter defined as $ \sum_{i}E_{i}r^2_{i}/\sum_{i}E_{i} $ √ √ $ {\mathtt{lateral}} $ $ {\mathtt{moment}} $ shower expansion parameter defined as $ \sum_{i=3}^{n} E_{i}r_{i}^2/(E_{1}r_{0}^2+E_{2}r_{0}^2+\sum_{i=3}^{n}E_{i}r_{i}^2) $ √ √ $ A_{20} $ $ {\mathtt{moment}} $ shower expansion parameter defined as $ \sum_{i}\dfrac{E_{i}}{E_{tot}}f_{2, 0}(\dfrac{r_{i}}{R_{0}}) $ ,$ f_{2, 0}(x)=2x^2-1 $ √ √ $ A_{42} $ $ {\mathtt{moment}} $ same but defined as $ \left|\sum_{i}\dfrac{E_{i}}{E_{tot}}f_{4, 2}(\dfrac{r_{i}}{R_{0}})e^{2i\phi}\right| $ ,$ f_{4, 2}(x)=4x^4-3x^2 $ √ √ Table 3. Input particle feature variables and their availability in tracks and showers. If measurements were unavailable from the detector readouts, the default value was set to zero.
The architecture of the DNN used in this study largely inherits from the Particle Transformer (ParT) [46], with adaptations tailored for the BESIII experiment. A key innovation of the ParT is the injection of particle physics knowledge into the Transformer network [47], where a set of pairwise interaction features are calculated using the energy-momentum vectors for each pair of particles and incorporated as biases U in the self-attention mechanism. This treatment efficiently captures the kinematic relationships between particles, significantly improving the ParT performance. For the BESIII application, we considered its unique properties, such as a purely spherical coordinate system and energy-momentum conservation in 3D space, by designing a new set of pairwise interaction features
$ {\bf{U}}_{\rm{BES}} $ defined as$ {\bf{U}}_{\rm{BES}}=\left\{ \begin{aligned} &1-\cos\theta_{a, b}, \\ &{\rm{min}}\left(E_a, \, E_b\right)\left(1-\cos\theta_{a, b}\right), \\ &{\rm{min}}\left(E_a, \, E_b\right)/\left(E_a+E_b\right), \\ &\left(E_a+E_b\right)^2-\lVert\vec{p}_a+\vec{p}_b\rVert^2, \end{aligned} \right. $
(7) where
$ E_a $ and$ E_b $ are the energies of the two particles,$ \vec{p}_a $ and$ \vec{p}_b $ are their momentum vectors, and$ \theta_{a, b} $ is the opening angle between these vectors in the center-of-mass frame. For the model hyperparameters, this adaptation includes four particle attention blocks and two class attention blocks. It uses a particle embedding encoded from the input particle features via a 3-layer multilayer perceptron with 128, 256, and 128 nodes in each layer and an interaction embedding encoded from the input pairwise features using a 4-layer pointwise 1D convolution with 64, 64, 64, and 8 channels. A dropout rate of 0.1 was applied to prevent overfitting.The training of the DNN was performed on the aforementioned dataset for 50 epochs with a batch size of 512 and an initial learning rate of 0.001. An additional treatment was employed during training to address the potential correlation between the DNN output and the
$ U_{\rm{miss}} $ distribution, which could inadvertently create signal-like structures in the$ U_{\rm{miss}} $ distribution of remaining background events after applying the selection criteria on the DNN output [48−51]. To decouple the DNN output from the$ U_{\rm{miss}} $ observable, we employed an iterative approach by assigning a weight$ \omega(U_{\rm{miss}}) $ to each background event which adjusts its contribution to the loss function. Events with higher weights have a greater impact on the loss function, which means that they are better classified during training. The weight$ \omega_i^j(U_{\rm{miss}}^j) $ for an event j in the i-th iteration is calculated as$ \omega_i^j(U_{\rm{miss}}^j)=\omega_{i-1}^j(U_{\rm{miss}}^j)\left[ \left.\frac{p^{\rm{BKG}}_{i-1}(U_{\rm{miss}})}{p^{\rm{BKG}}_{\rm{orig}}(U_{\rm{miss}})}\right|_{U_{\rm{miss}}=U_{\rm{miss}}^j} \right], $
(8) where
$ \omega_0^j(U_{\rm{miss}}^j)=1 $ ,$ p^{\rm{BKG}}_{i-1}(U_{\rm{miss}}) $ represents the normalized probability density function for the remaining background shape after the$ (i-1) $ -th iteration, and$ p^{\rm{BKG}}_{\rm{orig}}(U_{\rm{miss}}) $ denotes the original background shape before applying the DNN output veto. For each iteration, the DNN was re-trained, and both$ p^{\rm{BKG}}_{i-1}(U_{\rm{miss}}) $ and$ \omega_i^j(U_{\rm{miss}}^j) $ were updated. Intuitively, the process converged when$ p^{\rm{BKG}}_{i-1} (U_{\rm{miss}}) $ approached$ p^{\rm{BKG}}_{\rm{orig}}(U_{\rm{miss}}) $ , resulting in a trivial background shape that facilitates signal extraction. In practice, as shown in Fig. 3, this iterative approach partially corrects the$ D^+\to\pi^0 e^+\nu_e $ background shape while successfully retrieving the original one for other inclusive background shapes. This result demonstrates both its effectiveness and limitations in addressing the mass correlation issue. Furthermore, another machine-learning technique called model ensemble was employed. With some randomness factors incorporated in the training, such as network initialization, batch processing sequence, and dropout [52], we trained a total of 20 DNNs in parallel in each iteration. The outputs of these DNNs were averaged for each event during inference, creating an ensemble DNN that offers better robustness and generalization than any single DNN model.Figure 3. (color online) Normalized (a)
$D^+\to\pi^0 e^+\nu_e$ and (b) other inclusive background shapes in the$U_{\rm{miss}}$ distribution during the iteration process.The DNN output assigns three scores in the range [0, 1] to each event, reflecting the probabilities of the event belonging to the signal,
$ D^+\to\pi^0 e^+\nu_e $ background, and other background categories. Given that the sum of the three scores always equals one, only two of them are independent. As the final step in the event selection, we required the score for the$ D^+\to\pi^0 e^+\nu_e $ background to be less than 0.15 and that for other backgrounds to be less than 0.05. These cut values were optimized using the Punzi significance metric,$ \epsilon/(1.5 + \sqrt{B}) $ [53], where$ \epsilon $ represents the signal efficiency and B is the expected background yield. Figure 4(a) shows the$ U_{\rm{miss}} $ distribution after implementing the deep learning approach. The backgrounds were reduced by more than two orders of magnitude at the cost of approximately two-thirds of the signal efficiency, significantly enhancing the sensitivity of the signal search.Figure 4. (color online) (a)
$U_{\rm{miss}}$ distribution of the signal decay$D^+\to\gamma e^+\nu_e$ and (b)$U'_{\rm{miss}}$ distribution of the validation decay$D^+\to\pi^0 e^+\nu_e$ after the DNN vetoes. The black dots with error bars represent data. The histograms with filled colors represent different background components in the inclusive MC sample. The red hollow histogram indicates an arbitrarily scaled signal shape. -
According to Eq. (2), the DT efficiencies for each tag mode,
$ \epsilon^i_{\rm{DT}} $ , were estimated using signal MC samples; they are summarized in Table 2. To obtain the DT yield$ N_{\rm{DT}} $ in data, a maximum likelihood fit was performed on the$ U_{\rm{miss}} $ distribution combining all the tag modes. The fit included three components, namely the signal,$ D^+\to\pi^0 e^+\nu_e $ background, and other backgrounds. The shapes for these components were taken from the corresponding MC simulations, and their yields were all floated in the fit. The fit resulted in$ N_{\rm{DT}}=12.2\pm8.6 $ and${\cal{B}}(D^+\to\gamma e^+\nu_e)= (0.54\pm 0.38)\times 10^{-5}$ , as shown in Fig. 5(a), where the uncertainties are statistical only. Given that no significant signal was observed, an upper limit on the signal BF was set after accounting for all systematic uncertainties.Figure 5. (color online) Fits to (a) the
$U_{\rm{miss}}$ distribution of the signal decay$D^+\to\gamma e^+\nu_e$ and (b)$U'_{\rm{miss}}$ distribution of the validation decay$D^+\to\pi^0 e^+\nu_e$ . The points with error bars represent data. The blue line represents the fitted curve. The red dashed area represents a signal component, and other dashed lines represent background components. -
A common concern in adopting the deep learning method in particle physics experiments is the coupling between potential analysis biases and the sophisticated DNN structure, which makes these biases difficult to validate, quantify, and calibrate. The extensive and diverse datasets collected at BESIII facilitate various control samples in data to investigate these issues. In this study, our primary background process
$ D^+\to\pi^0e^+\nu_e $ served as a natural control sample. By partially reversing the DNN vetoes, i.e., to require the score for the$ D^+\to\pi^0 e^+\nu_e $ background to be greater than 0.15 while keeping the score for other backgrounds below 0.05, we could isolate a$ D^+\to\pi^0e^+\nu_e $ sample with a purity exceeding 98% and measure its absolute BF. Figures 2(b) and 4(b) show the$ U'_{\rm{miss}} $ distributions before and after applying the partially reversed DNN vetoes, with significant subtraction of backgrounds other than$ D^+\to\pi^0e^+\nu_e $ .To measure the absolute BF of
$ D^+\to\pi^0e^+\nu_e $ , we used the value of$ \epsilon^i_{\rm{DT}} $ estimated from MC simulations and extracted$ N_{\rm{DT}} $ via the maximum likelihood fit shown in Fig. 5(b). In the fit, the$ D^+\to\pi^0e^+\nu_e $ signal was modeled using the MC simulated shape convolved with a Gaussian resolution function. The other background component was described using the inclusive MC shape with the yield floated. The final fit to all data resulted in$ N_{\rm{DT}}= 3004\pm59 $ and$ {\cal{B}}(D^+\to \pi^0 e^+\nu_e)=(3.629\; \pm0.071)\; \times10^{-3} $ , where the uncertainty was statistical only. This result is consistent with previous BESIII measurements,$ (3.63\; \pm0.08_{\rm{stat.}}\; \pm 0.05_{\rm{syst.}})\times10^{-3} $ [54], based on approximately one seventh of full data, thereby validating the reliability of our DNN method. The residual deviations were considered a source of systematic uncertainty, as detailed in the next section. -
Systematic uncertainties in BF determination stem from various sources; many of them are related to the ST selection being canceled, benefiting from the DT technique. The remaining systematic sources can be grouped into two categories: multiplicative uncertainties, which affect the efficiency, and additive uncertainties, which impact the signal yield. These two types of uncertainties contribute to the upper limit of the BF through different approaches.
The multiplicative uncertainty sources include the ST yield, positron tracking, positron PID, photon reconstruction, MC model, and DNN vetoes, as summarized in Table 4. The uncertainty due to the ST yield was estimated by varying the descriptions of signal and background components in the
$ M_{\rm{BC}} $ fits and was set to 0.3%. A control sample of radiative Bhabha scattering events was used to study the position tracking and PID uncertainties, and the values for both were set to 1.0% [55]. The photon reconstruction uncertainty was set to 1.0% using a control sample of$ J/\psi\to\pi^+\pi^-\pi^0, \, \pi^0\to\gamma\gamma $ [56]. The uncertainty related to MC modeling was defined as the difference between the signal efficiencies of the nominal MC model and an alternative single-pole model [19]; it was set to 9.1%.Source Size (%) ST yield 0.3 Positron tracking 1.0 Positron PID 1.0 Photon reconstruction 1.0 MC model 9.1 DNN vetoes 5.8 Total 10.9 Table 4. Multiplicative systematic uncertainties.
Special consideration was given to systematic uncertainties related to the DNN vetoes [57]. Following some frontier surveys [13, 58], we considered two primary uncertainty sources: model uncertainty and domain shift. Model uncertainty arises from our limited knowledge of the best model and was quantified by the relative shift in signal efficiency across the 20 DNNs trained during the model ensemble process. The standard deviation of these efficiencies, divided by their average, 5.3%, was taken as the corresponding systematic uncertainty. Domain shift describes the mismatch between datasets for training and inference, reflecting potential data-MC inconsistencies. This uncertainty was evaluated using the control sample of
$ D^+\to\pi^0 e^+\nu_e $ described in Section VI, with the deviation between our BF measurements and previous BESIII measurements [54] calculated to be 2.2%, combining the center value shift and$ 1\sigma $ statistical uncertainty. In total, the multiplicative systematic uncertainty was determined to be 10.9% by adding the above sources in quadrature.The additive systematic uncertainties arise from the signal,
$ D^+\to\pi^0 e^+\nu_e $ background, and other background shapes used in the DT fit. These effects were evaluated by testing alternative descriptions of the shapes. One variation is in using parameterized functions, where the signal and$ D^+\to\pi^0 e^+\nu_e $ background shapes were modeled by double-Gaussian functions, and the other background shape was modeled by a third-order Chebyshev polynomial. Another change was the use of MC-simulated shapes before the final iteration of the mass decorrelation process as well as across the 20 DNN predictions in the model ensemble process. -
Given that no significant signal was observed in data, the upper limit on
$ {\cal{B}}(D^+\to\gamma e^+\nu_e) $ was set using a Bayesian method described in Ref. [59]. We performed a series of maximum likelihood fits as described in Section V, with the signal yield$ N_{\rm{DT}} $ fixed to each scanned value. The corresponding maximum likelihood values were used to construct a discrete likelihood distribution,$ {\cal{L}}({\cal{B}}_{\rm{sig}}) $ . The systematic uncertainties were incorporated in two steps: first, the additive uncertainties were addressed by varying the DT fit method, with the most conservative upper limit retained. Next, the multiplicative uncertainties were considered by smearing the likelihood distribution as follows [60]:$ {\cal{L}}'({\cal{B}}_{\rm{sig}})\propto\int^{1}_{0}{\cal{L}}\bigg({\cal{B}}_{\rm{sig}}\cdot\frac{\epsilon}{\epsilon_{0}}\bigg){\rm e}^{-\frac{(\epsilon-\epsilon_{0})^{2}}{2\sigma_{\epsilon}^{2}}}{\rm{d}}\epsilon, $
(9) where
$ \epsilon_{0} $ is the nominal signal efficiency, and$ \sigma_{\epsilon} $ is the multiplicative uncertainty corresponding to the efficiency value. By integrating the$ {\cal{L}}'({\cal{B}}_{\rm{sig}}) $ curve up to 90% of the area for$ N_{\rm{DT}}>0 $ and calculating the corresponding BF using Eq. (2), we set the upper limit on the BF of$ D^+\to\gamma e^+\nu_e $ at 90% C.L. to be$ 1.2\times10^{-5} $ , as shown in Fig. 6.Figure 6. (color online) Normalized likelihood distributions before and after incorporating the multiplicative and additive systematic uncertainties. The blue dashed line represents the initial distribution, whereas the red solid line represents the distribution with systematic uncertainties considered. The red arrow indicates the final upper limit.
-
We conducted an improved search for the radiative leptonic decay
$ D^+\to\gamma e^+ \nu_e $ using 20.3 fb–1 of$ e^+e^- $ collision data taken at$ \sqrt{s}=3.773 $ GeV with the BESIII detector. Our investigation does not reveal a significant signal but establishes an upper limit of$ 1.2\times10^{-5} $ on its BF with a radiative photon energy above 10 MeV, at 90% C.L. This result excludes most existing theoretical predictions [10, 16−20], thereby indicating the necessity for further investigation of the decay mechanism. The key advancement of this study is the application of a novel deep learning approach, which effectively distinguishes signals from massive backgrounds. Our results showcase the potential and broad applicability of deep learning techniques in particle physics research. -
The BESIII Collaboration thanks the staff of BEPCII and the IHEP computing center for their strong support.
Search for radiative leptonic decay D+→γe+νe using deep learning
- Received Date: 2025-03-20
- Available Online: 2025-08-15
Abstract: Using 20.3 fb–1 of