In this study, a DNN consisting of an input layer, two hidden layers, and an output layer is used, as shown in Fig. 1. The relationships between the input, hidden, and output layers are as follows:
Figure 1. (color online) Deep neural network with four inputs, two hidden layers, and one output. The number of hidden units is j and k for the first and second hidden layers, respectively. For each of the input and hidden layers, there is a bias term shown by a yellow circle.
$ z^{(1)}_j=S\left(\sum\limits_i w^{(1)}_{ji}x_i+w^{(1)}_{j0}\right), $ (1)
$ z^{(2)}_k=\sigma\left(\sum\limits_j w^{(2)}_{kj}z^{(1)}_j+w^{(2)}_{k0}\right), $ (2)
$ y_l=\sum\limits_k w^{(3)}_{lk}z^{(2)}_k+w^{(3)}_{l0}, $ (3)
where $ x_i $, $ z^{(1)}_j $, $ z^{(2)}_k $, and $ y_l $ are the inputs, the hidden units in the first hidden layer, the hidden units in the second hidden layer, and the output of the model, respectively. Here, $ {\boldsymbol{w}}=\{w^{(1)}_{ji},w^{(2)}_{kj},w^{(3)}_{lk},w^{(1)}_{j0},w^{(2)}_{k0}, w^{(3)}_{l0}\} $ are the parameters of the model, S is the sigmoid function [30],
$ S(x)=\frac{1}{1+e^{-x}}, $ (4)
and σ is the softmax function [30],
$ \sigma(x_k)=\frac{e^{x_k}}{\sum_{k'} e^{x_{k'}}}, $ (5)
which makes the sum of $ z^{(2)}_k $ over the second hidden layer equal to $ 1 $. The hyperbolic tangent [30],
$ \tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}, $ (6)
has also been tested as an activation function for the model. The key reasons for choosing the sigmoid and softmax functions as the activation functions can be found in Appendix A.
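To make Eqs. (1)–(5) concrete, the following is a minimal NumPy sketch of the forward pass for one nucleus, assuming 22 units per hidden layer (the width stated below) and a single output. The function names are hypothetical, and every parameter is drawn from a standard normal distribution purely for illustration; this is not the initialization scheme used in this study. Counting the array entries also reproduces the 639 parameters quoted below.

```python
import numpy as np


def sigmoid(x):
    # Eq. (4): S(x) = 1 / (1 + exp(-x)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Eq. (5); subtracting the maximum avoids overflow and leaves the result unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()


def init_params(n_in=4, n_hidden=22, n_out=1, seed=0):
    # Illustrative only: every weight and bias drawn from N(0, 1)
    rng = np.random.default_rng(seed)
    shapes = [
        (n_hidden, n_in), (n_hidden,),      # w^(1)_{ji}, w^(1)_{j0}
        (n_hidden, n_hidden), (n_hidden,),  # w^(2)_{kj}, w^(2)_{k0}
        (n_out, n_hidden), (n_out,),        # w^(3)_{lk}, w^(3)_{l0}
    ]
    return [rng.standard_normal(s) for s in shapes]


def forward(x, params):
    """Eqs. (1)-(3) for one input vector x = (Z, N, delta, P)."""
    W1, b1, W2, b2, W3, b3 = params
    z1 = sigmoid(W1 @ x + b1)   # Eq. (1): first hidden layer
    z2 = softmax(W2 @ z1 + b2)  # Eq. (2): second hidden layer, entries sum to 1
    return W3 @ z2 + b3         # Eq. (3): linear output y_l


def count_params(params):
    return sum(p.size for p in params)


params = init_params()
print(count_params(params))  # 22*4 + 22 + 22*22 + 22 + 22*1 + 1 = 639
print(forward(np.array([60.0, 84.0, 1.0, 20.0 / 12.0]), params))
```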
The inputs of the neural network are given by $ {\boldsymbol{x}}=\{Z,N,\delta,P\} $, that is, the proton number Z and neutron number N, together with information about nuclear pairing δ and the shell effect P. In particular, nuclear pairing is characterized by
$ \delta=\frac{(-1)^N+(-1)^Z}{2}, $ (7)
and hence $ \delta=1 $ for even-even nuclei, $ \delta=0 $ for odd-A nuclei, and $ \delta=-1 $ for odd-odd nuclei. The shell effect is characterized by
$ P=\frac{v_pv_n}{v_p+v_n}, $ (8)
where $ v_p $ ($ v_n $) is the difference between Z (N) and the closest magic number ($ 2 $, $ 8 $, $ 20 $, $ 28 $, $ 50 $, $ 82 $, $ 126 $, or $ 184 $) [31].

In this study, the finite-range droplet model (FRDM12) is used as the theoretical mass model [6]. FRDM12 is a macroscopic–microscopic model, which includes the finite-range liquid-drop model in Eq. (9) as the macroscopic model,
$ \begin{aligned}[b] E_{\rm mac}(A,Z)=&\; a_vA+a_sA^{2/3}+a_3A^{1/3}B_k+a_0A^0 \\ &+E_c-c_2Z^2A^{1/3}B_r-c_4\frac{Z^{4/3}}{A^{1/3}} \\ &-c_5Z^2\frac{B_wB_s}{B_1}+f_0\frac{Z^2}{A}-c_a(Z-N) \\&+W+E_{\rm{pairing}}-a_{el}Z^{2.39}, \end{aligned} $ (9)
and the folded-Yukawa single-particle potential as the microscopic corrections [6]. Compared with other theoretical mass models, FRDM12 has a relatively small RMS deviation between the experimental and theoretical masses: its overall RMS deviation is 0.603 MeV, whereas that of the RMF model is 2.269 MeV [11]. Furthermore, the FRDM model has been used to develop models for β-decay properties, such as the FRDM+QRPA model [3]. Therefore, FRDM12 improved by the neural network could in turn be used to improve these β-decay models.
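The input features of Eqs. (7) and (8) can be computed directly from Z and N. The sketch below assumes that $ v_p $ and $ v_n $ are absolute distances to the nearest magic number in the list given with Eq. (8), and it sets $ P=0 $ for doubly magic nuclei, where Eq. (8) would otherwise divide by zero; both choices, and all function names, are illustrative assumptions rather than the authors' code.

```python
MAGIC_NUMBERS = (2, 8, 20, 28, 50, 82, 126, 184)


def distance_to_magic(n):
    # Assumption: absolute distance to the closest magic number
    return min(abs(n - m) for m in MAGIC_NUMBERS)


def pairing_delta(Z, N):
    # Eq. (7): +1 for even-even, 0 for odd-A, -1 for odd-odd nuclei
    return ((-1) ** N + (-1) ** Z) / 2


def shell_term(Z, N):
    # Eq. (8): P = v_p v_n / (v_p + v_n)
    vp, vn = distance_to_magic(Z), distance_to_magic(N)
    if vp + vn == 0:
        return 0.0  # assumption: doubly magic nuclei get P = 0
    return vp * vn / (vp + vn)


def input_features(Z, N):
    # Network input x = {Z, N, delta, P}
    return (Z, N, pairing_delta(Z, N), shell_term(Z, N))


print(input_features(60, 84))  # 144Nd: v_p = 10, v_n = 2
```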
The neural network is trained to predict the mass residuals between the theoretical FRDM12 mass $ M_{l,\rm{FRDM12}} $ [6] and the experimental mass $ M_{l,\rm{exp}} $ [2],
$ t_l=M_{l,\rm{FRDM12}} - M_{l,\rm{exp}}. $ (10)
To reproduce these target residuals with the neural network, several hyperparameters of the model must be tuned. The mean-square error,
$ {\rm{mse}}({\boldsymbol{x}},{\boldsymbol{w}})= \sum\limits_l(t_l-y_l)^2, $ (11)
is used as the loss function, with $ y_l $ and $ t_l $ obtained from Eqs. (3) and (10), respectively. The Adam algorithm [32] with a learning rate of 0.01 is used to adjust the parameters $ {\boldsymbol{w}} $ to minimize the loss function; the key reasons for this choice are given in Appendix B. Each hidden layer consists of 22 hidden units and a bias (the key reasons are presented in Appendix C). Therefore, the model has $ 639 $ parameters in $ {\boldsymbol{w}} $.

For the initial parameters
$ {\boldsymbol{w}} $ before training, we consider three different initializers: the standard normal initializer, the Glorot normal initializer, and the zeros initializer [33]. The standard normal initializer draws the initial parameters from a standard normal distribution with a mean of $ 0 $ and a standard deviation of $ 1 $. The Glorot normal initializer draws the initial parameters from a normal distribution with a mean of $ 0 $ and a standard deviation of $ \sqrt{{2}/(f_{\rm{in}}+f_{\rm{out}})} $, where $ f_{\rm in} $ and $ f_{\rm out} $ are the numbers of input and output units of the layer, respectively [33]. The zeros initializer sets the initial parameters to zero [33]. In this study, the initial parameters $ {\boldsymbol{w}} $ are drawn from the standard normal distribution for $ \{w^{(1)}_{ji}, w^{(2)}_{kj}\} $ and from the Glorot normal distribution for $ \{w^{(1)}_{j0}, w^{(2)}_{k0}\} $ (the key reasons are presented in Appendix D).

The experimental data of this study are taken from the 2020 atomic mass evaluation (AME2020) [2]. There are 2457 nuclei in total with $ Z,N \geq 8 $. They are randomly divided into a training set and a validation set: 1966 nuclei (80%) are randomly selected for the training set, and the remaining 491 nuclei (20%) form the validation set.

To achieve a neural network with less variance in the prediction of mass,
$ 33 $ sets of hyperparameters with different regularizers and seed numbers are used for training. There are three different types of regularizers used in this study. The first type is without any regularizer. The second type is the L2 regularizer, which adds an extra term to the loss function [34],
$ L_{\rm{L2}}({\boldsymbol{w}})=\lambda \sum w^2, $ (12)
where λ is a hyperparameter controlling the strength of regularization. The third type is the orthogonal regularizer, which encourages the basis vectors of the output space of a layer to be mutually orthogonal [34]; its strength is likewise controlled by the hyperparameter λ.
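The loss of Eq. (11) and the two penalty terms can be sketched in NumPy as follows. The L2 term follows Eq. (12). For the orthogonal regularizer no formula is given here, so the version below is an assumed form: it normalizes each row of a weight matrix (one row per unit, matching the $ W x $ convention of Eqs. (1) to (3)) and penalizes the mean absolute cosine similarity between distinct rows, which vanishes when the rows are mutually orthogonal. Deep-learning libraries may define the orthogonal regularizer differently, so this should not be read as any particular library's implementation. Eq. (11) is written as a sum; dividing by the number of nuclei to obtain a true mean rescales the loss by a constant factor, which also changes the relative weight of the penalty terms.

```python
import numpy as np


def residuals(m_frdm, m_exp):
    # Eq. (10): t_l = M_FRDM12 - M_exp
    return np.asarray(m_frdm, dtype=float) - np.asarray(m_exp, dtype=float)


def squared_error_loss(t, y):
    # Eq. (11) as written: sum over nuclei of (t_l - y_l)^2
    return float(np.sum((np.asarray(t) - np.asarray(y)) ** 2))


def l2_penalty(weights, lam):
    # Eq. (12): lambda times the sum of all squared weights
    return lam * sum(float(np.sum(np.square(w))) for w in weights)


def orthogonal_penalty(W, lam):
    # Assumed form (needs >= 2 nonzero rows): rows of W are unit-normalized,
    # so off-diagonal Gram entries are cosine similarities between rows.
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    gram = Wn @ Wn.T
    off_diag = gram - np.diag(np.diag(gram))
    n = W.shape[0]
    # Mean absolute off-diagonal entry, scaled by lambda
    return lam * float(np.sum(np.abs(off_diag))) / (n * (n - 1))


def total_loss(t, y, weights, lam, kind="none"):
    # kind: "none", "l2", or "orthogonal" (the three regularizer types)
    loss = squared_error_loss(t, y)
    if kind == "l2":
        loss += l2_penalty(weights, lam)
    elif kind == "orthogonal":
        loss += sum(orthogonal_penalty(w, lam) for w in weights)
    return loss
```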
In the following calculations, 20000 epochs are run for each training, and the smallest RMS deviation between $ t_l $ and $ y_l $ for the validation set over all epochs is used to measure the performance of each set of hyperparameters. Only the sets of hyperparameters giving an RMS deviation smaller than about 0.241 MeV for the validation set (a reduction in RMS deviation of over 60% relative to the FRDM12 value of 0.603 MeV) are selected. The average prediction of the selected sets of hyperparameters is the model output y. Subtracting the model output y from the theoretical FRDM12 mass $ M_{l,\rm{FRDM12}} $ gives the model mass prediction for each nucleus.
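The selection and averaging step described above can be sketched as follows. The 60% criterion corresponds to a validation RMS deviation below $ 0.4\times0.603\approx0.241 $ MeV. The array layout, the function names, and the use of the population standard deviation (NumPy's default `ddof=0`) for the spread across sets are assumptions made for illustration. Because $ t_l $ is defined as FRDM12 minus experiment in Eq. (10), the corrected mass is obtained by subtracting the averaged network output from the FRDM12 mass.

```python
import numpy as np

SIGMA_FRDM12 = 0.603  # MeV, RMS deviation of FRDM12 quoted in the text


def rms(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


def best_validation_rms(history):
    # Smallest validation RMS over all epochs of one training run
    return float(np.min(history))


def select_and_average(predictions, val_rms, min_reduction=0.60, baseline=SIGMA_FRDM12):
    """predictions: shape (n_sets, n_nuclei), network outputs y of each set.
    val_rms: best validation RMS (MeV) of each set."""
    predictions = np.asarray(predictions, dtype=float)
    threshold = (1.0 - min_reduction) * baseline  # about 0.241 MeV
    keep = np.asarray(val_rms) < threshold
    chosen = predictions[keep]
    # Mean over selected sets is the model output y; the spread gives the error bar
    return chosen.mean(axis=0), chosen.std(axis=0), int(keep.sum())


def corrected_mass(m_frdm, y_mean):
    # t = M_FRDM12 - M_exp and y approximates t, so M_exp is estimated by M_FRDM12 - y
    return np.asarray(m_frdm, dtype=float) - y_mean
```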
Of the 33 sets of hyperparameters, seven achieve a reduction in RMS deviation greater than 60% for the validation set, as shown in Fig. 2. Compared with the model without a regularizer (panel (a)), the different regularizers either speed up or slow down the training of the model. Of the 33 sets, three are without regularizers, and only one of them achieves a reduction of over 60%. Of the 12 sets using the orthogonal regularizer, four achieve a reduction of over 60%. Of the nine sets using the L1 regularizer and the nine sets using the L2 regularizer, none and two, respectively, achieve a reduction of over 60%. This shows that the choice of regularizer affects the model. Moreover, because the seed number affects both the random numbers generated during training and the initial values of the parameters, different seed numbers are tried for the model. As shown in Table 1, the average RMS deviations are reduced from 0.603 MeV to 0.200 MeV and 0.232 MeV for the training and validation sets, respectively, corresponding to improvements of 66.8% and 61.5%. This shows that, after tuning the hyperparameters, the neural network can improve the accuracy of the FRDM12 model.
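As a quick arithmetic check, the averages and percentage reductions in Table 1 follow from the per-set values via $ \Delta\sigma = (0.603 - \sigma)/0.603\times100\% $, with the reductions for the averages computed from the rounded average deviations. The short script below recomputes them; the variable names are illustrative.

```python
SIGMA_FRDM12 = 0.603  # MeV

# Per-set RMS deviations (MeV) from Table 1
sigma_training = [0.175, 0.204, 0.216, 0.212, 0.176, 0.219, 0.201]
sigma_validation = [0.228, 0.226, 0.235, 0.218, 0.238, 0.238, 0.238]


def reduction_percent(sigma, baseline=SIGMA_FRDM12):
    # Relative reduction of the RMS deviation with respect to FRDM12, in percent
    return 100.0 * (baseline - sigma) / baseline


mean_training = round(sum(sigma_training) / len(sigma_training), 3)
mean_validation = round(sum(sigma_validation) / len(sigma_validation), 3)

print(mean_training, mean_validation)  # 0.2 0.232
print(round(reduction_percent(mean_training), 1),
      round(reduction_percent(mean_validation), 1))  # 66.8 61.5
```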
Figure 2. (color online) RMS deviations of the training and validation sets with different sets of hyperparameters. Only the seven sets of hyperparameters with a reduction in RMS deviation larger than 60% are shown: (a) no regularizer, seed $ =3 $; (b) L2 regularizer, $ \lambda=0.00001 $, seed $ =1 $; (c) L2 regularizer, $ \lambda=0.00001 $, seed $ =2 $; (d) orthogonal regularizer, $ \lambda=0.1 $, seed $ =2 $; (e) orthogonal regularizer, $ \lambda=0.01 $, seed $ =1 $; (f) orthogonal regularizer, $ \lambda=0.001 $, seed $ =1 $; and (g) orthogonal regularizer, $ \lambda=0.001 $, seed $ =2 $. The blue and orange curves represent the RMS deviations of the training and validation sets, respectively. The arrow points to the lowest RMS deviation of the validation set within the 20000 epochs.

Table 1. RMS deviations of nuclear mass between the experimental data from AME2020 [2] and the model predictions. The original RMS deviation between the experimental data and FRDM12 [6] is $ 0.603 $ MeV. For each set of hyperparameters, the RMS deviations for the training and validation sets are denoted by $ \sigma_{\rm training} $ and $ \sigma_{\rm validation} $, respectively. The last two columns show the reduction in RMS deviation from the original deviation of $ 0.603 $ MeV to $ \sigma_{\rm training} $ and $ \sigma_{\rm validation} $, respectively.

| Hyperparameters | $ \sigma_{\rm training} $/MeV | $ \sigma_{\rm validation} $/MeV | $ \Delta\sigma_{\rm training} $ (%) | $ \Delta\sigma_{\rm validation} $ (%) |
|---|---|---|---|---|
| No regularizer, seed $ =3 $ | 0.175 | 0.228 | 71.0 | 62.2 |
| L2 regularizer, $ \lambda=0.00001 $, seed $ =1 $ | 0.204 | 0.226 | 66.2 | 62.5 |
| L2 regularizer, $ \lambda=0.00001 $, seed $ =2 $ | 0.216 | 0.235 | 64.2 | 61.0 |
| Orthogonal regularizer, $ \lambda=0.1 $, seed $ =2 $ | 0.212 | 0.218 | 64.8 | 63.8 |
| Orthogonal regularizer, $ \lambda=0.01 $, seed $ =1 $ | 0.176 | 0.238 | 70.8 | 60.5 |
| Orthogonal regularizer, $ \lambda=0.001 $, seed $ =1 $ | 0.219 | 0.238 | 63.7 | 60.5 |
| Orthogonal regularizer, $ \lambda=0.001 $, seed $ =2 $ | 0.201 | 0.238 | 66.7 | 60.5 |
| Average of the above sets | 0.200 | 0.232 | 66.8 | 61.5 |

For most isotopes and isotones, the neural network model gives good mass predictions. For example, as shown in Fig. 3, the average of the seven selected sets of hyperparameters performs considerably better overall in the mass difference than the FRDM12 model for the Nd isotopes and
$ N=66 $ isotones. These isotopes and isotones are chosen because they contain enough nuclei that are not included in the training set; the training-set data are indicated by the gray-hatched regions in Fig. 3. In general, the mass prediction in the training set is better than that in the validation set. For regions far from the training set, such as $ N<68 $ for Nd isotopes and $ Z>59 $ for $ N=66 $ isotones, the mass difference is still within the mass uncertainty shown by the yellow-hatched regions; therefore, the neural network model still performs well in these regions. However, the error bars of the neural network model are larger in regions far from the training set, which indicates that the mass predictions from different sets of hyperparameters vary considerably there. For this reason, multiple sets of hyperparameters are selected: averaging the predictions of the seven sets provides both the mass prediction and its uncertainty for each nucleus. The larger the error bar for a nucleus, the less confidence we have in its mass prediction.

Figure 3. (color online) Mass difference between the DNN mass and the experimental data, $ M_{\rm DNN} - M_{\rm exp} $, for (a) Nd isotopes and (b) $ N=66 $ isotones. The blue curves represent the mass difference between the FRDM12 model and the experimental data. The semi-transparent curves in the background represent the mass difference between the FRDM12+DNN mass model and the experimental data for each of the seven sets of hyperparameters. The red curves with error bars represent the mass difference for the average of the FRDM12+DNN mass model. The error bars are calculated from the standard deviation of the mass difference over the $ 7 $ sets of hyperparameters. The gray-hatched regions represent the training set. The yellow-hatched regions represent the mass uncertainties, obtained by including the average RMS deviations of the seven sets of hyperparameters together with the experimental uncertainties.

Furthermore, there are several nuclei whose masses from the FRDM12 model differ significantly from the experimental data; for example, the mass difference can be up to
$ 2.5 $ MeV. It is interesting to examine whether the neural network can significantly improve the mass predictions for these specific nuclei.

For Cr isotopes, shown in Fig. 4, FRDM12 does not give a good mass prediction for the region $ N<22 $. In this region, the neural network model still provides a better mass prediction than the FRDM12 model. Note that the mass differences of the two nuclei with $ N=17 $ and $ 18 $ are not within the yellow-hatched region. Although the mass prediction for the nucleus with $ N=18 $ is not as good as those for other nuclei, the neural network model improves the mass difference from $ -2.1 $ to $ -0.6 $ MeV. For the nucleus with $ N=17 $, the mass difference improves from $ -2.5 $ to $ -2 $ MeV, although the error bar for this nucleus is too large to draw a firm conclusion. This shows that the neural network model can improve mass predictions even where the original theoretical model is inaccurate.

Figure 4. (color online) Same as Fig. 3, but for Cr isotopes.
Finally, we examine whether there are any nuclei for which the mass prediction becomes worse after applying the neural network model. Out of the $ 3456 $ nuclei, there are $ 101 $ nuclei for which the DNN predictions lie outside the yellow-hatched area and have a larger mass difference than the original FRDM12 model. Among these $ 101 $ nuclei, $ 92 $ have proton numbers less than $ 60 $, $ 63 $ are neutron-rich or proton-rich nuclei, and, for the remaining cases, $ 16 $ might have worse predictions because of odd-even staggering. For example, the Ba isotopes in Fig. 5 show good agreement in the mass prediction for the training set and in the large-neutron-number region. However, for $ N=60 $ and $ 62 $, the neural network gives worse predictions than the FRDM12 model. This may occur for two reasons. First, the nuclei with $ N=60 $ and $ 62 $ are in the validation set and are not involved in the training of the model, and the overall performance for nuclei in the validation set is not as good as that for nuclei in the training set. Second, the neural network model still cannot capture the odd-even staggering in light nuclei. Therefore, these two even-even nuclei do not exhibit better mass predictions after applying the neural network, whereas the nearby even-odd nuclei in the training set do.

Figure 5. (color online) Same as Fig. 3, but for Ba isotopes.
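The criterion used above to flag nuclei whose prediction becomes worse can be written compactly. This sketch assumes that the yellow band is centered on zero and that its half-width combines the average RMS deviation and the experimental uncertainty in quadrature; how the two are combined, and whether the training or validation average is used, is not specified in the text, so these choices and the function names are assumptions.

```python
import numpy as np


def band_halfwidth(avg_rms, exp_unc):
    # Assumption: quadrature combination of model RMS and experimental uncertainty
    return np.sqrt(avg_rms ** 2 + np.asarray(exp_unc, dtype=float) ** 2)


def worse_than_frdm(diff_dnn, diff_frdm, halfwidth):
    """Flag nuclei whose DNN mass difference lies outside the band AND is larger
    in magnitude than the FRDM12 mass difference (both relative to experiment)."""
    diff_dnn = np.abs(np.asarray(diff_dnn, dtype=float))
    diff_frdm = np.abs(np.asarray(diff_frdm, dtype=float))
    outside_band = diff_dnn > halfwidth
    larger_than_frdm = diff_dnn > diff_frdm
    return outside_band & larger_than_frdm


hw = band_halfwidth(0.232, [0.0, 0.0, 0.0])
flags = worse_than_frdm([0.1, 0.5, 0.5], [0.05, 0.3, 0.8], hw)
print(flags)  # only the second nucleus is flagged
```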
Nuclear mass predictions based on a deep neural network and finite-range droplet model (2012)
- Received Date: 2023-06-07
- Available Online: 2024-02-15
Abstract: A neural network with two hidden layers is developed for nuclear mass prediction, based on the finite-range droplet model (FRDM12). Different hyperparameters, including the number of hidden units, choice of activation functions, initializers, and learning rates, are adjusted explicitly and systematically. The resulting mass predictions are obtained by averaging the predictions given by several different sets of hyperparameters with different regularizers and seed numbers. This provides not only the average values of the mass predictions but also reliable estimates of the mass prediction uncertainties. The overall root-mean-square deviations of nuclear mass are reduced from 0.603 MeV for the FRDM12 model to 0.200 MeV and 0.232 MeV for the training and validation sets, respectively.