3 Results and discussion
3.1 Comparison with SOTA DTA prediction models in classification tasks
Table 2 summarizes the quantitative results. For the Human dataset, the proposed method yielded significantly higher precision than the other DTA prediction methods. For the C. elegans dataset, the proposed method achieved considerable improvements in both precision and recall. These results reveal MGraphDTA's potential for molecular representation learning in drug discovery. In addition, we observed that replacing the CNN with the MCNN yields a slight further improvement, which corroborates the efficacy of the proposed MCNN.
Dataset | Model | Precision | Recall | AUC
---|---|---|---|---
Human | GNN-CNN | 0.923 | 0.918 | 0.970
Human | TrimNet-CNN | 0.918 | 0.953 | 0.974
Human | GraphDTA | 0.882 (0.040) | 0.912 (0.040) | 0.960 (0.005)
Human | DrugVQA (VQA-seq) | 0.897 (0.004) | 0.948 (0.003) | 0.964 (0.005)
Human | TransformerCPI | 0.916 (0.006) | 0.925 (0.006) | 0.973 (0.002)
Human | MGNN-CNN (ours) | 0.953 (0.006) | 0.950 (0.004) | 0.982 (0.001)
Human | MGNN-MCNN (ours) | 0.955 (0.005) | 0.956 (0.003) | 0.983 (0.003)
C. elegans | GNN-CNN | 0.938 | 0.929 | 0.978
C. elegans | TrimNet-CNN | 0.946 | 0.945 | 0.987
C. elegans | GraphDTA | 0.927 (0.015) | 0.912 (0.023) | 0.974 (0.004)
C. elegans | TransformerCPI | 0.952 (0.006) | 0.953 (0.005) | 0.988 (0.002)
C. elegans | MGNN-CNN (ours) | 0.979 (0.005) | 0.961 (0.002) | 0.991 (0.002)
C. elegans | MGNN-MCNN (ours) | 0.980 (0.004) | 0.967 (0.005) | 0.991 (0.001)
For the regression task on the filtered Davis dataset, we compared the proposed MGraphDTA with the SOTA methods on this dataset, namely MDeePred, 21 CGKronRLS, 63 and DeepDTA. 2 Following MDeePred, we used root mean square error (RMSE, lower is better), CI, and the Spearman rank correlation (higher is better) as performance indicators. The whole dataset was randomly divided into six parts; five of them were used for five-fold cross-validation and the remaining part served as the independent test set. The final performance was evaluated on the independent test set, again following MDeePred. Note that the data points in each fold are exactly the same as in MDeePred for a fair comparison.
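The splitting protocol above can be sketched in a few lines (a minimal numpy sketch; the function name and the random assignment are illustrative — a strict reproduction would reuse MDeePred's published fold indices rather than re-randomize):

```python
import numpy as np

def make_cv_splits(n_samples, n_parts=6, seed=0):
    """Randomly divide sample indices into six equal parts: five parts
    become cross-validation folds and the last part is the held-out
    independent test set (function name and seed are illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    parts = np.array_split(idx, n_parts)
    return parts[:-1], parts[-1]  # (five CV folds, independent test set)

folds, test = make_cv_splits(600)
```

Every sample lands in exactly one fold or the test set, so the CV folds never overlap the independent evaluation data.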
Tables 3 and 4 summarize the predictive performance of MGraphDTA and previous models on the Davis, KIBA, and Metz datasets. The graph-based methods surpassed CNN-based and recurrent neural network (RNN)-based methods, which demonstrates the potential of graph neural networks for DTA prediction. Because CNN-based and RNN-based models represent compounds as strings, their predictive capability may be weakened by the loss of the molecule's structural information. In contrast, graph-based methods represent compounds as graphs and capture dependencies via message passing between vertices. Compared with the other graph-based methods, MGraphDTA achieved the best performance, as shown in Tables 3 and 4. A paired Student's t-test shows that the differences between MGraphDTA and the other graph-based methods are statistically significant on the Metz dataset (p < 0.05). Moreover, MGraphDTA was significantly better than the traditional PCM models on all three datasets (p < 0.01). It is worth noting that FNN was superior to the other traditional PCM models (p < 0.01), which is consistent with previous studies. 11,12 Table 5 summarizes the results of the four methods on the filtered Davis dataset, where MGraphDTA achieved the lowest RMSE. Overall, MGraphDTA showed impressive results on four benchmark datasets, significantly exceeding other SOTA DTA prediction models, which supports the validity of the proposed MGraphDTA.
Baseline results are taken from the original DeepDTA, WideDTA, GraphDTA, and DeepAffinity studies; "—" denotes results not reported in the original studies.

Model | Proteins | Compounds | Davis MSE | Davis CI | Davis r index | KIBA MSE | KIBA CI | KIBA r index
---|---|---|---|---|---|---|---|---
DeepDTA | CNN | CNN | 0.261 | 0.878 | 0.630 | 0.194 | 0.863 | 0.673 |
WideDTA | CNN + PDM | CNN + LMCS | 0.262 | 0.886 | — | 0.179 | 0.875 | — |
GraphDTA | CNN | GCN | 0.254 | 0.880 | — | 0.139 | 0.889 | — |
GraphDTA | CNN | GAT | 0.232 | 0.892 | — | 0.179 | 0.866 | — |
GraphDTA | CNN | GIN | 0.229 | 0.893 | — | 0.147 | 0.882 | — |
GraphDTA | CNN | GAT–GCN | 0.245 | 0.881 | — | 0.139 | 0.891 | — |
DeepAffinity | RNN | RNN | 0.253 | 0.900 | — | 0.188 | 0.842 | — |
DeepAffinity | RNN | GCN | 0.260 | 0.881 | — | 0.288 | 0.797 | — |
DeepAffinity | CNN | GCN | 0.657 | 0.737 | — | 0.680 | 0.576 | — |
DeepAffinity | HRNN | GCN | 0.252 | 0.881 | — | 0.201 | 0.842 | — |
DeepAffinity | HRNN | GIN | 0.436 | 0.822 | — | 0.445 | 0.689 | — |
KronRLS | SW | PS | 0.379 | 0.871 | 0.407 | 0.411 | 0.782 | 0.342 |
SimBoost | SW | PS | 0.282 | 0.872 | 0.655 | 0.222 | 0.836 | 0.629 |
RF | ECFP | PSC | 0.359 (0.003) | 0.854 (0.002) | 0.549 (0.005) | 0.245 (0.001) | 0.837 (0.000) | 0.581 (0.000) |
SVM | ECFP | PSC | 0.383 (0.002) | 0.857 (0.001) | 0.513 (0.003) | 0.308 (0.003) | 0.799 (0.001) | 0.513 (0.004) |
FNN | ECFP | PSC | 0.244 (0.009) | 0.893 (0.003) | 0.685 (0.015) | 0.216 (0.010) | 0.818 (0.005) | 0.659 (0.015) |
MGraphDTA | MCNN | MGNN | 0.207 (0.001) | 0.900 (0.004) | 0.710 (0.005) | 0.128 (0.001) | 0.902 (0.001) | 0.801 (0.001) |
Model | Proteins | Compounds | MSE | CI | r index |
---|---|---|---|---|---|
DeepDTA | CNN | CNN | 0.286 (0.001) | 0.815 (0.001) | 0.678 (0.003) |
GraphDTA | CNN | GCN | 0.282 (0.007) | 0.815 (0.002) | 0.679 (0.008) |
GraphDTA | CNN | GAT | 0.323 (0.003) | 0.800 (0.001) | 0.625 (0.010) |
GraphDTA | CNN | GIN | 0.313 (0.002) | 0.803 (0.001) | 0.632 (0.001) |
GraphDTA | CNN | GAT–GCN | 0.282 (0.011) | 0.816 (0.004) | 0.681 (0.026) |
RF | ECFP | PSC | 0.351 (0.002) | 0.793 (0.001) | 0.565 (0.001) |
SVM | ECFP | PSC | 0.361 (0.001) | 0.794 (0.000) | 0.590 (0.001) |
FNN | ECFP | PSC | 0.316 (0.001) | 0.805 (0.001) | 0.660 (0.003) |
MGraphDTA | MCNN | MGNN | 0.265 (0.002) | 0.822 (0.001) | 0.701 (0.001) |
Baseline results are taken from MDeePred.

Model | RMSE | CI | Spearman
---|---|---|---
MDeePred | 0.742 (0.009) | 0.733 (0.004) | 0.618 (0.009) |
CGKronRLS | 0.769 (0.010) | 0.740 (0.003) | 0.643 (0.008) |
DeepDTA | 0.931 (0.015) | 0.653 (0.005) | 0.430 (0.013) |
MGraphDTA | 0.695 (0.009) | 0.740 (0.002) | 0.654 (0.005) |
(1) Orphan–target split: each protein in the test set is unavailable in the training set.
(2) Orphan–drug split: each drug in the test set is inaccessible in the training set.
(3) Cluster-based split: compounds in the training and test sets are structurally different ( i.e. , the two sets have guaranteed minimum distances in terms of structure similarity). We used Jaccard distance on binarized ECFP4 features to measure the distance between any two compounds following the previous study. 12 Single-linkage clustering 12 was applied to find a clustering with guaranteed minimum distances between any two clusters.
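The cluster-based split can be sketched with SciPy (the toy fingerprints below are hypothetical stand-ins for binarized ECFP4 features, which in practice would come from e.g. RDKit; cutting a single-linkage dendrogram at distance t guarantees that any two resulting clusters are at least Jaccard distance t apart):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy binary fingerprints standing in for binarized ECFP4 features
# (hypothetical data; real fingerprints would come from e.g. RDKit).
rng = np.random.default_rng(0)
fps = rng.integers(0, 2, size=(20, 64)).astype(bool)

# Pairwise Jaccard distances between compounds.
dists = pdist(fps, metric="jaccard")

# Single-linkage clustering; cutting the dendrogram at distance t
# guarantees a minimum Jaccard distance of t between any two clusters.
Z = linkage(dists, method="single")
labels = fcluster(Z, t=0.4, criterion="distance")
```

Assigning whole clusters to either the training or the test set then enforces the guaranteed minimum structural distance between the two sets.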
Given that DTA prediction models are typically used to discover drugs or targets that are absent from the training set, the orphan splits provide realistic and more challenging evaluation schemes. The cluster-based split further prevents the structural information of compounds from leaking into the test set. We compared the proposed MGraphDTA with GraphDTA and three traditional PCM models (RF, SVM, and FNN). For a fair comparison, we replaced the MGNN in MGraphDTA with GCN, GAT, GIN, and GAT–GCN using the source code provided by GraphDTA and the hyper-parameters they reported. We used a five-fold cross-validation strategy to analyze model performance. In each fold, all methods shared the same training, validation, and test sets; the experimental settings were identical for all eight methods.
Fig. 7 shows the experimental results for the eight methods under the orphan-based and cluster-based split settings. Compared with the results under the random split setting shown in Tables 3 and 4, model performance decreased greatly in the orphan-based and cluster-based splits. Furthermore, as shown in Fig. 7(a) and (c), the MSEs of MGraphDTA on the Davis, KIBA, and Metz datasets using the orphan–drug split were 0.572 ± 0.088, 0.390 ± 0.023, and 0.555 ± 0.043, respectively, while those using the cluster-based split were 0.654 ± 0.207, 0.493 ± 0.097, and 0.640 ± 0.078, respectively. In other words, the cluster-based split is more challenging for a DTA prediction model than the orphan–drug split, consistent with the fact that the cluster-based split prevents the structural information of compounds from leaking into the test set. These results suggest that improving the generalization ability of DTA models remains a challenge. From Fig. 7(a), we observed that MGNN significantly exceeded the other methods on the Davis dataset using the orphan–drug split setting (p < 0.01). On the other hand, there were no statistically significant differences between MGNN, GAT, and RF (p > 0.05) on the KIBA dataset, while these three methods significantly surpassed the other methods (p < 0.01). In addition, the SVM and FNN methods were significantly superior to the other methods on the Metz dataset (p < 0.01). Overall, the traditional PCM models showed impressive results that even surpassed the graph-based methods on the KIBA and Metz datasets using the orphan–drug split setting, as shown in Fig. 7(a). These results suggest that simple feature-based methods such as RF may suffice in this scenario, which is consistent with a recent study. 64
Since the number of drugs in the Davis dataset is significantly smaller than in the KIBA and Metz datasets, as shown in Table 1, the generalization ability of a model trained on limited drugs cannot be guaranteed for unseen drugs. Fig. 8 shows the correlations between predicted values and ground truths for the five graph-based models on the Davis dataset using the orphan–drug split. The range of MGNN's predictions was broader than that of the other graph-based models, as shown in Fig. 8(a). We also noticed that the ground truths and predictions of MGNN had the most similar distributions, as shown in Fig. 8(b). The Pearson correlation coefficients of GCN, GAT, GIN, GAT–GCN, and MGNN for DTA prediction were 0.427, 0.420, 0.462, 0.411, and 0.552, respectively. These results further confirm that MGNN has the potential to increase the generalization ability of a DTA model. From Fig. 7(b), we observed that MGNN significantly outperformed the other models on all three datasets using the orphan–target split setting (p < 0.01). MGNN also significantly exceeded the other methods on the KIBA and Metz datasets using the cluster-based split setting, as shown in Fig. 7(c) (p < 0.05). It is worth noting that the graph-based methods outperformed the traditional PCM models under the random split setting, as shown in Tables 3 and 4, while their superiority was less obvious under the orphan-based and cluster-based split settings, as shown in Fig. 7. Overall, the results show the robustness of MGNN across different split schemes and indicate that both local and nonlocal properties of a given molecule are essential for a GNN to make accurate predictions.
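For reference, the Pearson correlation coefficients quoted above can be computed directly from predictions and ground truths (an illustrative helper, not the paper's code):

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between ground-truth affinities and model
    predictions (illustrative helper, not the paper's code)."""
    return float(np.corrcoef(np.asarray(y_true, float),
                             np.asarray(y_pred, float))[0, 1])

# A perfectly linear relationship gives r = 1.0.
r = pearson_r([5.0, 6.0, 7.0, 8.0], [5.2, 6.2, 7.2, 8.2])
```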
Comparisons of MGNN and the other seven models on the Davis, KIBA, and Metz datasets in terms of MSE, CI, and r index (from left to right) using the (a) orphan–drug, (b) orphan–target, and (c) cluster-based split settings.
(a) Scatter and (b) kernel density estimate plots of predicted versus ground-truth binding affinities on the Davis dataset using the orphan–drug split setting.
Model | RMSE | CI | Spearman |
---|---|---|---|
Without dense connection | 0.726 (0.008) | 0.726 (0.008) | 0.620 (0.019) |
Without batch normalization | 0.746 (0.032) | 0.719 (0.014) | 0.604 (0.008) |
MGraphDTA | 0.695 (0.009) | 0.740 (0.002) | 0.654 (0.005) |
Furthermore, an ablation study was performed on the filtered Davis dataset to investigate the effect of the MCNN's receptive field on performance. Specifically, we increased the receptive field gradually by using convolutional layers with progressively larger kernels (i.e., 7, 15, 23, 31). From the results shown in Table 7, model performance decreased slightly as the receptive field increased. Since usually only a few residues are involved in a protein–ligand interaction, 65 enlarging the receptive field to cover more regions may introduce noise from portions of the sequence that are not involved in binding.
Max receptive field | RMSE | CI | Spearman |
---|---|---|---|
31 | 0.718 (0.002) | 0.732 (0.005) | 0.636 (0.013) |
23 | 0.713 (0.008) | 0.732 (0.004) | 0.635 (0.008) |
15 | 0.710 (0.006) | 0.734 (0.005) | 0.639 (0.011) |
7 | 0.695 (0.009) | 0.740 (0.002) | 0.654 (0.005) |
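The receptive-field values in Table 7 can be checked with the standard formula for stacked convolutions (a generic sketch; the layer configurations used as examples here are assumptions, not MCNN's exact architecture — with stride 1 each layer widens the field by k − 1):

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of stacked 1D convolutions (standard formula;
    the layer configurations fed to it are illustrative, not MCNN's
    exact architecture). Stride defaults to 1 for every layer."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= s             # jump tracks the cumulative stride
    return rf
```

A single stride-1 convolution sees exactly its kernel width, so a lone kernel of 7 matches the smallest setting in Table 7; stacking three kernel-3 layers also covers 7 positions.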
Distribution of activation values of the last layers in the ligand and protein encoders on the Davis, filtered Davis, KIBA, and Metz datasets.
(1) Visualizing the MGNN model based on Grad-AAM.
(2) Visualizing the GAT model based on Grad-AAM.
(3) Visualizing the GAT model based on its graph attention mechanism.
Specifically, we first replaced the MGNN with a two-layer GAT, in which the first graph convolution layer had ten parallel attention heads, using the source code provided by GraphDTA. 31 We then trained the MGNN-based and GAT-based DTA prediction models under the five-fold cross-validation strategy with the random split setting. Finally, we calculated atom importance using Grad-AAM and the graph attention mechanism, and rendered the probability map using RDKit. 48
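The core of the Grad-AAM computation follows the Grad-CAM recipe: channel weights come from gradients of the prediction with respect to the last GNN layer's node activations. A numpy sketch under that assumption (the inputs and the min-max style normalization are illustrative, not the paper's exact implementation):

```python
import numpy as np

def grad_aam(node_activations, node_gradients):
    """Grad-CAM-style atom importance. `node_activations` are feature
    maps A of the last GNN layer, shape (n_atoms, n_channels);
    `node_gradients` are dy/dA for the predicted affinity y. Both are
    hypothetical inputs for this sketch."""
    alpha = node_gradients.mean(axis=0)               # channel weights
    scores = np.maximum(node_activations @ alpha, 0)  # ReLU of weighted sum
    if scores.max() > 0:                              # scale to [0, 1] for
        scores = scores / scores.max()                # a probability map
    return scores
```

The normalized per-atom scores can then be drawn as an atom probability map on the molecule, e.g. with RDKit's similarity-map utilities.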
Table 8 shows the quantitative results of MGNN and GAT. MGNN outperformed GAT by a notable margin (p < 0.01), which further corroborates the superiority of the proposed MGNN. Fig. 10 shows visualization results for several molecules based on Grad-AAM (MGNN), Grad-AAM (GAT), and graph attention (more examples can be found in ESI Fig. S4 and S5 † ). According to previous studies, 71–75 epoxide, 73 fatty acid, 72,75 sulfonate, 71 and aromatic nitroso 74 are structural alerts that correlate with specific toxicological endpoints. We found that Grad-AAM (MGNN) indeed gives the highest weights to these structural alerts. Grad-AAM (MGNN) can not only identify important small moieties, as shown in Fig. 10(a)–(d), but also reveal large moieties, as shown in Fig. 10(f), which indicates that MGNN captures local and global structures simultaneously. Grad-AAM (GAT) also discerned the structural alerts shown in Fig. 10(b), (c), (e), and (f). However, it sometimes failed to detect structural alerts, as shown in Fig. 10(a) and (d), and its highlighted regions tended to be more extensive and did not correspond exactly to the structural alerts in Fig. 10(b), (c), and (e). These results suggest that the hidden representations learned by GAT were insufficient to describe the molecules well. On the other hand, graph attention could only reveal some atoms of the structural alerts, as shown in Fig. 10(c), (d), and (f). The attention map contains less information about the global structure of a molecule since it only considers the neighborhood of an atom. 45 An advantage of graph attention is that it can highlight atoms and bonds simultaneously, whereas Grad-AAM can only highlight atoms. Fig. 11 shows the distribution of atom importance for Grad-AAM (MGNN), Grad-AAM (GAT), and graph attention.
The distribution for Grad-AAM (MGNN) was left-skewed, which suggests that MGNN pays more attention to the particular substituents contributing most to toxicity while suppressing the less essential ones. We also found that Grad-AAM (GAT) tends to highlight extensive sets of atoms, consistent with the results shown in Fig. 10(b), (c), and (e). Conversely, the distribution of graph attention was narrow, with most values below 0.5, which suggests that graph attention often failed to detect important substructures. It is worth noting that some studies utilize global attention mechanisms that drop all structural information of a graph to visualize a model, and these may also provide reasonable visual explanations. 46,76 However, such global attention-based methods are model-specific and cannot easily be transferred to other graph models. In contrast, Grad-AAM is a universal visual interpretation method that transfers readily to other graph models. Moreover, the visual explanations produced by Grad-AAM may be further improved by applying regularization techniques during the training of MGraphDTA. 77
Model | Proteins | Compounds | MSE | CI | r index |
---|---|---|---|---|---|
GraphDTA | MCNN | GAT | 0.215 (0.007) | 0.843 (0.005) | 0.330 (0.007) |
MGraphDTA | MCNN | MGNN | 0.176 (0.007) | 0.902 (0.005) | 0.430 (0.006) |
Atom importance revealed by Grad-AAM (MGNN), Grad-AAM (GAT), and graph attention in structural alerts of (a) and (b) epoxide, (c) and (d) fatty acid, (e) sulfonate, and (f) aromatic nitroso.
Distribution of atom importance for Grad-AAM (MGNN), Grad-AAM (GAT), and graph attention. Note that we do not consider bond importance for Grad-AAM (GAT).
Overall, Grad-AAM tends to produce more accurate explanations than the graph attention mechanism, which may offer biological interpretations that help us understand DL-based DTA prediction. Fig. 12 shows Grad-AAM (MGNN) on compounds with symmetrical structures. The distribution of Grad-AAM (MGNN) was also symmetrical, which suggests that representing compounds as graphs and using GNNs to extract their patterns preserves the structures of the compounds.
Grad-AAM (MGNN) for molecules with symmetrical structures.
The receptive fields of layers 1, 2, and 3 of the GNN in the compound 4-propylcyclohexan-1-one. (a) The receptive field of atom C2. (b) The receptive field of atom C1.
Grad-AAM (MGNN) for molecules with similar structures.
4 Conclusion
Code availability
Data availability
Author contributions
Conflicts of interest
Acknowledgements
† Electronic supplementary information (ESI) available: details of machine learning construction, vertex features of graphs, data distributions, hyperparameter tuning, and additional visualization results. See DOI:
‡ Equal contribution.
Title: Multi-Scale Representation Learning on Proteins
Abstract: Proteins are fundamental biological entities mediating key roles in cellular function and disease. This paper introduces a multi-scale graph construction of a protein -- HoloProt -- connecting surface to structure and sequence. The surface captures coarser details of the protein, while sequence as primary component and structure -- comprising secondary and tertiary components -- capture finer details. Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level. We test the learned representation on different tasks, (i.) ligand binding affinity (regression), and (ii.) protein function prediction (classification). On the regression task, contrary to previous methods, our model performs consistently and reliably across different dataset splits, outperforming all baselines on most splits. On the classification task, it achieves a performance close to the top-performing model while using 10x fewer parameters. To improve the memory efficiency of our construction, we segment the multiplex protein surface manifold into molecular superpixels and substitute the surface with these superpixels at little to no performance loss.
Comments: Neural Information Processing Systems 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Scientific Reports volume 14 , Article number: 19413 ( 2024 ) Cite this article
Modeling user intention with the limited evidence in short-term historical sequences is a major challenge in session recommendation. Research in this domain extends from traditional methods to deep learning. However, most approaches concentrate solely on the sequential dependence or pairwise relations within a session, disregarding the inherent consistency among items. Additionally, there is a lack of research on context adaptation in session intention learning. To this end, we propose a novel session-based model named C-HAN, which consists of two parallel modules: a context-embedded hypergraph attention network and self-attention. These modules are designed to capture the inherent consistency and the sequential dependencies between items, respectively. In the hypergraph attention network module, different types of interaction contexts are introduced to enhance the model's contextual awareness. Finally, a soft-attention mechanism efficiently integrates the two types of information, collaboratively constructing the representation of the session. Experimental validation on three real-world datasets demonstrates the superior performance of C-HAN compared with state-of-the-art methods. The results show that C-HAN achieves an average improvement of 6.55%, 5.91%, and 6.17% over the runner-up baseline on the Precision@K, Recall@K, and MRR evaluation metrics, respectively.
Introduction.
With the rapid growth of the Internet and the abundance of available information, recommendation systems (RS) have become crucial in helping users navigate this vast amount of data 1 . RS aim to provide personalized and relevant recommendations, enabling users to discover new content, products, or services that align with their interests and preferences. A key appeal of RS is that they free people from information overload 2 . Conventional methods 3 , 4 , 5 typically rely on complete user interaction records or log files to achieve personalized recommendations. However, due to increasing privacy concerns, obtaining complete user profiles has become more difficult. This poses significant challenges for traditional recommendation systems, limiting their effectiveness and feasibility in practice. To deal with this dilemma, session-based recommendation systems (SRS) have emerged, which differ significantly from these traditional studies. SRS address the challenge of unavailable user profiles by relying solely on short-term historical data.
Nowadays, SRS has emerged as a popular topic in the field of RS, attracting attention from both academic and industrial communities. Generally, in SRS, an anonymous user's session (e.g., a sequence of purchases, clicks, or browsing) is modeled as a short sequence in chronological order, i.e., a sequence of items that the user purchases or clicks. The fundamental idea behind SRS is to predict the user's next action by analyzing this short sequence. Given the limited information available, the main difficulty is how to effectively and precisely capture the intricate relations between items. Modeling sequential dependency is critical to SRS, especially in sessions with strong sequential dependencies such as Fig. 1 a. After clicking on the camera, \(u_1\) clicked a series of accessories such as a lens, charger, and camera bag; it is clear from Fig. 1 that the items in session 1 have sequential dependencies. The Markov chain (MC) 6 is a classical method based on the sequential-dependence assumption: it predicts the user's next behavior from the conditional transition probabilities of previous actions. However, MC-based methods strictly adhere to point-level sequential patterns between individual steps 7 , overlooking long-range global dependencies and failing to capture certain complex user behavior patterns, which leads to less accurate or comprehensive recommendations. Figure 2 shows an instance based on a 1-order Markov chain model for a series of electronic devices. The system recommends the camera lens to the user, while the user's real intention is the display. The disagreement stems from the fact that the model generates recommendations based only on the last point of the user's interaction trajectory (i.e., local short-term dependence), without considering the items the user interacted with earlier.
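A 1-order MC recommender of the kind illustrated in Fig. 2 can be sketched in a few lines (the item names are hypothetical). It conditions only on the last clicked item, which is exactly the local-dependence limitation discussed above:

```python
from collections import Counter, defaultdict

def fit_first_order_mc(sessions):
    """Count item-to-item transitions across sessions to estimate a
    first-order transition model."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return counts

def recommend_next(counts, last_item):
    """Recommend the most frequent successor of the *last* item only;
    everything clicked earlier in the session is ignored."""
    successors = counts.get(last_item)
    return successors.most_common(1)[0][0] if successors else None

# Hypothetical electronics sessions in the spirit of Fig. 1a.
sessions = [["camera", "lens", "charger"],
            ["camera", "lens", "bag"],
            ["camera", "tripod"]]
model = fit_first_order_mc(sessions)
```

Whatever the user clicked before the final item has no effect on the recommendation, so long-range intent (e.g. assembling a full camera setup) is invisible to this model.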
An illustration outlines two distinct connections frequently observed among items within sessions. In ( a ), the sequence-dependent trait is the primary influence, while ( b ) emphasizes the predominance of consistency.
Recently, RNNs (recurrent neural networks) have been successfully applied to session-based recommendation thanks to their advantage in modeling sequential data, and have achieved remarkable results. Hidasi et al. 8 first proposed the RNN-based model GRU4Rec, which models the user click list as a sequence and employs an RNN to learn user intention features from it. Li et al. 9 implemented an attention mechanism within the RNN architecture to improve recognition of the user's main intent and proposed the NARM model. STAMP 10 proposed a short-term interest prioritization strategy combined with an attention mechanism to capture important information in user behavior sequences. While RNN-based approaches effectively handle sequential data within a session, their reliance on strong dependencies prevents them from adequately modeling other crucial relationships, such as consistency. In this paper, consistency refers to the common characteristics reflected in the items a user interacts with during a session. These characteristics are not limited to the external attributes of the items; rather, they reflect the invariance of the user's intent. For instance, as illustrated in Fig. 1 b, user \(u_2\) clicked on a range of bags, including handbags, backpacks, and shoulder bags, with no clear sequential dependence but evident consistency within session 2; the consistency here might be a preference for simple, vintage designs. Accurately capturing the consistency in a session can lead to a better understanding of the user's interaction intentions, especially for modeling long-term user preferences.
An instance of 1-order MC-based recommendation.
Context information plays a pivotal role in improving the accuracy and personalization of recommendation systems. Rich contextual signals such as holidays, time, festivals, and item types have already been incorporated into SRS in the form of implicit or explicit feedback 11 . The core intuition is that user behaviors and preferences vary across times, locations, and scenarios. For instance, during holidays, users are more inclined towards recreational activities, such as watching movies, which are more appealing than reading product reports. Hence, how to effectively leverage this abundance of contextual information in SRS to model user behaviors more properly is a crucial and challenging issue.
In recent years, graph neural networks (GNN) have stood out in numerous applications due to their powerful representation learning capabilities, and GNN-based methods have also begun to gain popularity in SRS. GNN models user session data as directed graphs, leveraging the relational structure between nodes to propagate and aggregate information for learning node representations. The GNN approach relaxes the strong temporal dependencies between consecutive items assumed by RNNs and regards item transitions as pairwise relations when learning node representations. SR-GNN 12 is the first model to implement SRS using graph neural networks. GC-SAN 13 incorporates both GGNN and self-attention mechanisms to enhance contextualized item representations. MSGIFSR 14 uses a variety of context information and graph neural networks to enrich item representations and proposes a multi-granularity continuous user intent unit method for SRS. While GNN-based approaches are effective at capturing complex relations between items by propagating information over the graph, they relax the modeling of temporal dependencies in user session sequences, which may limit their performance on sequentially dependent session recommendation tasks such as Fig. 1 a.
In summary, the problems with existing methods can be summarized as follows: (1) Over-reliance on local sequential dependencies, while neglecting the global long-term dependencies of user behavior in sessions. (2) They struggle to capture and understand certain complex patterns in user behavior, such as consistency issues, leading to inaccurate recommendations. (3) They lack effective mechanisms to integrate and utilize interaction context information, failing to effectively capture the dynamic changes in user interests.
To solve the problems mentioned above, we propose a novel Context-embedded Hypergraph Attention Network and self-attention for session recommendation (named C-HAN). It can capture two kinds of complex relationships among items, i.e., sequential dependency and consistency. Particularly, we learn item representations using a hypergraph attention network from session hypergraph and context information. Simultaneously, we employ self-attention to capture global sequential dependencies between session items. Next, an attention mechanism is employed to integrate both types of information forming the final session representation. Finally, this model can predict the probability of each item being clicked. Our work presents the following key contributions:
We present C-HAN, a novel session recommendation method that effectively incorporates interactive context information and captures consistency and sequential dependencies within a session.
C-HAN utilizes hypergraph and self-attention to learn two types of item representations, and collaboratively generates the final representation in the stage of session representation learning, with the help of soft attention coordination.
We use context information to improve item representation learning, using attention mechanisms to identify those contexts that are important to the user’s interest.
We conducted extensive experiments on the ML-1M, Delicious, and Yoochoose datasets. The findings of the experiment indicate that C-HAN surpasses the state-of-the-art approaches.
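The soft-attention fusion step described above, which combines the consistency and sequential views into one session representation, can be sketched as follows (a numpy sketch; the weight matrix `w` and query vector `q` stand in for learned parameters and are not C-HAN's actual parameterization):

```python
import numpy as np

def soft_attention_fuse(h_consistency, h_sequential, w, q):
    """Fuse the hypergraph (consistency) and self-attention (sequential)
    session views with soft attention. `w` (d x d) and `q` (d,) stand in
    for learned parameters; the exact parameterization is illustrative."""
    views = np.stack([h_consistency, h_sequential])  # (2, d)
    scores = np.tanh(views @ w) @ q                  # one scalar per view
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                # softmax over views
    return weights @ views                           # (d,) session vector

fused = soft_attention_fuse(np.ones(4), np.zeros(4), np.eye(4), np.ones(4))
```

The output is a convex combination of the two views, so whichever view scores higher under the attention query dominates the final session vector.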
Conventional methods.
Traditional approaches typically rely on session-based state transitions or predefined rules to produce suggestions. The Markov chain is the basic building block of state-transition-based recommendation methods. For example, the FPMC model 6 creates a 1-order Markov transition matrix to learn user preference information. Shani et al. 15 proposed an MDP model that formulates the recommender system's procedure as a Markov decision process tailored for sequential recommendations. SMF 16 is a sequence-aware recommendation method that captures the dynamic changes of user preferences using a hidden Markov model. However, these MC-based methods concentrate only on capturing local transition relationships between adjacent actions, neglecting global dependencies within the entire sequence. Both S-POP 17 and Item-KNN 18 are classic rule-based recommendation methods. S-POP leverages item popularity to predict user preferences and identify trending items, while Item-KNN recommends the K most similar items by establishing similarity rules. Nevertheless, neither of these techniques accounts for the sequential order of interactions within sessions.
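Item-KNN's similarity rule can be illustrated with session co-occurrence counts (an illustrative sketch using cosine similarity over co-occurrence sets, not necessarily the cited implementation):

```python
import math
from collections import defaultdict

def item_knn(sessions, target, k=2):
    """Rank items by cosine similarity of session co-occurrence with
    `target` and return the top k (an illustrative similarity rule,
    not necessarily the cited paper's exact formulation)."""
    occurs = defaultdict(set)  # item -> ids of sessions containing it
    for sid, session in enumerate(sessions):
        for item in session:
            occurs[item].add(sid)
    scores = {}
    for item, sids in occurs.items():
        common = len(occurs[target] & sids)
        if item != target and common:
            scores[item] = common / math.sqrt(len(occurs[target]) * len(sids))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical sessions: "b" co-occurs with "a" most consistently.
sessions = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
```

Note that the scoring is order-free: reversing every session would leave the recommendations unchanged, which is exactly the limitation noted above.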
With their natural ability to handle sequential data, RNNs are widely used in session-based recommendation tasks and have achieved remarkable results. GRU4Rec 8 is the first model for SRS that uses the Gated Recurrent Unit (GRU) to process the sequential relationship of items in a session. Tan et al. 19 proposed an improved method for GRU4Rec, which enhances recommendation performance by introducing data augmentation and accounting for temporal shifts in user behavior. Session recommendation methods based on RNN gained popularity in the subsequent years. NARM 9 and STAMP 10 are two notable works that have been proposed after GRU4Rec. Compared to GRU4Rec, both approaches incorporate attention mechanisms with RNN to construct hybrid recommendation models. NARM utilizes an attention mechanism to assign varying weights to items in the user’s current session. This helps to capture changes in user interests. STAMP introduces a short-term memory priority mechanism that prioritizes the impact of the most recent user session item on subsequent user behavior. Since then, the RNN-based session recommendation methods have been further explored. Wang et al. 20 devised an approach that employs RNNs at both the cross-domain and individual user levels to trace users’ collective preferences across multiple domains. Sheng et al. 21 proposed time-based directional attention incorporated with RNN to enhance the accuracy of user preference modeling by detecting sequential signals within sessions. Zhang et al. 22 presented the MBPI hybrid model, based on a concurrent GRU framework. While RNN-based strategies successfully tackle the session-item interrelation, they often prioritize the temporal sequence of items to a degree that may result in overfitting problems.
Caser 7 is the pioneering model that introduced convolutional neural networks (CNN) into SRS, marking the start of research into other intricate relationships in SRS that are challenging to uncover with RNNs. Among these directions, CNN-based and self-attention-based methods have been well explored. HMN 23 utilizes CNNs to analyze item representations and extract multi-scale features, enabling the capture of users’ preferences at the feature level. CSRM 24 proposes a hybrid framework that utilizes two parallel memory encoders to model session and neighborhood information. Despite their significant success, these methods share a noteworthy limitation: an overreliance on sequential connections between neighboring items, which causes higher-order relationships among non-adjacent items to be ignored. SASRec 25 effectively captures the inherent dependencies between items within a user session using self-attention, regardless of sequence length, and achieves superior performance compared to MC-, RNN-, and CNN-based methods.
Graph neural networks (GNN) have been successfully incorporated into SRS, where they excel at capturing intricate transition relationships between vertices. SR-GNN 12 represents the pioneering application of GNNs in SRS. It uses gated graph neural networks (GGNN) to model item sequential-transition patterns and integrates transient and long-term features with an attention mechanism for sessions. GC-SAN 26 designed a hybrid framework combining GNN and self-attention, where the GNN captures local dependencies while self-attention learns long-distance dependencies. FGNN 27 proposes a weighted attention graph layer as an encoder for the features of items in a session, and adopts a Readout function to generate session embeddings. TAGNN 28 uses a target-aware attention GNN to adaptively activate users’ different interests in different target items and capture rich item transitions in the session. DSGNN 29 uses lightweight gating networks to combine dynamic and static intents to improve prediction accuracy. Although these methods relax the strict temporal order of RNN-based approaches in favor of pairwise item relations, they still overlook the intricate many-to-many correlations among items within a session. In real-world situations, item transitions are frequently influenced by the combined impact of preceding items and the complex interrelations among items.
Due to the inherent capability of hypergraphs to represent complex high-order relationships among items, methods based on Hypergraph Neural Networks (HGNNs) 30 , 31 , 32 have garnered significant interest among researchers. Research on this topic is still at an early stage, with only a limited number of relevant studies available. HGNN 30 is recognized as the first hypergraph convolutional network; it applies the clique-expansion technique to approximate hypergraphs as graphs, thereby reducing the problem to the graph-embedding framework. UniGNN 33 proposes a unified framework that models the message-passing process of graph neural networks as two-stage aggregation on hypergraphs, generalizing GNN models to hypergraphs for downstream tasks. More recently, Wang et al. 34 proposed the HyperRec model, which uses multi-layer hypergraph convolution to capture users’ dynamic preferences and short-term intentions for session recommendation. Gao et al. 35 proposed SDHID, a self-supervised dual-hypergraph learning model with intention disentanglement, which fuses hypergraphs and capsule networks to learn vertex embeddings and then forms the final session embedding through a self-attention aggregation mechanism. Xia et al. 36 introduced the self-supervised hypergraph transformer (SHT) framework, which enhances user representations by incorporating global collaborative relationships through a hypergraph transformer network. Li et al. 37 utilized a heterogeneous HGNN for friend recommendation from human mobility data, demonstrating the flexibility of hypergraph models in capturing intricate spatiotemporal information. Overall, hypergraph neural networks have proven effective in various recommendation scenarios, providing a new perspective for a more comprehensive understanding of user interactions and preferences and improving recommendation accuracy and novelty.
Closely relevant to our work is the SHARE 32 , which constructs a sub-hypergraph for each session and employs a hypergraph attention layer to aggregate item information to generate session representation. Although SHARE can capture high-level complex item relationships, it almost ignores the sequence-dependent information between items in the session and does not consider the impact of different types of interaction context scenarios on user interest. In contrast to SHARE, our proposed C-HAN method meticulously accounts for both consistency and sequential dependencies among items. Another difference is that the interaction context information is introduced to make the model perceive the scenario information of user interaction.
Hypergraph is an expanded concept in graph theory used to model multi-way relationships. Unlike the traditional graph structure where edges connect two vertices, a hypergraph allows for edges to connect multiple vertices, making it better suited for representing higher-order and complex relationships. The definition and corresponding formula for a hypergraph can be given as follows:
( Hypergraph ). A hypergraph 32 G can be defined as a tuple \(G=(V, E)\) , where \(V=\{v_i\}_{i=1}^N\) is a vertex set and \(E=\{e_i\}_{i=1}^M\) is a hyperedge set. A hyperedge is a nonempty subset of vertices, i.e., \(e_i \subseteq V\) . Let \(H\in \mathbb {R}^{N\times M}\) denote the incidence matrix, with \(H_{ij}=1\) if vertex \(v_i\) and hyperedge \(e_j\) are connected, and \(H_{ij}=0\) otherwise. \(W_{ii}\) is a positive weight for \(e_i\) , and the diagonal matrix \(W\in \mathbb {R}^{M\times M}\) collects the weights of all hyperedges. The degrees of the vertices and the hyperedges form two diagonal matrices \(D\in \mathbb {R}^{N\times N}\) and \(B\in \mathbb {R}^{M\times M}\) , where \(D_{ii}=\sum _{j=1}^{M}W_{jj}H_{ij}\) and \(B_{jj}=\sum _{i=1}^{N}H_{ij}\) .
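As a concrete illustration of the definition above, the following minimal NumPy sketch builds the incidence matrix H and the degree matrices D and B for a toy hypergraph; the three example sessions and item ids are ours, not from the paper.

```python
import numpy as np

# Toy hypergraph: 3 sessions (hyperedges) over 5 items (vertices).
# Each hyperedge connects the unique items of one session.
sessions = [[0, 1, 2], [1, 3], [2, 3, 4]]
N, M = 5, len(sessions)

# Incidence matrix H: H[i, j] = 1 iff vertex v_i belongs to hyperedge e_j.
H = np.zeros((N, M))
for j, s in enumerate(sessions):
    H[list(set(s)), j] = 1.0

W = np.eye(M)                              # hyperedge weights, all set to 1 as in the paper
D = np.diag((H * np.diag(W)).sum(axis=1))  # vertex degrees: D_ii = sum_j W_jj H_ij
B = np.diag(H.sum(axis=0))                 # hyperedge degrees: B_jj = sum_i H_ij
```

For example, item 1 appears in two sessions, so its vertex degree \(D_{11}\) is 2, and the first hyperedge connects three vertices, so \(B_{00}=3\).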
( session recommendation) . Let \(S=\{S_1, S_2, S_3,..., S_{|S|}\}\) be the set of all sessions, and \(V=\{v_1, v_2, v_3,..., v_N\}\) be the set of all unique items involved in the sessions. A user’s session can be represented as a chronological list of the items clicked, i.e., \(S_i=[v_1,v_2,...,v_{n_i}]\) , where \(n_i\) is the session length. Let \(C=[C_1, C_2,..., C_K]\) denote the set of interaction-context categories, where K is the number of categories. In this paper, interaction context refers to contextual information during session interactions, such as time, holidays, weekdays, etc. Each interaction context \(C_i (1\le i \le K)\) is a set containing a series of contextual values. Given a session \(S_i\) and the corresponding interaction-context sequence \(T=[T_1, T_2,..., T_{n_i}]\) , the task of session recommendation is to suggest the top-N items that the user is likely to interact with. Here, \(T_i\) is itself a sequence containing the contexts present when the user interacts with item \(v_i\) , denoted \(T_i =[t_i^1,t_i^2,...,t_i^K]\) , where \(t_i^k (1 \le k \le K)\) is determined by the context type.
In this section, we present the proposed model C-HAN in detail, the pipeline of which is presented in Fig. 3 . It has four components: (1) a self-attention module for sequential information learning; (2) hypergraph information embedding with interaction context; (3) session representation learning module to learn the final representation of the session; (4) the prediction layer uses the refined session’s embedding to predict top-N items that user will likely click next.
The overview of the proposed C-HAN model.
It is widely recognized that sequential transition patterns are essential for SRS, as they encompass the temporal correlation of user behavior, interest evolution, long-term dependency relationships, and other relevant information. This information effectively enhances our understanding of user behavior, improves recommendation accuracy, and contributes to a better user recommendation experience. In this paper, we adopt the self-attention mechanism to model the sequence dependency transition patterns in sessions.
Technically, we represent each item \(v_i\) in a session \(S_i=[v_1, v_2,..., v_{n_i}]\) as a d-dimensional embedding vector, obtained by querying the learnable item embedding matrix \(E^V=[e_1^v,e_2^v,...,e_{|V|}^v]\) with the item’s ID through a look-up layer. The embedding representation of session \(S_i\) is therefore \(E_{s_i}=[e_1^v,e_2^v,...,e_{n_i}^v]\) , \(E_{s_i}\in \mathbb R^{n_i\times d}\) . Then, following the standard formulation of self-attention, we transform \(E_{s_i}\) into different latent spaces and use the sigmoid activation function to inject non-linearity, generating the query Q and key K , respectively. Their mathematical formulas are as follows:
Here, \(W^q\in \mathbb {R}^{d\times d}, W^k\in \mathbb {R}^{d\times d}\) are learnable parameters used to implement the spatial transformation.
Following the acquisition of the query and key, the embedding similarity between each pair of items is calculated using the dot product with scaling. This process helps establish a correlation matrix that represents the relationships between items. To prevent high similarity scores for identical items, we utilize a masking operation inspired by the approach described in 25 . The correlation matrix is calculated as follows:
\(C\in \mathbb {R}^{n_i\times n_i}\) , and \(\sqrt{d}\) is used to scale the attention scores.
By analyzing the affinity matrix, our objective is to assess the importance of an item by evaluating its similarity scores with other items. Lower similarity scores indicate that the item is not particularly important, possibly resulting from accidental or curious user interaction. Conversely, if an item shows high similarity to most items in a session, it signifies that the item represents the user’s primary preference and holds greater importance. Building on this understanding, we measure an item’s importance by calculating the average similarity between the item and other items within a session.
\(\alpha _i\) denotes the importance score assigned to item \(v_i\) within a session, and \(C_{ij}\in C\) . To normalize the scores, we apply a softmax layer; the overall item importance \(\beta \) is thus given by:
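The importance-scoring procedure above can be sketched in a few lines of NumPy. This is an illustration under our own assumptions (random toy embeddings and projections; the diagonal of the correlation matrix is masked to zero before averaging), not the authors’ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                        # toy session length and embedding size
E = rng.normal(size=(n, d))        # item embeddings of one session
Wq = rng.normal(size=(d, d))       # learnable projections (random here for illustration)
Wk = rng.normal(size=(d, d))

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
Q, K = sigmoid(E @ Wq), sigmoid(E @ Wk)    # query/key with sigmoid non-linearity

# Scaled dot-product correlation matrix; self-similarity is masked out
# so an item's similarity with itself does not inflate its score.
C = (Q @ K.T) / np.sqrt(d)
np.fill_diagonal(C, 0.0)

# alpha_i: mean similarity of item i with the other items; beta: softmax-normalised.
alpha = C.sum(axis=1) / (n - 1)
beta = np.exp(alpha) / np.exp(alpha).sum()
```

The resulting vector `beta` sums to one and assigns a larger weight to items that are, on average, more similar to the rest of the session.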
Hypergraph construction.
Drawing inspiration from 31 , we propose adopting hypergraph to model sessions, which is formally expressed as \(G=(V, E)\) , where, a hyperedge is used to model each session, the elements of the vertex set V are all unique items in the session. Figure 4 shows the construction process from sessions to the hypergraph. In contrast to the sequential dependence captured by traditional methods, our approach establishes connections between items within each hyperedge to better capture the complex relationships and transitions among them.
An example of a hypergraph created by three sessions, with each hyperedge marked with a dashed line of a different color.
To capture the impact of contextual information on item learning during user interactions and enhance the adaptability of user interest features to the context, we incorporate a weighted contextual representation for each item within the hyperedge using soft attention. This contextual representation indicates the impact of different types of contextual scenarios.
Specifically, consider an item \(v_i\) in a session with its corresponding interaction context \(T_i = [t_i^1, t_i^2, ..., t_i^K]\) , where \(t_i^k (1 \le k \le K)\) is the value of a certain type of interaction context. We first represent each context value as a one-hot vector, which is then converted into a dense embedding \(e_i^{T,k}\in \mathbb {R}^d\) of dimension d by querying a learnable parameter matrix \(C_k\) . All context embeddings in \(T_i\) thus form a context embedding matrix \(E_i^T\) . Next, we use an attention mechanism, shown in Eq. (6), to generate a weight vector \(W_i\in \mathbb R^K\) , where each element \(w_i^k\) represents the degree of influence of the corresponding context type on the user’s interest when interacting with item \(v_i\) . Using this weight vector, the weighted contextual representation is obtained via Eq. ( 7 ). The mathematical representations of Eqs. ( 6 ) and ( 7 ) are shown below:
where \(q_c\in \mathbb R^d\) , \(W_T \in \mathbb R^{d \times d}\) , \(E_i^T \in \mathbb R^{d \times K}\) .
Finally, in a session, the embedding representation of an item \(v_i\) is modeled as \(e_i^{T,*}\) , the integration of the initial embedding vector \(e_i^v\) and the weighted interaction-context representation \(e_i^{T}\) , as shown in Eq. ( 8 ). The item thus carries a context-aware embedding representation.
where \(\oplus \) denotes concatenation, and \(W\in \mathbb R^{d\times 2d}\) and \(b_0 \in \mathbb R^d\) are a learnable parameter matrix and vector, respectively.
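The three steps (context attention, weighted context representation, concatenation with the item embedding) can be sketched as follows. This is a minimal NumPy illustration with randomly initialised parameters; the exact scoring form \(q_c^\top \tanh(W_T E_i^T)\) is our assumption about Eq. (6), since only the parameter shapes are stated in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K = 8, 3                          # toy embedding size and number of context types
e_v = rng.normal(size=d)             # initial item embedding e_i^v
E_T = rng.normal(size=(d, K))        # K context embeddings, one column per context type
q_c = rng.normal(size=d)             # learnable query vector q_c
W_T = rng.normal(size=(d, d))        # learnable transform W_T
W0 = rng.normal(size=(d, 2 * d))     # projection for the concatenated vector (Eq. 8)
b0 = rng.normal(size=d)

# Eq. (6), assumed form: attention weights over the K context types.
scores = q_c @ np.tanh(W_T @ E_T)            # shape (K,)
w = np.exp(scores) / np.exp(scores).sum()    # softmax-normalised weight vector W_i

# Eq. (7): weighted context representation e_i^T.
e_T = E_T @ w                                # shape (d,)

# Eq. (8): context-aware item embedding e_i^{T,*}.
e_star = W0 @ np.concatenate([e_v, e_T]) + b0
```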
We extend the graph attention network to hypergraphs and employ hyperedges as the medium for information propagation to update vertex representations. This process consists of two stages: aggregating information from vertices to hyperedges, and aggregating information from hyperedges to vertices. The mathematical expressions for these two stages are presented below:
By considering GNN as a two-stage aggregation process, we can naturally extend the designs for GNN to hypergraphs. In this framework, \(\varphi _1\) and \(\varphi _2\) represent permutation-invariant functions that handle the aggregation of messages from vertices and hyperedges, respectively. This approach allows us to leverage the key insight and seamlessly adapt GNN designs to the context of hypergraphs. Figure 5 illustrates the update process of HAN (Hypergraph Attention Network).
An illustration of how the GAT can be applied to hypergraphs. ( a ) Toy examples of a hypergraph. ( b ) Two-stage message passing for hypergraph H . Note that edges showing how messages flow to vertex \(v_3\) are marked in red dotted lines.
Specifically, in the first stage, for each session (hyperedge), we employ an arbitrary permutation-invariant function \(\varphi _1\) to aggregate the feature information of all the vertices connected within it. \(\varphi _1\) satisfies \(\varphi _1(\{x_j\}) = x_j\) . In this paper, we utilize a summation function as the aggregation function as shown in Eq. ( 10 ).
where \(x_i=e_i^{T,*}\) , and \(W_{jj}\) is the positive weight of hyperedge \(h_{e_j}\) , set to 1 for each hyperedge.
In the second stage, we utilize the \(\varphi _2\) function to update the information of each vertex within a hyperedge by leveraging the associated hyperedges. We can draw inspiration from existing GNN approaches to design \(\varphi _2\) . Among numerous methods, GAT (Graph Attention Network) has achieved success and gained widespread attention by assigning different weights to the neighbors of a central node and aggregating their information to update the central node’s features. As mentioned above, Eq. ( 9 ) shows that GNN-based methods can easily extend our framework. Therefore, we have decided to apply GAT to update vertex information in the hypergraph, which works as follows:
where \(\sigma \) is the LeakyReLU function, and \(W\in \mathbb R^{d\times d}\) and \(a \in \mathbb R^{2d}\) are learnable attention parameters. \(\mathcal {N}(e_i)=\{e\in \mathcal {E}|v_i\in e\} \) denotes the incident hyperedges of vertex \(v_i\) , i.e., the set of all hyperedges containing vertex \(v_i\) .
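The two-stage update (sum-aggregation from vertices to hyperedges, then GAT-style attention from a vertex’s incident hyperedges back to the vertex) can be sketched as below. The toy hypergraph, random parameters, and the exact concatenation-based attention score are our assumptions; the paper’s Eqs. (11)–(13) are not reproduced verbatim here.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d = 4
sessions = [[0, 1, 2], [1, 3]]        # toy hyperedges over 4 vertices
X = rng.normal(size=(4, d))           # vertex features x_i
W = rng.normal(size=(d, d))           # learnable transform
a = rng.normal(size=2 * d)            # attention vector a

# Stage 1: vertices -> hyperedges (summation; hyperedge weights W_jj = 1).
h_e = np.stack([X[s].sum(axis=0) for s in sessions])

# Stage 2: hyperedges -> vertices, attention over the incident
# hyperedges N(v_i) of each vertex.
X_new = np.zeros_like(X)
for i in range(X.shape[0]):
    inc = [j for j, s in enumerate(sessions) if i in s]
    if not inc:                       # isolated vertex: keep its features
        X_new[i] = X[i]
        continue
    scores = np.array([leaky_relu(a @ np.concatenate([W @ X[i], W @ h_e[j]]))
                       for j in inc])
    att = softmax(scores)
    X_new[i] = sum(att[k] * (W @ h_e[j]) for k, j in enumerate(inc))
```

Stacking several such layers and averaging their outputs, as described next, yields the final HAN-based item representations.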
We propagate and update the representation of vertices in multiple hypergraph convolution layers. Additionally, we adopted the method mentioned in literature 30 , where the non-linear activation function and weight matrix between different convolutional layers were removed to reduce computational complexity. Based on formulas ( 11 ) to ( 13 ), we established the following definition to represent the t th vertex:
where \(x_t^{l+1}\) is the t th item’s representation at the \((l+1)\) th layer.
In this paper, we designed L layers to propagate \(x^0\) in HAN and averaged the learned vertex representations \(x_i^l\) of each layer to obtain the item representation based on HAN:
The input \(x_i^0\) in the 0th layer is initialized with the value obtained from Eq. ( 8 ), i.e., \(x_i^0 = e_i^{T,*}\) .
Within this layer, we construct the ultimate embedding representation of a session by considering two key factors: the consistency information and the sequential dependency information. These distinct types of information are amalgamated through soft attention to form the session’s representation.
The self-attention mechanism in the above section has provided us with relevance scores between each item and the intent of the session. We use these relevance scores as weights to combine with the embedding representations of the items, performing a weighted average operation as the long-term interest representation.
Following the approach proposed in reference 10 , we utilize the last item’s embedding as the instantaneous interest, denoted \(R_S=e_n^v\) . Combined with the long-term preference, we construct the final sequential-pattern representation as follows:
where \(W\in \mathbb R^{d\times 2d}\) is a learnable transformation parameter matrix.
We propagate the embedded representations of items with weighted context through our carefully designed hypergraph attention network across multiple convolutional layers, ultimately forming hypergraph embeddings for each item. In this process, for all items in an interactive session, we utilize an average aggregation function to model the consistency,
To model the final intent of user interaction sessions by considering both sequential and consistency, we employ a soft-attention mechanism to automatically blend the two and obtain the final embedding representation as follows:
where \(\alpha \) represents the weight of the fusion of a participating term, and \(W\in \mathbb R^{d\times 2d}\) is a learnable parameter matrix.
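One plausible form of this soft-attention fusion, given that \(W\in \mathbb R^{d\times 2d}\) maps the concatenated pair back to dimension d, is an element-wise gate; the sketch below makes that assumption explicit (random toy vectors, sigmoid gate), and is not claimed to be the paper’s exact formula.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
R_fs = rng.normal(size=d)            # sequential-pattern representation
R_fc = rng.normal(size=d)            # consistency representation from the HAN
W = rng.normal(size=(d, 2 * d))      # learnable fusion matrix W in R^{d x 2d}

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Assumed gate form: alpha weighs the sequential signal, (1 - alpha) the
# consistency signal, element-wise.
alpha = sigmoid(W @ np.concatenate([R_fs, R_fc]))
Z_f = alpha * R_fs + (1.0 - alpha) * R_fc
```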
Based on the learned session feature vector \(Z_f\) mentioned above, we calculate the similarity score of an item \(v_j\in V\) in session \(S_i\) using the inner product. This allows us to obtain a score vector \(\hat{Z}_{S_i}\) for all items in the candidate set relative to the session \(S_i\) , where \(\hat{Z}_{S_i,j}\) represents the j -th element in this vector. Then, we use the softmax function to transform it into a click probability vector \(\hat{y}_{s_i}\) . The computation is represented as follows:
where \(\hat{y}_{S_i}=[\hat{y}_{S_i,1},\hat{y}_{S_i,2},...,\hat{y}_{S_i,N}]\) , and \(\hat{y}_{S_i,j}\) represents the probability of the user clicking on item \(v_j\) in the next instance. Finally, the top- N items with the highest score in \(\hat{y}_{S_i}\) will be recommended to the user.
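The scoring-and-ranking step admits a direct NumPy sketch (toy candidate set and random embeddings are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
N_items, d, top_n = 6, 8, 3
Z_f = rng.normal(size=d)                 # learned session representation
E_V = rng.normal(size=(N_items, d))      # candidate item embeddings

scores = E_V @ Z_f                       # inner-product similarities \hat{Z}_{S_i}
y_hat = np.exp(scores - scores.max())
y_hat /= y_hat.sum()                     # softmax -> click-probability vector \hat{y}_{S_i}
top_items = np.argsort(-y_hat)[:top_n]   # top-N recommendation list
```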
For session \(S_i\) , to optimize the parameters of the model, the cross-entropy function is used as the objective function. Algorithm 1 provides the training process of C-HAN and the formal expression for the cross-entropy function is as follows:
The process of C-HAN training.
The purpose of this section is to assess and analyze the effectiveness of C-HAN. Furthermore, we seek to explore and address the following four research questions:
RQ1: Is C-HAN competitive with other baseline methods?
RQ2: What is the performance of the C-HAN model across varying session lengths?
RQ3: Does the inclusion of context information enhance the performance of C-HAN?
RQ4: Is the attention-based fusion approach employed by the model fusion layer able to achieve competitive performance?
To confirm the effectiveness of C-HAN, we conducted experiments on three benchmark datasets, namely ML-1M, Delicious-2K, Yoochoose. The statistics of them are shown in Table 1 .
We arranged the user-clicked item sequences in chronological order for the ML-1M and Delicious-2K datasets and then split the interaction sequences into a training set and a test set: the test set comprised the last 10 days for ML-1M and the last month for Delicious-2K, while the remaining data formed the training set. Three types of context information were extracted, namely week (7 days), month (12 months), and working-day indicators, resulting in a total of 168 context values. For the Yoochoose dataset, we held out the sessions of the last day as the test set and used the remaining data for training. Additionally, we filtered out sessions shorter than 5 interactions and items clicked fewer than 5 times. To augment the training data, we employed a sliding-window technique to split the sequences, as suggested in previous literature 9 , 21 . Given the extensive size of the Yoochoose dataset, we used only 1/64 of its data for training and testing, known as Yoochoose-1/64. We also collected four types of context information, i.e., the 7 days of the week, working-day indicators, 6 category types, and 4 time periods within a day, for a total of 336 context values. Table 1 presents comprehensive statistics of the three processed datasets.
We utilize three widely used evaluation metrics in SRS, i.e., Recall @ K ( R @ K ), MRR @ K , and Precision @ K ( P @ K ), to measure the performance of our approach as well as other comparison methods. These metrics are defined as follows:
here, N refers to the total number of items that the user truly likes in the test set, | hit | is the number of recommended items that match the user’s liked items, and Rank ( t ) is the rank of a truly liked item t in the recommendation list.
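These metrics can be computed as follows. The sketch uses the per-session definitions given above; in particular, the MRR variant that averages reciprocal ranks over the user’s liked items follows the text’s definition and is our reading, as several MRR conventions exist.

```python
def recall_at_k(recommended, relevant, k):
    """R@K: fraction of the user's liked items found in the top-k list."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

def precision_at_k(recommended, relevant, k):
    """P@K: fraction of the top-k list that the user likes."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def mrr_at_k(recommended, relevant, k):
    """MRR@K: reciprocal ranks of liked items in the top-k list,
    averaged over all liked items (items outside top-k contribute 0)."""
    rr = [1.0 / (recommended[:k].index(t) + 1)
          for t in relevant if t in recommended[:k]]
    return sum(rr) / len(relevant)
```

For example, with `recommended = [3, 1, 2, 5]`, `relevant = [1, 4]`, and `k = 3`, item 1 is hit at rank 2, giving R@3 = 0.5, P@3 ≈ 0.33, and MRR@3 = 0.25.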
In our comparative analysis, we evaluate our method against the following representative baselines:
CASER 7 proposes a personalized top-N sequential recommendation method based on convolutional sequence embedding.
SRGNN 12 is the first model that utilizes GNN for session recommendation.
HyperRec 34 utilizes hypergraphs to represent complex relationships between items, enabling a more comprehensive understanding of multi-order connections for the next-item recommendation.
SERec 38 learns user and item representations by integrating social-network knowledge through heterogeneous graph neural networks.
SDHID 35 introduces hypergraphs and capsule networks to learn vertex embedding, and obtains representations of intra-session patterns by aggregating item embeddings with attention weights.
SHARE 32 employs HGAN to aggregate item information to generate session representation.
We searched the embedding size d over \(\{50,100,150,200,250\}\) and set \(d=100\) . To enhance generalization and mitigate overfitting, we applied dropout with a ratio of 0.3 together with \(L2=10^{-5}\) regularization. We trained with mini-batches of size 256 using the Adam optimizer with a learning rate of 0.001, and, to further prevent overfitting, decayed the learning rate by a factor of 0.1 after every three epochs.
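The step-decay schedule described above (decay by 0.1 every three epochs, base rate 0.001, values from this setup) can be expressed as a small helper:

```python
def learning_rate(epoch, base_lr=0.001, gamma=0.1, step=3):
    """Step decay: multiply the base rate by `gamma` once per `step` epochs."""
    return base_lr * gamma ** (epoch // step)
```

In a framework such as PyTorch the same schedule would typically be realised with a step learning-rate scheduler attached to the Adam optimizer.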
To address RQ1 and showcase the comprehensive performance of C-HAN, we conducted a comparison against various baseline methods across three different datasets, with the findings presented in Tables 2 and 3 . The following are some key findings we have summarized.
Tables 2 and 3 show that C-HAN consistently outperforms the other baseline methods on all three benchmark datasets, providing strong evidence for the effectiveness of our strategy in addressing RQ1. Further statistical analysis of Tables 2 and 3 shows that C-HAN achieved its highest average improvement on the precision metric across the three datasets, at 6.55%, followed by average improvements of 6.17% in MRR and 5.92% in Recall. The highest average improvement across all performance metrics was observed on the ML-1M dataset, at 10.52%, followed by Yoochoose-1/64 with 4.52% and Delicious-2K with 3.61%. More detailed statistics of the improvement of C-HAN over the runner-up baseline method are shown in Table 4 .
Overall, among all the compared methods, the GNN-based methods achieve better performance than other methods such as Caser. This reflects the particular strength of graph neural networks in sequential recommendation: they model user interactions as a graph structure and can accurately capture the complex transition relationships between vertices. Observing further, we find that introducing hypergraph techniques into the graph-embedding learning process yields even better performance. For example, SDHID, which uses the hypergraph technique, achieves the best performance among all baseline methods. This directly demonstrates that allowing an edge to be associated with multiple vertices in a hypergraph can better capture the multi-modal information in the graph.
We now narrow our focus to Table 2 to assess the performance of C-HAN. The comparison shows that C-HAN achieves its largest margin over the runner-up method on the ML-1M dataset, and the statistics in Table 4 also support this point. This progress can be attributed to the relatively limited candidate item-set size of 3417 in ML-1M, which allows C-HAN to improve the hit rate and ranking of target items within a smaller candidate pool. Another factor may be the large data volume of ML-1M, indicating that our method adapts well to large-scale datasets. We also find that Recall @20 and Precision @20 have roughly comparable average improvement rates (6.04% vs. 6.84%) across the three datasets, which indicates that the model increases the coverage of recommendations while preserving their accuracy; that is, it is good at discovering new items that users have not yet interacted with but may be interested in.
To investigate RQ2, we carried out a comparative study on the three datasets across different session lengths, examining the effectiveness of C-HAN, SRGNN, CASER, and SHARE. Their performance on Recall @20 and MRR @20 is depicted in Fig. 6 . Note that, due to space constraints, the examination of RQ2–RQ4 in the remainder of this section relies on these two evaluation measures, Recall @20 and MRR @20.
The model’s effectiveness across various session lengths in three datasets, where each row represents the performance metrics of Recall @20 and MRR @20.
Based on Fig. 6 , when considering Recall @20, the performance of SRGNN, CASER, and SHARE follows a common pattern: it initially increases, then declines sharply as the session length grows. In contrast, our model achieves optimal performance and remains stable, declining only gradually. This can be attributed to the fact that, as the session length increases, additional interacted items supply the model with more information about user intentions, improving the accuracy of intent detection. Nonetheless, once session lengths surpass a certain threshold, CASER and SHARE struggle to handle the increased complexity of user behavior patterns and the growing number of irrelevant items, which introduces noise and degrades performance. Generally, SHARE outperforms CASER, primarily due to its ability to identify higher-order relations among items. The robustness of our model comes from its ability to detect sequential signal patterns while maintaining user-intent consistency, which mitigates the impact of irrelevant factors and reduces sensitivity to session length. The experimental results further confirm the ability of C-HAN to handle long sessions.
There is a consistent decrease in MRR @20 as session length increases for all models. C-HAN consistently outperforms SHARE, SRGNN, and CASER in all session lengths. It is worth noting that the decline in MRR @20 is more pronounced compared to the decline in Recall @20 for these datasets. This difference can be attributed to the fact that irrelevant items have a stronger negative impact on MRR @20 than on Recall @20.
Influence of context information on the performance of the Recall @20 and MRR @20.
To examine the impact of context information and the attention-based mechanism, we introduce two modified versions of C-HAN: C-HAN-C and C-HAN-A. C-HAN-C refers to the variant that excludes context information, while C-HAN-A represents the version that incorporates solely context information without utilizing the attention-based mechanism. Through these variations, we aim to investigate the specific contributions of each component in the C-HAN model.
Figure 7 depicts the comparative analysis between C-HAN and the two variants. The results presented in Fig. 7 showcase the superior performance of C-HAN in comparison to the other two models. Notably, when compared to C-HAN-C, C-HAN exhibits noteworthy enhancements in Recall @20, with improvements of approximately 2.82%, 3.13%, and 2.50% across the three datasets. Additionally, C-HAN demonstrates substantial enhancements in MRR @20, with improvements of about 17.68%, 3.12%, and 7.38% across the same datasets. These significant improvements underscore the crucial role of context information in session-based recommendations. The presence of rich semantic information within the context enables effective modeling of user behaviors, thereby contributing to the enhanced performance observed in C-HAN.
In comparison to C-HAN-A, C-HAN demonstrates improvements in Recall @20 by approximately 2.10%, 0.58%, and 1.22% across the three datasets, and also shows enhancements in MRR @20 of about 12.70%, 2.06%, and 2.56% across the same datasets. These findings highlight the distinct influences of different types of context information on the learning of item representations. It suggests that the combination of context information and attention-based mechanisms in C-HAN contributes to more accurate and effective learning of item representations, which is superior to solely relying on context information without the attention-based mechanism. Therefore, it is essential to integrate both aspects for optimizing the session-based recommendation process.
Comparison of different fusion strategies on session representation learning.
To answer RQ4, we experiment with two linear integration techniques for building the final user-intention representation at the session representation stage, and contrast them with our chosen methodology. The two variants are defined as follows: (1) C-HAN-CO substitutes the soft attention used in session representation learning with concatenation, i.e., \(Z=R_{f_s} \oplus R_{f_c}\), where \(\oplus\) denotes vector concatenation. (2) C-HAN-IP employs the inner product as the fusion pattern, i.e., \(Z=R_{f_s} \odot R_{f_c}\), where \(\odot\) denotes element-wise multiplication. The results of the different fusion strategies are presented in Fig. 8.
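The three fusion strategies can be sketched in a few lines of NumPy. The soft-attention parameterization below (a tanh-scored softmax over the two representations, with placeholder parameters `w` and `q`) is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def fuse_concat(r_s, r_c):
    """C-HAN-CO style: concatenate the two session representations."""
    return np.concatenate([r_s, r_c])

def fuse_inner_product(r_s, r_c):
    """C-HAN-IP style: element-wise (Hadamard) product."""
    return r_s * r_c

def fuse_soft_attention(r_s, r_c, w, q):
    """Soft-attention fusion: score each representation, softmax the
    scores into weights, and return the weighted sum. w (d x d) and
    q (d,) stand in for learned parameters."""
    scores = np.array([q @ np.tanh(w @ r_s), q @ np.tanh(w @ r_c)])
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax weights
    return alpha[0] * r_s + alpha[1] * r_c

d = 4
rng = np.random.default_rng(0)
r_s, r_c = rng.normal(size=d), rng.normal(size=d)
w, q = rng.normal(size=(d, d)), rng.normal(size=d)
z_co = fuse_concat(r_s, r_c)          # dimension doubles to 2d
z_at = fuse_soft_attention(r_s, r_c, w, q)  # stays at d
```

Note how concatenation doubles the representation dimension while the attention-weighted sum preserves it, which is one practical reason an adaptive weighting can be preferable to a fixed linear combination.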
Figure 8 reveals that C-HAN exhibits superior performance across two evaluation metrics on the three datasets, with C-HAN-IP and C-HAN-CO trailing behind. This result can be attributed to the adaptability of C-HAN in adjusting the weights between sequential transition signals and interaction consistency information, which compose the session representation, based on variations in user interaction context or scenarios. This adaptive adjustment enables C-HAN to capture the user’s intent more accurately.
We have developed a novel C-HAN model to effectively capture the intricate relationships between items within a session and accurately discern user interaction intentions across various contextual interaction scenarios. Our model is distinctive in its ability to concurrently capture sequential dependencies and consistency information among session items, while also considering the influence of diverse interactive contextual information on changes in user interests. C-HAN propagates items with different types of interactive contexts through hypergraph attention convolution layers, iteratively learning consistency. Additionally, it leverages self-attention mechanisms to capture sequential dependencies among multiple items within each session. Ultimately, the model adaptively integrates these two types of information using a soft-attention mechanism to comprehensively represent the session. Experiments conducted on three real-world datasets attest to the strong performance of C-HAN compared to other baseline methods. In future work, we will extend C-HAN to social recommendation tasks, introduce users' trusted social relationships, and integrate the influence of multi-modal user behaviors such as commenting, adding favorites, and adding items to shopping carts. Such information helps to capture users' intentions accurately and improve the accuracy of session-based recommendations.
The datasets used in experiments can be downloaded from the following URLs: ML-1M: http://www.grouplens.org/node/73 , Delicious-2K: https://grouplens.org/datasets/hetrec-2011 , Yoochoose: http://2015.recsyschallenge.com/challege.html .
This work was supported by the Natural Science Foundation of Inner Mongolia (2023LHMS06025), and the Basic Scientific Research Foundation of Colleges and Universities Directly under the Inner Mongolia Autonomous Region (GXKY22135).
Authors and affiliations.
College of Computer Science and Technology, Inner Mongolia Minzu University, Tongliao, 028000, China
Zhigao Zhang, Hongmei Zhang & Zhifeng Zhang
School of Computer Science and Engineering, Northeastern University, Shenyang, 110169, China
Zhigao Zhang & Bin Wang
Conceptualization, Zhang Z.G.; methodology, Zhang Z.G.; software, Zhang Z.G.; validation, Zhang Z.G., Zhang H.M., Wang B., and Zhang Z.F.; formal analysis, Zhang Z.G.; investigation, Zhang Z.G.; resources, Zhang Z.G.; data curation, Zhang Z.G., Wang B.; writing—original draft preparation, Zhang Z.G.; writing—review and editing, Zhang Z.G., Zhang Z.F., and Wang B.; visualization, Zhang H.M.; supervision, Zhang H.M.; project administration, Zhang H.M.; funding acquisition, Wang B.
Correspondence to Zhifeng Zhang .
Competing interests.
The authors declare no competing interests.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .
Cite this article.
Zhang, Z., Zhang, H., Zhang, Z. et al. Context-embedded hypergraph attention network and self-attention for session recommendation. Sci Rep 14 , 19413 (2024). https://doi.org/10.1038/s41598-024-66349-7
Received : 05 March 2024
Accepted : 01 July 2024
Published : 21 August 2024
Digital Soil Mapping (DSM) is fundamental for soil monitoring, as it is limited and strategic for human activities. The availability of high temporal and spatial resolution data and robust algorithms is essential to map and predict soil properties and characteristics with adequate accuracy, especially at a time when the scientific community, legislators and land managers are increasingly interested in the protection and rational management of soil.
Proximity and remote sensing, efficient data sampling, and open public environmental data allow the use of innovative tools to create spatial databases and digital soil maps with high spatial and temporal accuracy. Applying machine learning (ML) to soil data prediction can improve the accuracy of maps, especially at scales where geostatistics may be inefficient. The aim of this research was to map nitrogen (N) levels in the soils of the Nurra sub-region (north-western Sardinia, Italy), testing the performance of the Ranger, Random Forest Regression (RFR), and Support Vector Regression (SVR) models using only open-source and open-access data. Following the literature, the models include soil chemical-physical characteristics and environmental and topographic parameters as independent variables. Our results show that predictive models are reliable tools for mapping N in soils, with an accuracy in line with the literature: the average accuracy of the models is high (R 2 = 0.76), and the highest accuracy in predicting N content in surface horizons was obtained with RFR (R 2 = 0.79; RMSE = 0.32; MAE = 0.18). Among the predictors, SOM has the highest importance. These results could encourage the integration of this type of approach in the policy and decision-making processes carried out at the regional scale for land management.
Digital Soil Mapping (DSM) has been the main spatial information practice in soil science for many years. This sub-discipline of soil science received international recognition in 2005 with the establishment of a dedicated working group led by the IUSS (Arrouays et al. 2017 ). Today, the main processes of DSM are based on geostatistical methods and machine learning (ML) models and algorithms (Heung et al. 2016 ; Khaledian and Miller 2020 ; Padarian et al. 2019 ; Wadoux et al. 2020 ). Geostatistics refers to methods of studying environmental phenomena based on their spatial variability, starting from real data collected in the field (Hoffimann et al. 2021 ). These tools are widely used for drafting prediction maps, especially through different kriging algorithms (Keskin and Grunwald 2018 ; Santra et al. 2017 ; Zhang et al. 2020 ). Alongside them, ML tools, which obtain comparable results, are increasingly being used (Taghizadeh-Mehrjardi et al. 2021 ; Wadoux et al. 2020 ).
Indeed, ML is applied in several fields, such as the monitoring of hydrogeological risk (Jain et al. 2020 ; Ma et al. 2021 ), wildfire prevention (Elia et al. 2020 ), the prediction of soil physical–chemical parameters (Li et al. 2023a , b ; Li et al. 2022 ; Wang et al. 2021 , 2022 ; Xu et al. 2021 ), and human health (Aghazadeh et al. 2019 ; Piunti 2019 ). Consequently, the available algorithms are as numerous as the fields of application. Depending on the objective, the sampling characteristics, and the dataset, it is necessary to choose one algorithm over another (Li et al. 2023a , b ; Wadoux et al. 2020 ). A relevant aspect in the application of ML is the abundance and quality of databases (Chen et al. 2022 ). In environmental science, the application of ML requires extensive and costly surveying campaigns, which can be supported by existing databases, often shared by institutions and governmental bodies according to the logic of open data (Hengl et al. 2017 ). It is precisely in the environmental field, and in soil science in particular, that recent years have seen a proliferation of open databases, especially from public institutions (Worthy 2015 ; Orgiazzi et al. 2018 ). Furthermore, the increased use of open data in digital soil mapping is recent and strictly related to the use of new spatial analysis tools, such as Google Earth Engine (GEE), and the availability of large datasets of remote sensing data acquired by satellite missions (Copernicus, Landsat) (Poppiel et al. 2021 ). National and international agencies are developing policies and tools to share soil data, also for scientific purposes, such as the LUCAS soil project implemented by the EU Environment Agency (Orgiazzi et al. 2018 ). Indeed, today almost all medium/large-scale studies focused on digital soil mapping integrate field data with updated, publicly managed, high-resolution open data (Radočaj et al. 2024 ; Searle et al. 2021 ).
This type of data, coupled with an ML algorithm, appears to be more efficient, also in terms of cost–benefit, than the traditional approach using a geostatistical algorithm (Radočaj et al. 2022a ).
Soil mapping can have two main purposes: i) assignment of a class associated with an observed soil, or ii) identification of one or more soil features (Zhang et al. 2017 ). Among these, physical–chemical parameters have been extensively investigated to create regional (Brungard et al. 2021 ; Maleki et al. 2023 ) as well as local and field-scale distribution maps (Chlingaryan et al. 2018 ; Söderström et al. 2016 ; Zhou et al. 2023 ). Among the chemical parameters, map elaboration for soil macronutrients (N, P, and K) represents a pivotal step for environmental and agricultural development agencies, farmers, etc., to understand their spatial distribution and consequently improve nutrient input management while avoiding soil and water pollution. Nitrogen is a fundamental macronutrient for the development of plant species, not least because of the quantities that plants require for sustenance (Högberg et al. 2017 ). In fact, plant species accumulate N in different forms and through different modalities throughout their life cycle, predominantly during the growth phases (Das et al. 2022 ). The continuous input of N needed by crops has a significant impact on production cycles and markets (Dimkpa et al. 2020 ). The use of N fertilizers has a significant economic weight; this entails careful and constant monitoring over time to highlight the spatial distribution dynamics of N deficits and surpluses (Singh 2018 ; Wang et al. 2019 ).
The Nurra sub-region (north-western Sardinia) provides an excellent paradigmatic case for exploring the questions outlined above. It encompasses several environmental conditions, ranging from natural areas (parks protected and regulated by law) to highly productive enterprises, mainly located in the plains: the production of famous, high-quality wines exported around the world; intensive to semi-intensive agricultural activities; and cattle and sheep farming for meat and dairy products. Additionally, the area has undergone extensive urbanization due to the presence of extended urban areas (Sassari and Alghero) and famous tourist locations (Arru et al. 2019 ).
The objectives of this research were to: i) assess the effectiveness and performance of some ML models using only open-access environmental databases; ii) predict N values in the soil surface horizons of the Nurra sub-region (Sardinia, Italy); and iii) based on the predicted values, draw up a sub-regional scale map. Only open-access data were used, provided and implemented by different bodies and organizations at different hierarchical levels. The variables under investigation were selected through data exploration, i.e., an in-depth analysis of the dataset to study its distribution and main characteristics from a statistical point of view. Random tree models were used since they are in common use and are integrated, as algorithms, in several statistical software packages of RStudio, such as “CART”, “RF”, and “Ranger” (RStudio Team 2011 ). Furthermore, this approach has three important characteristics: it is i) easy to reproduce with open-source software; ii) powered by public open data; and iii) oriented to produce outputs that can be easily integrated into decision-making processes (Fig. 1 ).
Workflow Diagram
The study area, which covers 1,330 km 2 , is located in NW Sardinia (Italy, Fig. 2 ), in the Nurra sub-region (40°48′28.8″N 8°15′14.4″E). Different geological substrates are featured in the area. The most extensive is the limestone formation, followed by pyroclastic flow deposits (south), aeolian sandstones, and gravel (Carmignani et al. 2015 ). The study area is characterized by high pedodiversity (Aru and Baldaccini 1983), with Alfisols (Rhodoxeralfs, Palexeralfs, Haploxeralfs), Inceptisols (Xerochrepts), and Entisols (Fluvents: Xerofluvents; Aquents: Fluvaquents; Psamments: Xeropsamments; Keys to Soil Taxonomy, 13th Edition 2022 ) dominating. The main land uses are agriculture (65%), urban settlements (5%), and natural areas (30%; CORINE Land Cover, Copernicus Land Monitoring Service). The vegetative cover is mainly divided into forest vegetation (30%), such as hardwood and coniferous trees, and arable crops (40%), as described by the Corine Land Cover (CLC). Part of the forest is located on the coastline of Asinara's Bay; these are relatively recent conifer plantations placed behind the dunes. Approximately 10% of the surface is occupied by olive trees. The central part of the study area is characterized by irrigated arable land.
Study area framework
The construction, implementation, and validation of the dataset is a pivotal part of the mapping process; the predictive results of the model depend on its characteristics and composition. The availability of quality data determines the accuracy of the model; therefore, it is necessary to build a general dataset that includes a carefully selected range of variables that, as a whole, influence the values of the variable we want to predict (Wadoux et al. 2020 ). Only open sources have been used in this work. The use of open sources increases the level of replicability of this research, thus providing the possibility to compare results. Furthermore, as shown by several authors (Ferreira et al. 2022 ; Nussbaum et al. 2018 ; Wadoux et al. 2020 ), the availability of data, especially those related to soil characteristics, stimulates research regarding the conditions of this resource. At the same time, the existence and availability of freely accessible data increases society’s awareness of soil resource issues (Gorelick et al. 2017 ; Orgiazzi et al. 2018 ). In this work, chemical, physical, topographical, and land-use-related predictors are used. In Table 1 , the main characteristics of the predictors are reported (type, source and resolution).
Soil data used in the study are available on the official website of the Sardinian Soil Survey. Footnote 1 These data are provided in ESRI shapefile format with a punctual geometric structure. Each point represents a sample collected by one of the institutions involved in several projects: regional agencies (AGRIS, LAORE) and the Universities of Sassari and Cagliari. There are 1511 sampling points in the study area; each point is associated with the code of its soil profile card and the relative link containing the profile description and chemical and physical parameters. Unfortunately, 981 of the 1511 cards contained only physical property data, reducing the number of observations available to apply the models. Further data will be added by LUCAS. Footnote 2
Topography directly and indirectly affects the dynamics of soil N concentrations (Weintraub et al. 2017 ). In this research, we studied the spatial variation of the Topographic Position Index (TPI), which expresses the shape of the space making up the landscape. The relationship between topographic indices and soil N concentration has been demonstrated previously, especially in forest watersheds (Dai et al. 2022 ; Li et al. 2020 ). The data relating to the topography were derived from the Digital Terrain Model (DTM) developed by the cartographic office of the Sardinian Region, available on the Regional Geoportal (Regione Autonoma della Sardegna 2023 ) at a 10 × 10 m resolution. The TPI values were calculated with the SAGA GIS tool (Conrad et al. 2015 ).
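Conceptually, the TPI of a cell is its elevation minus the mean elevation of its neighbourhood; positive values mark ridges, negative values valleys. SAGA GIS offers configurable neighbourhood shapes and radii; the sketch below is a simplified square-window version on a toy DTM array:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def topographic_position_index(dtm, window=3):
    """TPI: each cell's elevation minus the mean elevation of its
    square neighbourhood. Positive = ridge, negative = valley,
    near zero = flat area or constant slope."""
    neighborhood_mean = uniform_filter(dtm.astype(float), size=window)
    return dtm - neighborhood_mean

# Toy 3x3 DTM with a single peak in the centre
dtm = np.array([[10., 10., 10.],
                [10., 20., 10.],
                [10., 10., 10.]])
tpi = topographic_position_index(dtm, window=3)
# The centre cell sits above its neighbourhood mean, so its TPI is positive.
```

On a real 10 × 10 m DTM the window size controls whether TPI captures micro-relief or broader landforms, so it is usually chosen to match the landscape features of interest.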
Nitrogen is one of the essential macronutrients in vegetation. The color and vigour of the plant depend on the soil N concentration. Soil N is susceptible to runoff due to water-induced soil erosion (Sequi et al. 2017 ). A covariate related to the hydrography of the study area consisted of an estimation of soil water erosion. This estimate was made available by the European Soil Data Centre (ESDAC) and was achieved using the Revised Universal Soil Loss Equation (RUSLE) model. This empirical model is defined by the following equation:
\(A = R \cdot K \cdot L \cdot S \cdot C \cdot P\)
where A is the mean annual soil loss and:
K = soil erodibility (Panagos et al. 2014 );
R = rainfall erosivity (Panagos et al. 2015a , b , c , d );
C = vegetation cover (Panagos et al. 2015a , b , c , d );
L = slope length and S = slope steepness (Panagos et al. 2015b );
P = support practices (Panagos et al. 2015c ).
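Since the RUSLE factors combine multiplicatively into the mean annual soil loss A, the per-cell computation is trivial; the factor values below are made up for illustration:

```python
def rusle_soil_loss(r, k, ls, c, p):
    """RUSLE: mean annual soil loss A (t ha^-1 yr^-1) as the product of
    rainfall erosivity R, soil erodibility K, combined slope
    length-steepness LS, cover management C, and support practice P."""
    return r * k * ls * c * p

# Illustrative (made-up) factor values for a single raster cell:
a = rusle_soil_loss(r=700.0, k=0.03, ls=1.2, c=0.15, p=1.0)
print(round(a, 2))  # 3.78
```

In practice each factor is itself a raster layer, so the same multiplication is applied cell-by-cell across the study area.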
This model estimates soil loss per year (t ha\(^{-1}\) yr\(^{-1}\)). Another important dataset related to hydrography is the Euclidean distance between each cell and the water bodies. The presence of water affects N concentration in the surface horizons of soil (Amicabile 2016 ) and is, therefore, included in the dataset. Our aim was to assess the influence of these and other predictors to improve the accuracy of the predictions.
Soil N concentrations in the surface horizons are intrinsically linked to vegetation cover conditions (Chen et al. 2014 ), so vegetation data contribute to assessing land degradation processes (Ridwan et al. 2024 ). Therefore, a vegetation index can help to detect and describe soil conditions. Vegetation spectral indices are obtained by combining several satellite bands (Chlingaryan et al. 2018 ). One of the covariates selected to represent the vegetation cover was the Normalized Difference Vegetation Index (NDVI), which represents the vigour of the vegetation with values ranging from −1 to +1, as expressed by the colour of the leaves (Antognelli 2018 ). This index estimates the vigour of the vegetation through photosynthesis and is derived from satellite imagery produced by Landsat 8, Footnote 3 through the elaboration of the following bands:
no. 4, Red (0.64–0.67 µm);
no. 5, Near-Infrared (0.85–0.88 µm).
The bands are combined through the following equation:
\(\mathrm{NDVI} = \dfrac{NIR - VIS}{NIR + VIS}\)
where:
NIR corresponds to band 5;
VIS corresponds to band 4.
The final NDVI value is the average of the values from images acquired in the summer and winter seasons of the years 2016 to 2020. Details of the images are as follows (Table 2 ):
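The per-pixel NDVI computation and the seasonal averaging can be sketched as follows (the reflectance values are illustrative, not taken from the study's imagery):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); for Landsat 8 these are
    bands 5 and 4. Values range from -1 (water) to +1 (dense cover)."""
    nir, red = nir.astype(float), red.astype(float)
    return (nir - red) / (nir + red + 1e-12)  # epsilon avoids 0/0

# Seasonal composite: average the NDVI of summer and winter acquisitions
summer = ndvi(np.array([0.40]), np.array([0.10]))  # vigorous vegetation
winter = ndvi(np.array([0.25]), np.array([0.15]))  # sparser cover
mean_ndvi = (summer + winter) / 2
```

Averaging summer and winter composites dampens the seasonal swing of Mediterranean vegetation, giving a single covariate per cell for the model dataset.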
The Exploratory Data and Spatial Analysis (EDA) was implemented using R software. In this study, EDA consisted of analysing the distribution and composition of any predictors, through use of descriptive statistics. It was articulated in five parts: i) data collection, ii) data cleaning, iii) univariate statistics, iv) multivariate statistics, and v) spatial distribution analysis.
Once collected, all data were assembled in a vectorial dataset in the QGIS workspace (QGIS Development Team 2023 ), covering the study area with a 100 × 100 m cell grid. In the matrix associated with the vectorial grid, each row is a cell and each column a variable. The raster datasets were appropriately re-scaled and incorporated into the vectorial dataset using the QGIS raster statistics procedure (QGIS Development Team 2023 ).
In the final dataset, a general check was carried out to identify and remove the null values (NA) and outliers.
Univariate statistics were used to describe the distribution of the values of the predictor and dependent variable.
To detect multicollinearity, we created a correlation matrix. Multicollinearity is a phenomenon that arises during regression analysis when multiple variables exhibit significant correlations not only with the dependent variable but also with each other (Shrestha 2020 ). If two covariates are correlated, it increases the absolute error of the predictions (Daoud 2017 ). Therefore, this analysis helped identify variables that had no impact on prediction quality or, worse, adversely affected it. According to the literature (Chan et al. 2022 ; Lindner et al. 2022 ), we removed the covariates with a correlation coefficient >0.80, because if the value of Pearson correlation coefficient is close to 0.8, collinearity is probable (Shrestha 2020 ).
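The >0.80 correlation filter can be sketched with pandas on a hypothetical covariate table (the column names and data are made up for illustration):

```python
import numpy as np
import pandas as pd

def drop_collinear(df, threshold=0.80):
    """Drop one covariate from every pair whose absolute Pearson
    correlation exceeds the threshold (keeping the first of each pair)."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
df = pd.DataFrame({
    "tpi": x,
    "elev": x * 0.99 + rng.normal(scale=0.01, size=200),  # near-duplicate of tpi
    "ndvi": rng.normal(size=200),
})
reduced = drop_collinear(df)  # "elev" is nearly identical to "tpi" and is removed
```

Dropping only one member of each correlated pair preserves the information content of the dataset while removing the redundancy that inflates prediction error.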
Another analysis we conducted on the N value point dataset was the study of spatial autocorrelation, the phenomenon associated with the presence of a systematic spatial variation in a variable. A positive spatial autocorrelation is the tendency of nearby sites to have similar values (Chlingaryan et al. 2018 ; Li et al. 2016 ; Nguyen and Vu 2019 ). The Moran index (Moran 1948 ) enables an estimation of the degree of global spatial autocorrelation. The index is given by:
\(I = \dfrac{N}{\sum_{i}\sum_{j} w_{ij}} \cdot \dfrac{\sum_{i}\sum_{j} w_{ij}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_{i} (X_i - \bar{X})^2}\)
where:
N is the number of events;
\(X_i\) and \(X_j\) are the values taken from the intensity at the points i and j, with \(i \ne j\) ;
\(\bar{X}\) is the average of the covariate considered;
\({w}_{ij}\) is an element of the matrix containing arbitrary event weights.
The weights are determined according to the contiguity of the events. The range of the index I is [−1, +1] (Tybl 2016 ): values close to +1 or −1 indicate the presence of clustering, while values close to zero indicate a random spatial distribution. This approach can be useful for strengthening model selection: in the absence of high spatial correlation, it is preferable to use multivariate statistical methods rather than geostatistical methods.
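Global Moran's I follows directly from a weights matrix; a small sketch with four points on a line and rook-contiguity weights (only adjacent points are neighbours):

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I:
    I = (N / sum_ij w_ij) * sum_ij w_ij (x_i - xbar)(x_j - xbar)
                            / sum_i (x_i - xbar)^2
    where w has zeros on the diagonal (i != j)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    num = (w * np.outer(z, z)).sum()
    return (len(x) / w.sum()) * num / (z ** 2).sum()

# Four points on a line; neighbours are the adjacent points
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
clustered = morans_i([1, 1, 5, 5], w)  # similar neighbours -> positive I
```

With low values clustered on one side and high values on the other, neighbouring deviations share a sign, so I comes out positive, matching the clustering interpretation above.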
This type of model has been used widely in both classification and regression problems. Wadoux et al. ( 2020 ) analysed a large amount of peer-reviewed literature and found that, in the case of classification, 80% of the articles applied at least one random tree model. More than one model was chosen in this research, as it is common to use several models of different types to compare results (Wadoux et al. 2020 ; Zhou et al. 2023 ).
The selection of algorithms was based on the results of previous applications in this field. As described by several authors (Wadoux et al. 2020 ), ML tools do not encode prior knowledge of soil mechanics, phenomena, and properties, but rather learn from the data on which they are trained. For this reason, it can be useful to examine the results of model applications in similar situations. We therefore searched for similar case studies whose goal was to predict the values of chemical components in the soil (Dai et al. 2022 ; Flynn et al. 2023 ; Forkuor et al. 2017 ; Hengl et al. 2017 ; Li et al. 2023a , b ; Li et al. 2022 ; Prado Osco et al. 2019 ; van der Westhuizen et al. 2023 ; Wadoux et al. 2020 ; Wang et al. 2022 ; Xiaorui et al. 2023 ; Xu et al. 2021 ; Zhou et al. 2023 ). Following this literature analysis, the algorithms selected were Random Forest Regression (RFR), Ranger, and Support Vector Regression (SVR).
While the RF model is often used in other fields, such as medicine (Sarica et al. 2017 ), it is also widely used in soil mapping (Wadoux et al. 2020 ).
This method is based on the creation of forests of decision trees to improve the accuracy of predictions, and is, therefore, classified as an ensemble algorithm, i.e. one that combines a number of other models (Zhou et al. 2023 ). Unlike other ML models, RF randomly selects the subset of independent variables used to split the nodes (leaves), making it more accurate and further minimising the instability of the trees (Forkuor et al. 2017 ; van der Westhuizen et al. 2023 ). It is possible to choose the number of trees that make up the forest (Tree Number = 500), each of which is created independently on its own sample of the training data.
Ranger is a fast implementation of RF, mostly used for large datasets (Wright and Ziegler 2017 ). Both belong to the class of tree models. The Ranger package, implemented in R, allows some additional aspects of the model to be managed during the building phase.
Specifically, the parameters exposed by the function differ from those of RF and allow finer model management and refinement. The main ones used in the model training phase are:
Quantreg, which, if enabled, performs a quantile prediction through a regression forest;
Num.trees, which adjusts the number of trees in the forest;
Write.forest, to store the results of the model;
Min.node.size, the minimum size of the leaves; the value 5 is recommended for this parameter when performing a regression;
Importance, which ranks the importance of the independent variables in the prediction; for regression the importance is based on the variance of the results and is coded with the keyword “impurity” (Xu et al. 2016 ).
This makes this phase more refined than in other models. The computational and memory efficiency of Ranger was evident in our R implementation: the algorithm handles many more values and variables in less time than RF, making it very effective and fast compared to other models (Wright and Ziegler 2017 ).
Algorithm 1 RFR Program Code
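The published RFR listing is not reproduced in this version of the text. As a rough stand-in, the following scikit-learn sketch fits a random forest with 500 trees on synthetic data shaped like the final dataset (300 observations, 18 predictors). The original implementation was in R; all data, the synthetic target, and every parameter other than the tree count are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 18))          # 300 observations, 18 predictors, as in the final dataset
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=300)  # synthetic stand-in for the N target

# 75% / 25% random split, as in the validation section of the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

rfr = RandomForestRegressor(n_estimators=500, random_state=42)  # Tree Number = 500
rfr.fit(X_tr, y_tr)
print(rfr.score(X_te, y_te))            # R^2 on the held-out test data
```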
Algorithm 2 Ranger Program Code
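The Ranger listing is likewise not reproduced, and ranger itself is R-specific. The sketch below maps the ranger arguments described above onto their approximate scikit-learn counterparts (num.trees → n_estimators, min.node.size → min_samples_leaf, importance = "impurity" → feature_importances_); the data are synthetic and illustrative, and this is an analogue rather than the package itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 2] + rng.normal(scale=0.2, size=300)  # only predictor 2 is informative

# Rough scikit-learn analogues of the ranger arguments listed above:
#   num.trees             -> n_estimators
#   min.node.size         -> min_samples_leaf (5 is the recommended regression value)
#   importance="impurity" -> feature_importances_ (variance-based for regression)
model = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, random_state=0)
model.fit(X, y)
print(model.feature_importances_.argmax())  # → 2, the informative predictor dominates
```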
SVR, an extension of the Support Vector Machine to regression problems (Lee et al. 2020 ; Ramedani et al. 2014 ), is not a widely used model in this field, but there are some examples of its application to predicting the values of different soil properties (Li et al. 2023a , b ; Wang et al. 2021 ; Xu et al. 2021 ; Zhou et al. 2023 ). This algorithm implements a function whose purpose is to predict the dependent variable. One of the reasons we chose this algorithm is that its inner workings differ from those of the tree models. SVR formulations are analogous to common linear regression, but with some differences (Ramedani et al. 2014 ). The algorithm projects the data into a high-dimensional space through a kernel function (the choice of kernel depends on the characteristics of the data and can have a significant impact on the performance of the model (Forkuor et al. 2017 )) in order to identify a separating hyperplane defined by the support vectors. The prediction occurs within the margin of the support vectors, controlled by the cost parameter (C), i.e., the predicted value is located within this range (Awad and Khanna 2015 ).
Algorithm 3 SVR Program Code
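As with the tree models, the SVR listing is not reproduced here; a minimal illustrative sketch of an RBF-kernel SVR follows. The kernel choice and the role of the cost parameter C follow the description above, while the data and all parameter values are made up.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)  # nonlinear synthetic target

# RBF kernel; C bounds the penalty on points outside the epsilon-tube, so a
# larger C tightens the fit while a smaller C widens the tolerated band.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
svr.fit(X, y)
print(round(svr.score(X, y), 2))  # training R^2
```

Scaling the predictors before fitting matters here: the RBF kernel is distance-based, so unscaled features with large ranges would dominate the kernel.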
Two different techniques were used to validate the models. The first divided the dataset into two parts at random: the larger part was used to train the models (training dataset), and the remainder was used to test the performance of the models on unseen data (test dataset). The split was 75% for the training dataset and 25% for the test dataset. The second, cross-validation, or k-fold cross-validation (CV), is a statistical technique that divides the training dataset into k parts to limit overfitting. Guarding against overfitting is essential when using ML tools, in both classification and regression problems (Berrar 2019 ; Wang et al. 2021 ). According to the bibliography (Aghazadeh et al. 2019 ; Berrar 2019 ; Dharumarajan 2019 ; Hounkpatin et al. 2022 ; Khaledian and Miller 2020 ; Li et al. 2023a , b ; Liu et al. 2022 ; Maleki et al. 2023 ; Mashaba-Munghemezulu et al. 2021 ; Nolan et al. 2018 ; Radočaj et al. 2022b ; Rahman et al. 2020 ; Uddameri et al. 2020 ; Van Der Westhuizen et al. 2022 , 2023 ; Wadoux et al. 2020 ; Wang et al. 2021 ; Xu et al. 2021 ; Zhang et al. 2021 ; Zhou et al. 2023 ), the most widely used and efficient CVs are those with K = 5 and K = 10. In this paper, we chose K = 10.
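The two validation schemes can be sketched as follows. The fold count (K = 10) and split ratio (75/25) follow the text; the data and model settings are synthetic and illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, cross_val_score, KFold

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 18))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

# Scheme 1: 75% training / 25% test, drawn at random.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)

# Scheme 2: tenfold cross-validation on the training data only (K = 10).
cv = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(RandomForestRegressor(n_estimators=100, random_state=7),
                         X_tr, y_tr, cv=cv, scoring="r2")
print(len(scores), round(scores.mean(), 2))  # 10 per-fold R^2 values and their mean
```

Keeping the test set out of the CV loop is the point of the design: the folds guard against overfitting during training, while the untouched 25% gives an honest estimate on unknown data.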
The metrics used to assess performance can differ according to the problem at hand. In this paper, we use metrics that assess the residual of the prediction, i.e., the difference between actual and predicted values. The most common are the coefficient of determination (R 2 ), the root-mean-square error (RMSE), and the mean absolute error (MAE). These metrics are used in several soil mapping studies to compare the performance of the chosen models (Chlingaryan et al. 2018 ; Dai et al. 2022 ; Lee et al. 2020 ; Liang et al. 2018 ; Prado Osco et al. 2019 ; Wadoux et al. 2020 ; Zhang et al. 2019 ). The formulas are as follows:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}{({O}_{i}-{P}_{i})}^{2}}{\sum_{i=1}^{n}{({O}_{i}-\bar{O})}^{2}}, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}{({O}_{i}-{P}_{i})}^{2}}, \qquad MAE = \frac{1}{n}\sum_{i=1}^{n}\left|{O}_{i}-{P}_{i}\right|$$
where:
\({O}_{i}\) is the real value of N for observation i;
\({P}_{i}\) is the prediction;
\(\bar{O}\) is the mean of the real values and n is the number of observations.
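A quick check of the three metrics on a toy observed/predicted pair (the values are illustrative):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

O = np.array([1.0, 2.0, 3.0, 4.0])   # observed (real) values
P = np.array([1.1, 1.9, 3.2, 3.8])   # predicted values

r2 = r2_score(O, P)
rmse = np.sqrt(mean_squared_error(O, P))
mae = mean_absolute_error(O, P)
print(round(r2, 3), round(rmse, 3), round(mae, 3))  # → 0.98 0.158 0.15
```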
The following table shows the results of the descriptive statistical analysis (Table 3 ):
The final dataset consisted of 300 observations and 18 predictors.
The correlation matrix (Fig. 3 ) did not indicate a high association between the predictors, so we excluded the potential presence of multicollinearity. The spatial autocorrelation analysis (Fig. 4 ) yielded a Moran's I of 0.108. The distribution therefore resembled a random spatial process; in such cases it may be more appropriate to apply a multivariate statistical algorithm to study the distribution of the variables, rather than a ‘traditional’ geostatistical approach.
Correlation matrix
Moran's I scatterplot
In the tree models, it is possible to inspect the importance of the variables in the predictions (Figs. 5 and 6 ). Variable importance is well defined in models such as RFR and Ranger because its evaluation draws on the inner mechanics of the model as it builds the trees that compose the random forest. The statistic reported by the function is IncNodePurity (Increase in Node Purity), which measures how the purity of a node (for regression, based on the residual variance; for classification, on metrics such as the Gini index or the entropy) increases when the node is split on a specific variable. High values indicate a greater influence of that variable in the splitting process.
Plot of covariates importance in RFR model (RFR2 = RFR standard run; RFR*2 = RFR with tenfold CV )
Plot of covariates importance in Ranger model (Ranger2 = Ranger standard run; Ranger*2 = Ranger with tenfold CV)
SOM is the principal source of organic N in the soil, accounting for approximately 97–98% of it. Vegetation accumulates N in the ammoniacal and nitrate forms and returns it to the soil as organic N after death (Sequi et al. 2017 ).
This justifies the high relevance of SOM. It is important to verify, in a subsequent phase, whether there is a spatial relationship between the distribution of the prediction and the SOM values. The class of variables with the most influence in the prediction of N values was the same in both models (Table 4 ): the most influential predictors belonged to the class of chemical characteristics of the soil. Topography, especially altitude, was also important.
Residual analysis was performed on the predictions made in the test phase to assess the performance of the models and their accuracy on unknown data. Given that SOM contributes approximately 98% of the organic N in the soil, that most plants take up N directly as ammonium and nitrate and return it to the soil in organic form after death (Sequi et al. 2017 ), and given the importance of the variable in the prediction, we chose to relate the residuals of the results to the SOM values.
The greater density of values in the Ranger prediction corresponded to smaller residuals (Fig. 7 ), showing that the model generated relatively accurate results, with less deviation from the real values. Most of the results lie in the negative part of the plots, i.e., the model tends to underestimate the prediction relative to the real value. The values aligned in the first row of the graph overlap the zero value of the y-axis in the second row: the model was therefore able to predict these specific values without error.
Plot of residual in Ranger model (first row: application without tenfold CV; second row: application with tenfold CV)
In the model without CV, there is an inherent tendency to overestimate values in the range from 0 to 0.5. As can be seen in Fig. 7 , this tendency is eliminated in the model to which tenfold CV was applied. The statistics on the residuals of the two applications of the model are shown in Table 5 .
The residuals from the RFR model were very similar to the Ranger results. Again, the model showed the previously observed trend, but more moderately than Ranger. Contrary to Ranger, applying the model with CV did not eliminate all the trends, resulting in overestimated predictions where the real value was 0. The density of the predicted values was concentrated near zero in both RFR applications, with and without CV (Fig. 8 ). The model with CV had a higher accuracy in the density curve, indicating smaller residuals between the predictions and the real values.
Plot of residual in RFR model (first row: application without tenfold CV; second row: application with tenfold CV)
The statistics in Tables 5 and 6 show the affinity between the tree models in this application: both the mean and the variance of the residuals were similar. Overall, the RFR residual distribution was slightly narrower; although the margin was small, the RFR residuals were more precise than the Ranger residuals.
The SVR was affected by a tendency to overestimate the lowest N values, both with and without CV. While in the previous models CV limited this type of problem, here the opposite was true. The plots (Fig. 9 ) show an increase in the overestimated values, although the trend observed in the plot of the relationship with SOM concentration was decreasing.
Plot of residual in SVR model (first row: application without tenfold CV; second row: application with tenfold CV)
The residual statistics in Table 7 indicated that this model's residuals spanned a wider range than those of the other models, suggesting higher prediction error. The density, although more balanced, was less concentrated near 0 on the x-axis, indicating greater dispersion of the residuals and, therefore, a general increase in the error.
This analysis shows that CV has a significant positive influence on model performance for the tree algorithms, reducing some systematic negative trends. This did not happen for the SVR algorithm, which had difficulties related to the overestimation of the lowest N values.
This analysis demonstrated the reliability of the models in regression prediction. Predictions close to the real values produce a more solid DSM that reflects the characteristics of the landscape. Part of the potential of these tools lies in providing a measure of the error underlying the production of spatial information.
Table 8 shows the metrics related to the quality of the predictions in the training phase. These metrics are used to assess the quality of the model in predicting the training values.
From the values in Table 8 , the best performance in the training phase was obtained by SVR, since the algorithm had the highest R 2 and the lowest error metrics. RFR performed better than Ranger: RFR had an R 2 of 0.86 versus 0.85 for Ranger, and a lower RMSE (0.27 versus 0.29). For the MAE the opposite occurred, with RFR at 0.17 and Ranger at 0.16.
Table 9 shows the metric values that represent the performance quality of the prediction in the test phase.
In the test phase, the situation was reversed: SVR had the lowest performance quality in terms of the selected metrics. RFR had the highest performance quality, with predictions that approximated the real values. The values were slightly lower than in the training phase; the highest R 2 was obtained by RFR, at 0.79. Our results align with findings in other similar works. The R 2 of the RFR predictions was higher than that obtained by Maleki et al. ( 2023 ), even if the error metrics were worse in our case. The R 2 values of RFR and SVR were comparable to those obtained by other researchers (Lee et al. 2020 ; Liang et al. 2018 ), while the RMSE values showed higher precision than those obtained by Liu et al. ( 2023 ) and Prado Osco et al. ( 2019 ). SVR yielded better RMSE and R 2 values than those found by Xiaorui et al. ( 2023 ) for the same model. The MAE values were more moderate than those obtained by Prado Osco et al. ( 2019 ).
The graphs in Fig. 10 show the quality of the predictions for each model. In an optimal state, the predictions (red) should agree with the real values (black dots). In this case, all models had difficulty predicting the highest values of N. RFR accurately predicted values of N close to 0, while Ranger and SVR could not accurately predict values around 0 g kg −1 of N in the soil; in particular, SVR predicted negative values.
Graphs of the prediction value
Figure 11 shows the graphs comparing the real N values and the predictions. In an optimal state, the predictions would lie on a perfect diagonal, indicating that the prediction matches the real values. We used a colour scale for the prediction points to show the error: red indicates a high error, orange and yellow a medium error, and green a prediction close to the real value. The points in the RFR graph are more closely aligned along the diagonal, which, compared with the other graphs, shows the higher quality of its prediction.
Graphs of the distance between the prediction values and the actual values
As the previous graphs show, SVR and Ranger tended to overestimate N values close or equal to 0, which did not happen with RFR. Finally, SVR in some cases produced negative predictions where the real value was 0.
The models were used to produce prediction maps (Fig. 12 ). They showed the distribution of N concentration over the study area and the influence of some critical patterns:
In the western part, where there is a wooded vegetation cover (with a predominance of deciduous trees), the N concentration was higher than in the area occupied by agricultural activity, due to the absence there of long-cycle vegetation. Even where synthetic N is contributed in the fertilization phase, N is subject to different types of losses (e.g., denitrification and leaching).
The same scenario characterized the arable crops and pastures that occupy the central part, while the opposite was true for the area occupied by shrub and tree vegetation.
The hinterland of the city of Sassari (east-central sector) was one of the areas with the highest predicted N values, likely because the area is mostly occupied by olive groves along the city limits.
Prediction maps
The presence of a large area cultivated almost exclusively with olive trees ensures, in this condition, an adequate soil N concentration, partly due to the fertilizer applied. The lowest levels were concentrated along the coast, where urbanization is highest. Consistent with Amicabile ( 2016 ), all models showed increased N concentrations corresponding to high levels of SOM. The predictions showed an accumulation of N along the course of the rivers, due to leaching, which manifests itself as storage towards the lowest parts. This phenomenon was most evident in the SVR prediction map, where high values can be observed close to the hydrographic network of the main river (Riu Mannu), in the eastern part.
The relationship between N concentrations in the surface horizons clearly shows that, in the soils of the investigated areas, N concentrations increased as the ecosystem’s conservation status increased. In areas with forest cover (with a prevalence of broad-leaved trees), N concentration is higher than in comparable areas occupied by agricultural activities, due to the lack of long-cycle, high-coverage vegetation in the latter. Even where there is an input of synthetic N from fertilisation in the field, it should be remembered that N in soils is subject to various types of loss, mainly through leaching and denitrification (Amicabile 2016 ). This holds for agricultural areas under arable crops or sheep pasture, while on soils with tree-type vegetation the opposite occurs. Evidence of this can be seen in the fact that all three models identified the maximum content in the areas bordering the city of Sassari, attributable to the massive presence of olive groves.
Concerning the model predictions, the main difference between the maps produced by the tree models and by SVR was the localization of the highest values. In the tree models, the higher N values were localized at the boundaries of the city of Sassari, while SVR predicted higher values along the western coast of the municipality of Sassari. The RFR and Ranger maps showed higher N values over the municipality of Sorso (northeast of Sassari) than the SVR map. This behaviour could be explained by differences in performance where sampling-point density is low.
This research evaluated the effectiveness and performance of several ML models using only open environmental databases. The use of open-source data will be pivotal in the future, especially given the large datasets acquired by remote or proximal sensing. However, the choice of the most effective algorithm is of great importance. The results showed that RFR performed strongly. The main outcomes also revealed that, by coupling ML algorithms with large open environmental databases, it was possible to predict N values at a medium scale with reliable performance. More specifically, the applied models showed approximately the same performance, with RFR achieving the highest R 2 and the lowest RMSE. The spatial visualization of the results showed the distribution of N values in a medium-scale map, where it was possible to detect potential critical areas that could require specific actions within the environmental policy framework. Our next steps with this research are to improve the models by incorporating additional data sources to refine the spatio-temporal scale, taking into account data quality assessed through deep exploratory data analysis. Indeed, high spatio-temporal resolution is crucial for implementing effective soil management policies in areas of high human activity density.
The data used to support this study are available by contacting the corresponding author.
Available on: http://www.sardegnaportalesuolo.it/opendata , compiled by Agris Sardegna.
Available on: https://esdac.jrc.ec.europa.eu/projects/lucas
Available on: https://earthexplorer.usgs.gov/
Awad M, Khanna R (2015) Efficient learning machines. Springer, New York
Aghazadeh M, Orooji A, Kamkar Haghighi M (2019) Developing an intelligent system for prediction of optimal dose of warfarin in Iranian adult patients with artificial heart valve. Front Health Inform 8(1):25. https://doi.org/10.30699/fhi.v8i1.213
Amicabile S (2016) Manuale di Agricoltura (Terza). Ulrico Hoepli
Antognelli S (2018, May 28) Indici di vegetazione NDVI e NDMI: istruzioni per l’uso. Agricolus. https://www.agricolus.com/indici-vegetazione-ndvi-ndmi-istruzioni-luso/
Arrouays D, Lagacherie P, Hartemink AE (2017) Digital soil mapping across the globe. Geoderma Reg 9:1–4. https://doi.org/10.1016/j.geodrs.2017.03.002
Arru B, Furesi R, Madau FA, Pulina P (2019) Recreational services provision and farm diversification: a technical efficiency analysis on Italian agritourism. Agriculture 9(2):42. https://doi.org/10.3390/agriculture9020042
Berrar D (2019) Cross-validation. In: Encyclopedia of bioinformatics and computational biology. Elsevier, pp 542–545. https://doi.org/10.1016/B978-0-12-809633-8.20349-X
Brungard C, Nauman T, Duniway M, Veblen K, Nehring K, White D, Salley S, Anchang J (2021) Regional ensemble modeling reduces uncertainty for digital soil mapping. Geoderma 397:114998. https://doi.org/10.1016/j.geoderma.2021.114998
Carmignani L, Oggiano G, Funedda A, Conti P, Pasci S (2015) The geological map of Sardinia (Italy) at 1:250,000 scale. J Maps. https://doi.org/10.1080/17445647.2015.1084544
Chan JY-L, Leow SMH, Bea KT, Cheng WK, Phoong SW, Hong Z-W, Chen Y-L (2022) Mitigating the multicollinearity problem and its machine learning approach: a review. Mathematics 10(8):1283. https://doi.org/10.3390/math10081283
Chen B, Liu E, Tian Q, Yan C, Zhang Y (2014) Soil nitrogen dynamics and crop residues. A review. Agron Sustain Dev 34(2):429–442. https://doi.org/10.1007/s13593-014-0207-8
Chen S, Arrouays D, Leatitia Mulder V, Poggio L, Minasny B, Roudier P, Libohova Z, Lagacherie P, Shi Z, Hannam J, Meersmans J, Richer-de-Forges AC, Walter C (2022) Digital mapping of GlobalSoilMap soil properties at a broad scale: a review. Geoderma 409:115567. https://doi.org/10.1016/j.geoderma.2021.115567
Chlingaryan A, Sukkarieh S, Whelan B (2018) Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: a review. Comput Electron Agric 151:61–69. https://doi.org/10.1016/j.compag.2018.05.012
Conrad O, Bechtel B, Bock M, Dietrich H, Fischer E, Gerlitz L, Wehberg J, Wichmann V, Böhner J (2015) System for automated geoscientific analyses (SAGA) v. 2.1.4. Geosci Model Dev 8(7):1991–2007. https://doi.org/10.5194/gmd-8-1991-2015
Dai L, Ge J, Wang L, Zhang Q, Liang T, Bolan N, Lischeid G, Rinklebe J (2022) Influence of soil properties, topography, and land cover on soil organic carbon and total nitrogen concentration: a case study in Qinghai-Tibet plateau based on random forest regression and structural equation modeling. Sci Total Environ 821:153440. https://doi.org/10.1016/j.scitotenv.2022.153440
Daoud JI (2017) Multicollinearity and regression analysis. J Phys Conf Ser 949:012009. https://doi.org/10.1088/1742-6596/949/1/012009
Das PP, Singh KR, Nagpure G, Mansoori A, Singh RP, Ghazi IA, Kumar A, Singh J (2022) Plant-soil-microbes: a tripartite interaction for nutrient acquisition and better plant growth for sustainable agricultural practices. Environ Res 214:113821. https://doi.org/10.1016/j.envres.2022.113821
Dharumarajan S (2019) The need for digital soil mapping in India. Geoderma Reg 16:e00204
Dimkpa CO, Fugice J, Singh U, Lewis TD (2020) Development of fertilizers for enhanced nitrogen use efficiency—trends and perspectives. Sci Total Environ 731:139113. https://doi.org/10.1016/j.scitotenv.2020.139113
Elia M, D’Este M, Ascoli D, Giannico V, Spano G, Ganga A, Colangelo G, Lafortezza R, Sanesi G (2020) Estimating the probability of wildfire occurrence in Mediterranean landscapes using artificial neural networks. Environ Impact Assess Rev 85:106474. https://doi.org/10.1016/j.eiar.2020.106474
Ferreira CSS, Seifollahi-Aghmiuni S, Destouni G, Ghajarnia N, Kalantari Z (2022) Soil degradation in the European Mediterranean region: processes, status and consequences. Sci Total Environ 805:150106. https://doi.org/10.1016/j.scitotenv.2021.150106
Flynn KC, Baath G, Lee TO, Gowda P, Northup B (2023) Hyperspectral reflectance and machine learning to monitor legume biomass and nitrogen accumulation. Comput Electron Agric 211:107991. https://doi.org/10.1016/j.compag.2023.107991
Forkuor G, Hounkpatin OKL, Welp G, Thiel M (2017) High resolution mapping of soil properties using remote sensing variables in South-Western Burkina Faso: a comparison of machine learning and multiple linear regression models. PLoS ONE 12(1):e0170478. https://doi.org/10.1371/journal.pone.0170478
Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Thau D, Moore R (2017) Google earth engine: planetary-scale geospatial analysis for everyone. Remote Sens Environ 202:18–27. https://doi.org/10.1016/j.rse.2017.06.031
Hengl T, Leenaars JGB, Shepherd KD, Walsh MG, Heuvelink GBM, Mamo T, Tilahun H, Berkhout E, Cooper M, Fegraus E, Wheeler I, Kwabena NA (2017) Soil nutrient maps of Sub-Saharan Africa: assessment of soil nutrient content at 250 m spatial resolution using machine learning. Nutr Cycl Agroecosyst 109(1):77–102. https://doi.org/10.1007/s10705-017-9870-x
Heung B, Ho HC, Zhang J, Knudby A, Bulmer CE, Schmidt MG (2016) An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping. Geoderma 265:62–77. https://doi.org/10.1016/j.geoderma.2015.11.014
Hoffimann J, Zortea M, De Carvalho B, Zadrozny B (2021) Geostatistical learning: challenges and opportunities. Front Appl Math Stat 7:689393. https://doi.org/10.3389/fams.2021.689393
Högberg P, Näsholm T, Franklin O, Högberg MN (2017) Tamm review: on the nature of the nitrogen limitation to plant growth in Fennoscandian boreal forests. For Ecol Manage 403:161–185. https://doi.org/10.1016/j.foreco.2017.04.045
Hounkpatin KOL, Bossa AY, Yira Y, Igue MA, Sinsin BA (2022) Assessment of the soil fertility status in Benin (West Africa)—digital soil mapping using machine learning. Geoderma Reg 28:e00444. https://doi.org/10.1016/j.geodrs.2021.e00444
Jain P, Coogan SCP, Subramanian SG, Crowley M, Taylor S, Flannigan MD (2020) A review of machine learning applications in wildfire science and management. Environ Rev 28(4):478–505. https://doi.org/10.1139/er-2020-0019
Keskin H, Grunwald S (2018) Regression kriging as a workhorse in the digital soil mapper’s toolbox. Geoderma 326:22–41. https://doi.org/10.1016/j.geoderma.2018.04.004
Soil Survey Staff (2022) Keys to soil taxonomy, 13th edn. USDA Natural Resources Conservation Service
Khaledian Y, Miller BA (2020) Selecting appropriate machine learning methods for digital soil mapping. Appl Math Model 81:401–418. https://doi.org/10.1016/j.apm.2019.12.016
Lee H, Wang J, Leblon B (2020) Using linear regression, random forests, and support vector machine with unmanned aerial vehicle multispectral images to predict canopy nitrogen weight in corn. Remote Sensing 12(13):2071. https://doi.org/10.3390/rs12132071
Li C, Li X, Meng X, Xiao Z, Wu X, Wang X, Ren L, Li Y, Zhao C, Yang C (2023a) Hyperspectral estimation of nitrogen content in wheat based on fractional difference and continuous wavelet transform. Agriculture 13(5):1017. https://doi.org/10.3390/agriculture13051017
Li J, Zhang T, Shao Y, Ju Z (2023b) Comparing machine learning algorithms for soil salinity mapping using topographic factors and sentinel-1/2 data: a case study in the yellow river delta of China. Remote Sensing 15(9):2332. https://doi.org/10.3390/rs15092332
Li R, Xu J, Luo J, Yang P, Hu Y, Ning W (2022) Spatial distribution characteristics, influencing factors, and source distribution of soil cadmium in Shantou City, Guangdong Province. Ecotoxicol Environ Saf 244:114064. https://doi.org/10.1016/j.ecoenv.2022.114064
Li X, McCarty GW, Du L, Lee S (2020) Use of topographic models for mapping soil properties and processes. Soil Systems 4(2):32. https://doi.org/10.3390/soilsystems4020032
Li Z, Wang J, Tang H, Huang C, Yang F, Chen B, Wang X, Xin X, Ge Y (2016) Predicting grassland leaf area index in the meadow steppes of northern China: a comparative study of regression approaches and hybrid geostatistical methods. Remote Sensing 8(8):632. https://doi.org/10.3390/rs8080632
Liang L, Di L, Huang T, Wang J, Lin L, Wang L, Yang M (2018) Estimation of leaf nitrogen content in wheat using new hyperspectral indices and a random forest regression algorithm. Remote Sensing 10(12):1940. https://doi.org/10.3390/rs10121940
Lindner T, Puck J, Verbeke A (2022) Beyond addressing multicollinearity: robust quantitative analysis and machine learning in international business research. J Int Bus Stud 53(7):1307–1314. https://doi.org/10.1057/s41267-022-00549-z
Liu F, Wu H, Zhao Y, Li D, Yang J-L, Song X, Shi Z, Zhu A-X, Zhang G-L (2022) Mapping high resolution national soil information grids of China. Sci Bull 67(3):328–340. https://doi.org/10.1016/j.scib.2021.10.013
Liu J, Yang K, Tariq A, Lu L, Soufan W, El Sabagh A (2023) Interaction of climate, topography and soil properties with cropland and cropping pattern using remote sensing data and machine learning methods. Egypt J Remote Sens Space Sci 26(3):415–426. https://doi.org/10.1016/j.ejrs.2023.05.005
Funding.
Open access funding provided by Università degli Studi di Sassari within the CRUI-CARE Agreement. Partial financial support was received from the University of Sassari (FAR 2022, 2023, 2024).
The authors have no relevant financial or non-financial interests to disclose.
Authors and affiliations.
Dipartimento di Architettura, Design e Urbanistica, Università di Sassari, Via Piandanna 4, 07100 Sassari, Italy
Alessandro Auzzas, Gian Franco Capra & Antonio Ganga
Department of Biology and Chemistry, California State University, Monterey Bay, Seaside, CA, 93955, USA
Arun Dilipkumar Jani
Correspondence to Antonio Ganga.
Conflict of interest.
The authors declare no competing interests.
The research complies with all applicable standards of research ethics and integrity.
All authors provided informed consent.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Auzzas, A., Capra, G.F., Jani, A.D. et al. An improved digital soil mapping approach to predict total N by combining machine learning algorithms and open environmental data. Model. Earth Syst. Environ. (2024). https://doi.org/10.1007/s40808-024-02127-8
Received: 16 May 2024
Accepted: 02 August 2024
Published: 20 August 2024
DOI: https://doi.org/10.1007/s40808-024-02127-8