TY - JOUR
T1 - MOSER
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
AU - Najafi, Mohammad Matin
AU - Ma, Chenhao
AU - Li, Xiaodong
AU - Cheng, Reynold
AU - Lakshmanan, Laks V. S.
N1 - Publisher Copyright: © 2023, VLDB Endowment. All rights reserved.
PY - 2023
Y1 - 2023
AB - Given a graph G, a motif (e.g., a 3-node clique) is a fundamental building block of G. Recently, motif-based graph analysis has attracted much attention due to its efficacy in tasks such as clustering, ranking, and link prediction. These tasks require Network Motif Discovery (NMD) at an early stage to identify the motifs of G. However, existing NMD solutions have two drawbacks: (1) they offer no theoretical guarantees on the quality of the samples they generate, and (2) their algorithms are inefficient and do not scale to large graphs. These limitations hinder the exploration of motifs for analyzing large graphs. To address these issues, we propose a novel NMD framework named MOSER (MOtif Discovery using SERial Test). Unlike existing solutions, MOSER leverages a significance testing method known as the serial test. We further propose two fast incremental subgraph counting algorithms, allowing MOSER to scale to larger graphs than previously possible. Extensive experimental results show that MOSER improves on the state of the art by up to 5 orders of magnitude in efficiency and that the motifs found by MOSER facilitate downstream tasks such as link prediction.
UR - https://www.scopus.com/pages/publications/85183594098
U2 - 10.14778/3632093.3632118
DO - 10.14778/3632093.3632118
M3 - Conference article
AN - SCOPUS:85183594098
SN - 2150-8097
VL - 17
SP - 591
EP - 603
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 3
Y2 - 24 August 2024 through 29 August 2024
ER -