TY - GEN
T1 - From Motif to Path
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
AU - Wang, Qihao
AU - Cao, Hongtai
AU - Li, Xiaodong
AU - Chang, Kevin Chen Chuan
AU - Cheng, Reynold
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - While motif has been widely employed in graph analytics, a fundamental question remains open: How should overlapping motif edges connect into a path? Existing works address this question with simple but inconsistent generalizations from standard graphs. This paper studies this issue by proposing the concept of connectivity degree (CD), i.e. the number of overlapping nodes needed for motif edges to be adjacent, as the requirement for path connection. We further study three research questions. First, is CD significant? We study how CD impacts motif analytics, more specifically, three motif-based methods. Second, how to estimate the right CD? We develop a minimax estimator based on minimizing the worst-case risk. Finally, how to detect the connected components with connectivity degree, an important task by itself and necessary for our estimator. As the traditional BFS or DFS approaches are not valid anymore, we develop a disjoint set algorithm instead. Our experiments validate that our CD can improve the performance of motif analytics. Also, our estimator is effective and our connected component detection algorithm is efficient.
AB - While motif has been widely employed in graph analytics, a fundamental question remains open: How should overlapping motif edges connect into a path? Existing works address this question with simple but inconsistent generalizations from standard graphs. This paper studies this issue by proposing the concept of connectivity degree (CD), i.e. the number of overlapping nodes needed for motif edges to be adjacent, as the requirement for path connection. We further study three research questions. First, is CD significant? We study how CD impacts motif analytics, more specifically, three motif-based methods. Second, how to estimate the right CD? We develop a minimax estimator based on minimizing the worst-case risk. Finally, how to detect the connected components with connectivity degree, an important task by itself and necessary for our estimator. As the traditional BFS or DFS approaches are not valid anymore, we develop a disjoint set algorithm instead. Our experiments validate that our CD can improve the performance of motif analytics. Also, our estimator is effective and our connected component detection algorithm is efficient.
KW - graph analytics
KW - motif
KW - social network
UR - https://www.scopus.com/pages/publications/85195690882
U2 - 10.1109/ICDE60146.2024.00227
DO - 10.1109/ICDE60146.2024.00227
M3 - Conference contribution
AN - SCOPUS:85195690882
T3 - Proceedings - International Conference on Data Engineering
SP - 2751
EP - 2764
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
Y2 - 13 May 2024 through 17 May 2024
ER -