TY - GEN
T1 - U-MLLA
T2 - 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025
AU - Jiang, Yufeng
AU - Li, Zongxi
AU - Chen, Xiangyan
AU - Xie, Haoran
AU - Cai, Jing
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - Medical image segmentation is fundamental to computer-assisted diagnosis but faces challenges across diverse imaging modalities. Linear attention mechanisms succeed in natural images but are limited in medical segmentation due to insufficient spatial dependency and tissue heterogeneity modeling. Research indicates that successful dense prediction requires balanced permutation variance, strong inductive capabilities, and precise absolute position information. Current linear attention approaches satisfy the first two requirements but critically lack the third, significantly impacting medical segmentation where spatial localization is essential. To address these limitations, we propose U-MLLA, which integrates U-Net with Mamba-like linear attention (MLLA) for multiscale feature and context capture. We further introduce complementary conditional and absolute positional encoding (APE) to compensate for position information deficits in linear attention. Experiments show U-MLLA provides robust features, and the complementary strategies significantly improve multi-organ and tumor segmentation. APE particularly excels with complex structures requiring precise boundary delineation. This cognitively inspired architecture adapts 93% of ImageNet-1k weights and increases effectiveness. In comprehensive evaluations across six challenging datasets (e.g., FLARE22, AMOS22CT/MR, ACDC) and 24 tasks, U-MLLA achieves state-of-the-art performance with an average DSC of 88.32%, outperforming nnUNetV2-2D and SwinUNetR by 4.37% and 1.98%, respectively. These results highlight U-MLLA’s potential for clinical applications that require precise anatomical delineation, where APE is essential for maintaining spatial context and differentiating similar structures. The code is available at https://github.com/csyfjiang/U-MLLA.
AB - Medical image segmentation is fundamental to computer-assisted diagnosis but faces challenges across diverse imaging modalities. Linear attention mechanisms succeed in natural images but are limited in medical segmentation due to insufficient spatial dependency and tissue heterogeneity modeling. Research indicates that successful dense prediction requires balanced permutation variance, strong inductive capabilities, and precise absolute position information. Current linear attention approaches satisfy the first two requirements but critically lack the third, significantly impacting medical segmentation where spatial localization is essential. To address these limitations, we propose U-MLLA, which integrates U-Net with Mamba-like linear attention (MLLA) for multiscale feature and context capture. We further introduce complementary conditional and absolute positional encoding (APE) to compensate for position information deficits in linear attention. Experiments show U-MLLA provides robust features, and the complementary strategies significantly improve multi-organ and tumor segmentation. APE particularly excels with complex structures requiring precise boundary delineation. This cognitively inspired architecture adapts 93% of ImageNet-1k weights and increases effectiveness. In comprehensive evaluations across six challenging datasets (e.g., FLARE22, AMOS22CT/MR, ACDC) and 24 tasks, U-MLLA achieves state-of-the-art performance with an average DSC of 88.32%, outperforming nnUNetV2-2D and SwinUNetR by 4.37% and 1.98%, respectively. These results highlight U-MLLA’s potential for clinical applications that require precise anatomical delineation, where APE is essential for maintaining spatial context and differentiating similar structures. The code is available at https://github.com/csyfjiang/U-MLLA.
KW - Linear Attention in Vision
KW - Medical Image Segmentation
KW - Position Encoding
KW - Semantic Segmentation
KW - UNet
UR - https://www.scopus.com/pages/publications/105031134668
U2 - 10.1007/978-981-95-5634-2_6
DO - 10.1007/978-981-95-5634-2_6
M3 - Conference contribution
AN - SCOPUS:105031134668
SN - 9789819556335
T3 - Lecture Notes in Computer Science
SP - 76
EP - 90
BT - Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings
A2 - Kittler, Josef
A2 - Xiong, Hongkai
A2 - Lin, Weiyao
A2 - Yang, Jian
A2 - Chen, Xilin
A2 - Lu, Jiwen
A2 - Yu, Jingyi
A2 - Zheng, Weishi
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 15 October 2025 through 18 October 2025
ER -