SAROS: A dataset for whole-body region and organ segmentation in CT imaging
Scientific Data (IF 9.8), Pub Date: 2024-05-10, DOI: 10.1038/s41597-024-03337-6
Sven Koitka, Giulia Baldini, Lennard Kroll, Natalie van Landeghem, Olivia B. Pollok, Johannes Haubold, Obioma Pelka, Moon Kim, Jens Kleesiek, Felix Nensa, René Hosch

The Sparsely Annotated Region and Organ Segmentation (SAROS) dataset was created using data from The Cancer Imaging Archive (TCIA) to provide a large open-access CT dataset with high-quality annotations of body landmarks. In-house segmentation models were used to generate annotation proposals for randomly selected cases from TCIA. The dataset includes 13 semantic body region labels (abdominal/thoracic cavity, bones, brain, breast implant, mediastinum, muscle, parotid/submandibular/thyroid glands, pericardium, spinal cord, subcutaneous tissue) and six body part labels (left/right arm/leg, head, torso). Cases were selected based on the DICOM series description, gender, and imaging protocol, resulting in 900 CTs from 882 patients (438 female). The proposals were manually reviewed and corrected in a continuous quality control cycle. Only every fifth axial slice was annotated, yielding 20,150 annotated slices from 28 data collections. To support reproducibility in downstream tasks, five cross-validation folds and a test set were pre-defined. The SAROS dataset serves as an open-access resource for training and evaluating novel segmentation models, covering various scanner vendors and diseases.
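Because only every fifth axial slice carries annotations, a model evaluated on this kind of data would typically be scored on the annotated slices alone. The following is a minimal sketch of that idea, not the authors' evaluation code. The function names, the `offset` parameter, the list-of-flattened-slices layout, and the choice of the Dice score as the metric are all assumptions made for illustration; the abstract specifies none of them.

```python
from typing import List, Sequence


def annotated_slice_indices(num_slices: int, step: int = 5, offset: int = 0) -> List[int]:
    """Return axial slice indices that carry annotations under a fixed stride.

    step=5 mirrors "every fifth axial slice". offset is an assumption,
    since the abstract does not say which slice the stride starts on.
    """
    if num_slices < 0 or step <= 0 or not 0 <= offset < step:
        raise ValueError("need num_slices >= 0, step > 0, and 0 <= offset < step")
    return list(range(offset, num_slices, step))


def sparse_dice(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]],
                annotated: Sequence[int], label: int) -> float:
    """Dice score for one label, computed only on the annotated slices.

    pred and truth are volumes given as a list of flattened slices
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    intersection = pred_count = truth_count = 0
    for z in annotated:
        for p, t in zip(pred[z], truth[z]):
            pred_count += p == label
            truth_count += t == label
            intersection += (p == label) and (t == label)
    denom = pred_count + truth_count
    # By convention here, a label absent from both volumes scores 1.0.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


if __name__ == "__main__":
    # Toy volume: 6 slices of 4 voxels; only slices 0 and 5 are annotated.
    truth = [[1, 1, 0, 0] for _ in range(6)]
    pred = [[0, 0, 1, 1] for _ in range(6)]
    pred[0] = [1, 0, 0, 0]
    pred[5] = [1, 1, 0, 0]
    idx = annotated_slice_indices(len(truth))
    print(idx, sparse_dice(pred, truth, idx, label=1))
    # Average annotated slices per CT, from the figures in the abstract.
    print(20150 / 900)
```

Predictions on the unannotated slices (1 to 4 in the toy volume) do not affect the score, which is the point of restricting the metric to the stride. From the figures in the abstract, 20,150 annotated slices over 900 CTs works out to roughly 22 annotated slices per scan.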




Updated: 2024-05-10