A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries
Scientific Data (IF 5.8), Pub Date: 2024-04-11, DOI: 10.1038/s41597-024-03196-1
Yuxiao Gou 1 , Yiping Zhang 1 , Jian Zhu 1 , Yidan Shu 1, 2

Natural language processing techniques can extract valuable information from the large body of published literature for data-driven applications such as machine learning in materials science. Nevertheless, automated extraction of data from full-text documents remains a complex task. We propose a document-level natural language processing pipeline for extracting comprehensive information on layered cathode materials for sodium-ion batteries from the literature. The pipeline enhances entity recognition with supplementary contextual information while capturing the article structure. A heuristic multi-level relation extraction algorithm then extracts experimental parameters and complex performance relationships separately. We extracted a dataset of 5265 records from 1747 documents, covering chemical composition, synthesis parameters, and electrochemical properties. The pipeline helps address the data scarcity that limits battery informatics, and the extracted dataset provides a resource for further research and development of layered cathode materials.




Updated: 2024-04-13