Text Mining Metal–Organic Framework Papers
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2018-01-29, DOI: 10.1021/acs.jcim.7b00608
Sanghoon Park 1 , Baekjun Kim 1 , Sihoon Choi 1 , Peter G. Boyd 2 , Berend Smit 2 , Jihan Kim 1

We have developed a simple text mining algorithm that identifies the surface areas and pore volumes of metal–organic frameworks (MOFs) using manuscript HTML files as inputs. To locate these two quantities, the algorithm searches for the units commonly associated with them (e.g., m²/g, cm³/g). On a sample set of over 200 MOFs, the algorithm correctly identified 90% of the surface area values and 88.8% of the pore volume values. Applied to a test set of randomly chosen MOF HTML files, it reached accuracies of 73.2% and 85.1% for the two quantities, respectively. Most of the errors stem from unorthodox sentence structures that made it difficult to identify the correct data, as well as from bolded notations of MOFs (e.g., 1a) that made it difficult to identify their real names. Tools of this kind will become useful for discovering structure–property relationships among MOFs and for collecting large sets of reference data.
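
The unit-based search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not detail. It assumes that stripping the HTML tags and matching a number immediately followed by a unit is a reasonable first approximation. The names `html_to_text`, `extract_values`, `SURFACE_AREA`, and `PORE_VOLUME`, the regular expressions, and the sample sentence with its values are all hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text of `html`; 'm<sup>2</sup>/g' becomes 'm2/g'."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Assumed number format: optional thousands separators and decimals.
NUMBER = r"(\d+(?:,\d{3})*(?:\.\d+)?)"
# Assumed unit spellings: 'm2/g', 'm² /g', 'm2 g-1' (ASCII or Unicode minus).
SURFACE_AREA = re.compile(NUMBER + r"\s*m\s?[2²]\s?(?:/\s?g|g\s?[-−]1)")
PORE_VOLUME = re.compile(NUMBER + r"\s*cm\s?[3³]\s?(?:/\s?g|g\s?[-−]1)")


def extract_values(html):
    """Find every number that directly precedes one of the two units."""
    text = html_to_text(html)

    def to_floats(pattern):
        # findall returns only the captured number, e.g. '3,800'.
        return [float(m.replace(",", "")) for m in pattern.findall(text)]

    return {
        "surface_area_m2_per_g": to_floats(SURFACE_AREA),
        "pore_volume_cm3_per_g": to_floats(PORE_VOLUME),
    }


# Made-up sentence and values, used only to exercise the sketch.
sample = (
    "<p>MOF-X shows a BET surface area of 3,800 m<sup>2</sup>/g and a "
    "pore volume of 1.55 cm<sup>3</sup> g<sup>-1</sup>.</p>"
)
print(extract_values(sample))
```

A sketch like this only finds candidate values. It does not attach a value to a MOF name, which is where the abstract reports most of the errors (unusual sentence structures and bolded labels such as 1a).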

Updated: 2018-01-29