  Automatic extraction of polymer data from tables in XML documents of scientific articles

Oka, H., Yoshizawa, A., Shindo, H., Matsumoto, Y., & Ishii, M. (in preparation). Automatic extraction of polymer data from tables in XML documents of scientific articles.

Basic

Version Permalink: http://pubman.nims.go.jp/pubman/item/escidoc:1898226:1
Genre: Journal Article

Files

supplementaldata_200424.zip (Supplementary material), 220KB
Description: -
Visibility: Public
MIME-Type / Checksum: application/zip / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Creators:
Oka, Hiroyuki [1], Author
Yoshizawa, Atsushi [1], Author
Shindo, Hiroyuki [2, 3], Author
Matsumoto, Yuji [2, 3], Author
Ishii, Masashi [1], Author
Affiliations:
[1] National Institute for Materials Science
[2] Nara Institute of Science and Technology
[3] RIKEN AIP

Content

Free keywords: Automatic data extraction, Polymer data, Table, XML documents, Polymer-name recognition
Abstract: Automatic extraction of polymer data from tables in scientific articles was examined using table matrix structures. XML documents of the articles were used so that tables could be reproduced accurately as matrix structures in plain text. By utilizing the XML tags that systematically organize content in XML documents, simple tables, complicated tables with column and row spans, and fused tables were all accurately reproduced. After table reproduction, four processes were performed: data formatting for machine readability, polymer-name recognition, property-name recognition, and polymer data extraction. Polymer-name recognition used our original recognizer, which was prepared through automatic annotation using our rule-based program, based on typical character patterns of polymer full names and abbreviations, and deep neural network learning of polymer names. Property-name recognition was performed by partial string matching using polymer property index terms and stop words. In this study, glass transition temperature (Tg), melting temperature (Tm), and decomposition temperature (Td) were selected as the target polymer properties. Through these five processes (table reproduction plus the four that follow it), 2,043 data points for Tg, 1,436 for Tm, and 2,183 for Td were extracted from approximately 18,000 scientific articles published by Elsevier, with extraction F scores of 0.896, 0.876, and 0.837, respectively. These results indicate that the automatic extraction system created in this study can efficiently and accurately collect large amounts of polymer data from a large number of scientific articles.
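
The table-reproduction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the element names `table`, `row`, and `cell`, the attributes `rowspan` and `colspan`, the function name `table_to_matrix`, and the sample values are all assumptions made for illustration, and real publisher XML uses a different table schema. The sketch only shows the general idea of expanding spanned cells so that every matrix position holds text.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def table_to_matrix(table_xml: str) -> List[List[str]]:
    """Expand a span-based XML table into a rectangular matrix of strings.

    Assumes a hypothetical schema: <table> containing <row> elements,
    each containing <cell> elements with optional rowspan/colspan.
    """
    root = ET.fromstring(table_xml)
    filled: Dict[Tuple[int, int], str] = {}  # (row, col) -> cell text
    n_rows = 0
    n_cols = 0
    for r, row in enumerate(root.iter("row")):
        c = 0
        for cell in row.iter("cell"):
            # Skip positions already taken by a rowspan from an earlier row.
            while (r, c) in filled:
                c += 1
            text = "".join(cell.itertext()).strip()
            rowspan = int(cell.get("rowspan", "1"))
            colspan = int(cell.get("colspan", "1"))
            # Copy the text into every position the cell spans.
            for dr in range(rowspan):
                for dc in range(colspan):
                    filled[(r + dr, c + dc)] = text
            n_rows = max(n_rows, r + rowspan)
            n_cols = max(n_cols, c + colspan)
            c += colspan
    # Positions no cell covers are left as empty strings.
    return [[filled.get((i, j), "") for j in range(n_cols)]
            for i in range(n_rows)]


# Made-up sample table (placeholder values, not data from the article).
SAMPLE = """
<table>
  <row><cell rowspan="2">Polymer</cell><cell colspan="2">Thermal properties</cell></row>
  <row><cell>Tg</cell><cell>Tm</cell></row>
  <row><cell>Polymer A</cell><cell>100</cell><cell>200</cell></row>
</table>
"""

matrix = table_to_matrix(SAMPLE)
for line in matrix:
    print("\t".join(line))
```

Repeating the spanned header text in each covered position keeps every data cell aligned with a column header, which is what later steps such as property-name matching on headers would need.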

Details

Language(s): eng - English
 Dates: 2020-04-24
 Publication Status: Not specified
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Method: -
 Identifiers: DOI: 10.11503/nims.1190
 Degree: -

Source 1

Title: Computational Materials Science
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: Elsevier
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: -
Identifier: ISSN: 0927-0256