Cell line name recognition in support of the identification of synthetic lethality in cancer from text

dc.contributor.author: Kaewphan, Suwisa
dc.contributor.author: Van Landeghem, Sofie
dc.contributor.author: Ohta, Tomoko
dc.contributor.author: Van de Peer, Yves
dc.contributor.author: Ginter, Filip
dc.contributor.author: Pyysalo, Sampo
dc.date.accessioned: 2016-02-19T06:13:29Z
dc.date.available: 2016-02-19T06:13:29Z
dc.date.issued: 2016-01
dc.description.abstract: MOTIVATION: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. RESULTS: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers. [en_ZA]
dc.description.librarian: hb2015 [en_ZA]
dc.description.sponsorship: Academy of Finland and Research Foundation Flanders (FWO). [en_ZA]
dc.description.uri: http://bioinformatics.oxfordjournals.org [en_ZA]
dc.identifier.citation: Kaewphan, S, Van Landeghem, S, Ohta, T, Van de Peer, Y, Ginter, F & Pyysalo, S 2016, 'Cell line name recognition in support of the identification of synthetic lethality in cancer from text', Bioinformatics, vol. 32, no. 2, pp. 276-282. [en_ZA]
dc.identifier.issn: 1367-4803 (print)
dc.identifier.issn: 1460-2059 (online)
dc.identifier.other: 10.1093/bioinformatics/btv570
dc.identifier.uri: http://hdl.handle.net/2263/51469
dc.language.iso: en [en_ZA]
dc.publisher: Oxford University Press [en_ZA]
dc.rights: © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/). [en_ZA]
dc.subject: Recognition [en_ZA]
dc.subject: Cell line names [en_ZA]
dc.subject: Normalization [en_ZA]
dc.subject: Identification [en_ZA]
dc.subject: Synthetic lethality [en_ZA]
dc.title: Cell line name recognition in support of the identification of synthetic lethality in cancer from text [en_ZA]
dc.type: Article [en_ZA]

Files

Original bundle

Name: Kaewphan_Cell_2016.pdf
Size: 133.6 KB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: