Classification of cancer pathology reports: a large-scale comparative study
preprint
posted on 2020-06-29, 17:45 authored by Stefano Martina, Leonardo Ventura, Paolo Frasconi
We report on the application of state-of-the-art deep learning
techniques to the automatic and interpretable assignment of ICD-O3
topography and morphology codes to free-text cancer reports. We present
results on a large dataset (more than 80 000 labeled and 1 500 000
unlabeled anonymized reports written in Italian and collected from
hospitals in Tuscany over more than a decade) and with a large number of
classes (134 morphological classes and 61 topographical classes). Approval was obtained from the institutional ethics committee (CEAV 14081 oss 27/11/2018). We
compare alternative architectures in terms of prediction accuracy and
interpretability and show that our best model achieves a multiclass
accuracy of 90.3% on topography site assignment and 84.8% on morphology
type assignment. We found that in this context hierarchical models are
not better than flat models and that an element-wise maximum aggregator
is slightly better than attentive models on site classification.
Moreover, the maximum aggregator offers a way to interpret the
classification process.
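
The interpretability claim for the maximum aggregator can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy tokens, and the feature values are all hypothetical, and it assumes only that each token of a report already has a fixed-size feature vector. Because each pooled feature is copied from exactly one token, recording which token won each feature gives a simple per-token contribution count.

```python
from collections import Counter
from typing import List, Tuple


def max_aggregate(token_vectors: List[List[float]]) -> Tuple[List[float], List[int]]:
    """Element-wise maximum over token vectors.

    Returns the pooled vector and, for each feature, the index of the
    token that supplied the maximum (the first token wins on ties).
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    pooled, winners = [], []
    for j in range(dim):
        column = [vec[j] for vec in token_vectors]
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        winners.append(best)
    return pooled, winners


def token_contributions(tokens: List[str], winners: List[int]) -> List[Tuple[str, int]]:
    """Count how many pooled features each token supplied."""
    counts = Counter(winners)
    return [(tok, counts.get(i, 0)) for i, tok in enumerate(tokens)]


# Toy, made-up values: three tokens with four features each.
tokens = ["carcinoma", "del", "colon"]
vectors = [
    [0.9, 0.1, 0.2, 0.7],
    [0.0, 0.2, 0.1, 0.1],
    [0.3, 0.8, 0.6, 0.2],
]
pooled, winners = max_aggregate(vectors)
contributions = token_contributions(tokens, winners)
```

In this toy case the function word "del" supplies none of the pooled features, while the two content words supply two each. An attentive aggregator would instead produce a weighted sum, in which every token contributes to every feature.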
Funding
Italian Ministry of Education, University, and Research, Grant 2017TWNMH2.
Email Address of Submitting Author
stefano.martina@unifi.it
ORCID of Submitting Author
0000-0001-6024-1752
Submitting Author's Institution
University of Florence
Submitting Author's Country
Italy