Document-type classification errors in bibliometric databases: Insights from the engineering/manufacturing field
Domenico Augusto MAISANO, Lucrezia FERRARA, Fiorenzo FRANCESCHINI
Abstract. Document types (DTs) – e.g., research articles, reviews, conference proceedings, and letters – are used not only to classify scientific publications but also, routinely, to guide inclusion-exclusion decisions in bibliometric assessments, often without adequate consideration of the quality of the underlying content. This study examines DT-classification errors in Scopus and Web of Science (WoS), focusing on engineering/manufacturing publications. These errors, which may directly affect publication/citation counts, citation-impact indicators, and consequently academic evaluations and careers, are analyzed in a corpus of about 10,000 documents using a recent semi-automated method. The results indicate that these errors, although amounting to only a few percent of the documents, are far from negligible. Furthermore, statistical analyses reveal systematic differences among publishers (e.g., Springer, Elsevier, Taylor & Francis), with some contributing more to errors than others, probably because of editorial styles or inconsistent metadata. This study provides insights for researchers, evaluators, and database managers, highlighting the need for publisher-specific guidelines to improve classification accuracy.
Keywords
Document-Type Classification, Quality, Performance Indicators
Published online 9/10/2025, 10 pages
Copyright © 2025 by the author(s)
Published under license by Materials Research Forum LLC, Millersville, PA, USA
Citation: Domenico Augusto MAISANO, Lucrezia FERRARA, Fiorenzo FRANCESCHINI, Document-type classification errors in bibliometric databases: Insights from the engineering/manufacturing field, Materials Research Proceedings, Vol. 57, pp 80-89, 2025
DOI: https://doi.org/10.21741/9781644903735-10
The article was published as article 10 of the book Italian Manufacturing Association Conference
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.



