Transkribus in philological practice
an experience report
DOI:
https://doi.org/10.24206/lh.v11i2.66189
Keywords:
Philology, Digital Humanities, Transkribus, Manuscript edition, Handwritten text recognition (HTR)
Abstract
To better understand the use of Transkribus in philological work, this article reports on its use in a graduate research project, in which an HTR model was created and applied to a Book of Minutes, and the resulting transcription was exported and edited. The article also briefly outlines the history of Transkribus, how it works, and how HTR works, to give other researchers more grounds on which to base their choices and analyses. We found that the use of Transkribus for transcription is already well established in a number of studies and can be very useful, including for philologists, especially for transcribing large volumes of text. The caveats concern the export of the transcription (and its subsequent transformation into an edition) and the risks inherent to digital technologies.
License
Copyright (c) 2025 Ana T. Depizzolatti, Manoel Mourivaldo Santiago-Almeida

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following:
a. The authors hold the copyright of the published papers; the authors are solely responsible for the content of the published papers; the published paper is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows the sharing of the publication as long as authorship and publication by Revista LaborHistórico are acknowledged.
b. Authors should seek prior permission from the journal in order to publish their articles as book chapters. Such publications should acknowledge first publication by LaborHistórico.
c. Authors may publish and distribute their papers (for example, at institutional repositories, author's sites) at any time during or after the editorial process by Revista LaborHistórico.