PRiSMHA documents

Info

PRiSMHA (Providing Rich Semantic Metadata for Historical Archives) [CSTO168023]
PI: Annamaria Goy
Coordinator: Dipartimento di Informatica (Unito)
Partner: Dipartimento di Studi Storici (Unito)
In collaboration with: Fondaz. Ist. piemontese A. Gramsci - Torino and Polo del '900
Funded by: Fondazione Compagnia di San Paolo, Università di Torino
Duration: 2017-2020

Activities and results

The main goal of the PRiSMHA project is to demonstrate that a rich semantic representation of the content of historical archival documents can significantly improve access to archival resources, and that producing such a representation is sustainable thanks to collaborative semantic annotation supported by automatic Information Extraction techniques.

THE SEMANTIC MODEL

HERO (Historical Event Representation Ontology) is a modular ontology that covers the different aspects of historical events: the event type, the place where it occurred, the time when it took place, and the participants in the event, with their roles. HERO is available at w3id.org/hero/HERO.
HERO-900 is a domain ontology that refines HERO by introducing notions relevant to the history of the 20th century, and in particular to the student and worker protests of 1968-1969 in Italy, which is the domain selected for the PRiSMHA project.
An application version of HERO has been implemented in OWL 2 DL and drives the annotation system, representing its domain knowledge. This version of HERO currently contains 447 classes, 380 properties, 161 individuals and 4,661 logical axioms.
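The four aspects of an event that HERO covers (type, place, time, and participants with their roles) can be pictured with a small sketch. It is only an illustration of the general idea of turning an event description into subject-predicate-object triples: the namespace, the property names (`hasType`, `hasPlace`, and so on), and the sample event are all hypothetical and are not taken from the published HERO vocabulary.

```python
from dataclasses import dataclass, field

# Hypothetical namespace and property names, for illustration only.
# They are NOT the identifiers used in the actual HERO ontology.
EX = "http://example.org/hero-sketch#"


@dataclass
class Participation:
    """Links a participant to the role played in one event."""
    participant: str
    role: str


@dataclass
class HistoricalEvent:
    """An event described by type, place, time, and participations."""
    identifier: str
    event_type: str
    place: str
    time: str
    participations: list[Participation] = field(default_factory=list)

    def to_triples(self) -> list[tuple[str, str, str]]:
        """Flatten the event into (subject, predicate, object) triples."""
        subject = EX + self.identifier
        triples = [
            (subject, EX + "hasType", EX + self.event_type),
            (subject, EX + "hasPlace", EX + self.place),
            (subject, EX + "hasTime", self.time),
        ]
        # Each participation gets its own node so that the role stays
        # attached to this event rather than to the participant in general.
        for index, item in enumerate(self.participations):
            node = f"{subject}_participation_{index}"
            triples.append((subject, EX + "hasParticipation", node))
            triples.append((node, EX + "hasParticipant", EX + item.participant))
            triples.append((node, EX + "hasRole", EX + item.role))
        return triples


# Invented sample event, not drawn from the PRiSMHA document collection.
sample_event = HistoricalEvent(
    identifier="strike_01",
    event_type="Strike",
    place="Turin",
    time="1969",
    participations=[Participation("worker_group_a", "Organizer")],
)
sample_triples = sample_event.to_triples()
```

Modelling each participation as a separate node is one common way to keep a role tied to a specific event; whether HERO represents roles this way is not stated here.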

THE ANNOTATION PLATFORM

PRiSMHA architecture

The figure shows the main modules of the PRiSMHA overall architecture.
The ontology (HERO+HERO-900) drives the user interaction of both the Crowdsourcing Platform UI and the Final UI, and represents the "vocabulary" of the Semantic KB, which is implemented as an RDF triplestore and contains assertions about domain entities.
PRiSMHA implements a hybrid strategy that integrates user-generated content and automatic techniques: user-generated content is provided through the Crowdsourcing Platform, while automatic techniques are represented by the Information Extraction (IE) module and the LOD linking module. The IE module supports user annotation by identifying relevant entities within the document and providing information about them. The LOD linking module offers suggestions retrieved from external datasets (Wikidata, in the current prototype), as well as the possibility of linking a PRiSMHA entity to an external one. Data in the Semantic KB are accessible through both a SPARQL endpoint and a Final User Interface (UI).
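As a rough illustration of how suggestions could be retrieved from Wikidata for an entity label, the sketch below builds a request for the public `wbsearchentities` action of the Wikidata API and extracts candidate entities from a response. This is an assumption about one possible approach, not a description of the PRiSMHA LOD linking module; the function names are hypothetical, and the response shown is a mock payload with placeholder values, not real Wikidata data.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(label: str, language: str = "it", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for the given label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def extract_suggestions(response_text: str) -> list[dict[str, str]]:
    """Keep only the fields an annotator would need to pick a candidate."""
    payload = json.loads(response_text)
    return [
        {
            "id": item["id"],
            "label": item.get("label", ""),
            "description": item.get("description", ""),
        }
        for item in payload.get("search", [])
    ]


# Mock response with placeholder values; it mimics the shape of a
# wbsearchentities reply but contains no real Wikidata identifiers.
MOCK_RESPONSE = json.dumps(
    {
        "search": [
            {
                "id": "Q_PLACEHOLDER",
                "label": "Example label",
                "description": "Example description",
            }
        ]
    }
)

search_url = build_search_url("Antonio Gramsci")
suggestions = extract_suggestions(MOCK_RESPONSE)
```

Sending the request (for example with `urllib.request.urlopen(search_url)`) is left out on purpose so that the sketch runs without network access.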

DIGITIZATION AND OCR OF ARCHIVAL DOCUMENTS

At the beginning of the project, we selected 200 documents from the archive collections of the Fondaz. Ist. piemontese A. Gramsci, containing newspaper and review articles, leaflets, and a few images, as well as some textual biographies of prominent figures in 20th-century Italian history who were specifically involved in the historical period in focus. The 200 documents from the archives have been digitized and uploaded to the Polo del '900 archival platform 9centRo, together with their standard metadata.
These documents were analyzed to select candidates for OCR: only 13 were considered unsuitable, and the process yielded text of satisfactory quality for the remaining 187 documents. Beyond the standard advantages of having full text available, in PRiSMHA full text is essential for applying the Information Extraction techniques that support users in the annotation process.

PRiSMHA teams

Links (presentations, papers, videos, etc)

Presentations and videos [in Italian]

Papers and dissertations

  • D. Colla, A. Goy, M. Leontino, D. Magro, C. Picardi, Bringing Semantics into Historical Archives with Computer-aided Rich Metadata Generation, Journal on Computing and Cultural Heritage - Special Issue Computational Archival Science, in press, ACM Press, 2022

  • A. Goy, C. Re, D. Colla, M. Leontino, Turning 1968 memories into usable texts, First International Conference on Recent Advances in Digital Humanities (RADH 2021), University of Bucharest, Romania, 2021 [pdf]

  • D. Colla, A. Goy, M. Leontino, D. Magro, Wikidata Support in the Creation of Rich Semantic Metadata for Historical Archives, Applied Sciences - Special Issue AI and HCI Methods and Techniques for Cultural Heritage Curation, Exploration and Fruition, 11(10), 4378, MDPI, 2021 [url]

  • A. Goy, D. Colla, D. Magro, C. Accornero, F. Loreto, D.P. Radicioni, Building Semantic Metadata for Historical Archives through an Ontology-driven User Interface, Journal on Computing and Cultural Heritage, 13(3), 1-36, ACM Press, 2020 [url]

  • A. Goy, C. Accornero, D. Astrologo, D. Colla, M. D'Ambrosio, R. Damiano, M. Leontino, A. Lieto, F. Loreto, D. Magro, E. Mensa, A. Montanaro, V. Mosca, S. Musso, D.P. Radicioni, C. Re, Fruitful synergies between computer science, historical studies and archives: the experience in the PRiSMHA project, Proc. 11th Int. Joint Conf. on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Vol. 3: KMIS (KMIS 2019), SciTePress, 2019, 225-230 [url]

  • A. Goy, D. Magro, Collections revisited from the perspective of historical testimonies, Int. J. Metadata, Semantics and Ontologies, 13(4), Inderscience Publishers, 2019, 300-316 [url]

  • A. Goy, D. Magro, A. Baldo, A Semantic Web Approach to Enable a Smart Route to Historical Archives, Journal of Web Engineering, 18(4-6), River Publishers, 2019, 287-318 [url]

  • A. Goy, R. Damiano, F. Loreto, D. Magro, S. Musso, D. Radicioni, C. Accornero, D. Colla, A. Lieto, E. Mensa, M. Rovera, D. Astrologo, B. Boniolo, M. D'Ambrosio, PRiSMHA (Providing Rich Semantic Metadata for Historical Archives), Proc. Contextual Representation of Objects and Events in Language (CREOL 2017), 2017 [pdf]

  • C. Re, L'OCR come strumento per garantire l'accessibilità ai contenuti testuali dei documenti d'archivio: il progetto PRiSMHA e la Fondazione Istituto piemontese Antonio Gramsci di Torino, Università di Torino, 2021 [pdf]

  • C. Zinnarosu, Analisi e design della User Interface per gli utenti finali del progetto PRiSMHA, Università di Torino, 2020 [pdf]