Symbolic machine learning methods for historical document processing

ESPOSITO, Floriana
2013-01-01

Abstract

Numerous valuable historic and cultural sources, a major part of our cultural heritage, are currently imperilled and scattered across various national archives. The Arts and Humanities are disciplines based mainly on the interpretation of cultural objects such as texts, paintings and other works of art, or historical/ethnological remains and monuments. Such objects are often unique, very valuable, fragile and irreplaceable, and they are preserved locally in scientific collections at museums, in archives, or in urban and historic areas. Archives, museums and other cultural institutions do not simply conserve these objects: they also manage a large body of documentation about them in the form of photo collections, expert reports, records, scientific studies and analyses. Both the objects themselves and the supplementary documentation are often accessible only to users who consult them in person. Duplicates such as text documents (e.g., critical editions) or image documents (facsimiles, photographs) on paper are extremely expensive in terms of manpower, know-how and printing costs, and these expenses often cannot be justified for a small scientific audience. Electronic formats for object documentation might alleviate this access problem. Numerous initiatives have been launched and supported to highlight and investigate the variety of challenges that museums and other cultural-historical institutions face in an increasingly digital, media-saturated landscape. However, full knowledge and use of this material are severely impeded by access problems, owing to the lack of appropriate content-based search and retrieval aids that help users find what they really need, even when electronic, digitized copies are available. Preserving content does not consist simply in storing it, but in actively transforming it to keep it technically usable and intelligible.
Moreover, many informal, non-institutional contacts between cultural archives form specific professional communities that still lack effective and efficient technological support for cooperative and collaborative knowledge work. The creation of digital libraries, enhanced with collaborative annotation facilities, is the technological answer to the need to bring together documents, interpretive knowledge, work processes and an expert network in a highly flexible working environment. Object and document collections in the Arts and Humanities are always a work in progress. The inventory of cultural institutions grows steadily through donations, acquisitions and the institutions' own daily scientific and conservation work. These additions must be incorporated into the existing collections, often despite limited space, gaps in scientific know-how and a shortage of personnel. Professionals and experts classify, analyze, assess, and exhibit or edit these objects and documents. Highly qualified external specialists are often difficult to locate unless they belong to a scholarly network. In times of tight cultural budgets, internal experts are often overburdened with routine work and can devote only sporadic, intermittent time to integrating new inventory. Many scientific staff members of cultural institutions hold temporary contracts and leave after a few years, taking much of the accumulated know-how with them. The intrinsic nature of the document processing procedures that support this progressive work on historic material, as outlined above, imposes several constraints that call for solutions specifically tailored to these tasks. Over the years, intelligent systems have become valuable working instruments for researchers in the humanities.
The new challenge is to provide them with tools that facilitate access to and investigation of the cultural heritage, so that non-experts as well as communities of researchers can use up-to-date tools both for their personal work and for collaborative purposes. Technologically, the World Wide Web can serve both as a standard communication platform for such communities and as a gateway to document-centered digital library applications. Yet, while the Web may solve the problem of disseminating and accessing this material in digital form, new automated tools are needed to enable more intelligent processing and personalized use of this knowledge. Given the situation described above, such tools must be not only effective and efficient but also able to cope with the continuous growth of the available material and knowledge, which is a fundamental and unavoidable issue. Symbolic machine learning (ML) methods make it possible to organize and classify documents and to cope with incrementality, that is, the need to continuously update and refine classification theories and concepts as new documents become available, so as to improve accuracy. Text categorization and information extraction techniques can be applied to selected blocks to obtain information that is attached to the documents as metadata, improving the effectiveness and efficiency of retrieval. In document image understanding, associating relevant text with the underlying informative content makes it possible to search documents at a semantic rather than merely syntactic level, in line with the vision of the Semantic Web. © 2013 Author.
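The incremental refinement of classification concepts mentioned above can be illustrated with a deliberately simplified sketch. The abstract does not describe the author's actual learning system, so this sketch swaps in Find-S, a classic attribute-value concept learner: it keeps the most specific conjunctive rule consistent with all positive examples seen so far and generalizes it one example at a time, without retraining from scratch. The class name IncrementalConjunctiveRule and the block features (position, font, alignment) are hypothetical. Unlike a full theory-revision system, Find-S cannot specialize a rule when a negative example is wrongly covered.

```python
from typing import Dict, Optional

# Hypothetical attribute-value description of one document layout block.
Example = Dict[str, str]


class IncrementalConjunctiveRule:
    """Find-S learner (a stand-in, not the author's system): keeps the most
    specific conjunction of attribute=value tests that covers every positive
    example seen so far, refining it one example at a time."""

    def __init__(self) -> None:
        # None means no positive example has been seen, so nothing is covered.
        self.rule: Optional[Dict[str, str]] = None

    def covers(self, example: Example) -> bool:
        """True if the example satisfies every test in the current rule."""
        if self.rule is None:
            return False
        return all(example.get(attr) == val for attr, val in self.rule.items())

    def update(self, example: Example, is_positive: bool) -> None:
        """Refine the rule with one newly labelled example, without retraining."""
        if not is_positive:
            # Find-S ignores negative examples; a theory-revision system would
            # specialize the rule here if it wrongly covered this example.
            return
        if self.rule is None:
            # First positive example: the rule is exactly its description.
            self.rule = dict(example)
        else:
            # Least general generalization: drop every test the new positive
            # example violates, keeping the tests it still satisfies.
            self.rule = {a: v for a, v in self.rule.items() if example.get(a) == v}


# Usage with hypothetical blocks labelled as "title" (True) or not (False).
learner = IncrementalConjunctiveRule()
learner.update({"position": "top", "font": "bold", "alignment": "center"}, True)
learner.update({"position": "top", "font": "bold", "alignment": "left"}, True)
print(learner.rule)  # {'position': 'top', 'font': 'bold'}
print(learner.covers({"position": "top", "font": "bold", "alignment": "right"}))  # True
```

Find-S is used here only because it is the smallest correct instance of incremental symbolic generalization; a system handling structured layout descriptions would need a richer (e.g., first-order) representation and both generalization and specialization operators.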
Year: 2013
ISBN: 978-1-4503-1789-4

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/70752

Citations
  • Scopus: 1