Sustainable corpora for transnational subjects. Methods and tools

Maristella Gatto
2017-01-01

Abstract

The paper presents and evaluates methods and tools for the semi-automated compilation and exploration of web corpora. These methods and tools have been developed over the past few years in the context of research on the “web as/for corpus” and have proved extremely useful for the compilation of large general-purpose reference corpora for a variety of languages, as well as for the creation of monolingual and multilingual corpora for terminology extraction (Baroni and Bernardini 2004; Baroni and Bernardini 2006; Baroni et al. 2009). More recently they have started to attract attention in the context of Critical Discourse Analysis, where flexible, user-friendly tools for the compilation and exploration of corpora (e.g. WebBootCaT and Sketch Engine) might be promising allies in the effort to bring together corpus linguistics and critical studies (Baker 2006; Gabrielatos 2007; Wild et al. 2013). Without questioning the validity of the established practice of building carefully compiled traditional corpora, the possibility of creating ad hoc corpora in a few minutes for a variety of domains and genres can certainly help to spread the use of quantitative evidence to support, validate and stimulate the work of researchers primarily engaged in the qualitative analysis of language data. Given their characteristics, these quick ad hoc corpora could be defined as ‘renewable corpora’, since they are easily and rapidly created and recreated on the basis of customized criteria and variable parameters, as well as ‘sustainable corpora’, since they can be maintained, updated and regenerated at an extremely favourable cost-effectiveness ratio. These characteristics make them particularly useful in research on issues whose topicality requires continuous updating of the resources, or on issues involving multiple agencies.
This is the case with the corpora compiled for the two transnational issues covered in the case studies discussed, sustainable tourism and immigration. Taking as examples the approach to “sustainability” in the context of international tourism organizations and the parliamentary discussion of the 2014 Immigration Act, the paper argues that there is ample evidence that the simultaneous ‘distant reading’ (to borrow Moretti’s recently coined term) of many different texts, which is what corpora basically offer, can prove useful in many research and teaching contexts and effectively complements traditional approaches based on the ‘close reading’ of individual texts. In this scenario, the potential of quick ad hoc ‘sustainable’ corpora definitely deserves further investigation.
2017
ISBN: 9788820767402

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/212965