Negotiating corpus identity: from body to web

GATTO, MARISTELLA
2009-01-01

Abstract

The paper explores some of the issues raised by the notion of the web-as-corpus (Kilgarriff – Grefenstette 2003; Baroni – Bernardini 2006) on both theoretical and applicative grounds. The basic assumption is that the very existence of a trend within corpus linguistics that can be loosely labelled as web-as-corpus does not simply question the way we conceive of a corpus in modern linguistics, but rather “serves as a magnifying glass for the methodological issues that corpus linguists have discussed all along” (Hundt et al. 2007), and possibly also mirrors changes taking place in society at large. The idea of considering the web as a corpus presupposes a definition of what a corpus is and possibly entails a renegotiation of what a corpus can be. The paper thus revisits some key issues in corpus linguistics, such as “authenticity”, “representativeness”, “size” and “content” (Sinclair 1991; McEnery – Wilson 2001; Tognini Bonelli 2001; Hunston 2002), in the light of the peculiarities of the web as a ‘spontaneous’, ‘self-generating’ collection of texts, and explores some of the new issues which the emerging notion of the web-as-corpus seems to raise, such as “dynamism”, “reproducibility”, “relevance and reliability” and “distributed architecture” (Fletcher 2004; Baroni 2006; Lüdeling 2007). These theoretical issues are then tested on applicative grounds through case studies based on the most common tools and methods devised to exploit the enormous potential of the web as a linguistic resource, either via ordinary search engines or through linguistically-oriented tools such as WebCorp or BootCaT (Kehoe – Renouf 2002; Baroni – Bernardini 2004 and 2006). As this process is still ongoing, it leaves room for further investigation.
While the reasons for turning to the web as a corpus may in fact have been pre-eminently opportunistic at the outset (size, low cost, ease of access...), it is also evident that the web has imposed itself on linguists’ attention as an object of scientific enquiry thanks to its intrinsic peculiarities as “a social phenomenon ... whose chief stock-in-trade is language” (Crystal 2006). Indeed, the ‘changing face’ (Renouf 2006) of corpus linguistics clearly signifies the “convergence of technologies and standards in several related fields which have in common the goal of delivering linguistic content through electronic means” (Wynne 2002), and can be seen as the outcome of a wider process of redefinition in terms of flexibility, multiplicity and complexity which corpus linguistics is undergoing along with other fields of human activity.
2009
ISBN: 978-88-6194-057-4

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/52744