Panel 4: Corpus-based translation studies (Claudio Fantinuoli, Federico Zanettin)

Corpus linguistics has become a major paradigm and research methodology in linguistics and translation studies, with applications ranging from professional human translation to machine (assisted) translation and terminology, from descriptive linguistic and translation research to language teaching and translator training.

In the last 20 years or so, taking advantage of technological advancements in terms of computational power and availability of electronic texts, enormous progress has been made as regards the development of applications for professional translators and machine translation system users. Translation memories and statistical machine translation have changed the way translated texts are created and consumed. At the same time, corpus-based work has entered the curricula at translation training institutions, and theoretical and descriptive research has investigated topics such as translation universals, translation ideology, translator style, and interpreted language.

Whereas the success of machine (assisted) translation systems relies on automation and data quantity, descriptive and pedagogic applications depend on manual analysis and data quality. Corpus-based research in descriptive translation studies critically depends on the availability of suitable tools and resources. Yet, there is still a lack of user-friendly tools allowing researchers in the soft sciences to create and analyze corpora, especially but not exclusively parallel ones, according to the standards of the discipline. As the necessary steps to prepare corpus resources may be technically complex or daunting in terms of manual labour required, those who have the technical expertise (programmers, computational linguists etc.) are very often not willing to spend time on the manual tasks, while those who are prepared to get their hands dirty with the texts (linguists, translation scholars and students) are often not capable of stitching together the various tools available for corpus analysis.

Thus, many corpus-based descriptive translation investigations suffer from a piecemeal, fragmentary and tentative approach; the various data sets, methods and tools used do not combine into a single overall framework, and the results are often hardly commensurable. To unfold the full potential of corpus linguistics methodologies, corpus-based translation studies scholars need new high-quality, easy-to-use linguistic resources.

This panel aims to provide a framework for discussing corpus data, tools and approaches which may allow translation scholars to collaborate among themselves and with the NLP community, in order to improve the quality of resources and make them available and accessible, with the ultimate goal of bridging the gap between the hard and soft sides of this multi-faceted field.

This panel welcomes contributions related to, but not limited to, the following topics:

  • NLP-oriented perspectives and methods for T&I research
  • Corpus-based methodologies and T&I studies
  • Annotation models for descriptive translation studies
  • Translation and corpus design
  • Qualitative and quantitative approaches to corpus analysis in T&I studies
  • Corpus-based translation studies and minority languages
  • Accessibility issues: copyright and data distribution
  • Corpus compilation tools for T&I studies
  • Metadata for descriptive translation research
  • Methods and techniques for data collection
  • Corpus-based analysis of translation shifts
  • Parallel corpora in T&I studies
  • Alignment of parallel corpora
  • Usability of software for corpus building and analysis
  • Spoken corpora and alignment of transcriptions and audio/video recordings