Overview
Recent advances in natural language generation have led to models capable of generating high-quality, human-like text across many languages and domains. However, such models can be misused for malicious purposes, including but not limited to generating fake news, spreading propaganda, and facilitating fraud. This tutorial aims to raise awareness of artificial text detection, a fast-growing niche field devoted to mitigating the misuse of these models. It targets NLP researchers and industry practitioners who work with generative text models and/or on mitigating ethical, social, and privacy harms. The tutorial provides attendees with a comprehensive background on the topic and holistically reviews: (1) issues of generative models that can exacerbate their misuse, (2) terminologies and task definitions, (3) models well-studied for the task, (4) existing datasets and benchmarks, (5) approaches to detecting generated texts, (6) standard crowd-sourcing practices and related critical studies, (7) downstream applications, and (8) established risks of harm. We conclude by outlining unresolved methodological problems and directions for future work.
The tutorial consists of four parts:
Part 1: Introduction
Part 2: Landscape
Part 3: Automatic and Human Artificial Text Detectors
Part 4: Conclusion - Applications, Ethics & Summary