
PLOVER and POLECAT: A New Political Event Ontology and Dataset

Abstract

POLECAT is a new global political event dataset intended as the successor to the dataset produced by the DARPA Integrated Conflict Early Warning System (ICEWS) project. POLECAT’s event data are machine coded from millions of multi-language international news reports and will soon cover the period from 2010 to the present. The data are generated using the Next Generation Event Coder (NGEC), a new automated coder that replaces extensive (and difficult to update) dictionaries with a more flexible set of expert annotations of an event’s characteristics. In contrast to existing automated event coders, NGEC combines NLP tools, transformer-based neural networks, and actor information sourced from Wikipedia. POLECAT’s event data are based on an event-mode-context ontology, the Political Language Ontology for Verifiable Event Records (PLOVER), which replaces the older CAMEO ontology used in past datasets such as ICEWS and Phoenix. These innovations offer substantial improvements in the scope and accuracy of political event data in terms of the what, how, why, where, and when of domestic and international interactions. After detailing PLOVER and POLECAT, we illustrate these improvements through a preliminary comparison with the existing ICEWS event data system.
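
To make the event-mode-context idea concrete, the following is a minimal sketch of what a single coded event record might look like. It is not the actual PLOVER schema or POLECAT file format: the class name `EventRecord`, the helper `summarize`, all field names, and the example values are hypothetical, and the mapping of fields to the "what, how, why, where, and when" is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical event-mode-context record (illustrative field names only)."""
    event_type: str              # "what": the broad event category
    mode: Optional[str]          # "how": the manner in which the event occurred
    contexts: Tuple[str, ...]    # "why": thematic tags attached to the event
    location: Optional[str]      # "where"
    event_date: date             # "when"
    actor: str                   # who initiated the event
    recipient: Optional[str] = None  # who the event was directed at, if anyone


def summarize(rec: EventRecord) -> str:
    """Render a record as a one-line, human-readable summary."""
    what = rec.event_type if rec.mode is None else f"{rec.event_type}/{rec.mode}"
    why = ", ".join(rec.contexts) or "no context"
    target = rec.recipient or "unspecified"
    return f"{rec.event_date.isoformat()} {rec.actor} -> {target}: {what} [{why}]"


# Placeholder values, not actual PLOVER codes or POLECAT data.
example = EventRecord(
    event_type="PROTEST",
    mode="demonstration",
    contexts=("election",),
    location="Exampleland",
    event_date=date(2020, 1, 15),
    actor="citizens",
    recipient="government",
)
print(summarize(example))
```

Keeping the event type, mode, and context as separate fields, rather than folding them into one long code, is the design point the sketch is meant to illustrate: each dimension can be filtered or aggregated on its own.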

Publication
ISA 2023
Andy Halterman
Assistant Professor, MSU Political Science

My research interests include natural language processing, text as data, and subnational armed conflict.
