Here’s how most of my teaching prep days went last semester: “I know I’ve read papers using propensity score matching, but I have no idea which ones,” or “surely people have put histograms in published papers,” or “can I find an example of how authors describe calipers when they’re doing matching?”
I’m excited to share my latest paper, now out in Political Analysis, which introduces a new approach to training supervised text classifiers. The core idea is simple: instead of relying solely on expensive hand-labeled data, we can use generative large language models (LLMs) to produce synthetic training examples, then fit a classifier on the synthetic text (along with any real training data we have).
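To make the mechanics concrete, here is a minimal sketch of the idea, assuming an OpenAI-style chat API; the prompt, model name, and protest/non-protest labels are placeholders for illustration, not the paper’s actual setup:

```python
# Minimal sketch: generate synthetic training text with an LLM, then fit an
# ordinary classifier on it. Model name, prompt, and labels are assumptions.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def synth_examples(description, n=20):
    """Ask the LLM for n short synthetic sentences matching a description."""
    prompt = (f"Write {n} short news sentences, one per line, each describing "
              f"a {description} event.")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return [ln.strip() for ln in resp.choices[0].message.content.splitlines()
            if ln.strip()]

pos = synth_examples("protest")
neg = synth_examples("routine, non-protest political")
texts, labels = pos + neg, [1] * len(pos) + [0] * len(neg)

# Any real hand-labeled examples can simply be appended to texts/labels here.
X = TfidfVectorizer().fit_transform(texts)
clf = LogisticRegression().fit(X, labels)
```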
As political scientists, we are often interested in using text to understand the actions of political actors. Thankfully, we have a growing set of tools for identifying political actors in text, including named entity recognition and dependency parses, custom event models, and hand labeling events in text.
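To give a hedged flavor of the first two tools, here is a spaCy sketch that pulls out named entities and walks the dependency parse to link actors to the verbs they govern; the model and example sentence are stand-ins of my own, not from any particular pipeline:

```python
# Sketch: candidate political actors via NER, actor-action pairs via the
# dependency parse. Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Kenyan police arrested two opposition leaders in Nairobi on Tuesday.")

# Named entities give candidate actors and locations...
for ent in doc.ents:
    print(ent.text, ent.label_)

# ...and nominal subjects of verbs link actors to the actions they take.
for tok in doc:
    if tok.dep_ == "nsubj" and tok.head.pos_ == "VERB":
        print(f"actor: {tok.text}  action: {tok.head.lemma_}")
```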
Researchers working with text data are often faced with the problem of identifying place names in text and linking them to their geographic coordinates. In social science, we might want to measure news coverage of specific locations, track discussions of specific places in government documents, or geolocate events such as protests to the locations where they occur.
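As a rough illustration of the task, here is a toy geoparsing sketch, assuming spaCy for place-name recognition and geopy’s Nominatim geocoder for coordinates. Real systems need disambiguation (which Springfield?), which this naive lookup skips entirely:

```python
# Toy geoparser: extract place-like entities, geocode each one naively.
import spacy
from geopy.geocoders import Nominatim

nlp = spacy.load("en_core_web_sm")
geolocator = Nominatim(user_agent="geoparsing-demo")  # placeholder user agent

doc = nlp("Protesters gathered in Tahrir Square in Cairo on Friday.")
for ent in doc.ents:
    if ent.label_ in ("GPE", "LOC", "FAC"):      # place-like entity types
        loc = geolocator.geocode(ent.text)       # first hit, no disambiguation
        if loc:
            print(ent.text, (loc.latitude, loc.longitude))
```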
A bag of tricks for efficient, custom event data production, using transformer-based classifiers, question-answering models, Wikipedia entity linking, and active learning.
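As one illustrative piece of that bag of tricks, here is a minimal question-answering sketch using Hugging Face’s pipeline API; the question and context are invented examples rather than anything from the production system:

```python
# Extractive QA can pull event attributes (who, where) out of a sentence.
from transformers import pipeline

qa = pipeline("question-answering")  # downloads a default SQuAD-tuned model
result = qa(
    question="Who carried out the arrest?",
    context="Police in Istanbul arrested three journalists on Monday.",
)
print(result["answer"])  # likely span: "Police"
```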
Much of my work involves improving large-scale systems to extract political events from text (see code from our NSF project on the subject here). These systems are designed for full production use over many hundreds of sources, both daily and retrospectively, across many dozens of event categories, including protests, armed conflict, statements, arrests, and humanitarian aid.