Large-scale Datasets and Archives I Contributed to

Political Organizations in News (PONs)

NSF #1657872, PI: Edwin Amenta (Department of Sociology, UC Irvine)

Web-scraped, manipulated, and analyzed over a million news articles mentioning 34 social movements over the twentieth century, published in national, local, and African American newspapers;

Contributed two chapters and all data visualizations to Edwin Amenta and Neal Caren’s Rough Draft of History: A Century of US Social Movements in the News (Princeton University Press, 2022);

Currently using large language models (LLMs) to expand the PONs dataset.

Foreign-Invested Enterprises in China (FIEC) Dataset & Government Procurements Research Team

NSF #2238897, PI: Samantha Vortherms (Department of Political Science, UC Irvine)

Web-scraped data on foreign-invested enterprises in China from the Ministry of Commerce website;

Led a team of graduate students from UC San Diego and compiled a dataset of Green Public Procurements in China.

Policy Development across Wikipedia Language Editions

PIs: Benjamin Mako Hill (Department of Communication, University of Washington, Seattle) and Seth Frey (Department of Communication, UC Davis)

Scraped, manipulated, and analyzed data on the creation and diffusion of 60 policies across 245 Wikipedia language editions;

Presented at Wiki Workshop 2023 and Wikimania 2024.

China's Cultural Revolution in Memories

Led by Haihui Zhang (University of Pittsburgh East Asian Library)

China’s Cultural Revolution in Memories: CR/10 is an experimental oral history project. It collects ordinary people’s memories of China’s Great Proletarian Cultural Revolution (1966–1976).

Interviews I conducted are posted on the University of Pittsburgh’s Digital Collections website and featured in the documentary The Revolution They Remember (see trailer on YouTube).