Datasets — Weijun Yuan

Political Organizations in News (PONs)

NSF #1657872, PI: Edwin Amenta (Department of Sociology, UC Irvine)

Web-scraped, manipulated, and analyzed over a million articles from national, local, and African American newspapers that mention 34 social movements across the twentieth century.

Contributed two chapters and all data visualizations to Edwin Amenta and Neal Caren's Rough Draft of History: A Century of US Social Movements in the News (Princeton University Press, 2022).

Currently using large language models (LLMs) to expand the PONs dataset.

Foreign-Invested Enterprises in China (FIEC) Dataset & Government Procurements Research Team

NSF #2238897, PI: Samantha Vortherms (Department of Political Science, UC Irvine)

Web-scraped data on foreign-invested enterprises in China from the Ministry of Commerce website.

Led a team of graduate students from UC San Diego and compiled a dataset of Green Public Procurements in China.

Policy Development across Wikipedia Language Editions

PIs: Benjamin Mako Hill (Department of Communication, University of Washington, Seattle) and Seth Frey (Department of Communication, UC Davis)

Scraped, manipulated, and analyzed data on the creation and diffusion of 60 policies across 245 Wikipedia language editions.

Presented at Wiki Workshop 2023 and Wikimania 2024.

China's Cultural Revolution in Memories

Led by Haihui Zhang (University of Pittsburgh East Asian Library)

China's Cultural Revolution in Memories: CR/10 is an experimental oral history project. It collects ordinary people's memories of China's Great Proletarian Cultural Revolution (1966–1976).

Interviews I conducted are posted on the University of Pittsburgh's Digital Collections website and featured in the documentary The Revolution They Remember (see trailer on YouTube).