US Congress Co-sponsorship Temporal Networks

Overview

This codebase implements a pipeline that collects raw data and generates a dataset of US Congress bill co-sponsorship networks, from the 93rd to the 118th Congress, using House of Representatives and Senate relationships. The pipeline scrapes raw data from the Congress.gov API, the GovInfo Bulk Data Repository, and the Congressional Bills Project, cleans inconsistent metadata (names, parties, districts), and exports the results as temporal labeled hypergraphs.

The bill types included are:

- hconres and sconres: House and Senate Concurrent Resolutions. Must be passed by both chambers but are not signed by the President. They do not have the force of law and are used to make or amend rules that apply to both chambers.
- hjres and sjres: House and Senate Joint Resolutions. Must be passed by both chambers and signed by the President. They have the force of law and are used for continuing or emergency appropriations and declarations of war. They are also used to propose amendments to the Constitution, in which case they are not signed by the President and instead require a two-thirds vote in each chamber.
- hres and sres: House and Senate Simple Resolutions. They do not require the approval of both chambers. They do not have the force of law and are used for internal rules, procedures, or other matters within the prerogative of one chamber.
- hr and s: House and Senate Bills. They address domestic and international issues and programs. Public bills pertain to matters that affect the general public, while private bills affect only certain individuals and organizations.

Repository Structure

The repository is organized into data collection and processing notebooks and helper modules.

Jupyter Notebooks

Members_Scraper.ipynb

- Purpose: Scrapes and compiles a master list of all US legislators (Representatives and Senators).
- Output: Generates intermediate lookup files required by the bills scraper: congress_members_names.csv, congress_members_parties.csv, and congress_members_terms.csv.
- Dependency: This notebook must be run before Congress_Bills_Scraper.ipynb to ensure that the metadata lookups exist.

Congress_Bills_Scraper.ipynb

- Purpose: The main pipeline. It downloads bill metadata and sponsor/co-sponsor lists, cleans the data using the helper modules, and generates the final CSV and hypergraph files.
- Key Steps: Download Raw Data; Metadata Normalization; Integration; Hypergraph Construction.

Python Helper Modules

These modules contain dictionaries and logic to handle historical inconsistencies found in raw government data.

- name_info.py: Handles name harmonization, including converting nicknames (e.g., "Bill" to "William"), fixing specific typos in the raw data (e.g., "Foto Sunia" to "Fofo Sunia"), and normalizing accents and formatting.
- party_info.py: Manages party codes and handles legislators who switched parties mid-career. It maps historical codes (e.g., 100, 200) to standard abbreviations (D, R, I) based on the specific Congress.
- district_info.py: Maps legislators to their specific district for a given Congress, accounting for redistricting and "At Large" designations.
- gender_info.py: Contains dictionaries to impute missing gender data based on first names and specific legislator lookups (e.g., identifying "Shelley Sekula Gibbs" as female).
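To illustrate the kind of logic these helper modules describe, here is a minimal sketch, not the repository's actual code. The names NICKNAMES, TYPO_FIXES, PARTY_CODES, strip_accents, harmonize_name, and party_label are hypothetical. The real dictionaries in name_info.py and party_info.py are presumably much larger, and the real party mapping also depends on the Congress, which this sketch ignores. The codes 100 and 200 are assumed to follow the ICPSR convention (Democrat and Republican).

```python
import unicodedata
from typing import Optional

# Hypothetical lookup tables for illustration only; the real ones live in
# name_info.py and party_info.py and are not reproduced here.
NICKNAMES = {"Bill": "William"}
TYPO_FIXES = {"Foto Sunia": "Fofo Sunia"}
PARTY_CODES = {100: "D", 200: "R"}  # assumed ICPSR-style numeric codes


def strip_accents(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def harmonize_name(raw_name: str) -> str:
    """Normalize whitespace and accents, then apply typo and nickname fixes."""
    name = " ".join(strip_accents(raw_name).split())
    name = TYPO_FIXES.get(name, name)
    first, _, rest = name.partition(" ")
    first = NICKNAMES.get(first, first)
    return f"{first} {rest}".strip()


def party_label(code: int) -> Optional[str]:
    """Map a numeric party code to an abbreviation; None if the code is unknown."""
    return PARTY_CODES.get(code)


print(harmonize_name("Foto Sunia"))  # Fofo Sunia
print(party_label(100))              # D
```

Applying the typo table before the nickname table keeps full-name corrections from being affected by first-name substitutions; the order used in the actual modules may differ.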
Output Files

File List

| File/Folder Name | Type | Description |
| --- | --- | --- |
| cleaned_bill_sponsors_93-118.csv | CSV | Bill sponsorship data with outcome metadata |
| cleaned_bill_cosponsors_93-118.csv | CSV | Bill co-sponsorship data by legislator |
| congress_members_static.csv | CSV | Static attributes of the legislators |
| congress_members_parties.csv | CSV | Temporal legislator-party relations |
| congress_members_terms.csv | CSV | Temporal legislator-district relations |
| legislators_93-118.csv | CSV | Legislators' dynamic attributes for each Congress |
| xgi_graphs | Folder | Hypergraphs in XGI format |
| hgx_graphs | Folder | Hypergraphs in HypergraphX format |
| hif_format | Folder | Hypergraphs in the Hypergraph Interchange Format |
| hedge_lists | Folder | TSV files: each row is a labeled hyperedge (bill) |
| node_labels | Folder | Node metadata files by chamber and Congress |

Data Fields

Cleaned Bill Sponsors

| Field Name | Field Type | Description |
| --- | --- | --- |
| BillNum | Int | Bill number |
| BillType | Cat | Bill type; takes values in [hconres, sconres, hjres, sjres, hres, sres, hr, s] |
| Chamber | Int | Code of the chamber: 0 indicates House and 1 indicates Senate |
| CongressNum | Int | Congress number |
| CosponsorNum | Int | Number of legislators that co-sponsored the bill |
| CosponsorWWNum | Int | Number of legislators that co-sponsored the bill, including withdrawals |
| PassH | Int | Whether the bill passed in the House |
| PassS | Int | Whether the bill passed in the Senate |
| PLaw | Int | Whether the bill became law (passed both chambers and signed by the President) |
| PolicyArea | String | Policy area of the bill |
| IntroducedDate | Date | Date the bill was introduced in the chamber |
| Title | String | Title of the bill |
| bioguideId | String | Unique id associated with the sponsor's record in the Biographical Directory of the US Congress |

Cleaned Bill Cosponsors

| Field Name | Field Type | Description |
| --- | --- | --- |
| BillNum | Int | Bill number |
| BillType | Cat | Bill type; takes values in [hconres, sconres, hjres, sjres, hres, sres, hr, s] |
| Chamber | Int | Code of the chamber: 0 indicates House and 1 indicates Senate |
| CongressNum | Int | Congress number |
| bioguideId | String | Unique id associated with the legislator's record in the Biographical Directory of the US Congress |

Legislators

| Field Name | Field Type | Description |
| --- | --- | --- |
| District | String | District represented by the legislator |
| Party | Cat | Political party of the legislator |
| State | Cat | State represented by the legislator |
| CongressNum | Int | Congress number |
| bioguideId | String | Unique id associated with the legislator's record in the Biographical Directory of the US Congress |
| Chamber | Int | Code of the chamber: 0 indicates House and 1 indicates Senate |

Congress Members Static

| Field Name | Field Type | Description |
| --- | --- | --- |
| bioguideId | String | Unique id associated with the legislator's record in the Biographical Directory of the US Congress |
| birthYear | Int | Birth year of the legislator |
| NameFirst | String | First name of the legislator |
| NameLast | String | Last name of the legislator |
| directOrderName | String | Full name of the legislator |
| Gender | Int | Gender of the legislator (0 for Man and 1 for Woman) |
| cosponsoredLegislation | Int | Number of pieces of legislation co-sponsored (updated October 2025) |
| sponsoredLegislation | Int | Number of pieces of legislation sponsored (updated October 2025) |

Congress Members Terms

| Field Name | Field Type | Description |
| --- | --- | --- |
| bioguideId | String | Unique id associated with the legislator's record in the Biographical Directory of the US Congress |
| Chamber | Int | Code of the chamber: 0 indicates House and 1 indicates Senate |
| CongressNum | Int | Congress number |
| StartTerm | Date | Date when the legislator started their term for that state and district |
| EndTerm | Date | Date when the legislator ended their term for that state and district |
| State | Cat | State code of the legislator |
| District | String | District code of the legislator (if House member) |

Congress Members Parties

| Field Name | Field Type | Description |
| --- | --- | --- |
| bioguideId | String | Unique id associated with the legislator's record in the Biographical Directory of the US Congress |
| Party | Cat | Political party of the legislator (D for Democrats, R for Republicans, and I for Independent) |
| startYearInParty | Date | Date when the legislator started their term for that party |
| endYearInParty | Date | Date when the legislator ended their term for that party |
| CongressNum | Int | Congress number |

Hypergraphs

To facilitate network analysis, undirected hypergraphs were constructed from the cleaned dataset. These hypergraphs include:

- Labeled Nodes: Legislators (identified by bioguideId, with labels Gender, State, Party, and birthYear).
- Labeled Hyperedges: Each bill (hyperedge) includes all sponsors and cosponsors, with labels PolicyArea and BillType.

File Naming

All hypergraph files follow the naming convention:

congress_bills__chamber={H|S}__cong_num={XX}

where {H|S} indicates House or Senate and {XX} is the Congress number (93 to 118).

Hypergraphs in the folders xgi_graphs, hgx_graphs, and hif_format can be loaded using the read functions available in the corresponding libraries. In the TSV files, the list of sponsors and cosponsors is comma-separated, and the hyperedge labels (bill type and policy area) are tab-separated from the list of legislators.

All label files follow the naming conventions:

node_labels_congress_bills__chamber={H|S}__cong_num={XX}.csv
edge_labels_congress_bills__chamber={H|S}__cong_num={XX}.csv

Legislators are identified by their bioguideId, and edge labels are matched to hyperedges by the order in which they appear in the corresponding hyperedge file.

Preprocessing & Integration Notes

Data from the Congressional Bills Project was merged with legislator metadata from the Congress.gov API. The following steps were taken to clean and integrate the data:

- Cosponsor count imputation: Filled missing cosponsor counts when not present in the original dataset.
- Name harmonization: Standardized inconsistent first/last names across data sources.
- Party correction: Fixed inconsistencies due to party changes over time.
- District correction: Fixed inconsistencies due to district changes over time.
- Multiple sponsors: Some bills had multiple sponsors.
- Missing values: Filled missing district, state, gender, and bioguideId using data from the congress.gov website.
- State remapping: Mapped state codes to ICPSR codes.
- Duplicate cosponsors: Occasionally, the Congress API lists the same cosponsor multiple times; minor inconsistencies in cosponsor counts may remain.

Requirements

- Python 3.13+
- Key Libraries: pandas, numpy, requests, tqdm, seaborn, xlrd.
- Graph Libraries: xgi and hypergraphx (for hypergraph construction).
- API Key: A valid API key from api.congress.gov.

Configuration

In both Members_Scraper.ipynb and Congress_Bills_Scraper.ipynb, you must insert your API k