Climate Change Literature Perspective: A Bibliometric Analysis

This repository contains the R code, methodology, and datasets used to analyze metadata from a large-scale systematic literature search on climate change. The outputs of this processing step informed the broader academic perspective published in the journal Frontiers in Environmental Science under the title "Scientific Coherence in Climate Change Research: A Meta-Research Perspective to Accelerate Scientific Progress and Climate Justice".

Project Overview

The code takes raw metadata exported from Scopus and refines it in five steps:

- Cleaning: removing duplicate entries based on their unique Scopus identifiers (EIDs).
- Feature engineering (geography): extracting the first author's country of affiliation from the raw affiliation strings.
- Feature engineering (economy): mapping the extracted countries to the 2024 World Bank income groups (High, Upper-middle, Lower-middle, and Low income).
- Exporting: saving the refined dataset for downstream analysis.
- Visualization: plotting the absolute number and the share of climate change articles published by each income group from 1946 to 2024.

Data Source

The data was retrieved from Scopus (Advanced Document Search) with the following query:

    TITLE-ABS-KEY ( "climate change*" OR "global warming" OR "sea level rise" OR "sea-level rise" OR "rising sea level*" )
    AND ( LIMIT-TO ( SRCTYPE , "j" ) )
    AND ( LIMIT-TO ( PUBSTAGE , "final" ) )
    AND ( EXCLUDE ( DOCTYPE , "rp" ) OR EXCLUDE ( DOCTYPE , "tb" ) OR EXCLUDE ( DOCTYPE , "dp" ) OR EXCLUDE ( DOCTYPE , "cr" ) OR EXCLUDE ( DOCTYPE , "er" ) OR EXCLUDE ( DOCTYPE , "bk" ) OR EXCLUDE ( DOCTYPE , "ch" ) OR EXCLUDE ( DOCTYPE , "cp" ) )

The query restricts results to journals (source type "j") in their final publication stage and excludes reports, retracted documents, data papers, conference reviews, errata, books, book chapters, and conference papers.

Note on data hosting: The raw bibliographic data and the refined dataset are very large (several gigabytes), so they are tracked in this repository with Git LFS (Large File Storage).
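The cleaning step keeps one record per EID. The sketch below illustrates that idea in base R on a made-up data frame; the column name EID and the sample identifiers are assumptions for illustration, not taken from code.R or from the real export. With data.table, unique(dt, by = "EID") performs the same first-occurrence deduplication.

```r
# Toy stand-in for the Scopus export (hypothetical IDs and titles);
# a real export carries many more columns.
records <- data.frame(
  EID   = c("2-s2.0-001", "2-s2.0-002", "2-s2.0-001", "2-s2.0-003"),
  Title = c("Paper A", "Paper B", "Paper A", "Paper C"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each identifier and drop later repeats.
dedupe_by_eid <- function(df, id_col = "EID") {
  df[!duplicated(df[[id_col]]), , drop = FALSE]
}

unique_records <- dedupe_by_eid(records)
nrow(unique_records)  # 3
```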
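The geography step derives the first author's country from the affiliation strings. A minimal sketch, assuming a Scopus-style layout in which affiliations are separated by semicolons and the country is the last comma-separated field of each affiliation. The sample strings and the helper first_country() are hypothetical, and treating the first listed affiliation as the first author's is an approximation that code.R may handle differently.

```r
# Hypothetical affiliation strings: affiliations separated by ";",
# country assumed to be the last comma-separated field of each one.
affiliations <- c(
  "Dept. of Geography, University of Example, Nairobi, Kenya; Institute X, Berlin, Germany",
  "School of Earth Sciences, Sample University, Sao Paulo, Brazil",
  NA
)

first_country <- function(aff) {
  # Keep only the text before the first ";" (taken as the first author's affiliation).
  first_aff <- sub(";.*$", "", aff)
  # Drop everything up to the last comma, leaving the country field.
  country <- trimws(sub("^.*,", "", first_aff))
  # Missing or empty affiliations yield NA rather than an empty string.
  country[is.na(aff) | country == ""] <- NA_character_
  country
}

first_country(affiliations)  # "Kenya" "Brazil" NA
```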
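The economy step is essentially a lookup from country name to income group. The sketch below uses base R match() against a small illustrative excerpt of the World Bank classification; the helper classify_income() and the "Unclassified" label are hypothetical, and country spellings in Scopus and in the World Bank file may need harmonizing before the lookup succeeds.

```r
# Illustrative excerpt of a country-to-income-group lookup;
# a real table would list every economy in the World Bank classification.
income_lookup <- data.frame(
  country = c("United States", "Germany", "Brazil", "China",
              "India", "Kenya", "Ethiopia"),
  income_group = c("High income", "High income", "Upper-middle income",
                   "Upper-middle income", "Lower-middle income",
                   "Lower-middle income", "Low income"),
  stringsAsFactors = FALSE
)

# Fixed level order so tables and plots list the groups consistently.
income_levels <- c("High income", "Upper-middle income",
                   "Lower-middle income", "Low income", "Unclassified")

# Map each country to its income group; unmatched or NA countries become
# "Unclassified" instead of being silently dropped.
classify_income <- function(countries, lookup) {
  groups <- lookup$income_group[match(countries, lookup$country)]
  groups[is.na(groups)] <- "Unclassified"
  factor(groups, levels = income_levels)
}

# "Atlantis" stands in for a name missing from the lookup.
classify_income(c("Kenya", "Germany", "Atlantis", NA), income_lookup)
# Lower-middle income, High income, Unclassified, Unclassified
```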
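The visualization step rests on a simple aggregation: article counts per year and income group, and each group's share of the yearly total. A minimal base R sketch on made-up records; the column names year and income_group are assumptions, and the actual script may compute these with dplyr before handing them to ggplot2.

```r
# Toy classified records, one row per article (made-up values).
articles <- data.frame(
  year         = c(2022, 2022, 2022, 2023, 2023, 2023, 2023),
  income_group = c("High income", "High income", "Low income",
                   "High income", "Upper-middle income",
                   "Upper-middle income", "Low income"),
  stringsAsFactors = FALSE
)

# Absolute number of articles per year (rows) and income group (columns).
counts <- table(articles$year, articles$income_group)

# Share of each income group within its year: every row sums to 1.
shares <- prop.table(counts, margin = 1)

round(shares, 2)
# 2022: High 0.67, Low 0.33, Upper-middle 0.00
# 2023: High 0.25, Low 0.25, Upper-middle 0.50
```

as.data.frame(counts) and as.data.frame(shares) convert the tables to long format, which ggplot2 can map to aesthetics directly.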
Structure

- code.R: the main R script. It handles data ingestion, rule-based data cleaning, income-group classification, data export, and plotting.
- ClimateChangeMetadata.zip: the raw input data, containing ~1.79 million records (tracked via Git LFS).
- ClimateChangeMetadata_Refined.csv: the cleaned data, containing ~543,000 unique, classified records produced by cleaning the raw dataset in the zip file. This refined dataset is used directly to generate the descriptive statistics and graphs on publications by income group (tracked via Git LFS).

Requirements to Run the Code

To run the code in this repository, you need:

- R (version 4.0 or higher recommended)
- the R packages dplyr, data.table, stringr, ggplot2, RColorBrewer, and scales

You can install these packages from your R console with:

    install.packages(c("dplyr", "data.table", "stringr", "ggplot2", "RColorBrewer", "scales"))

How to Run

1. Clone this repository. Make sure git-lfs is installed so that the large data files (the zip archive and the refined CSV) are fetched rather than left as Git LFS pointer files.
2. Set your working directory in R to the cloned repository.
3. Run or source the code.R script:

       source("code.R")

The script parses ClimateChangeMetadata.csv, the raw data shipped in ClimateChangeMetadata.zip, so extract the archive into the working directory first if the CSV is not already there. It then prints output logs to the console, saves ClimateChangeMetadata_Refined.csv, and prepares two ggplot objects (plot1 and plot2) that you can print to your graphics device.
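To avoid reinstalling packages that are already present, a small helper can compare the required list with the installed packages. This is a convenience sketch, not part of the repository; missing_packages() is a hypothetical name.

```r
# Packages required by code.R, as listed in the requirements.
pkgs <- c("dplyr", "data.table", "stringr", "ggplot2", "RColorBrewer", "scales")

# Return the entries of `pkgs` that are not found in the current library paths.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

missing_packages(pkgs)
```

Passing the result to install.packages() installs only what is missing, for example to_install <- missing_packages(pkgs) followed by if (length(to_install) > 0) install.packages(to_install).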