This dataset contains a random sample of 150,000 user comments posted on the Daily Mail website in 2021. The sample was drawn from a larger corpus of more than 40 million comments collected from 224,981 Daily Mail articles using a custom Python-based web scraping workflow. To ensure suitability for computational text analysis, only comments containing at least 20 words were included in the sample.

The dataset contains the full comment text, article-level metadata, timestamps, and community feedback indicators such as vote counts and rating scores. It is intended for research in computational social science, digital media studies, discourse analysis, and sentiment analysis.

Column description

RowID: Sequential row identifier within the exported dataset.
AssetId: Identifier of the Daily Mail article to which the comment belongs.
category: Content category/section of the article (e.g. news, sport, femail, tvshowbiz).
custom_id: Unique identifier of the comment.
AssetHeadline: Headline/title of the article.
DateCreated: Date and time when the comment was created; stored in the file as a numeric date value.
AssetCommentCount: Total number of comments associated with the article.
AssetUrl: URL path of the corresponding Daily Mail article.
message: Full text of the user comment.
year: Year of publication/collection of the comment (2021).
VoteCount: Total number of votes received by the comment.
VoteRating: Net rating of the comment, calculated as positive votes minus negative votes.
pos_votes: Number of positive votes received by the comment.
neg_votes: Number of negative votes received by the comment.
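
The description above states two rules that can be checked mechanically: every comment has at least 20 words, and VoteRating equals pos_votes minus neg_votes. The following is a minimal Python sketch of such a check, not part of the published workflow. It assumes each record is a dict of strings keyed by the column names (the shape produced by csv.DictReader), that vote columns hold plain integer strings, and that words are whitespace-separated tokens. The helper name check_row and the two sample rows are hypothetical and made up for illustration; they are not taken from the dataset.

```python
def check_row(row):
    """Return a list of problems found in one comment record (empty if none)."""
    problems = []

    # Assumption: vote columns are stored as plain integer strings.
    pos = int(row["pos_votes"])
    neg = int(row["neg_votes"])

    # Documented rule: VoteRating is positive votes minus negative votes.
    if int(row["VoteRating"]) != pos - neg:
        problems.append("VoteRating != pos_votes - neg_votes")

    # Documented rule: sampled comments contain at least 20 words.
    # Assumption: a "word" is a whitespace-separated token.
    if len(row["message"].split()) < 20:
        problems.append("message shorter than 20 words")

    return problems


# Made-up rows for illustration only (not real dataset content).
rows = [
    {
        "custom_id": "a1",
        "message": " ".join(["word"] * 20),
        "VoteRating": "7",
        "pos_votes": "10",
        "neg_votes": "3",
    },
    {
        "custom_id": "a2",
        "message": "too short",
        "VoteRating": "5",
        "pos_votes": "4",
        "neg_votes": "1",
    },
]

# Map each comment identifier to the list of problems found for it.
report = {row["custom_id"]: check_row(row) for row in rows}
```

With a real export, the same function could be applied to each row yielded by csv.DictReader, provided the file is a delimited text file with these column headers. The file format is not stated in the description, so that is an assumption.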