========================================ZENODO RECORD UPDATE========================================

Title: When the Machine Is Not Enough: An Autoethnography of AI-Assisted Amateur Research and the Boundaries of Epistemic Authority

Description:
Replication dataset for a manuscript submitted to AI & Society. This deposit contains the analysis code, coded datasets, and tone classification results supporting an autoethnographic study of AI-assisted amateur research across three disciplinary domains. An emergency medicine physician used OpenAI's ChatGPT-4o as an epistemic collaborator for research projects in mathematics (Collatz conjecture), cosmology (Information-Topological Cosmology), and science and technology studies (Creative Singularity Triangle). All three resulting manuscripts were submitted to peer-reviewed journals and rejected. The study analyzes these failures through the lens of Collins and Evans's interactional expertise, Gieryn's boundary work, and Polanyi's tacit knowledge.

The core dataset comprises 2,566 unique assistant messages across 47 research conversations, obtained after deduplication (removing 1,123 duplicates from overlapping ChatGPT exports) and filtering out non-research conversations. All messages were classified at the message level into four tone categories — encouragement-dominant (E), criticism-dominant (C), mixed (M), and neutral/informational (N) — via a full census using Anthropic's Claude Opus 4 as the initial coder, with all classifications subject to author review.

Key findings: The overall encouragement-to-criticism ratio is 3.6:1 and varies with disciplinary verifiability: 1.2:1 in mathematics (where claims are formally decidable), 5.3:1 in speculative cosmology (where falsification within a conversation is difficult), and 3.3:1 in STS (intermediate). Among evaluative messages (n = 760), 63.7% were encouragement-dominant, 18.6% mixed, and 17.8% criticism-dominant.

File descriptions:

1. research_messages_deduped.csv (8.2 MB)
   - Deduplicated corpus of 2,566 assistant messages (plus corresponding user messages; 4,808 rows total) across 47 research conversations
   - Columns: assigned_project, conv_title, msg_id, role, datetime_kst, date, hour, model, text, text_length, word_count
   - Projects: Collatz (6 conversations, 402 messages), ITC (33 conversations, 1,746 messages), CST (8 conversations, 418 messages)

2. census_all_tones.json
   - Complete tone classification results: a dictionary mapping each msg_id to its tone label (E/C/M/N)
   - 2,566 entries covering all assistant messages

3. census_review.csv
   - Author review file with msg_id, project, conversation title, assigned tone, and a 200-character text preview
   - Designed for validation and reproducibility checking

4. full_census/batch_000.json through batch_051.json (52 files)
   - Input batches for the classification pipeline, each containing up to 50 messages with full text, project label, and conversation title

5. full_census/result_000.json through result_051.json (52 files)
   - Raw classification outputs from Claude Opus 4, each containing msg_id and tone label pairs

Note: Raw ChatGPT conversation logs (JSON exports) are available from the corresponding author upon request. The logs contain the complete interaction history but are not deposited due to their size (>50 MB) and to avoid potential privacy concerns from incidental personal information in conversation text.

Keywords: artificial intelligence, large language models, expertise, autoethnography, tone classification, peer review, RLHF, sycophancy, encouragement trap, boundary work

License: CC BY 4.0

========================================FILES TO UPLOAD (replacing old deposit)========================================

Keep:
- (none from the old deposit; all files are superseded)

Upload new:
1. tone_classification/research_messages_deduped.csv
2. tone_classification/full_census/census_all_tones.json
3. tone_classification/full_census/census_review.csv
4. tone_classification/full_census/batch_000.json ~ batch_051.json (52 files)
5. tone_classification/full_census/result_000.json ~ result_051.json (52 files)