LULCC-KnowText - annotated entities for knowledge extraction on Land Use and Land Cover change

2026 · 0 citations · Dataset · Green Open Access

Authors

Arthur Crespin-Boucaud · Laboratoire HydroSciences Montpellier

Sarah Valentin · Centre de Coopération Internationale en Recherche Agronomique pour le Développement

Abstract

This dataset contains a subset of relevant sentences extracted from the dataset "LULCC-KnowText - annotated text segments for knowledge extraction on Land Use and Land Cover change" (<a href="https://doi.org/10.18167/DVN1/F0HLEH">https://doi.org/10.18167/DVN1/F0HLEH</a>). These sentences were annotated at the entity level with 16 entity types. The dataset consists of 2 files:
<ul>
<li>annotated_entities.jsonl: A JSON Lines file in which each entry corresponds to a text segment extracted from a scientific document and enriched with manually annotated entities for knowledge extraction. The top-level keys are:
<ul>
<li>id_segment: Identifier of the manually labelled text segment. This identifier links the segment to its corresponding scientific article.</li>
<li>text: Raw text of the segment as it appears in the original scientific article.</li>
<li>entities: List of entities manually annotated within the text segment.</li>
</ul>
Each element of the entities list represents a single annotated entity and contains the following fields:
<ul>
<li>id: Unique identifier of the entity annotation within the segment.</li>
<li>label: Semantic category assigned to the entity. Labels correspond to domain-specific concepts related to land use and land cover (e.g. LOC, LOC_LANDSCAPE, LULC, PRACTICE, CHANGE_UP).</li>
<li>start: Start character offset of the entity in the text field (inclusive).</li>
<li>end: End character offset of the entity in the text field (exclusive).</li>
<li>value: Text span corresponding exactly to the annotated entity.</li>
</ul>
</li>
<li>entity_annotation_guidelines.pdf: The annotation guidelines used to manually annotate the entities.</li>
</ul>
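The record layout described above can be read and sanity-checked with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the helper names (load_segments, check_offsets, label_counts) and the sample record are made up for illustration, and it assumes that start and end index the text field in the same units as Python string indices, which the description does not state explicitly.

```python
import json
from collections import Counter


def load_segments(lines):
    """Parse JSON Lines input: one segment object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def check_offsets(segment):
    """Return the entities whose value differs from text[start:end].

    Assumption: offsets are Python string indices, start inclusive
    and end exclusive, as the field descriptions suggest.
    """
    text = segment["text"]
    return [
        entity
        for entity in segment["entities"]
        if text[entity["start"]:entity["end"]] != entity["value"]
    ]


def label_counts(segments):
    """Count how often each entity label occurs across all segments."""
    return Counter(
        entity["label"]
        for segment in segments
        for entity in segment["entities"]
    )


# Hypothetical record for illustration only; it is not taken from the dataset,
# and the id formats shown here are assumptions.
sample = json.dumps({
    "id_segment": "doc1_seg3",
    "text": "Forest cover declined near Montpellier.",
    "entities": [
        {"id": 1, "label": "LULC", "start": 0, "end": 12, "value": "Forest cover"},
        {"id": 2, "label": "LOC", "start": 27, "end": 38, "value": "Montpellier"},
    ],
})

segments = load_segments([sample])
mismatches = [e for s in segments for e in check_offsets(s)]
counts = label_counts(segments)

# With the real file, the same helpers could be applied as:
# with open("annotated_entities.jsonl", encoding="utf-8") as handle:
#     segments = load_segments(handle)
```

An empty mismatch list indicates that every value agrees with its offsets. If mismatches appear on the real file, the offsets may be counted in a different unit (for example bytes), so the annotation guidelines are the place to check before treating them as errors.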

Topics & Keywords

Publication Details

Published in: Centre de coopération internationale en recherche agronomique pour le développement

DOI: 10.18167/dvn1/frqgid
