<p>This dataset contains a subset of relevant sentences extracted from the dataset "LULCC-KnowText - annotated text segments for knowledge extraction on Land Use and Land Cover change" (<a href="https://doi.org/10.18167/DVN1/F0HLEH">https://doi.org/10.18167/DVN1/F0HLEH</a>). These sentences were manually annotated at the entity level, using 16 entity types.</p>
<p>The dataset consists of 2 files:</p>
<ul>
<li><strong>annotated_entities.jsonl</strong>: A JSON Lines file in which each entry corresponds to a text segment extracted from a scientific document and enriched with manually annotated entities for knowledge extraction. The top-level keys include:
<ul>
<li>id_segment: Identifier of the manually labelled text segment. This identifier links the segment to its corresponding scientific article.</li>
<li>text: Raw text of the segment as it appears in the original scientific article.</li>
<li>entities: List of entities manually annotated within the text segment.</li>
</ul>
Each element in the entities list represents a single annotated entity and contains the following fields:
<ul>
<li>id: Unique identifier of the entity annotation within the segment.</li>
<li>label: Semantic category assigned to the entity. Labels correspond to domain-specific concepts related to land use and land cover (e.g. LOC, LOC_LANDSCAPE, LULC, PRACTICE, CHANGE_UP, etc.).</li>
<li>start: Start character offset of the entity in the text field (inclusive).</li>
<li>end: End character offset of the entity in the text field (exclusive).</li>
<li>value: Text span corresponding exactly to the annotated entity.</li>
</ul>
</li>
<li><strong>entity_annotation_guidelines.pdf</strong>: The annotation guidelines used to manually annotate the entities.</li>
</ul>
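<p>The record layout described above can be read with a few lines of Python. The following is a minimal sketch, not code distributed with the dataset: the helper names <code>parse_segments</code> and <code>check_offsets</code> are hypothetical, and the sample record (identifier, sentence, entity ids and offsets) is invented for illustration. It assumes that, because start is inclusive and end is exclusive, <code>text[start:end]</code> should equal value.</p>

```python
import json


def parse_segments(lines):
    """Parse JSON Lines input (one JSON object per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def check_offsets(segment):
    """Return the entities whose value differs from text[start:end]."""
    text = segment["text"]
    return [
        entity
        for entity in segment["entities"]
        if text[entity["start"]:entity["end"]] != entity["value"]
    ]


# Hypothetical record, invented for illustration; it is not taken from the dataset.
# The type of the "id" fields (integer here) is an assumption.
sample_line = json.dumps({
    "id_segment": "example_segment_1",
    "text": "Forest cover declined near Montpellier.",
    "entities": [
        {"id": 1, "label": "LULC", "start": 0, "end": 12, "value": "Forest cover"},
        {"id": 2, "label": "LOC", "start": 27, "end": 38, "value": "Montpellier"},
    ],
})

segments = parse_segments([sample_line])
mismatches = check_offsets(segments[0])
print(len(segments), "segment(s),", len(mismatches), "offset mismatch(es)")
```

<p>To process the actual file, the same helper could be given an open file handle, for example <code>parse_segments(open("annotated_entities.jsonl", encoding="utf-8"))</code>, since iterating over a text file yields one line at a time.</p>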
Published in: Centre de coopération internationale en recherche agronomique pour le développement
DOI: 10.18167/dvn1/frqgid