The growing volume of heterogeneous data generated by web, IoT, and enterprise systems remains a major challenge for Big Data management, particularly for the integration and migration of structured, semi-structured, and unstructured data. Conventional methods tend to handle only one type of data, which limits their applicability to real-world data that is varied, high in volume, and high in velocity. This paper proposes an integrated framework for Big Data migration and integration based on Linked Data principles. The methodology introduces a pipeline that combines Semantic Business Vocabulary and Rules (SBVR) with Natural Language Processing (NLP) to extract semantic constructs from unstructured and semi-structured data, and applies model-to-model transformation rules to convert structured relational data into Resource Description Framework (RDF) representations. It includes a lightweight DOM parser that identifies and preserves the structural schemas of semi-structured sources such as XML and JSON and maintains relational semantics during conversion. The framework is implemented as two core modules: an analysis module that identifies and classifies data types, and a synthesis module that produces coherent RDF-based Linked Data output. Experimental evaluation based on precision and recall demonstrates the effectiveness of the framework, with a mean precision of 90–97% and a recall of 82–94% across various datasets. The proposed system outperforms existing tools such as D2RQ and Triplify in that it supports all three types of data in a single pipeline and offers better structural recovery and query-equivalence accuracy. The work contributes a scalable, semantically aware approach to converting heterogeneous Big Data into interoperable Linked Data, enabling improved integration, migration, and Semantic Web mining.
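
To make the two-module split concrete, the following is a minimal Python sketch, not the framework's actual implementation. It assumes a hypothetical `http://example.org/` namespace, a simple heuristic `classify` function standing in for the analysis module, and a `rows_to_triples` function standing in for the relational-to-RDF part of the synthesis module. The `precision_recall` helper assumes that evaluation compares produced triples against a hand-built reference set; the abstract does not specify the exact unit of comparison.

```python
import json
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace, not from the paper
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def classify(payload):
    """Analysis step (assumed heuristic): label the input by data type."""
    # A list of dict rows stands in for a relational result set.
    if isinstance(payload, list) and all(isinstance(r, dict) for r in payload):
        return "structured"
    if isinstance(payload, str):
        text = payload.strip()
        if text.startswith(("{", "[")):
            try:
                json.loads(text)
                return "semi-structured"
            except ValueError:
                pass
        if text.startswith("<"):
            try:
                ET.fromstring(text)
                return "semi-structured"
            except ET.ParseError:
                pass
    return "unstructured"


def rows_to_triples(table, rows, key):
    """Synthesis step (simplified): one subject per row, one predicate per column."""
    triples = []
    for row in rows:
        subject = f"<{BASE}{table}/{row[key]}>"
        triples.append((subject, RDF_TYPE, f"<{BASE}{table}>"))
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is encoded in the subject; NULLs yield no triple
            # json.dumps gives a quoted, escaped string used here as a plain literal.
            literal = json.dumps(str(value), ensure_ascii=False)
            triples.append((subject, f"<{BASE}{table}#{column}>", literal))
    return triples


def precision_recall(produced, expected):
    """Compare produced triples against a reference set."""
    produced, expected = set(produced), set(expected)
    true_positives = len(produced & expected)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    rows = [{"id": 1, "name": "Ada", "dept": None}]
    print(classify(rows))
    for s, p, o in rows_to_triples("employee", rows, "id"):
        print(s, p, o, ".")
```

The row-to-subject, column-to-predicate mapping is a common simplification in the spirit of direct relational-to-RDF mappings; the SBVR and NLP extraction for unstructured text described above is not covered by this sketch.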