Anglia Ruskin Research Online (ARRO)

A new approach for interlinking and integrating semi-structured and linked data

thesis
posted on 2023-08-30, 15:08 authored by Mohamed S. Kettouch
This work focuses on improving data integration and interlinking systems that target semi-structured and Linked Data. It aims to facilitate the exploitation of semi-structured and Linked Data by addressing the problems of heterogeneity, complexity, scalability and the degree of automation. Technologies such as the Resource Description Framework (RDF) have enabled new data spaces and concept descriptors that define an increasingly complex and heterogeneous web of data. Many data providers, however, continue to publish their data using classic models and formats. In addition, a significant amount of the data released before the Linked Data movement has not been migrated and still has a high value. Hence, as a long-term solution, an interlinking system has been designed to contribute to the publishing of semi-structured data as Linked Data. Simultaneously, to utilise these growing data resource spaces, a data integration middleware has been proposed as an immediate solution. The proposed interlinking system first verifies whether the Uniform Resource Identifier (URI) of the resource being published already exists in the cloud, in order to establish links with it. It uses domain information in defining and matching the datasets. Its main aim is to make it easier to follow best-practice recommendations when publishing data into the Linked Data cloud. The results of this interlinking approach show that it can target large amounts of data whilst preserving good precision and recall. The new approach for integrating semi-structured and Linked Data is a mediator-based architecture. It enables the on-the-fly integration of heterogeneous semi-structured data sources with large-scale Linked Data sources. Complexity is tackled through a usable and expressive interface. The evaluation of the proposed architecture shows high performance, precision and adaptability.
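
The URI-existence check described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the `Resource` class, the `normalise`, `find_existing_uri` and `interlink` functions, the exact-label matching rule, the in-memory stand-in for the Linked Data cloud and the `example.org` URIs are all assumptions made for illustration. The sketch only shows the general idea of checking, per domain, whether a resource already has a URI before emitting an `owl:sameAs` link to it.

```python
from dataclasses import dataclass

# Standard OWL property commonly used to link two URIs denoting the same thing.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a resource already published as Linked Data."""
    uri: str
    label: str
    domain: str


def normalise(label: str) -> str:
    """Lower-case the label and collapse whitespace (an assumed matching rule)."""
    return " ".join(label.lower().split())


def find_existing_uri(label, domain, cloud_index):
    """Return the URI already published for (domain, label), or None."""
    return cloud_index.get((domain, normalise(label)))


def interlink(records, cloud, base_uri):
    """Emit (subject, predicate, object) triples for records that already
    have a matching URI in the same domain; unmatched records get no link."""
    cloud_index = {(r.domain, normalise(r.label)): r.uri for r in cloud}
    triples = []
    for local_id, label, domain in records:
        local_uri = f"{base_uri}{local_id}"
        existing = find_existing_uri(label, domain, cloud_index)
        if existing is not None:
            triples.append((local_uri, OWL_SAME_AS, existing))
    return triples


# Illustrative data only; none of these URIs come from the thesis.
cloud = [
    Resource("http://example.org/cloud/place/Cambridge", "Cambridge", "place"),
    Resource("http://example.org/cloud/org/Cambridge", "Cambridge", "organisation"),
]
records = [("1", "  cambridge ", "place"), ("2", "Atlantis", "place")]
links = interlink(records, cloud, "http://example.org/local/")
```

Keying the lookup on the domain as well as the label reflects the abstract's statement that domain information is used in matching: here it keeps the place "Cambridge" from being linked to an organisation with the same label. A real system would query a live endpoint and use a more tolerant similarity measure than exact label equality.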

History

Institution

Anglia Ruskin University

File version

  • Accepted version

Language

  • eng

Thesis name

  • PhD

Thesis type

  • Doctoral

Legacy posted date

2018-02-08

Legacy creation date

2018-02-08

Legacy Faculty/School/Department

Theses from Anglia Ruskin University
