In today's companies and organizations, databases are omnipresent. They are the means to represent and store information, so they must evolve whenever the information architecture (semantics and structure) changes. Databases evolve for several reasons, including changes in the real world and the emergence of new user requirements. Non-representational and non-functional aspects, such as technology and performance requirements, can also compel the database designer or administrator to change the database so that it is better suited to those requirements.

This thesis addresses the schema evolution problem by anticipating the potential changes to the schema. The approach adopts an a priori perspective, i.e. it plans in advance for a possible solution that lets a schema evolve over time. Such a solution relies on both a set of assumptions and a set of techniques. The approach is based on the following components:

• First, we propose to develop a schema repository that contains a large number of heterogeneous schemas and their historical versions. This repository supports a series of operations that are important for the study of the schemas, such as selecting the pertinent schemas, building the change matrices and performing the data-mining process.

• Second, we propose to develop a requirements ontology, that is, a domain ontology that describes the current concepts of a database as well as the concepts that represent potential future requirements that could be part of the evolution. The ontology is constructed in several phases: preparing the data for the ontology; establishing the data dictionary and the change perspectives, which store the concepts that are linked to other concepts by relationships resulting from the data-mining process; evaluating the domain ontology using a graph structure; and finally constructing the ontology using a set of procedures.

• Third, we propose to develop a predicted schema, which is a database schema that contains two types of metadata: metadata that represents the current entities of the database and metadata that represents potential future entities. The predicted schema can be represented in two ways: the multi-representation strategy and the predicted design repository schemas strategy. Under the multi-representation strategy, predicted schemas are particular versions of schemas, whereas under the predicted design repository schemas strategy they are particular repositories of schemas.

• Finally, we propose a case study and an analysis of the approach. The case study helps to better understand how the approach works, and the analysis shows its feasibility as well as its positive and negative aspects.
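The abstract does not define the change matrices built from the schema repository. As a rough illustration only, the Python sketch below assumes a change matrix has one row per entity and one column per kind of change, counted across consecutive historical versions of a schema. The schema representation (entity name mapped to a set of attribute names), the four change kinds, and the names `diff_versions` and `change_matrix` are all hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict

# Assumption: a schema version maps each entity name to its set of attribute names.
Schema = dict[str, set[str]]

# Assumption: these four change kinds form the columns of the matrix.
CHANGE_KINDS = ("entity_added", "entity_removed",
                "attribute_added", "attribute_removed")


def diff_versions(old: Schema, new: Schema) -> list[tuple[str, str]]:
    """Return one (entity, change_kind) pair per change between two versions."""
    changes = []
    for entity in new.keys() - old.keys():
        changes.append((entity, "entity_added"))
    for entity in old.keys() - new.keys():
        changes.append((entity, "entity_removed"))
    for entity in old.keys() & new.keys():
        changes += [(entity, "attribute_added")] * len(new[entity] - old[entity])
        changes += [(entity, "attribute_removed")] * len(old[entity] - new[entity])
    return changes


def change_matrix(history: list[Schema]) -> dict[str, list[int]]:
    """Count changes per entity over consecutive versions, ordered as CHANGE_KINDS."""
    counts = defaultdict(Counter)
    for old, new in zip(history, history[1:]):
        for entity, kind in diff_versions(old, new):
            counts[entity][kind] += 1
    return {entity: [c[kind] for kind in CHANGE_KINDS]
            for entity, c in sorted(counts.items())}


# Invented example history of one schema with three versions.
history = [
    {"customer": {"id", "name"}},
    {"customer": {"id", "name", "email"}, "order": {"id", "customer_id"}},
    {"customer": {"id", "email"}, "order": {"id", "customer_id", "total"}},
]
matrix = change_matrix(history)
for entity, row in matrix.items():
    print(entity, row)
```

Counts of this kind could serve as input to a data-mining step that looks for recurring change patterns, although the thesis may structure its matrices quite differently.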