This repository presents MIRA-KG, a knowledge graph designed to capture hypotheses and findings in social demography research. The resource helps researchers understand the trends and patterns revealed in social demography, and use them to uncover biases, discover knowledge, and derive novel research questions.
mira

Citation
@inproceedings{10.1007/978-3-031-60635-9_12,
isbn = {978-3-031-60635-9},
pages = {199--216},
address = {Cham},
publisher = {Springer Nature Switzerland},
year = {2024},
booktitle = {The Semantic Web},
title = {Enabling Social Demography Research Using Semantic Technologies},
editor = {Mero{\~{n}}o Pe{\~{n}}uela, Albert and Dimou, Anastasia and Troncy, Rapha{\"e}l and Hartig, Olaf and Acosta, Maribel and Alam, Mehwish and Paulheim, Heiko and Lisena, Pasquale},
author = {Stork, Lise and Zijdeman, Richard L. and Tiddi, Ilaria and ten Teije, Annette},
}

Requirements
- Clone the project:
git clone https://github.com/muhai-project/mira.git
- Set up an environment (for example with Anaconda) and install the dependencies from the requirements.txt file:
pip install -r requirements.txt

Usage
The semantify.py script converts abstracts of social demography research papers into RDF according to the MIRA ontology. It does so by (i) prompting a large language model to annotate the paper abstracts, and (ii) mapping the annotated concepts to terms from NCBO BioPortal ontologies and GeoNames. An example annotation is shown in the figure below:
To test the code, you can use the example paper_file.pkl file, which contains papers on social health inequality.
python semantify.py --paper_file ../data/paper_file.pkl --api_key "your api key" --output ../data/test_output.ttl --max 1 --view 1
The location of the input file (--paper_file), the OpenAI API key (--api_key), and the output file (--output) are required arguments. --max and --view are optional: they set how many papers to process and whether to print the serialised RDF after each step. To check that everything works as expected, we recommend first setting --max to 1 and enabling --view, as in the command above, before processing a large batch.
The --paper_file argument expects the location of a pickle file (.pkl) containing a list of dictionaries with the following keys: 'paperId', 'title', 'abstract', 'year', 'publicationDate', 'authors', 'references'. These can, for instance, be retrieved from Semantic Scholar:
from semanticscholar import SemanticScholar
sch = SemanticScholar()
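# Replace the placeholders with your own search query, publication year, and field(s) of study.
results = sch.search_paper('<query>', year='<year>', fields_of_study=['<field of study>'])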
papers = [sch.get_paper(result.paperId) for result in results]
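Once the papers are available, they need to be stored as a pickled list of dictionaries with the keys listed above. The following is a minimal sketch of that step, not part of the repository: the helper names (to_record, save_papers, load_papers) and the file name my_papers.pkl are hypothetical, and it assumes each paper has already been converted to a plain Python dictionary.

```python
import pickle

# Keys that the --paper_file input is expected to contain for every paper.
EXPECTED_KEYS = ["paperId", "title", "abstract", "year",
                 "publicationDate", "authors", "references"]


def to_record(paper: dict) -> dict:
    """Keep only the expected keys; keys that are absent become None."""
    return {key: paper.get(key) for key in EXPECTED_KEYS}


def save_papers(papers: list, path: str) -> None:
    """Write the papers to `path` as a pickled list of dictionaries."""
    records = [to_record(paper) for paper in papers]
    with open(path, "wb") as handle:
        pickle.dump(records, handle)


def load_papers(path: str) -> list:
    """Read the pickle back and check that every record has the expected keys."""
    with open(path, "rb") as handle:
        records = pickle.load(handle)
    for record in records:
        missing = set(EXPECTED_KEYS) - set(record)
        if missing:
            raise ValueError(f"record is missing keys: {sorted(missing)}")
    return records


if __name__ == "__main__":
    # Placeholder record for illustration only, not real data.
    example = {"paperId": "example-id", "title": "Example title",
               "abstract": "Example abstract.", "year": 2024}
    save_papers([example], "my_papers.pkl")
    print(len(load_papers("my_papers.pkl")), "paper(s) written")
```

Reading the file back before running semantify.py is a cheap way to catch a missing key early, before any API calls are made.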
You can use the validate.py script to validate the generated RDF against a set of SHACL shapes, which were developed according to a set of data quality criteria.
python validate.py --batch_file ../data/test_output.ttl --shacl_file ../validation/shacl-shapes.ttl --validation_output validation_results.ttl --view 1
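To run the small test batch and its validation in one go, the two commands above can be chained from Python. This is only a sketch under stated assumptions: the function names (semantify_cmd, validate_cmd, run_pipeline) are hypothetical, the flags and paths are copied from the commands above, and it assumes the script is started from the directory that contains semantify.py and validate.py.

```python
import subprocess
import sys


def semantify_cmd(paper_file, api_key, output, max_papers=1, view=1):
    """Build the semantify.py command line shown in the Usage section."""
    return [sys.executable, "semantify.py",
            "--paper_file", paper_file,
            "--api_key", api_key,
            "--output", output,
            "--max", str(max_papers),
            "--view", str(view)]


def validate_cmd(batch_file, shacl_file, validation_output, view=1):
    """Build the validate.py command line shown in the Usage section."""
    return [sys.executable, "validate.py",
            "--batch_file", batch_file,
            "--shacl_file", shacl_file,
            "--validation_output", validation_output,
            "--view", str(view)]


def run_pipeline(api_key):
    """Semantify one paper, then validate the result; stop at the first failure."""
    output = "../data/test_output.ttl"
    # check=True raises CalledProcessError if a script exits with an error.
    subprocess.run(
        semantify_cmd("../data/paper_file.pkl", api_key, output), check=True)
    subprocess.run(
        validate_cmd(output, "../validation/shacl-shapes.ttl",
                     "validation_results.ttl"), check=True)
```

Calling run_pipeline("your api key") would then execute both steps in order, so a failed semantification is not followed by a validation run on stale output.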

Acknowledgement
This work was funded by the European MUHAI project (Horizon 2020 research and innovation program) under grant agreement number 951846. We thank Tobias Kuhn and Inès Blin for the insightful discussions that contributed to this work.