Building Knowledge Graphs
Structuring research knowledge extraction
Context
I’ve recently explored how literature review could be automated to produce content answering a single question. To that end, I used Large Language Models (LLMs), among other technologies, to develop a knowledge graph that consolidates a vast body of literature on the subject.
The technical process behind the knowledge graph is straightforward, if not yet proven, relying on a variety of tools to parse, structure, and process information from scientific literature. Tools like GROBID, Owlready2, and vector databases were used for initial data preparation, while natural language processing (backed by tools like NLTK and spaCy) and LLM APIs (including GPT-3.5 and GPT-4) handled theme extraction, entity recognition, and text processing. This approach projects data onto a domain ontology and structures it into a knowledge graph, allowing for a comprehensive understanding of the topic’s key concepts.
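To make the shape of that pipeline concrete, here is a minimal, stdlib-only sketch of the core idea: extract entities from text, project them onto a domain ontology, and link co-occurring entities into a graph. The real pipeline uses GROBID for parsing, spaCy/NLTK for NLP, and Owlready2 for the ontology; the mini-ontology, the documents, and the naive matcher below are all illustrative stand-ins.

```python
# Minimal stand-in for the extraction pipeline: in the real setup, GROBID
# parses PDFs, spaCy/NLTK handle NLP, and Owlready2 manages the ontology.
import re
from collections import defaultdict

# Hypothetical mini-ontology: surface terms mapped to ontology classes.
ONTOLOGY = {
    "transformer": "Method",
    "bert": "Method",
    "accuracy": "Metric",
    "pubmed": "Dataset",
}

def extract_entities(text: str) -> list[str]:
    """Naive entity recognition: keep lowercase tokens found in the ontology."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return [t for t in tokens if t in ONTOLOGY]

def build_graph(docs: list[str]) -> dict[str, set[str]]:
    """Link entities that co-occur in the same document."""
    graph = defaultdict(set)
    for doc in docs:
        ents = extract_entities(doc)
        for a in ents:
            for b in ents:
                if a != b:
                    graph[a].add(b)
    return dict(graph)

docs = [
    "BERT improves accuracy on PubMed abstracts.",
    "Transformer models report accuracy gains.",
]
graph = build_graph(docs)
print(graph["accuracy"])  # entities co-occurring with "accuracy"
```

In the full pipeline, the ontology lookup is replaced by proper NER plus LLM-assisted classification, but the projection step keeps this same shape: raw text in, typed entities and edges out.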
The project’s outcomes are promising: the knowledge graph covers a wide range of topics and has ultimately led to new, robust approaches and content for the topic at hand.
Planned next steps include integrating more robust graph management solutions and enriching the graph’s semantic content, with a view to evolving the knowledge base to better serve researchers.
Bonus points
On top of this, I’ve played with a couple of technologies to present the knowledge graph (around 60k entities) in a more human-friendly form. On one hand, a lot of simplification went into regrouping and consolidating entities, yielding a smaller knowledge graph.
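The consolidation step can be sketched as grouping raw entity labels under a normalised key and merging each group into one node. This is a deliberately crude stand-in (lowercasing plus plural stripping); the actual regrouping over 60k entities involved more than this, and the labels below are made up.

```python
# Sketch of entity consolidation: merge near-duplicate labels by a
# normalised canonical key. Real pipelines use fuzzier matching.
from collections import defaultdict

def normalise(label: str) -> str:
    """Crude canonical key: strip whitespace, lowercase, drop a plural 's'."""
    key = label.strip().lower()
    return key[:-1] if key.endswith("s") else key

def consolidate(entities: list[str]) -> dict[str, list[str]]:
    """Group raw entity labels under one canonical key each."""
    groups = defaultdict(list)
    for label in entities:
        groups[normalise(label)].append(label)
    return dict(groups)

raw = ["Knowledge Graphs", "knowledge graph", "LLM", "LLMs", "llm "]
merged = consolidate(raw)
print(len(raw), "->", len(merged))  # 5 -> 2
```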
On the other hand, equal effort went into visualisation: exploring Streamlit as a dynamic knowledge graph explorer, and static, wiki-like pages for browsing the more concise knowledge graph.
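The static, wiki-like export boils down to rendering one page per entity, with its neighbours as cross-links. A minimal sketch, assuming a simple name-to-neighbours input (the entity names and relations here are illustrative; the Streamlit explorer is not shown):

```python
# Sketch of the static wiki-like export: one HTML page per entity,
# listing its neighbours as links to their own pages.
import html

def entity_page(name: str, neighbours: list[str]) -> str:
    """Render a single entity's page as a small HTML fragment."""
    links = "".join(
        f'<li><a href="{html.escape(n)}.html">{html.escape(n)}</a></li>'
        for n in neighbours
    )
    return f"<h1>{html.escape(name)}</h1><ul>{links}</ul>"

page = entity_page("knowledge graph", ["ontology", "LLM"])
print("ontology.html" in page)  # True
```

Writing one such fragment per node to disk gives a browsable, dependency-free snapshot of the graph, which complements the dynamic Streamlit view.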
It hasn’t been that bad - the tools have now been reused on a couple of projects, so I guess there’s some use in them!