LLMs are only used to construct the graph; to navigate it we use an algorithmic approach. As of now, what we do is very similar to HippoRAG (https://github.com/OSU-NLP-Group/HippoRAG); their paper gives a good overview of how things work under the hood!
We are building connectors for that, so it will soon :) At the moment we are using python-igraph (which does everything locally) as we wanted to offer something as ready to use as possible.
I'd like to partner to see if a connector to a graph DB can be mutually beneficial and provide some value to users. How do I reach out? NOTE: I'm not from Neo4j
That would be awesome! We have a Discord you can join and we can talk there (the link is in the GitHub repo, message Antonio)
or you can message antonio [at] circlemind.com
This is super interesting! Thanks for sharing. Here we are talking of graphs in the millions of nodes/edges, so efficiency is not that big of a deal, since things are going to be parsed by an LLM to craft an answer anyway, and that will always be the bottleneck. Indeed, PageRank is the first step, but we would be happy to test more accurate alternatives. Importantly, we are using personalized PageRank here, meaning we give specific initial weights to a set (potentially quite large) of nodes. Would TC support that (as well as giving weights to edges, since we are also looking into that)?
> Here we are talking of graphs in the millions of nodes/edges,
That ought to be enough for anybody.
> would TC support that
TC is a purely structural algorithm: it counts triangles, so it doesn't take any weights into consideration. It does, however, return a vector of normalized rankings from 0.0 to 1.0, which you could combine with an existing biasing strategy to boost results that have strong centrality.
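To make the "combine with an existing biasing strategy" idea concrete, here is a rough sketch. It is not the actual TC algorithm: it uses plain per-node triangle counts scaled by the maximum count as a stand-in for the normalized structural ranking, and the function names (`triangle_scores`, `blend`) and the mixing factor `alpha` are made up for illustration.

```python
from itertools import combinations


def triangle_scores(edges):
    """Per-node triangle counts, scaled into [0.0, 1.0] by the maximum count.

    Simplified stand-in for a structural ranking; edge weights are ignored.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbors that are linked.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    top = max(counts.values(), default=0)
    return {n: (c / top if top else 0.0) for n, c in counts.items()}


def blend(bias, structural, alpha=0.8):
    """Linear mix of a query-dependent bias score and a structural score."""
    nodes = set(bias) | set(structural)
    return {
        n: alpha * bias.get(n, 0.0) + (1 - alpha) * structural.get(n, 0.0)
        for n in nodes
    }


# Two triangles sharing node "c": a-b-c and c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e")]
structural = triangle_scores(edges)
combined = blend({"a": 1.0}, structural, alpha=0.5)
```

Here `bias` could be whatever query-dependent score you already have (for example personalized PageRank output); a linear mix is just one possible way to combine the two.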
Hah indeed, we are doing billion-scale real-time graph rag in louie.ai for fairly regular tasks, so your sentiment resonates ;-)
For something like uploading a big folder of documents, I agree with the OP: it's pretty straightforward, and naive in-memory with out-of-the-box embeddings, LLMs, retrieval, and untuned DBs goes far. I expect most vector-supporting DBaaS and LLMaaS to be offering this in the new year. OpenAI, Claude, and friends are already going in this direction, leaving the RAG techniques opaque for now.
(Something folks may not appreciate, and I think is important about what's being done here, is the incremental update aspect.)
We have tried everything from small novels to full documentation sets of a few million tokens, and both seem to create interesting graphs. It would be great to hear some feedback as more people start using it :)
It is to mark the package as private (in the sense that for normal usage you shouldn't need it). We are still writing the documentation on how to customize every little bit of the graph construction and querying pipeline; once that is ready we will expose the right tools (and files) for all of that :) For now just go with `from fast_graphrag import GraphRAG` and you should be good to go :)
The graph is currently stored using python-igraph. The codebase is designed so that it is easy to integrate any graph DB by writing a light wrapper around it (we will provide support for things like neo4j in the near future). We haven't tried triplex since we saw that gpt4o-mini is fast and precise enough for now (and we use it not only for extraction of entities and relationships, but also to get descriptions and resolve conflicts), but for sure results should improve with fine-tuning.
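To give a rough idea of what a "light wrapper" could look like, here is a minimal sketch. Everything in it is hypothetical (the `GraphStorage` interface, its method names, and the `InMemoryGraph` stand-in); it is not the interface the library actually exposes, just an illustration of the pattern where the pipeline only talks to a small set of methods.

```python
from typing import Any, Dict, Iterable, Protocol, Tuple


class GraphStorage(Protocol):
    """Hypothetical minimal interface a graph backend would need to satisfy."""

    def upsert_node(self, node_id: str, **attrs: Any) -> None: ...
    def upsert_edge(self, source: str, target: str, **attrs: Any) -> None: ...
    def neighbors(self, node_id: str) -> Iterable[str]: ...


class InMemoryGraph:
    """Dictionary-backed stand-in; a database wrapper would expose the same methods."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def upsert_node(self, node_id: str, **attrs: Any) -> None:
        # Create the node if missing, otherwise merge the new attributes in.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def upsert_edge(self, source: str, target: str, **attrs: Any) -> None:
        self.upsert_node(source)
        self.upsert_node(target)
        self.edges.setdefault((source, target), {}).update(attrs)

    def neighbors(self, node_id: str) -> Iterable[str]:
        # Treat edges as undirected when listing neighbors.
        out = {t for (s, t) in self.edges if s == node_id}
        inc = {s for (s, t) in self.edges if t == node_id}
        return sorted(out | inc)


def index_triples(store: GraphStorage, triples) -> None:
    """Pipeline-side code only depends on the interface, not on the backend."""
    for subj, rel, obj in triples:
        store.upsert_edge(subj, obj, relation=rel)


store = InMemoryGraph()
index_triples(store, [("Alice", "knows", "Bob"), ("Bob", "works_at", "Acme")])
```

Swapping in a real database would then mean writing another class with the same three methods that forwards the calls to that database's client.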
The graph is queried by finding an initial set of nodes that are relevant to a given query and then running personalized PageRank from those nodes to find other relevant passages. Currently, we select the initial nodes with semantic search on both the whole query and the entities extracted from it, but we are planning other exciting additions to this method :)
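For anyone who wants to see the mechanics of the second step, here is a minimal pure-Python sketch of personalized PageRank via power iteration. It is an illustration only, not the project's code: the function name, the toy graph, and the seed weights are assumptions, and the seeds are simply given by hand here instead of coming from semantic search.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration personalized PageRank on a weighted, undirected graph.

    edges: iterable of (u, v, weight) tuples
    seeds: dict mapping node -> non-negative initial weight
    """
    # Build a symmetric weighted adjacency map.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    nodes = set(adj) | set(seeds)

    # Normalize the seed weights into a restart distribution.
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seeds must have positive total weight")
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}

    scores = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed node.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            nbrs = adj.get(u, {})
            out = sum(nbrs.values())
            if out == 0.0:
                # Isolated node: return its mass to the seeds.
                for n in nodes:
                    nxt[n] += damping * scores[u] * restart[n]
                continue
            # Otherwise spread mass to neighbors in proportion to edge weight.
            for v, w in nbrs.items():
                nxt[v] += damping * scores[u] * w / out
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Toy graph: one chain reachable from the seed, one unrelated component.
edges = [
    ("alice", "bob", 1.0),
    ("bob", "carol", 1.0),
    ("carol", "dave", 1.0),
    ("eve", "frank", 1.0),
]
scores = personalized_pagerank(edges, seeds={"alice": 1.0})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes near the seeds end up with higher scores, while the component the seeds cannot reach stays at zero, which is what lets the ranking surface passages connected to the query concepts.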
Exactly! Also, PageRank is used to navigate the graph and find "missing links" between the concepts selected from the query using semantic search via LLMs (so as to be able to find the information needed to answer questions that require multi-hop or complex reasoning in one go).