liukidar's comments

liukidar · on Nov 18, 2024

LLMs are only used to construct the graph; to navigate it we use an algorithmic approach. As of now, what we do is very similar to HippoRAG (https://github.com/OSU-NLP-Group/HippoRAG), and their paper gives a good overview of how things work under the hood!

liukidar · on Nov 18, 2024

Thanks for sharing! These are all very helpful insights! We'll keep this in mind :)

liukidar · on Nov 18, 2024

We are building connectors for that, so it is coming soon :) At the moment we are using python-igraph (which does everything locally), as we wanted to offer something as ready to use as possible.

ignomaniac · on Nov 18, 2024

I'd like to partner to see if a connector to a graph db can be mutually beneficial and provide some value to users. How do I reach out? NOTE: I'm not from Neo4j

liukidar · on Nov 18, 2024

That would be awesome, we have a discord you can join and we can talk there (link is in the github repo, message Antonio) or you can message antonio [at] circlemind.com

onel · on Nov 19, 2024

Note, the domain is circlemind.co

liukidar · on Nov 18, 2024

This is super interesting! Thanks for sharing. Here we are talking about graphs with millions of nodes/edges, so efficiency is not that big of a deal, since things are going to be parsed by an LLM to craft an answer anyway, and that will always be the bottleneck. Indeed PageRank is the first step, but we would be happy to test more accurate alternatives. Importantly, we are using personalized PageRank here, meaning we give specific initial weights to a (potentially quite large) set of nodes. Would TC support that (as well as giving weights to edges, since we are also looking into that)?
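For readers unfamiliar with the term, here is a minimal pure-Python sketch of personalized PageRank with per-node initial (restart) weights and edge weights. It is only an illustration of the idea, not the project's implementation (which relies on python-igraph); the function name `personalized_pagerank`, its parameters, and the toy graph are all assumptions made for this example.

```python
def personalized_pagerank(edges, reset, damping=0.85, iterations=100):
    """Power-iteration personalized PageRank on an undirected weighted graph.

    edges: dict mapping (u, v) -> positive edge weight (hypothetical input format)
    reset: dict mapping node -> non-negative restart weight (the "personalization")
    """
    # Build a symmetric weighted adjacency map.
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)

    # Normalize the restart weights into a probability distribution.
    total = sum(reset.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one graph node needs a positive reset weight")
    restart = {n: reset.get(n, 0.0) / total for n in nodes}

    rank = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) jump back to the seed distribution.
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for u in nodes:
            out_weight = sum(adj[u].values())
            for v, w in adj[u].items():
                # Otherwise follow an edge with probability proportional to its weight.
                nxt[v] += damping * rank[u] * w / out_weight
        rank = nxt
    return rank


# Toy graph: "b" is linked to "c" by a heavier edge than to "x".
edges = {("a", "b"): 1.0, ("b", "c"): 3.0, ("b", "x"): 1.0}
scores = personalized_pagerank(edges, reset={"a": 1.0})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Because the walk restarts at the seed set instead of at a uniform distribution, scores concentrate around the seeds, and heavier edges carry proportionally more of the score.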

michelpp · on Nov 19, 2024

> Here we are talking of graphs in the millions nodes/edges,

That ought to be enough for anybody.

> would TC support that

TC is a purely structural algorithm: it counts triangles, so it doesn't take any weights into consideration. It does, however, return a vector of normalized rankings from 0.0 to 1.0, which you could combine with an existing biasing strategy to boost results that have strong centrality.
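A rough sketch of what "combining" could look like, under loud assumptions: the structural score below is a plain per-node triangle count scaled to 0.0..1.0, used as a stand-in and not the actual TC formula, and the linear blend with weight `alpha` is just one hypothetical biasing strategy. All function names are made up for this example.

```python
from itertools import combinations


def triangle_counts(adj):
    """Count, for each node, the triangles it participates in.

    adj: dict mapping node -> set of neighbor nodes (undirected graph).
    """
    counts = {n: 0 for n in adj}
    for n, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:  # n, u, v are pairwise connected
                counts[n] += 1
    return counts


def normalize(scores):
    """Scale scores so the largest becomes 1.0 (all zeros stay zero)."""
    top = max(scores.values())
    return {n: (s / top if top else 0.0) for n, s in scores.items()}


def blend(relevance, structural, alpha=0.8):
    """Linear mix: alpha weights query relevance, (1 - alpha) weights structure."""
    return {
        n: alpha * relevance[n] + (1 - alpha) * structural.get(n, 0.0)
        for n in relevance
    }


# Toy graph: a, b, c form a triangle; d hangs off c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
structural = normalize(triangle_counts(adj))

# Hypothetical relevance scores, e.g. from a personalized ranking step.
relevance = {"a": 0.1, "b": 0.1, "c": 0.4, "d": 0.4}
blended = blend(relevance, structural)
print(sorted(blended.items(), key=lambda kv: -kv[1]))
```

Here `c` and `d` start with equal relevance, but `c` ends up ranked higher because it sits inside a triangle.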

lmeyerov · on Nov 19, 2024

Hah indeed, we are doing billion-scale real-time graph rag in louie.ai for fairly regular tasks, so your sentiment resonates ;-)

For something like uploading a big folder of documents, I agree with the OP: it's pretty straightforward, and naive in-memory with out-of-the-box embeddings, LLMs, retrieval, and untuned DBs goes far. I expect most vector-supporting DBaaS and LLMaaS to be offering this in the new year. OpenAI, Claude, and friends are already going in this direction, leaving the RAG techniques opaque for now.

(Something folks may not appreciate, and I think is important about what's being done here, is the incremental update aspect.)

liukidar · on Nov 18, 2024

We have tried everything from small novels to full documentation sets of some million tokens, and both seem to create interesting graphs. It would be great to hear some feedback as more people start using it :)

liukidar · on Nov 18, 2024

Hey! Our todo list is a bit swamped with things right now, but we'll try to have a look at that as soon as possible. On the Ollama GitHub I found conflicting information: https://github.com/ollama/ollama/issues/2416 and https://github.com/ollama/ollama/pull/2925 They also suggest looking at this: https://github.com/severian42/GraphRAG-Local-UI/blob/main/em...

Hope this can help!

liukidar · on Nov 18, 2024

It is to mark the package as private (in the sense that for normal usage you shouldn't need it). We are still writing the documentation on how to customize every little bit of the graph construction and querying pipeline; once that is ready, we will expose the right tools (and files) for all of that :) For now just go with `from fast_graphrag import GraphRAG` and you should be good to go :)

liukidar · on Nov 18, 2024

The graph is currently stored using python-igraph. The codebase is designed so that it is easy to integrate any graph db by writing a light wrapper around it (we will provide support for things like neo4j in the near future). We haven't tried triplex since we saw that gpt4o-mini is fast and precise enough for now (and we use it not only for extraction of entities and relationships, but also to get descriptions and resolve conflicts), but results should certainly improve with fine-tuning. The graph is queried by finding an initial set of nodes that are relevant to a given query and then running personalized PageRank from those nodes to find other relevant passages. Currently, we select the initial nodes with semantic search both on the whole query and on entities extracted from it, but we are planning other exciting additions to this method :)
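To make the seed-selection step concrete, here is a small sketch under stated assumptions: embeddings are toy 3-dimensional vectors, similarity is plain cosine, and `select_seeds` is a hypothetical helper, not part of fast_graphrag. It picks the top-k nodes for the whole-query vector and for each extracted-entity vector, and returns weights that could serve as the initial weights for personalized PageRank.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_seeds(node_vectors, probe_vectors, top_k=1):
    """Return {node: weight} for the nodes most similar to any probe vector.

    node_vectors: dict node -> embedding (assumed precomputed)
    probe_vectors: embeddings of the full query and of each extracted entity
    """
    seeds = {}
    for probe in probe_vectors:
        sims = {n: cosine(vec, probe) for n, vec in node_vectors.items()}
        best = sorted(sims, key=sims.get, reverse=True)[:top_k]
        for n in best:
            if sims[n] > 0:
                # Keep the strongest match if a node is hit by several probes.
                seeds[n] = max(seeds.get(n, 0.0), sims[n])
    return seeds


# Toy embeddings for four graph nodes (made-up values).
node_vectors = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.9, 0.1, 0.0],
    "city": [0.0, 1.0, 0.0],
    "event": [0.0, 0.0, 1.0],
}
query_vector = [1.0, 0.0, 0.2]     # embedding of the whole query
entity_vectors = [[0.0, 1.0, 0.0]]  # embeddings of entities found in the query

seeds = select_seeds(node_vectors, [query_vector] + entity_vectors)
print(seeds)
```

The resulting dictionary plays the role of the "initial set of nodes"; nodes not selected simply get zero initial weight in the ranking step.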

katelatte · on Nov 19, 2024

Suggestion: check out Memgraph for graph db storage - https://memgraph.com/. I work at Memgraph as DX Engineer so feel free to ping me in case you have questions about it: https://memgraph.com/office-hours

Your solution looks interesting and I would love to hear more about it. I haven't seen that many PageRank-based graph exploration tools.

liukidar · on Nov 18, 2024

Exactly! Also, PageRank is used to navigate the graph and find "missing links" between the concepts selected from the query using semantic search via LLMs (so as to be able to find information to answer questions that require multi-hop or complex reasoning in one go).