The composability of RRF is definitely one of its most appealing characteristics. It doesn't matter what algorithm or vendor you use; you can fuse on ranks alone. I've seen it shine when fusing lexical and vector search results for queries that mix semantic attributes like style with exact attributes like quantities, e.g., "modern formal watch with 40mm face".
While it's not such a problem in RAG, one downside is that it complicates pagination for results (there are a few different ways to tackle this).
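For anyone who hasn't seen it, here is a minimal sketch of rank-only fusion. It assumes the common RRF formulation where each document scores the sum of 1 / (k + rank) across the input lists, with k = 60 as the usual default. The function name, the tie-breaking rule, and the document IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs using only their ranks.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a commonly used default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents missing from a list contribute nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a lexical engine and a vector engine.
lexical = ["w3", "w1", "w7"]
vector = ["w1", "w5", "w3"]

fused = reciprocal_rank_fusion([lexical, vector])
print(fused)
```

Note that no raw scores from either engine are needed, which is why the inputs can come from any algorithm or vendor. It also hints at the pagination issue: page N of the fused list depends on how deep you fetched from each source.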
You could use Marqo. It is a vector search engine that includes text chunking, inference for calculating embeddings, vector storage, and vector search. You can pick from a heap of open-source models or bring your own fine-tuned ones. It all runs locally in Docker: https://github.com/marqo-ai/marqo
In this example, the context is the images provided via URLs. For example, the search for backpacks is contextualised by a picture of a forest, which shifts the results toward hiking/outdoor backpacks.