Check out https://text-generator.io for search embeddings; they're cheaper, faster, and take linked images into account (text, images, and code are actually embedded in the same space).
Your training trick is a neat innovation, but keep in mind it's likely overfitting: when you get a bit of new data you need to index and search, that model isn't going to embed it well at all. Put differently, that training works well if you can cover the types of data you're going to see in production really well at training time; if not, there's a big accuracy drop on unseen data due to overfitting.
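To make that concrete, here's a rough sketch of how you could check for this: compare retrieval accuracy on a held-out in-domain set against an out-of-domain set using the same fine-tuned model. The model name, the eval pairs, and the recall@1 metric here are all illustrative stand-ins, not anything from the original post.

```python
# Minimal sketch (assumes sentence-transformers is installed; the model name
# is a placeholder for whatever fine-tuned embedding model you're evaluating).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # swap in your fine-tuned model

def recall_at_1(queries, docs):
    """Fraction of queries whose nearest doc (by cosine) is its paired doc."""
    q = model.encode(queries, normalize_embeddings=True)
    d = model.encode(docs, normalize_embeddings=True)
    sims = q @ d.T  # cosine similarity, since rows are unit-normalized
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(queries))))

# Paired (query, relevant-doc) sets; in practice these would be real eval data,
# with out_of_domain drawn from data types the model never saw in training.
in_domain = (
    ["how do I reset my password"],
    ["Reset your password from the account settings page."],
)
out_of_domain = (
    ["fix segfault in cuda kernel"],
    ["Check for out-of-bounds memory access in the kernel."],
)

gap = recall_at_1(*in_domain) - recall_at_1(*out_of_domain)
print(f"in-domain vs out-of-domain recall@1 gap: {gap:.2f}")
# A large gap is the overfitting symptom described above.
```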
I'm clearly not versed in AI, but at least two of their examples are quite obviously wrong. Their Study Notes example asks for five key facts about Ancient Rome: one of them is borderline incorrect and only applies to the Roman Empire, another is an overgeneralization, and there are only two facts, not five. The "Receipts" example gets the total sum wrong.
https://text-generator.io/blog/embed-images-text-and-code