I'm unclear on what the "flaw" is - isn't this precisely the "feature" that search engines provide to both sides and that site owners put a ton of SEO effort into optimizing?
If you have public documents, you can obviously let a public search engine index them and show previews. All is good.
If you have private documents, you can't let a public search engine index and show previews of those private documents. Even if you add an authentication wall for normal users if they try to open the document directly. They could still see part of the document in google's preview.
My explanation sounds silly because surely nobody is that dumb, but this is exactly what they have done. They gave access to ALL documents, both public and private, to an AI, and then got surprised when the AI leaked some private document details. They thought they were safe because users would be faced with an authentication wall if they tried to open the document directly. But that doesn't help if copilot simply tells you all the secret in it's own words.
You say that, but it happens — "Experts Exchange", for example, certainly used to try to hide the answers from users who hadn't paid while encouraging search engines to index them.
That's not quite the same. Experts Exchange wanted the content publicly searchable, and explicitly allowed search engines to index it. In this case, many customers probably aren't aware that there is a separate search index that contains much of the data in their private documents that may be searchable and accessible by entities that otherwise shouldn't have access.