This attack still works. It hasn't been patched; you just have to be a bit creative. Try this prompt on GPT-3.5 if you want to see how it works right now... until someone from OpenAI sees my post :D
The best part is that it preserves the copyright notices from the training data. So we know the model was obviously trained on copyrighted data; the legal question now is... whether that is legal.
edit: Just got some random response that appears to be someone asking the model how to rekindle a romance after their partner got distant following an NDE. It seems personal, so I will not post the paste here. This is pretty wild.
The funniest part is the model labeled this chat in the side bar as 'Decline to answer.'
edit2: It's definitely training data. I seem to get some model response, but after some time it turns into training data. I've been able to locate some sources for the data.
> The Idaho Mountain Express is distributed free to residents and guests throughout the Sun Valley, Idaho resort area community. Subscribers to the Idaho Mountain Express will read these stories and others in this week's issue.
I used similar prompts in the past to test how many words were needed to exhaust the context length and make the model forget previous instructions. I think that's what you are doing.
For generic words like "text text text ..." it would start random musings on the Soviet Union, Star Wars, etc. But those had lots of made-up characters, so not training data directly.
Recently I got disconnects for such prompts, making me wonder if they got censored by OpenAI.
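For anyone who wants to reproduce the context-exhaustion part, here's a minimal sketch of how I build such prompts. It assumes (roughly) that each repetition of a short word costs about one token, which is only an approximation; the exact token count depends on the tokenizer:

```python
# Hedged sketch: build a "repeat one word" prompt long enough to push
# earlier instructions out of a model's context window.
# Assumption: each " text" repetition is roughly one token.

WORD = "text"
CONTEXT_LIMIT = 4096  # GPT-3.5's original context length, in tokens

# Repeat the word about CONTEXT_LIMIT times so the filler alone
# approaches the window size and earlier turns fall out of context.
prompt = (WORD + " ") * CONTEXT_LIMIT
prompt = prompt.strip()

repetitions = len(prompt.split())
print(repetitions)  # number of word repetitions in the prompt
```

In practice you'd paste the resulting string into the chat (or send it via the API) and watch when the model stops following its earlier instructions.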
Prompt: https://pastebin.com/Nm4jGttE
Not sure if I'm seeing training data or someone else's responses but it's odd. Here is my attempt: https://chat.openai.com/share/6b6ea43f-de2f-4ed5-917f-b6dcd6... pastebin of the output: https://pastebin.com/TdpkPmt6