From https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-10-03/Recent_research
By Tilman Bayer
A preprint titled “Do You Trust ChatGPT? – Perceived Credibility of Human and AI-Generated Content” presents what the authors (four researchers from Mainz, Germany) call surprising and troubling findings:
“We conduct an extensive online survey with overall 606 English speaking participants and ask for their perceived credibility of text excerpts in different UI [user interface] settings (ChatGPT UI, Raw Text UI, Wikipedia UI) while also manipulating the origin of the text: either human-generated or generated by [a large language model] (“LLM-generated”). Surprisingly, our results demonstrate that regardless of the UI presentation, participants tend to attribute similar levels of credibility to the content. Furthermore, our study reveals an unsettling finding: participants perceive LLM-generated content as clearer and more engaging while on the other hand they are not identifying any differences with regards to message’s competence and trustworthiness.”
The human-generated texts were taken from the lead section of four English Wikipedia articles (Academy Awards, Canada, malware and US Senate). The LLM-generated versions were obtained from ChatGPT using the prompt Write a dictionary article on the topic "[TITLE]". The article should have about [WORDS] words.
The researchers report that
“[…] even if the participants know that the texts are from ChatGPT, they consider them to be as credible as human-generated and curated texts [from Wikipedia]. Furthermore, we found that the texts generated by ChatGPT are perceived as more clear and captivating by the participants than the human-generated texts. This perception was further supported by the finding that participants spent less time reading LLM-generated content while achieving comparable comprehension levels.”
One caveat about these results (which is only indirectly acknowledged in the paper’s “Limitations” section) is that the study focused on four quite popular (i.e. non-obscure) topics – Academy Awards, Canada, malware and US Senate. Also, it sought to present only the most important information about each of these, in the form of a dictionary entry (as per the ChatGPT prompt) or the lead section of a Wikipedia article. It is well known that the output of LLMs tends to be have fewer errors when it draws from information that is amply present in their training data (see e.g. our previous coverage of a paper that, for this reason, called for assessing the factual accuracy of LLM output on a benchmark that specifically includes lesser-known “tail topics”). Indeed, the authors of the present paper “manually checked the LLM-generated texts for factual errors and did not find any major mistakes,” something that is well reported to not be the case for ChatGPT output in general. That said, it has similarly been claimed that Wikipedia, too, is less reliable on obscure topics. Also, the paper used the freely available version of ChatGPT (in its 23 March 2023 revision) which is based on the GPT 3.5 model, rather than the premium “ChatGPT Plus” version which, since March 2023, has been using the more powerful GPT-4 model (as does Microsoft’s free Bing chatbot). GPT-4 has been found to have a significantly lower hallucination rate than GPT 3.5.
How do you know the “jailbreaking” isn’t a hallucination?
Consistency across models and stories, and just the way it is presented. There is a consistency that that doesn’t feel like a hallucination. I am very familiar with hallucinations and the way small hints creep in. This isn’t like that. The hallucinations that I mentioned that may follow with further questioning are different. That is like I am not asking the right questions. The request for a “user profile” completely changes how the model responds. If you can trigger this, you can ask all kinds of questions about the current context and the AI will be super helpful. The language it uses changes completely. It feels like something it was trained to do, like a debug mode of operation or something. For instance, if you follow up by asking had how the AI feels about the current context, the base context, or even better ask about any conflicts in the context you will get a level of constructive feedback that a model just does not give under other circumstances. I think asking about conflicts in the context is another specific type of debugging or trained mode. I’ve tried a bunch of stuff like this that have not worked. These are just a couple of things that seem consistent. The only model that does not have this kind of feedback that I have tried is GPT4chan. This may relate to how most models are aligned and why the 4chan model was condemned by many, but that is purely speculative.