ChatGPT was never designed for legal research (and here is why it should NOT be used that way)
Even though ChatGPT carries some pretty clear disclaimers about factual accuracy, two New York City attorneys and their law firm gained widespread public attention after claiming that the bogus cases cited in their federal court filing were the result of (wrongly) using ChatGPT as a legal-research database.
Federal District Court Judge P. Kevin Castel has scheduled a show-cause hearing for noon on Thursday, June 8. (You can monitor the case docket at this link.)
As part of attorneys’ ongoing duty of tech competence, there are (at least) two reasons to appreciate why ChatGPT’s current configuration should not be used for legal research.
As a preliminary matter, while the public does not know what prompts the attorney originally used, “prompt quality” matters. Even ChatGPT was able to flag its legal-research shortcomings when I asked it to explain a recent Supreme Court of the United States opinion [“Explain to me the May 2023 United States Supreme Court decision in ANDY WARHOL FOUNDATION FOR VISUAL ARTS, INC. v. GOLDSMITH like I am a first year law student. Share the sources and url source links in your answer.”]:
Even so, here are the two reasons why the current free version of ChatGPT cannot be used for legal research:
Reason 1: ChatGPT has not been trained on current data and does not have access to current data
ChatGPT’s training data only runs through 2021 (ChatGPT itself says September 2021), and the free ChatGPT version does not have internet access. Nothing else needs to be said, right?
N.b., ChatGPT’s lack of internet access will not always be the case, however. In late March 2023, OpenAI launched ChatGPT plugins that give it access to third-party knowledge sources and databases, including the internet. One of the first-party web-browsing plugins “retrieves content from the web using the Bing search API and shows any websites it visited in crafting an answer, citing its sources in ChatGPT’s responses.” Kyle Wiggers, OpenAI connects ChatGPT to the internet, TechCrunch (March 23, 2023).
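To make that retrieve-then-cite pattern concrete, here is a minimal sketch of my own (in Python, with made-up function names; it is not OpenAI’s or Bing’s actual API): run a search, hand the retrieved snippets to the model along with their URLs, and ask the model to cite them.

```python
# Illustrative sketch only (hypothetical function names, not a real plugin API).
# The point: the model answers from freshly retrieved web text, not its frozen training data.

def web_search(query):
    # Stand-in for a real search API call (e.g., Bing); returns canned results here.
    return [
        {"url": "https://example.com/warhol-v-goldsmith",
         "snippet": "The Court held that the first fair-use factor favored Goldsmith..."},
    ]

def ask_model(prompt):
    # Stand-in for a call to the language model.
    return "Draft answer built from the numbered sources, citing [1]."

def answer_with_sources(question):
    results = web_search(question)
    context = "\n\n".join(f"[{i + 1}] {r['url']}\n{r['snippet']}"
                          for i, r in enumerate(results))
    prompt = ("Answer using ONLY the numbered sources below and cite them by number.\n\n"
              f"{context}\n\nQuestion: {question}")
    return ask_model(prompt), [r["url"] for r in results]

answer, cited_urls = answer_with_sources("What did the Court hold in Warhol v. Goldsmith?")
print(answer)
print(cited_urls)
```

The design point is simply that the citations come from actual retrieved pages, not from the model’s guess at what a citation should look like.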
But, for the general masses, we are not there yet.
Reason 2: ChatGPT is designed to be a text or next-word language predictor (not a legal-research database)
ChatGPT has been developed using (1) information that is publicly available on the internet, (2) information it licenses from third parties, and (3) information that its users or human trainers provide.
And then what does ChatGPT do? OpenAI explains (my emphasis added):
ChatGPT has been developed in a way that allows it to understand and respond to user questions and instructions. It does this by “reading” a large amount of existing text and learning how words tend to appear in context with other words. It then uses what it has learned to predict the next most likely word that might appear in response to a user request, and each subsequent word after that. This is similar to auto-complete capabilities on search engines, smartphones, and email programs. * * * Because there are many possible words that could come next in this sentence (e.g., instead of turning left, she turned “right,” “around,” or “back”), there is an element of randomness in the way a model can respond, and in many cases our models will answer the same question in different ways.
In other words, ChatGPT tries to predict what the next words should be in response to a prompt.
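To see what that means in practice, here is a toy illustration of my own (not OpenAI’s code): the model assigns a probability to each candidate next word and then samples from those probabilities, rather than always taking the single most likely word, which is why the same question can come back with different answers.

```python
import random

# Toy illustration of next-word prediction (not OpenAI's actual model).
# The model scores candidate next words, then samples from those scores.

next_word_probs = {      # "Instead of turning left, she turned ___"
    "right": 0.55,
    "around": 0.30,
    "back": 0.15,
}

def predict_next_word(probs):
    words = list(probs)
    weights = list(probs.values())
    # random.choices samples in proportion to the weights, so less likely
    # words still appear some of the time: the "element of randomness."
    return random.choices(words, weights=weights, k=1)[0]

# Run the same "prompt" five times; the completions vary.
print([predict_next_word(next_word_probs) for _ in range(5)])
```

Scale that up to a vocabulary of tens of thousands of tokens, conditioned on the whole prompt, and you get fluent prose. You also get a system that can produce a fluent, confident-sounding case citation that does not exist.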
ChatGPT’s “prediction” design explains why sometimes it seems like ChatGPT or other AI models are “making up” facts. And it likely explains why I was unable to re-create the “bogus” case references the New York lawyers claimed ChatGPT gave them.
We are working with “token simulators”: ChatGPT doesn’t always know what it doesn’t know, and sometimes it simply gets unlucky in its sampling (and, as this clip explains, you have to prompt ChatGPT to check itself):
Google has posted its own helpful video describing what response “hallucinations” are and what causes them. As the video explains, hallucinations (responses that give wrong or misleading information) can be caused by:
The model was not trained on enough data.
The model was trained on noisy or dirty data.
The model was not given enough context (through the prompt).
The model was not given enough constraints.
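The last two causes are within the user’s control. As a rough sketch of my own (an illustration, not guidance quoted from Google or OpenAI), giving the model the source text as context and an explicit constraint might look something like this:

```python
# Illustrative only: a prompt that supplies context (the source text itself)
# and a constraint (permission to say the answer is not there).

source_text = """<paste the actual opinion or excerpt you want analyzed here>"""

prompt = (
    "Using ONLY the text between the markers below, summarize the holding.\n"
    "If the answer is not in the text, reply exactly: 'Not stated in the provided text.'\n"
    "--- BEGIN SOURCE ---\n"
    f"{source_text}\n"
    "--- END SOURCE ---"
)

print(prompt)  # send this as the message instead of a bare, context-free question
```

Even with that scaffolding, the output still needs to be checked against the primary sources.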
Bottom line (for now?)
ChatGPT is exceptionally suited for plain-language editing. (Words, words, words!)
My earlier independent testing shows how ChatGPT beats many traditional word-processing editing tools on the plain-language front: ChatGPT missed only 76 plain-writing concerns from the 641 test strings. (For comparison, I also ran WordRake with “brevity” and “simplicity” selected.)
But the current (free) version of ChatGPT is not—at all—designed for legal research. (Nor has it ever claimed to be.)
Consider ChatGPT’s response to my prompt asking “How does ChatGPT analyze court decision analysis? Ask me questions if you need more information.” Here is what ChatGPT said (with my emphasis):
As an AI language model, ChatGPT does not have direct access to real-time data or specific training on court decisions. My responses are based on the text data I have been trained on, which includes a wide range of publicly available sources such as books, articles, websites, and other textual data up until September 2021. Therefore, while I can provide general information and insights on legal topics, I cannot perform real-time court decision analysis or provide specific analysis of recent cases.
When answering questions related to court decision analysis, I rely on my understanding of legal concepts, principles, and general knowledge within the field. I can provide information on legal processes, definitions, and general perspectives, but it’s important to consult qualified legal professionals or specialized legal research databases for accurate and up-to-date analysis of court decisions.
If you have a specific question about a legal topic or would like more general information, feel free to ask, and I’ll do my best to assist you based on the knowledge I have been trained on.
As Professor Cat Moon recently reminded me, there is value in understanding when a problem lies with the user, and not the tech.