The digital transcripts
Against the backdrop of the continuing novel COVID-19 pandemic, the Michigan Supreme Court will soon hear oral arguments on questions of first impression. How much authority Michigan’s governor can exercise under the Emergency Powers of the Governor Act or the Emergency Management Act is the focus of In re Certified Questions from the United States District Court, Western District of Michigan, Southern Division.
This post is a detour that noodles about four of the small-font footnotes in the plaintiffs’ opening brief. They are footnotes 1, 3, 4, and 5, and are included in the brief’s Index of Authorities’ “Other Authorities” section.
Each sounds official: each is titled “Michigan Governor Gretchen Whitmer Coronavirus Briefing Transcript” and includes the public briefing date.
But no “transcripts” were filed with the Michigan Supreme Court.
Where are the footnoted transcripts accessible?
In an online cloud. Hosted by a private entity. Commercial ads for the private company’s digital transcription service frame the transcript pages. I’m not kidding.
Each footnote includes a looooong internet link to an .html page. But none of the clicked-on digital transcripts includes verification that the transcript is complete, true, and correct.
Forbes profiled how this “San Francisco- and Austin-based company uses AI to facilitate the transcription process, though an army of freelancers does much of the legwork.” More than 50,000 freelancing “professional transcriptionists” make up the digital team, according to the company’s website. While the company charges $1.25 per audio minute, Forbes reported that the freelance transcribers receive 30 cents per audio minute.
What about “word-error rates”?
The company’s pricing is very low. How’s the accuracy?
In its “Speech to text Report for 2020” the AI-powered company lists “court reporting” as a common use for automatic speech recognition (slide 24).
That same report celebrates that its AI component has a 15.7% word-error rate (WER) (slide 28).
Should an admitted 15.7% AI-error rate be significant? It works out to roughly one word in every six.
AI cannot be presumed to be reliable.
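For readers unfamiliar with the metric, word-error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the machine's output into the true wording, divided by the number of words actually spoken. The company's report is not quoted here on how it measured its 15.7% figure, so the Python sketch below only illustrates the conventional definition. The function name and the two sample sentences are hypothetical and do not come from any of the footnoted transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of six.
spoken = "the order is extended through may"
transcribed = "the order is ended through may"
print(f"{word_error_rate(spoken, transcribed):.1%}")  # 16.7%
```

In this made-up sentence, a single wrong word in six produces an error rate close to the reported 15.7%, and it reverses the meaning of the sentence. At that rate, a 1,000-word passage would be expected to contain on the order of 157 word errors.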
Three possible headaches
Maybe the AI technology and reliability will improve. But for today’s judiciary, I can think of three problems flowing from hyperlinked, uncertified digital “transcripts.”
Headache 1: Reader misunderstanding of what the “transcripts” represent.
These particular transcripts are not an “official” copy or representation of the governor’s briefings. The video is not sourced from the State of Michigan. The transcripts are not from or prepared at the direction of the State of Michigan. And the “transcripts” are not prepared by a certified court reporter.
A reading audience generally assumes that “a transcript or report of proceedings is an official copy of the recorded proceedings in a trial or hearing.” See Garner’s Dictionary of Legal Usage (Oxford University Press 2011).
This is why the Michigan Appellate Opinion Manual encourages using “official sources and actual documents” (Section 2:4). And practitioners are encouraged to follow the Michigan Appellate Opinion Manual when briefing (Michigan Supreme Court Administrative Order 2014-22).
What this private company promotes online is not comparable to the video and transcripts available on the New York state website for its governor (https://www.governor.ny.gov/news/video-audio-photos-rush-transcript-governor-cuomo-time-covid-19-pandemic-our-healthcare-workers [https://perma.cc/DBK8-LZDZ]) or to the transcript-like remarks posted on the White House website (https://www.whitehouse.gov/briefings-statements/remarks-president-trump-press-briefing-081320/ [https://perma.cc/A3FC-FHCS]).
There should be a greater effort to avoid the risk of public confusion and misunderstanding.
Headache 2: Transcribed content presented in a judicial setting cannot be presumed reliable when it’s not prepared by a certified court reporter.
A person must go through a process, pass testing, and meet ongoing obligations to be authorized to prepare court or deposition transcripts.
On top of all of that, a properly prepared transcript always concludes with a certification that what was prepared is “complete, true, and correct.” Misconduct can end with one’s certification being revoked.
It should not have to be said. But maybe it does. Each layer of institutional reliability dissolves when a transcript is not certified.
If the 15.7% word-error rate isn’t alarming, perhaps paragraph 11 in the company’s “Terms of Service” is. Paraphrased, the warranty disclaimers include:
- All services are provided “as is” and “with all faults.”
- Each party disclaims any warranty related to the quality, accuracy, currency, or completeness of the platform.
- Any warranty that the provided services will meet the customer’s or end users’ requirements is disclaimed.
- Customer acknowledges and agrees that customer is solely responsible for verifying the accuracy and completeness of all results before taking or omitting any action based on the results.
- Customer’s sole and exclusive remedy for breach of the service commitments described on the site will be the re-performance of the applicable services.
Headache 3: “Reference rot” and “link rot” risks.
Because a website’s content can change at any time, there’s a fair chance that what a web page displays in September will differ from what was posted and footnoted back in May.
Here’s an example of just that from the plaintiffs’ July 21, 2020 appendix filing with the Michigan Supreme Court. The appendix appropriately includes the plaintiffs’ May 12, 2020 federal court complaint. That complaint includes a table summarizing CDC data as of May 10, 2020 and footnote 2 identifies the data source. Footnote 2 includes a hyperlink that was last visited on May 12, 2020. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html So far, so good.
Here’s the real-life problem: There’s no way for today’s reading audience to verify the hyperlinked data as it stood on the cited date, because the CDC keeps updating that web page’s content.
Anyone who now clicks on that footnoted hyperlink will see a webpage displaying “current” data. That’s different from the complaint’s cited “May 12, 2020” data.
Take it another step. It’s not a stretch to imagine how a third party’s uncertified, online transcript could also be altered at any time.
This problem is called “reference rot”: the link still works, but the content it points to has changed. Its cousin, “link rot,” is when the link no longer works at all. The legal community has known about both problems for many years.
The best practices to avoid those rot risks in your briefing include:
- Use perma.cc to preserve internet pages and links. The Michigan Supreme Court does this in its opinions and orders (example: https://perma.cc/2TPJ-3UM3).
- Print/pdf the webpage and attach the demonstrative page to the court filing to preserve the content.
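For anyone who keeps a saved copy of a cited page, one simple way to notice later that the live page has drifted is to record a cryptographic fingerprint of the content on the date it was cited. The Python sketch below is only an illustration under stated assumptions: the function names and the sample page contents are hypothetical, and this is not how perma.cc or the court preserves pages.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes if even one byte changes."""
    return hashlib.sha256(content).hexdigest()


def has_changed(recorded_digest: str, current_content: bytes) -> bool:
    """Compare content seen today against the digest recorded at filing."""
    return fingerprint(current_content) != recorded_digest


# Hypothetical page contents standing in for a saved copy and a later visit.
page_in_may = b"<html>Data table as of May 10, 2020</html>"
page_in_september = b"<html>Data table as of September 1, 2020</html>"

digest_at_filing = fingerprint(page_in_may)
print(has_changed(digest_at_filing, page_in_may))        # False
print(has_changed(digest_at_filing, page_in_september))  # True
```

A fingerprint only signals that something changed; it does not preserve what the page said. Pages with rotating ads or timestamps will also register as changed on every visit, which is why the archived or attached copy remains the thing to cite.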
I am not suggesting that the uncertified, non-official transcripts linked in the plaintiffs’ opening brief in the pending Michigan Supreme Court matter include erroneous content. (Using error-filled content on a web page that includes virtual coupons to attract future paying customers would be a terrible marketing plan!)
But there are sound reasons why uncertified and non-official internet transcripts probably shouldn’t be cited as “authority” in day-to-day legal advocacy and why they may get quickly tagged as non-record evidence.
Important addendum about racial disparities in transcript preparation and automated speech recognition
Governor Gretchen Whitmer, who is white, is the primary speaker in the unofficial transcripts linked in the Michigan Supreme Court briefing. Independent research shows that her race as a white person reduces the risk and frequency of transcription errors.
Importantly, when the recorded speaker is non-white, those who read and consider transcripts must be alert to the serious problems identified in two other research areas:
“Testifying while black: An experimental study of court reporter accuracy in transcription of African American English,” Taylor Jones, Jessica Rose Kalbfeld, Ryan Hancock, and Robin Clark. Language 95, no. 2 (2019): e216-e252. doi:10.1353/lan.2019.0042 (behind a paywall)
“Racial disparities in automated speech recognition,” Allison Koenecke, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. Proceedings of the National Academy of Sciences Apr 2020, 117 (14) 7684-7689; DOI: 10.1073/pnas.1915768117 (open access)
In the spirit of the tired lawyer cliché, let’s govern ourselves accordingly.