A.I. Ain't Here for Only Your Lunch
Eventually the people who seek to employ it will want your dinner too.
(No. 27, a ±08 minute read)
Where do we go from here?
One of the things I am most terrible at is touch-typing. I’m no two-finger hack pounding away on deadline in a 1950s noir, but I am not much better. Really, I can’t type to save my life. (My spelling is worse.) As such, transcribing interviews is hell. Fortunately I live in the age of machine transcription. It is an incredible time saver; drop an audio file into the transcription engine of your choice and, not long after, there it is — a more or less well-written record of an audio interview.
But there are limitations. I am pursuing a story that requires exacting data security and needed to find a transcription service that worked offline, outside the “cloud,” and entirely on a computer not connected to the internet. I found MacWhisper and I can’t recommend it highly enough. Jordi Bruin, the software’s developer, offers journalists half off and has been great about answering questions I had about the application’s security. My first interview fed through it was rendered almost perfectly by MacWhisper’s mid-level language model; an hour of tape was transcribed to text in less than two minutes, even with terrible industrial background noise.
Technology — it’s a dream, it alleviates toil, virtuously and without guilt! With people like Jordi selling transcription applications to be individually owned — not a monthly rent in sight1 — and at half the sticker price to those of us working in an industry in extremis, I can almost squint and believe the dream. We’ve come a long way from painting in caves. But on Saturday the New York Times dropped the other shoe. Whisper, the speech recognition software that Bruin based his app on, was created because OpenAI had run out of data to train its A.I. engine.
A.I. models are insatiable data hoovers, and this is a problem for all of us.
The Times reports that OpenAI sicced its newly created Whisper on YouTube videos, in plain disregard for both the site’s terms of use and, maybe, probably, video creators’ copyrights. In January Reuters called 2024 the year copyright law changes A.I.2 The letter-of-the-law verdict is still out on whether scraping the internet for published work to train A.I. models violates existing copyright law; the cases that will decide the question will be heard this year. In the meantime the ethics of it all seem plain.
As tech companies gain more and more power, intruding more and more deeply into our lives, ethical decision-making only gets harder. But even as we throw up our hands in helpless exasperation, again, while society and the law try once more to play catch-up with the tech industry and its stated goal of moving fast and breaking things, it’s worth digging into the Times’ reporting. It reveals a pernicious concept that tech giant Meta used to forge on, full steam ahead, and solve its data woes, even amid internal acknowledgement that what OpenAI had done with Whisper was likely not legal.
Knowing that OpenAI had used copyrighted materials without permission to train its models, including those that became part of ChatGPT, Meta decided that it could too, citing ChatGPT’s “market precedent.” Market precedent as justification to do something that in-house discussions acknowledged was likely illegal, would open them to copyright lawsuits, and raised ethical concerns — per the Times. So, if this is a thing, this “market precedent” for illegal acts, it’s damned good that Enron was nailed to the wall for blacking out California while gaming the energy distribution markets for profit. It’d be pretty dark around these United States these days had it not been.
As a South of Market resident through the first and second tech gold rushes (and through the Enron blackouts) I saw more than my share of arrogance exhibited by the young masters of the new universe who had descended on my wider neighborhood. One of the most galling discoveries was learning that a genius research team at Stanford had used the live feed at Brainwash, a local café I ate at regularly, to train an A.I. model to isolate figures in crowds. I can’t say for certain that I bought my usual lunch, or had dinner and a beer as I was wont to do, at Brainwash on days the Stanford researchers were pulling down the webcam feed, a feed that contributed to further research linked to human rights abuses in China and to military and law enforcement surveillance technology in the U.S. But there was a much better than zero chance of it.
Seems like exactly the sort of endeavor I’d be inclined to lend my visage to, doesn’t it? If I learned one thing in San Francisco through the dot-com booms, it was that many of the people who had come to the city to get tech-rich would take any advantage, no hubris too big — real frontiersmen. I say this both from the direct experience of working for dot-commers as a contractor or architect and the experience of living cheek by jowl with them. This was not a group that was going to stand on principle or ethics, by and large, in the face of seeking their millions.
And the Times’ reporting bears this out. “Market precedent” anyone? I certainly did not provide consent for training human-recognition A.I. while patronizing Brainwash. And the Stanford team wasn’t alone in using publicly available photos to train A.I. systems — Microsoft published a training data set in 2016 that included journalists, privacy researchers, and political activists.3 The Stanford researchers, as university scholars, seem to have skirted university research policies that consider research participants’ consent and data access.4
The last galling thing that I’ll bitch about that the Times revealed about A.I. training is that, according to A.I. researchers, “the most prized data is high-quality information, such as published books and articles, which have been carefully written and edited by professionals.” Here, at what some days seems like the end of journalism, we journalists are tying our own nooses merely by doing our jobs and writing and editing well. The tech-driven race to the bottom that journalism is fighting doesn’t pay much, I can tell you that; let’s train our robot overlords to kill our vocation.
And so that you don’t think, “I’m not a journalist, writer, actor, or whatever A.I. is coming after,” put yourselves in the shoes of the GM workers who saw the first robots being installed. The first industrial robot was put to work removing castings from a press in 1961 at a GM plant in Trenton, N.J.5 By 1987, job losses to automation were outpacing the creation of new, similarly skilled jobs.6 Does your work involve something that might face a similar vulnerability over a period of decades, as A.I. gets smarter — more capable — just as industrial robots did?
On the Fourth of July last year Google announced a change in Google Docs’ terms of use to allow “publicly available” documents hosted on the service to be used to train its A.I. engine. This was a move to expand Google’s pool of A.I. training resources as its training data ran dry. It means any Google Doc publicly available online, hosted on a website or posted to social media, is fair game. What happens when Google falls behind OpenAI again, as it had when it made that change? What goalposts move next? And after that? The contemporary digital ecosystem we can’t help but live in provides the tools to teach opaque, privately owned artificial intelligences the way forward to replace people at work, just as hands on an auto assembly line were once replaced. Detroit did great.
Capitalism builds slippery slopes on the way to profit. Eventually all of us are affected: it took industrial robots 18 years to outright kill someone (and a second person two years later), and 26 years to start displacing labor. How long will A.I. take to do either? Start the clock in 2023 and let’s see…. Or we can do as recommended in the Harvard Business Review and demand A.I. developers,
“…ensure that they are in compliance with the law in regards to their acquisition of data being used to train their models. This should involve licensing and compensating those individuals who own the IP [intellectual property] that developers seek to add to their training data, whether by licensing it or sharing in revenue generated by the AI tool.”
That would certainly slow this all down, and perhaps provide an opportunity to consider what it is that is being done — take it out of the realm of a tech bro fait accompli. The Harvard Business Review also recommends a corker — that we ask developers if their A.I. tools were developed ethically, without using protected content, and if not, refuse to use them. Presumably this ethical development is clocked from a certain point in the recent past or near future; it certainly appears not to apply to the models we have today.
And where does that leave me and Jordi Bruin of MacWhisper now, in this moment? Me, like so many journalists, trying to make a tough job a little easier, and no less accurate, and Bruin, who identified a demand in a struggling trade and gave it tangible, needed aid? Aid altruistically given; by definition an ethical act?
Welcome to the world of damned if you do, damned if you don’t ethics. We live even further up the road now; when does it dead-end?
A post script, from the world of much graver consequences: It has been reported that Israel is using A.I. to develop very specific targeting lists for its military to use in targeted assassination bombings. These bombings are particularly opprobrious, as those listed for assassination are considered to be targets of such value that Israel is allowing itself substantial collateral deaths and injuries in seeking to kill them; launching attacks at people’s homes and killing entire families, and, presumably, neighbors in Gaza. These A.I.-developed lists are getting a notable number of their identifications wrong and the Israeli military is killing people based on that incorrect data. It is also worth reading the interview with the reporter of the story, Yuval Abraham, the winner of this year’s documentary Academy Award. The A.I. future, it’s here, and it is truly miserable.
I’m looking at you, Adobe.
https://www.reuters.com/legal/litigation/how-copyright-law-could-threaten-ai-industry-2024-2024-01-02/ Thomson Reuters, the owner of Reuters news, is among those taking A.I. companies to task for training artificial intelligence models on its intellectual property.