When I think about LLMs, what keeps me awake at night is not the existential threat to humanity, nor the significant toll that generative AI will take on our planet's resources. It is even less the questions of regulation or copyright claims.
No. It is the more pervasive influence of LLMs on how we communicate, think and establish norms within our culture.
LLMs shape opinions, win elections and set cultural norms … and there is not much that we can do about it.
But instead of going into the whys and hows, I will share with you two examples from this week's news.
Let’s start with the title of this post:
Should LLMs produce inclusive content?
Last week in France, President Macron gave a major speech stating that there is no need to bend the French language to accommodate inclusive writing.
For non-French speakers: French has no singular "they" for gender neutrality. So "Someone just broke into my home, they only took my wallet" translates to "Quelqu'un vient de s'introduire chez moi, il n'a pris que mon portefeuille." Il is he.
This is quite a hot topic in France. It was also discussed in the French parliament on the same day.
Considering that all major LLMs are created and "censored" (i.e., guard-railed) by companies and their employees in the San Francisco Bay Area, what are the odds that these models generate more and more inclusive content?
In the blue states where they operate, it is actually the law: California, Oregon, Washington and several other states are champions of inclusivity and have already written "inclusive education" and "inclusive writing" into law.
Granted, this will be a big issue for some conservative-leaning "red" states, where there are movements to ban books featuring LGBTQ2SIA+ characters…
Here is an example of how generative AI could influence writing: https://chat.openai.com/share/6bd8f326-2898-4b56-922e-cd270d8655f4 (The texts are quite poor, and even worse in the inclusive version.)
Imagine, over years, students are exposed to an LLM promoting a term like "iel" (the gender-neutral pronoun in French). This term could potentially become normalized, effectively changing the French language—and by extension, French culture—without any policy change or public debate. It's a subtle evolution, one that could happen so gradually it goes almost unnoticed until it's the new standard.
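The mechanism behind this steering is not mysterious: providers typically prepend a system prompt (or apply output filters) that travels with the model, invisibly shaping every response. Here is a minimal sketch of that pattern; the policy text and function names are hypothetical illustrations, not any vendor's actual guardrail.

```python
# Minimal sketch of a provider-side guardrail: a system prompt is
# silently prepended to every user request, so the stylistic policy
# lives with the model operator, not the user.
# The policy text below is a hypothetical illustration.

SYSTEM_POLICY = (
    "You are a helpful assistant. "
    "Always use gender-neutral, inclusive language."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Compose the message list sent to a chat-style LLM API."""
    return [
        {"role": "system", "content": SYSTEM_POLICY},  # invisible to the user
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Write a short paragraph about a firefighter.")
print(messages[0]["content"])
```

The user only ever sees their own prompt and the answer; the policy layer in between is where the cultural choices get made.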
I am personally fine with this … but my neighbors in Idaho, and more importantly most of the French presidential candidates, would call this brainwashing.
And if France decides to regulate, will it ask the Académie Française to provide real-time recommendations to LLM creators and perform quality checks?
Let's take another example.
How should LLMs talk about the Israel/Gaza conflict?
On another front, consider the challenge of how LLMs address contentious historical events, such as the Israel/Gaza conflict.
I gave it (in French) a very neutral prompt asking for a paragraph about the October 7th events.
Here is what Bard produced: https://g.co/bard/share/3abb7bf72649 .
You have probably noticed the tension over whether to name Hamas a terrorist organization and whether to present Israel's military response as genocide.
The Google Bard response takes a side: Hamas = terrorist, and Israel's response is an "escalade de la violence" (an "escalation of violence", which is clearly not a definition of genocide). Period.
The manner in which LLMs present such information is crucial because it can influence public perception and sentiment.
Can this be regulated? It's doubtful.
Regulation would require an overreach into how companies like Google curate and prioritize information, an action that is antithetical to the principles of many democratic societies. (This sentence was written by ChatGPT. Not bad!)
It would also be extremely expensive. A little-known fact: Google employs ~15,000 (underpaid) ad-quality raters in the US. These native speakers check the quality of its search results in different languages.
By the way, this example also illustrates an important challenge from a technology perspective.
The GPT model (GPT-4) was trained on data that pre-dates the October 7th attack. This attack clearly changed the perspective that (most) people have on Hamas. How can an LLM be updated to reflect this shift without going through a complete retraining?
Bard, on the other hand, stays current by running a Google search on the prompt and feeding the results back in as context. Modifying Bard's responses would therefore require influencing Google's search algorithms. While countries like China and Iran exercise control over search engines within their borders, such a level of regulation is not feasible in the United States or the European Union.
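This search-then-answer pattern is usually called retrieval-augmented generation (RAG): rather than retraining the weights, fresh documents are fetched at query time and injected into the prompt. A toy sketch below, with a naive keyword-overlap retriever and an entirely hypothetical corpus standing in for live search results:

```python
# Toy retrieval-augmented generation (RAG): knowledge is updated by
# refreshing the corpus (the search index), never the model weights.
# The retriever here is a deliberately naive keyword-overlap scorer.

def score(query: str, doc: str) -> int:
    """Score a document by naive keyword overlap with the query."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents most relevant to the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Ground the prompt in retrieved context before calling the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical corpus standing in for live, post-training-cutoff search results.
corpus = [
    "report published after the model training cutoff",
    "older background article about the region",
    "unrelated sports news",
]

prompt = build_prompt("summarize the latest report", corpus)
print(prompt)
```

This is exactly why controlling such a system means controlling the retrieval layer: change what the search returns, and you change what the model "knows", no retraining required.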
In a matter of months, these models will be everywhere: in our cell phones, Alexa speakers, and even surveillance systems. They will find their way into algorithms that influence who gets hired, who gets a bank loan, maybe even who is put on a list of suspects. Yet there is no prospect of any kind of democratic oversight.
It could be worse: these models could come from Moscow or North Korea.
@dominiq