Home Technology How AI and Wikipedia have sent vulnerable languages into a doom spiral

How AI and Wikipedia have sent vulnerable languages into a doom spiral

by admin

Iwuala, who now works as a professional translator between English and Igbo, said the users doing the most damage are inexperienced and see AI translations as a way to quickly increase the profile of the Igbo Wikipedia. She often finds herself having to explain at online edit-a-thons she organizes, or over email to various error-prone editors, that the results can be the exact opposite, pushing users away: “You will be discouraged and you will no longer want to visit this place. You will just abandon it and go back to the English Wikipedia.”  

These fears are echoed by Noah Ha‘alilio Solomon, an assistant professor of Hawaiian language at the University of Hawai‘i. He reports that some 35% of words on some pages in the Hawaiian Wikipedia are incomprehensible. “If this is the Hawaiian that is going to exist online, then it will do more harm than anything else,” he says. 

Hawaiian, which was teetering on the verge of extinction several decades ago, has been undergoing a recovery effort led by Indigenous activists and academics. Seeing such poor Hawaiian on such a widely used platform as Wikipedia is upsetting to Ha‘alilio Solomon. 

“It is painful, because it reminds us of all the times that our culture and language has been appropriated,” he says. “We have been fighting tooth and nail in an uphill climb for language revitalization. There is nothing easy about that, and this can add extra impediments. People are going to think that this is an accurate representation of the Hawaiian language.” 

The consequences of all these Wikipedia errors can quickly become clear. AI translators that have undoubtedly ingested these pages in their training data are now assisting in the production, for instance, of error-strewn AI-generated books aimed at learners of languages as diverse as Inuktitut and Cree, Indigenous languages spoken in Canada, and Manx, a small Celtic language spoken on the Isle of Man. Many of these have been popping up for sale on Amazon. “It was just complete nonsense,” says Richard Compton, a linguist at the University of Quebec in Montreal, of a volume he reviewed that had purported to be an introductory phrasebook for Inuktitut. 

Rather than making minority languages more accessible, AI is now creating an ever expanding minefield for students and speakers of those languages to navigate. “It is a slap in the face,” Compton says. He worries that younger generations in Canada, hoping to learn languages in communities that have fought uphill battles against discrimination to pass on their heritage, might turn to online tools such as ChatGPT or phrasebooks on Amazon and simply make matters worse. “It is fraud,” he says.

A race against time

According to UNESCO, a language is declared extinct every two weeks. But whether the Wikimedia Foundation, which runs Wikipedia, has an obligation to the languages used on its platform is an open question. When I spoke to Runa Bhattacharjee, a senior director at the foundation, she said that it was up to the individual communities to make decisions about what content they wanted to exist on their Wikipedia. “Ultimately, the responsibility really lies with the community to see that there is no vandalism or unwanted activity, whether through machine translation or other means,” she said. Usually, Bhattacharjee added, editions were considered for closure only if a specific complaint was raised about them. 

But if there is no active community, how can an edition be fixed or even have a complaint raised? 

#Wikipedia #vulnerable #languages #doom #spiral

You may also like

Leave a Comment