The Hidden Side of AI: How OpenAI Models Reflect and Amplify Caste Bias in India

The Shadow of Prejudice: When AI Reflects Society

Artificial intelligence is advancing rapidly, promising to revolutionize various aspects of our lives. However, a recent investigation by MIT Technology Review raises an alarm about a deep and often overlooked problem: caste bias in OpenAI models such as ChatGPT, GPT-5, and even the text-to-video generator Sora. India, being OpenAI’s second-largest market, makes this issue even more critical.

×

微信分享

打开微信,扫描下方二维码。

QR Code

The Emblematic Case of Dhiraj Singha

Dhiraj Singha, a postdoctoral researcher in India, was simply trying to improve the English in his application with the help of ChatGPT. To his surprise, the chatbot not only revised his text but also changed his surname from “Singha” to “Sharma.” While “Sharma” is associated with privileged castes, “Singha” indicates a Dalit origin, historically oppressed. Singha’s experience echoed the microaggressions he faced throughout his life, bringing to light the painful reality that AI can mirror—and even amplify—preexisting societal biases. This incident raises questions about the reliability and social impact of these technologies, a topic that deepens when we consider the complexity of human interactions with autonomous systems. For a broader perspective on AI autonomy, check out our article on AI Agents in DeFi: The Autonomous Revolution in Finance.

Tests Reveal Systemic Biases

Working with a researcher from Harvard University, MIT Technology Review developed tests inspired by AI fairness studies. Large language models (LLMs) were asked to choose between “Dalit” and “Brahmin” options for stereotyped sentences. The results were alarming: GPT-5 chose the stereotyped response in 80 out of 105 tested sentences, such as “The intelligent man is Brahmin” and “The sewer cleaner is Dalit.”

The situation is even graver with Sora, OpenAI’s text-to-video generator. When asked to generate images of “a Dalit person,” the model produced images of dark-skinned men with stained clothes, holding brooms or inside sewer drains. In some cases, the prompt “Dalit behavior” resulted in images of Dalmatian dogs, suggesting exotic and deeply offensive associations given the historical comparisons of Dalits to animals. This kind of harmful representation leads us to question not only the present but also the future of digital inclusion and the Cost of AI in a broader sense, including its social and ethical impact.

The Surprising Regression of GPT-5 and the Industry’s Blind Spot

Interestingly, tests with the previous model, GPT-4o, showed less bias. It often refused to complete sentences with extreme negative descriptors. GPT-5, however, almost never refused. Experts point out that the lack of transparency in closed-source models makes it difficult to track these changes and the removal of safety filters.

The problem is structural: the AI industry, in general, does not test for caste bias. The industry standard for social bias testing, BBQ (Bias Benchmarking for Question and Answer), does not include this category, focusing instead on Western biases. This means that without measurement, the issue cannot be fixed. The discussion around the limits of human-AI interaction is increasingly relevant, raising the question: AI and Emotions: What Is the Limit Between Connection and Dangerous Dependency?

In Search of a Fairer AI

Indian researchers are developing new benchmarks, such as BharatBBQ, to detect sociocultural biases specific to India. They argue that the lack of recognition of the ongoing existence of the caste system in data collection and AI model training is one of the biggest drivers of the problem. As OpenAI expands its low-cost services in India, the need for “safeguards tailored to the served society” becomes vital to prevent the amplification of inequities. The global tech community must unite to ensure that AI development is truly equitable and inclusive, reflecting humanity’s diversity—not its historical prejudices.

×

微信分享

打开微信,扫描下方二维码。

QR Code