The Chinese government’s latest attempt to control how artificial intelligence informs Chinese internet users has taken the form of a chatbot trained on the thought of President Xi Jinping.
The country’s newest large language model was trained on the leader’s political philosophy, known as “Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era,” and on other official literature provided by the Cyberspace Administration of China (CAC).
“The expertise and authority of the corpus ensures the professionalism of the content produced,” the CAC’s magazine said in a social media post on Monday about the new LLM.
Efforts to ensure AI understands Xi’s philosophy come as Chinese authorities seek to balance tight restrictions on free speech with accelerating AI development and creating rivals to the likes of OpenAI’s ChatGPT.
For now, the new model is being used at a research center under the powerful internet regulator, but it could eventually be released for broader use, people close to the project said. The new model can answer questions, write reports, summarize information and translate between Chinese and English, the post said.
The creation of the LLM came after extensive efforts by Chinese officials to disseminate President Xi’s ideas on politics, economics, and culture in various formats.
More than a dozen books have been published under Mr. Xi’s name, and his bestsellers are usually the centerpiece of the country’s book fairs. Popular news apps such as Tencent and NetEase reserve slots at the top of user feeds for official media articles, most of which feature Xi.
Authorities have also required children as young as 10 to study his political philosophy, and created the Study Xi, Strong Nation app to teach and test the knowledge of the country’s roughly 100 million party members. In 2018, his ideas were enshrined in the state constitution.
The CAC, which led the issuance of rules and introduced a licensing system for generative AI, requires generative AI providers to “embody core socialist values” and requires that generated content not contain anything that “subverts state power.” Companies are responsible for the output of their AI.
This poses a particularly difficult challenge for model developers, because relatively few Chinese-language datasets are available for training LLMs. Most models are also trained on English-language data, so generative AI could produce responses that violate China’s speech rules.
Tech giants such as Baidu and Alibaba have ensured that their models strictly control generated content related to Xi and other potentially sensitive issues. Generative AI chatbots from both groups typically ask users to restart the chat when pressed about sensitive topics.
To help developers address the issue, the China Cybersecurity Association, a nonprofit organization affiliated with the CAC, in December released the first public database for training models, with 100 million entries of “high-quality and reliable data.” The training set draws heavily from government regulations and policy documents, state media reports and other official publications, according to a section reviewed by the Financial Times.
One of the dozens of text documents included in the data package contains 86,314 references to Xi Jinping. “Let’s unite more closely around the Party Central Committee with Comrade Xi Jinping at its core,” one line reads.
“We must always ensure a high level of unity in ideology, politics and actions with the Party Central Committee, with General Secretary Xi Jinping at its core,” reads another.
Additional reporting from Beijing by Nian Liu