Meta Platforms Inc.’s artificial intelligence research unit said today it has developed an AI model that can research and write drafts of Wikipedia-style biographical entries, in order to address a noticeable imbalance on the world’s most popular encyclopedia.
In a blog post, Meta AI research scientist Angela Fan points out that of all the English-language biographies on Wikipedia, just 20% are about women. Moreover, that share is estimated to be even lower for intersectional groups, such as women in science, women in Africa and women in Asia.
Fan said this underrepresentation of women on Wikipedia extends to the organization itself, with only 15% of editors on the site identifying as female.
“This leaves women overlooked, despite the enormous impact they’ve had throughout history in science, entrepreneurship, politics, and every other part of society,” Fan said. “Canadian physicist Donna Strickland won the Nobel Prize in Physics in 2018, however, anyone looking for information about her on Wikipedia wouldn’t have been able to find it until a Wikipedia biography was finally published about her esteemed work — days after she won the biggest prize in her field of study.”
To counter the lack of biographies on women, Meta has announced it’s open-sourcing a generative AI model that automatically researches and creates high-quality biographical articles on important, real-world figures.
The model works in a similar fashion to a human researcher, searching for relevant information on a given figure and then drafting a Wikipedia-style entry on that person, complete with citations. The idea is that these AI-generated drafts can be used as a starting point for human writers and fact checkers to come in and complete biographies of underrepresented groups on the site.
Fan points out that the research that goes into writing a biography can be intensive, yet at the same time there is a vast amount of information available on the web that, when compiled correctly, can be used to tell the stories of women whose achievements, voices and legacies are mostly unknown.
Meta’s AI model makes light work of this research, first retrieving relevant information from the web to introduce the subject. Its generation module then writes the text, and its citation module builds a bibliography that links back to the sources used.
This process is repeated for each source of information the model finds on the internet. Each section of text is used to predict the next in order to cover all of the elements required for a Wikipedia biography, such as the subject’s early life, education and career.
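The retrieve, generate and cite loop described above can be illustrated with a minimal Python sketch. It is not Meta's implementation: the function names, the keyword-overlap retrieval, the copy-the-evidence "generator", the fictional subject and the example.org URLs are all assumptions made for illustration, standing in for the neural components of the actual model.

```python
from dataclasses import dataclass, field

# Section headings taken from the article's description of a biography.
SECTIONS = ["Early life", "Education", "Career"]

# Hypothetical toy corpus standing in for web search results.
CORPUS = [
    {"url": "https://example.org/a", "text": "Jane Doe spent her early life in a small town."},
    {"url": "https://example.org/b", "text": "Jane Doe completed her education in zoology."},
    {"url": "https://example.org/c", "text": "Jane Doe built a career studying invertebrates."},
]


@dataclass
class Draft:
    sections: list = field(default_factory=list)      # (heading, text) pairs
    bibliography: list = field(default_factory=list)  # source URLs, in citation order


def retrieve(query, corpus, k=1):
    """Toy retrieval: rank documents by how many query words they contain."""
    words = set(query.lower().split())

    def score(doc):
        return len(words & set(doc["text"].lower().split()))

    ranked = sorted(corpus, key=score, reverse=True)
    return [doc for doc in ranked[:k] if score(doc) > 0]


def generate(heading, evidence, previous):
    """Placeholder generator: copies the evidence text verbatim.

    A real generator would condition on `previous` (the prior section);
    this stub accepts it but ignores it.
    """
    return " ".join(doc["text"] for doc in evidence)


def cite(evidence, bibliography):
    """Add each source to the bibliography once and return markers like [1]."""
    markers = []
    for doc in evidence:
        if doc["url"] not in bibliography:
            bibliography.append(doc["url"])
        markers.append(f"[{bibliography.index(doc['url']) + 1}]")
    return "".join(markers)


def draft_biography(subject, corpus):
    """Run retrieve -> generate -> cite once per section."""
    draft = Draft()
    previous = ""
    for heading in SECTIONS:
        evidence = retrieve(f"{subject} {heading}", corpus)
        text = generate(heading, evidence, previous)
        markers = cite(evidence, draft.bibliography)
        previous = f"{text} {markers}".strip()
        draft.sections.append((heading, previous))
    return draft
```

Calling `draft_biography("Jane Doe", CORPUS)` returns a draft with one cited paragraph per heading and a bibliography listing each source once. Keeping the citation step separate from generation means every sentence in the draft can be traced back to a URL, which is what lets a human editor check it.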
Fan admits that there’s still lots of work to be done. She says the web content used to create the Wikipedia entries might sometimes be flawed or reflect cultural biases. The model is also prone to “hallucinating,” or generating text that is not supported by its sources and may not be true, and it struggles with coherence.
“We’re hoping to improve these through advances in the neural architectures that power such models and through breakthroughs in the responsible development of AI,” Fan said.
Even so, the model is impressive. As an example, Fan used the model to research and generate a Wikipedia entry for Libbie Hyman, who was a pioneer in the study of invertebrate zoology. The green text in the image below was pulled from the original reference article the model started with, while the purple text is additional content pulled from elsewhere on the web. The orange text indicates hallucination, meaning the model appears to have guessed at things that can’t be verified.
Although the draft is neither perfect nor fully accurate, the model has pulled up enough relevant biographical information about Hyman that a human researcher or writer could quickly come in and finish the article. It includes details about her work on invertebrates, her significant publications and the wider impact of her research, all of which give editors a useful starting point to fact-check and expand into a fuller account of her life and achievements.
To help improve the model, Meta has also released a novel data set of 1,527 biographies of women from marginalized groups, which it created to evaluate the model’s performance. Meta said this data can be used to train new iterations of the model and evaluate their performance.
“We are passionate about sharing this as an important research area with the broader generation community,” Fan said. “We hope that our techniques can eventually be used as a starting point for human Wikipedia writers — and ultimately lead to a more equitable availability of information online that can be accessed by students writing biographies — and beyond.”