
“It is basically a five-year quest to create a complete semantic map of a particular language,” explains Arnauld Leservot, who runs the LIST laboratory (Technological Research Department of the CEA). “It will show how all of the words in this language are related to each other, how frequently they are related, and in what context.” The Jean-Luc Lagardère Foundation decided to support this fascinating quest for knowledge by funding an annual stipend for a postdoctoral researcher for three years. Dubbed WASP (for Web-based Acquisition of Semantics and Pragmatics), the project seeks not only to identify and catalogue the use of single words in a language, but also, and above all, to identify the way these words combine to define new concepts. “Take the expression ‘race car,’ for example,” says Gregory Grefenstette, the scientific manager of the project. “If we place the expression in a broader context, the new semantic map will allow us to understand that it has a different meaning from ‘rental car,’ and allow us to link this meaning to the realm of sports.”
This fine-grained linguistic work ran into numerous obstacles until the arrival of the Internet in 1994, which made significant progress possible. Indeed, researchers are using the peculiar language of the web to analyze our language. “A dictionary contains around 150,000 words. In 2004, the web counted 13 billion word uses for French and 80 billion for English. This massive source of data can be processed by increasingly sophisticated computers and programs, hardware and software that did not exist five years ago,” notes Gregory Grefenstette. In fact, he has set his sights beyond the written word. He now wants to extend the scope of his research to include at least one radio source so that spoken language can also be analyzed. In other words, the goal is to scientifically examine a language in its entirety.
While the team of seven researchers dedicated to the WASP project has decided to begin with French and English for now, it is already planning to extend the scope of inquiry to encompass Italian, Spanish, German, Chinese, Japanese and Arabic! But what is the ultimate aim of this work?
According to Gregory Grefenstette, the applications are numerous and some have not even been invented yet. We already know that this fine-grained linguistic analysis will be useful in the field of translation, especially automated translation, which will then be able to work not just from a single word but from the most frequent uses of that word and its associations.

Gregory Grefenstette, the scientific manager of the project.
LIST - an exceptional laboratory