nlp4fun.github.io

Solving language games @EVALITA18

News

Task Description

Language games draw their challenge and excitement from the richness and ambiguity of natural language, and have therefore attracted the attention of researchers in Artificial Intelligence and Natural Language Processing. For instance, IBM Watson™ is a system that successfully challenged human champions of Jeopardy!™, a game in which contestants are presented with clues in the form of answers and must phrase their responses in the form of a question [1]. Other researchers exploited question answering techniques to build an artificial player for “Who Wants to be a Millionaire?” [2]. Another popular language game is solving crossword puzzles. The first system reported in the literature is Proverb [3], which exploits large libraries of clues and solutions to past crossword puzzles. WebCrow is the first solver for Italian crosswords [4].

The proposed task consists of designing a solver for “The Guillotine” (La Ghigliottina, in Italian) game. It is inspired by the final game of an Italian TV show called “L’eredità”. The game, broadcast by Italian National TV, involves a single player, who is given a set of five words (the clues), each linked in some way to a specific word that is the unique solution of the game. The clues are unrelated to each other, but each of them has a hidden association with the solution. Once the clues are given, the player has one minute to find the solution. For example, given the five clues sin, Newton, doctor, New York, bad, the solution is apple, because: the apple is the symbol of original sin in Christian theology; Newton discovered gravity thanks to a falling apple; “an apple a day keeps the doctor away” is a famous proverb; New York City is also called “the Big Apple”; and “one bad apple can spoil the whole bunch” is a popular phrase which figuratively means that a person doing wrong can have a negative influence on those around them. “La Ghigliottina” is a challenging language game that demands knowledge covering a broad range of topics. Artificial players for the game can take advantage of open repositories on the web, such as Wikipedia, which provide the system with the cultural and linguistic background needed to understand the clues [5]. Participants must build an artificial player able to solve “La Ghigliottina”.

Data Description

Data Format

We will provide a set of both training and testing games in the XML format:

<games>
  <game>
    <id>3fc953bd-bd48-4fb9-a86c-bd979c1b5c3f</id>
    <clue>uomo</clue>
    <clue>cane</clue>
    <clue>musica</clue>
    <clue>casa</clue>
    <clue>pietra</clue>
    <solution>chiesa</solution>
    <type>TV</type>
  </game>
</games>

The XML file consists of a root element games, which contains several game elements. Each game has five clue elements and one solution element. In addition, the type element reports the type of the game. The dataset has two kinds of game: TV and boardgame.

The current dataset contains 421 games. We will provide 316 games for training and 105 for testing. Participants can integrate any knowledge resource into their systems, except further games.
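The XML format described above can be read with Python's standard library. The following is only a minimal sketch: the `Game` class and the `parse_games` function are hypothetical names introduced for illustration, and the sketch assumes that the `solution` and `type` elements may be absent in some files.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Game:
    """Hypothetical container for one <game> element."""
    game_id: str
    clues: List[str]
    solution: Optional[str]   # None if the element is absent (assumption)
    game_type: Optional[str]  # None if the element is absent (assumption)


def _text(node: ET.Element, tag: str) -> Optional[str]:
    """Return the stripped text of a child element, or None if it is missing."""
    value = node.findtext(tag)
    return value.strip() if value is not None else None


def parse_games(xml_text: str) -> List[Game]:
    """Parse a <games> document into a list of Game objects."""
    root = ET.fromstring(xml_text)
    games = []
    for node in root.findall("game"):
        games.append(Game(
            game_id=_text(node, "id") or "",
            clues=[(c.text or "").strip() for c in node.findall("clue")],
            solution=_text(node, "solution"),
            game_type=_text(node, "type"),
        ))
    return games


SAMPLE = """
<games>
  <game>
    <id>3fc953bd-bd48-4fb9-a86c-bd979c1b5c3f</id>
    <clue>uomo</clue>
    <clue>cane</clue>
    <clue>musica</clue>
    <clue>casa</clue>
    <clue>pietra</clue>
    <solution>chiesa</solution>
    <type>TV</type>
  </game>
</games>
"""

games = parse_games(SAMPLE)
print(games[0].clues, games[0].solution)
```

To read a file instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.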

For each game, participants must provide a ranked list of at most 100 tentative solutions. Results must be provided in a plain text file in the following format:

id solution score rank time

Values must be separated by a whitespace character and time must be reported in milliseconds. For example:

3fc953bd-bd48-4fb9-a86c-bd979c1b5c3f porta 0.978 1 3459
3fc953bd-bd48-4fb9-a86c-bd979c1b5c3f chiesa 0.932 2 3251
3fc953bd-bd48-4fb9-a86c-bd979c1b5c3f santo 0.897 3 4321

3fc953bd-bd48-4fb9-a86c-bd979c1b5c3f carta 0.321 100 2343
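A results file in this format could be produced along the following lines. This is a sketch under stated assumptions: `format_results` is a hypothetical helper, each candidate is assumed to carry its own time in milliseconds (as the differing times in the example suggest), and printing scores with three decimals is an assumption taken from the example rather than a stated requirement.

```python
from typing import List, Tuple

# (solution, score, time in milliseconds); this tuple layout is an assumption.
Candidate = Tuple[str, float, int]


def format_results(game_id: str,
                   candidates: List[Candidate],
                   max_candidates: int = 100) -> List[str]:
    """Rank candidates by descending score and format one line per candidate."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:max_candidates]
    lines = []
    for rank, (solution, score, time_ms) in enumerate(ranked, start=1):
        # Fields: id solution score rank time, separated by single spaces.
        lines.append(f"{game_id} {solution} {score:.3f} {rank} {time_ms}")
    return lines


lines = format_results(
    "3fc953bd-bd48-4fb9-a86c-bd979c1b5c3f",
    [("chiesa", 0.932, 3251), ("santo", 0.897, 4321), ("porta", 0.978, 3459)],
)
print("\n".join(lines))
```

Truncating to `max_candidates` after sorting keeps the list within the limit of 100 tentative solutions per game.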

As the evaluation measure, we adopt a weighted version of Mean Reciprocal Rank (MRR). Since time is a critical factor in this game (in the TV game, the player has one minute to provide the solution), the Reciprocal Rank is weighted by a function of the time taken. The final evaluation measure is:

where G is the set of games, r_g is the rank of the solution for game g, and t_g is the time in minutes taken by the system to produce the tentative solutions. Systems that take more than 10 minutes are all penalized equally.
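The general shape of a time-weighted MRR can be sketched as follows. The weighting function below is purely illustrative and is not taken from this page: `example_time_weight`, its linear decay, and its `floor` value are assumptions chosen only so that every time above 10 minutes receives the same weight. The official weighting function should be substituted through the `weight` parameter.

```python
from typing import Callable, Dict, Optional

MAX_MINUTES = 10.0  # beyond this, all systems are penalized equally


def example_time_weight(minutes: float, floor: float = 0.05) -> float:
    """HYPOTHETICAL weight: linear decay over 10 minutes, clamped at `floor`.

    Both the linear shape and the floor value are assumptions for illustration.
    """
    return max(floor, (MAX_MINUTES - minutes) / MAX_MINUTES)


def weighted_mrr(ranks: Dict[str, Optional[int]],
                 minutes: Dict[str, float],
                 weight: Callable[[float], float] = example_time_weight) -> float:
    """Average over all games of (1 / r_g) * weight(t_g).

    `ranks` maps a game id to the rank of the correct solution, or None when
    the solution is not in the submitted list (contributing 0 to the sum).
    """
    if not ranks:
        return 0.0
    total = 0.0
    for game_id, rank in ranks.items():
        if rank is None:
            continue
        total += (1.0 / rank) * weight(minutes[game_id])
    return total / len(ranks)


ranks = {"a": 1, "b": 2, "c": None}
minutes = {"a": 0.0, "b": 5.0, "c": 1.0}
print(weighted_mrr(ranks, minutes))
```

Passing `weight=lambda t: 1.0` reduces the measure to the ordinary, unweighted MRR, which is a convenient sanity check.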

We plan to provide separate rankings for TV and boardgame games, but the final ranking will take into account the whole test set.

How to Participate

Online registration is open: registration form.

Important Dates

(tentative)

Further details will be made available in the near future.

References

[1] D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty, “Building Watson: An overview of the DeepQA project,” AI Magazine, vol. 31, no. 3, pp. 59–79, 2010.

[2] P. Molino, P. Lops, G. Semeraro, M. de Gemmis, and P. Basile, “Playing with knowledge: A virtual player for ‘Who Wants to Be a Millionaire?’ that leverages question answering techniques,” Artificial Intelligence, vol. 222, pp. 157–181, 2015.

[3] M. L. Littman, G. A. Keim, and N. Shazeer, “A probabilistic approach to solving crossword puzzles,” Artificial Intelligence, vol. 134, pp. 23–55, 2002.

[4] M. Ernandes, G. Angelini, and M. Gori, “A web-based agent challenges human experts on crosswords,” AI Magazine, vol. 29, no. 1, pp. 77–90, 2008.

[5] P. Basile, M. de Gemmis, P. Lops, and G. Semeraro, “Solving a complex language game by using knowledge-based word associations discovery,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 8, no. 1, pp. 13–26, 2016.


Organizers

Contacts

If you have any questions, please contact us: nlp4fun.evalita@gmail.com.