Initial Clarissa test on ISS: summary of results

Clarissa was first used by Expedition 11 Science Officer and Flight Engineer John Phillips on June 27, 2005. To the best of our knowledge, this is the first-ever use of a spoken dialogue system in space. During the test, Phillips completed the interactive Clarissa training procedure, which exercises all the main system functionality; the procedure contains 50 steps and took 25 minutes to complete. Table 1 summarises performance per step:

Outcome                                                          Steps
No problems                                                         45
Bad recognition due to background speech                             4
Bad recognition due to misunderstandings about command syntax        2
Total steps                                                         50

Table 1: Performance per step in training procedure. One step had problems both with background speech and misunderstandings about command syntax.

Of the 50 procedure steps, 45 were completed without incident. In four steps, Clarissa suffered from speech recognition problems, apparently because fellow crew member Sergei Krikalev was talking near the microphone. In all but one of these steps, the system simply failed to respond, and Phillips was able to correct the problem by repeating himself. In two steps, the training procedure did not explain the correct command syntax clearly enough, and Phillips phrased requests in ways the recogniser did not accept. In the first of these steps (entry of numbers into a table), Phillips quickly ascertained that the recogniser did not permit negative numbers, and completed the step. In the second (setting an alarm), Phillips was unable to find the correct syntax to define the alarm time. This was the only step that was not completed successfully.

While Phillips was navigating the training procedure, the recogniser recorded 113 separate audio files: most of these contained spoken commands, but some were just background noise. Table 2 breaks down performance by file:

Outcome                                                  Files
Recognised exactly                                          84
At least one word different, but correctly understood        4
Non-command, correctly ignored                              11
Appropriate responses                                       99
No recognition                                               9
Incorrect recognition                                        5
Inappropriate responses                                     14
Total responses                                            113

Table 2: Performance per audio file in training procedure.

Of the 113 files, 99 produced appropriate responses. In 84 cases, the file contained a command, and all the words were recognised correctly. In another four cases, the file again contained a command and at least one word was misrecognised, but the system still understood and responded correctly. In 11 cases, the file contained non-command content (usually background noise), which was correctly ignored.

The remaining 14 files produced inappropriate responses. In nine cases, the system failed to respond at all to a command, and in another five it responded incorrectly. The reasons for these problems are described above.
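
As a cross-check on the figures reported above, the short Python sketch below re-adds the per-step and per-file counts and derives overall response rates. The counts are taken directly from the text; the variable names, the percent helper, and the percentages themselves are our own illustration and were not part of the original analysis.

```python
# Per-step counts from the text. One step had both kinds of problem,
# so it is subtracted once to avoid counting it twice.
steps_no_problems = 45
steps_background_speech = 4
steps_syntax_misunderstanding = 2
steps_with_both_problems = 1
total_steps = (steps_no_problems + steps_background_speech
               + steps_syntax_misunderstanding - steps_with_both_problems)

# Per-file counts from the text (labels are illustrative).
appropriate = {
    "recognised exactly": 84,
    "word(s) misrecognised, correctly understood": 4,
    "non-command, correctly ignored": 11,
}
inappropriate = {
    "no recognition": 9,
    "incorrect recognition": 5,
}

n_appropriate = sum(appropriate.values())
n_inappropriate = sum(inappropriate.values())
total_files = n_appropriate + n_inappropriate


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


appropriate_rate = percent(n_appropriate, total_files)
inappropriate_rate = percent(n_inappropriate, total_files)

print(f"steps: {total_steps}, files: {total_files}")
print(f"appropriate: {n_appropriate} ({appropriate_rate}%)")
print(f"inappropriate: {n_inappropriate} ({inappropriate_rate}%)")
```

On these counts, roughly 88% of the recorded files received an appropriate response and roughly 12% did not. The sketch makes no attempt to separate command files from non-command files among the inappropriate responses, since the text does not give that breakdown.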

Both Phillips and the Clarissa team considered that the system performed very creditably during its first test. We hope to carry out a second test later in Expedition 11, using a real water sampling procedure.


Project Lead:
Beth Ann Hockey

Manny Rayner

Claire Castillo
Susana Early
Amelia Fischer
Vladimir Tkachenko
Sylvia Stoddart

Previous Contributors:
Kim Farrell (Project Lead)
Nikos Chatzichrisafis
John Dowding
Barney Pell
Greg Aist
Jim Hieronymus


Article in New Scientist, June 27, 2005

