Harashima Lab
The authors have created a system capable of animating a face using text or speech as input. Their longer-term aim is to drive the system from visual input as well. The system could then become an intelligent user interface that reads the user's face and interprets their speech, rather than relying on a keyboard or mouse, while the screen displays a synthesised face and uses synthesised speech to talk back to the user, giving a much friendlier and more natural interface.
The system uses a 3-D wire-frame model and maps a 2-D texture onto it. Points of importance on the 3-D face are matched to corresponding points on the texture map using an affine transformation. The authors define 17 phoneme positions for the face. The model includes teeth, whose movements follow directly from the jaw movements. Two methods of voice-to-image conversion are used: the first is vector quantisation, and the other is synthesis by neural network. The output of either converter becomes the input to the image synthesis system.
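As an illustration of the vector-quantisation route for voice-to-image conversion, the following is a minimal sketch, not the authors' code: an incoming speech feature vector is matched to its nearest codebook entry, and that entry's mouth-shape parameters drive the face model. The feature dimension, codebook contents and the parameter layout for the 17 phoneme positions are all illustrative assumptions.

```python
import numpy as np

N_PHONEMES = 17          # phoneme mouth positions defined for the face
FEATURE_DIM = 12         # e.g. cepstral coefficients per speech frame (assumed)

rng = np.random.default_rng(0)

# Codebook: one reference speech feature vector per phoneme (placeholder values).
codebook = rng.normal(size=(N_PHONEMES, FEATURE_DIM))

# Mouth-shape parameters for each phoneme (e.g. jaw opening, lip width, ...).
mouth_shapes = rng.uniform(0.0, 1.0, size=(N_PHONEMES, 8))

def voice_to_mouth_shape(frame: np.ndarray) -> np.ndarray:
    """Quantise one speech frame to its nearest codebook entry and
    return the corresponding mouth-shape parameter vector."""
    distances = np.linalg.norm(codebook - frame, axis=1)
    phoneme_index = int(np.argmin(distances))
    return mouth_shapes[phoneme_index]

# Example: convert a stream of speech frames into a sequence of mouth shapes.
speech_frames = rng.normal(size=(100, FEATURE_DIM))
shape_sequence = np.array([voice_to_mouth_shape(f) for f in speech_frames])
print(shape_sequence.shape)   # (100, 8): one parameter set per frame
```

The neural-network converter described by the authors would replace the nearest-neighbour lookup with a trained mapping from speech features to the same mouth-shape parameters.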
The image display is done as four sub-processes: facial movement calculation using eight parameters; transformation of the wire-frame model; texture mapping for each polygon; and output to the screen. Several machines linked by Ethernet carry out these tasks. To add realism (the basic system animates only the mouth), the authors introduced random blinking and the ability to change the expression on the model's face. Possible applications include: an intelligent human-machine interface; an intelligent communication system; an automatic animation production system; and a less complicated computer interface for the handicapped. [Synopsis by Valarie Hall]
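To make the four display sub-processes concrete, here is a minimal per-frame sketch under assumed details, not the paper's implementation: facial movement computed from the eight control parameters, deformation of the wire-frame vertices, an affine texture-map transform per polygon, and a stubbed hand-off to the screen. Vertex counts, the linear movement model and all helper names are illustrative.

```python
import numpy as np

N_VERTICES = 200
rng = np.random.default_rng(1)

rest_vertices = rng.normal(size=(N_VERTICES, 3))      # neutral wire-frame model
movement_basis = rng.normal(size=(8, N_VERTICES, 3))  # effect of each parameter

def facial_movement(params):
    """Step 1: turn the eight control parameters (jaw, lips, blink, ...)
    into per-vertex displacements (placeholder linear model)."""
    return np.tensordot(params, movement_basis, axes=1)

def deform_wireframe(vertices, displacements):
    """Step 2: apply the displacements to the neutral wire-frame model."""
    return vertices + displacements

def affine_texture_map(tri_uv, tri_xy):
    """Step 3: for one polygon, find the 2x3 affine transform that carries
    texture-map (u, v) corners onto projected screen (x, y) corners."""
    uv1 = np.hstack([tri_uv, np.ones((3, 1))])   # homogeneous (u, v, 1) rows
    return np.linalg.solve(uv1, tri_xy).T        # 2x3 affine matrix

def render_frame(params):
    """Run the pipeline for one frame; step 4 (screen output) is stubbed out."""
    vertices = deform_wireframe(rest_vertices, facial_movement(params))
    screen_xy = vertices[:, :2]                  # trivial projection stub
    tri = np.array([0, 1, 2])                    # one example polygon
    uv = rng.uniform(size=(3, 2))                # its texture coordinates
    A = affine_texture_map(uv, screen_xy[tri])
    return A                                     # would be rasterised to screen here

print(render_frame(np.zeros(8)).shape)           # (2, 3)
```

In the system described, these stages are distributed over several machines connected by Ethernet; the sketch simply runs them in sequence on one machine to show the data flow.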