FaceToolKit - A 3D Facial ToolKit

Research Collaborators
Steve DiPaola, Ali Arya, Malahat Hosseini

Our long-range research project is a visual development system for exploring face space, both in terms of facial types and animated expressions. The development toolkit is based on a hierarchical parametric approach, which gives us an additive language of hierarchical expressions, emotions and lip-sync sequences.

Face Centric Approach
There are many authoring tools for facial animation, but they are based on general modeling and animation techniques (such as point morphing or boning systems). Very few can be used in an intuitive way that is specific to facial creation and understanding. We believe there are many advantages to a system built on the conceptual idea of a correlated space of all faces, which can be browsed, explored and manipulated through various intuitive techniques.

At the core of FaceSpace is a descriptive language for specifying both the features and state of a character’s face. It is based on an approach used by the highly successful language PostScript, which describes the layout of a page through an understanding of the standard elements of a document. In a similar way, FaceSpace uses knowledge of the human face, both anatomical and behavioral, to describe the type and expression in a compact representation.

Parameters and Behaviors

The essence of FaceSpace is a set of numerical parameters, each of which controls some aspect of a character’s face. Parameters are typically unitized vectors, each representing a subroutine that performs complex low-level transformations on the part of the face it controls. Because parameters are abstracted from these low-level techniques, they have mathematically rigorous properties, such as the ability to be combined, added together and subtracted, while still producing controllable and repeatable effects on the face model.
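The additive property described above can be illustrated with a minimal sketch. This is not the toolkit's actual code: the parameter names (left_eye_open, right_eye_open, brow_raise), the FEAR values and the helper functions are hypothetical, chosen only to show how one expression can be added to two different faces while each face keeps its own character.

```python
# Minimal sketch of additive face parameters (all names are hypothetical).

def add(a, b):
    """Add two sparse parameter sets key by key."""
    keys = set(a) | set(b)
    return {k: a.get(k, 0.0) + b.get(k, 0.0) for k in keys}

def scale(params, weight):
    """Scale every parameter by the same weight."""
    return {k: v * weight for k, v in params.items()}

def subtract(a, b):
    """Subtract parameter set b from a."""
    return add(a, scale(b, -1.0))

# A hypothetical "fear" expression, stored as offsets from a neutral face.
FEAR = {"left_eye_open": 0.25, "right_eye_open": 0.25, "brow_raise": 0.5}

# Two characters; the second has asymmetric eyes at rest.
symmetric_face = {"left_eye_open": 0.5, "right_eye_open": 0.5, "brow_raise": 0.0}
asymmetric_face = {"left_eye_open": 0.5, "right_eye_open": 0.25, "brow_raise": 0.0}

scared_a = add(symmetric_face, FEAR)
scared_b = add(asymmetric_face, FEAR)

# The rest-pose asymmetry survives the expression.
asymmetry_before = asymmetric_face["left_eye_open"] - asymmetric_face["right_eye_open"]
asymmetry_after = scared_b["left_eye_open"] - scared_b["right_eye_open"]

# Subtracting the expression recovers the original face.
restored = subtract(scared_b, FEAR)
```

Because the expression is stored as an offset rather than as an absolute pose, the same FEAR set can be reused on any character, which is one plausible reading of the "controllable and repeatable" behavior described above.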

Left Figure. Two characters with the same “fear” expression applied to each. The “fear” behavior creates the impression of fear that is recognizable, but unique to each face. Note how the asymmetries inherent in the bottom character are preserved.

Higher-level constructs can be imposed on the basic parameter scheme by combining low-level parameters to create application-specific descriptive elements. For example, a user could modify the character’s appearance from “sophisticated” to “silly” with a single control that simultaneously modifies eye separation, forehead height, nose scale, etc. We refer to these multi-parameter constructs as Behaviors.

In this way we have begun to build up a hierarchical library of behaviors, expressions and character types which all can be combined and changed in any number of ways.
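As a rough illustration of a Behavior, the sketch below drives several low-level parameters from a single control value and then builds a higher-level behavior out of existing ones. The weights, the parameter names and the GOOFY example are assumptions made for this sketch, not values from the toolkit.

```python
# Minimal sketch of Behaviors as weighted bundles of parameters
# (weights and names are hypothetical).

# How strongly one "silly" control moves each low-level parameter.
SILLY = {"eye_separation": 0.5, "forehead_height": -0.25, "nose_scale": 0.75}
SURPRISED = {"brow_raise": 1.0, "jaw_open": 0.5}

def behavior(weights, amount):
    """Turn a single control value into offsets for many parameters."""
    return {name: w * amount for name, w in weights.items()}

def combine(*layers):
    """Sum any number of parameter layers into one."""
    out = {}
    for layer in layers:
        for name, value in layer.items():
            out[name] = out.get(name, 0.0) + value
    return out

# A higher-level behavior built from two existing ones, which is the kind
# of hierarchy a behavior library could be organized around.
GOOFY = combine(behavior(SILLY, 1.0), behavior(SURPRISED, 0.5))

half_silly = behavior(SILLY, 0.5)
half_goofy = behavior(GOOFY, 0.5)
```

Since GOOFY is itself just a set of weights, it can be passed back into behavior() and combine() like any other entry in the library.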


Lip Sync and Gesture

Real-time Lip Sync
The toolkit can analyze incoming voice audio in real time for lip sync and inflection, and then use this analysis to drive the face playback in a flexible and customized way. Based on the real-time voice analysis, the system outputs lip sync on any face character type via parameterized control. Unlike morphing or key-frame systems, the lip-sync mouth position adapts to the particular character, regardless of whether it has a small mouth or an asymmetrical one (see figure above).
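One way to picture how a parameterized mouth shape could adapt to each character is to store mouth targets as normalized values and map them into each character's own parameter range. The sketch below assumes this scheme; the viseme table, the parameter names and the range format are hypothetical and are not taken from the toolkit.

```python
# Minimal sketch: normalized mouth targets mapped into a character's own
# parameter ranges (viseme table and names are hypothetical).

# Each target is a value in [0, 1] per mouth parameter.
VISEMES = {
    "rest": {"jaw_open": 0.0, "lip_width": 0.5},
    "AA": {"jaw_open": 1.0, "lip_width": 0.5},
    "OO": {"jaw_open": 0.25, "lip_width": 0.0},
}

def mouth_pose(ranges, viseme):
    """Interpolate each parameter between the character's (low, high) limits."""
    pose = {}
    for name, t in VISEMES[viseme].items():
        low, high = ranges[name]
        pose[name] = low + t * (high - low)
    return pose

# Two characters with different mouth sizes.
small_mouth = {"jaw_open": (0.0, 0.5), "lip_width": (0.0, 1.0)}
large_mouth = {"jaw_open": (0.0, 2.0), "lip_width": (1.0, 3.0)}

small_aa = mouth_pose(small_mouth, "AA")
large_aa = mouth_pose(large_mouth, "AA")
```

The same "AA" target opens the small mouth to 0.5 and the large one to 2.0, so neither character needs a hand-built morph target of its own.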

Voice-based face gestures filtered through Behavior states
In addition to lip sync (i.e. mouth movement), the audio energy level is scanned in real time for cadence cues in the voice. With this information, the system can create realistic face gestures that are filtered through the current behavior state. So if the current behavior state is nervous, the playback character will lip sync and gesture in sync with the voice in a nervous way. It can pick up impulse and emphasis cues in the voice stream, as well as long pauses and blinking, which customize the way the character acts nervous so that it stays in sync with the speaking and gesturing animation. This allows for realistic automatic face animation from real-time voice input on any character type in any personality.
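The idea of scanning audio energy for cues and then filtering them through a behavior state can be sketched as follows. This is an illustrative assumption, not the toolkit's analysis: it uses simple RMS energy over fixed frames with hand-picked thresholds, and the GESTURES table and its parameter names are hypothetical.

```python
import math

# Minimal sketch: energy-based cue detection, then a behavior-state lookup
# (thresholds, gesture table and names are hypothetical).

def frame_energy(samples, frame_size):
    """RMS energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies

def detect_cues(energies, emphasis_level=0.5, silence_level=0.05, pause_frames=3):
    """Mark loud frames as emphasis and long quiet runs as pauses."""
    cues = []
    quiet_run = 0
    for i, energy in enumerate(energies):
        if energy >= emphasis_level:
            cues.append((i, "emphasis"))
        if energy <= silence_level:
            quiet_run += 1
            if quiet_run == pause_frames:
                cues.append((i, "pause"))
        else:
            quiet_run = 0
    return cues

# The same cue produces a different gesture depending on the behavior state.
GESTURES = {
    "nervous": {"emphasis": {"head_nod": 0.25, "blink": 1.0}, "pause": {"eye_dart": 1.0}},
    "calm": {"emphasis": {"head_nod": 0.5}, "pause": {"blink": 1.0}},
}

def gestures_for(cues, state):
    """Filter detected cues through the current behavior state."""
    return [(frame, GESTURES[state][cue]) for frame, cue in cues]

samples = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2]
cues = detect_cues(frame_energy(samples, frame_size=2))
nervous = gestures_for(cues, "nervous")
calm = gestures_for(cues, "calm")
```

Keeping cue detection separate from the gesture table means the audio analysis runs once, while the behavior state can change at any time without re-analyzing the voice.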

Program control
In addition to the real-time gesture and lip sync, these parametric animations of mood and behavior can also be placed under program control. Currently we can map any parametric animation snippet to a key on the keyboard, allowing the user to affect the current real-time animation even further in a fluid way, for example by adding extra nervousness while the key is held down. Together, these tools allow a user to pick a character, lip sync through it in real time with their voice, affect the style of the synced animation by picking a behavior type such as happy, and at any time hold down a key to temporarily make the character lip sync and gesture in a mad way for as long as the key is held. All these real-time channels can be effectively combined to provide realistic real-time playback because of the strength of the basic parametric technique.
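A minimal sketch of how such channels might be combined per frame is shown below, assuming each channel is simply a set of parameter offsets that are summed. The key bindings, parameter names and the compose_frame helper are hypothetical and only illustrate the layering idea.

```python
# Minimal sketch: summing real-time channels plus key-held snippets
# (key bindings and parameter names are hypothetical).

# Snippets that stay active only while their key is held down.
KEY_MAP = {
    "n": {"brow_raise": 0.25, "eye_dart": 0.5},  # extra nervousness
    "m": {"brow_lower": 0.75},                   # mad
}

def compose_frame(channels, held_keys, key_map=KEY_MAP):
    """Sum the always-on channels and any snippets for keys currently held."""
    layers = list(channels)
    layers += [key_map[key] for key in sorted(held_keys) if key in key_map]
    frame = {}
    for layer in layers:
        for name, value in layer.items():
            frame[name] = frame.get(name, 0.0) + value
    return frame

lip_sync = {"jaw_open": 0.5}                          # from voice analysis
happy = {"mouth_corner_up": 0.5, "brow_raise": 0.25}  # chosen behavior

normal_frame = compose_frame([lip_sync, happy], held_keys=set())
nervous_frame = compose_frame([lip_sync, happy], held_keys={"n"})
```

Releasing the key simply drops that layer from the next frame, so the character returns to the chosen behavior without any explicit blend-out step in this simplified model.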


Some of the face toolkit work dates back to when Steve DiPaola was a researcher at the NYIT Computer Graphics Lab under Fred Parke, the pioneer of face animation research (now at Texas A&M). Offshoots of this prototype system are being used in other applications, including a student-run, open-source face toolkit called facade, as well as other work at the iVizLab called musicFace and genFace. We are currently upgrading the core toolkit described here into an integrated framework called iFace.

Downloads and Links
Info Viz ’02 Paper PDF Paper: “FaceSpace: A Facial Spatial-Domain Toolkit” – with additional emotional work.
Siggraph 02 Paper  PDF Paper: “Investigating Face Space” from Siggraph’02 sketch paper.
Stanford Facade Page  Web Link: A student run, open-source face toolkit from DiPaola’s face animation class at Stanford.
rFace Download Site  Web Link: Older site where you can download the rFace version of the Face Toolkit.
Info Viz ’02 Talk  Web Link: Presentation accompanying talk from InfoViz’02.