Motion Capture and Face Tracking
SynthEyes offers the exciting capability to do full body and facial motion
capture using conventional video or film cameras.
STOP! Unless you know how to do supervised tracking and understand
moving-object tracking, you will not be able to do motion capture. The
material here builds upon that earlier material; it is not repeated here
because it would be exactly that, a repetition.
First, why and when is motion capture necessary? The moving-object
tracking discussed previously is very effective for tracking a head, when the
face is not doing all that much, or when trackable points have been added in
places that don’t move with respect to one another (forehead, jaws, nose). The
moving-object mode is good for making animals talk, for example. By contrast,
motion capture is used when the motion of the moving features is to be
determined, and will then be applied to an animated character. For example, use
motion capture of an actor reading a script to apply the same expressions to an
animated character. Moving-object tracking requires only one camera, while
motion capture requires several calibrated cameras.
Second, we need to establish a few very important points: this is not the
kind of capability that you can learn on the fly as you do that important
shoot, with the client breathing down your neck. This is not the kind of thing
for which you can expect to glance at this manual for a few minutes, and be a
pro. Your head will explode. This is not the sort of thing you can expect to
apply to some musty old archival footage, or to footage from that old VHS
camera at night in front of a flickering fireplace. This is not something where you can set up
a shoot for a couple of days, leave it around with small children or animals
climbing on it, and get anything usable whatsoever. This is not the sort of
thing where you can take a SynthEyes export into your animation software, and
expect all your work to be done, with just a quick render to come. And this is
not the sort of thing that is going to produce the results of a $250,000 custom
full body motion capture studio with 25 cameras.
With all those dire warnings out of the way, what is the good news? If you
do your homework, do your experimentation ahead of time, set up technically
solid cameras and lighting, read the SynthEyes manual so you have a fair
understanding of what the SynthEyes software is doing, and understand your 3-D
package well enough to set up your character or face rigging, you should be
able to get excellent results.
In this manual, we’ll work through a sample facial capture session. The
techniques and issues are the same for full body capture, though of course the
tracking marks and overall camera setup for body capture must be larger and
more complex.
Introduction
To perform motion capture of faces or bodies, you will need at least two
cameras trained on the performer from different angles. Since the performer's
head or limbs are rotating, the tracking features may rotate out of view of the
first two cameras, so you may need additional cameras to shoot more views from
behind the actor.
The fields of view of the cameras must be large enough to encompass the
entire motion that the actor will perform, without the cameras tracking the
performer (OK, experts can use SynthEyes for motion capture even when the
cameras move, but only with care).
You will need to perform a calibration process ahead of time, to determine
the exact position and orientation of the cameras with respect to one another
(assuming they are not moving). We’ll show you one way to achieve this, using
some specialized but inexpensive gear.
Very Important: You’ll have to ensure that nobody
knocks the cameras out of calibration while you shoot calibration or live
action footage, or between takes.
You’ll need to be able to resynchronize the footage of all the cameras in
post. We’ll tell you one way to do that.
Generally the performer will have tracker markers attached, to ensure the
best possible and most reliable data capture. The exception to this would be if
one of the camera views must also be used as part of the final shot, for
example, a talking head that will have an extreme helmet added. In this case,
markers can be placed where they will be hidden by the added effect; in
locations that do not permit markers, either natural facial features can be
tracked (HD or film source required!), or markers can be used and then removed
as an additional effect.
After you solve the calibration and tracking in SynthEyes, you will wind
up with a collection of trajectories showing the path through space of each
individual feature. When you do
moving-object tracking, the trackers are all rigidly connected to one
another, but in motion capture, each tracker follows its own individual
path.
You will bring all these individual paths into your animation package, and
will need to set up a rigging system that makes your character move in response
to the tracker paths. That rigging might consist of expressions, Look At
controllers, etc; it's up to you and your animation package.
Camera Types
Since each camera's field of view must encompass the entire performance
(unless there are many overlapping cameras), the actor usually occupies only a
small portion of the frame at any time. This makes progressive DV, HD, or film
source material strongly suggested.
Progressive-scan cameras are strongly recommended, to avoid the factor of
two loss of vertical resolution due to interlacing. This is especially
important since the tracking markers are typically small and can slip between
scan lines.
While it may make operations simpler, the cameras do not have to be the
same kind, have the same aspect ratio, or have the same frame rate.
Resist the urge to use that old consumer-grade analog videotape camera as
one of the cameras—the recording process will not be stable enough for good
results.
Lens distortion will substantially complicate calibration and processing.
To minimize distortion, use high-quality lenses, and do not operate them near
their maximum field of view, where distortion is largest. Do not try to squeeze
the shoot into a small studio space.
Camera Placement
The camera placements must address two opposing factors: the cameras should be
far apart, to produce a large parallax disparity and good depth perception,
but they should also be close together, so that they can simultaneously
observe as many trackers as possible.
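The far-apart/close-together tradeoff can be quantified: stereo depth error grows with the square of the subject distance and shrinks linearly with the camera baseline. Here is a minimal illustrative sketch; the focal length (in pixels) and the quarter-pixel tracking accuracy are assumed example numbers, not SynthEyes values:

```python
# Illustrative only: stereo depth uncertainty versus camera baseline.
# dZ ~ Z^2 * disparity_error / (f * B): doubling the baseline B halves
# the depth error, at the cost of a smaller shared field of view.

def depth_uncertainty(z_m, baseline_m, focal_px, disparity_err_px=0.25):
    """Approximate depth error (meters) for a stereo camera pair."""
    return (z_m ** 2) * disparity_err_px / (focal_px * baseline_m)

# Hypothetical setup: subject 3 m away, 1500-pixel focal length,
# quarter-pixel tracking accuracy, baselines of 0.5 m to 2 m.
for b in (0.5, 1.0, 2.0):
    print(b, depth_uncertainty(3.0, b, 1500.0))
```

Widening the baseline improves depth precision but reduces how many trackers both cameras see at once, which is exactly the tension described above.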
You’ll probably need to experiment with placement to gain experience,
keeping in mind the performance to be delivered.
Cameras do not have to be placed in any discernible pattern. If the
performance warrants it, you might want coverage from up above, or down
below.
If any cameras will move during the performance, they will need a visible
set of stationary tracking markers, to recover their trajectory in the usual
fashion. This will reduce accuracy compared to a carefully calibrated
stationary camera.
Lighting
Lighting should be sufficient to keep the markers well illuminated, without
shadowing. It should also be bright enough to permit the shortest camera
shutter time consistent with good image quality, minimizing motion blur.
Calibration Requirements and Fixturing
In order for motion tracking footage to be solved, the camera positions,
orientations, and fields of view must be determined, independent of the “live”
footage, as accurately as possible.
To do this, we will use a process based on moving-object tracking. A
calibration object is moved in the field of view of all the cameras, and
tracked simultaneously.
To gather the most data quickly and easily, we constructed a prop we call a
“porcupine” out of a 4” Styrofoam ball, 20-gauge plant stem wires, and small 7
mm colored pom-pom balls, all obtained from a local craft shop for under $5.
Wires were cut to varying lengths, stuck into the ball, and a pom-pom was
glued to the end of each using a hot glue gun. In retrospect, it would have
been smarter to space two balls along the support wire as well, to help set up
a coordinate system.
The porcupine is hung by a support wire in the location of the performer's
head, then rotated as it is recorded simultaneously from each camera. The
porcupine's colored pom-poms can be viewed virtually all the time, even as they
spin around to the back, except for the occasional occlusion.
Similar fixtures can be built for larger motion capture scenarios, perhaps
using dolly track to carry a wire frame. It is important that the individual
trackable features on the fixture not move with respect to one another: their
rigidity is required for the standard object tracking.
The path of the calibration fixture does not particularly matter.
Camera Synchronization
The timing relationship between the different cameras must be established.
Ideally, all the cameras would be gen-locked together, snapping each image
at exactly the same time. Failing that, there are a variety of possibilities
which can be arranged and communicated to SynthEyes during the setup process.
Motion capture has a special solver mode on the Solver Panel: Individual
mocap. In this mode, the second dropdown list changes from a directional hint
to a camera synchronization control.
If the cameras are all video cameras, they can be gen-locked together to
all take pictures identically. This situation is called “Sync Locked.”
If you have a collection of video cameras that are not gen-locked, they will
all take pictures at exactly the same (crystal-controlled) rate. However, one
camera may always be taking pictures a bit before another, and a third camera
may always be taking pictures at yet a different time than the other two. This
option is “Crystal Sync.”
If you have a film camera, it might run a little more or a little less
than 24 fps, not particularly synchronized to anything. This will be referred
to as “Loose Sync.”
In a capture setup with multiple cameras, one can always be considered to
be Sync Locked, and serve as a reference. If it is a video camera, other video
cameras are in Crystal Sync, and any film camera would be Loose Sync.
If you have a film camera that will be used in the final shot, it should
be considered to be the sync reference, with Sync Locked, and any other cameras
are in Loose Sync.
The beginning and end of each camera's view of the calibration sequence
and the performance sequence must be identified to the nearest frame. This can
be achieved with a clapper board or electronic slate. The low-budget approach
is to use a flashlight or laser pointer flash to mark the beginning and end of
the shot.
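The flashlight approach can even be automated in post. As an illustration (this is not a SynthEyes feature), a small script could scan each camera's footage for the frame whose average brightness jumps well above the preceding frames:

```python
# A sketch of locating the sync flash in one camera's footage by finding
# the frame whose mean luminance jumps far above the frames before it.
# `frame_means` would come from averaging each decoded frame's pixels;
# here it is faked with a hypothetical per-frame brightness trace.

def find_flash(frame_means, threshold=3.0):
    """Return the index of the first frame whose brightness exceeds
    the median of the earlier frames by `threshold` times, or None."""
    for i in range(1, len(frame_means)):
        baseline = sorted(frame_means[:i])[i // 2]  # median so far
        if frame_means[i] > threshold * baseline:
            return i
    return None

# Hypothetical trace: dimly lit studio, flash at frame 5.
trace = [20, 21, 19, 20, 22, 180, 21, 20]
print(find_flash(trace))  # prints 5
```

Running the same detector on every camera's footage yields per-camera flash frames that can be lined up to the nearest frame, as described above.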
Camera Calibration Process
We’re ready to start the camera calibration process, using the two shot
sequences LeftCalibSeq and RightCalibSeq. You can start SynthEyes and do a
File/New for the left shot, and then Add Shot to bring in the second. Open both
with Interlace=Yes, as unfortunately both shots are interlaced. Even though
these are moving-object shots, for calibration they will be solved as
moving-camera shots.
You can see from these shots how the timing calibration was carried out.
The shots were cropped right before the beginning of the starting flash, and
right after the ending flash, to make it obvious what had been done. Normally,
you should crop after the starting flash, and before the ending flash.
On your own shots, you can use the Image Preprocessing panel's
region-of-interest capability to reduce memory consumption and help handle
long shots from multiple cameras.
You should supervise-track a substantial fraction of the pom-poms in each
camera view; you can then solve each camera to obtain a path of the camera
appearing to orbit the stationary porcupine.
Next, we will need to set up a set of links between corresponding trackers
in the two shots. The links must always go from the Camera02 trackers to the
Camera01 trackers. This can be achieved in at least three different ways.
Matching Plan A: Temporary Alignment
This is probably easiest, and we may offer a script to do the grunt work
in the future.
Begin by assigning a temporary coordinate system for each camera, using
the same pom-poms and ordering for each camera. It is most useful to keep the
porcupine axis upright (which is where pom-poms along the support wire would
come in useful, if available); in this shot three at the very bottom of the
porcupine were suitable.
With matching constraints for each camera, when you re-solve, you will
obtain matching pairs of tracker points, one from each camera, located very
close to one another.
Now, with the Coordinate System panel
open,
Camera02 active, and the Top view selected, you can click on each of Camera02's
tracker points, and then alt-click (or command-click) on the corresponding
Camera01 point, setting up all the links.
As you complete the linking, you should remove the initial temporary
constraints from Camera02.
Matching Plan B: Side by Side
In this plan, you can use the Camera & Perspective viewport configuration.
Make Camera01 active, and in the perspective window, right-click and
Lock to current camera with Camera01's imagery, then make
Camera02 active for the camera view. Now camera and perspective views show the
two shots simultaneously (Experts: you can open multiple perspective
windows and configure each for a different shot.).
You can now click the trackers in the camera(02) view, and alt-click the
matching (01) tracker in the perspective window, establishing the links.
Reminder: The coordinate system control panel must
be open for linking. This will take a little mental rotation to establish the
right correspondences; the colors of the various pom-poms will
help.
Matching Plan C: Cross-Link by Name
This plan is probably more trouble than it is worth for calibration, but can
be an excellent choice for the actual shots. You assign names to each of the
pom-poms, so that the names differ only by the first character, then use the
Track/Cross-Link by Name menu item to establish links.
It is a bit of a pain to come up with different names for the pom-poms, and
to do it identically for the two views, but this might be more reasonable in
other calibration scenarios where it is more obvious which point is
which.
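To make the pairing rule concrete, here is a hedged sketch (in Python, not an actual SynthEyes script) of the matching that Cross-Link by Name implies: trackers correspond when their names differ only in the first character. All tracker names below are hypothetical:

```python
# Sketch of cross-linking by name: pair trackers from two views whose
# names differ only in their first character, e.g. "ANose" in Camera01's
# object matching "BNose" in Camera02's object. Names are hypothetical.

def cross_link(names_cam1, names_cam2):
    """Return (cam2_name, cam1_name) link pairs, matched by name suffix."""
    by_suffix = {n[1:]: n for n in names_cam1}
    links = []
    for n in names_cam2:
        if n[1:] in by_suffix:
            links.append((n, by_suffix[n[1:]]))
    return links

cam1 = ["ANose", "ALBrow", "AChin"]
cam2 = ["BNose", "BChin", "BCheek"]
print(cross_link(cam1, cam2))  # [('BNose', 'ANose'), ('BChin', 'AChin')]
```

Note the direction of each pair: the Camera02 tracker links to the Camera01 tracker, matching the linking rule given earlier.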
Completing the Calibration
We’re now ready to complete the calibration process. Change Camera02 to
Indirectly solving mode on the Solver panel.
Note: the initial position of Camera01 is going to
stay fixed, controlling the overall positions of all the cameras. If you want
it in some particular location, you can remove the constraints from it, reset
its path from the 3-D panel, then move it around to a desired location.
Solve the shot, and you will have two cameras that maintain a fixed relative
position and orientation as they orbit.
Run the Motion Capture Camera Calibration script
from the Script menu, and the orbits will be squished down to single locations.
Camera01 will be stationary at its initial location, and Camera02 will be
jittering around another location, showing the stability of the offset between
the two. The first frame of Camera02's position is actually an average relative
position over the entire shot; it is this location we will later use.
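Conceptually (this is not the actual calibration script), the averaging works by expressing Camera02's position in Camera01's coordinate frame on every frame; if the rig stayed rigid, that relative position is nearly constant, and its mean is the calibrated offset. A numpy sketch with toy orbit data:

```python
import numpy as np

# Conceptual sketch of collapsing the solved orbits to fixed camera
# locations: per frame, express Camera02's position in Camera01's frame;
# for a rigid rig that relative position is nearly constant, and its
# average is the calibrated offset. Not the actual SynthEyes script.

def average_relative_position(R1, t1, R2, t2):
    """R1: (N,3,3) Camera01 rotations; t1, t2: (N,3) camera positions.
    Returns Camera02's mean position in Camera01's coordinate frame."""
    rel = np.einsum('nij,nj->ni', np.transpose(R1, (0, 2, 1)), t2 - t1)
    return rel.mean(axis=0)

# Toy data: Camera02 fixed 1 unit along Camera01's x-axis while both
# appear to orbit (as in the porcupine solve).
N = 8
angles = np.linspace(0, 2 * np.pi, N, endpoint=False)
R1 = np.stack([[[np.cos(a), -np.sin(a), 0],
                [np.sin(a),  np.cos(a), 0],
                [0, 0, 1]] for a in angles])
t1 = np.zeros((N, 3))
t2 = np.einsum('nij,j->ni', R1, np.array([1.0, 0.0, 0.0]))
print(average_relative_position(R1, t1, R1, t2))  # ~ [1. 0. 0.]
```

On real data, the per-frame relative positions jitter slightly around the mean, which is why Camera02's solved path appears to jitter around a single location.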
You should save this calibration scene file
(porcupine.sni); it will be the starting point for tracking
the real footage. The calibration script also produces a script_output.txt file
in a user-specific folder that lists the calibration data.
Body and Facial Tracking Marks
Markers will make tracking faster, easier, and more accurate. On the face,
markers might be little Avery dots from an office supply store, “magic marker”
spots, pom-poms with rubber cement(?), mascara, or grease paint. Note that
small colored dots tend to lose their coloration in video images, especially
with motion blur. Make sure there is a luminance difference. Single-pixel-sized
spots are less accurate than those that are several pixels across.
Markers should be placed on the face in locations that reflect the
underlying musculature and the facial rigging they must drive. Be sure to
include markers on comparatively stationary parts of the head.
For body tracking, a typical approach is to put the performer in a black
outfit (such as UnderArmour), and attach table-tennis balls as tracking
features onto the joints. To achieve enough visibility, placing balls on both
the top and bottom of the elbow may be necessary. Because the markers must be
placed on the outside of the body, away from the true joint locations,
character rigging will have to take this into account.
Preparation for Two-Dimensional Tracking
We’re ready to begin tracking the actual performance footage. Open the
final calibration scene file. Open the 3-D panel.
For each camera, select the camera in the select-by-name dropdown list. Then
hit Blast and answer yes to store the field of view data as well. Then, hit
Reset twice, answering yes to remove keys from the field of view track also.
The result of this little dance is to take the solved camera paths (as modified
by the script), and make them the initial position and orientation for each
camera, with no animation (since they aren’t actually moving).
Next, replace the shot for each camera with LeftFaceSeq and RightFaceSeq.
Again, these shots have been cropped based on the light flashes, which would
normally be removed completely. Set the End Frame for each shot to its maximum
possible. If necessary, use an animated ROI on the Image Preprocessing panel
so that you can keep both shots in RAM simultaneously. Hit Control-A and delete
to delete all the old trackers. Set each Lens to Known to lock the field of
view, and set the solving mode of each camera to Disabled, since the cameras
are fixed at their calibrated locations.
We need a placeholder object to hold all the individual trackers. Create a
moving object, Object01, for Camera01, then a moving object, Object02, for
Camera02. On the Solver Panel, set Object01 and Object02 to the
Individual mocap solving mode, and set the
synchronization mode right below that.
Two-Dimensional Tracking
You can now track both shots, creating the trackers into Object01 and
Object02 for the respective shots. If you don’t track all the markers, at least
be sure to track a given marker either in both shots, or none, as a
half-tracked marker will not help. The Hand-Held: Use Others
mode may be helpful here for the rapid facial motions. Frequent keying will be
necessary when the motion causes motion blur to appear and disappear (a lot of
uniform light and short shutter time will minimize this).
Linking the Shots
After completing the tracking, you must set up links. The easiest approach
will probably be to set up side-by-side camera and perspective views. Again,
you should link the Object02 trackers to the Object01 trackers, not the other
way around.
Doing the linking by name can also be helpful, since the trackers should
have fairly obvious names such as Nose or Left Inner Eyebrow, etc.
Solving
You’re ready to solve, and the Solve step should be very routine, producing
paths for each of the linked trackers. The final file is facetrk.sni.
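Geometrically, solving each individual-mocap tracker amounts to triangulating its 3-D position from the calibrated camera views. The sketch below shows textbook linear (DLT) two-view triangulation; it illustrates the principle only and is not SynthEyes' actual solver. The camera matrices and point are made-up toy values:

```python
import numpy as np

# Illustrative two-view triangulation (linear DLT): given each calibrated
# camera's 3x4 projection matrix and a tracker's 2D position in each
# view, recover the 3D point. A sketch of the geometry, not the solver.

def triangulate(P1, P2, xy1, xy2):
    """P1, P2: 3x4 projection matrices; xy1, xy2: 2D image points."""
    A = np.array([
        xy1[0] * P1[2] - P1[0],
        xy1[1] * P1[2] - P1[1],
        xy2[0] * P2[2] - P2[0],
        xy2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)       # null vector of A is the solution
    X = vt[-1]
    return X[:3] / X[3]               # de-homogenize

# Toy setup: two cameras looking down +z, one unit apart in x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.2, 0.3, 4.0, 1.0])
xy1 = (P1 @ point)[:2] / (P1 @ point)[2]
xy2 = (P2 @ point)[:2] / (P2 @ point)[2]
print(triangulate(P1, P2, xy1, xy2))  # ~ [0.2 0.3 4. ]
```

With noisy 2-D tracks the two rays do not intersect exactly, which is where the per-tracker error values examined below come from.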
Afterwards, you can start checking on the trackers. You can scrub through
the shot in the perspective window, orbiting around the face. You can check the
error curves and XYZ paths in the graph editor. By switching to Sort by Error
mode, you can sequence through the trackers starting from those with the
highest error.
Exports & Rigging
When you export a scene with individual trackers, each of them will have a
key frame on each frame of the shot, animating the tracker path.
It is up to you to determine a method of rigging your character to take
advantage of the animated tracker paths. The method chosen will depend on your
character and animation software package. It is likely you will need some
expressions (formulas) and some Look-At controls. For full-body motion capture,
you will need to take into account the offsets from the tracking markers (i.e.,
balls) to the actual joint locations.
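For instance, where balls were placed on both the top and bottom of a joint (as suggested for elbows), a rig expression can approximate the true joint center as their midpoint. A minimal sketch, with hypothetical coordinates and function names:

```python
# A minimal sketch of compensating for marker offsets during rigging:
# with tracking balls on opposite sides of a joint, the joint center can
# be approximated as their midpoint. Names and values are hypothetical.

def joint_center(top_xyz, bottom_xyz):
    """Midpoint of two opposing surface markers."""
    return tuple((a + b) / 2.0 for a, b in zip(top_xyz, bottom_xyz))

elbow_top = (10.0, 52.0, 3.0)     # hypothetical tracker positions
elbow_bottom = (10.0, 46.0, 3.0)
print(joint_center(elbow_top, elbow_bottom))  # (10.0, 49.0, 3.0)
```

When only one marker of a pair is visible, the rig would instead need to push that marker inward by the known marker-to-joint distance; how that is expressed depends entirely on your animation package.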
Modeling
You can use the calculated point locations to build models. However, the
animation of the vertices will not be carried forward into the meshes you
build. Instead, when you do a Convert to Mesh operation in the perspective
window, the current tracker locations are frozen on that frame.
If desired, you can repeat the object-building process on different frames
to build up a collection of morph-target meshes.