A Neural Basis of Facial Action Recognition

This video will cover our work on the perception
of facial expressions and the neural mechanisms involved in this process. This work appears
in the Journal of Neuroscience; a link to it can be found on our website. This research was done at The Ohio State University
by my colleagues Ramprakash Srinivasan, Julie Golomb and myself. Look at this video. What do you see? Is this
person approachable? Is he interacting with someone? Avoiding them? Maybe laughing at
a disgusting joke? In order to answer these questions, our brain
performs a series of computations that provide us with the necessary information to not only
interpret this person’s actions and intent, but also to determine how we should respond. But what are these computations? To understand these seemingly effortless computations, we first need to explain how facial expressions are produced. As shown in this illustration, muscles under our skin contract and relax to produce facial expressions. To make life easier, it is convenient to assign a unique number to each possible muscle action. These are called Action Units. For example,
as seen here, action unit 1 (or AU 1) defines the contraction of the muscle that moves the
inner corner of the eyebrows up, while AU 2 defines the contraction of the muscle that
moves the outer corner of the brows up. Let us now see the result that AU 12 has on
an image. AU 12 is the contraction of a cheek muscle that pulls the corners of the lips. This yields clearly visible image changes, as you can see in this image. We define facial actions as these visible changes, and we use action units as a way to annotate them. Here are the four action units we included in our study: AUs 1, 2, 12, and 20. Note that each AU is common to many facial expressions. In fact, action units are the building blocks of facial expressions, allowing us to construct many facial expressions of emotion, produce language, and convey other non-verbal communication signals.
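
As a rough illustration, here is a minimal Python sketch of how this kind of annotation could be represented in code. The AU numbers and descriptions for AUs 1, 2, and 12 follow the narration; the description of AU 20 and all the names in the code are illustrative choices, not part of the study.

```python
# The four action units used in the study, keyed by their AU number.
ACTION_UNITS = {
    1: "inner corner of the eyebrows raised",
    2: "outer corner of the eyebrows raised",
    12: "cheek muscle contraction pulling the lip corners",
    20: "lip stretcher",  # AU 20 is not described in the narration; label is illustrative
}

def annotate_expression(active_aus):
    """Describe a facial expression as the set of action units that are active."""
    return {au: ACTION_UNITS[au] for au in sorted(active_aus) if au in ACTION_UNITS}

# Example: an expression in which AUs 1 and 12 are active.
print(annotate_expression({12, 1}))
```
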
Our hypothesis is that, to visually interpret facial expressions, our brain must solve the inverse problem; that is, it must identify which action units are active in a face. Specifically, we hypothesize that the brain region where this visual recognition takes place is the posterior Superior Temporal Sulcus, or pSTS, which is thought to increase its neural activity when we observe biological movements. To test our hypothesis, we performed an fMRI
study. fMRI stands for functional Magnetic Resonance Imaging. Here, a participant lies comfortably in an MRI scanner, a big magnet that can take 3-dimensional pictures of the brain. To do this, the brain is divided into 3-dimensional voxels of about 3 x 3 x 3 mm. The word functional means that we estimate the amount of neural activation in each of these voxels by measuring the level of blood oxygenation; more oxygen correlates with higher neural activity.
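
To make the voxel idea concrete, here is a small Python sketch using the nibabel library. The grid size, number of acquisitions, and all names are made up for illustration; it only shows that a functional scan is a 3-dimensional grid of voxels recorded over time.

```python
import numpy as np
import nibabel as nib  # common library for reading MRI data

# A synthetic stand-in for a 4D functional scan: a 64 x 64 x 40 voxel grid
# with 3 mm voxels, recorded over 100 acquisitions (time points).
affine = np.diag([3.0, 3.0, 3.0, 1.0])          # maps voxel indices to mm coordinates
data = np.zeros((64, 64, 40, 100), dtype=np.float32)
img = nib.Nifti1Image(data, affine)

nx, ny, nz, n_acquisitions = img.shape
voxel_size_mm = np.sqrt((img.affine[:3, :3] ** 2).sum(axis=0))

print(f"Grid: {nx} x {ny} x {nz} voxels, {n_acquisitions} acquisitions")
print("Voxel size (mm):", voxel_size_mm)        # -> [3. 3. 3.]
```
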
The neural activity in each of these voxels is measured while the participant observes a set of facial expressions. Our experiment is designed so that half of the images the participant sees have AU 1 present while the other half do not. The same is true for the other AUs: half of the images have that AU present and half do not. The fMRI acquisitions we collected are represented
here by stars, with red indicating that the AU was present during that acquisition and yellow that it was not. Note that they are plotted in a feature space whose dimensions correspond to the voxels; that is, each dimension specifies how much neural activity there is in one voxel. Of course, we are only showing the values of two voxels, even though there are thousands of them in the brain.
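
In code, this feature space is simply a matrix with one row per acquisition and one column per voxel, together with a binary label per AU. The sketch below uses synthetic numbers in place of real acquisitions, and the array names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_acquisitions = 100   # number of fMRI acquisitions for one participant (illustrative)
n_voxels = 5000        # in reality there are thousands of voxels

# Each row is one acquisition, described by its per-voxel activity (synthetic here).
X = rng.standard_normal((n_acquisitions, n_voxels))

# Balanced design: for each AU, half of the images have it present (1), half absent (0).
au_labels = {
    au: rng.permutation(np.repeat([1, 0], n_acquisitions // 2))
    for au in (1, 2, 12, 20)
}

print(X.shape, au_labels[12].mean())  # (100, 5000) and 0.5
```
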
We have also included the feature spaces of the brains of two participants. This illustrates how the activation pattern of each participant is slightly different; clearly, our brains are not exactly alike. To align these subject-specific representations, we use a linear transformation known as Principal Component Analysis (or PCA).
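
Here is a minimal sketch of one way to do this with scikit-learn: each participant’s voxel space is reduced to the same number of components. The data is synthetic, the number of components is an illustrative choice, and the paper’s exact alignment procedure may differ.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic stand-ins for two participants: same acquisitions, different voxel counts.
subject_data = {
    "sub-01": rng.standard_normal((100, 5200)),
    "sub-02": rng.standard_normal((100, 4800)),
}

n_components = 20  # illustrative; the study's dimensionality may differ

# Project every participant into a space with the same number of dimensions.
aligned = {
    sub: PCA(n_components=n_components).fit_transform(X)
    for sub, X in subject_data.items()
}

for sub, Z in aligned.items():
    print(sub, Z.shape)  # each participant is now 100 x 20
```
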
After these representations have been aligned, we can use a machine learning classifier to learn to discriminate between the acquisitions that correspond to images with a certain AU present and those with that AU absent. This step is called the learning phase, because it is here that we learn to identify when the brain is decoding an image with a specific AU active.
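
For the learning phase, one simple possibility is a linear classifier trained on the aligned features to separate “AU present” from “AU absent” acquisitions. The sketch below uses scikit-learn’s logistic regression on synthetic data; the classifier used in the paper may differ, and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Synthetic aligned features for the training participants (rows = acquisitions).
X_train = rng.standard_normal((300, 20))
# Binary labels for one AU, e.g. AU 12: 1 = present in the image, 0 = absent.
y_train = rng.permutation(np.repeat([1, 0], 150))

# One binary classifier per action unit; shown here for a single AU.
clf_au12 = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# On random synthetic data this will hover near chance; it only shows the structure.
print("Training accuracy:", clf_au12.score(X_train, y_train))
```
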
When we have data from an independent participant who was not used during this training phase, we talk about testing. During testing, we do not know which acquisitions correspond to images with a specific AU present and which do not. This is indicated by the blue colored stars. After projecting the blue stars from the subject space to the aligned representation, we see which ones fall on the “red” side (indicating that the AU is present) and which fall on the “yellow” side (indicating that the AU is not present).
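
Testing on a participant who was never seen during training can be sketched as leave-one-subject-out cross-validation. Again, the data below is synthetic and all names are illustrative; it only shows the structure of the procedure, not the paper’s exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(3)

n_subjects, n_acq, n_dims = 4, 100, 20

# Stack the aligned acquisitions of all participants (synthetic here).
X = rng.standard_normal((n_subjects * n_acq, n_dims))
y = rng.permutation(np.repeat([1, 0], n_subjects * n_acq // 2))  # one AU's labels
groups = np.repeat(np.arange(n_subjects), n_acq)                 # subject index per row

# Train on all-but-one participant, test on the held-out one, and repeat.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=groups, cv=LeaveOneGroupOut())

# Near chance on this random data; the point is the train/test structure.
print("Per-subject decoding accuracy:", scores)
```
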
Hence, we can now ask you to get into the MRI and, while we show you images, our computer can look at the neural activity in your brain and robustly, fully automatically, identify which action units you are looking at. In fact, the computer only needs to focus
on the red and yellow colored voxels shown in these images. These voxels are of course
in the pSTS and, thus, these results support our hypothesis of a neural mechanism for the
recognition of action units in this brain region. We also know that this region is less active
or even silent in Autism, suggesting a mechanism by which facial expression analysis is impaired
in this and other disorders. You can learn more about this and related projects on our website, where you will find a multitude of teaching and research resources.
