Frontiers in Bioscience. Experiment 1: two example videos for each of two personally familiar speakers, with either corresponding or non-corresponding auditory and visual speaker identities. Experiment 3: two example videos of visual speakers combined with corresponding or non-corresponding auditory voice identities.
Note: Videos were linearly deblurred during the first milliseconds of presentation. Furthermore, as has been well established in previous research, audiovisual integration is important in speech perception. Play each clip while looking at the face and listening to the voice; then try listening to the voice with your eyes closed to get an idea of the difference between what you hear auditorily and what you perceive audiovisually.
The McGurk effect works for full-sentence stimuli as well. The third and final part of the video allows you to perceive the result of integrating these two signals. Audiovisual Integration in Speech and Speaker Perception (Schweinberger lab, Jena, Germany). Stimulus examples (Windows Media Player required). Below are some examples of the types of stimuli we use to investigate the effects of audiovisual integration on speaker recognition in our ongoing research: Corresponding static: a familiar voice combined with the correct (corresponding) static face. Corresponding dynamic: a familiar voice combined with the correct (corresponding) dynamic face. Non-corresponding dynamic: a familiar voice combined with an incorrect (non-corresponding) dynamic face, edited so as to ensure precise temporal synchronisation.
Non-corresponding dynamic with delayed video clarity: a familiar voice combined with an incorrect (non-corresponding) dynamic face; the video was presented in black and white, and the blurred face becomes clearer over time.
Integrating Face and Voice in Person Perception
This was done to investigate whether the voice can affect face recognition. Audiovisual asynchrony: we investigated the effects that asynchronous audiovisual presentations had on voice recognition. Backwards video: Corresponding backwards: an example of a familiar voice combined with the correct corresponding backwards-animated face. Non-corresponding backwards: an example of a familiar voice combined with an incorrect non-corresponding backwards-animated face.
Manipulation of audiovisual synchrony: Corresponding, voice leading: an example of a familiar voice and face video in which the voice leads the facial motion (the voice begins some milliseconds before the facial motion).

However, despite early pioneering work, it is only recently that face and voice recognition have been closely compared.
Making such a comparison in the first paper of the present special issue, Catherine Barsics shows that the retrieval of both semantic and episodic information is usually easier from faces than from voices. Evelyne Moyse compares age estimation from faces and voices and shows that there are both similarities and differences between the two. An own-age bias may exist for age estimation from both faces and voices, but age is estimated less precisely from voices than from faces.
In addition to a comparison of the specific properties of face and voice processing, the necessity to understand how voice and face information are integrated during person recognition or emotional processing became progressively obvious, and drew the attention of cognitive psychologists, neuropsychologists and cognitive neuroscientists. Sarah Stevenage and Greg Neil examine how face processing and voice processing interact during person recognition, notably through the analysis of interference between the face and voice pathways.
Pierre Maurage and Salvatore Campanella analyse the alteration of face-voice integration during the processing of emotional stimuli in alcohol-dependent people. Finally, Guido Gainotti considers the patterns of disorders of familiar-people recognition in patients with anterior temporal lesions, including prosopagnosia, phonagnosia and multimodal recognition disorders. I will end this short introduction by acknowledging our sponsors.
I would also like to thank the co-organizers of the symposium, Salvatore Campanella and Gilles Pourtois.

Sex categorizations show a similar pattern of functionally biased perceptions. Because men overall tend to be physically larger and stronger than women, they pose a greater potential threat to perceivers. In any condition of uncertainty, therefore, a functional bias is likely to favor a male percept. Under conditions that may signal potential threat, this tendency appears to be exacerbated.
When categorizing the sex of bodies, for example, perceivers show a pronounced male categorization bias for every body shape that is, in reality, not exclusive to women (Johnson et al.). Similarly, point-light defined arm motions that depict a person throwing an object are overwhelmingly categorized as male when the person engages a threatening emotional state.
Moreover, the findings of Johnson et al. illustrate this point. Although perceivers are generally adept at achieving accurate social perception, accuracy goals may sometimes be overshadowed by other motivational concerns. In such circumstances, current motivations may outweigh accuracy objectives, leading social perceptions to be functionally biased in a directional fashion. In sum, both perceptual attunements and functional biases may emerge from the top-down modulation of social perception, through motivation, existing knowledge structures, or both.
Other factors that impinge on the combinatorial nature of social perception originate in the target of perception. This is because some perceptual attunements and biases are driven by the incoming sensory information itself. As such, cues to social identities may be confounded at the level of the stimulus. Such effects are now well documented for important intersections of social categories including sex and emotion, sex and race, and race and emotion.
Importantly, because these categories share cues, their perception becomes inextricably tethered, in turn producing attunements and biases that are moderated by the unique combination of cues and categories. One particularly intriguing juxtaposition of these dual routes of influence is in the perception of sex and emotion categories. This effect has received attention over the years, initially with respect to shared stereotypes between emotions and sex categories. For instance, researchers long found that facial expressions of emotion were perceived to vary between men and women (Grossman and Wood; Plant et al.).
Ambiguities in emotion expression tended to be resolved in a manner consistent with gender stereotypes (Hess et al.). Thus, the prevailing belief was that common associations between sex and emotion categories lead to biases in perceptual judgments. More recent research clarified that such results may also emerge via an alternate route. One argument, put forth by Marsh et al., is that emotional expressions physically overlap with cues to facial maturity. Likewise, gender appearance is associated with facial features that perceptually overlap with facial maturity (Zebrowitz). Like facial maturity and masculinized facial features, anger is characterized by a low, bulging brow and small eyes.
Conversely, like babyfacedness and feminized features, fear is distinguished by a raised, arched brow ridge and widened eyes. Perhaps not too surprisingly, then, several studies have hinted at a confounded relation between emotional expression and gender (Hess et al.). One more recent study examined the confound between gender and the emotional expressions of anger and happiness (Becker et al.).
An even more recent study (Hess et al.) points in the same direction. Such physical resemblance has been revealed still more compellingly through computer-based models that are trained with facial-metric data to detect appearance-based and expression cues in faces.
Critically, such studies avoid confounds with socially learned stereotypes. In one study, Zebrowitz et al. trained a connectionist model on facial-metric data. They found that the model detected babyfacedness in surprise expressions and maturity in anger expressions, owing to similarities in the height of the brow. Additionally, the authors found that objective babyfacedness, as determined by the connectionist model, mediated the impressions of surprise and anger in those faces reported by human judges.
In this way, they were able to provide direct evidence for babyfacedness overgeneralization effects on a wide array of perceived personality traits. Overlapping perceptual cues affect a number of other category dimensions as well. Some sex and race categories, for example, appear to share overlapping features.
In one study using a statistical face model (derived from laser scans of many faces), cues associated with the Black category and cues associated with the male category were found to share a degree of overlap. This, in turn, facilitated the sex categorization of Black men relative to White or Asian men (Johnson et al.). A similar overlap exists between eye gaze and emotional expressions.
Gaze has the interesting property of offering functional information to a perceiver that, when paired with certain expressions, can lead to interesting interactive effects. According to the shared signal hypothesis (Adams et al.), direct and averted eye gaze convey a heightened probability that a target will approach or avoid the perceiver, respectively (see Adams and Nelson for a review), and anger and fear share corresponding underlying behavioral intentions (see Harmon-Jones for a review). This hypothesis therefore suggests that processing should be facilitated when emotion and eye gaze are combined in a congruent manner (i.e., anger with direct gaze and fear with averted gaze).
In support of both functional affordances described above, Adams et al., using speeded reaction-time tasks and self-reported perception of emotional intensity, found facilitated processing for congruent emotion-gaze pairings. Similar effects were replicated by Sander et al. The converse effect holds as well: facial emotion influences how eye gaze is perceived. Direct eye gaze is recognized faster when paired with angry faces, and averted eye gaze is recognized faster when paired with fearful faces (Adams and Franklin). In addition, perceivers tend to judge eye gaze as directed at them more often on happy and angry faces than on neutral or fearful ones (Lobmaier et al.).
Further, Mathews et al. found that gaze cueing by fearful faces depended on observers' anxiety. When eye gaze was shifted dynamically after the emotion was presented, however, fearful faces induced higher levels of cueing than other emotions for all participants, regardless of anxiety level (Tipples; Putman et al.). More recently, Fox et al. reported comparable gaze-cueing effects; these effects were also moderated by trait anxiety.
On the neural level, gaze has been found to influence amygdala responses to threatening emotion expressions. In an initial study, Adams et al. found that amygdala responses to anger and fear faces varied with gaze direction. This study, however, was based on relatively sustained presentations of threat stimuli, whereas some more recent studies have found evidence for greater amygdala responses to congruent threat-gaze pairs (direct-gaze anger and averted-gaze fear) when employing more rapid presentations (Sato et al.).
Although these latter findings do corroborate the earlier results of Adams et al., presentation speed clearly matters. In subsequent work, Adams et al. examined amygdala responses to threat-gaze pairs across different presentation speeds. These differential responses support both an early process that detects threat and sets adaptive responding in motion, and a slightly slower process geared toward confirming and perpetuating a survival response, or disconfirming and inhibiting an inappropriate one. It is in this interplay of reflexive and reflective processes that threat perception can benefit from different attunements to a threatening stimulus, with different but complementary processing demands, to achieve the most timely and adaptive response to other people.
In short, many characteristics may interact in person perception because they are directly overlapping, often in functionally adaptive ways. The two routes by which social perceptions may be attuned or biased are now well documented, and such research provides an important foundation for understanding the basic mechanisms of social perception.
More interesting to our minds is their ability to help us understand how these dual routes work in concert to enable judgments of people who vary along multiple dimensions and across multiple sensory modalities. Recently, Freeman and Ambady a proposed a dynamic interactive framework to account for findings such as those reviewed above, and to map out how multiple category dimensions are perceived, and in many cases may interact, in a neurally plausible person perception system. In this system, multiple category dimensions (e.g., sex, race, and emotion) are processed in parallel. Importantly, as we will describe shortly, while the system attempts to stabilize onto particular perceptions over time, it will often throw different category dimensions into interaction with one another.
This may occur through either bottom-up or top-down forces, mapping onto the two routes described above. Before describing why and how these interactions would occur, we first outline the structure and function of the system. Freeman and Ambady a captured their theoretical system with a computational neural network model. The perceptual process that emerges in this system is a highly integrative one.
It incorporates whatever bottom-up evidence is available (from others' facial, vocal, or bodily cues), while also taking into account any relevant top-down sources that could be brought to bear on perception. Thus, the system arrives at stable person construals not only by integrating bottom-up facial, vocal, and bodily cues, but also by coordinating with, and being constrained by, higher-order social cognition (e.g., stereotypes and motivations). As such, this system permits social top-down factors to fluidly interact with bottom-up sensory information to shape how we see and hear other people.
Although it was long assumed that perception is solely bottom-up and insulated from any top-down influence of higher-order processes, considerable evidence now indicates otherwise. Thus, we should expect top-down factors to be able to flexibly weigh in on the basic perceptual processing of other people. In this framework, person perception is treated as an ongoing, dynamic process in which bottom-up cues and top-down factors interact over time to stabilize onto particular perceptions (e.g., a categorization of another person as male).
This is because person perception, as implemented in a human brain, involves continuous changes in a pattern of neuronal activity (Usher and McClelland; Smith and Ratcliff; Spivey and Dale). Consider, for example, the perception of another's face. Early in processing, representations of the face would tend to be partially consistent with multiple categories (e.g., both male and female).
As more information accumulates, the pattern of neuronal activity would gradually sharpen into an increasingly confident representation (e.g., the category male). Thus, this approach proposes that person perception involves ongoing competition between partially-active categories (e.g., male vs. female). Further, the competition is gradually weighed in on by both bottom-up sensory cues and top-down social factors, until a stable categorization is achieved. Accordingly, bottom-up cues and top-down factors mutually constrain one another to shape person perception. How might this dynamic social-sensory interface be instantiated at the neural level?
Let us consider sex categorization. One possibility is that visual processing of another's face and body in the occipitotemporal cortex converges on multimodal regions such as the superior temporal sulcus (STS). There, ongoing visual-processing results for the face begin integrating with ongoing auditory-processing results for the voice, which emanate from the temporal voice area (Lattner et al.). While the available bottom-up information (facial, vocal, and bodily cues) is being integrated in multimodal regions such as the STS, the intermediary results of this integration are sent on to higher-order regions, such as the prefrontal cortex (Kim and Shadlen), in addition to regions involved in decision-making and response selection, such as the basal ganglia (Bogacz and Gurney). In doing so, bottom-up processing provides tentative support for perceptual alternatives (e.g., male vs. female).
The basal ganglia and higher-order regions such as the prefrontal cortex force these partially-active representations (e.g., male and female) into competition. Before these processing results are fed back, however, they may be slightly adjusted by higher-order regions' top-down biases (e.g., current expectations or motivations). Lower-level regions then update higher-order regions by sending back revised information.
Across cycles of this ongoing interaction between the processing of bottom-up sensory cues (instantiated in lower-level regions) and top-down social factors (instantiated in higher-order regions), the entire system comes to settle into a steady state (e.g., a stable categorization of the face as male). This general kind of processing has been captured in a computational model, described below. A general diagram of the dynamic interactive model appears in Figure 1. It is a recurrent connectionist network with stochastic interactive activation (McClelland). The figure depicts a number of pools; in specific instantiations of the model, each pool contains a variety of nodes (e.g., cue or category nodes).
Specific details on the model's structure may be found in Freeman and Ambady a. The model provides an approximation of the kind of processing that might take place in a human brain (Rumelhart et al.). Figure 1. A general diagram of the dynamic interactive model. Adapted from Freeman and Ambady a. Initially, the network is stimulated simultaneously by both bottom-up and top-down inputs (see Figure 1). These may include visual input of another's face, auditory input of another's voice, or higher-level input from systems responsible for top-down attention, motivations, or prejudice, for example.
Each model instantiation contains a variety of nodes that are organized into, at most, four interactive levels of processing (one level representing each of the following: cues, categories, stereotypes, and high-level cognitive states). Every node has a transient level of activation at each moment in time. This activation corresponds to the strength of a tentative hypothesis that what the node represents is present in the input. Once the network is initially stimulated, activation flows among all nodes simultaneously as a function of their connection weights.
Activation is also altered by a small amount of random noise, making the system's states inherently probabilistic. Because many connections between nodes are bi-directional, this flow results in a continual back-and-forth of activation between many nodes in the system. As such, nodes in the system continually re-adjust each other's activation and mutually constrain one another to find an overall pattern of activation that best fits the inputs. Gradually, the flows of activation lead the network to converge on a stable, steady state, where the activation of each node reaches an asymptote.
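The settling dynamics just described can be sketched computationally. The following is a minimal illustration of stochastic interactive activation, not the published Freeman and Ambady implementation; the update rule, parameters, and the two-node example are all assumptions chosen for clarity.

```python
import numpy as np

def settle(W, external_input, n_steps=300, rate=0.1, noise_sd=0.01, seed=0):
    """Let a recurrent network settle toward a steady state.

    W              : symmetric weight matrix (positive entries are excitatory
                     connections, negative entries inhibitory).
    external_input : constant bottom-up/top-down input to each node.
    Each step, every node drifts toward a squashed function of the summed
    signal it receives, perturbed by a little random noise, so the system's
    trajectory is inherently probabilistic.
    """
    rng = np.random.default_rng(seed)
    a = np.zeros(len(external_input))                   # activations start at rest
    for _ in range(n_steps):
        net = W @ a + external_input                    # signal from connected nodes + input
        net += rng.normal(0.0, noise_sd, size=a.shape)  # stochastic component
        a += rate * (np.tanh(net) - a)                  # gradual mutual re-adjustment
    return a

# Two mutually inhibitory category nodes competing over the same face,
# e.g. MALE vs. FEMALE, with slightly more sensory evidence for the first.
W = np.array([[0.0, -0.8],
              [-0.8, 0.0]])
activations = settle(W, np.array([0.6, 0.4]))
```

Because the two nodes inhibit one another, the small initial advantage is amplified over cycles until the favored node stabilizes at a clearly higher activation, which is the constraint-satisfaction behavior described above.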
This final steady state, it is argued, corresponds to an ultimate perception of another person. Through this ongoing mutual constraint-satisfaction process, multiple sources of information, both bottom-up cues and top-down factors, interact over time and integrate into a stable perception. As such, this model captures the intimate interaction between bottom-up and top-down processing theorized here. Thus, together, the approach and model treat perceptions of other people as continuously evolving over fractions of a second and emerging from the interaction between multiple bottom-up sensory cues and top-down social factors.
Accordingly, person perception readily makes compromises between the variety of sensory cues inherent to another person and the baggage an individual perceiver brings to the perceptual process. Now, let us consider how this system naturally brings about category interactions such as those described earlier, either through top-down perceiver impacts or bottom-up target impacts. A specific instantiation of the general model appears in Figure 2. Solid-line connections are excitatory positive weight and dashed-line connections are inhibitory negative weight. Further details and particular connection weights are provided in Freeman and Ambady a.
This instantiation of the model is intended to capture the experience of how a perceiver would go about categorizing either sex or race for a particular task context. When the network is presented with a face, its visual input stimulates nodes in the cue level. Cue nodes excite category nodes consistent with them and inhibit category nodes inconsistent with them. They also receive feedback from category nodes. At the same time that cue nodes receive input from visual processing, higher-level input stimulates higher-order nodes, in this case representing task demands.
This higher-level input would originate from a top-down attentional system driven by memory of the task instructions. These higher-order nodes excite category nodes consistent with them, inhibit category nodes inconsistent with them, and are themselves activated by category nodes in return. Figure 2. An instantiation of the dynamic interactive model that gives rise to category interactions driven by top-down stereotypes. One manner by which many categories may interact is through overlapping stereotype contents.
For instance, particular social categories in one dimension (e.g., race) may share stereotype contents with categories in another dimension (e.g., sex). Stereotypes associated with the sex category male include aggressive, dominant, athletic, and competitive, and these are also associated with the race category Black. Similarly, stereotypes of shy, family-oriented, and soft-spoken apply not only to the sex category female, but also to the race category Asian (Bem; Devine and Elliot; Ho and Jackson). Thus, there is some overlap in the stereotypes belonging to the Black and male categories and in the stereotypes belonging to the Asian and female categories.
Johnson et al. demonstrated such effects empirically: for a male face, sex categorization was quickened when the face was made to be Black, relative to White or Asian. Conversely, for a female face, sex categorization was quickened when the face was made to be Asian, relative to White or Black. Moreover, when faces were sex-ambiguous, they were overwhelmingly categorized as male when Black, but overwhelmingly categorized as female when Asian. Later work found that such influences have downstream implications for interpreting ambiguous identities. How could a dynamic interactive model account for such interactions between sex and race, presumably driven by top-down stereotypes?
In the model, category activation along one dimension (e.g., race) constrains category activation along another (e.g., sex) via shared stereotype nodes. Sex categorization, for example, is constrained by race-triggered stereotype activations. Because the stereotypes of the Black and male categories happen to partially overlap, Black men would be categorized more efficiently than White and Asian men. This would facilitate a male categorization or, for sex-ambiguous targets, bias categorizations toward male rather than female.
Thus, a dynamic interactive model predicts that incidental overlap in stereotype contents could powerfully shape the perception of another category dimension. When actual simulations were run with the network appearing in Figure 2, it was found that race category was readily used to disambiguate sex categorization. That is, the model predicts that perceivers would be biased to perceive sex-ambiguous Black faces as men and, conversely, to perceive sex-ambiguous Asian faces as women. This is because the presumably task-irrelevant race category placed excitatory and inhibitory pressures on stereotype nodes that were incidentally shared with sex categories.
Indeed, Johnson et al. observed precisely this pattern empirically.
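One way to see how shared stereotype content can bias sex categorization is to simulate a toy version of the kind of network shown in Figure 2. Everything below (the node set, the connection weights, the inputs) is an illustrative assumption rather than the published model: two sex categories, two race categories, and two stereotype nodes that are each shared between a sex and a race category.

```python
import numpy as np

# Hypothetical node indices, for illustration only
MALE, FEMALE, BLACK, ASIAN, AGGRESSIVE, SOFT_SPOKEN = range(6)

W = np.zeros((6, 6))
def link(i, j, w):                  # symmetric, bi-directional connection
    W[i, j] = W[j, i] = w

link(MALE, FEMALE, -0.8)            # categories within a dimension compete
link(BLACK, ASIAN, -0.8)
link(MALE, AGGRESSIVE, 0.5)         # stereotype shared by MALE and BLACK
link(BLACK, AGGRESSIVE, 0.5)
link(FEMALE, SOFT_SPOKEN, 0.5)      # stereotype shared by FEMALE and ASIAN
link(ASIAN, SOFT_SPOKEN, 0.5)
link(AGGRESSIVE, SOFT_SPOKEN, -0.5)

def settle(W, inp, n_steps=300, rate=0.1, noise_sd=0.01, seed=0):
    """Stochastic interactive activation until the network settles."""
    rng = np.random.default_rng(seed)
    a = np.zeros(len(inp))
    for _ in range(n_steps):
        net = W @ a + inp + rng.normal(0.0, noise_sd, size=a.shape)
        a += rate * (np.tanh(net) - a)
    return a

def sex_ambiguous_face(race_idx):
    """Equal bottom-up evidence for MALE and FEMALE; clear race evidence."""
    inp = np.zeros(6)
    inp[MALE] = inp[FEMALE] = 0.5
    inp[race_idx] = 0.8
    return settle(W, inp)

a_black = sex_ambiguous_face(BLACK)
a_asian = sex_ambiguous_face(ASIAN)
```

In this toy run, the race input excites a stereotype node that is incidentally shared with a sex category, so a sex-ambiguous Black face settles toward MALE and a sex-ambiguous Asian face toward FEMALE, mirroring the categorization bias described in the text.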
As discussed earlier, different categories may be thrown into interaction because the perceptual cues supporting those categories partly overlap and are therefore directly confounded. For instance, sex categorization is facilitated for faces of happy women and angry men, relative to happy men and angry women. Further studies solidified the evidence that this interaction between sex and emotion is due to direct, physical overlap in cues rather than merely top-down stereotypes (see also Becker et al.).
Thus, these studies suggest that the features that make a face angrier are also partly those that make a face more masculine. Similarly, the features that make a face happier are also partly those that make a face more feminine. For instance, anger displays involve the center of the brow drawn downward, a compression of the mouth, and flared nostrils.
However, men also have larger brows, which may appear drawn downward. They also have a more defined jaw and thinner lips, which may make the mouth appear more compressed, and larger noses, which may lend the appearance of flared nostrils. A similar overlap exists between happy displays and the female face (Becker et al.). For instance, women have rounder faces than men, and the appearance of roundness increases when displaying happiness.
Previous studies suggest that it is this direct, physical overlap in the cues signaling maleness and anger and in the cues signaling femaleness and happiness that leads to more efficient perceptions of angry men and happy women relative to happy men and angry women. A second instantiation of the general model appears in Figure 3 (particular connection weights may be found in Freeman and Ambady, a). Differing from the previous instantiation, here each node in the cue level represents a single perceptual cue. Four cue nodes represent perceptual cues that relate only to sex categories or only to emotion categories, while two additional cue nodes represent cues that relate to both; these two cue nodes capture the bottom-up overlap in the perceptual cues conveying sex and emotion. Specific cues used in this simulation were chosen arbitrarily; they are merely intended to simulate the set of non-overlapping and overlapping perceptual cues that convey sex and emotion.
Figure 3. An instantiation of the dynamic interactive model that gives rise to category interactions driven by bottom-up perceptual cues. When actual simulations were run with the network, the overlapping perceptual cues created bottom-up pressure that gave rise to interactions between sex and emotion.
When a male face was angry, the MALE category's activation grew more quickly and stabilized at a stronger state than when a male face was happy. Conversely, when a female face was angry, the FEMALE category's activation grew more slowly and stabilized at a weaker state than when a female face was happy. This led sex categorization of angry men and happy women to be completed more quickly (Freeman and Ambady, a).
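The cue-overlap mechanism can be sketched with another toy simulation. Again, the node set, weights, and inputs below are illustrative assumptions, not the published model: one cue node shared by the MALE and ANGRY categories (a heavy, lowered brow) and one shared by FEMALE and HAPPY (facial roundness).

```python
import numpy as np

# Hypothetical nodes: two overlapping cues and four categories.
BROW, ROUND, MALE, FEMALE, ANGRY, HAPPY = range(6)

W = np.zeros((6, 6))
def link(i, j, w):              # symmetric, bi-directional connection
    W[i, j] = W[j, i] = w

link(BROW, MALE, 0.5)           # a heavy, lowered brow signals both maleness...
link(BROW, ANGRY, 0.5)          # ...and anger
link(ROUND, FEMALE, 0.5)        # facial roundness signals both femaleness...
link(ROUND, HAPPY, 0.5)         # ...and happiness
link(MALE, FEMALE, -0.8)        # within-dimension competition
link(ANGRY, HAPPY, -0.8)

def settle(W, inp, n_steps=300, rate=0.1, noise_sd=0.01, seed=0):
    """Stochastic interactive activation until the network settles."""
    rng = np.random.default_rng(seed)
    a = np.zeros(len(inp))
    for _ in range(n_steps):
        net = W @ a + inp + rng.normal(0.0, noise_sd, size=a.shape)
        a += rate * (np.tanh(net) - a)
    return a

def face(brow_evidence, round_evidence):
    inp = np.zeros(6)
    inp[BROW], inp[ROUND] = brow_evidence, round_evidence
    return settle(W, inp)

a_angry_man = face(brow_evidence=0.9, round_evidence=0.1)  # morphology and expression agree
a_happy_man = face(brow_evidence=0.5, round_evidence=0.5)  # morphology and expression conflict
```

Because the angry man's expression reinforces the same cue that signals his sex, the MALE node stabilizes at a markedly stronger activation than for the happy man, whose smile feeds the competing FEMALE node; this is the congruency advantage the simulations and behavioral data describe.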
This pattern of results converges with the experimental data of previous studies (Becker et al.). Thus, the categorization of one dimension (e.g., sex) is shaped by an incidentally related dimension (e.g., emotion). This highlights how the model naturally accounts for such category interactions driven by bottom-up perceptual overlaps. One of the most fascinating aspects of person perception, which distinguishes it from most kinds of object perception, is that a single social percept can simultaneously convey an enormous amount of information.
From another's face, multiple possible construals are available in parallel, including sex, race, age, emotion, sexual orientation, social status, intentions, and personality characteristics, among others. Here we have reviewed two manners by which many of these construals may interact with one another. One manner is through top-down perceiver impacts, where existing knowledge structures, the stereotypes a perceiver brings to the table, motivations, and other social factors throw different dimensions into interaction.
Another manner is through bottom-up target impacts, where the perceptual cues supporting different dimensions are inextricably linked, leading those dimensions to interact. Further, these interactions in person perception may often occur in functionally adaptive ways. We then discussed a recent computational model of person perception that we argued is able to account for many of these sorts of interactions, both those driven by top-down and bottom-up forces. In short, person perception is combinatorial, and treating our targets of perception as having multiple intersecting identities is critical for an accurate understanding of how we perceive other people.
Research investigating the underlying mechanisms of person perception is growing rapidly. To take up this new level of analysis successfully, collaboration is needed between scientists in traditionally divided domains, such as at the social-visual interface (Adams et al.). Here, we have argued that there is a coextension among sensory and social processes typically investigated independently. To map out how low-level visual information (traditionally home to the vision sciences) may meaningfully interact with and be shaped by high-level social factors (traditionally home to social psychology), and how this is instantiated through all the cognitive and neural processing lying in between, interdisciplinary collaboration will be important.
The emerging study of social vision offers an exciting and multilevel approach that may help bring about a more unified understanding of person perception. At the same time, it provides a unique bridge between far-reaching areas of the field, from researchers in social psychology to the cognitive, neural, and vision sciences. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Adams, R., Jr. The Science of Social Vision.
Influence of emotional expression on the processing of gaze direction. Emotion 33.
Amygdala responses to averted vs direct gaze fear vary as a function of presentation speed.
Effects of gaze on amygdala sensitivity to anger and fear faces. Science.
Effects of direct and averted gaze on the perception of facially communicated emotion. Emotion 5, 3–