Paradoxes of Engagement: Less Video, More Understanding

Does widespread video conferencing lead to less trust and lower performance?

The conventional wisdom that video conferencing is a better way to communicate than audio-only approaches (such as teleconferencing) was well established even before the pandemic made Zoom and other video-centric tools central to the conduct of business, school, and everyday life. Consider this claim from Forbes in 2017:

If organizations want to truly improve their ability to collaborate—if they want to supercharge team performance and gain closer ties to customers, partners, suppliers and similarly critical relationships—they must embrace video conferencing.

Or this entirely unsupported claim from Hamid Hashemi, chief product and experience officer at the co-working company WeWork:

Zoom-friendly conference rooms will make it easier, for example, for salespeople to read attendees' body language during a presentation, he says.

Similar claims appear in management journals, and video conferencing has been widely adopted in part because of this almost unthinking boosterism.


What if it’s wrong? What if the video side of video conferencing lowered the performance of communicating groups?

The case for video conferencing is simple: seeing group members' body language, gestures, and facial expressions gives everyone a better understanding of what the others are thinking. After all, these cues help in face-to-face communication, don't they?

Anita Williams Woolley and colleagues at Carnegie Mellon have studied the question and found that video may be distracting and can impede the synchronization of communication that builds social cohesion in groups. As the researchers put it:

Some empirical research suggests that visual cue availability may not always be superior to audio cues alone. In the absence of visual cues, communicators can effectively compensate, seek social information, and develop relationships in technology-mediated environments. Indeed, in some cases, task-performing groups find their partners more satisfactory and trustworthy in audio-only settings than in audiovisual settings suggesting that visual cues may serve as distractors in some conditions.

Paradoxically, being able to see others on a video call can lead to unequal speaking time. Equal turn-taking, along with prosodic (or vocal) synchrony, in which speakers' vocal cues such as pitch and loudness come to track one another, helps build group cohesion:

Our findings suggest that visual nonverbal cues may also enable some interacting partners to dominate the conversation. By contrast, we show that when interacting partners have audio cues only, the lack of video does not hinder them from communicating these rules but instead helps them to regulate their conversation more smoothly by engaging in more equal exchange of turns and by establishing improved prosodic synchrony. Previous research has focused largely on synchrony regulated by visual cues, such as studies showing that synchrony in facial expressions improves cohesion in collocated teams. Our study underscores the importance of audio cues, which appear to be compromised by video access.

The bottom line: if we really want to improve group performance by raising collective intelligence, we might need to go old school. Maybe we should turn off the video, listen more closely, take turns speaking, and eliminate distractions.

There is a great deal of anecdotal support for these research findings, too:

The 4,000 employees of the education publishing giant McGraw Hill held more than 14,000 Zoom and WebEx calls during a single week early in quarantine, many of them frustrating as people struggled to understand when it was their turn to speak.

There may also be ways to improve the conferencing software itself. Woolley and colleagues offer some hope:

We may achieve greater problem solving if new technologies offer fewer distractions and less visual stimuli.