Notes: The Human Voice and Music.

When creating a piece of music, it is worth considering using the human voice even when the composition has no lyrical content.

Expression aside, most music is made with an audience in mind. Even if a composition is only meant to be listened to passively, making something both subtle and engaging is a task in itself. Technical errors in mixing aside, if you are ever struggling with the reaction to a composition, it might be worth adding a human voice, even if that ‘human voice’ is completely inaudible as such. This text is an attempt to explain, from a psychological reading, why and how that is possible.


As an overview, research has suggested that the following areas of the brain may be primarily activated when listening to music:

•           Rhythm: The basal ganglia and cerebellum, which are involved in motor control and coordination, are activated when processing rhythm.

•           Melody: The auditory cortex, which is responsible for processing sounds, is activated when processing melody. The prefrontal cortex, which is involved in attention and working memory, is also activated when processing melody. 

Additionally, the superior temporal gyrus (STG) is known to be activated during musical listening. This part of the brain is involved in processing complex auditory sounds, melody and harmony, as well as linguistic syntax.


Music creates an incredible amount of activation. The areas of the brain stated above are disparate; they have completely different functions. The motor cortex, for example, is associated with body movement, while the prefrontal cortex (PFC) is associated with intellectual reasoning. Despite this disparity in function, all of these areas are activated when listening to music.

The argument I will be making going forward is this: if a composer were to excite more neural activity (engaging different parts of the brain cohesively) when creating a piece of music, we could presume the track will sound ‘better’, or at least more engaging and stimulating.

The suggestion is: the more we are able to engage each of the disparate brain areas stated above, the more of an impact the music will have on an individual. Intuitively, this makes sense. You wouldn’t create a piece of music without rhythm, for example; a composition that dismisses rhythm (and so fails to affect the motor cortex) is missing an important part of its impact. (I am excluding musique concrète and abstract sound-design projects, which actively dismiss rhythm or melody, from this argument.)

Voices & The Brain

This section attempts to give evidence as to why voices/vocals are special when it comes to how we perceive sound and how it’s related to music.

As mentioned, the STG has a role in complex auditory processing. Several studies using different methods such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) have shown that the superior temporal gyrus (STG) is activated when processing human voices and speech sounds. For example, an fMRI study by Scott et al. (2000) found that the STG was activated when participants listened to speech sounds and that the activation was stronger when the sounds were speech sounds of a known language, compared to when the sounds were speech sounds of an unknown language or non-speech sounds. 

Voice processing is unique within sound processing more broadly. A subregion of the STG, the superior temporal sulcus (STS), is specifically activated when processing human voices and speech sounds. Additionally, other areas of the brain, such as Broca’s and Wernicke’s areas, are involved in the processing of language; these regions are not activated by non-language sounds alone.

A study by Scott et al. (2006) found that the superior temporal sulcus (STS) is specifically activated when processing human voices and speech sounds, and that this region is more active when processing more complex speech sounds, such as those found in conversation.

Studies have also shown that Broca’s area and Wernicke’s area are activated when processing language. For example, a study by Indefrey and Levelt (2004) used PET to show that Broca’s area is activated when participants generate speech, and that Wernicke’s area is activated when participants listen to speech, but not to other types of sound.

These studies, among many others, provide evidence that specific areas of the brain are activated when processing human voices and language, distinct from the areas activated when processing non-language sounds. Voice processing has a special residence in the brain, and the areas activated by instrumental music differ from those dedicated to processing the human voice.

The Impact of the Human Voice.

We have established that the human voice produces a novel pattern of activation in cognition compared to all other sounds. The next thing to note is that this activation is very sensitive. This may help explain why some instruments seem to live on through cultural transformation, such as wind instruments, which use the same principle as the voice: vibrating air into a pitch. The jazz saxophonist Charlie Parker is quoted as saying “the saxophone is the closest instrument to the human voice”. Niccolò Paganini, on the other hand, is famously quoted as saying “The violin is the most perfect of all instruments, it is the most expressive. It is the one that most resembles the human voice.” Arguably, these quotes express an ideal: that the human voice is the highest tier of musical expression. From a social standpoint (language, communication and so on), it is self-explanatory why the voice is so important to expression.

Bias towards the Voice.

As mentioned, we are psychologically extremely sensitive to the human voice, with specialised brain areas dedicated to its processing. In this section we will attempt to show that the effect of the human voice is so psychologically salient that it can be triggered even unconsciously.

Human perception has evolved to be highly attuned to stimuli that are essential for social and survival-based behaviour. Communication between members of our species is essential, and just as we are attuned to rapidly identifying faces, we are attuned to acutely perceiving voices. This sensitivity to stimuli is seen most notably in pareidolia: the almost instinctive misattribution of human faces to ‘noise’, which by wide consensus has an evolutionary basis.

Similar to the identification of faces, there have been several studies that have investigated the phenomenon of voice pareidolia.

One study by Alvarado, et al. (2008) investigated the phenomenon in people listening to white noise. It found that people were more likely to perceive voices in the noise when they were in a state of heightened awareness and when they were expecting to hear a voice.

Another study by Leudar, et al. (2010) used a similar approach and found that people are more likely to perceive voices in noise when they are primed to expect them, and that this effect is stronger in people who score high on measures of "schizotypy," a personality trait associated with a heightened risk of developing psychosis.

Additionally, a study by Da Silva, et al. (2019) investigated voice pareidolia in patients with schizophrenia and healthy controls, and found that the patients with schizophrenia had a higher rate of voice pareidolia. As an aside, voice pareidolia is not in itself a significant enough symptom to assume a diagnosis of psychosis; rather, these studies illustrate that the human brain is so attuned and sensitive to voices that a disorder can easily create such an illusion. Voice pareidolia likely shares the same evolutionary basis as the more commonly researched “face pareidolia”.

Takeaways for music production.

When it comes to music production, then, this evidence suggests that even the slightest hint of something ‘voice-like’ is enough to engage new parts of cognition in reaction to music. Specific areas of the brain light up to something ‘voice-like’, which may relate to our love of certain instruments. If we accept the original argument that more neural activation creates ‘better’, or at least more engaging and stimulating, music, then ‘the voice’, even in its most abstract form, is a candidate for achieving this.

Our sensitivity is so acute that a composer could create a synth from a vocal sample in which the human quality is barely audible, and our attuned cognitive processing would still produce a more active network as a result. Arguably, even a degraded, processed and warped vocal sample would be ambiguous enough that areas like the STG/STS would come into play. The orchestra of brain activation is expanded when listening to a piece of music with some sort of vocal element.
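To make the idea concrete, here is a minimal sketch in Python with NumPy of what ‘degrading a voice into an ambiguous texture’ might look like. It is purely illustrative: the function names, formant frequencies and processing parameters are my own assumptions, not a reference to any real production tool, and a real workflow would start from a recorded vocal rather than a synthesised vowel.

```python
import numpy as np

SR = 44100  # sample rate in Hz (assumed)

def vowel_tone(f0=110.0, dur=1.0, formants=((800, 80), (1150, 90), (2900, 120))):
    """Crude vowel-like tone: harmonics of f0 weighted by Gaussian formant peaks.
    The (centre frequency, bandwidth) pairs loosely imitate an 'ah' vowel --
    illustrative values only."""
    t = np.arange(int(SR * dur)) / SR
    out = np.zeros_like(t)
    n = 1
    while n * f0 < SR / 2:  # sum harmonics up to the Nyquist frequency
        freq = n * f0
        # weight each harmonic by its proximity to the formant peaks
        amp = sum(np.exp(-((freq - fc) ** 2) / (2 * bw ** 2)) for fc, bw in formants)
        out += amp * np.sin(2 * np.pi * freq * t)
        n += 1
    return out / np.max(np.abs(out))  # normalise to [-1, 1]

def degrade(x, crush_bits=6, drop_every=3):
    """Obscure the source: bit-crush the signal and zero out every Nth sample,
    so the 'voice' is only hinted at rather than clearly audible."""
    q = 2 ** (crush_bits - 1)
    x = np.round(x * q) / q   # bit-crush: quantise to coarse amplitude steps
    x[::drop_every] = 0.0     # crude decimation artifacts
    return x

pad = degrade(vowel_tone())   # a vaguely voice-like texture, one second long
```

The claim in the text is only that the formant structure (the voice-like part) can survive this kind of heavy degradation, not that this particular chain sounds good; in practice you would tune the processing by ear.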

In summary, using any type of vocal, whether a long lyrical performance, a short phrase or vocal chop in an electronic track, or a human voice in its most ambiguous, processed form, can have far more impact on an instrumental piece than you may have first considered.
