Cerebrum Article

In Search of the Musical Mind

In all cultures, throughout history, music has accompanied human life. But what is it we hear, sometimes with great appreciation and skill, sometimes not? Is it only training that separates the opera fan from the “tin ear”? What about that mysterious ability called “absolute pitch”?

McGill University psychologist and former professional musician Daniel J. Levitin explores the brain’s complex systems for processing music and asks whether music may be a unique human evolutionary adaptation that serves some basic and ancient functions. With a nod to Herman Melville, join him as he sets out for “The Great White Hall.”

Published: October 1, 2000

As I strolled down a nondescript back street not far from Carnegie Hall, I heard a taxi horn blare. Not an uncommon sound in Manhattan, mind you, but the accompanying response caught me by surprise. Seemingly before the horn blare even ended, I heard an authoritative voice call out “E-flat.” I turned to see a black-tied and -tailed young man, carrying under his arm what appeared to be a violin case. His companion, a woman with a case about the size of a flute, added, “But ten cents sharp!” I saw them enter the back door of a large building, not realizing it was The Great White Hall itself.

Hurrying around to the front of the hall, I lucked on a single ticket for the evening’s performance of The Rite of Spring. I had played this piece as the bass clarinet player in our high school orchestra, but I had never heard it from a vantage point that would afford me the opportunity to distinguish something other than my own part mingled in with the staccato sounds of the trombones, whose bells were always directed, menacingly, at the back of my skull.

Once inside the hall, I listened carefully to the conversations of the audience around me. “I hope they don’t take it too fast—I saw Cincinnati do it last season, and they had way too much tempo!” “Do you think there’ll be an intermission? I can’t see the program—they have to have an intermission, don’t they?” “I love the opening movement. It really makes me think of springtime when I was a little girl, lying in the wildflower meadows.” The man next to me said to his wife, “This isn’t one of those atonal pieces, is it?” A pair of teenagers, covered with body piercings and henna tattoos, compared Stravinsky to their favorite metal band, Metallica: “Listen to his use of modes—it’s very Goth!” A young woman offered enthusiastically, “I love the part where the timpanis play,” and her companion answered “Which ones are the timpanis again?” Then he whispered to her, thinking no one else could hear, “How can you hear one instrument when so many are playing?”

Seven audience members, and each heard something very different in the same complex piece of music: not surprising. But even listening to a few simple, isolated tones, few of us could replicate the precision with which the two musicians named the sound of the taxi horn.

Years ago, my undergraduate adviser, Roger Shepard, conducted a study in auditory perception at the Bell Telephone Laboratories in Murray Hill, New Jersey. When Roger played his co-workers a semitone—the smallest interval in Western music, equal to the distance between two adjacent keys on a piano—he found that half the people could not tell him whether he had played the same note twice in a row or two different notes. A semitone opens the well-known piano piece by Beethoven “Für Elise,” which is heard throughout the Western world with great frequency at children’s piano recitals. Its first five notes form a repeating pattern of a note followed by a (semitone) step down, then back to the first note. If half the people listening to (and presumably enjoying) “Für Elise” cannot tell that the five opening notes are not the same, what are they hearing?
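For readers who like to see the arithmetic, the semitone of equal temperament can be stated precisely: each semitone multiplies a tone's frequency by the twelfth root of two, so that twelve semitones span exactly one octave. The following sketch is purely illustrative (the frequencies are standard concert-pitch values, not anything from Shepard's study):

```python
# Illustrative sketch: the equal-tempered semitone as a frequency ratio.
# Each semitone multiplies frequency by 2 ** (1/12), so twelve semitones
# double the frequency -- one octave.

SEMITONE_RATIO = 2 ** (1 / 12)  # ~1.0595

def step(freq_hz: float, semitones: int = 1) -> float:
    """Return the frequency `semitones` steps above (or below) freq_hz."""
    return freq_hz * SEMITONE_RATIO ** semitones

a4 = 440.0                      # concert A
print(round(step(a4, 1), 1))    # one semitone up: ~466.2 Hz (B-flat)
print(round(step(a4, 12), 1))   # twelve semitones up: 880.0 Hz, an octave
```

This ratio is also why the flutist outside Carnegie Hall could speak of "ten cents sharp": a cent is one-hundredth of a semitone, a deviation far smaller than the semitone itself.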

This is one of many questions now being addressed scientifically in the emerging field of music cognition and perception. By nature, this field is interdisciplinary, with research scientists joining the endeavor from cognitive psychology, neuroscience, computer science, musicology, and education. They are asking, for example, if we can ever know whether what I hear in music is the same as what you hear. If we all hear things the same, it is difficult to account for why one man’s Madonna is another man’s Mozart. Yet if we all hear things differently, how to explain that certain pieces are popular with almost everyone?

Why is it that some people in our culture can be moved by music and others cannot? For some, a day without music is unthinkable; music accompanies them as they wake, while they shower and eat, in their cars on the way to work, and in the background as they work. It may set the mood for romantic encounters, or energize athletic workouts. Music is used during times of war for patriotic solidarity or for synchronization of infantry, during times of sadness for comfort, during religious ceremonies from the solemn to the joyful and everything in between. Music’s unique status in the lives of humans is marked, as David Huron of Ohio State University says, by its ubiquity and its antiquity. There is no known culture in the world that lacks music, and some of the oldest human-made artifacts ever found are musical instruments (for example, bone flutes and drums). Indeed, music making predates agriculture in the archeological record.

What is the evolutionary basis of music, and what has science learned about music and the brain?

What We Hear in Music

My own work is directed at answering the fundamental question that I first raised: What is it that different people hear when they hear the same piece of music? The theoretical issues that motivate this research are my interests in the fidelity of sensory memory and the relations among how we perceive, classify, and remember what we hear. A century ago, the Gestalt psychologists believed that sensory experiences leave a “residue” in the brain’s memory system that contains information about the original stimulus, even after that stimulus is gone. Even if this is not literally true, specific features of sensory events might well be recorded in long-term memory. One way I have tested this idea has been to probe what people remember about the music they have heard and, specifically, what they are able to recall or produce of music they know well and like.

When we hear music, we are actually perceiving seven different attributes, or “dimensions”:

  1. Pitch is a purely psychological construct, related both to the actual physical frequency of a particular tone and to its relative position in a musical scale. It provides the answer to the question, “What note is that?” (“It’s C-sharp.”)
  2. Rhythm refers to the durations of a series of notes. For example, in the ditty known in America as “shave-and-a-haircut, two bits,” the rhythm is long-short short-long-long-(pause)-long-long.
  3. Tempo refers to the overall speed or pace of the piece.
  4. Contour describes the overall shape of a melody, taking into account only the pattern of “up” and “down.”
  5. Timbre is that which distinguishes one instrument from another—say, a trumpet from a piano—when both are playing the same written note. It is a kind of tonal color that is produced by overtones from the instrument’s vibrations.
  6. Loudness is a purely psychological construct that relates (nonlinearly and in poorly understood ways) to the physical amplitude of a tone.
  7. Spatial location is a cue we interpret based primarily on time and spectral differences in what we hear.

These attributes are separable. Each can be varied without altering the others, allowing the scientific study of one at a time, which is why we can think of them as dimensions.

Melodies are defined by the pattern or relation of successive pitches across time; most people have little trouble recognizing a melody that is played in a higher or lower key than expected. In fact, many melodies do not have a “correct” pitch; they just float freely in space, starting anywhere. “Happy Birthday” is an example of this. One way to think about a melody, then, is as an abstract prototype, which is derived from specific combinations of key, tempo, instrumentation, and so on. A melody is an auditory “object” that maintains its identity under certain transformations, just as a chair maintains its identity when you move it to the other side of the room, turn it upside-down, or paint it red. So, for example, if you hear a song played louder than you are accustomed to, you can still identify it. If you hear it at a different tempo, played by a different instrument, or coming from a different location in space, it is still the same melody. Of course, extreme changes in any of these dimensions will render it unrecognizable; a tempo of one beat per day or a loudness of 200 decibels might stretch the limits of identification.
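This idea of a melody as a pattern of relations, invariant under transposition, can be captured in a few lines of code. The sketch below is a simplified model, not a description of any laboratory procedure; it uses MIDI note numbers (60 is middle C, and each increment is one semitone) purely for convenience:

```python
# A minimal sketch of melody-as-relative-pattern: two renditions count as
# "the same melody" if their successive intervals (in semitones) match,
# regardless of the key they start in.

def intervals(notes):
    """Successive pitch differences -- the transposition-invariant shape."""
    return [b - a for a, b in zip(notes, notes[1:])]

def same_melody(m1, m2):
    return intervals(m1) == intervals(m2)

tune_in_c = [60, 62, 64, 65, 67]   # C D E F G
tune_in_g = [67, 69, 71, 72, 74]   # same pattern, seven semitones higher
print(same_melody(tune_in_c, tune_in_g))  # True: one melody, two keys
```

Note what the model deliberately throws away: the absolute pitches themselves. That loss is exactly what makes the abilities described in the next section so puzzling.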

“That’s a C-Sharp”: Absolute Pitch

People with a special ability called “absolute pitch” (AP) can tell us something about how the human brain processes melodies and pitch. By definition, those with AP are able to produce or identify tones without reference to an external standard. If they hear a car horn, they may say, “That’s E-flat!” In contrast, if you play a tone from the piano and ask people what you played, most cannot tell you (unless they watched your hand). People with AP can reliably tell you, “That was a C-sharp,” and some can even do the reverse. Ask them to produce a middle C (the center key on a piano keyboard), and they will sing or hum or whistle it for you. Those with AP have memory for the actual pitches in songs, not just the pitches in relation to one another. In fact, when most of them hear a song in a different key (and therefore with different pitches), it sounds wrong to them.

The ability to recognize and identify absolute pitch presents the research scientist with two opposing puzzles. First, why are some people able to do this? Since melodies are defined by relative pitches, why do some people have the ability to track the absolute pitches—information that has no apparent value? Understanding speech virtually requires that we ignore absolute pitch information; if we did not, we would not be able to understand children, who speak an octave or two higher than do adults.

A contradictory puzzle arises when we consider that the auditory system, from the cochlea in the ear up to the cortex of the brain, contains neurons that respond only to specific frequencies. Our ears and our brains are indeed registering absolute pitch information at every stage. The second question then becomes not “Why do some people have absolute pitch?” but rather “Why doesn’t everyone?” After all, as psychologist Dixon Ward was fond of pointing out, we do not have to run to a picture of a rainbow to say that a rooster’s comb is red, or run to a bottle of camphor to identify the odor of a skunk. Why, then, if someone plays us a note, do most of us have to run to the piano to figure out what note it is?

One annoying aspect of the scientific literature is that the term perfect pitch is often used interchangeably with the term absolute pitch. It is important to maintain a clear distinction between these two capacities, which we now know are unrelated. Some people have the ability to distinguish small differences in intonation, or “in-tuneness,” when comparing one tone to another—they have a “good ear.” Some call this ability perfect pitch because these people can tell whether or not two tones are perfectly in tune. People with absolute pitch cannot necessarily make this distinction better than others, as Ed Burns of the University of Washington has shown and as I have confirmed in my own laboratory. So those with AP are not especially accurate at tonal discrimination. What they are able to do is label tones precisely.

In reviewing the literature, I found that part of the mystery surrounding AP revolves around its rarity: It is estimated to occur in only 1 out of 10,000 people. As I thought about this strange statistic, I realized that it clearly didn’t mean that 1 out of 10,000 musicians has AP; it is not that rare. Rather, the figure appeared to be derived by counting the number of musicians who had AP and then scaling by the proportion of musicians in the total population. I had read the classic How to Lie with Statistics by Darrell Huff, and something just did not seem right about this kind of statistical juggling.

Then I realized that, quite naturally perhaps, the hundred years’ literature on AP was filled with studies that tested only part of the population: musicians. There is an obvious reason for this: If you ask most non-musicians to sing an E-flat or a C-sharp, they will not understand what you are talking about, since they have not learned this specialized vocabulary. This does not necessarily mean they don’t have AP, or at least something like it. The challenge, then, was to design a test that could determine whether non-musicians demonstrate AP capabilities.

Remembering and Labeling

Creating this test turned out to be easy. First, I proposed that absolute pitch involved pitch memory and pitch labeling. I defined pitch memory as the ability to hear a note and remember that you have heard it somewhere before. Pitch labeling is the ability to apply a meaningful label to that knowledge, such as “C,” “321 Hertz,” or the solfège syllable “Do.” Maybe the only thing separating AP musicians from my hypothesized population of AP non-musicians is that the latter group, having never learned the proper vocabulary for labeling tones, lacks pitch labeling, while both groups have equivalent pitch memory ability.

To test this, I asked 50 college undergraduates simply to sing their favorite rock song from memory, in an effort to understand what it was they were retaining and attending to about the music and, in particular, to see how good their pitch memory was. I limited the music to rock songs because they have several peculiar qualities that make them ideal for such an experiment. To begin with, most rock songs exist in the world in only one specific version. Unlike “Happy Birthday” or “Michael, Row Your Boat Ashore,” which are sung in many different keys and have no one “official” key, rock songs are typically recorded by one particular group, and that is the version everyone knows. This is really an experimenter’s dream: people have heard the stimulus hundreds or thousands of times in the same key. Moreover, they learn the song on their own, outside the laboratory, and with strong motivation (remember, they were asked to sing their favorite song). Naturally, some rock songs don’t meet these criteria; The Beatles’ “Yesterday” or Stevie Wonder’s “You Are the Sunshine of My Life,” for example, have been recorded many times by different artists and in many different keys and styles. Such songs were eliminated from the experiment.

The results were remarkable. Most of the students were able to produce songs at or very near the actual pitch used in those songs. One-fourth sang their favorite songs from memory within one semitone of the correct pitch, and two-thirds came within two semitones of the correct pitch. Moreover, they retained the tempo and many of the nuances of the piece, such as affectations of the vocalist and peculiarities in phrasing. This provided strong evidence that many nonmusicians do have accurate pitch memory and something very much like absolute pitch. Although they did not know the formal note names, they did demonstrate the striking ability to use ad hoc labels, derived from the lyrics of the song. In other words, although they might not be able to sing an A-sharp if requested, they were able to skillfully use other, informal names, such as “that’s the first note of the song ‘Hotel California.’ ”

Using a less restrictive definition of absolute pitch such as this would bring its incidence in the population from 1 in 10,000 to more like 1 in 4. Moreover, this test is rather conservative, relying to an extent on a person’s vocal abilities. Some people in the experiment, since they were not experienced singers, might have had difficulty matching the notes in their heads and made errors because of this. So, if anything, 1 in 4 is an underestimate of the true incidence of accurate pitch memory in the general population. This study also provided confirming evidence for the theory that absolute pitch has two components, memory and labeling.
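The “within one semitone” criterion can be made concrete: the distance in semitones between a produced pitch and the original recording’s pitch is 12 × log2 of the ratio of the two frequencies. The sketch below is hypothetical (the frequencies are made-up examples, not data from the study), but it shows the kind of scoring such an experiment involves:

```python
import math

# Hypothetical sketch of a sing-back scoring rule: the signed distance
# between a sung pitch and the recorded original, in semitones, is
# 12 * log2(f_sung / f_original). Positive means sharp, negative flat.

def semitone_error(f_sung: float, f_original: float) -> float:
    return 12 * math.log2(f_sung / f_original)

print(round(semitone_error(440.0, 440.0), 2))  # 0.0  (dead on)
print(round(semitone_error(466.2, 440.0), 2))  # 1.0  (one semitone sharp)
```

A production counts as “within one semitone” whenever the absolute value of this error is at most 1.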

In another recent study, Tonya Bergeson and Sandra Trehub of the University of Toronto asked mothers to sing songs to their infants on two occasions, one week apart. Almost half of the mothers sang in the same key both times, providing additional evidence for the stability of auditory memory.

Acquiring Absolute Pitch

Many researchers in music cognition now adopt this two-component theory and think of absolute pitch as nothing more mysterious than a particular ability to label what most people remember. Robert Zatorre and his colleagues at the Montreal Neurological Institute performed an extraordinary neuroimaging study that should settle any lingering doubts about this. Subjects with and without AP attempted to identify pitches and musical intervals while their cerebral blood flow was monitored. The AP possessors (but not the nonpossessors) showed activity in the left posterior dorsolateral frontal cortex, a region of the brain thought to be involved in conditional associative learning—the technical term that memory theorists use to mean “attaching labels to things.”

Now a satisfactory theory of pitch labeling must account for the reasons some people have the ability and others do not. Given that information about absolute pitch is encoded throughout the auditory system in frequency-sensitive neurons, why is it that most of us do not associate labels with tones? I believe it is partly because we never learned to do so, since the pitch of a tone does not have any special biological or ecological importance.

Color is an example of a perceptual domain with clear biological importance. The color of something can reveal important information about it: for example, whether it is food and whether it is fresh or spoiled, edible or poisonous. For the most part, color perception is believed to be the same for all members of our species and throughout different cultures. Members of widely different cultures, with different ways of naming colors, still agree on the best example of the major color categories, according to research by Brent Berlin, Paul Kay, and Eleanor Rosch. For example, the Dani tribe of New Guinea have only two color terms, mola and mili, corresponding roughly to “light” and “dark.” But when shown an assortment of color chips, the Dani agree with Westerners about the best example of the color red, even though their language has no specific word for that color. A new study by Debi Roberson at the University of London, however, has challenged this initial understanding, suggesting that the issues of color perception are more complex.

Pitch is entirely different. Since different cultures use different musical scales, there are no known cross-cultural musical universals for pitch. The pitch of an object in the world has less ecological relevance than its color, since it does not usually reveal important properties of the object. This relative lack of salience for pitch may conspire against its easy acquisition.

While there have long been debates about whether AP is innate or learned, the emerging consensus among psychologists is that the ability to remember and label pitches requires activation and training during a critical period in a child’s development, analogous to the critical period required for acquiring language. During this time, the child must learn to put labels on tones. The preliminary evidence is that this critical period runs roughly from birth to age eight.

When a child is acquiring language, parents often point out objects to their children and say things such as “See that apple? That’s red.” Have you ever heard a parent teach auditory labels, such as, “Hear that doorbell? That’s B-flat”? AP is acquired and developed through early systematic training, although as adults we may not necessarily remember the specific instance of learning the label for a particular tone, any more than we remember when we learned to call the apple red. What we do not know yet is whether every child who receives training in labeling pitches develops absolute pitch.

Is AP Genetic?

Although most psychologists and music theorists support the hypothesis that AP is learned, a debate has emerged with geneticists and others who claim a genetic basis for AP. The extreme version of the genetics argument is that some people are simply born knowing how to label pitches. Advocates of this view also hold that genetic endowment creates child prodigy pianists who can play Mozart sonatas by looking at the score once or compose their own music the first time they sit at a piano. Most scientists believe this view is untenable; how could one explain a child being born knowing how to read musical notation or understanding the relation between piano keys and the sounds they produce? In the case of absolute pitch, babies do not emerge from the womb speaking language, and it is no more likely that they would magically produce pitch labels in the first days or weeks of infancy.

A less radical form of the argument is that a combination of factors, including genes, simply creates a predisposition for the AP ability. Advocates of this view, including a team of geneticists at the University of California, San Francisco, led by Siamak Baharloo and Nelson Freimer, are searching for an absolute pitch gene. One difficulty this view encounters is the notion of just what such a gene would control. Would it be specifically for attaching labels to pitches or would it be for attaching labels to sensory stimuli in general? As we have already seen, AP is not a difference in perception, it is a difference in labeling, or perhaps in coding.

The question of whether AP is genetic has been at the heart of discussions for the last century, partly because certain prominent AP possessors (for example, Berlioz, Scriabin, and Toscanini) considered it the ultimate in musical endowment. I believe both biology and learning are involved; the difficulty is in separating the relative contributions of nature versus nurture. For example, people with a genetic predisposition for AP might acquire it more easily but, in my view, would still need some training.

Baharloo and his colleagues have studied the genetic history of families with and without AP to make inferences about its genetic basis. They make two explicit assumptions that many of us in the cognitive science community feel are erroneous. First, they state that AP represents an unusual ability in perceiving pitches, although cognitive scientists tend to believe that AP is more a skill in labeling, which is a form of classification and long-term memory. They also assume that AP is developed through musical training, but this does not appear to be true, either. It is not musical training in general, but some deliberate training to name pitches, that develops absolute pitch.

The goal of most musical training is, in fact, contrary to that of absolute pitch training, because it teaches children to attend to relational features of melodies, not the absolute features. In fact, the late Dixon Ward popularized a theory of absolute pitch claiming that all musicians start out with AP, but then “unlearn” it. As one becomes a better musician, one becomes trained to abstract out melodic patterns at the expense of absolute pitches. Classical and jazz training, especially, emphasize playing scales, chord progressions, and themes or melodies in every key.

The Baharloo team reported that a child with AP is more likely to have a sibling with AP than is a child without AP. They claim this to be strong evidence for its genetic basis. But the mere fact that an ability runs in families does not guarantee that it is genetic; only studies of identical twins can provide a definitive answer. One could say that speaking French runs in families too, but many are reluctant to propose a genetic basis for something that parents teach to their offspring. Likewise, families in which a parent has AP are going to be more likely to provide the type of environment in which a child can develop AP. The late Lloyd Jeffress was the most articulate spokesman of this view, as expressed in his 1962 letter to the editor of the Journal of the Acoustical Society of America:

The very circumstances which have caused people to believe the trait to be inherited are those which would bring about its “imprinting.” The children of people having absolute pitch are sure to be examined early for the existence of the trait and their first fumbling steps rewarded. In a home where the parents cannot tell “C” from coal scuttle, no such hospitable environment for growth of the trait will exist . . . only in the home of musical parents could absolute pitch develop; where the parents have absolute pitch it is almost sure to.

Identifying the Notes of the Scale

Possessors of AP are often used as examples in the more general study of human information processing because it is claimed that they violate the so-called “law of 7 plus or minus 2,” established by George Miller in 1956. There are only a few actual laws, or rules, in psychology, so when one is apparently violated, it makes for big news. Miller stated that there are limits to human information processing; under most circumstances, we cannot reliably sort stimuli along a single dimension into more than about five to nine categories. People with AP would appear to violate this principle because they can classify more than 60 stimuli (5 octaves of 12 tones each). There are 88 keys on a standard piano keyboard, and people with AP can usually name all but those at the far ends of the keyboard.

In fact, though, people with AP make errors in identifying the octave in which a particular tone falls. For example, they are adept at identifying C but will often confuse middle C (the center key on a piano keyboard) with the higher or lower C on either side of it. With all these octave errors—not to mention semitone errors and the more rapid and accurate identification of the white notes of the scale over the black notes—it is not fair to claim that AP possessors are classifying anywhere near 60 stimuli without errors. The octave errors bring the skill down to identifying the 12 tones of the chromatic scale, and the other errors mentioned put the typical AP possessor well within observance of Miller’s law.
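The arithmetic of that argument can be sketched in a few lines. This is only an illustration of the counting, using MIDI note numbers (21 through 108 span the 88 piano keys) as a convenient stand-in for the keyboard:

```python
# Sketch of why octave errors shrink the task: collapsing each key to its
# pitch class (note name, ignoring octave) reduces the 88 piano keys to
# just 12 categories -- near the reach of Miller's 7-plus-or-minus-2 span.

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(midi_note: int) -> str:
    """Note name ignoring octave: 60, 72, and 84 are all 'C'."""
    return NAMES[midi_note % 12]

print(pitch_class(60), pitch_class(72), pitch_class(84))  # C C C
print(len({pitch_class(n) for n in range(21, 109)}))      # 12 classes, 88 keys
```

Confusing middle C with the C an octave away, in this scheme, is no error at all; it is precisely what classifying by pitch class predicts.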

On the Road to Real Music

What has all this got to do with real music? I think AP provides a window into the brain and how music is stored and represented there. Pitch is one of the basic building blocks of music and, as we have seen, many people are skilled in remembering pitches even if they have not learned to label them.

The scientific tools with which we study pitch are useful in looking at other aspects of music; and great progress has been made in recent years in understanding more general issues surrounding music cognition, musical ability, and the evolutionary and neural underpinnings of music. The discovery that so many people have accurate pitch memory reveals much about the inner workings of memory and the mental codes used to represent perceptual events. All this puts us closer to understanding the links between brain and behavior, between the physical world and our mental representations of it.

Clues to the Musical Brain

Our knowledge about the neuroanatomical underpinnings of music comes mostly from two sources: scans of cerebral blood flow and lesion studies conducted by clinical neurologists and cognitive neuroscientists.

Until recently, scientists learned the most about the brain by studying patients with brain damage. Sometimes nature conducts a cruel experiment, and tumors, stroke, disease, or developmental disorders produce lesions in specific parts of the brain. By comparing the functioning of the patient before and after the lesion, or by comparing the patient with a control subject who has no lesion, we can make inferences about the role that the damaged brain region plays in normal cognition. Other times, it is not nature that performs the experiment, but humans. Sadly, many of the great advances in neuropsychology have come from studying soldiers injured during wartime by bullets that damaged a particular part of the brain.

One scientific problem with lesion studies is that no two lesions are identical (nature seldom conducts controlled experiments), so one is put in the weak scientific position of having to generalize from a single case. To make inference even trickier, no two brains are exactly alike to begin with; they are different in size, shape, and even organization. Moreover, lesions, whether occurring naturally or otherwise, seldom respect the anatomical boundaries of brain regions. That is, a lesion typically will not affect a well-defined area of the brain without also affecting some surrounding areas, or leaving parts of the well-defined area intact.

Patients undergoing selective removal of brain tissue to control epilepsy provide another source of information. The problem with scientific inference in these cases is that many parts of the brain may already be damaged by the epilepsy itself, and distinguishing this pre-existing damage from the effects of the surgical lesion can be difficult.

Clinical neuropsychology is a lot like detective work; the clues are usually hidden. Since Hans-Lukas Teuber, one of my teachers, first described the approach in the 1950s, the most compelling arguments in neuropsychology have been made through the existence of what are called double dissociations. A double dissociation occurs when two patients (or groups of patients) exhibit completely complementary deficits. To take a concrete example, we now know that the visual location of objects and certain features such as their color are processed in different parts of the brain because (1) patients with lesions in one part of the brain (along the dorsal pathway of the visual system) can perceive color fine but are impaired in their ability to perceive location and motion, and (2) patients with lesions in a different part of the brain (the ventral pathway of the visual system) have intact location and motion perception, but impaired color perception. The two cognitive abilities in question were shown to be dissociated, or separate, by virtue of one being spared when the other is impaired.

In the musical domain, Isabelle Peretz and her colleagues at the University of Montreal have used such double dissociations to argue that speech and non-speech sounds (as simple as a doorbell or as complex as music) are functionally independent in the human brain. They also believe that music and environmental sounds are each served by distinct neuroanatomical centers. Many cognitive neuroscientists believe that music is indeed an independent or modular neurocognitive system. The overall independence of musical function from other functions in the brain has been clearly observed, but we still do not know the neuroanatomical site for processing the various components of music. Researchers have been able to show separate impairments in melody, rhythm, meter, tonality, and timbre, suggesting that these are independent subcomponents of the music recognition system.

Possible specific sites for musical processing are beginning to be identified. Robert Zatorre and his colleagues have found that right temporal lobe lesions tend to affect perception of melodies more than left temporal lesions. They have also observed that perception of pitch patterns appears to involve activation of the right temporal region in normal volunteers. A. R. Luria found that when damage occurred to left auditory secondary association areas, patients had severe deficits in perception and reproduction of temporal patterns (arrhythmia), but melodic and timbral processing were preserved. Brenda Milner found deficits in tone and timbre perception following right temporal lobectomy, with relatively preserved rhythm.

Peretz also discovered that the right hemisphere of the brain contains a contour device that, in effect, draws an outline of a melody and actively analyzes it for later recognition. Through her study of a neurological patient (known in the literature by the initials CN), she has discovered a case of pure music agnosia (inability to recognize music): a patient with bilateral temporal lobe lesions who can no longer recognize once-familiar songs in explicit memory tasks, yet retains implicit memory for well-known tunes. Although CN claims not to recognize them, when forced to guess which of two songs is familiar to her, she “guesses” with much greater accuracy than pure chance would allow.

Learning the Language of Music

John Sloboda at the University of Keele in England is one of the leading figures in music cognition and perception. Both an accomplished vocalist and a psychologist by training, he has probed how people gain musical competence. His research has shown that infants are extraordinarily adept at distinguishing between well-formed and ill-formed musical sequences in the music of their culture, and that this “grammatical” understanding of music’s syntax and semantics parallels the child’s ability to discern the structural regularities of spoken language.

Those who follow the lead of Noam Chomsky in linguistics and psychology argue that we are all born with an innate capacity to learn a language, and this capacity is incorporated in a “language module” in the brain. Infants come equipped with a brain template, hardwired in such a way that it can adapt to and learn any language—even a nonspoken one, as Ursula Bellugi of the Salk Institute has shown with American Sign Language. That is, we are not born with any predisposition to speak a particular language; through exposure, our brains conform to the structure of the specific language we hear. Sloboda suggests that there may be a similar music module that comes pre-equipped to discern the musical structure and musical grammar of an infant’s culture, provided exposure occurs during a critical period.

The notion of “critical periods” may also go far to explain musical competence. Almost all world-class musicians began their training at a very young age. It is unlikely that the reason for their success is simply that they had more hours of practice; what is more likely is that there are critical periods of development when the rapidly growing brain is receptive to making the types of new connections needed to incorporate musical thinking into the nature of thought itself—a process that makes musical thinking as automatic as walking or talking.

Musicians who learn late often play music “with an accent.” I know this from personal experience. Although I played guitar professionally for many years, I did not begin learning to play the instrument until I was 16. Friends who are professional guitarists tell me that I just do not sound “natural,” although non-guitarists don’t seem to notice the difference as much.

Sloboda also believes that one critical component of the developing musical mind is the ability to grasp the internal structure of music, analogous to the way master chess players have a deep structural understanding of chess moves and the interrelationship of pieces on the board.

The Evolutionary Basis of Music

Although not everyone is a musician, virtually all of us have music in our lives. This suggests that in our evolution, the basis for music is very old. It may even have preceded spoken language. So, then, what is its purpose? A current debate has Steven Pinker of the Massachusetts Institute of Technology claiming that music is merely “auditory cheesecake,” an evolutionary accident piggybacking on language, and Ian Cross of Cambridge University, Sandra Trehub of Toronto, and others arguing for the evolutionary adaptiveness of musical behaviors.

David Huron sides with Cross and Trehub in this debate, arguing that music promotes social bonding among members of a culture and citing my research program on Williams syndrome (conducted with Ursula Bellugi). Bellugi and I have found that individuals with this neurodevelopmental genetic disorder are unusually social and, in spite of gross deficits in most cognitive functions, have relatively normal musical abilities. People with another disorder, Asperger autism, are generally unsocial and unmusical, presenting, together with Williams syndrome, an intriguing double dissociation of musical and social abilities. Both disorders have a genetic basis. This, Huron argues, is strong evidence for a genetic component that influences both musicality and sociability. With the mapping of the human genome, the resolution of this issue may be near. The genes involved in Williams syndrome (on chromosome 7) are being probed in great detail by members of Bellugi’s team, led by geneticist Julie Korenberg at the University of California, Los Angeles.

Huron believes that although the evidence in support of music as an evolutionary adaptation may not yet be strong, the idea is plausible. First, complex evolutionary adaptations occur over many millennia. Music making, being one of the oldest documented human activities, qualifies. Second, evolution must express itself through genes, which, in turn, are expressed in the body by means of proteins. Musical experience influences and is modified by natural biochemical substances in the body, so music satisfies a basic biochemical requirement of the theory. Finally, specialized behaviors that evolved are typically associated with specific neuroanatomical sites; we have seen through double dissociation studies that music appears to have such a basis.

The Ancient Functions of Music

I asked earlier why music moves people, why some people spend so much time and money seeking musical pleasures. Musical pleasure is not merely subjective (although if it were, that would be enough); many studies have demonstrated biochemical and electrophysiological changes in response to hearing music. Some researchers are finding that listening to familiar music activates a neural structure deep in the ancient, primitive regions of the brain: the cerebellar vermis. For music so profoundly to affect this gateway to emotion, it must have some ancient and important function. Although no one yet knows for sure what that is, I can speculate.

Communication. I believe music moves us because it mimics our species’ primitive communicative calls. Whether our hominid ancestors spoke or sang first is not critical; the prosody, rhythm, and contour of music may stir in us an evolutionary echo of early communications.

Expectation and Timing. Through its rhythm, music stimulates primitive neural timing mechanisms. Music affects us because the intrinsic structure of a piece of music, which in Western music is based on repetition, sets up expectations in the listener. Temporal patterns that have already occurred suggest to us new temporal patterns that will occur, and these patterns also carry melody and harmony. Music unfolds over time. Music without a pulse, without temporal regularity, and without temporal expectations, is virtually unknown. It is no coincidence, then, that the cerebellar vermis, a part of the brain that has been identified as involved in perceiving music, is connected with the perception of timing.

Why would timing and emotion be based in the same part of the brain? If we go back through evolution, we notice that in simpler life forms timing is the most fundamental property learned through conditioning and habituation. Single-cell organisms are very sensitive to timing. For example, they soon habituate to being poked if the poking is done with rhythmic regularity. Animals with more complex brains, including a cerebral cortex, use timing in conditioning and associative learning.

The most famous cases come from animal research on the avoidance of pain and elicitation of certain behaviors for food rewards. Even lower animals can learn precise schedules in order to obtain rewards or to avoid punishment. The one thing we find from the conditioning literature is that animals are exquisitely sensitive to the time between a sensory event and its consequence. For example, both pigeons and rats have learned to make different choices when the difference in timing of a stimulus was on the order of mere milliseconds.

Music moves us because, of all human activities, it is the one with the most temporal regularity. One could argue that both rock and roll and rap music, which feature constant pulses (very few tempo changes), have lasted many years, in spite of the predictions of pundits who thought they were ephemeral, precisely because of these timing regularities, the pulse. Music moves us because it is rhythmic.

Patterns. One of the strongest human drives is to find patterns in the environment. If you have ever stared at the random dots on an acoustical tile ceiling or clouds in the sky, you have seen that all kinds of patterns seem to emerge where none really exist. Our brain is constantly trying to make order out of disorder, and music is a fantastic pattern game for our higher cognitive centers. From our culture, we learn (even if unconsciously) about musical structures, tones, and other ways of understanding music as it unfolds over time; and our brains are exercised by extracting different patterns and groupings from music’s performance. Patterns emerge, regroup, repeat, and fold in on themselves in many interesting ways. Beethoven’s “Moonlight Sonata” moves us because each time we listen to it, we hear it slightly differently, depending on the performer, our mood, the people who are with us. The notes of the composition form a foreground, the spaces between them a background, and our minds work actively to make links between them, to group the music into phrases, to predict what will come next. The ever-vigilant brain becomes excited when we encounter even subtle violations of our expectations.

Returning to Shore

There is music in everyday life, even where it is not explicit. There is music in the waves crashing on the shore, in the seagulls calling above the din. Oxygen nourishes our cells through blood pumped by our own cardiac timekeeper, beating out metronomic rhythms. The wind conducts a percussion ensemble of leaves blowing, trees rocking, and branches breaking. The stars trace a pattern in the night sky as complex and interdependent as the harmonies in Mahler’s Fifth Symphony.

Although I have speculated why music moves us, the truth is that we do not know very much about its origins. Even to say that music is a uniquely human invention is somewhat controversial, although the weight of evidence suggests that this is true. Based on the archeological record, music has been with our species for a very long time—as long as anything else for which we have evidence. Its ubiquity and its antiquity demonstrate its importance to us. Mothers in every known culture sing songs to their infants, making music one of the newborn’s first experiences.

I believe the study of music is of central importance to cognitive science, because music is among the most complex of human activities, involving perception, memory, timing, object grouping, attention, and (in the case of performance) expertise and complex coordination of motor action. Consequently, the scientific study of music has the potential to answer fundamental questions about the nature of human thought and the relations among experience, mind, brain, and genes.

Near the end of Moby Dick, Ahab stares dimly at his own shadow in the ocean, watching it lose form in the depths of the sea. Perhaps we are all like Captain Ahab and Mr. Starbuck, in search of a great mystery that can never be explained fully, a dynamic, beautiful, powerful creation that cannot be understood while in motion, yet can never be captured. Our colleagues studying text and vision look at basic components, phonemes and primitive shapes, to better understand the dynamics of language and visual systems. Music is, of course, more than just pitch and frequencies, and indeed our experience of it is undoubtedly more than the sum of pitch, frequency, tempo, rhythm, meter, timbre, and so on. In studying any of these in isolation, we may—as the philosopher Alan Watts was fond of saying—be trying to study the river by looking inside a bucket of water we drew from it. This is the same challenge faced by researchers in physics and other disciplines: how to perform methodical and rigorous experiments on a complex, dynamic system. Through the interdisciplinary efforts of colleagues at every level of inquiry, from cellular function to cognitive psychology, the coming years promise to bring us closer than ever not just to illuminating the nature of music and mind, but also, perhaps, even to understanding why we are compelled to do so.