3rd Conference
The Evolution of Language
April 3rd - 6th, 2000




The singing origin theory of speech.

Dr. John R. Skoyles

6 Denning Road, London, NW3 1SU

'Primeval man, or rather some early progenitor of man, probably first used his voice in producing true musical cadences, that is in singing.' Charles Darwin (1871, p. 133).

'Language originated as play, and the organs of speech were first trained in this singing sport of idle hours.' Otto Jespersen (1922, p. 433).


Since the eighteenth century, with writers such as Rousseau, Diderot, Rameau and Condillac (Downing, 1995), song has been linked to the origins of speech. The idea has been proposed with increasing frequency in the twentieth century (Jespersen, 1922; Marler, 1970; Geist, 1978; Richman, 1993; Vaneechoutte & Skoyles, 1998). Highly developed thoracic respiratory control underlies our ability, on a single out breath, to create multiple strings of vocalisations accurately timed and synchronised with complex vocal tract movements. This ability is notably absent in our close relatives such as chimpanzees (Provine, 1996, p. 40). Ann MacLarnon and Gwen Hewitt (1999) have shown that thoracic breathing control originated between 1,500,000 and 100,000 years ago. Amongst untrained singers, the respiratory adjustments used in singing are similar to those used in normal or loud speaking (Hixon & collaborators, 1987, p. 361). Those exploring the evolution of breath control and related vocal changes link them to speech (for example: Kay, Cartmill & Balow, 1998; MacLarnon & Hewitt, 1999). However, these changes could have evolved first to enable singing, and only then, by the addition of vocabulary and syntax, have come to be used for speech. Here I argue for this latter theory.

Song vs. speech

Song and speech respectively reflect two different types of communicable information: (1) identifiable repetitions such as rhythm, melody, stressing and intonation; and (2) syntactically structured sequences of symbolic tokens. Song information can be used to: (a) create and display identity; (b) synchronise relationships between singers and their listeners; and (c) provide a 'carrying' structure to enable the transmission of higher levels of information. Due to (a) and (b), song tends to be used to create and maintain social attachments such as pair-bonds (breeding birds often bond through duetting) and group-bonds (for example, in birds that engage in chorus singing). Moreover, due to (a), it can create a recognised link – 'ownership' – between a singer and a resource (such as a territory). These functions are primarily ‘limbic’: they are concerned with inducing behaviour and emotion in other animals.

Speech information comes from the capacity of symbols, when combined syntactically, to describe things and narrate events. Such information is referential rather than generative of a relationship between a speaker and their listeners (though what is described can, at a referent level, involve them). While what is described might cause behaviour and emotion, these are caused not directly by the symbols but by the message they encode – speech is primarily ‘cortical’. Because they communicate different types of information, song and speech can easily be blended together, as in poetry, chant and prayer.


Logical arguments about precedence underlie important areas of science, for example, the conjecture that RNA life arose before life based upon proteins and DNA (Freeland, Knight & Landweber, 1999). Similar precedence arguments support the claim that human song preceded speech. For humans to sing requires: (i) the capacity to produce and learn repetitive patterns, and (ii) thoracic control of expiration, so that long sequences of different tones and articulations can be made upon a single out breath. However, speech requires at least two additional components: syntax and word vocabulary. The latter itself consists of two components: the ability to link semantics to word pronunciations (both in perception and production), and the ability to acquire words and their meanings from their presence in the talk of others. An asymmetry thus exists: while the components needed for song can independently precede those needed for speech (you can sing without words and syntax), those for speech cannot independently precede those needed for song (speech needs the breath control required for song). The evolution of biological structures, moreover, goes through stages in which inherited modifications become increasingly complex by additions. Feathers, for example, first evolved to provide thermal insulation and only later became structurally adapted (by elongation, for instance) as wing feathers used in flight. Song has many functional advantages (see below), and is a form of communication that easily mixes with speech (chants, prayers, poetry). Thus, it is a natural proto-stage which could initially arise and then further develop by elaboration into speech.


Song has evolved many times in diverse species including crickets, birds (on many independent occasions), sea-mammals (dolphins, porpoises and sperm whales), and in all monogamous primates with stable territories (indris, titis, tarsiers and gibbons) (Haimoff, 1986). In contrast, the use of words and syntax for symbolic-based communication has evolved only once. One reason for the evolution of song in diverse animals is the bonding of groups (Brown, Farabaugh & Veltman, 1988) and breeding pairs (Diamond & Terborgh, 1968; Thorpe & North, 1966). Even the function of song in defending territory is arguably one of bonding – though in this case between an individual and a resource. Humans (ignoring vocal communication) differ from our closest relatives, the chimpanzees, in three prominent respects that link to our capacity to bond. First, we maintain lifelong attachments with dispersed offspring – indeed we are the only primate that does this (Rodseth, Wrangham, Harrigan & Smuts, 1991). As a result, all humans exist within complex social networks built around group and kin attachments. Second, human parents bond either monogamously or polygamously. Third, we have grossly enlarged brains – the development of which was made possible by the resources provided by bonded parents. Just as song bonds diverse animals, it also bonds humans: for example, in rituals observed by anthropologists (Bowra, 1962; Blacking, 1973), and in such familiar activities as marching and work songs, football stadium chants, national anthems, camp-fire songs and hymns. As Ellen Dissanayake (1992, p. 119) puts it: 'by means of music a supra individual state is created in which singer and listener can exist together, joined in a "common consciousness", a common pattern of thought, attitude and emotion'.
Thus, human evolution had in song a faculty: (i) that is not only used by many diverse animals (including all territorial monogamous primates) to create and maintain parental and social bonds, but (ii) that is used by modern people for related purposes, and (iii) that would have enabled the parental and complex social bonds needed for brain expansion. Moreover, if hominids sang to maintain social bonds, such vocalisations would have been available for natural selection to modify, by the addition of words and syntax, into speech.


Abilities linked to song precede and aid those of speech. Children treat vocalisations, both their own and those of others, as a kind of 'song' (Papousek & Papousek, 1981). Newborns identify prosody sufficiently well (due to hearing low-pass filtered maternal vocalisations in the womb) to distinguish the language that surrounds them from foreign ones (Mehler et al., 1988). 'Motherese' strongly emphasises prosody (Trehub, Trainor & Unyk, 1993); in addition, songs – lullabies – are an important part of prelinguistic communication (Trehub, Trainor & Unyk, 1993). Not surprisingly, babbling by eight-month-olds contains intonation patterns of the surrounding language (de Boysson-Bardies, Sagart & Durand, 1984). The perception of intonation is used (together with distributional regularities in speech sounds) to segment out word boundaries within speech and so enable words to be identified and acquired (Sansavini, 1997). Intonation similarly segments phrase boundaries and thus aids the acquisition of syntax (Morgan & Demuth, 1996). Thus, while song could evolve before speech, speech could not, on developmental grounds, have been acquired without the earlier existence of song.

Vocal tract evolution

In addition to breath control, human evolution adapted the vocal tract. Usually this adaptation is assumed to have enabled the tract to create the wide range of speech sounds found in human languages. An alternative explanation is that the vocal tract evolved to provide humans with a wide variety of musical sounds. Supporting this is the fact that intelligible speech can be produced using only a small part of the vocal tract's range: for example, Arandic languages of Central Australia use only two distinct vowels, both central non-high ones (Maddieson, 1998). This is enigmatic, since natural selection acting for speech alone would be expected to have adapted the vocal tract to produce no more than the minimum range of vocal sounds needed for intelligible speech. The most parsimonious explanation is that the vocal tract was shaped by a factor other than speech (such as singing) that required a much more extensive range, and was only secondarily adapted for speech. Interestingly, the ability to produce musical rhythm and initiate singing is left hemispheric, like the ability to speak, suggesting the use of overlapping motor control circuits in both (Borchgrevink, 1991).

While none of the above four areas of argument is conclusive on its own, together they strongly support Darwin and Jespersen's proposal that in human evolution the capacity to sing preceded the capacity to speak. Indeed, as Darwin (1872, p. 476) once observed, ‘It can hardly be supposed that a false theory could explain, in so satisfactory a manner … the several large classes of facts above specified.’


Blacking, J. (1973). How musical is man? Seattle: University of Washington Press.

Borchgrevink, H. M. (1991). Prosody, musical rhythm, tone pitch and response initiation during amytal hemisphere anaesthesia. In Music, language, speech and brain, (eds. J. Sundberg, L. Nord, & R. Carlson) pp. 327-343, Macmillan.

Bowra, C. M. (1962). Primitive song. London: Weidenfeld & Nicolson.

Brown, E. D., Farabaugh, S. M. & Veltman, C. J. (1988). Song sharing in a group-living songbird, the Australian magpie. Part 1. Vocal sharing within and among social groups. Behaviour, 104, 1-28.

Darwin, C. (1871). Descent of man. London: Murray.

Darwin, C. (1872). Origin of Species. 6th ed. London: Murray.

de Boysson-Bardies, B., Sagart, L. & Durand, C. (1984). Discernible differences in the babbling of infants according to target language. Journal of Child Language, 11, 1-15.

Diamond, J. M. & Terborgh, J. W. (1968). Dual singing by New Guinea birds. Auk, 85, 62-82.

Dissanayake, E. (1992). Homo aestheticus. New York: Free Press.

Downing, T. A. (1995). Music and the origins of language. CUP.

Freeland, S. J., Knight, R. D. & Landweber, L. F. (1999). Do proteins predate DNA? Science, 286, 690-692.

Geist, V. (1978). Life Strategies, Human Evolution, Environmental Design. New York: Springer-Verlag.

Haimoff, E. H. (1986). Convergence in the duetting of monogamous old world primates. Journal of Human Evolution, 15, 51-59.

Hixon, T. J. & collaborators. (1987). Respiratory function in speech and song. London: Taylor and Francis.

Jespersen, O. (1922). Language. London: Allen.

Kay, R. F., Cartmill, M. & Balow, M. (1998). The hypoglossal canal and the origin of human vocal behavior. Proceedings of the National Academy of Sciences, USA, 95, 5417-5419.

MacLarnon, A. M. & Hewitt, G. P. (1999). The evolution of human speech: the role of enhanced breathing control. American Journal of Physical Anthropology, 109, 341-363.

Maddieson, I. (1998). Vowel systems and language origins. Abstracts, Proceedings of the Evolution of Language, 2nd International Conference.

Marler, P. (1970). Birdsong and speech development. American Scientist, 58, 667-673.

Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J. & Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition, 29, 143-178.

Morgan, J. L. & Demuth, K. (1996). Signal to syntax. Mahwah, NJ: Erlbaum.

Papousek, M. & Papousek, H. (1981). Musical elements in the infant's vocalization. Advances in Infancy, 1, 163-218.

Provine, R. R. (1996). Laughter. American Scientist, 84, 38-45.

Richman, B. (1993). On the evolution of speech: singing as the middle term. Current Anthropology, 34, 721-722.

Sansavini, A. (1997). Neonatal perception of the rhythmical structure of speech. Early Development and Parenting, 6, 3-13.

Thorpe, W. H. & North, M. E. (1966). Vocal imitation in the tropical Bou-bou shrike as a means of establishing and maintaining social bonds. Ibis, 108, 432-435.

Trehub, S. E., Trainor, L. J. & Unyk, A. M. (1993). Music and speech processing in the first year of life. Advances in Child Development, 24, 1-35.

Vaneechoutte, M. & Skoyles, J. R. (1998). The memetic origin of language: modern humans as musical primates. Journal of Memetics, 2. http://www.cpm.mmu.ac.uk/jom-emit/1998/vol2/vaneechoutte_m&skoyles_jr.html



 Conference site: http://www.infres.enst.fr/confs/evolang/