Music, computers and the continuous gesture


In May 1957, Max Mathews and his research team at Bell Labs in New Jersey loaded the trunk of a car with several boxes containing thousands of punched cards, the prevailing computer storage format at the time. The goal was to take the data to the powerful IBM 704 computer, which occupied a large room at IBM's Madison Avenue headquarters in New York and rented for $600 an hour. Not even the formidable economic resources that Bell allocated to its labs were enough to house in its own facilities a computer powerful enough for the task Mathews had in mind. After several hours of waiting and computation time, the 704 stored the result on a magnetic tape that was driven back to the lab. There, the tape underwent the last stage of the complex process: the conversion of the digital data it contained into sound. The end result of the whole arduous undertaking: a simple 17-second melody [1].

IBM 704 computer at IBM's Madison Avenue headquarters in New York, 1954

That day marked one of the foundational moments of computer sound synthesis, that is, the creation of sound by means of software. At the time, there were already important electroacoustic music studios full of analog devices (oscillators, filters, modulators) with which adventurers such as Karlheinz Stockhausen, John Cage and Gottfried Michael Koenig had been experimenting for years. Nor were electronic and electromechanical musical instruments new: the theremin (1919), the Ondes Martenot (1928) and the Hammond organ (1935) had been around for decades. But in all those cases, the sonic palette was limited. The software synthesis pioneered by Mathews allowed the exact definition, from scratch, of an arbitrary sound; in theory, of any possible sound. For the first time in history it was possible to create sound waves that were not the result of the vibration of a physical system. The possibilities were endless, but the original method of organizing an expedition to Madison Avenue was hardly efficient, and the total time between initial programming and sound output depended on external factors such as the waiting list for the 704 or the traffic on the Interstate.

The limitations imposed by low computational power were gradually alleviated, but until well into the 1990s, sound generation with computers still consisted of programming, processing, and then listening, minutes or hours later, to the result. Software synthesis emancipated sound from its material source, but one issue remained: computers, as deferred sound-processing tools, were not suitable for live performance. Computer sounds, pre-calculated and pre-recorded, could only be played back unaltered as part of a work. In addition, due to the difficulty of using the first computer synthesis languages and the high cost of the equipment, their use was limited to a small group of specialists. While hardware synthesizers, most of them with a built-in piano keyboard, began their dazzling career in the late 1960s, computer synthesis based on programming languages, much more flexible in control and sound possibilities, remained virtually unknown outside the labs.

Today (2014), computers are approaching the performance requirements of traditional musical instruments: tangibility, interactivity, immediacy. There is still a long way to go, but it is already possible to see interactive sound installations, interfaces that generate intricate sound textures in response to the performer’s gestures, dancers who create music with their movements, programs that improvise live, imitating the human soloist they have just “listened” to. There is a microcosm of music ensembles and composers specializing in the use of such real-time technologies, usually associated with music technology research centers such as CCRMA (Center for Computer Research in Music and Acoustics, pronounced karma) at Stanford University, IRCAM (Institut de Recherche et Coordination Acoustique/Musique) of the Pompidou Center in Paris, or the MTG (Music Technology Group) of the Pompeu Fabra University of Barcelona, to name a few of the main ones. Although little known by the general public, these types of devices have recently begun to leave the strictly academic and institutional sphere and are making their way into popular music and culture.

In the specific case of keyboard instruments, a multitude of programs, the so-called softsynths, has existed for more than a decade, approaching physical synthesizers in sound quality and response time. The same can be said of any sound generator whose input consists of a list of isolated events or discrete gestures: keys played, pedals pressed, drums struck, and so on. There are emulators of acoustic and electric pianos, analog and digital synthesizers and electronic drums that for many purposes offer sufficient, and in some cases practically indistinguishable, quality.

Today’s challenge, however, is to create systems that respond naturally to continuous gestures, such as the bow movement of a violinist or the twist of a dancer. “Gesture” is understood here in a broad sense: any physical magnitude continuous in time that constitutes the input to a synthesis system. These can be movements, such as those mentioned, but also air fluctuations (for example, to simulate wind instruments), temperature, pressure, or even ambient sounds captured by microphones. The difficulty of synthesis systems controlled by continuous gestures lies in designing a method that analyzes the gesture, extracts relevant features, and uses those features to control certain parameters of the sound to be generated. And furthermore, to do all this in real time, with latencies of no more than a few milliseconds.
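As a rough illustration of the analyze-extract-map chain just described (a minimal sketch, not any particular system's method), the following Python fragment derives two simple features from a window of one-dimensional position samples and maps them onto two typical synthesis parameters. All function names, features and scaling constants here are invented for the example; in practice the design of this mapping is the hard part.

```python
def extract_features(samples, dt=0.01):
    """From a window of 1-D position samples (metres), sampled every
    dt seconds, derive two simple gesture features: mean speed and
    range of motion."""
    speeds = [abs(b - a) / dt for a, b in zip(samples, samples[1:])]
    mean_speed = sum(speeds) / len(speeds)   # m/s
    span = max(samples) - min(samples)       # m
    return mean_speed, span

def map_to_synthesis(mean_speed, span,
                     amp_scale=0.5, base_cutoff=200.0, cutoff_scale=4000.0):
    """Map the features onto two synthesis parameters: an amplitude
    in 0..1 and a low-pass filter cutoff in Hz. The mapping (faster
    gesture = louder, wider gesture = brighter) is arbitrary."""
    amplitude = min(1.0, mean_speed * amp_scale)
    cutoff = base_cutoff + cutoff_scale * min(1.0, span)
    return amplitude, cutoff
```

A real system would run this over a sliding window many times per second, feeding the resulting parameters to the audio engine on every update.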

Some important advances in this field come from the MTG in Barcelona, which developed what is perhaps the real-time musical interface best known to the general public: the reactable. It is conceived as an augmented DJ interface in which buttons and sliders are replaced by rotations and translations of plastic blocks on a surface; underneath, a camera-based recognition system translates the movements into synthesis parameters and sound effects. Björk’s use of the reactable on her 2007 tour boosted the device’s popularity, which has led to its commercialization by a university spin-off company. The group is currently working on equipping the reactable with force sensors to increase the fidelity of the analyzed gesture.

Other systems move beyond two dimensions and interpret gestures in space. This is the case of the work of the researcher Esteban Maestre, also of the MTG, whose objective is to increase the realism of synthesized bowed strings by capturing the movements of the bow and the instrument with sensors. The sounds generated this way acquire an organic, realistic quality that those produced with traditional synthesizers or samplers lack. Another example is IRCAM’s Modular Objects (MO) system, based on small sensor-equipped objects that transmit wirelessly to the computer. Here the aim is not to simulate a real instrument, but to analyze arbitrary gestures that a sound designer can freely associate with acoustic results.

The ideal way to appreciate the capabilities (and limitations) of the latest advances in this field is in concert. The aforementioned research centers organize concert series and festivals that serve as vehicles for presenting their technologies. To stay with the two centers mentioned: IRCAM collaborates closely with the Ensemble Intercontemporain and organizes the ManiFeste festival every June, while the MTG is associated with the Phonos Foundation and its concert series. Many of the recent works performed there take advantage of real-time technologies and offer audiences a glimpse of what is taking place behind the labs’ walls.

The recent piece Voir-Toucher by Lorenzo Pagliei, premiered in Paris in June 2013, is a good illustration. During the piece, the three performers strike, rub and caress wooden surfaces whose curved shapes evoke the profiles of various musical instruments. Each surface carries a sensor connected to a computer that analyzes and classifies each gesture and synthesizes the sounds. The relationship between gesture and sonic result is direct and immediate: the fluctuations and nuances that occur during the course of each gesture are instantly reflected in the sound. The simple pieces of wood thus give rise to a wide range of electronic sounds, many with a metallic timbre or with bell-like resonances, all of which produce a paradoxically natural effect.

Premiere of Voir-Toucher by Lorenzo Pagliei, Paris, 2013

Real-time technologies have also made it possible to turn dance into a generator of music. One example among recent works based on this new paradigm is Glossopoeia by Alberto Posadas, premiered in 2009, in which the movements of a dancer are captured by sensors and instantly act on the sound. Sensors of this kind are also beginning to appear in pop; examples include performances by the English musician Imogen Heap and the French Émilie Simon.

Despite all the possibilities offered by these new music interfaces, perhaps the most significant recent advance is the increasing ease of access and use. The combination of Arduino (the extremely popular open-source hardware platform) with real-time sound synthesis languages such as Max (named after Max Mathews) or its open-source cousin Pure Data is allowing a large number of artists, composers and amateurs to experiment with the continuous gesture as a sound generator. Proof of this is the growing number of interactive sound installations in galleries and museums, and the many open hardware workshops and communities focused on musical creation.
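In such Arduino-plus-synthesis setups, raw sensor readings are typically noisy and stepped, so a common first treatment is to smooth them before they drive a synthesis parameter. The Python sketch below shows the usual one-pole (exponential) smoother; it is a generic illustration of the technique, not code from any of the systems mentioned.

```python
def smooth(readings, alpha=0.1, state=0.0):
    """Exponentially smooth a stream of raw sensor readings.

    alpha in (0, 1]: higher values track the input faster but keep
    more noise; lower values are smoother but add lag. The same
    one-pole low-pass idea is routinely applied in Pure Data or Max
    patches before mapping a sensor to, say, a filter cutoff, to
    avoid audible stepping ("zipper noise")."""
    out = []
    for x in readings:
        state += alpha * (x - state)  # move a fraction of the gap toward x
        out.append(state)
    return out
```

For example, with alpha = 0.5 a unit step [1.0, 1.0, 1.0] smooths to [0.5, 0.75, 0.875], approaching the target value without the abrupt jump.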

What are the current limitations? The degree of accuracy, the sensitivity to nuance, and the intimacy of control all need to increase. In the case of tangible interfaces, the embodied relationship between performer and device that many traditional instruments afford has yet to be reached. Another important aspect to improve is predictability: the system must respond to similar gestures with similar sounds (the difficulty lies in reconciling what the performer and the machine each understand as “similar”). Improving all of these factors will help future performers achieve the technical and expressive prowess needed to make the most of real-time systems.

In an interview just a year before he passed away in April 2011 at the age of 84, Max Mathews demonstrated his Radio Baton, a relatively simple musical interface based on antennas and radio waves. Invented in the 1980s and revolutionary at the time, the Radio Baton could no longer be called cutting-edge. Yet an excited Mathews played on, delighting in the accelerations and decelerations his movements imposed on a recording of Beethoven’s Fifth. At the end of his life, the man who turned calculating machines into musical instruments had achieved a dream: to turn movement into music, and to make computers invisible.


[1] The piece in question, composed by Newman Guttman, is entitled The Silver Scale. This and other early examples of digital synthesis can be heard on the album “The Historical CD of Digital Sound Synthesis”, released by the Wergo label (catalog number 20332).