“Guiding human-computer music improvisation: introducing authoring and control with temporal scenarios” by Jérôme Nika: Download the thesis (abstract below).
Video of the PhD defense, May 16, 2016.
Jury:
Gérard Assayag (co-dir.) – Ircam, Paris
Gérard Berry (president) – Collège de France, Paris
Emmanuel Chailloux – Université Pierre et Marie Curie, Paris
Marc Chemillier (co-dir.) – EHESS, Paris
Myriam Desainte-Catherine (reviewer) – Université de Bordeaux
Shlomo Dubnov (reviewer) – University of California San Diego
George Lewis – Columbia University New York
Abstract. This thesis focuses on the introduction of authoring and controls in human-computer music improvisation through the use of temporal scenarios to guide or compose interactive performances, and addresses the dialectic between planning and reactivity in interactive music systems dedicated to improvisation.
An interactive system dedicated to music improvisation generates music "on the fly", in relation to the musical context of a live performance. This work builds on research on machine improvisation seen as the navigation through a musical memory: typically the music played by an "analog" musician co-improvising with the system during a performance, or an offline corpus.
This research was mainly dedicated to free improvisation; here, we focus on pulsed and "idiomatic" music. Within an idiomatic context, an improviser deals with issues of acceptability regarding the stylistic norms and aesthetic values implicitly carried by the musical idiom. This is also the case for an interactive music system that aims to play jazz, blues, or rock… without being limited to imperative rules that would not allow any kind of transgression or digression.
Various repertoires of improvised music rely on a formalized and temporally structured object, for example a harmonic progression in jazz improvisation. In the same way, the models and architecture we developed rely on a formal temporal structure. This structure does not carry the narrative dimension of the improvisation, that is, its fundamentally aesthetic and non-explicit evolution, but provides a sequence of formalized constraints for the machine improvisation.
This thesis thus presents: a music generation model guided by a "scenario" introducing mechanisms of anticipation; a framework to compose improvised interactive performances at the "scenario" level; an architecture combining anticipatory behavior with reactivity using mixed static/dynamic scheduling techniques; an audio rendering module to perform live re-injection of captured material in synchrony with a non-metronomic beat; and a study carried out with ten musicians through performances, work sessions, listening sessions, and interviews.
First, we propose a music generation model guided by a formal structure. In this framework, "improvising" means navigating through an indexed memory to collect contiguous or disconnected sequences matching the successive parts of a "scenario" guiding the improvisation (for example a chord progression). The musical purpose of the scenario is to ensure that the improvisations generated by the machine conform to the idiom it carries, and to introduce anticipation mechanisms in the generation process, by analogy with a musician anticipating the resolution of a harmonic progression.
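To make this navigation principle concrete, here is a minimal sketch, assuming the memory is a list of (label, content) events and the scenario a sequence of labels such as chord symbols. The function names (match_length, improvise) and the greedy longest-match strategy are illustrative only; they do not reproduce the actual ImproteK model, which relies on richer indexing and anticipation mechanisms.

```python
def match_length(memory, pos, scenario, t):
    """Length of the memory segment starting at pos whose labels match
    the scenario from index t onward."""
    k = 0
    while (pos + k < len(memory) and t + k < len(scenario)
           and memory[pos + k][0] == scenario[t + k]):
        k += 1
    return k


def improvise(memory, scenario):
    """Walk the scenario, collecting contiguous or disconnected memory
    sequences whose labels match its successive parts."""
    output, t, last = [], 0, -1
    while t < len(scenario):
        # Prefer the position offering the longest match with the rest of
        # the scenario (anticipation); break ties in favor of contiguity.
        best = max(range(len(memory)),
                   key=lambda p: (match_length(memory, p, scenario, t),
                                  p == last + 1))
        k = match_length(memory, best, scenario, t)
        if k == 0:
            raise ValueError(f"no memory event labeled {scenario[t]!r}")
        output.extend(content for _, content in memory[best:best + k])
        t, last = t + k, best + k - 1
    return output


# Toy example: a short ii-V-I memory reused to follow a longer chord progression.
memory = [("Dm7", "a1"), ("G7", "a2"), ("Cmaj7", "a3"), ("G7", "b1"), ("Cmaj7", "b2")]
scenario = ["Dm7", "G7", "Cmaj7", "G7", "Cmaj7", "Dm7", "G7"]
print(improvise(memory, scenario))  # ['a1', 'a2', 'a3', 'b1', 'b2', 'a1', 'a2']
```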
Using the formal genericity of the "scenario / memory" pair, we sketch a protocol to compose improvisation sessions at the scenario level. Defining scenarios in terms of audio-musical descriptors, or of any user-defined alphabet, opens up other dimensions of guided interactive improvisation. In this framework, musicians for whom defining a musical alphabet and designing scenarios for improvisation are part of the creative process can be involved upstream, in the "meta-level of composition" consisting of designing the musical language of the machine. This model can be used in a compositional workflow and is "offline" in the sense that one run produces a whole timed and structured musical gesture satisfying the designed scenario, which is then unfolded through time during performance.
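As a rough illustration of this genericity, the same sketch as above can be run unchanged over a hypothetical user-defined alphabet, here coarse intensity labels instead of chord symbols; the labels and contents are invented for the example.

```python
# Same improvise() sketch as above, over a hypothetical intensity alphabet.
memory = [("soft", "s1"), ("soft", "s2"), ("loud", "l1"), ("soft", "s3")]
scenario = ["soft", "soft", "loud", "soft", "soft"]
print(improvise(memory, scenario))  # ['s1', 's2', 'l1', 's3', 's1']
```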
We then present a dynamic architecture embedding such generation processes, with formal specifications, in order to combine anticipation and reactivity in a context of guided improvisation. In this context, a reaction of the system to the external environment, such as control interfaces or the live players' input, cannot be seen only as a spontaneous instant response: it has to take advantage of the knowledge of the temporal structure to benefit from anticipatory behavior. A reaction can be considered as a revision of mid-term anticipations, that is, of musical sequences previously generated by the system ahead of the performance time, in the light of new events or controls. To cope with the issue of combining long-term planning and reactivity, we therefore propose to model guided improvisation as dynamic calls to "compositional" processes, that is to say, to embed intrinsically offline generation models in a reactive architecture. In order to be able to play with the musicians, and with the sound of the musicians, this architecture includes a novel audio rendering module that makes it possible to improvise by re-injecting live audio material (processed and transformed online to match the scenario) in synchrony with a non-metronomic, fluctuating pulse.
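The following sketch suggests, under strong simplifying assumptions, how a reaction can be treated as a revision of mid-term anticipations rather than an instant response: a buffer generated ahead of the performance time is kept up to the reaction date and regenerated from that date onward. It reuses the improvise() sketch above; the class and method names (Anticipations, react) are hypothetical and do not describe ImproteK's actual scheduling architecture.

```python
class Anticipations:
    """Toy buffer of machine improvisations generated ahead of performance time."""

    def __init__(self, memory, scenario):
        self.memory = list(memory)
        self.scenario = list(scenario)
        self.buffer = improvise(self.memory, self.scenario)  # generated ahead of time

    def react(self, now, new_events=None):
        """Revise anticipations from scenario date `now` onward, e.g. after a
        control change or after new live material has been captured."""
        if new_events:
            self.memory.extend(new_events)   # learn live captured material
        kept = self.buffer[:now]             # already played or imminent events
        revised = improvise(self.memory, self.scenario[now:])
        self.buffer = kept + revised
        return self.buffer


# Reuses improvise() and the intensity-alphabet memory from the sketches above:
# a new "loud" event captured live revises the anticipations from date 2 onward.
antic = Anticipations(memory, ["soft", "loud", "soft", "loud", "loud"])
antic.react(now=2, new_events=[("loud", "l2")])
```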
Finally, this work fully integrated the results of frequent interactions with expert musicians into the iterative design of the models and architectures. The latter are implemented in the interactive music system ImproteK, one of the offspring of the OMax system, which was used on various occasions during live performances with improvisers. During these collaborations, work sessions were combined with listening sessions and interviews to gather the musicians' assessments of the system in order to validate and refine the scientific and technological choices.
Guided/composed human-computer music improvisation (more in French)