Creating generative musical agents dedicated to human-machine creative interaction
Through the development of generative software instruments, Jérôme Nika’s research focuses on the integration of scenarios in music generation processes, and on the dialectic between reactivity and planning in interactive human-computer music improvisation within the Music Representations team at Ircam.
Jérôme Nika’s PhD work, “Guiding Human-Computer music improvisation” (Young Researcher Prize in Science and Music, 2015; Young Researcher Prize awarded by the French Association of Computer Music, 2016), focused on the introduction of authoring, composition, and control in human-computer music co-improvisation.
Then, the DYCI2 library of generative musical agents combined machine learning models and generative processes with reactive listening modules. This library offers a collection of “agents/instruments” embedding a continuum of strategies ranging from pure autonomy to meta-composition thanks to an abstract “scenario” structure.
More material and descriptions of the projects coming soon.
DYCI2 project (Creative Dynamics of Improvised Interaction) [2017-…]
Topics: Human-machine co-improvisation. Automatic chord extraction, discovery and inference of harmonic progressions in real-time audio signals.
The collaborative research and development project DYCI2, Creative Dynamics of Improvised Interaction (Ircam, Inria Nancy, EHESS, UCSD, Univ. La Rochelle), focuses on conceiving, adapting, and bringing into play efficient models of artificial listening, learning, interaction, and generation of musical contents. It aims at developing creative and autonomous digital musical agents able to take part in various human projects in an interactive and artistically credible way; and, in the end, at contributing to the perceptive and communicational skills of embedded artificial intelligence.
The areas concerned are live performance, production, pedagogy, and active listening. Among the three main research issues of this project is the design of multi-agent architectures and models of knowledge and decision in order to explore scenarios of music co-improvisation involving human and digital agents.
The objective is to merge the usually exclusive “free”, “reactive”, and “scenario-based” paradigms in interactive music generation to adapt to a wide range of musical contexts involving hybrid temporality and multimodal interactions.
PhD thesis – ImproteK [2012-2016]
Gérard Assayag (co-dir.) – Ircam, Paris
Gérard Berry (president) – Collège de France, Paris
Emmanuel Chailloux – Université Pierre et Marie Curie, Paris
Marc Chemillier (co-dir.) – EHESS, Paris
Myriam Desainte-Catherine (reviewer) – Université de Bordeaux
Shlomo Dubnov (reviewer) – University of California San Diego
George Lewis – Columbia University New York
Abstract. This thesis focuses on the introduction of authoring and controls in human-computer music improvisation through the use of temporal scenarios to guide or compose interactive performances, and addresses the dialectic between planning and reactivity in interactive music systems dedicated to improvisation. An interactive system dedicated to music improvisation generates music “on the fly”, in relation to the musical context of a live performance. This work builds on research on machine improvisation seen as the navigation through a musical memory: typically the music played by an “analog” musician co-improvising with the system during a performance, or an offline corpus. This research was mainly dedicated to free improvisation, and we focus here on pulsed and “idiomatic” music. Within an idiomatic context, an improviser deals with issues of acceptability regarding the stylistic norms and aesthetic values implicitly carried by the musical idiom. This is also the case for an interactive music system that would like to play jazz, blues, or rock… without being limited to imperative rules that would not allow any kind of transgression or digression. Various repertoires of improvised music rely on a formalized and temporally structured object, for example a harmonic progression in jazz improvisation. In the same way, the models and architecture we developed rely on a formal temporal structure. This structure does not carry the narrative dimension of the improvisation, that is, its fundamentally aesthetic and non-explicit evolution, but is a sequence of formalized constraints for the machine improvisation.
This thesis thus presents: a music generation model guided by a “scenario” introducing mechanisms of anticipation; a framework to compose improvised interactive performances at the “scenario” level; an architecture combining anticipatory behavior with reactivity using mixed static/dynamic scheduling techniques; an audio rendering module to perform live re-injection of captured material in synchrony with a non-metronomic beat; a study carried out with ten musicians through performances, work sessions, listening sessions and interviews. First, we propose a music generation model guided by a formal structure. In this framework “improvising” means navigating through an indexed memory to collect some contiguous or disconnected sequences matching the successive parts of a “scenario” guiding the improvisation (for example a chord progression). The musical purpose of the scenario is to ensure the conformity of the improvisations generated by the machine to the idiom it carries, and to introduce anticipation mechanisms in the generation process, by analogy with a musician anticipating the resolution of a harmonic progression. Using the formal genericity of the pair “scenario / memory”, we sketch a protocol to compose improvisation sessions at the scenario level. Defining scenarios using audio-musical descriptors or any user-defined alphabet makes it possible to approach other dimensions of guided interactive improvisation. In this framework, musicians for whom the definition of a musical alphabet and the design of scenarios for improvisation is part of the creative process can be involved upstream, in the “meta-level of composition” consisting in the design of the musical language of the machine. This model can be used in a compositional workflow and is “offline” in the sense that one run produces a whole timed and structured musical gesture satisfying the designed scenario that will then be unfolded through time during performance.
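The scenario-guided navigation described above can be illustrated with a minimal sketch. It is not the ImproteK implementation: the names `Event`, `prefix_match_length`, and `improvise` are hypothetical, the memory is reduced to a list of labelled events, and anticipation is approximated by a greedy rule (when the contiguous continuation no longer fits, jump to the memory position matching the longest upcoming part of the scenario).

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a memory event is (label, content),
# e.g. ("G", "m2") for a slice of music recorded over a G chord.
Event = Tuple[str, str]


def prefix_match_length(memory: Sequence[Event], start: int,
                        scenario: Sequence[str], pos: int) -> int:
    """Count consecutive memory events from `start` whose labels equal
    scenario[pos], scenario[pos + 1], ..."""
    n = 0
    while (start + n < len(memory) and pos + n < len(scenario)
           and memory[start + n][0] == scenario[pos + n]):
        n += 1
    return n


def improvise(memory: Sequence[Event], scenario: Sequence[str]) -> List[int]:
    """Return one memory index per scenario step.

    Stay contiguous while the next memory event matches the scenario;
    otherwise jump to the index matching the longest future part of the
    scenario (a simplified stand-in for anticipation).
    """
    path: List[int] = []
    for pos, label in enumerate(scenario):
        nxt = path[-1] + 1 if path else None
        if nxt is not None and nxt < len(memory) and memory[nxt][0] == label:
            path.append(nxt)  # contiguous continuation
            continue
        best_start, best_len = -1, 0
        for i in range(len(memory)):
            n = prefix_match_length(memory, i, scenario, pos)
            if n > best_len:
                best_start, best_len = i, n
        if best_len == 0:
            raise ValueError(f"no memory event labelled {label!r}")
        path.append(best_start)  # disconnected jump
    return path


# Toy memory and scenario (illustrative data only).
memory = [("C", "m0"), ("F", "m1"), ("G", "m2"),
          ("C", "m3"), ("G", "m4"), ("C", "m5")]
scenario = ["G", "C", "F", "G", "C"]
path = improvise(memory, scenario)
print(path)                               # [2, 3, 1, 2, 3]
print([memory[i][1] for i in path])       # ['m2', 'm3', 'm1', 'm2', 'm3']
```

In this toy run the output is built from two contiguous fragments of the memory (indices 2 to 3, then 1 to 3), which is the "contiguous or disconnected sequences" behavior in miniature.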
We then present a dynamic architecture embedding such generation processes with formal specifications in order to combine anticipation and reactivity in a context of guided improvisation. In this context, a reaction of the system to the external environment, such as control interfaces or live players’ input, cannot be seen only as a spontaneous instant response. Indeed, it has to take advantage of the knowledge of this temporal structure to benefit from anticipatory behavior. A reaction can be considered as a revision of mid-term anticipations, musical sequences previously generated by the system ahead of the time of the performance, in the light of new events or controls. To cope with the issue of combining long-term planning and reactivity, we therefore propose to model guided improvisation as dynamic calls to “compositional” processes, that is to say to embed intrinsically offline generation models in a reactive architecture. In order to be able to play with the musicians, and with the sound of the musicians, this architecture includes a novel audio rendering module that makes it possible to improvise by re-injecting live audio material (processed and transformed online to match the scenario) in synchrony with a non-metronomic fluctuating pulse. Finally, this work fully integrated the results of frequent interactions with expert musicians into the iterative design of the models and architectures. These are implemented in the interactive music system ImproteK, one of the offspring of the OMax system, which was used on various occasions during live performances with improvisers. During these collaborations, work sessions were associated with listening sessions and interviews to gather the musicians’ evaluations of the system in order to validate and refine the scientific and technological choices.
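The idea of a reaction as a revision of mid-term anticipations can be sketched as follows. This is an illustration under stated assumptions, not the architecture of ImproteK: `generate` is a hypothetical stand-in for an offline, scenario-guided generation run, and `GuidedImproviser`, `play_next`, and `react` are invented names. The point shown is only that a reaction rewrites the not-yet-played part of a previously generated sequence while leaving what has already been played untouched.

```python
from typing import List, Sequence


def generate(scenario: Sequence[str], start: int, control: str) -> List[str]:
    """Hypothetical offline generation run: produce one event per scenario
    step from `start` to the end, shaped by a control parameter."""
    return [f"{label}:{control}" for label in scenario[start:]]


class GuidedImproviser:
    """Keeps a buffer of anticipations generated ahead of performance time."""

    def __init__(self, scenario: Sequence[str], control: str) -> None:
        self.scenario = list(scenario)
        self.now = 0  # index of the next scenario step to be played
        # Anticipate the whole scenario in one offline run.
        self.anticipations = generate(self.scenario, 0, control)

    def play_next(self) -> str:
        """Unfold the next anticipated event through time."""
        event = self.anticipations[self.now]
        self.now += 1
        return event

    def react(self, new_control: str) -> None:
        """Revise anticipations from the current position onwards by
        making a new call to the offline generation process."""
        self.anticipations[self.now:] = generate(
            self.scenario, self.now, new_control)


# Toy session (illustrative data only).
agent = GuidedImproviser(["Dm7", "G7", "Cmaj7", "Cmaj7"], "sparse")
played = [agent.play_next(), agent.play_next()]
agent.react("dense")  # e.g. triggered by a control interface
played += [agent.play_next(), agent.play_next()]
print(played)  # ['Dm7:sparse', 'G7:sparse', 'Cmaj7:dense', 'Cmaj7:dense']
```

Because the revised sequence is still generated against the remaining scenario, the reaction keeps the benefit of the known temporal structure instead of being an isolated instant response.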
Supervision & teaching
- 2017-ongoing: Tristan Carsault (co-supervision with Philippe Esling, direction Gérard Assayag), PhD thesis, “Structure discovery and prediction in multivariate musical audio signals for human computer improvisation”, EDITE, Sorbonne Université.
- 2017: Tristan Carsault (co-direction with Philippe Esling), Master’s thesis, 5-month internship: “Automatic chord extraction and musical structure prediction through semi-supervised learning, application to human-computer improvisation”.
- 2016: Théis Bazin (co-direction with Philippe Esling), Master’s thesis, 5-month internship: “Deep learning for music structure analysis and prediction, application to musical co-improvisation”.
- 2015: Axel Chemla–Romeu-Santos (co-direction with Gérard Assayag), Master’s thesis, 5-month internship: “Combining long-term planning and reactive listening in human-computer interactive improvisation”.
- Discrete structures (LI214 / 2I005) – Université Pierre et Marie Curie, Paris 6
- Introduction to imperative programming, C (LI115 / 1I002) – Université Pierre et Marie Curie, Paris 6
- From chipset to web (LI105) – Université Pierre et Marie Curie, Paris 6