Generation of Symbolic Music Based on MusicVAE

This machine learning project was my bachelor thesis: analyzing, understanding, and re-implementing the MusicVAE architecture.

  • What is it about? Understanding and re-implementing LSTMs and VAEs.

  • It was my bachelor thesis.
  • Lots of code review!

Why?

  • I was interested in music.
  • I wanted to work with PyTorch on a machine learning project.
  • I was interested in the generative ML paradigm, before Sonus came around and it got mainstream (and thus boring :P). I did it before it was cool!

The code belongs to the Fraunhofer IDMT, so I am not allowed to disclose it. The thesis and the final presentation are available for reading.

Abstract

The abstract of the thesis can be read below.

Automatic music generation (AMG) systems are computer-based models that produce signals that can be interpreted as music, such as waveforms or note sequences. Apart from purely commercial motivations, like the creation of a potentially unlimited amount of music, such systems are also interesting from a creative point of view, since they could be used for support during the music composition process, or even be understood as a new type of instrument. In this work, the flat variant of the AMG system MusicVAE is re-implemented and trained. It is used for unconditioned generation of monophonic note sequences of two bars in length. MusicVAE is a variational auto-encoder. It maps a given note sequence to a multivariate Gaussian distribution in a lower-dimensional space (called latent space). From that distribution, it samples a vector and decodes it to another note sequence. The model can be trained so that the reconstructed sequence is as similar as possible to the input sequence. By randomly sampling and decoding vectors from the latent space after training, new pieces of music can be generated. In this work, a set of generated note sequences is compared to a held-out test set using selected objective criteria and a subjective analysis. It is shown that the trained model struggles to reproduce the rhythmic structure of the training data and that generated note sequences often contain large jumps in pitch and single non-diatonic notes. Despite such outlying notes, most note sequences contain coherent and diatonic melodies. In particular, interesting rhythms and melodies with multiple high, short notes followed by few longer notes are generated. In future work, ways to condition the latent space could be examined to find possibilities of controlling the generation process.
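To make the abstract's description concrete (encode a note sequence to a Gaussian in latent space, sample a vector, decode it back to a sequence), here is a minimal PyTorch sketch of a flat sequence VAE. All hyperparameters, layer sizes, and the 130-token event vocabulary are illustrative assumptions, not the actual values from the thesis or the original MusicVAE.

```python
import torch
import torch.nn as nn

class TinyMusicVAE(nn.Module):
    """Minimal sketch of a flat sequence VAE (illustrative, not the thesis code).

    Encoder: bidirectional LSTM over embedded note tokens -> Gaussian (mu, logvar).
    Decoder: LSTM initialized from a latent sample z, predicting note tokens.
    """
    def __init__(self, vocab=130, hidden=64, latent=16):  # assumed sizes
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.to_mu = nn.Linear(2 * hidden, latent)
        self.to_logvar = nn.Linear(2 * hidden, latent)
        self.z_to_h = nn.Linear(latent, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)                # (batch, time, hidden)
        _, (h, _) = self.encoder(x)           # h: (2, batch, hidden), both directions
        h = torch.cat([h[0], h[1]], dim=-1)   # (batch, 2 * hidden)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Initialize the decoder state from z; teacher-forced input here
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        y, _ = self.decoder(x, (h0, torch.zeros_like(h0)))
        return self.out(y), mu, logvar

def vae_loss(logits, targets, mu, logvar):
    """Reconstruction (cross-entropy) plus KL divergence to the standard normal."""
    recon = nn.functional.cross_entropy(logits.transpose(1, 2), targets)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

After training, generation works exactly as the abstract says: sample z from the standard normal prior and run only the decoder on it, which produces a new note sequence that was never in the training data.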

Excerpts to Listen to

The excerpts mentioned in the presentation can be downloaded here.