Introduction
Deep learning has been a trending topic in recent years. As deep learning
algorithms have matured, they have been applied across many fields. The
two most common application areas are computer vision (CV) and natural
language processing (NLP), but researchers have explored other domains as
well; audio is one of them. Sageev Oore [1] and his team proposed a
long short-term memory (LSTM) based recurrent neural network to generate
music: the model learns from an input music representation, studies its
patterns, and then composes a new musical expression.
For a deep learning approach, the size of the input matters: if the
input is too large, training time and memory capacity become issues. To
address this problem, MIDI files are used as the raw data. Unlike an
ordinary audio format such as mp3, a MIDI file is closer to a music
score; a synthesizer renders sound according to the events it contains.
Another reason for using MIDI is that music generation typically aims
either to create music scores or to interpret them directly, but Sageev
Oore [1] argued that jointly predicting the notes together with their
expressive timing and dynamics is in fact more valuable. This will be
discussed in a later section.
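To illustrate why a score-like format keeps the input small, the sketch below (plain Python, with purely illustrative note values that are not taken from any real file) contrasts a MIDI-style list of note events with the number of raw audio samples the same passage would occupy:

```python
# A MIDI-style representation stores note events, not waveforms.
# Each event: (pitch 0-127, velocity 0-127, start time in s, duration in s).
# The values below are illustrative, not from any real MIDI file.
notes = [
    (60, 80, 0.00, 0.50),  # middle C, medium-loud, half a second
    (64, 80, 0.50, 0.50),  # the E above it
    (67, 90, 1.00, 1.00),  # G, slightly louder, one second
]

# The same two seconds of audio at CD quality would require
# 44100 samples per second per channel -- far more data than
# three symbolic note events.
audio_samples = 2 * 44100

print(len(notes))     # number of symbolic events
print(audio_samples)  # raw audio samples covering the same span
```

Three tuples versus tens of thousands of samples for two seconds of music: this gap is what makes MIDI-based representations tractable as network input.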
The history of automatic music generation can be traced back to the
18th century, to a game called the "musical dice game" [5]. It is a
basic dice game whose only twist is that each roll selects one of
several pre-composed musical fragments; after several rounds, the
selected fragments are put together into a complete piece. So the first
automatic music generation was a very simple game. Today, through the
development of machine learning and deep learning algorithms, music
generation can be carried out in many different ways.
For our research, we focus on training a machine-learning system,
Performance RNN, to generate music. Its authors had great success in
generating music with both timing and feeling, noting that "given the
current state of the art in music generation systems, it is effective to
generate the expressive timing and dynamics information concurrently
with the music." Accordingly, the approach is to directly generate
improvised performances rather than to create or interpret scores.
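As a rough sketch of how such a performance can be encoded, Oore et al. [1] describe an event vocabulary combining note-on/note-off, time-shift, and velocity events. The Python below builds an integer index for a vocabulary of that shape; the sizes used here (128 pitches, 100 time-shift steps of 10 ms, 32 velocity bins) follow the paper's description but should be treated as assumptions to check against the reference implementation:

```python
# Sketch of a Performance-RNN-style event vocabulary (after Oore et al. [1]).
# Events: NOTE_ON/NOTE_OFF for each MIDI pitch, TIME_SHIFT in 10 ms steps
# up to 1 s, and quantized VELOCITY bins. Sizes are assumptions based on
# the paper's description, not a verified implementation.
NUM_PITCHES = 128      # full MIDI pitch range
NUM_TIME_SHIFTS = 100  # 10 ms .. 1 s, in 10 ms increments
NUM_VELOCITIES = 32    # 128 MIDI velocities quantized into 32 bins

def build_vocab():
    """Map each performance event to a unique integer id for the RNN's softmax."""
    vocab = {}
    for p in range(NUM_PITCHES):
        vocab[("NOTE_ON", p)] = len(vocab)
    for p in range(NUM_PITCHES):
        vocab[("NOTE_OFF", p)] = len(vocab)
    for t in range(1, NUM_TIME_SHIFTS + 1):
        vocab[("TIME_SHIFT", t * 10)] = len(vocab)  # shift length in ms
    for v in range(NUM_VELOCITIES):
        vocab[("VELOCITY", v)] = len(vocab)
    return vocab

vocab = build_vocab()
print(len(vocab))  # 128 + 128 + 100 + 32 = 388 event classes
```

Because timing (TIME_SHIFT) and dynamics (VELOCITY) are events in the same vocabulary as the notes themselves, a single next-event prediction task covers all three jointly, which is exactly the property the quoted passage argues for.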