Data
- Total of ~138K bars in four-four time
- Scale degrees (considered relative to the C major scale)
- Total of ~160K bars in four-four time
- Transpose all songs to the C major / A minor scale
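Transposition to C major / A minor can be sketched as a shift along the pitch axis of a piano-roll. This is only a minimal illustration; the key of each song (`key_root` below) is assumed to be known already, and the function name is hypothetical:

```python
import numpy as np

def transpose_to_c(pianoroll, key_root):
    """Shift a (time, 128) binary piano-roll so that `key_root`
    (0 = C, 1 = C#, ..., 11 = B) maps to C, moving by at most a
    tritone in either direction. Note: np.roll wraps around at the
    pitch boundaries, which is acceptable for this sketch."""
    shift = -key_root if key_root <= 6 else 12 - key_root
    return np.roll(pianoroll, shift, axis=1)

roll = np.zeros((96, 128), dtype=bool)
roll[0, 62] = True             # a D4 note in a D-major song
out = transpose_to_c(roll, 2)  # D major: shift down 2 semitones to C
```

The same shift is applied to every track of a song so that relative harmony is preserved.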
Training Data
- Use symbolic timing, which discards tempo information (see here for more details)
- Discard velocity information (using binary-valued piano-rolls)
- 84 possibilities for note pitch (from C1 to B7)
- Merge tracks into 5 categories: Bass, Drums, Guitar, Piano and
Strings
- Consider only songs with a rock tag
- Collect musically meaningful 4-bar phrases for the temporal model by
segmenting the piano-rolls with the structure features proposed in [1]
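The first three bullet points above can be sketched as a small preprocessing step: binarize the velocities and keep only the 84 pitches from C1 to B7. This assumes the MIDI convention in which C4 = 60, so C1 = 24 and B7 = 107; the function name is illustrative:

```python
import numpy as np

def preprocess(pianoroll):
    """Binarize a (time, 128) velocity piano-roll and crop it to the
    84 pitches from C1 (MIDI 24) to B7 (MIDI 107)."""
    binary = pianoroll > 0      # discard velocity information
    return binary[:, 24:108]    # 108 - 24 = 84 pitch slots remain

raw = np.random.randint(0, 128, size=(96, 128))  # one bar, 96 time steps
out = preprocess(raw)
# out.shape is (96, 84), dtype bool
```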
Hence, the size of the target output tensor is 4 (bar) × 96 (time step)
× 84 (pitch) × 5 (track).
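As a concrete illustration of the 4 × 96 × 84 × 5 layout, one phrase can be held in a boolean NumPy array. The track ordering used for the last axis here is an assumption (the document lists the five categories but does not fix their index order):

```python
import numpy as np

N_BAR, N_TIME, N_PITCH, N_TRACK = 4, 96, 84, 5
phrase = np.zeros((N_BAR, N_TIME, N_PITCH, N_TRACK), dtype=bool)

# Set a single note: bar 0, time step 0, pitch C4, on track 3.
# C4 (MIDI 60) sits at index 60 - 24 = 36 within the C1..B7 range.
# The track index -> instrument mapping is assumed, not specified here.
phrase[0, 0, 36, 3] = True
```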
- tra_phr.npy (7.54 GB) contains 50,266 four-bar phrases. The shape is (50266, 384, 84, 5).
- tra_bar.npy (4.79 GB) contains 127,734 bars. The shape is (127734, 96, 84, 5).
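Since these files are several gigabytes, one way to work with them is to memory-map them with `np.load(..., mmap_mode="r")` so only the slices you touch are read from disk. A tiny stand-in file with the same layout is created below for illustration (the real `tra_phr.npy` has shape (50266, 384, 84, 5)):

```python
import os
import tempfile

import numpy as np

# Create a small stand-in with the same layout as tra_phr.npy
# (2 phrases instead of 50,266).
path = os.path.join(tempfile.mkdtemp(), "tra_phr_small.npy")
np.save(path, np.zeros((2, 384, 84, 5), dtype=bool))

# Memory-map so a multi-GB file need not fit in RAM.
phrases = np.load(path, mmap_mode="r")

# 384 time steps = 4 bars x 96 steps; split one phrase into bars.
bars = np.asarray(phrases[0]).reshape(4, 96, 84, 5)
```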
Here are two examples of four-bar, five-track piano-rolls seen in our
training data. The tracks are (from top to bottom): Bass, Drums,
Guitar, Strings, Piano.
Reference
- Joan Serrà, Meinard Müller, Peter Grosche and Josep Ll. Arcos,
“Unsupervised Detection of Music Boundaries by Time Series Structure
Features,”
in AAAI Conference on Artificial Intelligence (AAAI), 2012.