One of the persistent obstacles in real-time pattern detection research is the scarcity of suitable data. Most publicly available datasets are fully synthetic, recorded in studio conditions, or designed for offline pattern discovery, not for the live, expressive, human performance scenario a smart musical instrument must handle. To bridge this gap, I created three complementary datasets over the course of my PhD, each extending the complexity of the previous one: from symbolic monophony, to rhythmic percussion, to polyphonic audio.
DoMP — Dataset of Musical Patterns
The DoMP is a dataset of monophonic musical patterns and expressive variations, available on Zenodo (DOI: 10.5281/zenodo.10818617). It contains 4,000 patterns and variations (400 distinct ground-truth patterns and 3,600 expressive repetitions) contributed by 40 musicians (20 keyboard players, 20 electric guitar players) from diverse ethnic and musical backgrounds (Italian, British, Scottish, Filipino, Spanish, Ugandan, and Sri Lankan; aged 18–43, mean 28.5). Musicians ranged from classically trained pianists and church organists to studio session players, jazz musicians, blues guitarists, and rock band members.
Each musician composed 10 distinct monophonic phrases and then repeated each one 9 more times with full artistic freedom, subject to one condition: each repetition had to remain perceptually equivalent to the original, by the musician's own judgment. Guitar recordings were captured using a Fishman TriplePlay MIDI tracker on Ibanez S470 and Epiphone SG Standard guitars; recordings from both instrument groups are provided as MIDI files. Analysis of DoMP revealed an average pattern length of 20.45 MIDI notes, a mean amplitude variation of 9.24 MIDI velocity units, and a 26.5% probability that a given pattern contains pitch bends. The dataset is deliberately designed as a worst-case benchmark: maximal expressive variation under the sole constraint of perceptual equivalence.
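To make these figures concrete, the sketch below shows one way such per-pattern statistics could be reproduced from a single recording. It assumes a standard MIDI file as distributed and uses the mido library; the filename is a hypothetical example, not a documented path.

```python
# Minimal sketch: descriptive statistics for one DoMP recording.
# Assumes a standard MIDI file; uses the mido library (pip install mido).
# The filename below is hypothetical.
import mido

def pattern_stats(path):
    velocities = []
    has_bend = False
    for track in mido.MidiFile(path).tracks:
        for msg in track:
            if msg.type == "note_on" and msg.velocity > 0:
                velocities.append(msg.velocity)   # note onset with its MIDI velocity
            elif msg.type == "pitchwheel":
                has_bend = True                   # pattern contains pitch bends
    return {
        "length_notes": len(velocities),
        "mean_velocity": sum(velocities) / len(velocities) if velocities else 0.0,
        "has_pitch_bend": has_bend,
    }

print(pattern_stats("01_03_00.mid"))
```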
DoDP — Dataset of Drum Patterns
The DoDP extends the DoMP framework into the rhythmic and percussive domain. Building on the same methodology, 10 drummers (all Italian, aged 18–36, mean 26.0) from diverse backgrounds, including professional session drummers, jazz drummers, rock and metal band members, orchestral percussionists, and ethnic/traditional percussion players, each contributed 10 distinct drum patterns, each with 9 expressive repetitions, yielding 1,000 recordings in total (100 distinct patterns and 900 variations).
All recordings were made using a Yamaha DTX-432k electronic drum kit configured with the standard General MIDI percussion mapping, enabling consistent MIDI note-number assignments across all recordings. Each MIDI file contains note numbers, velocity data, precise tick timing, and hi-hat pedal controller data. Key statistical characteristics include an average pattern length of 16.8 MIDI events, a mean velocity variation of 12.6 units, and a 4/4 time signature in 75% of patterns. Compared to DoMP, DoDP shows smaller average temporal deviations (42.3 vs. 77.6 MIDI ticks) and more consistent pattern lengths, reflecting the greater rhythmic precision typical of drummers relative to melodic players.
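As an illustration of how these MIDI events could be consumed, the sketch below separates drum hits from hi-hat pedal movements. The assumption that the pedal arrives as control change #4 (foot controller) is common on electronic kits but is mine, not a documented detail of the dataset.

```python
# Minimal sketch: split a DoDP file into drum hits and hi-hat pedal data.
# Assumes General MIDI percussion note numbers and that the pedal is sent
# as CC#4 (foot controller); verify both against the dataset documentation.
import mido

def drum_events(path):
    hits, pedal = [], []
    tick = 0
    for msg in mido.merge_tracks(mido.MidiFile(path).tracks):
        tick += msg.time                                  # delta ticks -> absolute ticks
        if msg.type == "note_on" and msg.velocity > 0:
            hits.append((tick, msg.note, msg.velocity))   # e.g. 38 = snare in GM
        elif msg.type == "control_change" and msg.control == 4:
            pedal.append((tick, msg.value))               # pedal position 0-127
    return hits, pedal
```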
DoPP — Dataset of Polyphonic Patterns
The DoPP completes the three-dataset framework by moving into polyphonic audio; it is available on Zenodo (DOI: 10.5281/zenodo.14497998). It was created with 20 musicians (10 keyboard, 10 guitar; Italian and Sri Lankan; aged 26–55, mean 38.6), each contributing 10 polyphonic patterns with 9 expressive repetitions, yielding 2,000 audio recordings in total. Unlike in DoMP, musicians were free to use any combination of monophony and polyphony (chords, double stops, chord melodies, contrapuntal motion), making the detection problem substantially harder.
Recordings were made at 44.1 kHz, 24-bit resolution using Universal Audio, RME Fireface UFX, and Zoom U-24 audio interfaces. Electric guitars were recorded as direct, unprocessed input; keyboard players were asked to refrain from using time-based effects. The dataset captures a wide range of instruments (Ibanez S470, Ibanez RGA7, Fender Stratocaster, Gibson Les Paul, Fender Telecaster, Korg Krome, Roli Seaboard, Korg KRONOS, and others) and diverse guitar-specific techniques including fingerstyle playing, hybrid picking, polyphonic string bending, and multi-finger tapping. Average pattern duration is 8.4 seconds (SD 5.54), with a dynamic range of −24 dB to 0 dB.
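A quick sanity check of these audio properties might look like the sketch below, which uses the soundfile and numpy packages. The WAV container and the filename are assumptions on my part, not documented specifics.

```python
# Minimal sketch: check sample rate, duration, and peak level of a DoPP take.
# Assumes WAV files; requires soundfile and numpy. The filename is hypothetical.
import numpy as np
import soundfile as sf

audio, sr = sf.read("05_02_04.wav")                      # float samples in [-1, 1]
duration = len(audio) / sr                               # length in seconds
peak_db = 20 * np.log10(np.max(np.abs(audio)) + 1e-12)   # peak level in dBFS
print(f"{sr} Hz, {duration:.1f} s, peak {peak_db:.1f} dBFS")
```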
A Three-Tiered Framework
Together, DoMP → DoDP → DoPP form a deliberate progression from the simplest symbolic single-voice case to the most complex real-world scenario, polyphonic audio. All three datasets share an identical file naming convention (artistID_patternID_versionID) and folder structure, allowing straightforward integration into evaluation pipelines, as sketched below. The systematic recording methodology (live musicians, intentional expressive variations, perceptual equivalence as the only constraint) makes all three datasets uniquely suited to evaluating pattern detection systems intended for live performance, and sets a new benchmark for the field.
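As a final illustration of the shared structure, here is a small indexing sketch built on the naming convention above. Treating the lowest version number as the ground-truth original is an assumption for the example, not a documented rule.

```python
# Minimal sketch: index any of the three datasets by (artist, pattern).
# Assumes flat files named artistID_patternID_versionID with any extension;
# taking the lowest version as the original is an assumption.
from collections import defaultdict
from pathlib import Path

def index_dataset(root):
    groups = defaultdict(list)            # (artistID, patternID) -> versions
    for path in Path(root).glob("*_*_*.*"):
        artist, pattern, version = path.stem.split("_")
        groups[(artist, pattern)].append((int(version), path))
    for versions in groups.values():
        versions.sort()                   # original first, then repetitions
    return groups

for (artist, pattern), versions in sorted(index_dataset("DoMP").items()):
    original, repetitions = versions[0], versions[1:]
```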