The number of viable options for physical control over digital synthesis processes has grown tremendously in recent years. Alongside custom-built hardware controllers, several types of commercially available technologies are being used for this purpose as well. These include multitouch surfaces like the iPad, and an array of hardware originally developed for use with video games, such as Nintendo’s Wii remote, the Sony PS3eye camera, and Microsoft’s Kinect sensor. In addition to being relatively inexpensive, this technology has the advantage of providing sophisticated sensor data in a standardized format.
The Digital Instrument Library (DILib) for Pure Data is a set of abstractions and externals that were developed for a course on digital musical instrument design. DILib is intended to streamline the process of realizing musical instruments that make use of the types of control data streams mentioned above. In addition to providing convenience, the library’s components are designed to establish a level of standardization with respect to the varied methods for obtaining sensor data from widely available hardware.
[DILib: Control Data Parsing for Digital Musical Instrument Design
]—an article describing the library. From the proceedings of the 4th Pure Data Convention in Weimar, 2011.
[Aspects of Gesture in Digital Musical Instrument Design
]—an article that reviews critical theory, gestural frameworks, and neuroscience research in relation to digital musical instruments. From the proceedings of the 2011 ICMC in Huddersfield, UK.
View a [short movie
] of me using DILib’s infrared fingertip tracking patch.
View a [short movie
] demonstrating the Kinect body-tracking patch for use with OSCeleton.
View a [short movie
] of me using DILib’s laptop trackpad module.
[Download DILib version 0.0.7
[pitchEnv~] lets you scale the amplitude of individual harmonics in a pitched tone with a harmonic spectrum. This won't work with inharmonic spectra. It's essentially FFT filtering, but is specifically designed for making spectral envelope changes relative to harmonics. So, if you'd like to always suppress the 3rd harmonic (an octave and a 5th above the fundamental), [pitchEnv~] will do it for you no matter what the fundamental happens to be. [pitchEnv~] receives pitch in MIDI units at its right inlet. I use [sigmund~]'s pitch function for tracking. Version 0.2 of [pitchEnv~] includes a de-noise feature that eliminates energy that isn't related to the harmonics of the pitched tone. Listen to the example below to hear the difference this can make.
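The idea of scaling harmonics relative to a tracked pitch can be sketched in a few lines of Python. This is only an illustration, not the external's source: the function names, the FFT size, the sample rate, and the `half_width` of bins scaled around each harmonic are all assumptions made for the example.

```python
def midi_to_hz(midi_note):
    """Convert a MIDI pitch (69 = A4 = 440 Hz) to a frequency in Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)


def harmonic_bins(midi_note, n_harmonics, fft_size=2048, sample_rate=44100.0):
    """Return the nearest FFT bin for each harmonic that lies below Nyquist."""
    f0 = midi_to_hz(midi_note)
    bin_width = sample_rate / fft_size
    bins = []
    for h in range(1, n_harmonics + 1):
        freq = h * f0
        if freq >= sample_rate / 2.0:
            break
        bins.append(int(round(freq / bin_width)))
    return bins


def apply_harmonic_gains(magnitudes, bins, gains, half_width=2):
    """Scale the bins around each harmonic by that harmonic's gain."""
    out = list(magnitudes)
    for center, gain in zip(bins, gains):
        lo = max(0, center - half_width)
        hi = min(len(out) - 1, center + half_width)
        for k in range(lo, hi + 1):
            out[k] = magnitudes[k] * gain
    return out
```

Because the bins are recomputed from the incoming MIDI pitch, a gain list such as `[1.0, 1.0, 0.0]` suppresses the 3rd harmonic whatever the fundamental is.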
[Listen to a before/after example of denoising a scratchy viol sample
[Watch a demo movie altering oboe tones
[Watch a demo movie altering a viol tone
[Download pitchEnv~-0.2 source and helpfile
[Download pitchEnv~-0.2 Macintosh binary and helpfile
[Download pitchEnv~-0.2 Windows binary and helpfile
[bark~] is an onset detector that makes use of the perceptually determined Bark frequency scale to achieve its weighting of the frequency spectrum before attempting to measure spectral growth. It uses an algorithm very similar to bonk~, but the use of the Bark scale may improve results for some types of signals. For instance, with a lot of previous resonance, bonk~ sometimes misses low notes in piano signals.
[bark~] also features some convenient functions for the tuning process, which is always finicky. This external is new and still under development, so look for updates if you intend to use it.
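As a rough illustration of Bark-weighted spectral growth, here is a minimal Python sketch. It is not the code inside bark~: the Hz-to-Bark conversion uses Traunmüller's published approximation, the bands are simple non-overlapping boxes, and the growth measure (sum of positive band-energy increases) is an assumption chosen for clarity.

```python
def hz_to_bark(freq_hz):
    """Traunmueller's approximation of the Bark scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_band_energies(power_spectrum, sample_rate, n_bands=24):
    """Sum power-spectrum bins into n_bands bands of equal width in Bark."""
    n_bins = len(power_spectrum)
    nyquist = sample_rate / 2.0
    max_bark = hz_to_bark(nyquist)
    energies = [0.0] * n_bands
    for k in range(1, n_bins):  # skip the DC bin
        freq = k * nyquist / (n_bins - 1)
        band = int(hz_to_bark(freq) / max_bark * n_bands)
        band = max(0, min(n_bands - 1, band))
        energies[band] += power_spectrum[k]
    return energies


def spectral_growth(previous, current):
    """Total energy increase across bands; decreases are ignored."""
    return sum(max(0.0, c - p) for p, c in zip(previous, current))
```

An onset would be reported when `spectral_growth` between successive frames exceeds a threshold; choosing that threshold is the tuning step mentioned above.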
[A Perceptually Based Onset Detector for Real-time and Offline Audio Parsing
]—an article describing features of the bark~ object, as well as a simple test involving three commercial recordings of piano performances. From the proceedings of the 2011 ICMC in Huddersfield, UK.
[Download bark~-0.1.8 source code and helpfile
[Download bark~-0.1.8 Macintosh binary and helpfile
[Download bark~-0.1.8 Windows binary and helpfile
The non-real-time version is useful for cutting up larger soundfiles into musically meaningful chunks. You can use the results for granular synthesis. This will eventually be folded into the timbreID library as well so that you can organize the segmented audio according to various timbre features.
[Download bark-0.0.6 source code and helpfile
[Download bark-0.0.6 Macintosh binary and helpfile
[Download bark-0.0.6 Windows binary and helpfile
[convolve~] is a partitioned impulse response (IR) convolution object. It's designed mainly with reverb in mind, but you can convolve any prerecorded file with live input. Since a useful IR is likely to be several seconds long, [convolve~] cuts it up into partitions for analysis. As each available block of input arrives, it is Fourier transformed, then complex-multiplied against each of the partitions. The results are summed in the frequency domain and processed output blocks are obtained by inverse transforming the sums. This is what all real-time convolution reverbs must do for the sake of efficiency.
The only creation argument is the partition size, which also determines the amount of delay. This is a test version for now—I haven't gotten around to fiddling with the gain adjustment properly. Source, Mac, and Windows binaries are available below.
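The partitioned scheme described above can be sketched in plain Python. This is a minimal offline illustration under stated assumptions, not the code of [convolve~]: it uses a small recursive FFT, overlap-add output, and a partition size that must be a power of two. All function names are made up for the example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def partitioned_convolve(signal, ir, part_size):
    """Convolve signal with ir using uniform partitions of part_size samples."""
    n_fft = 2 * part_size
    # Transform each zero-padded IR partition once, ahead of time.
    ir_spectra = []
    for start in range(0, len(ir), part_size):
        part = list(ir[start:start + part_size])
        ir_spectra.append(fft(part + [0.0] * (n_fft - len(part))))
    out_len = len(signal) + len(ir) - 1
    n_blocks = -(-out_len // part_size)  # ceiling division
    history = []  # spectra of recent input blocks, newest first
    tail = [0.0] * part_size
    output = []
    for b in range(n_blocks):
        block = list(signal[b * part_size:(b + 1) * part_size])
        history.insert(0, fft(block + [0.0] * (n_fft - len(block))))
        del history[len(ir_spectra):]
        # Multiply each stored input spectrum by its partition; sum in frequency.
        acc = [0j] * n_fft
        for x_spec, h_spec in zip(history, ir_spectra):
            for k in range(n_fft):
                acc[k] += x_spec[k] * h_spec[k]
        y = [v.real / n_fft for v in fft(acc, inverse=True)]
        output.extend(y[i] + tail[i] for i in range(part_size))
        tail = y[part_size:]
    return output[:out_len]
```

Each output block needs only one forward and one inverse transform regardless of the IR length, and a full block of `part_size` samples must arrive before it can be processed, which is why the partition size sets the delay.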
[Listen to speech convolved with four different IRs
[Download convolve~-0.9 source and helpfile
[Download convolve~-0.9 Macintosh binary and helpfile
[Download convolve~-0.9 Windows binary and helpfile
Josh Reese worked on a non-real-time version for directly convolving two pre-recorded sound files.
[Download convolve-0.2 source and helpfile
[Download convolve-0.2 Macintosh binary and helpfile
[Download convolve-0.2 Windows binary and helpfile
This is a collection of audio feature analysis externals for [Pd
]. The classification extern (timbreID) accepts arbitrary lists of features and attempts to find the best match between an input feature and previously stored instances of training data. Besides doing identification, timbreID is also designed to facilitate real time concatenative synthesis and timbre-based orderings of sound sets. Its usage is fully explained in the accompanying helpfile.
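The "best match" idea can be illustrated with a plain nearest-neighbor search. This is a hedged sketch, not timbreID's implementation or message interface: the `train` and `identify` names and the choice of Euclidean distance are assumptions for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(database, features, label):
    """Store one labeled training instance."""
    database.append((list(features), label))


def identify(database, features):
    """Return (label, distance) of the stored instance closest to features."""
    best_label, best_dist = None, float("inf")
    for stored, label in database:
        dist = euclidean(stored, features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```

The feature lists can be of any length, as long as training and input instances are built the same way.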
The most powerful feature analysis objects are based on cepstral analysis. cepstrum~ outputs the raw cepstrum of an analysis frame, while mfcc~ and bfcc~ warp the initial FFT results to the mel and Bark frequency scales. They tend to perform more reliably with unpitched percussion. The technique of deriving MFCCs is described in Rabiner & Juang’s Fundamentals of Speech Recognition. In a nutshell, all FFT bins are run through a filterbank composed of overlapping triangular filters, boiling the spectrum down to 20 or so numbers which are then subjected to a discrete cosine transform. This weighting of the frequency axis may be more in line with how we perceive timbre.
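The filterbank-then-DCT recipe can be sketched as follows. This is an illustrative Python version under assumptions, not the mfcc~ source: it uses a common mel formula (2595·log10(1 + f/700)), takes the logarithm of the filter energies before the DCT as is conventional for MFCCs, and uses an unnormalized DCT-II.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_filterbank(n_filters, n_bins, sample_rate):
    """Overlapping triangular filters with centers equally spaced in mel."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edges: filter i rises from edges[i] and falls to edges[i + 2].
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        weights = []
        for k in range(n_bins):
            f = k * nyquist / (n_bins - 1)
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def dct_ii(values):
    """Unnormalized type-II discrete cosine transform."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]


def mfcc(magnitudes, sample_rate, n_filters=20):
    """Reduce a magnitude spectrum to n_filters cepstral coefficients."""
    bank = triangular_filterbank(n_filters, len(magnitudes), sample_rate)
    energies = [sum(w * m for w, m in zip(weights, magnitudes))
                for weights in bank]
    log_energies = [math.log(e + 1e-12) for e in energies]  # avoid log(0)
    return dct_ii(log_energies)
```

Swapping `hz_to_mel` and `mel_to_hz` for a Bark conversion pair would give the Bark-frequency variant of the same pipeline.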
Other spectral features in the package are magSpec~, barkSpec~, specBrightness~, specCentroid~, specFlatness~, specFlux~, specIrregularity~, specKurtosis~, specRolloff~, specSkewness~, specSpread~, and zeroCrossing~. The idea is to provide a diverse set of tools that can be used for creative research in the area of timbre classification. In addition to being used to create feature vectors to send to the timbreID classifier, these feature externs are useful for generating real-time control streams based on timbre.
As of version 0.3.0, all of the features have a non-real-time version for analyzing samples loaded to arrays. This makes the training step very convenient for things like concatenative synthesis and timbre plotting.
[A Timbre Analysis and Classification Toolkit for Pure Data
] : an introduction to the timbreID collection of objects as of December 2009, including descriptions of the example patches.
[Cepstral Analysis Tools for Percussive Timbre Identification
] : a detailed rundown of the effectiveness of various features in a basic classification problem. From the proceedings of the 3rd Pure Data Convention in Sao Paulo, 2009.
[Perceptually Based Pitch Scales in Cepstral Techniques for Percussive Timbre Identification
] : a comparison of raw cepstrum, mel-frequency cepstrum, and Bark-frequency cepstrum. From the proceedings of the ICMC in Montreal, 2009.
• [A demo movie
] of the updated timbre-space browser using timbreID version 0.6. 88 piano samples.
• [A demo movie
] of a CataRT-style timbre-space browser using timbreID. 68 percussion samples.
• [A demo movie
] of 2500 violin/cello grains in the timbre-space browser.
• [A demo movie
] of 847 voice grains in the timbre-space browser.
• [A demo movie
] of the Bark-based vocoder example.
• [A demo movie
] of the spectrogram plotting example.
• [A demo movie
] of the vowel identification example patch.
• [A demo movie
] illustrating how to order by timbre and browse the results of the clustering algorithm using a database of 650 viola sounds played by David Medine (the diminished 7th chord you hear is just a by-product of the fact that the samples were done in minor 3rds).
• [A sound example
] of concatenative synthesis.
• [A sound example
] of timbre ordering using percussion samples performed by [Stephen Solook
• [A sound example
] of timbre ordering using ~5 minutes of cartoon sounds. The file was divided into 3696 grains of sound, then played back at a rate of one grain every 20 ms. All the noisiness, pops, etc. are part of the [original file
], but notice how these noisy areas are grouped together.
• [A sound example
] of timbre ordering using 5000 grains of sound from a recording of a male voice speaking for 5 minutes. First the grains are played in random order for a few seconds as a point of comparison, then you hear the version ordered by timbreID. Notice how "sss" and "shh" sounds get grouped together.
• [A sound example
] of timbre ordering using grains of sound from a recording of a female voice speaking. Again, the grains are played in random order for a few seconds as a point of comparison, then you hear the version ordered by timbreID.
[timbreID-0.6.0 source & helpfiles
[timbreID-0.6.0 macintosh binaries & helpfiles
[timbreID-0.6.0 windows binaries & helpfiles
[Documented example patches
] demonstrating possible uses of the timbreID set, including a classic vocoder, concatenative synthesis, timbre ordering, timbre-space mapping, spectrogram/cepstrogram plotting, and vowel identification. (version 0.6.4: last updated June 20, 2012)
Video cue player
This video cue player was used for the recent reincarnation of James Dillon’s La Coupure. I’m making it available here because it’s generally useful and may be the most straightforward, cheapest means of defining points in a video file to be cued on command with your computer keyboard. In order to make precise image placement as simple as possible, the patch allows you to rotate your video image on the X, Y, and Z axes, and scale the width and height independently. You assign computer keyboard keys to particular time segments in the video with a separate “keydef” file, a text file that you can easily create or edit yourself. It is also possible to set different playback speeds for each cue. The patch requires that you have [GEM
] installed, but requires absolutely no knowledge about [GEM
] or even [Pd
] for that matter.
[Download the video cue patch
The tabletool external manipulates, searches, and provides information about tables in Pd. These are basically most of the array manipulation functions I miss from SuperCollider, which makes dealing with tables much more convenient. So far, its features include:
- dumping a table to a list
- reporting the length of a table
- offsetting (adding a constant to all values)
- scaling (multiplying all values by a scalar)
- curve smoothing
- shifting values forward or back by index (with overflow wraparound)
- shifting values forward or back by index (filling excess with zeros)
- removing a value at a given index
- inserting a value at a given index
- randomly choosing a value from the table
- filling a table with an arithmetic series
- filling a table with random numbers within a given range
- generating a random walk
- shuffling values randomly
- sorting values from low to high
- sorting a sub-range of values from low to high
- normalizing to a specified range
- normalizing so values sum to 1.0
- copying a source table to a target table
- copying a specified range of a source table to a target table
- concatenating two tables into one table
- lacing alternating values from two tables into one table
- table absolute value
- table sum
- table mean
- table arithmetic mean
- table geometric mean
- table integration
- table differentiation
- adding two tables
- subtracting two tables
- multiplying two tables
- dividing two tables
- dot product between two tables
- Euclidean distance between two tables
- Manhattan distance between two tables
- Pearson correlation coefficient between two tables
- reporting minimum value (and its index)
- reporting maximum value (and its index)
- reporting the closest value in the table to a given value (and its index)
- number of occurrences of a given value (and a list of their indices)
- number of occurrences of a value less than a given value (and a list of their indices)
- number of occurrences of a value greater than a given value (and a list of their indices)
- number of occurrences of values between two boundary values (and a list of their indices)
- peak picker
- searching for change
Want an additional feature? Send me a message and I'll fold it in...
[tabletool-0.0.8 source & helpfiles
[tabletool-0.0.8 Macintosh binary & helpfiles
[tabletool-0.0.8 Windows binary & helpfiles
pix_motion_sector is an extension of an existing Pd/GEM object: pix_movement. While pix_movement detects motion in the video input, pix_motion_sector only looks for motion within a user-specified rectangle within that input. A second difference is that pix_movement computes the distance between pixels in the current and previous frames in a one-dimensional greyscale space, while pix_motion_sector operates in a three-dimensional RGB space. This should make it more sensitive. Several instances of pix_motion_sector can be chained together to define multiple independent areas of motion detection. This can be functionally equivalent to the cyclops external written for Max/MSP/Jitter.
Possible applications are:
• triggering events in sound installations based on audience movement
• basic gesture tracking of a live performance
• motion analysis of a pre-composed video work
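The difference between greyscale and RGB-space motion measures inside a rectangle can be shown with a small Python sketch. It is not the GEM object's code: frames are modeled as lists of rows of (r, g, b) tuples, the greyscale conversion is a plain channel average, and the function names are assumptions for the example.

```python
import math


def _sector_mean(prev_frame, cur_frame, left, top, right, bottom, distance):
    """Average a per-pixel distance over the rectangle (right/bottom exclusive)."""
    total, count = 0.0, 0
    for y in range(top, bottom):
        for x in range(left, right):
            total += distance(prev_frame[y][x], cur_frame[y][x])
            count += 1
    return total / count if count else 0.0


def rgb_motion(prev_frame, cur_frame, left, top, right, bottom):
    """Mean Euclidean distance between pixels in three-dimensional RGB space."""
    def distance(p, c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))
    return _sector_mean(prev_frame, cur_frame, left, top, right, bottom, distance)


def grey_motion(prev_frame, cur_frame, left, top, right, bottom):
    """Mean absolute difference after collapsing each pixel to one grey value."""
    def distance(p, c):
        return abs(sum(p) / 3.0 - sum(c) / 3.0)
    return _sector_mean(prev_frame, cur_frame, left, top, right, bottom, distance)
```

A pixel that changes from pure red to pure green has the same average brightness before and after, so `grey_motion` reports nothing while `rgb_motion` reports a large distance. Calling either function with several different rectangles mimics chaining several instances.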
[Download pix_motion_sector macintosh binary, source, &
pitchHeight~ is inspired by the following paper on pitch circularity:
Deutsch, D., Dooley, K., and Henthorn, T. (2008). "Pitch circularity from tones comprising full harmonic series". Journal of the Acoustical Society of America 124: 589-597.
In it, Deutsch et al. describe a technique for gradually attenuating the odd harmonics of a pitched sound as it descends over a chromatic scale. By the bottom of the scale, the odd harmonics are fully attenuated, making the tone sound one octave higher - just above the starting tone of the scale. Like Shepard tones, scales constructed in this way seem to descend perpetually.
pitchHeight~ allows you to create these kinds of scales with live acoustic signals instead of synthetic tones. It is also a potentially interesting effect to apply to an instrument for a piece. The third inlet takes a height parameter between 0 and 1, where 0 is no change in height, and 1 pushes height to a full octave jump. Ambiguous pitch quality can be achieved using height values in between.
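One way to picture the height parameter is as a gain applied to the odd harmonics only. The Python sketch below assumes a linear mapping from height to gain, which is an assumption for illustration and not necessarily the curve used by pitchHeight~ or by Deutsch et al.

```python
def harmonic_gains(height, n_harmonics):
    """Gains for harmonics 1..n: even ones untouched, odd ones scaled by 1 - height."""
    if not 0.0 <= height <= 1.0:
        raise ValueError("height must be between 0 and 1")
    return [1.0 if h % 2 == 0 else 1.0 - height
            for h in range(1, n_harmonics + 1)]
```

At height 1 only harmonics 2, 4, 6, ... remain, which is exactly the harmonic series of a tone one octave higher.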
The technique is very similar to the example G06.octave.doubler.pd from the Pd documentation, except that pitchHeight~ uses FFT filtering. On my 2.5 GHz MacBook Pro, it's more efficient than comb filtering in terms of CPU. It also offers finer control over the strength of the odd harmonics using a single scalar value.
Listen to the results in the demonstration movie below, and watch the odd partials gradually die away as the height slider is increased. Notice that the natural sound of the oboe is preserved throughout.
[Watch a short movie
] demonstrating the external.
[pitchHeight~-0.0.5 source & helpfiles
[pitchHeight~-0.0.5 macintosh binary & helpfiles
[pitchHeight~-0.0.4 windows binary & helpfiles
dryUp~ attenuates sustained resonance in a signal. It can be used to "remove" sustained sounds resulting from holding the pedal down on a piano, for instance. This is a technique that Miller Puckette originally developed in a patch. dryUp~ is my translation of the patch to C, in order to provide a compact, easy to use object that takes care of a lot of the annoying details for you. There are a number of reasons why this might be useful.
The technique is actually very simple, but needs to be very precisely executed in terms of timing. To be off by even 3 samples at 44.1 kHz (0.068 milliseconds) is enough to ruin the effect. It works by using phase vocoding to create a constant parallel signal that predicts what the input signal will sound like N samples in the future (N is 4096 by default, but you can adjust this). The predicted signal is then continuously subtracted from the input signal, leaving behind only the sounds that could not have possibly been predicted, like the attack of a new note. For instruments with steady predictable resonance, e.g. vibraphones and pianos, the predicted signal is often very accurate. The effect will never completely erase resonance, but the attenuation can be dramatic with the right window size. As you'll hear in the example below, the effect also results in some unnatural-sounding artifacts at maximum resonance suppression. With values in between, however, the signal can still sound very natural. You might want to use this range to get a little more attack out of a signal with a lot of resonant wash.
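The predict-and-subtract principle can be demonstrated on a single frequency bin. The Python sketch below is a toy illustration, not dryUp~'s algorithm: it uses unwindowed frames, one bin, a bin-centered test tone, and made-up names and sizes. It extrapolates a bin one hop ahead by repeating the phase step observed between the two previous frames, then measures what the prediction fails to explain.

```python
import cmath
import math


def bin_value(signal, start, size, k):
    """DFT bin k of the unwindowed size-sample frame starting at start."""
    return sum(signal[start + n] * cmath.exp(-2j * cmath.pi * k * n / size)
               for n in range(size))


def predict_next(prev, cur):
    """Extrapolate a bin one hop ahead: same magnitude, same phase step again."""
    if prev == 0 or cur == 0:
        return cur
    step = cur / prev
    return cur * step / abs(step)


def residual(signal, size, hop, k):
    """Magnitude left in bin k of the third frame after subtracting the prediction."""
    frames = [bin_value(signal, i * hop, size, k) for i in range(3)]
    return abs(frames[2] - predict_next(frames[0], frames[1]))


SIZE, HOP, K = 64, 10, 4
# A steady tone centered on bin K: fully predictable.
steady = [math.cos(2 * math.pi * K * n / SIZE + 0.3)
          for n in range(SIZE + 2 * HOP)]
# The same tone with new energy that only the third frame sees.
struck = list(steady)
for n in range(SIZE + HOP, SIZE + 2 * HOP):
    struck[n] += 1.0
```

`residual(steady, SIZE, HOP, K)` is essentially zero because the sustained tone is predicted exactly, while `residual(struck, SIZE, HOP, K)` is large because the new energy could not have been predicted. Any error in the phase step corrupts the prediction, which is why timing precision matters so much.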
The creation argument is window size, which defaults to 4096 samples. Overlap is set to 4 and currently cannot be changed. This is something I'll be improving in the future. But the default overlap works quite well for a variety of window sizes.
In the accompanying help file, the resonance of each of three independently looping vibraphone samples can be controlled with a set of sliders. The sound example below begins with all three loops at full resonance, then with reduced resonance for a few seconds, then back to full resonance.
[Listen to the sound example
[Watch a short movie
] demonstrating the external.
[dryUp~-0.0.5 macintosh binary & helpfile
[dryUp~-0.0.5 windows binary & helpfile
[dryUp~-0.0.5 source & helpfile
[dryUp~ for Max/MSP 5 macintosh binary & helpfile
Soundhack Pd externals
This summer (2009) I began translating Tom Erbe's VST plugins to Pd externals. So far I've finished the delay trio (+delay~, +pitchdelay~, and +bubbler~) and the freesound bundle (+chebyshev~, +compand~, +decimate~, and +matrix~). Windows/Mac/Linux binaries plus help files by Daniel Arias are available to download at [www.soundhack.com/externs.php
]. Max 5 binaries are also available, but much less thoroughly tested for bugs.
Timbre-matching Based Granular Synthesis
Like conventional granular synthesis, this technique (realized with my timbreGrains~ external for Pd) constructs an audio signal using short grains from a given sound file. The difference here is that each grain is chosen based on its degree of timbral similarity to the frame of audio coming in at any given moment. Timbral similarity is measured using the same technique employed in my bfcc~ external (described on this page). Depending on the sound file being used as a set of granular building blocks, an input signal can be mirrored in real time to create an interestingly similar yet distorted output. It can be used as an effect: mic a cello and have it construct an analogous signal using a database of farm animal sounds. Alternatively, one could generate tape pieces by vocalizing into a microphone while listening to and recording the output.
Below are two output examples that used the same input sound: a woman (psychologist Diana Deutsch) speaking a short phrase. The first was constructed using grains from 5 minutes of cello and violin playing. To illustrate the relationship between input and output signal, the example begins with just the voice, then fades in the real-time granular reconstruction of the voice, then fades out the voice. The second example uses grains from a recording of the same woman speaking completely different phrases than those in the input signal. The smoothness of the output signal varies according to the particular training set, input, and parameter settings in use. This one is pretty noisy.
Mixing things up, the third example used cello and violin playing as the grain source, and was driven by a recording of a panel discussion on NPR as an input. Both signals are present in order to illustrate the relationship. This recording was done in real time, and similar sounds can be created by speaking into a microphone directly (this is a very gratifying experience). After hearing both channels of the example for a while, try listening to only the output signal (left).
The larger the granular building block sound file, the more rich and interesting the output is. However, this means more and more comparisons per frame of audio. The best results I have found so far are at a window size of 1024 samples with an overlap of 4. The output is very quick to react since there are only 256 samples between frame updates, but that means that bfcc~ must compare the cepstrum of the input signal against a database of ~30,000 other cepstra every 5.8 milliseconds. The next step will be to use statistics and machine learning to avoid a brute force linear search.
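The timing figures above follow directly from the window size and overlap. A quick Python check, assuming a 44.1 kHz sample rate (the rate is an assumption; the function names are made up for the example):

```python
def hop_size(window_size, overlap):
    """Samples between successive analysis frames."""
    return window_size // overlap


def hop_period_ms(window_size, overlap, sample_rate=44100.0):
    """Time between frame updates, in milliseconds."""
    return 1000.0 * hop_size(window_size, overlap) / sample_rate


def comparisons_per_second(database_size, window_size, overlap,
                           sample_rate=44100.0):
    """Cepstrum comparisons needed per second for a brute-force linear search."""
    return database_size * sample_rate / hop_size(window_size, overlap)
```

With a 1024-sample window and an overlap of 4, `hop_size` is 256 samples and `hop_period_ms` is about 5.8 ms. For a database of 30,000 cepstra, that works out to a little over five million comparisons per second, which is the cost a smarter search would aim to reduce.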
Other ways to reduce CPU include restricting the search to a specific range of instances while manually controlling the center of the search neighborhood, or allowing the system to reorient itself in order to find the best neighborhood on its own. These are parameters of the timbreGrains~ external. Working in non-real-time, these restrictions disappear, and we can wait for the more thorough linear search results. There is a further parameter for controlling the weighting of match choice based on both the best match of the current input frame and the best match in light of the previous output frame (to smooth out discontinuities). The more that the choice is weighted according to previous output, the longer the search will take.
**As of October, 2009, timbreID includes functions for this type of synthesis, making timbreGrains~ unnecessary. Download the [timbreID-example package
] and look at the concatenative folder for a thoroughly documented example of how to perform this kind of synthesis using the timbreID externals.
Wii abstractions for Pd
These abstractions parse Wii remote info sent from [OSCulator
] and [DarwiinRemote
]. They require udpreceive, unpackOSC, and routeOSC from the mrpeach library to deal with OSC. The mrpeach externals are included with [Pd-extended
[download the OSCulator abstraction
[download the DarwiinRemote abstraction
Polyrhythm Metronome (for Edwin Harkins)
This is a polyrhythm metronome for Ed Harkins, and is intended to be nothing more than a practice tool. It’s an abstraction that works in Pd vanilla. Operation was designed to be as simple as possible. Click the "Tempo" number box to enter a duration for the complete polyrhythm. Click the "Notes/cycle" number boxes to enter the number of attacks for the 1st and 2nd voices. Turn it on. The HAM (Helpful Alignment Mnemonic) is calculated and displayed in the post window after a polyrhythm is entered, and a sub-patch displays a crude "slash" style representation of the rhythm. You can choose the pitch (in MIDI note #) for either of the voices, and check the "Melodic" checkbox to have the voices climb up chromatically during the rhythm. There is independent volume control for each voice in case you'd like to concentrate on one more than the other. If you make changes to any parameters in the middle of a polyrhythm, they will take effect on the downbeat of the next cycle.
Finally, there's an option for opening a score. A score should be a text file where each line lists a polyrhythmic notes/cycle pair. This way you can practice transitions between difficult patterns.
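The attack times of a two-voice polyrhythm are easy to compute: each voice divides the cycle duration into equal parts. The Python sketch below only illustrates that arithmetic; it is not the Pd abstraction, and the function name and the "A"/"B" voice labels are assumptions.

```python
from fractions import Fraction


def polyrhythm_onsets(cycle_duration, notes_a, notes_b):
    """Return sorted (time, voices) pairs; voices is 'A', 'B', or 'AB'."""
    events = {}
    for name, count in (("A", notes_a), ("B", notes_b)):
        for i in range(count):
            # Exact fractions keep coinciding attacks on the same key.
            t = Fraction(i, count) * cycle_duration
            events[t] = events.get(t, "") + name
    return sorted(events.items())
```

For a 3-against-2 pattern over a 6-second cycle, `polyrhythm_onsets(6, 3, 2)` places voice A at 0, 2, and 4 seconds and voice B at 0 and 3 seconds, with both voices sharing the downbeat.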
[download the metronome
] code runs a recursive algorithm to generate multiple melodic streams that fork and rejoin themselves until they run out of steam. Control parameters like forking and merging probabilities determine how long the whole structure will sustain itself. At this point, the melodic behavior is a simple random walk, but I plan to expand the code to allow for more unpredictable musical gestures.
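The general shape of such a recursive forking process can be sketched in Python. This is a loose illustration of the description above, not a translation of the original code: the `energy` budget, the halving of energy on each fork, the depth limit, and modeling a "rejoin" as a child stream simply ending are all assumptions.

```python
import random


def grow(pitch, energy, fork_prob, merge_prob, rng, depth=0, max_depth=6):
    """Recursively grow one melodic stream; returns a list of (depth, pitch) notes."""
    notes = []
    while energy > 0:
        notes.append((depth, pitch))
        pitch += rng.choice((-2, -1, 1, 2))  # simple random-walk step
        energy -= 1
        if depth < max_depth and rng.random() < fork_prob:
            # A fork spawns a child stream with half of the remaining energy.
            notes.extend(grow(pitch, energy // 2, fork_prob, merge_prob,
                              rng, depth + 1, max_depth))
        if depth > 0 and rng.random() < merge_prob:
            break  # the child stream rejoins its parent
    return notes
```

Raising `fork_prob` or lowering `merge_prob` makes the structure sustain itself longer, for example `grow(60, 16, 0.3, 0.2, random.Random(1))`. Because every stream spends energy on each note and children start with less than their parent, the process always runs out of steam.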
[Listen to a piano-based
[Listen to a
[Listen to an absurd
Some generally useful SC code is available below. The idea is to provide common, simple tools in a bare-bones format that can be expanded for any purpose.
• Basic [sampling keyboard
• [Multi-channel soundfile player
] for free and easy n-channel surround playback.
• [Pitch Tracker
] to analyze samples or live input.
• [Spectral analysis tool
• A [self organizing map
] that demonstrates some SC drawing ability.
• Fullscreen [video player
A basic familiarity with microcontrollers can come in handy. My introduction was through Tom Igoe’s book [Physical
], and the first microcontroller I used was the Basic Stamp in order to make a MIDI knob box. Basic Stamps are very expensive.
PIC microcontrollers are made by [Microchip
]. The wonderful thing about PICs is that they are inexpensive (in small quantities: free), powerful, and available in a very wide variety of models. The unfortunate thing for Macintosh or Linux users is that the IDE (MPLAB) is only available for Windows. One solution to this problem is to turn to other chip manufacturers that do provide software on other platforms
Another solution is to use some well developed open source tools for the PIC.
I have documented my process of installing piklab-prog on a Macintosh
]. In terms of software, what you need are the
] - the standard IDE for
] - a package manager for installing Linux-based software on a Macintosh.
] - this open source USB library can simply be installed via Fink once you have it.
] - a collection of open source PIC
] - the Small Device C Compiler to compile your programs.
] - the open source IDE for programming
In the end, I didn’t use the Piklab IDE, just the command line program piklab-prog to actually program my chips. Compiling C code and generating the final hex file can all be done in the command line with a simple script. A template version of the script I use can be found [here
If you’re wondering which of the many PICs to use, popular models are the 18F252 and 18F452. Both have analog input pins (the 452 has more).
More information may be posted here in the future if the process of building Piklab and its toolchain changes for better or worse.
Contact microphones are extremely easy to build, and very cheap. Assuming you already have a soldering iron and some solder, the microphone pictured here costs about $2.
[This recording of R31
] (featuring Fabio Oliveira performing on amplified Waterphone and myself on live electronics) provides a good example of the kind of amplification you’ll get from these microphones. Their frequency response is very limited, but the sounds they open up can be very interesting. The parts list and instructions I put together can be found [here