divider

timbreID, bfcc~ & other spectral features


This is a collection of timbre feature analysis externals for [Pd]. The classification extern (timbreID) accepts arbitrary lists of features and attempts to find the best match between an input feature and previously stored instances of training data. Besides doing identification, timbreID is also designed to facilitate real time concatenative synthesis and timbre-based orderings of sound sets. Its usage is explained more fully in the accompanying helpfile, but here's a rundown of the basic functionality:


• Feature lists sent to the first inlet are processed by the "train" function, which stores them as example instances against which future comparisons will be made. Once a training database has been created, it can be saved to a .timid file with the "write" method and later retrieved with the "read" method. Other output formats are .txt, .mat (for use with MATLAB or Octave), and ARFF (for use with WEKA).

• Feature lists sent to the second inlet are compared with those in the training database, and a match is identified. Instances are distinguished by index number, so if the closest match was the instance at index 7, the number 7 will appear at the first outlet.

• Feature lists sent to the third inlet also produce matches, but this inlet is designed specifically for concatenative synthesis. Other messages that direct and restrict the database search can be used in combination with this inlet. See the granular example in the [timbreID examples package] for a detailed explanation, and listen to this [sound example] of a voice reconstructed using grains from a sound file of bowed string instruments. [This sound example] reconstructs the same voice sample using grains from a sound file of noisy low-quality cartoon sound effects. Both examples start with just the voice, then fade in the real-time granular reconstruction of the voice, then fade out the voice.

• A clustering algorithm can be run using the "cluster" message, after which instances will be grouped into a desired number of clusters that represent instruments. This way, there is no need to specify exactly how many examples will be given during training. After clustering, timbreID will output the associated cluster index of the best matching instance. When instances have been grouped into clusters, matching is performed using a k-nearest-neighbor strategy, for which you can specify k. If automatic clustering doesn't work out, or if you wish to group dissimilar sounds together, a manual option is available as well.

• The fourth outlet reports lists of orderings based on given starting timbres. This way you can analyze a set of sounds and have timbreID propose orderings that have smooth timbre transitions. See the timbre-order example in the [timbreID examples package] for a detailed explanation. You can also listen to this [sound example] of a set of sounds played in random order followed by an order produced by timbreID. Other ordering examples can be heard below.

• Individual weights can be assigned to feature list components. For instance, if your feature list consists of BFCCs and spectral centroid, you can weight the latter to be half as influential as the former in calculating distance.

• Since list length is arbitrary, an effective way to capture the temporal evolution of a sound is to concatenate analysis results of multiple overlapping windows. This analysis strategy is used in the timbre-order example in the [timbreID examples package], and graphs of the analyses can be viewed as they occur in real time.
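The matching and ordering ideas above can be sketched in a few lines of C. This is not timbreID's source: it assumes a weighted squared Euclidean distance, a tiny hypothetical database of three-component feature lists, and a simple greedy nearest-neighbor walk for ordering (timbreID's own ordering method may differ). The helper names `weighted_dist`, `nearest_instance`, and `greedy_order` are made up for illustration.

```c
#include <math.h>

#define N_FEATURES 3   /* components per feature list (hypothetical) */
#define N_INSTANCES 4  /* stored training instances (hypothetical) */

/* Weighted squared Euclidean distance; weights[i] scales component i. */
double weighted_dist(const double *a, const double *b, const double *weights)
{
    double sum = 0.0;
    for (int i = 0; i < N_FEATURES; i++) {
        double d = a[i] - b[i];
        sum += weights[i] * d * d;
    }
    return sum;
}

/* Index of the stored instance closest to `input`, as reported at the first outlet. */
int nearest_instance(double db[N_INSTANCES][N_FEATURES],
                     const double *input, const double *weights)
{
    int best = 0;
    double best_dist = INFINITY;
    for (int i = 0; i < N_INSTANCES; i++) {
        double d = weighted_dist(db[i], input, weights);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}

/* Greedy ordering: start at `start`, then keep hopping to the closest
   instance not yet visited. Fills order[0..N_INSTANCES-1]. */
void greedy_order(double db[N_INSTANCES][N_FEATURES], int start,
                  const double *weights, int *order)
{
    int visited[N_INSTANCES] = {0};
    int current = start;
    for (int step = 0; step < N_INSTANCES; step++) {
        order[step] = current;
        visited[current] = 1;
        int next = -1;
        double next_dist = INFINITY;
        for (int i = 0; i < N_INSTANCES; i++) {
            if (visited[i])
                continue;
            double d = weighted_dist(db[current], db[i], weights);
            if (d < next_dist) {
                next_dist = d;
                next = i;
            }
        }
        if (next < 0)
            break;
        current = next;
    }
}
```

For example, with the database `{0,0,0}`, `{1,1,0}`, `{5,5,1}`, `{2,2,0}` and weights `{1, 1, 0.5}`, the input `{1.2, 0.9, 0}` matches index 1, and an ordering that starts at index 0 comes out as 0, 1, 3, 2.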


The most powerful feature externs are based on cepstral analysis. cepstrum~ outputs the raw cepstrum of an analysis frame, and mfcc~ is another version that warps the initial FFT results to the mel frequency scale; it performs clearly more reliably than the raw cepstrum. The technique of deriving MFCCs is described in Rabiner & Juang’s Fundamentals of Speech Recognition. In a nutshell, all FFT bins are run through a filterbank composed of overlapping triangular filters, boiling the spectrum down to 20 or so numbers, which are then subjected to a discrete cosine transform. This weighting of the frequency axis may be more in line with how we perceive timbre.

bfcc~ is the most developed cepstral external, using the more thoroughly researched Bark scale rather than mels for the spectrum weighting. Performance is only slightly better than mfcc~. The most noteworthy thing about these cepstral externs is the flexibility they provide with respect to filterbank construction. The choice of a specific Bark- or mel-spacing can have a real impact on how relevant the features are in classifying one sound set vs. another.
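To show what a Bark-spaced filterbank looks like, here is a C sketch that places band edges at a fixed Bark interval. It uses Traunmüller's (1990) Bark approximation, without its low- and high-end corrections, plus its algebraic inverse; I am assuming this formula for illustration, and bfcc~ may use a different conversion. The `spacing` argument is a hypothetical stand-in for the kind of filterbank spacing choice described above.

```c
#include <math.h>

/* Traunmüller (1990) Hz-to-Bark approximation, without its end corrections. */
double hz_to_bark(double hz) { return 26.81 * hz / (1960.0 + hz) - 0.53; }

/* Algebraic inverse of hz_to_bark. */
double bark_to_hz(double bark) { return 1960.0 * (bark + 0.53) / (26.28 - bark); }

/* Write band edges (Hz) spaced every `spacing` Bark, from 0 Hz up to max_hz.
   Returns the number of edges written (at most max_edges). */
int bark_band_edges(double spacing, double max_hz, double *edges, int max_edges)
{
    double z0 = hz_to_bark(0.0);
    double z_max = hz_to_bark(max_hz);
    int n = 0;
    for (int i = 0; n < max_edges; i++) {
        double z = z0 + i * spacing;
        if (z > z_max)
            break;
        edges[n++] = bark_to_hz(z);
    }
    return n;
}
```

With a 0.5-Bark spacing up to 22050 Hz this yields 50 edges, packed much more tightly at low frequencies than the linearly spaced FFT bins they will be applied to.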

Other spectral features in the package are magSpec~, barkSpec~, specBrightness~, specCentroid~, specFlatness~, specFlux~, specRolloff~, and zeroCrossing~. The idea is to provide a diverse set of tools that can be used for creative research of timbre classification and measurement. In addition to being used to create feature vectors to send to the timbreID classifier, these feature externs are useful for generating real-time control streams based on timbre, and ordering a group of sounds by timbre.

As of version 0.3.0, all of the features have a non-real-time version for analyzing samples loaded to arrays. This makes the training step very convenient for things like concatenative synthesis and timbre plotting.


Publications:

[A Timbre Analysis and Classification Toolkit for Pure Data] : an introduction to the timbreID collection of objects as of December 2009, including descriptions of the example patches.

[Cepstral Analysis Tools for Percussive Timbre Identification] : a detailed rundown of the effectiveness of various features in a basic classification problem. From the proceedings of the 3rd Pure Data Convention in Sao Paulo, 2009.

[Perceptually Based Pitch Scales in Cepstral Techniques for Percussive Timbre Identification] : a comparison of raw cepstrum, mel-frequency cepstrum, and Bark-frequency cepstrum. From the proceedings of the ICMC in Montreal, 2009.


Audio/Video examples:

[A short movie] demonstrating the beginnings of a CataRT-style timbre-space browser using Pd/GEM. 68 percussion samples.

[Another short movie] demonstrating the timbre-space browser. 2500 violin/cello grains.

[Another short movie] demonstrating the timbre-space browser. 847 voice grains.

[A short movie] illustrating how to order by timbre and browse the results of the clustering algorithm using a database of 650 viola sounds played by David Medine (the diminished 7th chord you hear is just a by-product of the fact that the samples were done in minor 3rds).

[A short movie] demonstrating the vowel identification example patch.

[Concatenative synthesis sound example]

[Timbre order sound example] using percussion samples performed by [Stephen Solook]

[Timbre order sound example] using ~5 minutes of cartoon sounds. The file was divided into 3696 grains of sound, then played back at a rate of one grain every 20 ms. All the noisiness, pops, etc., are part of the [original file], but notice how these noisy areas are grouped together.

[Timbre order sound example] using 5000 grains of sound from a recording of a male voice speaking for 5 minutes. First the grains are played in random order for a few seconds as a point of comparison, then you hear the version ordered by timbreID. Notice how "sss" and "shh" sounds get grouped together.

[Timbre order sound example] using grains of sound from a recording of a female voice speaking. Again, the grains are played in random order for a few seconds as a point of comparison, then you hear the version ordered by timbreID.


Downloads:

[timbreID-0.3.7 macintosh binaries & helpfiles]
[timbreID-0.3.7 windows binaries & helpfiles]
[timbreID-0.3.7 source & helpfiles]

[Documented example patches] demonstrating possible uses of the timbreID set, including concatenative synthesis, timbre ordering, timbre-space mapping, and vowel identification.




divider

tabletool


The tabletool external manipulates, searches, and provides information about tables in Pd. These are basically most of the array manipulation functions I miss from SuperCollider, which makes dealing with tables much more convenient. So far, its features include:

- dumping a table to a list
- reporting the length of a table
- offsetting (adding a constant to all values)
- scaling (multiplying all values by a scalar)
- curve smoothing
- shifting values forward or back by index
- removing a value at a given index
- inserting a value at a given index
- randomly choosing a value from the table
- filling a table with an arithmetic series
- generating a random walk
- shuffling values randomly
- sorting values from low to high
- sorting a sub-range of values from low to high
- reversing
- inverting
- normalizing to a specified range
- normalizing so values sum to 1.0
- copying a source table to a target table
- copying a specified range of a source table to a target table
- concatenating two tables into one table
- lacing alternating values from two tables into one table
- table absolute value
- table sum
- table mean
- table arithmetic mean
- table geometric mean
- table integration
- table differentiation
- adding two tables
- subtracting two tables
- multiplying two tables
- dividing two tables
- dot product between two tables
- Euclidean distance between two tables
- Manhattan distance between two tables
- Pearson correlation coefficient between two tables
- reporting minimum value (and its index)
- reporting maximum value (and its index)
- reporting the closest value in the table to a given value (and its index)
- number of occurrences of a given value (and a list of their indices)
- number of occurrences of a value less than a given value (and a list of their indices)
- number of occurrences of a value greater than a given value (and a list of their indices)
- number of occurrences of values between two boundary values (and a list of their indices)


Want an additional feature? Send me a message and I'll fold it in...

[tabletool macintosh binaries & helpfiles]
[tabletool windows binaries & helpfiles]
[tabletool source & helpfiles]




divider

pix_motion_sector


pix_motion_sector is an extension of an existing Pd/GEM object: pix_movement. While pix_movement detects motion anywhere in the video input, pix_motion_sector only looks for motion inside a user-specified rectangle of that input. A second difference is that pix_movement computes the distance between pixels in the current and previous frames in a one-dimensional greyscale space, while pix_motion_sector operates in a three-dimensional RGB space. This should make it more sensitive, since a change in color that leaves brightness roughly unchanged still registers as motion. Several instances of pix_motion_sector can be chained together to define multiple independent areas of motion detection, which can be functionally equivalent to the cyclops external written for Max/MSP/Jitter.

Possible applications are:
• triggering events in sound installations based on audience movement
• basic gesture tracking of a live performance
• motion analysis of a pre-composed video work

[Download pix_motion_sector macintosh binary, source, & helpfile].




divider

pitchHeight~


pitchHeight~ is inspired by the following paper on pitch circularity:

Deutsch, D., Dooley, K., and Henthorn, T. (2008). "Pitch circularity from tones comprising full harmonic series". Journal of the Acoustical Society of America 124: 589-597.

In it, Deutsch et al. describe a technique for gradually attenuating the odd harmonics of a pitched sound as it descends over a chromatic scale. By the bottom of the scale, the odd harmonics are fully attenuated, making the tone sound one octave higher - just above the starting tone of the scale. Like Shepard tones, scales constructed in this way seem to descend perpetually.

pitchHeight~ allows you to create these kinds of scales with live acoustic signals instead of synthetic tones. It is also a potentially interesting effect to apply to an instrument for a piece. The third inlet takes a height parameter between 0 and 1, where 0 is no change in height, and 1 pushes height to a full octave jump. Ambiguous pitch quality can be achieved using height values in between.

The technique is very similar to the example G06.octave.doubler.pd from the Pd documentation, except that pitchHeight~ uses FFT filtering rather than comb filtering. On my 2.5 GHz MacBook Pro, the FFT approach uses less CPU than comb filtering, and it offers finer control over the strength of the odd harmonics using a single scalar value.
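As a rough illustration of FFT filtering by harmonic parity, here is a C sketch that scales the bins around odd harmonics by (1 - height). It assumes the fundamental f0 is already known, for example from a pitch tracker, and uses a hypothetical tolerance of a quarter of f0 around each harmonic; pitchHeight~ itself may locate and weight the harmonics differently.

```c
#include <math.h>

/* Scale the FFT bins near odd harmonics of f0 by (1 - height).
   re/im: complex spectrum bins 0..n_bins-1; bin_hz: frequency spacing. */
void attenuate_odd_harmonics(double *re, double *im, int n_bins,
                             double bin_hz, double f0, double height)
{
    double gain = 1.0 - height;   /* height 0: unchanged, 1: odd harmonics removed */
    for (int k = 0; k < n_bins; k++) {
        double freq = k * bin_hz;
        int h = (int)(freq / f0 + 0.5);   /* nearest harmonic number */
        /* Only touch bins within a quarter of f0 of an odd harmonic. */
        if (h % 2 == 1 && fabs(freq - h * f0) < 0.25 * f0) {
            re[k] *= gain;
            im[k] *= gain;
        }
    }
}
```

Removing the odd harmonics leaves 2f0, 4f0, 6f0, and so on, which is a complete harmonic series on 2f0; that is why a height of 1 produces the full octave jump.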

Listen to the results in the demonstration movie below, and watch the odd partials gradually die away as the height slider is increased. Notice that the natural sound of the oboe is preserved throughout.


[Watch a short movie] demonstrating the external.

[pitchHeight~-0.0.4 macintosh binaries & helpfiles]
[pitchHeight~-0.0.4 windows binaries & helpfiles]
[pitchHeight~-0.0.4 source & helpfiles]




divider

dryUp~


dryUp~ attenuates sustained resonance in a signal. It can be used to "remove" sustained sounds resulting from holding the pedal down on a piano, for instance. This is a technique that Miller Puckette originally developed in a patch. dryUp~ is my translation of the patch to C, in order to provide a compact, easy to use object that takes care of a lot of the annoying details for you. There are a number of reasons why this might be useful.

The technique is actually very simple, but needs to be very precisely executed in terms of timing. Being off by even 3 samples at 44.1 kHz (0.068 milliseconds) is enough to ruin the effect. It works by using phase vocoding to create a constant parallel signal that predicts what the input signal will sound like N samples in the future (N is 4096 by default, but you can adjust this). The predicted signal is then continuously subtracted from the input signal, leaving behind only the sounds that could not possibly have been predicted, such as the attack of a new note. For instruments with steady, predictable resonance, e.g. vibraphones and pianos, the predicted signal is often very accurate. The effect will never completely erase resonance, but the attenuation can be dramatic with the right window size. As you'll hear in the example below, the effect also results in some unnatural-sounding artifacts at maximum resonance suppression. With values in between, however, the signal can still sound very natural. You might want to use this range to get a little more attack out of a signal with a lot of resonant wash.

The creation argument is window size, which defaults to 4096 samples. Overlap is set to 4 and currently cannot be changed. This is something I'll be improving in the future. But the default overlap works quite well for a variety of window sizes.

In the accompanying help file, the resonance of each of three independently looping vibraphone samples can be controlled with a set of sliders. The sound example below begins with all three loops at full resonance, then with reduced resonance for a few seconds, then back to full resonance.

[Listen to the sound example]

[Watch a short movie] demonstrating the external.

[dryUp~-0.0.5 macintosh binary & helpfile]
[dryUp~-0.0.5 windows binary & helpfile]
[dryUp~-0.0.5 source & helpfile]

[dryUp~ for Max/MSP 5 macintosh binary & helpfile]




divider

Subjective timbre judgment research


Though many acoustic measurements are used in an attempt to quantify timbre, in the end it is a highly subjective characteristic of sound. Experiments by Grey, Wessel, and Krumhansl (among others) have sought a relationship between these realms of quantity and quality by compiling databases of subjective timbre judgments and identifying acoustic measurements that seem to fit.

I am in the process of designing an experiment that follows this model, and have recently put together a pilot version that is limited in scope. Using the data I get back from this early test, I'll be able to develop a larger experiment with some very useful insight. If you would like to participate at this stage, the necessary files can be downloaded below. It runs in the [Pd] programming environment, but requires no knowledge of Pd. A README file is included in the main folder of the package, and will describe what you need to do in order to get started.

[Download Pd at Miller Puckette’s website]
[Download the experiment package]




divider

Soundhack Pd externals


This summer (2009) I began translating Tom Erbe's VST plugins to Pd externals. So far I've finished the delay trio (+delay~, +pitchdelay~, and +bubbler~) and the freesound bundle (+chebyshev~, +compand~, +decimate~, and +matrix~). Windows/Mac/Linux binaries plus help files by Daniel Arias are available to download at [www.soundhack.com/externs.php]. Max 5 binaries are also available, but much less thoroughly tested for bugs.




divider

Timbre-matching Based Granular Synthesis


Like conventional granular synthesis, this technique (realized with my timbreGrains~ external for Pd) constructs an audio signal using short grains from a given sound file. The difference here is that each grain is chosen based on its degree of timbral similarity to the frame of audio coming in at any given moment. Timbral similarity is measured using the same technique employed in my bfcc~ external (described on this page). Depending on the sound file being used as a set of granular building blocks, an input signal can be mirrored in real time to create an interestingly similar yet distorted output. It can be used as an effect: mic a cello and have it construct an analogous signal using a database of farm animal sounds. Alternatively, one could generate tape pieces by vocalizing into a microphone while listening to and recording the output.

Below are two output examples that used the same input sound: a woman (psychologist Diana Deutsch) speaking a short phrase. The first was constructed using grains from 5 minutes of cello and violin playing. To illustrate the relationship between input and output signal, the example begins with just the voice, then fades in the real-time granular reconstruction of the voice, then fades out the voice. The second example uses grains from a recording of the same woman speaking completely different phrases than those in the input signal. The smoothness of the output signal varies according to the particular training set, input, and parameter settings in use. This one is pretty noisy.

Mixing things up, the third example used cello and violin playing as the grain source, and was driven by a recording of a panel discussion on NPR as an input. Both signals are present in order to illustrate the relationship. This recording was done in real time, and similar sounds can be created by speaking into a microphone directly (this is a very gratifying experience). After hearing both channels of the example for a while, try listening to only the output signal (left).

The larger the granular building block sound file, the richer and more interesting the output is. However, this means more and more comparisons per frame of audio. The best results I have found so far are at a window size of 1024 samples with an overlap of 4. The output is very quick to react since there are only 256 samples between frame updates, but that means the cepstrum of the input signal must be compared against a database of ~30,000 other cepstra every 5.8 milliseconds. The next step will be to use statistics and machine learning to avoid a brute force linear search.
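The figures in that paragraph work out as follows, assuming a 44.1 kHz sample rate and the ~30,000-entry database:

```latex
\text{hop} = \frac{1024}{4} = 256 \text{ samples}, \qquad
\frac{256}{44100\ \text{Hz}} \approx 5.8\ \text{ms}

\frac{44100}{256} \approx 172.3\ \text{frames/s}, \qquad
172.3 \times 30000 \approx 5.2 \times 10^{6}\ \text{cepstrum comparisons per second}
```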

Other ways to reduce CPU include restricting the search to a specific range of instances while manually controlling the center of the search neighborhood, or allowing the system to reorient itself in order to find the best neighborhood on its own. These are parameters of the timbreGrains~ external. Working in non real-time, these restrictions disappear, and we can wait for the more thorough linear search results. There is a further parameter for controlling the weighting of match choice based on both the best match of the current input frame and the best match in light of the previous output frame (to smooth out discontinuities). The more that the choice is weighted according to previous output, the longer the search will take.
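The two ideas above, a restricted search neighborhood and a weighting between matching the input and staying close to the previous output, can be combined in one search loop. This C sketch is a hypothetical illustration, not timbreGrains~ or timbreID code: the parameter names `center`, `radius`, and `continuity_weight` are my own, and the continuity term simply measures feature distance to the previously output grain.

```c
#include <math.h>
#include <stddef.h>

/* Squared Euclidean distance between two n-dimensional feature vectors. */
static double sq_dist(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* db holds n_instances feature vectors of n_features values each, row-major.
   Only instances in [center - radius, center + radius] are searched. Each
   candidate's cost blends its distance to the current input frame with its
   distance to the previously output grain (prev_out < 0 means none yet). */
int constrained_match(const double *db, int n_instances, int n_features,
                      const double *input, int prev_out,
                      int center, int radius, double continuity_weight)
{
    int lo = center - radius < 0 ? 0 : center - radius;
    int hi = center + radius > n_instances - 1 ? n_instances - 1 : center + radius;
    int best = lo;
    double best_cost = INFINITY;
    for (int i = lo; i <= hi; i++) {
        const double *cand = db + (size_t)i * n_features;
        double cost = (1.0 - continuity_weight) * sq_dist(input, cand, n_features);
        if (prev_out >= 0) {
            const double *prev = db + (size_t)prev_out * n_features;
            cost += continuity_weight * sq_dist(prev, cand, n_features);
        }
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

With `continuity_weight` at 0 this reduces to a plain nearest-neighbor search inside the window; raising it favors grains that sound like the last one played.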

**As of October, 2009, timbreID includes functions for this type of synthesis, making timbreGrains~ unnecessary. Download the [timbreID-example package] and look at the concatenative folder for a thoroughly documented example of how to perform this kind of synthesis using the timbreID externals.

[Example 01]
[Example 02]

[Strings and NPR example]




divider

Wii abstraction for Pd


This parses Wii remote info sent from [darwiinosc]. It requires udpreceive, unpackOSC, and OSCroute from the mrpeach library to deal with OSC. The mrpeach externals are included with [Pd-extended].

[download the abstraction]




divider

Polyrhythm Metronome (for Edwin Harkins)


This is a polyrhythm metronome for Ed Harkins, and is intended to be nothing more than a practice tool. It’s an abstraction that works in Pd vanilla. Operation was designed to be as simple as possible. Click the "Tempo" number box to enter a duration for the complete polyrhythm. Click the "Notes/cycle" number boxes to enter the number of attacks for the 1st and 2nd voices. Turn it on. The HAM (Helpful Alignment Mnemonic) is calculated and displayed in the post window after a polyrhythm is entered, and a sub-patch displays a crude "slash" style representation of the rhythm. You can choose the pitch (in MIDI note #) for either of the voices, and check the "Melodic" checkbox to have the voices climb up chromatically during the rhythm. There is independent volume control for each voice in case you'd like to concentrate on one more than the other. If you make changes to any parameters in the middle of a polyrhythm, they will take effect on the downbeat of the next cycle.

Finally, there's an option for opening a score. A score should be a text file where each line lists a polyrhythmic notes/cycle pair. This way you can practice transitions between difficult patterns.
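At its core, a polyrhythm like this comes down to spacing each voice's attacks evenly across the same cycle. The C sketch below computes those onset times and parses one score line. It assumes a score line holds two whitespace-separated integers (a trailing Pd-style semicolon is tolerated), which is my guess at the format rather than the abstraction's documented syntax.

```c
#include <stdio.h>

/* Onset times (ms) for a voice playing `notes` evenly spaced attacks per cycle. */
void voice_onsets(double cycle_ms, int notes, double *onsets)
{
    for (int i = 0; i < notes; i++)
        onsets[i] = cycle_ms * i / notes;
}

/* Parse one score line holding a notes/cycle pair such as "3 4;".
   Returns 1 on success, 0 if the line does not start with two positive integers. */
int parse_score_line(const char *line, int *voice1, int *voice2)
{
    return sscanf(line, "%d %d", voice1, voice2) == 2 && *voice1 > 0 && *voice2 > 0;
}
```

For a 1200 ms cycle, a 3-against-4 pattern puts voice 1 at 0, 400, and 800 ms and voice 2 at 0, 300, 600, and 900 ms; only the downbeat coincides.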

[download the metronome]




divider

Arboresence example


This [SuperCollider] code runs a recursive algorithm to generate multiple melodic streams that fork and rejoin themselves until they run out of steam. Control parameters like forking and merging probabilities determine how long the whole structure will sustain itself. At this point, the melodic behavior is a simple random walk, but I plan to expand the code to allow for more unpredictable musical gestures.
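The forking idea can be sketched in C (the original is SuperCollider, and this is not a translation of it). Each stream random-walks in semitones and, at every note, may spawn a child stream whose fork probability is scaled down by a hypothetical `decay` factor, so the tree eventually runs out of steam. Merging is left out to keep the sketch short, and a small fixed-seed generator keeps runs repeatable.

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t rng_state = 12345u;

/* Small linear congruential generator returning a value in [0, 1). */
static double rand01(void)
{
    rng_state = rng_state * 1664525u + 1013904223u;
    return (rng_state >> 8) / 16777216.0;
}

/* Random-walk `steps` notes starting from `pitch` (MIDI note number). After
   each note, fork a child stream with probability p_fork; the child forks
   with probability p_fork * decay. Returns the total number of notes
   produced by this stream and all of its descendants. */
int grow_stream(int pitch, int steps, double p_fork, double decay, int depth)
{
    int notes = 0;
    for (int i = 0; i < steps; i++) {
        pitch += rand01() < 0.5 ? -1 : 1;   /* step down or up a semitone */
        printf("%*s%d\n", depth * 2, "", pitch);   /* indent by stream depth */
        notes++;
        if (rand01() < p_fork)
            notes += grow_stream(pitch, steps, p_fork * decay, decay, depth + 1);
    }
    return notes;
}
```

Calling `grow_stream(60, 8, 0.3, 0.5, 0)` prints one line per note, indented by how deeply its stream is nested; lowering `decay` makes the structure die out sooner.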

[Listen to a piano-based realization]
[Listen to a vibraphone-based realization]
[Listen to an absurd multi-sample realization]
[Download arboresence.rtf]




divider

SuperCollider Code


Some generally useful SC code is available below. The idea is to provide common, simple tools in a bare-bones format that can be expanded for any purpose.

• Basic [sampling keyboard].
• [Multi-channel soundfile player] for free and easy n-channel surround playback.
• [Pitch Tracker] to analyze samples or live input.
• [Spectral analysis tool].
• A [self organizing map] that demonstrates some SC drawing ability.
• Fullscreen [video player].




divider

PIC Microcontrollers


A basic familiarity with microcontrollers can come in handy. My introduction was through Tom Igoe’s book [Physical Computing], and the first microcontroller I used was the Basic Stamp, in order to make a MIDI knob box. Basic Stamps are very expensive.

PIC microcontrollers are made by [Microchip]. The wonderful thing about PICs is that they are inexpensive (in small quantities: free), powerful, and available in a very wide variety of models. The unfortunate thing for Macintosh or Linux users is that the IDE (MPLAB) is only available for Windows. One solution to this problem is to turn to other chip manufacturers that do provide software on other platforms ([AVR] via [Arduino]). Another solution is to use some well developed open source tools for the PIC.

I have documented my process of installing piklab-prog on a Macintosh [here]. In terms of software, what you need are the following things:

• [XCode] - the standard IDE for Macintosh.
• [Fink] - a package manager for installing Linux based software on a Macintosh.
• [libusb] - this open source USB library can simply be installed via Fink once Fink is set up.
• [gputils] - a collection of open source PIC utilities.
• [sdcc] - the Small Device C Compiler to compile your programs.
• [Piklab] - the open source IDE for programming PICs.

In the end, I didn’t use the Piklab IDE, just the command line program piklab-prog to actually program my chips. Compiling C code and generating the final hex file can all be done in the command line with a simple script. A template version of the script I use can be found [here].

If you’re wondering which of the many PICs to use, popular models are the 18F252 and 18F452. Both have analog input pins (the 452 has more).

More information may be posted here in the future if the process of building Piklab and its toolchain changes for better or worse.




divider

Contact Microphones


Contact microphones are extremely easy to build, and very cheap. Assuming you already have a soldering iron and some solder, the microphone pictured here costs about $2.

[This recording of R31] (featuring Fabio Oliveira performing on amplified Waterphone and myself on live electronics) provides a good example of the kind of amplification you’ll get from these microphones. Their frequency response is very limited, but the sounds they open up can be very interesting. The parts list and instructions I put together can be found [here].