Centroid

Centroid is a routine for tracking the centroid of a sound. The centroid is the average of all the frequencies weighted by their amplitudes. It essentially gives you a kind of center frequency value for your spectrum. The analysis can be restricted to a band of frequencies, allowing the centroid to track a particular frequency component (although pitchtracker can do this as well). Selecting floats or ASCII will produce a file suitable for use in the control of a parameter.
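The amplitude-weighted average described above can be sketched in a few lines. This is a minimal illustration, not the routine's actual code: the function name, its arguments, and the inclusive band test are assumptions made for the example.

```python
def spectral_centroid(freqs, amps, low_hz=None, high_hz=None):
    """Amplitude-weighted mean frequency, optionally restricted to a band.

    freqs and amps are parallel lists (one entry per analysis bin).
    low_hz / high_hz are hypothetical band limits; None means no limit.
    """
    weighted_sum = 0.0
    amp_sum = 0.0
    for f, a in zip(freqs, amps):
        if low_hz is not None and f < low_hz:
            continue  # below the band: ignored
        if high_hz is not None and f > high_hz:
            continue  # above the band: ignored
        weighted_sum += f * a
        amp_sum += a
    # A silent frame has no meaningful centroid; 0.0 is returned by convention here.
    return weighted_sum / amp_sum if amp_sum > 0.0 else 0.0


freqs = [100.0, 200.0, 300.0, 400.0]
amps = [1.0, 2.0, 1.0, 0.0]
full = spectral_centroid(freqs, amps)
band = spectral_centroid(freqs, amps, low_hz=150.0, high_hz=350.0)
```

Restricting the band removes the 100 Hz component from the average, so the band-limited centroid sits higher than the full-spectrum one.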



Analysis Frames per Second
Begin Time in Seconds
Centroid Envelope Warp
Data Type
End Time in Seconds
Envelope Ascent Time in Seconds
Envelope Descent Time in Seconds
FFT Length
High Frequency/Pitch Boundary
Low Frequency/Pitch Boundary
Multiple Channel Method
Output Data Format
Output Data Type
Output Samples per Second
Print Elapsed Time
Reference Pitch in Octave.Pitchclass
Resynthesis Channel
Spectrum Warpshape Index
Weighting Frequency
Window Size in Samples
Window Type

Analysis Frames

This controls how often the phase vocoder will perform an analysis on the signal. It is a translation of the classic decimation control, which specifies how many samples to skip between analysis frames. More frames per second increase the time resolution but decrease processing speed. 200 frames per second is a good reference point. If you expand time, you should increase this proportionately to maintain about 200 or more frames per second.
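The relationship between frames per second and the decimation (hop) value can be sketched as below. The function names, the integer truncation, and the reading of "increase proportionately" as a simple multiplication by the expansion factor are assumptions for illustration; the program's own rounding may differ.

```python
def hop_size(sample_rate, frames_per_second):
    """Samples skipped between analysis frames (the classic decimation value)."""
    return max(1, int(sample_rate // frames_per_second))


def frames_for_expansion(base_fps, expansion_factor):
    """Analysis rate scaled with the time expansion (hypothetical helper)."""
    return base_fps * expansion_factor


hop_48k = hop_size(48000, 200)
hop_44k = hop_size(44100, 200)
fps_doubled = frames_for_expansion(200, 2.0)
```

At 48000 samples per second, 200 frames per second corresponds to a hop of 240 samples.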


Begin Time

The time, in seconds, at which to begin processing the soundfile.


Centroid Envelope Warp

Many of the routines employ the principle of warping, in which a distribution of values is transformed by a warping function. Here an exponential function remaps a 0-1 range of values so that the minimum (0) and maximum (1) are preserved while the rest of the distribution is drawn closer to one extreme or the other, depending on the curvature of the exponential function selected. The curvature is selected through a warp index. Specifically, warp index w remaps the input x through the function below (^ = exponentiation).

y = (1. - (e^(x * w))) / (1. - (e^w))

In this function, a warp index of 0 produces a linear function and an untransformed output (taken as the limit y = x, since the expression itself is 0/0 at w = 0). Positive warp index values of increasing magnitude produce curves of increasing concavity (increasing slope) that draw values towards the minimum of 0 and reduce the function integral. Negative values do the opposite, drawing values towards the maximum of 1 and increasing the integral.

The practical use of this mechanism is found in various places. One such place is the reshaping of the frequency response distribution characteristics. In this, positive warp indices cause the peaks of the response to be accentuated while the weaker frequencies are expanded out (i.e. pushed towards 0). Negative values have the opposite effect: they compress the dynamic range of the response and raise the relative level of the weaker noise components. Another place where warp applies is in the remapping of FFT amplitudes through the spectrum warpshape. In this, the successive FFT frames have their amplitudes remapped by the warp function, similarly expanding or compressing the dynamic range depending upon the warp specified; 0 (linear warp function) leaves the amplitudes unchanged.
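The warp function given above can be tried out with the short sketch below. The function name and the small-w threshold are assumptions for the example; the special case simply returns x when w is effectively 0, where the formula itself would divide zero by zero.

```python
import math


def warp(x, w):
    """Remap x in [0, 1] using y = (1 - e^(x*w)) / (1 - e^w)."""
    if abs(w) < 1e-9:
        return x  # w = 0: linear, output equals input
    return (1.0 - math.exp(x * w)) / (1.0 - math.exp(w))


mid_positive = warp(0.5, 3.0)   # pulled towards 0
mid_negative = warp(0.5, -3.0)  # pulled towards 1
mid_linear = warp(0.5, 0.0)     # unchanged
```

The endpoints 0 and 1 map to themselves for any w, so only the values in between are redistributed.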


Data Type

Determines how Centroid will read the values in the fields for the high and low frequency/pitch boundaries. If this value is 0, they will be read as frequencies. If it is 1, they will be read as octave.pitchclass.
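For readers unfamiliar with octave.pitchclass notation, the sketch below converts it to a frequency. It assumes the common computer-music convention in which 8.00 is middle C and the two digits after the decimal point count semitones (so 8.09 is the A above middle C); this page does not state the convention, so treat the tuning constant and the helper names as assumptions.

```python
import math

MIDDLE_C_HZ = 261.6255653  # assumption: 8.00 = middle C in equal temperament


def pch_to_hz(pch):
    """Convert octave.pitchclass (e.g. 8.09) to a frequency in Hz."""
    octave = math.floor(pch)
    semitones = (pch - octave) * 100.0  # .09 -> 9 semitones
    return MIDDLE_C_HZ * 2.0 ** ((octave - 8) + semitones / 12.0)


def boundary_to_hz(value, data_type):
    """Interpret a boundary field: 0 = frequency, 1 = octave.pitchclass."""
    return value if data_type == 0 else pch_to_hz(value)


a_above_middle_c = boundary_to_hz(8.09, 1)
same_in_hz = boundary_to_hz(440.0, 0)
```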


Envelope Ascent Time in Seconds


Envelope Descent Time in Seconds


FFT Length

The FFT size must be a power of 2. Larger FFT sizes resolve frequencies better but resolve transient behavior more poorly. Choose your FFT size according to the sound you are working with. A size of 1024 or 2048 works well in most cases.
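The trade-off between frequency and time resolution can be made concrete with a little arithmetic. The helper names below are hypothetical, and the 44100 Hz sample rate is only an example value.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (the sizes an FFT length may take)."""
    return n > 0 and (n & (n - 1)) == 0


def bin_width_hz(sample_rate, fft_length):
    """Spacing between adjacent FFT bins: smaller means finer frequency detail."""
    return sample_rate / fft_length


def frame_duration_ms(sample_rate, fft_length):
    """Time spanned by one FFT frame: larger means blurrier transients."""
    return 1000.0 * fft_length / sample_rate


width_1024 = bin_width_hz(44100, 1024)
width_2048 = bin_width_hz(44100, 2048)
span_1024 = frame_duration_ms(44100, 1024)
span_2048 = frame_duration_ms(44100, 2048)
```

Doubling the FFT length halves the bin spacing and doubles the time span of each frame.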


High Frequency/Pitch Boundary

The upper boundary used when analyzing the input soundfile. Frequencies/pitches above this will be ignored.


Low Frequency/Pitch Boundary

The lower boundary used when analyzing the input soundfile. Frequencies/pitches below this will be ignored.


Multiple Channel Method

Determines whether the output file will contain data on the peak or average amplitude for each frame. 0 indicates peak amplitude, 1 indicates average.


Output Data Format

Determines what kind of data is output by Centroid:

0 = Frequency Units

1 = Pitch In Octave Units

2 = Octave.pitchclass Code

3 = Semitones of Deviation from Reference Pitch

4 = Inverted Semitones of Deviation from Reference Pitch
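As a rough guide to how formats 0 to 2 relate to one another, the sketch below converts a frequency to decimal octave units and then to an octave.pitchclass code. It assumes the common convention in which 8.00 is middle C; the constant and function names are assumptions, not taken from the program.

```python
import math

MIDDLE_C_HZ = 261.6255653  # assumption: 8.00 = middle C in equal temperament


def hz_to_octave_units(hz):
    """Format 1: decimal octaves, where the fraction is a fraction of an octave."""
    return 8.0 + math.log2(hz / MIDDLE_C_HZ)


def octave_units_to_pch(oct_value):
    """Format 2: octave.pitchclass, where the fraction counts semitones / 100."""
    octave = math.floor(oct_value)
    semitones = (oct_value - octave) * 12.0
    return octave + semitones / 100.0


a440_oct = hz_to_octave_units(440.0)
a440_pch = octave_units_to_pch(a440_oct)
```

Under these assumptions 440 Hz is 8.75 in octave units (three quarters of an octave above middle C) and 8.09 in octave.pitchclass (nine semitones above middle C).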


Output Data Type

Determines how the data will be saved. 0 indicates an ASCII file, 1 indicates 32-bit floats.
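The difference between the two output types can be illustrated as follows. The exact file layout is not specified here, so this sketch assumes one value per line for ASCII and headerless little-endian values for 32-bit floats; the function name and both layout choices are assumptions.

```python
import io
import struct


def write_values(values, data_type, stream):
    """Write values as ASCII text (0) or raw 32-bit floats (1)."""
    if data_type == 0:
        for v in values:
            stream.write(f"{v:.6f}\n".encode("ascii"))  # assumed: one value per line
    else:
        # assumed: little-endian, no header
        stream.write(struct.pack(f"<{len(values)}f", *values))


ascii_buf = io.BytesIO()
float_buf = io.BytesIO()
write_values([440.0, 220.5], 0, ascii_buf)
write_values([440.0, 220.5], 1, float_buf)
```

The float form uses four bytes per value regardless of magnitude, while the ASCII form can be inspected or edited in a text editor.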


Output Samples per Second


Print Elapsed Time

If this is set to 1, Centroid will continually print out what time index it is at in the soundfile during execution. 0 turns this feature off.


Reference Pitch in Octave.Pitchclass

This parameter is used in certain types of output format. If output format is set to 3 (semitones of deviation from reference pitch), or 4 (inverted semitones of deviation from reference pitch), this pitch is the benchmark from which they are measured. The parameter is in octave.pitchclass form.
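A deviation in semitones from a reference pitch can be computed as sketched below. The sketch assumes that 8.00 is middle C in octave.pitchclass notation and that the "inverted" format simply flips the sign of the deviation; neither detail is confirmed on this page, and the function names are hypothetical.

```python
import math

MIDDLE_C_HZ = 261.6255653  # assumption: 8.00 = middle C in equal temperament


def pch_to_hz(pch):
    """Convert octave.pitchclass (e.g. 8.09) to a frequency in Hz."""
    octave = math.floor(pch)
    semitones = (pch - octave) * 100.0
    return MIDDLE_C_HZ * 2.0 ** ((octave - 8) + semitones / 12.0)


def semitone_deviation(hz, reference_pch, inverted=False):
    """Semitones between hz and the reference pitch (formats 3 and 4)."""
    semis = 12.0 * math.log2(hz / pch_to_hz(reference_pch))
    return -semis if inverted else semis  # assumed meaning of "inverted"


octave_up = semitone_deviation(880.0, 8.09)
octave_up_inverted = semitone_deviation(880.0, 8.09, inverted=True)
```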


Resynthesis Channel

All routines allow both monophonic and multi-channel input files to be processed. With multi-channelled files, you can either select one channel and produce a monophonic output file, or process all the channels. Channels are numbered beginning with 1. Processing of multi-channelled files is done one channel at a time beginning with channel 1, with zeros written to channels which have yet to be processed. Processing one channel at a time requires less memory and allows you to audition the output sooner than if you did all channels at once.

Use 0 to process all channels.


Spectrum Warpshape Index

Many of the routines employ the principle of warping, in which a distribution of values is transformed by a warping function. Here an exponential function remaps a 0-1 range of values so that the minimum (0) and maximum (1) are preserved while the rest of the distribution is drawn closer to one extreme or the other, depending on the curvature of the exponential function selected. The curvature is selected through a warp index. Specifically, warp index w remaps the input x through the function below (^ = exponentiation).

y = (1. - (e^(x * w))) / (1. - (e^w))

In this function, a warp index of 0 produces a linear function and an untransformed output (taken as the limit y = x, since the expression itself is 0/0 at w = 0). Positive warp index values of increasing magnitude produce curves of increasing concavity (increasing slope) that draw values towards the minimum of 0 and reduce the function integral. Negative values do the opposite, drawing values towards the maximum of 1 and increasing the integral.

The practical use of this mechanism is found in various places. One such place is the reshaping of the frequency response distribution characteristics. In this, positive warp indices cause the peaks of the response to be accentuated while the weaker frequencies are expanded out (i.e. pushed towards 0). Negative values have the opposite effect: they compress the dynamic range of the response and raise the relative level of the weaker noise components. Another place where warp applies is in the remapping of FFT amplitudes through the spectrum warpshape. In this, the successive FFT frames have their amplitudes remapped by the warp function, similarly expanding or compressing the dynamic range depending upon the warp specified; 0 (linear warp function) leaves the amplitudes unchanged.


Weighting Frequency


Window Size in Samples

The window size is a less opaque parameter; like the FFT, it must be a power of 2. Windows twice the size of the FFT work well. Larger window sizes may resolve frequencies better. Specifying 0 for the window size will automatically set the window to twice the FFT size.
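The rules above (power of 2, with 0 meaning "twice the FFT size") can be expressed as a small check. The function name and the choice to raise an error on an invalid size are assumptions for illustration; the program may handle bad values differently.

```python
def resolve_window_size(window_size, fft_length):
    """Return the effective window size in samples."""
    if window_size == 0:
        return 2 * fft_length  # automatic: twice the FFT size
    if window_size < 0 or (window_size & (window_size - 1)) != 0:
        raise ValueError("window size must be a power of 2 (or 0 for automatic)")
    return window_size


auto_size = resolve_window_size(0, 1024)
explicit_size = resolve_window_size(4096, 1024)
```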


Window Type

The FFT and inverse FFT are computed using a window. Like the FFT size, the shape of the window used can affect the quality of the analysis and resynthesis. (See F.R.Moore, Stieglitz, or Roads for further explanation.) A variety of windows are available, including Hamming, Rectangular, Blackman, Triangular, and Kaiser (in 8 different forms, corresponding to 8 different alpha values). Blackman (-w2) or Kaiser (-w8) are recommended for most applications. In some unusual cases where transient behavior is being lost, consider using other windows such as the Rectangular, but take care to ensure that it is not producing pops or a buzzy sound.
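To give a feel for how the window shapes differ, the sketch below generates four of them from their standard textbook definitions. These are the usual symmetric formulas and are not necessarily the exact coefficients used by the program; the Kaiser window is omitted because it needs a Bessel function, and all function names are hypothetical.

```python
import math


def rectangular(length):
    """All ones: sharpest time response, strongest spectral leakage."""
    return [1.0] * length


def triangular(length):
    """Linear ramp up to the center and back down (Bartlett form)."""
    if length == 1:
        return [1.0]
    return [1.0 - abs(2.0 * n / (length - 1) - 1.0) for n in range(length)]


def hamming(length):
    """Raised cosine that stops at 0.08 instead of 0 at the edges."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def blackman(length):
    """Two cosine terms: wider main lobe, much lower sidelobes."""
    if length == 1:
        return [1.0]
    return [0.42
            - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            + 0.08 * math.cos(4.0 * math.pi * n / (length - 1))
            for n in range(length)]


ham = hamming(5)
blk = blackman(5)
tri = triangular(5)
```

The tapered windows fall towards zero at the frame edges, which is why they avoid the pops and buzz that an abrupt Rectangular window can introduce.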