Final project for Northwestern University EECS 352. Instructor: Prof. Bryan Pardo.

Team members: Zhe Chen, Chenxing Wu, Maxin Chen



Introduction


What is Tune Helper

Tune Helper is a program with three functions: "Speech-to-sing", which tunes a spoken voice so that it sings a simple melody; "Auto-tuning", which creates in-tune vocals by correcting off-key pitches; and "Speech-to-rap", which creates rap by shifting syllables onto beat points.

It is written and built in MATLAB.

Motivation

Our final project, Tune Helper, is inspired by the music software Auto-Tune, which automatically corrects the pitch of vocals and creates a distinctive electronic vocal effect. (See Related Works for details.)

Goals and Process

We want to implement the basic function of Auto-Tune, automatic pitch correction, in MATLAB by ourselves. We also want to create a second function that takes human speech and a simple melody as input and combines them, producing tuned speech whose pitch exactly matches that of the input melody.

We use the following audio processing methods to achieve these functions:
- Pitch tracking: analyze the input voice and melody to obtain their time and frequency information.
- Beat tracking: detect beats and align the voice with the input melody.
- Phase vocoder: stretch and compress audio in time.

Check out approaches and examples below to see how it works.


Approaches


The core functions of Tune Helper rely primarily on three methods: pitch tracking, beat tracking and a phase vocoder.

Pitch Tracking:
We use the praat_pd.m pitch tracker to extract time, frequency and amplitude information from the input audio.
Usage: [frq_path, autoCorr_path, time, amp] = praat_pd(y,fs,0);
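
praat_pd.m itself is not reproduced on this page. As a rough illustration of autocorrelation-based pitch tracking, the family of methods Praat's pitch detector belongs to, the Python sketch below estimates the fundamental frequency of a single frame. It is not the project's MATLAB code: the function name `estimate_f0`, the 75 to 500 Hz search range and the test tone are all assumptions made for this example, and the real tracker does considerably more (windowing, voicing decisions, path smoothing).

```python
import math

def estimate_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame by picking the
    lag with the largest raw autocorrelation inside the allowed pitch range."""
    n = len(frame)
    min_lag = int(fs / fmax)              # shortest period considered
    max_lag = min(int(fs / fmin), n - 1)  # longest period considered
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Hypothetical test signal: a 50 ms, 200 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
f0 = estimate_f0(frame, fs)
```

Running the estimator frame by frame over a recording would give a frequency track comparable in spirit to `frq_path`, although the exact outputs of praat_pd.m may differ.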

Beat Tracking:
We use the method from the paper "Beat Tracking by Dynamic Programming" (Ellis, 2007; reference [6] in Related Works).
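
The core recursion of that paper can be sketched in a few lines. The Python example below is a simplified illustration, not the project's MATLAB implementation: it assumes the beat period (in frames) is already known, takes a precomputed onset-strength list as input, and lets a beat sequence start at any frame with a score of zero, which is a simplification chosen for this sketch. The names `track_beats`, `onset_strength`, `period` and `tightness` are hypothetical.

```python
import math

def track_beats(onset_strength, period, tightness=100.0):
    """Pick beat frames by dynamic programming: each beat should fall on a
    strong onset and lie roughly one period after the previous beat.
    Assumes period >= 2 frames."""
    n = len(onset_strength)
    score = list(onset_strength)  # best cumulative score ending at each frame
    backlink = [-1] * n           # previous beat on the best path (-1 = none)
    for t in range(n):
        best, best_prev = 0.0, -1  # 0.0 means "start a new sequence here"
        # Candidate previous beats lie between period/2 and 2*period back.
        for prev in range(max(0, t - 2 * period), t - period // 2 + 1):
            gap = t - prev
            # Log-squared penalty for deviating from the ideal spacing.
            penalty = -tightness * math.log(gap / period) ** 2
            if score[prev] + penalty > best:
                best, best_prev = score[prev] + penalty, prev
        score[t] = onset_strength[t] + best
        backlink[t] = best_prev
    # Trace back from the frame with the highest cumulative score.
    t = max(range(n), key=lambda i: score[i])
    beats = []
    while t >= 0:
        beats.append(t)
        t = backlink[t]
    return beats[::-1]

# Hypothetical onset envelope: peaks every 10 frames, one of them late by a frame.
onsets = [0.0] * 50
for peak in (5, 15, 26, 35, 45):
    onsets[peak] = 1.0
beats = track_beats(onsets, period=10)
```

With a high `tightness`, the tracker prefers an evenly spaced beat grid over following the slightly late onset at frame 26; lowering `tightness` makes it follow the onsets more closely.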

Phase Vocoder:
Wiki: http://en.wikipedia.org/wiki/Phase_vocoder
We use the PV.m phase vocoder to perform time expansion, time compression and pitch shifting on audio clips.
Usage: y=PV(x,fs,hopin,hopout);
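
The page does not define `hopin` and `hopout`. Assuming they are the analysis and synthesis hop sizes in samples, which is the usual phase vocoder convention, their ratio sets the time-stretch factor, and a pitch shift can be obtained by stretching and then resampling back to the original duration. The Python sketch below only works out those numbers; the function name `pitch_shift_plan` and the default hop of 256 samples are assumptions for illustration, not values taken from PV.m.

```python
def pitch_shift_plan(semitones, hop_in=256):
    """Plan a pitch shift done as time-stretching followed by resampling.

    Returns the synthesis hop, the resulting time-stretch factor, and the
    length scaling the resampler must apply to restore the original duration.
    """
    ratio = 2.0 ** (semitones / 12.0)  # frequency ratio of the desired shift
    hop_out = round(hop_in * ratio)    # synthesis hop, rounded to whole samples
    stretch = hop_out / hop_in         # stretch factor actually achieved
    resample_factor = 1.0 / stretch    # shrink or grow back to original length
    return hop_out, stretch, resample_factor

# One octave up: stretch to twice the length, then resample to half as many samples.
hop_out, stretch, resample_factor = pitch_shift_plan(12)
```

Because `hop_out` is rounded to a whole number of samples, small shifts are only approximated; a smaller rounding error needs a larger `hop_in`.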



Examples


Here are three examples that demonstrate the audio processing procedure of Tune Helper.

1. Speech-to-sing

We use the famous nursery rhyme "Mary had a little lamb" as the input of Speech-to-sing.

Human voice input:
Piano melody input:
Output of Speech-to-sing:

Explanation:
Tune Helper first uses the beat tracking method to analyze both the human voice input and the piano melody input, extracting their time, frequency and amplitude information. It then uses the phase vocoder to shift the pitch of the human voice input according to the relative pitch heights of the piano melody input, as extracted by the pitch tracker. The result of this processing is a singing-speech.
(Check out Program Structure for details)
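
One way to read "relative pitch height" is that the melody is transposed into the speaker's range and each syllable is then shifted onto its transposed note. The Python sketch below computes the per-syllable shift in semitones under that reading. It is an assumption-laden illustration rather than the project's MATLAB code: the function name `syllable_shifts`, the choice of the first syllable as the reference pitch, and all frequency values are hypothetical.

```python
import math

def syllable_shifts(syllable_hz, melody_hz):
    """Semitone shift for each syllable so that the spoken phrase follows the
    melody's intervals, anchored at the pitch of the first syllable."""
    base = syllable_hz[0]
    shifts = []
    for spoken, note in zip(syllable_hz, melody_hz):
        # Melody note transposed so the first note lands on the base pitch.
        target = base * note / melody_hz[0]
        shifts.append(12.0 * math.log2(target / spoken))
    return shifts

# Hypothetical inputs: three spoken syllables, and the notes E4, D4, C4
# that open "Mary had a little lamb" in C major.
syllable_hz = [200.0, 210.0, 190.0]
melody_hz = [329.63, 293.66, 261.63]
shifts = syllable_shifts(syllable_hz, melody_hz)
```

Each shift would then be handed to the pitch-shifting step for the corresponding syllable.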

Measurement:
The time-frequency spectrograms of the human voice input, the piano melody input and the output audio are shown below. Comparing the three spectrograms, we can see that each syllable is detected and shifted to the notes of the melody, as shown in Figure 1.3. With our program, a user can import a simple melody, record speech, and get a singing-speech.


Figure 1.1 Spectrogram of speech

Figure 1.2 Spectrogram of piano melody, "Mary had a little lamb"

Figure 1.3 Spectrogram of singing-speech

2. Auto-tuning

Vocal input:
Output of Auto-tuning:

Explanation:
Tune Helper first performs pitch tracking on the input vocal, extracting its time, frequency, amplitude and pitch information. It then uses this information to calculate the nearest note for each available frequency point of the input vocal. If the current point is off-key, Tune Helper uses the phase vocoder to shift its pitch to the nearest musical note. The result of this processing is an auto-tuned vocal.
(Check out Program Structure for details)
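
The nearest-note calculation can be illustrated with the standard equal-temperament formula. The Python sketch below assumes 12-tone equal temperament with A4 = 440 Hz and assumes that only voiced frames (positive frequencies) are passed in; the page does not state which tuning reference or scale the project uses, and the function names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning

def nearest_note_hz(freq_hz):
    """Frequency of the equal-tempered note closest to freq_hz (freq_hz > 0)."""
    semitones_from_a4 = round(12.0 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2.0 ** (semitones_from_a4 / 12.0)

def correction_ratio(freq_hz):
    """Pitch-shift ratio that moves freq_hz onto its nearest note."""
    return nearest_note_hz(freq_hz) / freq_hz

# A slightly sharp A4 is pulled back down to 440 Hz.
ratio = correction_ratio(450.0)
```

A ratio below 1 lowers the pitch and a ratio above 1 raises it; a frame that is already on a note gets a ratio of exactly 1 and needs no shift.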

Measurement of success:
The time-frequency spectrograms of the input audio and the output audio are shown below. Comparing the two, it is easy to see that the frequencies of the auto-tuned output have been shifted to the nearest notes, which appear as "steps" in Figure 1.5.


Figure 1.4 Spectrogram of vocal input

Figure 1.5 Spectrogram of auto-tuning output

3. Speech-to-rap

Speech input:
Rap output (quick version):
Rap output (slow version):

Explanation: Tune Helper uses the beat tracking method to detect the syllable onsets of the input speech, then shifts the syllables onto the beat points of a preset drum track.
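
Moving syllables onto beat points amounts to quantizing each detected onset time to the nearest beat of the drum track. The Python sketch below shows that step only, assuming a constant tempo; the function name `snap_to_beats`, the 120 BPM tempo and the onset times are made up for illustration and are not taken from the project.

```python
def snap_to_beats(onsets_s, bpm, first_beat_s=0.0):
    """Move each onset time (in seconds) to the nearest beat of a steady grid."""
    beat_s = 60.0 / bpm  # duration of one beat in seconds
    snapped = []
    for onset in onsets_s:
        k = round((onset - first_beat_s) / beat_s)  # index of the nearest beat
        snapped.append(first_beat_s + k * beat_s)
    return snapped

# Hypothetical syllable onsets from a spoken phrase, snapped to a 120 BPM grid.
onsets = [0.08, 0.55, 0.93, 1.61]
targets = snap_to_beats(onsets, bpm=120)
```

Two syllables that are close together can land on the same beat with this simple rule, so a real implementation would need to resolve such collisions, for example by using a finer grid.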


Program Structure


Tune Helper consists of three functions: Auto-tuning, Speech-to-sing and Speech-to-rap. The structure and core functions of each are shown below.

Auto-tuning


Figure 2.1

Speech-to-sing


Figure 2.2

Speech-to-rap


Figure 2.3


User Interface


The user interface of Tune Helper is also written and built in MATLAB. Users can record vocals and speech by clicking the "Record" button at the upper right.


Figure 3.1 Tune Helper user interface


Conclusion


In our project, we implemented both the auto-tuning and speech-to-sing functions using pitch tracking, beat tracking and a phase vocoder. We used spectrograms to analyze the results and compare them with the inputs. The spectrograms show that Tune Helper shifts the pitch of the input to the intended notes.

However, the quality of the output audio is not yet satisfactory. The tuning and pitch-shifting algorithms should be improved in the future.


Related Works


Several existing software products and papers are related to our project. They are listed below.

Software: Auto-Tune

Wiki: http://en.wikipedia.org/wiki/Auto-Tune
Introduction: Auto-Tune is software used to create perfectly tuned vocals. It is based on the phase vocoder principle and is widely used in the music industry.

Mobile App: Songify by Smule

iTunes App Store: https://itunes.apple.com/us/app/songify-by-smule/id438735719?mt=8
Introduction: Songify can automatically turn speech into music, creating the vocal effect that appears in the popular video "AutoTune The News" on YouTube.

Papers:

[1]. Middleton, Gareth. "Frequency Domain Pitch Correction." Connexions, December 17 (2003).
[2]. Laroche, Jean, and Mark Dolson. "Improved phase vocoder time-scale modification of audio." Speech and Audio Processing, IEEE Transactions on 7.3 (1999): 323-332.
[3]. Tyrangiel, Josh. "Auto-tune: Why pop music sounds perfect." Time Magazine (2009): 1877372-3.
[4]. Bello, Juan Pablo, Giuliano Monti, and Mark B. Sandler. "Techniques for Automatic Music Transcription." ISMIR. 2000.
[5]. McFee, Brian, and Daniel PW Ellis. "Better beat tracking through robust onset aggregation." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
[6]. Ellis, Daniel PW. "Beat tracking by dynamic programming." Journal of New Music Research 36.1 (2007): 51-60.
