Commit c183f3e

first commit
1 parent 0cc3ab1 commit c183f3e

20 files changed: +2,104 −1 lines changed

.gitignore

+7
@@ -0,0 +1,7 @@
LibriSpeech/
.ipynb_checkpoints/
.DS_Store
__pycache__/
*.tar.gz
*.pyc
*.json

README.md

+182-1
@@ -1 +1,182 @@
-# AIND-VUI-Capstone

[//]: # (Image References)

[image1]: ./images/pipeline.png "ASR Pipeline"

## Project Overview

In this notebook, you will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline!

![ASR Pipeline][image1]

We begin by investigating the [LibriSpeech dataset](http://www.openslr.org/12/) that will be used to train and evaluate your models. Your algorithm will first convert any raw audio to feature representations that are commonly used for ASR. You will then move on to building neural networks that can map these audio features to transcribed text. After learning about the basic types of layers that are often used for deep learning-based approaches to ASR, you will engage in your own investigations by creating and testing your own state-of-the-art models. Throughout the notebook, we provide recommended research papers for additional reading and links to GitHub repositories with interesting implementations.

## Project Instructions

### Getting Started

1. Clone the repository, and navigate to the downloaded folder.

```
git clone https://github.com/udacity/AIND-VUI-Capstone.git
cd AIND-VUI-Capstone
```

2. Activate the virtual environment, and switch the Keras backend to TensorFlow.

For __Mac/OSX__ or __Linux__:
```
conda env create -f requirements/aind-vui-mac-linux.yml
source activate aind-vui
KERAS_BACKEND=tensorflow python -c "from keras import backend"
```

For __Windows__:
```
conda env create -f requirements/aind-vui-windows.yml
activate aind-vui
set KERAS_BACKEND=tensorflow
python -c "from keras import backend"
```

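To confirm the switch took effect, you can optionally print the active backend; `keras.backend.backend()` should report `tensorflow` (a quick check, not part of the original instructions):

```
python -c "from keras import backend as K; print(K.backend())"
```
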
3. If you have access to a local GPU, follow [TensorFlow's instructions](https://www.tensorflow.org/install/) for installing TensorFlow with GPU support.

4. If you are using __Windows__, please install a few additional requirements:
- [librosa](https://librosa.github.io/librosa/)==0.5.1
- [soundfile](https://pysoundfile.readthedocs.io/en/0.9.0/)==0.9.0.post1
- [python-speech-features](https://github.com/jameslyons/python_speech_features)==0.5
- [seaborn](https://seaborn.pydata.org/)==0.7.1
- [libav](https://libav.org/download/)

5. Obtain the appropriate subsets of the LibriSpeech dataset, and convert all FLAC files to WAV format.

```
wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar -xzvf dev-clean.tar.gz
wget http://www.openslr.org/resources/12/test-clean.tar.gz
tar -xzvf test-clean.tar.gz
mv flac_to_wav.sh LibriSpeech
cd LibriSpeech
./flac_to_wav.sh
```

6. Create JSON files corresponding to the train and validation datasets.

```
cd ..
python create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json
python create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json
```

7. Open the notebook.

```
jupyter notebook vui_notebook.ipynb
```

__NOTE:__ While some code has already been implemented to get you started, you will need to implement additional functionality to successfully answer all of the questions included in the notebook. __Unless requested, do not modify code that has already been included.__

### Amazon Web Services

You are **STRONGLY** encouraged to train the models in the notebook on a GPU. If you have access to a local GPU, follow [TensorFlow's instructions](https://www.tensorflow.org/install/) for installing TensorFlow with GPU support.

If you do not have access to a local GPU, you could use Amazon Web Services to launch an EC2 GPU instance. Please refer to the [Udacity instructions](https://classroom.udacity.com/nanodegrees/nd889/parts/16cf5df5-73f0-4afa-93a9-de5974257236/modules/53b2a19e-4e29-4ae7-aaf2-33d195dbdeba/lessons/2df3b94c-4f09-476a-8397-e8841b147f84/project) for setting up a GPU instance for this project.

### Evaluation

Your project will be reviewed by a Udacity reviewer against the project [rubric](#rubric) below. Review this rubric thoroughly, and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

### Project Submission

When you are ready to submit your project, collect the following files and compress them into a single archive for upload:
- The `vui_notebook.ipynb` file with fully functional code, all code cells executed and displaying output, and all questions answered.
- An HTML or PDF export of the project notebook with the name `report.html` or `report.pdf`.
- The `sample_models.py` file with all model architectures that were trained in the project Jupyter notebook.
- The `results/` folder containing all HDF5 and pickle files corresponding to trained models.

Alternatively, your submission could consist of the GitHub link to your repository.

<a id='rubric'></a>
## Project Rubric

#### Files Submitted

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Submission Files | The submission includes all required files. |

#### Documentation

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Comments | The submission includes comments that describe the functionality of the code. |

#### STEP 2: Model 0: RNN

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Trained Model 0 | The submission trained the model for at least 20 epochs, and none of the loss values in `model_0.pickle` are undefined. The trained weights for the model specified in `simple_rnn_model` are stored in `model_0.h5`. |

#### STEP 2: Model 1: RNN + TimeDistributed Dense

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Completed `rnn_model` Function | The submission includes a `sample_models.py` file with a completed `rnn_model` function containing the correct architecture. |
| Trained Model 1 | The submission trained the model for at least 20 epochs, and none of the loss values in `model_1.pickle` are undefined. The trained weights for the model specified in `rnn_model` are stored in `model_1.h5`. |

#### STEP 2: Model 2: CNN + RNN + TimeDistributed Dense

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Completed `cnn_rnn_model` Function | The submission includes a `sample_models.py` file with a completed `cnn_rnn_model` function containing the correct architecture. |
| Trained Model 2 | The submission trained the model for at least 20 epochs, and none of the loss values in `model_2.pickle` are undefined. The trained weights for the model specified in `cnn_rnn_model` are stored in `model_2.h5`. |

#### STEP 2: Model 3: Deeper RNN + TimeDistributed Dense

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Completed `deep_rnn_model` Function | The submission includes a `sample_models.py` file with a completed `deep_rnn_model` function containing the correct architecture. |
| Trained Model 3 | The submission trained the model for at least 20 epochs, and none of the loss values in `model_3.pickle` are undefined. The trained weights for the model specified in `deep_rnn_model` are stored in `model_3.h5`. |

#### STEP 2: Model 4: Bidirectional RNN + TimeDistributed Dense

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Completed `bidirectional_rnn_model` Function | The submission includes a `sample_models.py` file with a completed `bidirectional_rnn_model` function containing the correct architecture. |
| Trained Model 4 | The submission trained the model for at least 20 epochs, and none of the loss values in `model_4.pickle` are undefined. The trained weights for the model specified in `bidirectional_rnn_model` are stored in `model_4.h5`. |

#### STEP 2: Compare the Models

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Question 1 | The submission includes a detailed analysis of why different models might perform better than others. |

#### STEP 2: Final Model

| Criteria | Meets Specifications |
|:---------------------:|:---------------------------------------------------------:|
| Completed `final_model` Function | The submission includes a `sample_models.py` file with a completed `final_model` function containing a final architecture that is not identical to any of the previous architectures. |
| Trained Final Model | The submission trained the model for at least 20 epochs, and none of the loss values in `model_end.pickle` are undefined. The trained weights for the model specified in `final_model` are stored in `model_end.h5`. |
| Question 2 | The submission includes a detailed description of how the final model architecture was designed. |

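A quick way to self-check the "no undefined loss values" criteria before submitting is to load each pickle and scan for NaN or infinite entries. The sketch below assumes each `model_*.pickle` stores a Keras history dictionary such as `{'loss': [...], 'val_loss': [...]}`; adjust the path and keys if your training utilities save a different structure.

```
import math
import pickle

# Assumed location/structure: results/model_0.pickle holding a dict of
# per-epoch metric lists (e.g. 'loss', 'val_loss').
with open('results/model_0.pickle', 'rb') as f:
    history = pickle.load(f)

for name, values in history.items():
    bad = [v for v in values if v is None or math.isnan(v) or math.isinf(v)]
    status = 'OK' if not bad else 'contains undefined values'
    print('{}: {} epochs, {}'.format(name, len(values), status))
```
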
## Suggestions to Make your Project Stand Out!

#### (1) Add a Language Model to the Decoder

The performance of the decoding step can be greatly enhanced by incorporating a language model. Build your own language model from scratch, or leverage a repository or toolkit that you find online to improve your predictions.

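One lightweight way to incorporate a language model, sketched below, is to rescore a list of candidate transcriptions (for example, the output of a beam search) with an interpolated unigram/bigram word model. The candidate list, counts, and weights here are toy placeholders rather than anything defined in this repository.

```
import math
from collections import Counter

def lm_log_prob(sentence, unigrams, bigrams, total, alpha=0.4):
    """Crude interpolated unigram/bigram log-probability of a sentence."""
    words = ['<s>'] + sentence.split()
    score = 0.0
    for prev, word in zip(words, words[1:]):
        p_uni = (unigrams[word] + 1.0) / (total + len(unigrams) + 1.0)
        p_bi = float(bigrams[(prev, word)]) / unigrams[prev] if unigrams[prev] else 0.0
        score += math.log(alpha * p_bi + (1 - alpha) * p_uni)
    return score

def rescore(candidates, unigrams, bigrams, total, lm_weight=0.5):
    """candidates: list of (transcription, acoustic log-probability) pairs."""
    return max(candidates, key=lambda c: c[1] + lm_weight *
               lm_log_prob(c[0], unigrams, bigrams, total))

# Toy counts standing in for a model estimated from a large text corpus.
unigrams = Counter({'<s>': 2, 'the': 2, 'cat': 1, 'sat': 1})
bigrams = Counter({('<s>', 'the'): 2, ('the', 'cat'): 1, ('cat', 'sat'): 1})
total = sum(unigrams.values())

# The language model pulls the decision toward the more plausible word sequence.
print(rescore([('the cat sat', -4.1), ('the cad sat', -4.0)],
              unigrams, bigrams, total))
```
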
#### (2) Train on Bigger Data

In the project, you used some of the smaller downloads from the LibriSpeech corpus. Try training your model on some larger datasets: instead of using `dev-clean.tar.gz`, download one of the larger training sets on the [website](http://www.openslr.org/12/).

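For example, the 100-hour clean training subset can be fetched and prepared the same way as the smaller archives above (this assumes the same mirror layout, and that `flac_to_wav.sh` is already in the `LibriSpeech/` folder from step 5; the archive is large, so expect a long download):

```
wget http://www.openslr.org/resources/12/train-clean-100.tar.gz
tar -xzvf train-clean-100.tar.gz
cd LibriSpeech
./flac_to_wav.sh
cd ..
python create_desc_json.py LibriSpeech/train-clean-100/ train_corpus.json
```
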
#### (3) Try out Different Audio Features

In this project, you had the choice to use _either_ spectrogram or MFCC features. Take the time to test the performance of _both_ of these features. For a special challenge, train a network that uses raw audio waveforms!

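As a starting point for that comparison, here is a minimal sketch of computing both feature types for a single WAV file with `scipy` and the `python_speech_features` package listed above; the notebook's own feature utilities may differ, and the file path is only an example.

```
import numpy as np
import soundfile as sf
from python_speech_features import mfcc
from scipy.signal import spectrogram

# Example path only -- substitute any WAV produced by flac_to_wav.sh.
signal, sample_rate = sf.read('LibriSpeech/dev-clean/1272/128104/1272-128104-0000.wav')

# Log spectrogram features: spec is (frequency bins x time frames), so transpose.
freqs, times, spec = spectrogram(signal, fs=sample_rate, nperseg=320, noverlap=160)
log_spec = np.log(spec.T + 1e-10)

# MFCC features: (time frames x 13 cepstral coefficients).
mfcc_feat = mfcc(signal, samplerate=sample_rate, numcep=13)

print('log spectrogram shape:', log_spec.shape)
print('MFCC shape:', mfcc_feat.shape)
```
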
## Special Thanks

We have borrowed the `create_desc_json.py` and `flac_to_wav.sh` files from the [ba-dls-deepspeech](https://github.com/baidu-research/ba-dls-deepspeech) repository, along with some functions used to generate spectrograms.

char_map.py

+44
@@ -0,0 +1,44 @@
"""
Defines two dictionaries for converting
between text and integer sequences.
"""

char_map_str = """
' 0
<SPACE> 1
a 2
b 3
c 4
d 5
e 6
f 7
g 8
h 9
i 10
j 11
k 12
l 13
m 14
n 15
o 16
p 17
q 18
r 19
s 20
t 21
u 22
v 23
w 24
x 25
y 26
z 27
"""
# the "blank" character is mapped to 28

char_map = {}
index_map = {}
for line in char_map_str.strip().split('\n'):
    ch, index = line.split()
    char_map[ch] = int(index)
    index_map[int(index)+1] = ch
index_map[2] = ' '
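
Below is a minimal sketch of how these two dictionaries might be used to move between transcription text and integer labels; the helper names are illustrative and are not necessarily the ones defined elsewhere in this repository.

from char_map import char_map, index_map

def text_to_int_sequence(text):
    # Illustrative helper: map each character (space -> '<SPACE>') to its label.
    return [char_map['<SPACE>'] if c == ' ' else char_map[c] for c in text]

def int_sequence_to_text(int_sequence):
    # Illustrative helper: map labels back to characters.
    # Note that index_map in char_map.py is keyed by label + 1.
    return ''.join(index_map[i + 1] for i in int_sequence)

print(text_to_int_sequence('hi there'))                    # [9, 10, 1, 21, 9, 6, 19, 6]
print(int_sequence_to_text([9, 10, 1, 21, 9, 6, 19, 6]))   # hi there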

create_desc_json.py

+55
@@ -0,0 +1,55 @@
"""
Use this script to create JSON-Line description files that can be used to
train deep-speech models through this library.
This works with data directories that are organized like LibriSpeech:
data_directory/group/speaker/[file_id1.wav, file_id2.wav, ...,
                              speaker.trans.txt]

Where speaker.trans.txt has in each line, file_id transcription
"""

from __future__ import absolute_import, division, print_function

import argparse
import json
import os
import wave


def main(data_directory, output_file):
    labels = []
    durations = []
    keys = []
    for group in os.listdir(data_directory):
        speaker_path = os.path.join(data_directory, group)
        for speaker in os.listdir(speaker_path):
            labels_file = os.path.join(speaker_path, speaker,
                                       '{}-{}.trans.txt'
                                       .format(group, speaker))
            for line in open(labels_file):
                split = line.strip().split()
                file_id = split[0]
                label = ' '.join(split[1:]).lower()
                audio_file = os.path.join(speaker_path, speaker,
                                          file_id) + '.wav'
                audio = wave.open(audio_file)
                duration = float(audio.getnframes()) / audio.getframerate()
                audio.close()
                keys.append(audio_file)
                durations.append(duration)
                labels.append(label)
    with open(output_file, 'w') as out_file:
        for i in range(len(keys)):
            line = json.dumps({'key': keys[i], 'duration': durations[i],
                               'text': labels[i]})
            out_file.write(line + '\n')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('data_directory', type=str,
                        help='Path to data directory')
    parser.add_argument('output_file', type=str,
                        help='Path to output file')
    args = parser.parse_args()
    main(args.data_directory, args.output_file)
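
Each line of the output file is a standalone JSON object with `key`, `duration`, and `text` fields. A quick way to sanity-check the result after running the commands in the README (the file name is the one used there; the printed values depend on your data):

import json

with open('train_corpus.json') as f:
    for _, line in zip(range(3), f):       # peek at the first three records
        example = json.loads(line)
        print(example['duration'], example['key'])
        print('   ', example['text'])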
