Bird audio detection challenge 2017
This is a submission for the bird audio detection challenge 2017 using convolutional neural networks (CNNs) working on spectrograms.
Contact address: Thomas Grill (firstname.lastname@example.org)
In a first run, the networks are trained on the whole provided training data and make predictions on the testing data. "Safe" predictions (close to 0 or 1) are then added to the training data as so-called "pseudo-labeled" data. A second run is performed on the extended training set. Afterwards, all predictions are bagged to yield the final predictions.
The implementation is done in the Python programming language using the numpy, Theano and Lasagne packages, as well as the custom Lasagne front-end simplenn. All software used is open source and cross-platform. For the CNN training to run at acceptable speeds, a GPU is required. Memory requirements are about 6 GiB of CPU RAM and 1.5 GiB of GPU RAM.
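The two ideas in this procedure, selecting "safe" predictions as pseudo-labels and bagging the predictions of several models, can be illustrated with a minimal sketch. It is not the code of this submission: the function names, the thresholds `low` and `high`, and the use of a plain average for bagging are all assumptions made for illustration.

```python
def select_pseudo_labels(predictions, low=0.1, high=0.9):
    """Return (item, label) pairs for predictions close to 0 or 1.

    `predictions` maps a test item to a probability in [0, 1].
    The thresholds `low` and `high` are illustrative assumptions.
    """
    selected = []
    for item, p in sorted(predictions.items()):
        if p <= low:
            selected.append((item, 0))
        elif p >= high:
            selected.append((item, 1))
        # Predictions in between are considered unsafe and are skipped.
    return selected


def bag_predictions(per_model_predictions):
    """Average the predictions of several models item by item.

    Plain averaging is an assumption; other bagging schemes are possible.
    """
    n = len(per_model_predictions)
    items = per_model_predictions[0].keys()
    return {item: sum(m[item] for m in per_model_predictions) / n
            for item in items}


# Hypothetical first-run predictions for three test files.
stage1 = {"a.wav": 0.02, "b.wav": 0.55, "c.wav": 0.97}
pseudo = select_pseudo_labels(stage1)

# Hypothetical predictions of two models for the same two files.
bagged = bag_predictions([
    {"a.wav": 0.0, "b.wav": 0.5},
    {"a.wav": 0.5, "b.wav": 1.0},
])
```

In this example only `a.wav` and `c.wav` would be added to the training set, since the prediction for `b.wav` is too uncertain.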
Python (version 2.7): https://www.python.org/
simplenn: https://jobim.ofai.at/gitlab/gr/simplenn (git clone might not work, use archive download)
Detailed installation instructions for Theano and Lasagne can be found at https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne
Running the training/prediction:
Adjust the paths to the labels and audio files, as well as other basic settings, in config.inc.
Run the training/prediction procedure by executing run.sh.
With run.sh, several steps are executed in sequence:
stage1_prepare: Generation of training and testing data filelists. Generation of spectrograms for the audio files.
stage1_train: First training run, producing network models
stage1_predict: Predictions based on these models
stage2_prepare: Generating pseudo-labeled additional training data
stage2_train: Second training run, producing more network models
stage2_predict: Final predictions, employing all network models
If the spectrogram files, network models or prediction files are already present, they are not regenerated.
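The skip-if-present behavior described above can be sketched as follows. This is only an illustration of the pattern, not the actual logic in run.sh; `generate_if_missing`, `write_placeholder` and the file name `example.spect` are hypothetical.

```python
import os
import tempfile


def generate_if_missing(path, generate):
    """Call `generate(path)` only when `path` does not exist yet.

    Returns True if the file was generated, False if it was reused.
    """
    if os.path.exists(path):
        return False
    generate(path)
    return True


def write_placeholder(path):
    # Stand-in for an expensive step such as computing a spectrogram.
    with open(path, "w") as f:
        f.write("placeholder")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "example.spect")  # hypothetical file name
first = generate_if_missing(target, write_placeholder)   # file is created
second = generate_if_missing(target, write_placeholder)  # file is reused
```

With a scheme like this, an existing file has to be deleted before the corresponding step produces it again.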
Each of the steps executed in run.sh can also be run on its own by passing its name as the first argument, as in run.sh stage1_train.
For the training steps, model indices can also be specified, e.g., run.sh stage1_train 1, with the index running from 1 to the number of models (typically 5).
This can be used to train models in parallel, on several GPUs (or CPU cores).
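As a rough sketch of how such parallel runs might be organized, the following builds one command per model index and assigns GPUs round-robin. The helper `training_commands`, the `./run.sh` invocation path, and the use of the `CUDA_VISIBLE_DEVICES` environment variable to pick a GPU are assumptions, not part of this repository; the device may need to be selected differently depending on the Theano configuration.

```python
def training_commands(stage, num_models=5, num_gpus=2):
    """Build one (env_overrides, argv) pair per model index.

    Model indices run from 1 to `num_models`, as accepted by run.sh.
    Assigning GPUs round-robin via CUDA_VISIBLE_DEVICES is an assumption.
    """
    jobs = []
    for index in range(1, num_models + 1):
        gpu = (index - 1) % num_gpus
        env = {"CUDA_VISIBLE_DEVICES": str(gpu)}
        argv = ["./run.sh", stage, str(index)]
        jobs.append((env, argv))
    return jobs


jobs = training_commands("stage1_train", num_models=5, num_gpus=2)
for env, argv in jobs:
    # Only print the plan here; nothing is executed.
    print("GPU %s: %s" % (env["CUDA_VISIBLE_DEVICES"], " ".join(argv)))
```

Each pair could then be started with `subprocess.Popen(argv, env=dict(os.environ, **env))`, waiting for all processes to finish before moving on to the prediction step.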