This open-source project aims to assist individuals, including those with hearing difficulties, multitaskers managing chores, and residents of large homes, in staying informed about household appliance alerts. The system identifies appliance alert sounds and notifies users, ensuring important alerts are never missed and enhancing convenience and accessibility for diverse users.
The project consists of two main components:
- Models: Tools and experiments for training machine learning models to classify household alert sounds.
- AudioInferESP: An ESP32-based embedded program that detects appliance alerts and outputs the result via the console.
- Sound Classification: Identifies different household appliance alert sounds.
- Embedded Integration: Runs on the ESP32 microcontroller for efficient edge computing.
- The bird sound dataset utilized in this project was sourced from Kaggle, a prominent platform for datasets and data science resources. This dataset comprises 2161 audio files (mp3) capturing the vocalizations of 114 distinct bird species.
📙 Dataset Link: https://www.kaggle.com/datasets/soumendraprasad/sound-of-114-species-of-birds-till-2022
- The UrbanSound8k dataset, used in this project, consists of 8,732 labeled audio excerpts (≤4 seconds) capturing various urban sounds. These sounds are categorized into 10 classes, including air conditioners, car horns, children playing, dog barks, drilling, engine idling, gunshots, jackhammers, sirens, and street music.
- The dataset was compiled from field recordings sourced from Freesound. The audio files are in WAV format, maintaining the original sampling rate, bit depth, and number of channels as per their uploaded versions.
📙 Dataset Link: https://urbansounddataset.weebly.com/
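Because UrbanSound8K files keep their original sampling rates and channel counts, a typical preprocessing step converts every clip to mono at one fixed rate before feature extraction. A minimal sketch using NumPy linear interpolation (a real pipeline would more likely use librosa's resampling; the function name and 16 kHz target here are illustrative, not part of this repository):

```python
import numpy as np

def to_mono_fixed_rate(samples: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Collapse multi-channel audio to mono and linearly resample to target_sr."""
    if samples.ndim == 2:            # shape (num_samples, num_channels)
        samples = samples.mean(axis=1)
    duration = len(samples) / orig_sr
    n_target = int(round(duration * target_sr))
    # Interpolate the original samples onto a uniform grid at the target rate.
    old_t = np.linspace(0.0, duration, num=len(samples), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, samples).astype(np.float32)

# Example: 1 second of stereo audio at 44.1 kHz -> 16 kHz mono
stereo = np.random.randn(44100, 2).astype(np.float32)
mono16k = to_mono_fixed_rate(stereo, orig_sr=44100, target_sr=16000)
print(mono16k.shape)  # (16000,)
```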
- This custom dataset was assembled for transfer learning on household appliance alarms. It includes both self-recorded samples and curated YouTube audio clips covering categories such as alarms, dishwashers, microwaves, and washing machines.
- All YouTube recordings are referenced via a centralized playlist.
📙 Playlist Link: Appliances Sound Dataset
/
├── Model/
│   ├── bird_sound_classification/
│   ├── urban_sound_classification/
│   ├── transfer_learning/
│   └── utils/
└── AudioInferESP/

This directory contains all audio classification model experiments, training pipelines, and related utilities:
- bird_sound_classification/ & urban_sound_classification/: Experiment code and results using different configurations/datasets (bird sound / urban sound). Both folders include baselines exploring model architectures and parameter settings.
- transfer_learning/: Scripts and results for transferring pre-trained baseline models to the specific task of appliance alert classification. Includes final models fine-tuned with limited alarm sound samples and optimized using quantization techniques.
- utils/: Auxiliary scripts:
  - Data cropping utilities to segment audio files into usable clips.
  - Background augmentation tools to enhance data variability and robustness of trained models.
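The cropping and background-augmentation utilities described above can be sketched roughly as follows. This is an illustrative example, not the repository's actual API; the function names, the fixed clip length, and the SNR-based mixing scheme are assumptions:

```python
import numpy as np

def crop_or_pad(audio: np.ndarray, clip_len: int) -> np.ndarray:
    """Cut a clip to exactly clip_len samples, zero-padding short clips."""
    if len(audio) >= clip_len:
        return audio[:clip_len]
    return np.pad(audio, (0, clip_len - len(audio)))

def mix_background(signal: np.ndarray, background: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay background noise on a signal at a given signal-to-noise ratio (dB)."""
    background = crop_or_pad(background, len(signal))
    sig_power = np.mean(signal ** 2)
    bg_power = np.mean(background ** 2) + 1e-12
    # Scale the background so that 10*log10(sig_power / scaled_bg_power) == snr_db.
    scale = np.sqrt(sig_power / (bg_power * 10 ** (snr_db / 10)))
    return signal + scale * background

sr = 16000
alarm = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr).astype(np.float32)  # 1 s test tone
noise = np.random.randn(sr).astype(np.float32)
augmented = mix_background(crop_or_pad(alarm, sr), noise, snr_db=10.0)
print(augmented.shape)  # (16000,)
```

Sweeping `snr_db` over a range of values is a common way to generate many noisy variants from one clean recording, which is what makes this kind of augmentation useful for small alarm-sound datasets.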
Includes code for the embedded program running on the ESP32-S3 platform. The default onboard model is transfer_learning_butterworth_augmentation_int8.tflite, which was obtained via transfer learning followed by int8 quantization.
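The int8 quantization mentioned here (as produced by tools such as the TFLite converter) maps float values to 8-bit integers with an affine scheme, q = round(x / scale) + zero_point. A toy illustration of the arithmetic, with a made-up calibration range of [-1.0, 2.0]:

```python
import numpy as np

def quantize_int8(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Affine quantization: q = clip(round(x / scale) + zero_point, -128, 127)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Inverse mapping back to float: x ≈ (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

# Calibrate scale/zero_point so the float range [lo, hi] spans [-128, 127].
lo, hi = -1.0, 2.0
scale = (hi - lo) / 255.0
zero_point = int(round(-128 - lo / scale))

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
q = quantize_int8(x, scale, zero_point)
x_hat = dequantize_int8(q, scale, zero_point)
print(np.max(np.abs(x - x_hat)) <= scale)  # True: error bounded by one quantization step
```

Storing weights and activations as int8 this way shrinks the model roughly 4x versus float32 and lets the ESP32-S3 run inference with integer arithmetic, at the cost of a small, bounded rounding error per value.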
This project relies on the ESP32-S3-EYE, a powerful AI development board from Espressif, designed specifically for machine vision and speech recognition applications. Key features relevant to this project include:
- ESP32-S3-WROOM-1 Module: Dual-core 32-bit LX7 microprocessor with AI acceleration, supporting efficient edge inference.
- Memory: 8MB PSRAM and 8MB Flash for running embedded models and real-time audio processing.
- Microphone: Onboard digital microphone driven by the I2S driver in the ESP32-S3, enabling high-quality audio data acquisition suitable for sound classification tasks.
- USB Port: The board features a built-in USB Type-C port for power supply, firmware flashing, and serial communication/monitoring during development and deployment.
- Form Factor: Compact PCB with onboard LEDs, function buttons, and other peripherals.
We recommend using the ESP32-S3-EYE board for optimal performance, as it provides all necessary resources for audio sensing, model inference, and wireless communication. Our project demo is also based on it. Other ESP32-S3 boards with equivalent specifications may also be compatible.
- Install ESP-IDF if you haven't already.
- Activate the ESP-IDF environment.
- In your terminal, navigate to the AudioInferESP directory:
  cd AudioInferESP
- Build the firmware:
  idf.py build
- Connect your ESP32-S3 device via USB.
- Flash the binary to the board:
  idf.py flash
- Open a monitor for real-time output:
  idf.py monitor
This project serves as a demo, providing a proof-of-concept solution for household appliance alarm sound classification on embedded devices. While it demonstrates the feasibility of real-time audio classification and notification delivery, several important limitations should be noted:
- Prediction Accuracy: The model’s predictions may still be inaccurate or unreliable under various acoustic conditions, leading to false alarms or missed detections.
- Dataset and Training Scope: The system is trained and tested using a limited set of open-source datasets with some synthetic augmentations. Its performance may degrade with real-world appliance alarms that differ from the training data.
- Environmental Sensitivity: Factors such as background noise levels, room acoustics, overlapping sounds, and the quality or orientation of the onboard microphone can significantly affect detection reliability.
- Production Readiness: This prototype is not prepared for direct deployment in production or critical environments. Users should not rely on this software for safety-related purposes.
We welcome community contributions to improve dataset diversity, model generalization, device portability, and overall usability in future versions.
https://github.com/tensorflow/models
https://github.com/keras-team/keras
https://github.com/librosa/librosa
https://github.com/scipy/scipy
https://github.com/gopiashokan/Bird-Sound-Classification-using-Deep-Learning
https://github.com/espressif/esp-tflite-micro