Audition Intelligence

This open-source project helps people stay informed about household appliance alerts, including those with hearing difficulties, multitaskers juggling chores, and residents of large homes. The system recognizes appliance alert sounds and notifies users, so important alerts are never missed, improving convenience and accessibility for a diverse range of users.

The project consists of two main components:

  1. Models: Tools and experiments for training machine learning models to classify household alert sounds.
  2. AudioInferESP: An ESP32-based embedded program that detects appliance alerts and outputs the results to the console.

Features

  • Sound Classification: Identifies different household appliance alert sounds.
  • Embedded Integration: Runs on the ESP32 microcontroller for efficient edge computing.

Datasets

Bird Sound Data Collection:

  • The bird sound dataset used in this project was sourced from Kaggle, a prominent platform for datasets and data science resources. It comprises 2,161 audio files (MP3) capturing the vocalizations of 114 distinct bird species.

📙 Dataset Link: https://www.kaggle.com/datasets/soumendraprasad/sound-of-114-species-of-birds-till-2022

Urban Sound Data Collection:

  • The UrbanSound8k dataset, used in this project, consists of 8,732 labeled audio excerpts (≤4 seconds) capturing various urban sounds. These sounds are categorized into 10 classes, including air conditioners, car horns, children playing, dog barks, drilling, engine idling, gunshots, jackhammers, sirens, and street music.
  • The dataset was compiled from field recordings sourced from Freesound. The audio files are in WAV format, maintaining the original sampling rate, bit depth, and number of channels as per their uploaded versions.

📙 Dataset Link: https://urbansounddataset.weebly.com/
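Before training, recordings like these are typically converted into time-frequency features. The exact features used in this repo's pipelines aren't specified here, so the following is only a minimal numpy sketch of a log-magnitude spectrogram; the frame length, hop size, and the 1 kHz test tone are illustrative choices, not values from the project:

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Split a mono signal into overlapping Hann-windowed frames and
    return the log-magnitude spectrum of each frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mags + 1e-6)  # log compression stabilizes the dynamic range

# Example: one second of a 1 kHz tone at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
feat = spectrogram(np.sin(2 * np.pi * 1000 * t))
print(feat.shape)  # (n_frames, frame_len // 2 + 1) = (61, 257)
```

Each row of `feat` is one analysis frame; with these settings, a 1 kHz tone peaks in frequency bin 32 (16000 Hz / 512 bins per frame ≈ 31.25 Hz per bin).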

Appliance Alarm Data Collection

  • This custom dataset was assembled for transfer learning on household appliance alarms. It combines self-recorded samples with curated YouTube audio clips covering categories such as alarms, dishwashers, microwaves, and washing machines.
  • All YouTube recordings are referenced via a centralized playlist.

📙 Playlist Link: Appliances Sound Dataset

Project Structure

/
├── Model/
│   ├── bird_sound_classification/
│   ├── urban_sound_classification/
│   ├── transfer_learning/
│   └── utils/
├── AudioInferESP/

Model/

This directory contains all audio classification model experiments, training pipelines, and related utilities:

  • bird_sound_classification/ & urban_sound_classification/:
    Experiment code and results using different configurations/datasets (birdsound/urbansound). Both folders include baselines exploring model architectures and parameter settings.

  • transfer_learning/:
    Scripts and results for transferring pre-trained baseline models to the specific task of appliance alert classification. Includes final models fine-tuned with limited alarm sound samples and optimized using quantization techniques.

  • utils/:
    Auxiliary scripts:

    • Data cropping utilities to segment audio files into usable clips.
    • Background augmentation tools to enhance data variability and robustness of trained models.
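As a rough illustration of what those utilities do, here is a numpy sketch of fixed-length cropping and of mixing background noise into a clip at a target SNR. The function names, clip length, and random stand-in signals are assumptions for the example, not the repo's actual code:

```python
import numpy as np

def crop_clips(signal, sr, clip_seconds=1.0):
    """Segment a long recording into fixed-length, non-overlapping clips."""
    n = int(sr * clip_seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def mix_background(clip, noise, snr_db):
    """Overlay background noise scaled to a target signal-to-noise ratio (dB)."""
    clip_power = np.mean(clip ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clip_power / (noise_power * 10 ** (snr_db / 10)))
    return clip + gain * noise[: len(clip)]

sr = 16000
rng = np.random.default_rng(0)
recording = rng.standard_normal(sr * 3)  # 3 s stand-in for a real recording
clips = crop_clips(recording, sr)
noisy = mix_background(clips[0], rng.standard_normal(sr), snr_db=10)
print(len(clips), noisy.shape)  # 3 (16000,)
```

Varying the SNR and the choice of background recording per clip is what gives the augmented training set its variability.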

AudioInferESP/

Includes code for the embedded program running on the ESP32-S3 platform. The default onboard model is transfer_learning_butterworth_augmentation_int8.tflite, which was obtained via transfer learning followed by int8 quantization.
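The actual int8 conversion is done with the TFLite tooling, but the affine quantization scheme it applies (real ≈ scale × (q − zero_point)) can be sketched in plain numpy. The weight values below are made up for illustration:

```python
import numpy as np

def quantize_params(x, qmin=-128, qmax=127):
    """Derive an affine (scale, zero_point) pair covering the range of x.
    The range is extended to include 0 so that 0.0 is exactly representable."""
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zp):
    return np.clip(np.round(x / scale) + zp, -128, 127).astype(np.int8)

def dequantize(q, scale, zp):
    return scale * (q.astype(np.float32) - zp)

weights = np.array([-1.5, -0.2, 0.0, 0.7, 2.1], dtype=np.float32)
scale, zp = quantize_params(weights)
q = quantize(weights, scale, zp)
recovered = dequantize(q, scale, zp)
print(np.max(np.abs(weights - recovered)))  # rounding error, bounded by scale
```

Shrinking each weight to one byte (plus a shared scale and zero point per tensor) is what lets the model fit in the ESP32-S3's flash and run with integer arithmetic.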

Hardware Requirements

This project relies on the ESP32-S3-EYE, a powerful AI development board from Espressif, designed specifically for machine vision and speech recognition applications. Key features relevant to this project include:

  • ESP32-S3-WROOM-1 Module: Dual-core 32-bit LX7 microprocessor with AI acceleration, supporting efficient edge inference.
  • Memory: 8MB PSRAM and 8MB Flash for running embedded models and real-time audio processing.
  • Microphone: Onboard digital microphone driven by the I2S driver in the ESP32-S3, enabling high-quality audio data acquisition suitable for sound classification tasks.
  • USB Port: The board features a built-in USB Type-C port for power supply, firmware flashing, and serial communication/monitoring during development and deployment.
  • Form Factor: Compact PCB with onboard LEDs, function buttons, and other peripherals.

We recommend using the ESP32-S3-EYE board for optimal performance, as it provides all necessary resources for audio sensing, model inference, and wireless communication. Our project demo is also based on it. Other ESP32-S3 boards with equivalent specifications may also be compatible.

Usage

  1. Install ESP-IDF if you haven't already.
  2. Activate the ESP-IDF environment.
  3. In your terminal, navigate to the AudioInferESP directory:
    cd AudioInferESP
  4. Build the firmware:
    idf.py build
  5. Connect your ESP32-S3 device via USB.
  6. Flash the binary to the board:
    idf.py flash
  7. Open a monitor for real-time output:
    idf.py monitor

Limitations

This project serves as a demo, providing a proof-of-concept solution for household appliance alarm sound classification on embedded devices. While it demonstrates the feasibility of real-time audio classification and notification delivery, several important limitations should be noted:

  • Prediction Accuracy: The model’s predictions may still be inaccurate or unreliable under various acoustic conditions, leading to false alarms or missed detections.

  • Dataset and Training Scope: The system is trained and tested on a limited set of open-source datasets with some synthetic augmentations. Its performance may degrade on real-world appliance alarms that differ from the training data.

  • Environmental Sensitivity: Factors such as background noise levels, room acoustics, overlapping sounds, and the quality or orientation of the onboard microphone can significantly affect detection reliability.

  • Production Readiness: This prototype is not prepared for direct deployment in production or critical environments. Users should not rely on this software for safety-related purposes.

We welcome community contributions to improve dataset diversity, model generalization, device portability, and overall usability in future versions.

References

  • https://github.com/tensorflow/models
  • https://github.com/keras-team/keras
  • https://github.com/librosa/librosa
  • https://github.com/scipy/scipy
  • https://github.com/gopiashokan/Bird-Sound-Classification-using-Deep-Learning
  • https://github.com/espressif/esp-tflite-micro
  • https://github.com/espressif/esp-nn
  • https://github.com/espressif/esp-dsp

About

Graduation project which focuses on classifying household alarms using a CNN model. The trained model is deployed on an ESP32-S3 microcontroller to enable real-time sound detection, providing an efficient and low-cost solution for smart home safety applications.
