Projects
Research projects in engineering education, educational technology, and speech processing for learning applications.
AfriSpeech-200
A comprehensive dataset of Pan-African accented speech for clinical and general-domain ASR, featuring 100+ African accents and 196+ hours of audio. The dataset includes 2,463 unique speakers with a roughly balanced gender distribution (57.11% female, 42.41% male, 0.48% other). This resource aims to address the gap in African-accented speech recognition and to provide a benchmark for developing more inclusive speech technologies.
AfriSpeech-Dialog
A dataset of long-form African-accented English conversations for evaluating diarization, ASR, and summarization. I worked on the diarization track, evaluating open and closed diarization models on our custom dataset.
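Diarization systems are commonly scored with diarization error rate (DER): the share of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. The sketch below is a minimal frame-based illustration of that metric, not the evaluation code used in this project. It assumes non-overlapping speech, applies no forgiveness collar, and finds the speaker mapping by brute force, which is only practical for a handful of speakers. The function names and the toy segments are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, speaker_label)


def to_frames(segments: List[Segment], duration: float, step: float) -> List[Optional[str]]:
    """Rasterize segments into fixed-size frames; None marks non-speech."""
    n_frames = round(duration / step)
    frames: List[Optional[str]] = [None] * n_frames
    for start, end, speaker in segments:
        for i in range(round(start / step), min(n_frames, round(end / step))):
            frames[i] = speaker
    return frames


def diarization_error_rate(reference: List[Segment], hypothesis: List[Segment],
                           duration: float, step: float = 0.01) -> float:
    """Frame-based DER = (missed + false alarm + confusion) / reference speech."""
    ref = to_frames(reference, duration, step)
    hyp = to_frames(hypothesis, duration, step)
    total_ref = sum(r is not None for r in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    # Missed speech and false alarms do not depend on the speaker mapping.
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    ref_speakers = sorted({r for r in ref if r is not None})
    hyp_speakers = sorted({h for h in hyp if h is not None})
    # Pad with None so surplus hypothesis speakers can stay unmatched.
    pad = max(len(hyp_speakers) - len(ref_speakers), 0)
    targets: List[Optional[str]] = ref_speakers + [None] * pad

    # Try every one-to-one mapping and keep the one with the least confusion.
    best_confusion = None
    for perm in permutations(targets, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        confusion = sum(
            r is not None and h is not None and mapping[h] != r
            for r, h in zip(ref, hyp)
        )
        if best_confusion is None or confusion < best_confusion:
            best_confusion = confusion

    return (missed + false_alarm + best_confusion) / total_ref


# Toy example: the hypothesis places the speaker change one second too early.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 4.0, "spk1"), (4.0, 10.0, "spk2")]
der = diarization_error_rate(reference, hypothesis, duration=10.0)
print(f"DER = {der:.2%}")
```

In the toy example, one second out of ten seconds of reference speech is attributed to the wrong speaker, so the score is 10%. Established toolkits additionally handle overlapping speech and collars, which this sketch leaves out.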
ASR Fine-Tuning with NVIDIA NeMo
Built, trained, and deployed a GPU-accelerated automatic speech recognition (ASR) service tailored for Nigerian English using NVIDIA's Riva and NeMo frameworks. The project includes fine-tuning pre-trained models, implementing word boosting, and deploying custom ASR pipelines.
Academic Portfolio Website
A modern, responsive portfolio website built with Next.js, TypeScript, and Tailwind CSS. Features include dark mode support, smooth animations using Framer Motion, and a fully responsive design showcasing my research, publications, and projects.
AfriSpeech-TTS
African Digital Voices: a pan-African, parameter-efficient, multi-accent, multi-speaker text-to-speech system. Features 751 unique speakers with a roughly balanced gender distribution (54.45% female, 44.36% male, 1.19% other). The system uses parameter-efficient approaches to achieve competitive voice synthesis performance while training only 0.8% to 1.2% of the original trainable parameters.
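To give a feel for how a trainable-parameter share of around 1% can arise, the sketch below works through the arithmetic for low-rank adapters, one common parameter-efficient technique. Whether this project used low-rank adapters is an assumption on my part for illustration; the layer size and ranks are hypothetical and are not taken from the system described above.

```python
def low_rank_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of parameters trained when a frozen d_in x d_out weight matrix
    is adapted with two low-rank factors of shapes (d_in x rank) and (rank x d_out)."""
    full_params = d_in * d_out              # frozen original weights
    adapter_params = rank * (d_in + d_out)  # the only trainable weights
    return adapter_params / full_params


# Hypothetical 1024 x 1024 projection layer, adapted at two illustrative ranks.
for rank in (4, 6):
    fraction = low_rank_trainable_fraction(1024, 1024, rank)
    print(f"rank={rank}: {fraction:.2%} of the original parameters are trainable")
```

The point of the example is only that the trainable share grows linearly with the adapter rank while the frozen backbone stays fixed, which is why such methods can stay near the 1% mark.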
RAG Application with NVIDIA NIM
Built a Retrieval-Augmented Generation (RAG) pipeline using NVIDIA NIM (NVIDIA Inference Microservices) and LangChain, with a Streamlit UI for interaction. The system processes PDF documents, creates vector embeddings using NVIDIAEmbeddings, and enables natural language querying over the content using the meta/llama3-70b-instruct model through ChatNVIDIA.
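The retrieval step of such a pipeline can be sketched without any external services. The example below is a simplified stand-in, not the project's code: a toy bag-of-words vector replaces NVIDIAEmbeddings, an in-memory list replaces the vector store, and the final call to the language model is omitted. All function names and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into overlapping word windows, as is typically done after PDF extraction."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the grounded prompt that would be sent to the chat model."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical passages standing in for text extracted from PDFs.
passages = [
    "AfriSpeech covers African accented speech",
    "Streamlit renders the user interface",
    "Vector embeddings enable semantic search",
]
question = "Which tool renders the interface?"
top_chunks = retrieve(question, passages, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In the real application the prompt would go to the chat model, and the similarity search would run against stored embedding vectors instead of recomputing token counts on every query.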
Speechbrain-LLaMA3-Story-Writer
Turn your voice prompt into a story! This project uses SpeechBrain to transcribe spoken audio into text, and passes the transcribed prompt to a LLaMA 3 language model to generate a creative short story. Features include voice-to-text conversion, story generation with Meta's LLaMA 3, and a seamless pipeline from spoken prompt to creative writing.
CodecEval-Africa
An evaluation framework for neural audio codecs in low-resource African language settings. This project evaluates the performance of various neural audio codecs on African speech data, addressing the gap in codec performance for underrepresented languages and accents.
Child Speech Analysis with LALMs
Evaluating Large Audio Language Models (LALMs) for child interview summarization, focusing on speaker separation and content isolation from mixed interviewer-child audio. The project addresses challenging scenarios like children who stutter and aims to maintain speaker purity in downstream summaries.