Projects

Research projects in engineering education, educational technology, and speech processing for learning applications.

AfriSpeech-200

A comprehensive dataset of Pan-African accented speech for clinical and general-domain ASR, covering 100+ African accents and 196+ hours of audio. The dataset includes 2,463 unique speakers with a roughly balanced gender distribution (57.11% female, 42.41% male, 0.48% other). This resource aims to close the gap in African-accented speech recognition and provide a benchmark for developing more inclusive speech technologies.

Speech Processing, ASR, African Accents, Dataset, Clinical Domain, Python

AfriSpeech-Dialog

A dataset of long-form African-accented English conversations for evaluating diarization, ASR, and summarization. I worked on evaluating the diarization performance of open and closed models on our custom dataset.

Speech Processing, ASR, Diarization, Python
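As a rough illustration of what diarization evaluation involves, here is a minimal sketch of a frame-level diarization error rate (DER). It is not the project's evaluation code: the function name `frame_der`, the per-frame label format, and the brute-force speaker mapping are all assumptions made for this example, and it ignores overlapping speech and the forgiveness collars that real scoring tools handle.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER = (missed + false alarm + confused frames) / reference speech frames.

    reference / hypothesis: equal-length lists with one speaker label per
    frame, where None means "no speech". Hypothetical format for this sketch.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")

    ref_speakers = sorted({s for s in reference if s is not None})
    hyp_speakers = sorted({s for s in hypothesis if s is not None})
    speech_frames = sum(1 for s in reference if s is not None)
    if speech_frames == 0:
        raise ValueError("reference contains no speech")

    # Pad the reference side with placeholders so every hypothesis speaker
    # can be assigned a slot even when the system finds extra speakers.
    n_slots = max(len(ref_speakers), len(hyp_speakers))
    slots = ref_speakers + [object() for _ in range(n_slots - len(ref_speakers))]

    best_errors = None
    # Brute-force search for the speaker mapping with the fewest errors
    # (acceptable for a handful of speakers only).
    for perm in permutations(slots, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        errors = 0
        for ref, hyp in zip(reference, hypothesis):
            mapped = mapping[hyp] if hyp is not None else None
            if ref is None and mapped is None:
                continue  # correctly detected non-speech
            if ref != mapped:
                errors += 1  # miss, false alarm, or speaker confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors

    return best_errors / speech_frames


reference = ["A", "A", "B", "B", None, "A"]
hypothesis = ["x", "x", "y", "x", None, None]
print(f"DER: {frame_der(reference, hypothesis):.2f}")
```

The hypothesis labels are arbitrary, so the score is taken under the best one-to-one mapping between system and reference speakers.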

ASR Fine-Tuning with NVIDIA NeMo

Built, trained, and deployed a GPU-accelerated automatic speech recognition (ASR) service tailored for Nigerian English using NVIDIA's Riva and NeMo frameworks. The project includes fine-tuning pre-trained models, implementing word boosting, and deploying custom ASR pipelines.

ASR, Deep Learning, NVIDIA NeMo, NVIDIA Riva, Python, GPU Computing

Academic Portfolio Website

A modern, responsive portfolio website built with Next.js, TypeScript, and Tailwind CSS. Features include dark mode support, smooth animations using Framer Motion, and a fully responsive design showcasing my research, publications, and projects.

Next.js, TypeScript, Tailwind CSS, Framer Motion, React

AfriSpeech-TTS

African Digital Voices: a pan-African, parameter-efficient, multi-accent, multi-speaker text-to-speech system. Features 751 unique speakers with a roughly balanced gender distribution (54.45% female, 44.36% male, 1.19% other). The system uses parameter-efficient approaches to achieve competitive voice synthesis performance while using only 0.8% to 1.2% of the original trainable parameters.

Text-to-Speech, Speech Synthesis, African Voices, Deep Learning, Python
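To make the "fraction of trainable parameters" idea concrete, here is a small arithmetic sketch. It assumes a low-rank adapter on a single weight matrix, which is one common parameter-efficient technique; the description above does not say which method the project uses, and the layer sizes and ranks below are illustrative values, not the project's configuration.

```python
def trainable_fraction(trainable_params, total_params):
    """Share of parameters that receive gradient updates."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return trainable_params / total_params


def low_rank_adapter_fraction(d_in, d_out, rank):
    """Adapter size relative to the frozen d_out x d_in weight matrix.

    A rank-r adapter adds two small matrices (d_out x r and r x d_in),
    so it trains r * (d_in + d_out) parameters.
    """
    adapter_params = rank * (d_in + d_out)
    frozen_params = d_in * d_out
    return trainable_fraction(adapter_params, frozen_params)


# Illustrative numbers only: a hypothetical 1024 x 1024 projection layer.
for rank in (4, 6):
    frac = low_rank_adapter_fraction(1024, 1024, rank)
    print(f"rank {rank}: {frac:.2%} of the original layer's parameters")
```

The point of the sketch is only that a small rank keeps the trained share of a layer near the one-percent range, so most of the pre-trained weights stay frozen.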

RAG Application with NVIDIA NIM

Built a Retrieval-Augmented Generation (RAG) pipeline using NVIDIA NIM (NVIDIA Inference Microservices) and LangChain, with a Streamlit UI for interaction. The system processes PDF documents, creates vector embeddings using NVIDIAEmbeddings, and enables natural language querying over the content using the meta/llama3-70b-instruct model through ChatNVIDIA.

RAG, NVIDIA NIM, LangChain, Streamlit, LLM, Vector Embeddings
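The retrieve-then-prompt step at the core of a RAG pipeline can be sketched without any external services. This is not the project's code: `toy_embed` is a bag-of-words stand-in for a real embedding model such as NVIDIAEmbeddings, and `retrieve` and `build_prompt` are hypothetical helper names. In the real pipeline, the resulting prompt would be sent to the chat model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: word counts instead of a learned dense vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be passed to the language model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up chunks standing in for text extracted from PDF documents.
chunks = [
    "riva serves asr models on gpu",
    "streamlit renders the chat interface",
    "embeddings map text to vectors",
]
query = "which tool renders the interface"
print(build_prompt(query, retrieve(query, chunks, k=1)))
```

Swapping `toy_embed` for a dense embedding model and the sorted list for a vector store changes the quality of retrieval but not the shape of the flow.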

Speechbrain-LLaMA3-Story-Writer

Turn your voice prompt into a story! This project uses SpeechBrain to transcribe spoken audio into text, and passes the transcribed prompt to a LLaMA 3 language model to generate a creative short story. Features include voice-to-text conversion, story generation with Meta's LLaMA 3, and a seamless pipeline from spoken prompt to creative writing.

Speech-to-Text, LLM, Story Generation, SpeechBrain, LLaMA 3, Python

CodecEval-Africa

An evaluation framework for neural audio codecs in low-resource African language settings. This project evaluates the performance of various neural audio codecs on African speech data, addressing the gap in codec performance for underrepresented languages and accents.

Neural Audio Codecs, Low-Resource Languages, African Languages, Audio Compression, Python
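Codec evaluation usually weighs compression against reconstruction quality. The sketch below shows two generic measurements, bitrate and signal-to-noise ratio, as an assumption about what such a comparison might include; the description above does not list the framework's actual metrics, and the function names and sample values are made up for illustration.

```python
import math


def bitrate_kbps(compressed_bytes, duration_s):
    """Average bitrate of the encoded audio in kilobits per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return compressed_bytes * 8 / duration_s / 1000


def snr_db(reference, decoded):
    """Signal-to-noise ratio (dB) between original and decoded samples."""
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if signal_power == 0:
        raise ValueError("reference signal is silent")
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_power / noise_power)


# Made-up values: 10 s of audio encoded into 7,500 bytes, and a toy signal
# whose decoded copy is slightly attenuated.
original = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -0.9, 0.9, -0.9]
print(f"bitrate: {bitrate_kbps(7500, 10):.1f} kbps")
print(f"SNR: {snr_db(original, decoded):.1f} dB")
```

Waveform SNR is a blunt tool for neural codecs, which can sound good without matching the waveform sample by sample, so perceptual and downstream-task measures are typically reported alongside it.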

Child Speech Analysis with LALMs

Evaluating Large Audio Language Models (LALMs) for child interview summarization, focusing on speaker separation and content isolation from mixed interviewer-child audio. The project addresses challenging scenarios like children who stutter and aims to maintain speaker purity in downstream summaries.

Child Speech, Audio Language Models, Speaker Separation, Speech Summarization, Python