Projects

A collection of group research projects and personal AI projects.

AfriSpeech-200

A comprehensive dataset of Pan-African accented speech for clinical and general-domain ASR, covering 100+ African accents and 196+ hours of audio. The dataset includes 2,463 unique speakers with a balanced gender distribution (57.11% female, 42.41% male, 0.48% other). This resource aims to close the gap in African-accented speech recognition and to provide a benchmark for developing more inclusive speech technologies.

Speech Processing, ASR, African Accents, Dataset, Clinical Domain, Python

AfriSpeech-Dialog

A dataset of long-form African-accented English conversations for evaluating diarization, ASR, and summarization. I worked on evaluating open-source and closed-source diarization models on our custom dataset.

Speech Processing, ASR, Diarization, Python
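Diarization models are commonly compared using diarization error rate (DER). The block below is a minimal sketch of a frame-level DER, not the evaluation code used for AfriSpeech-Dialog: the `(start, end, speaker)` segment format, the function names, and the 0.1 s frame step are all assumptions made for illustration, and it ignores overlapping speech and forgiveness collars. The brute-force speaker mapping is only practical for a handful of speakers.

```python
from __future__ import annotations

from itertools import permutations
from typing import Optional, Tuple

# (start_sec, end_sec, speaker_label); a hypothetical segment format for this sketch.
Segment = Tuple[float, float, str]


def to_frames(segments: list[Segment], n_frames: int, step: float) -> list[Optional[str]]:
    """Label each frame with the speaker active at its midpoint (None = silence)."""
    frames: list[Optional[str]] = [None] * n_frames
    for start, end, speaker in segments:
        for i in range(n_frames):
            if start <= (i + 0.5) * step < end:
                frames[i] = speaker
    return frames


def diarization_error_rate(
    reference: list[Segment], hypothesis: list[Segment], step: float = 0.1
) -> float:
    """Frame-level DER = (missed + false alarm + confusion) / reference speech."""
    end_time = max(end for _, end, _ in reference + hypothesis)
    n_frames = round(end_time / step)
    ref = to_frames(reference, n_frames, step)
    hyp = to_frames(hypothesis, n_frames, step)

    total = sum(r is not None for r in ref)
    if total == 0:
        raise ValueError("reference contains no speech")
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Hypothesis labels are arbitrary, so try every one-to-one mapping onto
    # reference speakers (None = unmatched) and keep the least confusion.
    ref_speakers = sorted({r for r in ref if r is not None})
    hyp_speakers = sorted({h for h in hyp if h is not None})
    candidates = ref_speakers + [None] * len(hyp_speakers)
    best_confusion = None
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        confusion = sum(
            r is not None and h is not None and mapping[h] != r
            for r, h in zip(ref, hyp)
        )
        if best_confusion is None or confusion < best_confusion:
            best_confusion = confusion
    return (missed + false_alarm + (best_confusion or 0)) / total


reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 4.0, "spk0"), (4.0, 10.0, "spk1")]
print(diarization_error_rate(reference, hypothesis))  # 0.1
```

In this toy example the hypothesis switches speakers one second early, so 1 of the 10 seconds of reference speech is attributed to the wrong speaker.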

ASR Fine-Tuning with NVIDIA NeMo

Built, trained, and deployed a GPU-accelerated automatic speech recognition (ASR) service tailored for Nigerian English using NVIDIA's Riva and NeMo frameworks. The project includes fine-tuning pre-trained models, implementing word boosting, and deploying custom ASR pipelines.

ASR, Deep Learning, NVIDIA NeMo, NVIDIA Riva, Python, GPU Computing
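To give a feel for what word boosting does, here is a simplified sketch that rescores an n-best list so that hypotheses containing boosted words win. This is not the Riva API and not the project's code: Riva applies boosting during decoding, whereas this stand-in only reorders finished hypotheses. The function names, the scores, and the boost value are hypothetical.

```python
from __future__ import annotations


def boosted_score(text: str, base_score: float, boosted_words: set[str], boost: float) -> float:
    """Add a fixed bonus to the hypothesis score for each boosted word it contains."""
    hits = sum(1 for word in text.lower().split() if word in boosted_words)
    return base_score + boost * hits


def pick_best(nbest: list[tuple[str, float]], boosted_words: list[str], boost: float = 2.0) -> str:
    """Return the hypothesis text with the highest score after boosting."""
    vocab = {word.lower() for word in boosted_words}
    best_text, _ = max(nbest, key=lambda h: boosted_score(h[0], h[1], vocab, boost))
    return best_text


# Hypothetical n-best list with log-probability-like scores (higher is better).
nbest = [("i live in a book a", -4.0), ("i live in abeokuta", -5.0)]
print(pick_best(nbest, ["Abeokuta"]))  # i live in abeokuta
```

Without boosting, the first hypothesis would win on its raw score. The bonus for the place name lifts the second hypothesis from -5.0 to -3.0, which is the kind of correction boosting is meant to provide for local names and domain terms.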

Academic Portfolio Website

A modern portfolio website built with Next.js, TypeScript, and Tailwind CSS. Features include dark mode support, smooth animations using Framer Motion, and a fully responsive design showcasing my research, publications, and projects.

Next.js, TypeScript, Tailwind CSS, Framer Motion, React

AfriSpeech-TTS

African Digital Voices: A pan-African, parameter-efficient, multi-accent, multi-speaker text-to-speech system. It features 751 unique speakers with a balanced gender distribution (54.45% female, 44.36% male, 1.19% other). The system uses parameter-efficient approaches to achieve competitive voice synthesis performance while training only 0.8% to 1.2% of the original trainable parameters.

Text-to-Speech, Speech Synthesis, African Voices, Deep Learning, Python
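As a rough illustration of how a trainable-parameter fraction around 1% can arise, the sketch below counts parameters for low-rank (LoRA-style) adapters attached to frozen linear layers. The project description does not say which parameter-efficient method was used, so LoRA is an assumption here, and the layer shapes and ranks are made up for the example.

```python
from __future__ import annotations


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r adapter adds two small matrices: (d_in x r) and (r x d_out)."""
    return rank * (d_in + d_out)


def trainable_fraction(layer_shapes: list[tuple[int, int]], rank: int) -> float:
    """Adapter parameters divided by the frozen layers' original parameters."""
    original = sum(d_in * d_out for d_in, d_out in layer_shapes)
    adapters = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in layer_shapes)
    return adapters / original


# Hypothetical model: twelve 1024 x 1024 linear layers.
layers = [(1024, 1024)] * 12
print(f"{trainable_fraction(layers, rank=4):.2%}")  # 0.78%
print(f"{trainable_fraction(layers, rank=6):.2%}")  # 1.17%
```

The fraction grows linearly with the adapter rank, so a small change in rank is enough to move between roughly 0.8% and 1.2% in this toy setup.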

RAG Application with NVIDIA NIM

Built a Retrieval-Augmented Generation (RAG) pipeline using NVIDIA NIM (NVIDIA Inference Microservices) and LangChain, with a Streamlit UI for interaction. The system processes PDF documents, creates vector embeddings using NVIDIAEmbeddings, and enables natural language querying over the content using the meta/llama3-70b-instruct model through ChatNVIDIA.

RAG, NVIDIA NIM, LangChain, Streamlit, LLM, Vector Embeddings
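The retrieval step of a RAG pipeline can be sketched without any external services. The block below is a toy stand-in, not the project's code: it replaces NVIDIAEmbeddings with simple word-count vectors and stops at building the prompt that would be sent to the chat model. All function names, the chunk sizes, and the prompt wording are assumptions for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    return sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "NeMo is a toolkit for speech models.",
    "Streamlit builds simple web apps in Python.",
    "Vector embeddings map text to points in space.",
]
question = "What does Streamlit build?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a real pipeline the embedding function and the final generation call would be swapped for the hosted models, while the chunk, retrieve, and prompt steps keep the same shape.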

Speechbrain-LLaMA3-Story-Writer

Turn your voice prompt into a story! This project uses SpeechBrain to transcribe spoken audio into text, and passes the transcribed prompt to a LLaMA 3 language model to generate a creative short story. Features include voice-to-text conversion, story generation with Meta's LLaMA 3, and a seamless pipeline from spoken prompt to creative writing.

Speech-to-Text, LLM, Story Generation, SpeechBrain, LLaMA 3, Python
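The voice-to-story flow can be sketched as two pluggable steps joined by a prompt template. This is an outline under assumptions, not the project's code: `transcribe` and `generate` are hypothetical callables standing in for the SpeechBrain transcriber and the LLaMA 3 model, and the prompt wording and word limit are made up.

```python
from __future__ import annotations

from typing import Callable


def build_story_prompt(transcript: str, max_words: int = 300) -> str:
    """Normalize whitespace in the transcript and wrap it in a story instruction."""
    cleaned = " ".join(transcript.split())
    if not cleaned:
        raise ValueError("transcript is empty")
    return (
        f"Write a creative short story of at most {max_words} words "
        f"based on this prompt:\n{cleaned}"
    )


def voice_to_story(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate: Callable[[str], str],
    max_words: int = 300,
) -> str:
    """Run speech-to-text, build the prompt, and pass it to the language model."""
    transcript = transcribe(audio_path)
    return generate(build_story_prompt(transcript, max_words))


# Stand-ins so the sketch runs without any models installed.
def fake_transcribe(path: str) -> str:
    return "  a lion   who learns to sing "


def fake_generate(prompt: str) -> str:
    return prompt.upper()


story = voice_to_story("prompt.wav", fake_transcribe, fake_generate)
print(story)
```

Passing the two models in as callables keeps the pipeline easy to test and lets either stage be replaced independently.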