Introduction

Emotion Classification Pipeline

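This project provides a pipeline that classifies emotions in spoken audio: it downloads the audio track of a YouTube video, transcribes it to text, and classifies the emotions expressed in the transcript.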

Features

Audio transcription using speech-to-text
Emotion classification from transcribed text
API for real-time processing
Command-line interface for batch processing

Technical Specifications

Pipeline Architecture

YouTube URL
    ↓
Input Handler
    ↓
Audio Download
    ↓
Speech-to-Text
    ↓
Emotion Classifier
    ↓
Result Storage
    ↓
API/CLI Output

Component Details

Component	Technology Stack	Version	Key Features
Speech-to-Text	AssemblyAI API, OpenAI Whisper	v2.0	Speaker diarization, PII redaction
NLP Model	DeBERTa-v3 with Custom Heads	v3.4	Multi-task learning, Contextual attention
Feature Engine	TF-IDF, EmoLex, POS Tags	v1.1	42 linguistic features
API Service	FastAPI + Uvicorn	0.85+	JWT auth, Rate limiting
Containerization	Docker	20.10+	Multi-stage builds, GPU support

System Requirements

Hardware Specifications

Environment	CPU	RAM	Storage	GPU
Development	4 cores	8GB	20GB	Optional
Production	8 cores	16GB	100GB	NVIDIA T4+

Software Dependencies

Python 3.9 or later is required. See requirements.txt for the complete list of packages.

Installation Guide

Local Installation

# Create virtual environment
python -m venv .venv

# Activate environment
# Windows:
.venv\Scripts\activate
# Mac/Linux:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Docker Deployment

docker build -t emotion-clf .
docker run -p 8000:8000 -e ASSEMBLYAI_API_KEY=your_key emotion-clf

Azure ML Deployment

Create Azure ML workspace
Register model in Azure ML Studio
Create inference configuration
Deploy as ACI (dev) or AKS (prod)

Core Functionality

Emotion Taxonomy

Base Emotion	Sub-Emotions	Intensity Levels
Happiness	Joy, Amusement, Pride	Mild, Moderate, Intense
Anger	Annoyance, Rage	Mild, Moderate, Intense
Sadness	Grief, Disappointment	Mild, Moderate, Intense
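
The taxonomy table can be held in code as a small lookup structure. The following is a minimal sketch, not the project's actual data model: the names `EMOTION_TAXONOMY`, `SUB_TO_BASE`, and `base_emotion` are hypothetical, and only the labels themselves come from the table.

```python
# Hypothetical in-memory form of the taxonomy table (labels copied from it).
EMOTION_TAXONOMY = {
    "Happiness": ["Joy", "Amusement", "Pride"],
    "Anger": ["Annoyance", "Rage"],
    "Sadness": ["Grief", "Disappointment"],
}
INTENSITY_LEVELS = ("Mild", "Moderate", "Intense")

# Reverse index: sub-emotion -> base emotion.
SUB_TO_BASE = {
    sub: base for base, subs in EMOTION_TAXONOMY.items() for sub in subs
}


def base_emotion(sub_emotion: str) -> str:
    """Return the base emotion for a sub-emotion (case-insensitive)."""
    key = sub_emotion.strip().capitalize()
    if key not in SUB_TO_BASE:
        raise KeyError(f"Unknown sub-emotion: {sub_emotion!r}")
    return SUB_TO_BASE[key]
```

For example, `base_emotion("rage")` resolves to `"Anger"`, which is useful when sub-emotion predictions need to be rolled up to the base level.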

Processing Pipeline

YouTube audio extraction
Speech-to-text transcription
Text segmentation
Feature extraction
Emotion classification
Result aggregation
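
The text-side steps listed above (segmentation, classification, aggregation) can be sketched as a chain of small functions. This is an illustration under stated assumptions, not the project's implementation: all function names are hypothetical, the regex splitter is a naive stand-in for real segmentation, feature extraction is omitted, and `toy_classifier` replaces the actual model.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def segment_text(transcript: str) -> List[str]:
    """Split a transcript into sentences on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [p for p in parts if p]


def aggregate(labels: List[str]) -> Dict[str, float]:
    """Return the share of sentences assigned to each emotion label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def run_text_stages(
    transcript: str, classify: Callable[[str], str]
) -> Dict[str, float]:
    """Segment the transcript, classify each sentence, aggregate the labels."""
    sentences = segment_text(transcript)
    labels = [classify(sentence) for sentence in sentences]
    return aggregate(labels)


def toy_classifier(sentence: str) -> str:
    """Stand-in classifier for illustration only, not the real model."""
    return "Happiness" if "!" in sentence else "Sadness"


shares = run_text_stages("Great news! We lost. It rained.", toy_classifier)
```

Passing the classifier in as a callable keeps the stages testable without loading a model or calling a transcription service.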

Usage Documentation

CLI Interface

emotion-clf predict --url "https://youtube.com/watch?v=example"

Python API

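from emotion_clf import EmotionPredictor
# Create the predictor once and reuse it across calls.
predictor = EmotionPredictor()
# predict() is called with a list of texts, as shown here.
results = predictor.predict(["Exciting news!"])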

REST API Endpoints

Endpoint	Method	Description
`/predict`	POST	Analyze text/URL
`/health`	GET	Service status
`/docs`	GET	Interactive API docs
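
A client call to `/predict` might look like the sketch below, which uses only the standard library. The request body shape (`{"text": ...}`) and the helper name `build_predict_request` are assumptions, since the schema is not documented here; the interactive docs at `/docs` are the place to confirm it. Port 8000 matches the Docker command shown earlier. Because the API lists JWT auth, a real call may also need an `Authorization` header.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /predict endpoint."""
    # The {"text": ...} body is an assumed schema, not taken from the project.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request("http://localhost:8000", "Exciting news!")
# To send it against a running service:
#     with urllib.request.urlopen(req, timeout=30) as resp:
#         print(json.load(resp))
```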

Configuration Management

Environment Variables

ASSEMBLYAI_API_KEY="your_api_key"
WHISPER_MODEL="medium"
LOG_LEVEL="INFO"
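
One way to read these variables in Python is sketched below. The `Settings` class and `load_settings` function are hypothetical names, and treating `medium` and `INFO` as fallback defaults is an assumption based on the values shown above. Since python-dotenv is listed in requirements.txt, the project may populate the environment from a `.env` file before a step like this runs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    assemblyai_api_key: Optional[str]
    whisper_model: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables, falling back to the values shown above."""
    return Settings(
        # No default for the key: None signals that it was not provided.
        assemblyai_api_key=env.get("ASSEMBLYAI_API_KEY"),
        whisper_model=env.get("WHISPER_MODEL", "medium"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Accepting the environment as a mapping makes the function easy to exercise in tests, for example `load_settings({"LOG_LEVEL": "debug"})`.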

config.yaml Example

transcription:
  method: whisper
  timeout: 300

classification:
  confidence_threshold: 0.65
  batch_size: 16
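
To make the two classification settings concrete, here is a rough sketch of how `batch_size` and `confidence_threshold` could be applied. The helper names are hypothetical, the config is written as a plain dict to avoid assuming a particular YAML loader, and treating the threshold as inclusive is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple

# Mirrors the classification section of the config.yaml example.
CONFIG = {"classification": {"confidence_threshold": 0.65, "batch_size": 16}}


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def keep_confident(
    predictions: List[Tuple[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep (label, score) pairs whose score meets the threshold."""
    return [(label, score) for label, score in predictions if score >= threshold]


settings = CONFIG["classification"]
batch_sizes = [len(b) for b in batched(["sentence"] * 40, settings["batch_size"])]
kept = keep_confident([("Joy", 0.90), ("Rage", 0.50)], settings["confidence_threshold"])
```

With the example values, 40 sentences are split into batches of 16, 16, and 8, and only the prediction scoring at least 0.65 is kept.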

Troubleshooting Guide

Common Issues

CUDA Out of Memory

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

Missing Dependencies

pip install --upgrade -r requirements.txt

API Timeouts

Increase timeout in config.yaml
Check network connectivity
Verify API key validity

Error Codes

Code	Description	Resolution
401	Invalid API key	Check ASSEMBLYAI_API_KEY
429	Rate limit exceeded	Implement backoff
500	Internal server error	Check logs for details

Architecture Diagrams

System Architecture

┌────────┐    ┌─────────────┐    ┌───────────────┐
│ Client │───▶│ API Gateway │───▶│ Load Balancer │
└────────┘    └─────────────┘    └───────┬───────┘
                                         │
                  ┌──────────────────────┼──────────────────────┐
                  │                      │                      │
                  ▼                      ▼                      │
            ┌───────────┐          ┌───────────┐                │
            │ Service 1 │          │ Service 2 │                │
            └─────┬─────┘          └─────┬─────┘                │
                  │                      │                      │
                  └──────────────────────┼──────────────────────┘
                                         ▼
                                  ┌──────────┐
                                  │ Database │
                                  └──────────┘

Data Flow Sequence

User                API              Model            Database
 │                  │                 │                 │
 │──POST /predict──▶│                 │                 │
 │                  │──Process req───▶│                 │
 │                  │                 │──Store results─▶│
 │                  │                 │◀──Return data───│
 │◀──Return pred────│                 │                 │
 │                  │                 │                 │

Testing Procedures

Unit Tests

python -m pytest tests/unit -v

Integration Tests

python -m pytest tests/integration -v

Test Coverage

coverage run -m pytest
coverage report

Load Testing

# Locust is not listed in requirements.txt; install it first:
pip install locust
locust -f tests/load_test.py

Deployment Guide

Dockerfile

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]

Azure Deployment Steps

Create Azure Container Registry
Build and push Docker image
Create Azure Kubernetes Service
Deploy using Helm charts
Configure ingress controller

CI/CD Pipeline

Code commit triggers build
Run unit/integration tests
Build Docker image
Push to container registry
Deploy to staging
Run smoke tests
Promote to production

License & Attribution

MIT License - Full text in LICENSE

Third-Party Components

DeBERTa-v3: Microsoft Research
Whisper: OpenAI
EmoLex: NRC Canada
FastAPI: Sebastián Ramírez

Requirements

requirements.txt

# Requires Python 3.9+ (the interpreter is not a pip package, so it is not pinned here)
torch==2.0.1
transformers==4.30.2
fastapi==0.95.2
pytube==15.0.0
pandas==2.0.2
uvicorn==0.22.0
python-dotenv==1.0.0
nltk==3.8.1
numpy==1.24.3
pytest==7.4.0
coverage==7.3.0

setup.py

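from setuptools import setup, find_packages

setup(
    name="emotion_clf",
    version="1.0.0",
    packages=find_packages(),
    # Matches the Python version used in the Dockerfile (python:3.9-slim).
    python_requires=">=3.9",
    install_requires=[
        "torch>=2.0.1",
        "transformers>=4.30.2",
        "fastapi>=0.95.2",
        "uvicorn>=0.22.0",
    ],
    entry_points={
        "console_scripts": [
            # Assumes main() lives in a top-level cli.py, which would also need
            # py_modules=["cli"] to be installed. If it is inside the package,
            # the target would be "emotion_clf.cli:main" instead.
            "emotion-clf=cli:main",
        ]
    },
)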

Error Handling Documentation

Transcription Errors

Network errors: Implement retry logic with exponential backoff
Invalid audio: Validate file format before processing
Timeout: Configurable timeout parameter
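
The retry advice for network errors can be sketched as a small helper. This is a generic illustration of exponential backoff with jitter, not the project's code: `retry_with_backoff` and its parameters are hypothetical, and the exception types to retry on would depend on the transcription client in use.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with doubling delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("retries must be >= 0")
```

The `sleep` parameter exists so tests can record delays instead of waiting. The same helper would also fit the 429 "Implement backoff" resolution in the error code table.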

Classification Errors

Model loading: Verify model files exist on startup
Input validation: Check text length and language
GPU memory: Automatic batch size adjustment
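
Automatic batch size adjustment is commonly done by halving the batch and retrying when memory runs out. The sketch below shows that idea under stated assumptions: the function name and signature are hypothetical, and `MemoryError` stands in for the framework's GPU out-of-memory exception (with PyTorch 2.x that would likely be `torch.cuda.OutOfMemoryError`).

```python
from typing import Callable, List, Sequence


def classify_with_adaptive_batches(
    texts: Sequence[str],
    classify_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Classify texts in batches, halving the batch size on memory errors."""
    results: List[str] = []
    i = 0
    while i < len(texts):
        batch = texts[i:i + batch_size]
        try:
            results.extend(classify_batch(batch))
        except MemoryError:
            # MemoryError stands in for the GPU out-of-memory error here.
            if batch_size == 1:
                raise  # cannot shrink further
            batch_size = max(1, batch_size // 2)
            continue  # retry the same position with a smaller batch
        i += len(batch)
    return results
```

The reduced batch size is kept for the rest of the run, which avoids hitting the same memory limit on every batch.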

API Errors

Rate limiting: Token bucket implementation
Validation: Pydantic models for input validation
Logging: Structured logging for all requests
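
For reference, a token bucket limiter can be written in a few lines. This is a generic single-process sketch, not the project's implementation: the `TokenBucket` class, its parameters, and the injectable `clock` are all hypothetical, and a multi-worker deployment would need shared state and locking that are not shown.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; False means reject (HTTP 429)."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A request handler would call `allow()` once per request and respond with status 429 when it returns `False`.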

Performance Benchmarks

Metric	CPU	GPU
Base Emotion Accuracy	89%	89%
Processing Speed	82 sentences/min	540 sentences/min
Latency (p95)	1200 ms	350 ms
Throughput	45 requests/min	300 requests/min
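
The GPU advantage implied by these figures can be worked out directly. The snippet below only restates the table as ratios; the names are hypothetical, and RPM is read as requests per minute.

```python
# Figures copied from the benchmark table: (CPU, GPU).
BENCHMARKS = {
    "sentences_per_min": (82, 540),
    "latency_p95_ms": (1200, 350),
    "requests_per_min": (45, 300),  # RPM read as requests per minute
}


def gpu_speedup(cpu: float, gpu: float, lower_is_better: bool = False) -> float:
    """Return how many times better the GPU figure is than the CPU figure."""
    return cpu / gpu if lower_is_better else gpu / cpu


for name, (cpu, gpu) in BENCHMARKS.items():
    # Latency improves when it goes down; the two rates improve when they go up.
    factor = gpu_speedup(cpu, gpu, lower_is_better=name.startswith("latency"))
    print(f"{name}: {factor:.1f}x")
```

This gives roughly 6.6x for processing speed, 3.4x for p95 latency, and 6.7x for throughput, while accuracy is the same on both.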