emotion_clf_pipeline.predict
A complete end-to-end pipeline for extracting and analyzing emotional content from YouTube videos. This module orchestrates the entire workflow from audio extraction to emotion classification, providing a streamlined interface for sentiment analysis research and applications.
The pipeline supports multiple transcription services with automatic fallback mechanisms to ensure robustness in production environments.
- Usage:
python predict.py "https://youtube.com/watch?v=…" --transcription_method whisper
Functions
- get_video_title(youtube_url) – Extract video title from YouTube URL for meaningful file naming.
- predict_emotion(texts[, feature_config, reload_model]) – Apply emotion classification to text using trained models.
- process_youtube_url_and_predict(youtube_url, transcription_method) – Execute the complete emotion analysis pipeline for a YouTube video.
- speech_to_text(transcription_method, audio_file, output_file) – Convert audio to text using configurable transcription services.
- time_str_to_seconds(time_str) – Convert time strings to seconds for numerical operations.
- class emotion_clf_pipeline.predict.AzureEndpointPredictor(api_key, endpoint_url, encoders_dir='models/encoders')[source]
Bases: object
A class to interact with an Azure endpoint for emotion classification. It handles API requests, decodes predictions, and post-processes sub-emotions.
- __init__(api_key, endpoint_url, encoders_dir='models/encoders')[source]
Initialize with API key, endpoint URL, and encoder directory. Automatically converts private network URLs to public NGROK URLs.
- decode_and_postprocess(raw_predictions)[source]
Decode raw predictions and post-process sub-emotion to ensure consistency.
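A minimal construction sketch, assuming only the documented signature; the endpoint URL and API key are placeholders, and the shape of the raw predictions passed to decode_and_postprocess is not documented here, so the decode call is left commented.

    from emotion_clf_pipeline.predict import AzureEndpointPredictor

    # Construct the client; encoders_dir defaults to "models/encoders".
    client = AzureEndpointPredictor(
        api_key="<AZURE_ML_API_KEY>",  # placeholder: your deployment key
        endpoint_url="https://example-endpoint.azureml.net/score",  # placeholder: your scoring URL
    )

    # decode_and_postprocess turns raw endpoint output into consistent emotion labels.
    # The payload structure is endpoint-specific, so consult the deployed response schema:
    # labels = client.decode_and_postprocess(raw_predictions)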
- emotion_clf_pipeline.predict.extract_audio_transcript(video_url)[source]
Extract transcript using speech-to-text from stt.py.
- emotion_clf_pipeline.predict.extract_transcript(video_url)[source]
Extract transcript from YouTube video using subtitles (fallback to STT).
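A short usage sketch of the two extraction helpers, assuming both accept the same video URL (a placeholder below); the structure of the returned transcript is not documented here.

    from emotion_clf_pipeline.predict import extract_transcript, extract_audio_transcript

    video_url = "https://youtube.com/watch?v=..."  # placeholder URL

    # Preferred path: published subtitles, with an internal fallback to speech-to-text.
    transcript = extract_transcript(video_url)

    # Explicit speech-to-text path when subtitles are unavailable or unwanted.
    stt_transcript = extract_audio_transcript(video_url)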
- emotion_clf_pipeline.predict.get_azure_config(endpoint_url=None, api_key=None, use_ngrok=None, server_ip=None)[source]
Get Azure endpoint configuration with fallback priorities.
Priority order:
1. Explicit parameters passed to function
2. Environment variables from .env file
3. Raise error if required values missing
- Parameters:
endpoint_url (str | None) – Azure ML endpoint URL (overrides .env if provided)
api_key (str | None) – Azure ML API key (overrides .env if provided)
use_ngrok (bool | None) – Whether to use NGROK tunnel (overrides .env if provided)
server_ip (str | None) – Server IP for NGROK (overrides .env if provided)
- Returns:
Dictionary with Azure configuration
- Raises:
ValueError – If required configuration is missing
- Return type:
dict
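For illustration, a sketch of the documented fallback order for a single value; the environment variable name AZURE_ENDPOINT_URL is an assumption, not necessarily the module's documented key.

    import os

    def resolve_endpoint_url(endpoint_url=None):
        """Mimics the documented priority: explicit argument, then .env, then error."""
        # 1. Explicit parameter wins.
        if endpoint_url:
            return endpoint_url
        # 2. Fall back to an environment variable (name is an assumption).
        env_value = os.getenv("AZURE_ENDPOINT_URL")
        if env_value:
            return env_value
        # 3. Nothing found: fail loudly, as get_azure_config does.
        raise ValueError("Azure endpoint URL is missing; pass it explicitly or set it in .env")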
- emotion_clf_pipeline.predict.get_video_title(youtube_url)[source]
Extract video title from YouTube URL for meaningful file naming.
Video titles serve as natural identifiers for organizing processing results and enable easy correlation between source content and analysis outputs.
- Parameters:
youtube_url (str) – Valid YouTube video URL
- Returns:
Video title or "Unknown Title" if extraction fails
- Return type:
str
Note
Gracefully handles network errors and invalid URLs to prevent pipeline interruption during batch processing scenarios.
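A small usage sketch showing the title used for file naming, as the docstring suggests; the URL and output path below are placeholders.

    from emotion_clf_pipeline.predict import get_video_title

    url = "https://youtube.com/watch?v=..."  # placeholder URL
    title = get_video_title(url)             # "Unknown Title" if extraction fails

    # Use the title to label downstream artifacts (path is illustrative).
    output_path = f"results/{title}.xlsx"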
- emotion_clf_pipeline.predict.predict_emotion(texts, feature_config=None, reload_model=False)[source]
Apply emotion classification to text using trained models.
This function serves as the core intelligence of the pipeline, transforming raw text into structured emotional insights. It supports both single text analysis and batch processing for efficiency.
- Parameters:
texts – Single text string or list of texts for analysis
feature_config – Optional configuration for feature extraction methods
reload_model – Force model reinitialization (useful for memory management)
- Returns:
Emotion predictions with confidence scores. Single dict for one text, list of dicts for multiple texts. Returns None if prediction fails.
- Return type:
dict | list | None
- Performance:
Logs processing latency for performance monitoring and optimization.
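A hedged usage sketch: a single string yields one dict, a list yields a list of dicts, and None signals failure; the exact keys of each result dict are not documented in this section.

    from emotion_clf_pipeline.predict import predict_emotion

    # Single text -> single dict (or None on failure).
    single = predict_emotion("I can't believe we finally won the finals!")

    # Batch of texts -> list of dicts, processed together for efficiency.
    batch = predict_emotion([
        "This is the best day of my life.",
        "I'm not sure how to feel about this.",
    ])

    if single is not None:
        print(single)  # inspect the returned prediction structure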
- emotion_clf_pipeline.predict.predict_emotions_azure(video_url, endpoint_url=None, api_key=None, use_stt=False, chunk_size=200, use_ngrok=None, server_ip=None)[source]
Predict emotions using Azure ML endpoint with auto-loaded configuration.
- Parameters:
video_url (str) – URL or path to video/audio file
endpoint_url (str | None) – Azure ML endpoint URL (overrides .env if provided)
api_key (str | None) – Azure ML API key (overrides .env if provided)
use_stt (bool) – Whether to use speech-to-text for audio
chunk_size (int) – Text chunk size for processing
use_ngrok (bool | None) – Whether to use NGROK tunnel (overrides .env if provided)
server_ip (str | None) – Server IP for NGROK (overrides .env if provided)
- Returns:
Dictionary containing predictions and metadata
- Return type:
dict
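Usage sketch, assuming endpoint credentials are either passed explicitly or resolved from .env as described under get_azure_config; the URL below is a placeholder.

    from emotion_clf_pipeline.predict import predict_emotions_azure

    # Credentials resolved from .env when not passed explicitly.
    result = predict_emotions_azure(
        video_url="https://youtube.com/watch?v=...",  # placeholder URL
        use_stt=True,     # transcribe audio rather than rely on subtitles
        chunk_size=200,   # default chunking for endpoint requests
    )

    # `result` is a dictionary containing predictions and metadata.
    print(result.keys())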
- emotion_clf_pipeline.predict.predict_emotions_local(video_url, model_path='models/weights/baseline_weights.pt', config_path='models/weights/model_config.json', use_stt=False, chunk_size=200)[source]
Predict emotions using local model inference.
- Parameters:
video_url (str) – URL or path to video/audio file
model_path (str) – Path to local model weights file
config_path (str) – Path to local model configuration file
use_stt (bool) – Whether to use speech-to-text for audio
chunk_size (int) – Text chunk size for processing
- Returns:
Dictionary containing predictions and metadata
- Return type:
dict
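An equivalent local-inference sketch relying on the documented default weight and config paths; the URL is a placeholder.

    from emotion_clf_pipeline.predict import predict_emotions_local

    # Default paths point at the bundled baseline weights and config.
    result = predict_emotions_local(
        video_url="https://youtube.com/watch?v=...",  # placeholder URL
        use_stt=False,
    )
    print(result.keys())  # dictionary with predictions and metadata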
- emotion_clf_pipeline.predict.process_text_chunks(text, model, feature_extractor, chunk_size=200, expected_feature_dim=121)[source]
Process text in chunks for local model inference.
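The chunking strategy itself is not documented; the sketch below only illustrates the general idea of splitting text into roughly chunk_size-word pieces before per-chunk inference, and is not the module's actual implementation.

    def split_into_chunks(text, chunk_size=200):
        """Illustrative word-based chunking; the real function may split differently."""
        words = text.split()
        return [
            " ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)
        ]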
- emotion_clf_pipeline.predict.process_youtube_url_and_predict(youtube_url, transcription_method)[source]
Execute the complete emotion analysis pipeline for a YouTube video.
This is the main orchestration function that coordinates all pipeline stages:
1. Audio extraction from YouTube (with title metadata)
2. Speech-to-text transcription (with fallback mechanisms)
3. Emotion classification (with temporal alignment)
4. Results persistence (structured Excel output)
The function maintains data lineage throughout the process, ensuring that timestamps from transcription are preserved and aligned with emotion predictions for temporal analysis capabilities.
- Parameters:
youtube_url (str) – Valid YouTube video URL to analyze
transcription_method (str) – "assemblyAI" or "whisper" for the primary transcription service
- Returns:
Structured emotion analysis results where each dictionary contains temporal and emotional metadata:
- start_time/end_time: Temporal boundaries of the segment
- text: Transcribed speech content
- emotion/sub_emotion: Classified emotional states
- intensity: Emotional intensity measurement
Returns empty list if essential processing steps fail.
- Return type:
list of dict
Note
Creates necessary output directories automatically. All intermediate and final results are persisted to disk for reproducibility and further analysis.
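Usage sketch consuming the documented per-segment fields; the URL is a placeholder.

    from emotion_clf_pipeline.predict import process_youtube_url_and_predict

    segments = process_youtube_url_and_predict(
        youtube_url="https://youtube.com/watch?v=...",  # placeholder URL
        transcription_method="whisper",
    )

    # Each segment carries temporal boundaries alongside the emotion labels.
    for seg in segments:
        print(seg["start_time"], seg["end_time"], seg["text"],
              seg["emotion"], seg["sub_emotion"], seg["intensity"])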
- emotion_clf_pipeline.predict.speech_to_text(transcription_method, audio_file, output_file)[source]
Convert audio to text using configurable transcription services.
Implements a robust transcription strategy with automatic fallback:
- Primary: AssemblyAI (cloud-based, high accuracy)
- Fallback: Whisper (local processing, privacy-preserving)
This dual-service approach ensures pipeline reliability even when external services are unavailable or API limits are reached.
- Parameters:
transcription_method – "assemblyAI" or "whisper" for primary service
audio_file – Path to input audio file
output_file – Path where transcript will be saved
- Raises:
ValueError – If transcription_method is not recognized
Note
AssemblyAI failures trigger automatic Whisper fallback. All transcription attempts are logged for debugging purposes.
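Usage sketch; the file paths are placeholders, and the AssemblyAI-to-Whisper fallback happens inside the function.

    from emotion_clf_pipeline.predict import speech_to_text

    # Primary: AssemblyAI; failures fall back to local Whisper automatically.
    speech_to_text(
        transcription_method="assemblyAI",
        audio_file="audio/video_audio.wav",        # placeholder path
        output_file="transcripts/transcript.xlsx",  # placeholder path
    )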
- emotion_clf_pipeline.predict.time_str_to_seconds(time_str)[source]
Convert time strings to seconds for numerical operations.
Handles multiple time formats commonly found in transcription outputs:
- HH:MM:SS or HH:MM:SS.mmm (hours, minutes, seconds with optional milliseconds)
- MM:SS or MM:SS.mmm (minutes, seconds with optional milliseconds)
- Numeric values (already in seconds)
This conversion is essential for temporal analysis and synchronization between audio timestamps and emotion predictions.
- Parameters:
time_str – Time in string format or numeric value
- Returns:
Time converted to seconds, or 0.0 if parsing fails
- Return type:
float
Note
Returns 0.0 for invalid inputs rather than raising exceptions to maintain pipeline robustness during batch processing.
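A sketch of the documented conversion rules, assuming colon-separated fields and a silent 0.0 fallback; the actual implementation may differ in detail.

    def to_seconds(time_str):
        """Illustrative re-implementation of the documented behaviour."""
        try:
            # Numeric values are treated as seconds already.
            if isinstance(time_str, (int, float)):
                return float(time_str)
            parts = [float(p) for p in str(time_str).split(":")]
            if len(parts) == 3:   # HH:MM:SS(.mmm)
                return parts[0] * 3600 + parts[1] * 60 + parts[2]
            if len(parts) == 2:   # MM:SS(.mmm)
                return parts[0] * 60 + parts[1]
            if len(parts) == 1:   # plain seconds in string form
                return parts[0]
        except (ValueError, TypeError):
            pass
        return 0.0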