emotion_clf_pipeline.predict
A complete end-to-end pipeline for extracting and analyzing emotional content from YouTube videos. This module orchestrates the entire workflow from audio extraction to emotion classification, providing a streamlined interface for sentiment analysis research and applications.
The pipeline supports multiple transcription services with automatic fallback mechanisms to ensure robustness in production environments.
- Usage:
python predict.py "https://youtube.com/watch?v=…" --transcription_method whisper
Functions
- get_video_title(youtube_url) – Extract video title from YouTube URL for meaningful file naming.
- predict_emotion(texts[, feature_config, reload_model]) – Apply emotion classification to text using trained models.
- process_youtube_url_and_predict(youtube_url, transcription_method) – Execute the complete emotion analysis pipeline for a YouTube video.
- speech_to_text(transcription_method, audio_file, output_file) – Convert audio to text using configurable transcription services.
- time_str_to_seconds(time_str) – Convert time strings to seconds for numerical operations.
- class emotion_clf_pipeline.predict.AzureEndpointPredictor(api_key, endpoint_url, encoders_dir='models/encoders')[source]
Bases: object
A class to interact with an Azure endpoint for emotion classification. It handles API requests, decodes predictions, and post-processes sub-emotions.
- __init__(api_key, endpoint_url, encoders_dir='models/encoders')[source]
Initialize with API key, endpoint URL, and encoder directory. Automatically converts private network URLs to public NGROK URLs.
- decode_and_postprocess(raw_predictions)[source]
Decode raw predictions and post-process sub-emotion to ensure consistency.
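A minimal construction sketch, assuming only the documented signature; the endpoint URL and API key are placeholders, and the shape of the raw predictions passed to decode_and_postprocess is not documented here, so the decode call is left commented.

    from emotion_clf_pipeline.predict import AzureEndpointPredictor

    # Construct the client; encoders_dir defaults to "models/encoders".
    client = AzureEndpointPredictor(
        api_key="<AZURE_ML_API_KEY>",  # placeholder: your deployment key
        endpoint_url="https://example-endpoint.azureml.net/score",  # placeholder: your scoring URL
    )

    # decode_and_postprocess turns raw endpoint output into consistent emotion labels.
    # The payload structure is endpoint-specific, so consult the deployed response schema:
    # labels = client.decode_and_postprocess(raw_predictions)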
- emotion_clf_pipeline.predict.extract_audio_transcript(video_url)[source]
Extract transcript using speech-to-text from stt.py.
- emotion_clf_pipeline.predict.extract_transcript(video_url)[source]
Extract transcript from YouTube video using subtitles (fallback to STT).
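A short usage sketch of the two extraction helpers, assuming both accept the same video URL (a placeholder below); the structure of the returned transcript is not documented here.

    from emotion_clf_pipeline.predict import extract_transcript, extract_audio_transcript

    video_url = "https://youtube.com/watch?v=..."  # placeholder URL

    # Preferred path: published subtitles, with an internal fallback to speech-to-text.
    transcript = extract_transcript(video_url)

    # Explicit speech-to-text path when subtitles are unavailable or unwanted.
    stt_transcript = extract_audio_transcript(video_url)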
- emotion_clf_pipeline.predict.get_azure_config(endpoint_url=None, api_key=None, use_ngrok=None, server_ip=None)[source]
Get Azure endpoint configuration with fallback priorities.
Priority order:
1. Explicit parameters passed to function
2. Environment variables from .env file
3. Raise error if required values missing
- Parameters:
endpoint_url (str | None) – Azure ML endpoint URL (overrides .env if provided)
api_key (str | None) – Azure ML API key (overrides .env if provided)
use_ngrok (bool | None) – Whether to use NGROK tunnel (overrides .env if provided)
server_ip (str | None) – Server IP for NGROK (overrides .env if provided)
- Returns:
Dictionary with Azure configuration
- Raises:
ValueError – If required configuration is missing
- Return type:
dict
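For illustration, a sketch of the documented fallback order for a single value; the environment variable name AZURE_ENDPOINT_URL is an assumption, not necessarily the module's documented key.

    import os

    def resolve_endpoint_url(endpoint_url=None):
        """Mimics the documented priority: explicit argument, then .env, then error."""
        # 1. Explicit parameter wins.
        if endpoint_url:
            return endpoint_url
        # 2. Fall back to an environment variable (name is an assumption).
        env_value = os.getenv("AZURE_ENDPOINT_URL")
        if env_value:
            return env_value
        # 3. Nothing found: fail loudly, as get_azure_config does.
        raise ValueError("Azure endpoint URL is missing; pass it explicitly or set it in .env")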
- emotion_clf_pipeline.predict.get_video_title(youtube_url)[source]
Extract video title from YouTube URL for meaningful file naming.
Video titles serve as natural identifiers for organizing processing results and enable easy correlation between source content and analysis outputs.
- Parameters:
youtube_url (str) – Valid YouTube video URL
- Returns:
Video title or "Unknown Title" if extraction fails
- Return type:
str
Note
Gracefully handles network errors and invalid URLs to prevent pipeline interruption during batch processing scenarios.
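A small usage sketch showing the title used for file naming, as the docstring suggests; the URL and output path below are placeholders.

    from emotion_clf_pipeline.predict import get_video_title

    url = "https://youtube.com/watch?v=..."  # placeholder URL
    title = get_video_title(url)             # "Unknown Title" if extraction fails

    # Use the title to label downstream artifacts (path is illustrative).
    output_path = f"results/{title}.xlsx"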
- emotion_clf_pipeline.predict.predict_emotion(texts, feature_config=None, reload_model=False)[source]
Apply emotion classification to text using trained models.
This function serves as the core intelligence of the pipeline, transforming raw text into structured emotional insights. It supports both single text analysis and batch processing for efficiency.
- Parameters:
texts – Single text string or list of texts for analysis
feature_config – Optional configuration for feature extraction methods
reload_model – Force model reinitialization (useful for memory management)
- Returns:
Emotion predictions with confidence scores. Single dict for one text, list of dicts for multiple texts. Returns None if prediction fails.
- Return type:
dict | list | None
- Performance:
Logs processing latency for performance monitoring and optimization.
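A hedged usage sketch: a single string yields one dict, a list yields a list of dicts, and None signals failure; the exact keys of each result dict are not documented in this section.

    from emotion_clf_pipeline.predict import predict_emotion

    # Single text -> single dict (or None on failure).
    single = predict_emotion("I can't believe we finally won the finals!")

    # Batch of texts -> list of dicts, processed together for efficiency.
    batch = predict_emotion([
        "This is the best day of my life.",
        "I'm not sure how to feel about this.",
    ])

    if single is not None:
        print(single)  # inspect the returned prediction structure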
- emotion_clf_pipeline.predict.predict_emotions_azure(video_url, endpoint_url=None, api_key=None, use_stt=False, chunk_size=200, use_ngrok=None, server_ip=None)[source]
Predict emotions using Azure ML endpoint with auto-loaded configuration.
- Parameters:
video_url (str) – URL or path to video/audio file
endpoint_url (str | None) – Azure ML endpoint URL (overrides .env if provided)
api_key (str | None) – Azure ML API key (overrides .env if provided)
use_stt (bool) – Whether to use speech-to-text for audio
chunk_size (int) – Text chunk size for processing
use_ngrok (bool | None) – Whether to use NGROK tunnel (overrides .env if provided)
server_ip (str | None) – Server IP for NGROK (overrides .env if provided)
- Returns:
Dictionary containing predictions and metadata
- Return type:
dict
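Usage sketch, assuming endpoint credentials are either passed explicitly or resolved from .env as described under get_azure_config; the URL below is a placeholder.

    from emotion_clf_pipeline.predict import predict_emotions_azure

    # Credentials resolved from .env when not passed explicitly.
    result = predict_emotions_azure(
        video_url="https://youtube.com/watch?v=...",  # placeholder URL
        use_stt=True,     # transcribe audio rather than rely on subtitles
        chunk_size=200,   # default chunking for endpoint requests
    )

    # `result` is a dictionary containing predictions and metadata.
    print(result.keys())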
- emotion_clf_pipeline.predict.predict_emotions_local(video_url, model_path='models/weights/baseline_weights.pt', config_path='models/weights/model_config.json', use_stt=False, chunk_size=200)[source]
Predict emotions using local model inference.
- Parameters:
video_url (str) – URL or path to video/audio file
model_path (str) – Path to local model weights file
config_path (str) – Path to local model configuration file
use_stt (bool) – Whether to use speech-to-text for audio
chunk_size (int) – Text chunk size for processing
- Returns:
Dictionary containing predictions and metadata
- Return type:
dict
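An equivalent local-inference sketch relying on the documented default weight and config paths; the URL is a placeholder.

    from emotion_clf_pipeline.predict import predict_emotions_local

    # Default paths point at the bundled baseline weights and config.
    result = predict_emotions_local(
        video_url="https://youtube.com/watch?v=...",  # placeholder URL
        use_stt=False,
    )
    print(result.keys())  # dictionary with predictions and metadata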
- emotion_clf_pipeline.predict.process_text_chunks(text, model, feature_extractor, chunk_size=200, expected_feature_dim=121)[source]
Process text in chunks for local model inference.
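The chunking strategy itself is not documented; the sketch below only illustrates the general idea of splitting text into roughly chunk_size-word pieces before per-chunk inference, and is not the module's actual implementation.

    def split_into_chunks(text, chunk_size=200):
        """Illustrative word-based chunking; the real function may split differently."""
        words = text.split()
        return [
            " ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)
        ]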
- emotion_clf_pipeline.predict.process_youtube_url_and_predict(youtube_url, transcription_method)[source]
Execute the complete emotion analysis pipeline for a YouTube video.
This is the main orchestration function that coordinates all pipeline stages:
1. Audio extraction from YouTube (with title metadata)
2. Speech-to-text transcription (with fallback mechanisms)
3. Emotion classification (with temporal alignment)
4. Results persistence (structured Excel output)
The function maintains data lineage throughout the process, ensuring that timestamps from transcription are preserved and aligned with emotion predictions for temporal analysis capabilities.
- Parameters:
youtube_url (str) – Valid YouTube video URL to analyze
transcription_method (str) – "assemblyAI" or "whisper" for the primary transcription service
- Returns:
Structured emotion analysis results where each dictionary contains temporal and emotional metadata:
- start_time/end_time: Temporal boundaries of the segment
- text: Transcribed speech content
- emotion/sub_emotion: Classified emotional states
- intensity: Emotional intensity measurement
Returns empty list if essential processing steps fail.
- Return type:
list of dict
Note
Creates necessary output directories automatically. All intermediate and final results are persisted to disk for reproducibility and further analysis.
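Usage sketch consuming the documented per-segment fields; the URL is a placeholder.

    from emotion_clf_pipeline.predict import process_youtube_url_and_predict

    segments = process_youtube_url_and_predict(
        youtube_url="https://youtube.com/watch?v=...",  # placeholder URL
        transcription_method="whisper",
    )

    # Each segment carries temporal boundaries alongside the emotion labels.
    for seg in segments:
        print(seg["start_time"], seg["end_time"], seg["text"],
              seg["emotion"], seg["sub_emotion"], seg["intensity"])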
- emotion_clf_pipeline.predict.speech_to_text(transcription_method, audio_file, output_file)[source]
Convert audio to text using configurable transcription services.
Implements a robust transcription strategy with automatic fallback:
- Primary: AssemblyAI (cloud-based, high accuracy)
- Fallback: Whisper (local processing, privacy-preserving)
This dual-service approach ensures pipeline reliability even when external services are unavailable or API limits are reached.
- Parameters:
transcription_method – "assemblyAI" or "whisper" for primary service
audio_file – Path to input audio file
output_file – Path where transcript will be saved
- Raises:
ValueError – If transcription_method is not recognized
Note
AssemblyAI failures trigger automatic Whisper fallback. All transcription attempts are logged for debugging purposes.
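Usage sketch; the file paths are placeholders, and the AssemblyAI-to-Whisper fallback happens inside the function.

    from emotion_clf_pipeline.predict import speech_to_text

    # Primary: AssemblyAI; failures fall back to local Whisper automatically.
    speech_to_text(
        transcription_method="assemblyAI",
        audio_file="audio/video_audio.wav",        # placeholder path
        output_file="transcripts/transcript.xlsx",  # placeholder path
    )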
- emotion_clf_pipeline.predict.time_str_to_seconds(time_str)[source]
Convert time strings to seconds for numerical operations.
Handles multiple time formats commonly found in transcription outputs:
- HH:MM:SS or HH:MM:SS.mmm (hours, minutes, seconds with optional milliseconds)
- MM:SS or MM:SS.mmm (minutes, seconds with optional milliseconds)
- Numeric values (already in seconds)
This conversion is essential for temporal analysis and synchronization between audio timestamps and emotion predictions.
- Parameters:
time_str – Time in string format or numeric value
- Returns:
Time converted to seconds, or 0.0 if parsing fails
- Return type:
float
Note
Returns 0.0 for invalid inputs rather than raising exceptions to maintain pipeline robustness during batch processing.
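A sketch of the documented conversion rules, assuming colon-separated fields and a silent 0.0 fallback; the actual implementation may differ in detail.

    def to_seconds(time_str):
        """Illustrative re-implementation of the documented behaviour."""
        try:
            # Numeric values are treated as seconds already.
            if isinstance(time_str, (int, float)):
                return float(time_str)
            parts = [float(p) for p in str(time_str).split(":")]
            if len(parts) == 3:   # HH:MM:SS(.mmm)
                return parts[0] * 3600 + parts[1] * 60 + parts[2]
            if len(parts) == 2:   # MM:SS(.mmm)
                return parts[0] * 60 + parts[1]
            if len(parts) == 1:   # plain seconds in string form
                return parts[0]
        except (ValueError, TypeError):
            pass
        return 0.0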