emotion_clf_pipeline.predict

A complete end-to-end pipeline for extracting and analyzing emotional content from YouTube videos. This module orchestrates the entire workflow from audio extraction to emotion classification, providing a streamlined interface for sentiment analysis research and applications.

The pipeline supports multiple transcription services with automatic fallback mechanisms to ensure robustness in production environments.

Usage:

python predict.py "https://youtube.com/watch?v=..." --transcription_method whisper

Functions

get_video_title(youtube_url)

Extract video title from YouTube URL for meaningful file naming.

predict_emotion(texts[, feature_config, ...])

Apply emotion classification to text using trained models.

process_youtube_url_and_predict(youtube_url, ...)

Execute the complete emotion analysis pipeline for a YouTube video.

speech_to_text(transcription_method, ...)

Convert audio to text using configurable transcription services.

time_str_to_seconds(time_str)

Convert time strings to seconds for numerical operations.

class emotion_clf_pipeline.predict.AzureEndpointPredictor(api_key, endpoint_url, encoders_dir='models/encoders')[source]

Bases: object

A class to interact with an Azure endpoint for emotion classification. It handles API requests, decodes predictions, and post-processes sub-emotions.

__init__(api_key, endpoint_url, encoders_dir='models/encoders')[source]

Initialize with API key, endpoint URL, and encoder directory. Automatically converts private network URLs to public NGROK URLs.

decode_and_postprocess(raw_predictions)[source]

Decode raw predictions and post-process sub-emotion to ensure consistency.

get_prediction(text)[source]

Send a request to the Azure endpoint and return the raw response.

predict(text)[source]

Full workflow: get prediction, decode, and post-process. Handles double-encoded JSON from the API.

Parameters:

text (str)

Return type:

dict

predict_batch(texts)[source]

Predict emotions for multiple texts (sequential calls).

Parameters:

texts (List[str]) – List of input texts

Returns:

List of prediction results

Return type:

List[Dict[str, Any]]
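
Example (a minimal usage sketch; the API key and endpoint URL below are placeholders, and the exact keys of the returned dictionaries are assumptions based on the pipeline's output schema rather than confirmed names):

from emotion_clf_pipeline.predict import AzureEndpointPredictor

# Placeholder credentials; real values come from your Azure ML deployment.
predictor = AzureEndpointPredictor(
    api_key="YOUR_API_KEY",
    endpoint_url="https://example-endpoint.azureml.net/score",
)

# Single text: request, decoding, and post-processing happen inside predict().
result = predictor.predict("I can't believe how well this turned out!")
print(result)

# Batch: predict_batch() issues one sequential request per text.
for r in predictor.predict_batch(["Great news!", "This is worrying."]):
    print(r)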

emotion_clf_pipeline.predict.extract_audio_transcript(video_url)[source]

Extract transcript using speech-to-text from stt.py.

Parameters:

video_url (str)

Return type:

Dict[str, Any]

emotion_clf_pipeline.predict.extract_transcript(video_url)[source]

Extract transcript from YouTube video using subtitles (fallback to STT).

Parameters:

video_url (str)

Return type:

Dict[str, Any]

emotion_clf_pipeline.predict.get_azure_config(endpoint_url=None, api_key=None, use_ngrok=None, server_ip=None)[source]

Get Azure endpoint configuration with fallback priorities.

Priority order:

  1. Explicit parameters passed to the function

  2. Environment variables from the .env file

  3. Raise an error if required values are missing

Parameters:
  • endpoint_url (str | None) – Azure ML endpoint URL (override .env)

  • api_key (str | None) – Azure ML API key (override .env)

  • use_ngrok (bool | None) – Use NGROK tunnel (override .env)

  • server_ip (str | None) – Server IP for NGROK (override .env)

Returns:

Dictionary with Azure configuration

Raises:

ValueError – If required configuration is missing

Return type:

Dict[str, Any]
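
The priority chain can be illustrated with a short sketch; the environment variable names used here (AZURE_ENDPOINT_URL, AZURE_API_KEY) are hypothetical stand-ins for illustration, not confirmed names from the module:

import os

def resolve_setting(explicit, env_var):
    # 1. An explicit parameter wins; 2. otherwise fall back to the
    # environment (.env); 3. the caller raises if the value is still missing.
    return explicit if explicit is not None else os.getenv(env_var)

endpoint_url = resolve_setting(None, "AZURE_ENDPOINT_URL")  # hypothetical name
api_key = resolve_setting(None, "AZURE_API_KEY")            # hypothetical name
if not endpoint_url or not api_key:
    raise ValueError("Missing required Azure configuration")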

emotion_clf_pipeline.predict.get_video_title(youtube_url)[source]

Extract video title from YouTube URL for meaningful file naming.

Video titles serve as natural identifiers for organizing processing results and enable easy correlation between source content and analysis outputs.

Parameters:

youtube_url (str) – Valid YouTube video URL

Returns:

Video title or "Unknown Title" if extraction fails

Return type:

str

Note

Gracefully handles network errors and invalid URLs to prevent pipeline interruption during batch processing scenarios.
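
A hypothetical sketch of this graceful-failure behavior, assuming yt-dlp as the underlying extractor (which this documentation does not confirm):

import yt_dlp

def title_or_default(youtube_url: str) -> str:
    try:
        with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
            info = ydl.extract_info(youtube_url, download=False)
        return info.get("title", "Unknown Title")
    except Exception:
        # Network errors and invalid URLs fall through to the default,
        # so batch processing keeps going.
        return "Unknown Title"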

emotion_clf_pipeline.predict.predict_emotion(texts, feature_config=None, reload_model=False)[source]

Apply emotion classification to text using trained models.

This function serves as the core intelligence of the pipeline, transforming raw text into structured emotional insights. It supports both single text analysis and batch processing for efficiency.

Parameters:
  • texts – Single text string or list of texts for analysis

  • feature_config – Optional configuration for feature extraction methods

  • reload_model – Force model reinitialization (useful for memory management)

Returns:

Emotion predictions with confidence scores.

Single dict for one text, list of dicts for multiple texts. Returns None if prediction fails.

Return type:

dict or list

Performance:

Logs processing latency for performance monitoring and optimization.
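
Example (a usage sketch; the result keys shown are assumptions drawn from the schema documented for process_youtube_url_and_predict below):

from emotion_clf_pipeline.predict import predict_emotion

# A single string yields a single dict (or None on failure).
single = predict_emotion("The sunset over the bay was breathtaking.")
if single is not None:
    print(single.get("emotion"), single.get("intensity"))

# A list of strings yields a list of dicts.
batch = predict_emotion(["I passed the exam!", "The flight was delayed again."])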

emotion_clf_pipeline.predict.predict_emotions_azure(video_url, endpoint_url=None, api_key=None, use_stt=False, chunk_size=200, use_ngrok=None, server_ip=None)[source]

Predict emotions using Azure ML endpoint with auto-loaded configuration.

Parameters:
  • video_url (str) – URL or path to video/audio file

  • endpoint_url (str | None) – Azure ML endpoint URL (overrides .env if provided)

  • api_key (str | None) – Azure ML API key (overrides .env if provided)

  • use_stt (bool) – Whether to use speech-to-text for audio

  • chunk_size (int) – Text chunk size for processing

  • use_ngrok (bool | None) – Whether to use NGROK tunnel (overrides .env if provided)

  • server_ip (str | None) – Server IP for NGROK (overrides .env if provided)

Returns:

Dictionary containing predictions and metadata

Return type:

Dict[str, Any]
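
Example (a usage sketch; with no explicit overrides, endpoint_url and api_key are resolved from .env via get_azure_config(). The "predictions" key is an assumption based on the return description above):

from emotion_clf_pipeline.predict import predict_emotions_azure

result = predict_emotions_azure(
    "https://youtube.com/watch?v=VIDEO_ID",  # placeholder URL
    use_stt=False,   # try subtitles before falling back to speech-to-text
    chunk_size=200,  # default chunking granularity
)
print(result.get("predictions"))  # assumed key; see return description above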

emotion_clf_pipeline.predict.predict_emotions_local(video_url, model_path='models/weights/baseline_weights.pt', config_path='models/weights/model_config.json', use_stt=False, chunk_size=200)[source]

Predict emotions using local model inference.

Parameters:
  • video_url (str) – URL or path to video/audio file

  • model_path (str) – Path to model weights

  • config_path (str) – Path to model configuration

  • use_stt (bool) – Whether to use speech-to-text for audio

  • chunk_size (int) – Text chunk size for processing

Returns:

Dictionary containing predictions and metadata

Return type:

Dict[str, Any]

emotion_clf_pipeline.predict.process_text_chunks(text, model, feature_extractor, chunk_size=200, expected_feature_dim=121)[source]

Process text in chunks for local model inference.

Parameters:
  • text (str) – Input text to process

  • model – Loaded model

  • feature_extractor (FeatureExtractor) – Feature extractor instance

  • chunk_size (int) – Size of text chunks

  • expected_feature_dim (int)

Returns:

List of predictions for each chunk

Return type:

List[Dict[str, Any]]
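
The docstring does not state whether chunk_size counts words, tokens, or characters; a word-based sketch of the chunking idea might look like this:

from typing import List

def split_into_chunks(text: str, chunk_size: int = 200) -> List[str]:
    # Split on whitespace and group every chunk_size words together.
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]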

emotion_clf_pipeline.predict.process_youtube_url_and_predict(youtube_url, transcription_method)[source]

Execute the complete emotion analysis pipeline for a YouTube video.

This is the main orchestration function that coordinates all pipeline stages:

  1. Audio extraction from YouTube (with title metadata)

  2. Speech-to-text transcription (with fallback mechanisms)

  3. Emotion classification (with temporal alignment)

  4. Results persistence (structured Excel output)

The function maintains data lineage throughout the process, ensuring that timestamps from transcription are preserved and aligned with emotion predictions for temporal analysis capabilities.

Parameters:
  • youtube_url (str) – Valid YouTube video URL for processing

  • transcription_method (str) – "assemblyAI" or "whisper" for speech recognition

Returns:

Structured emotion analysis results, where each dictionary contains temporal and emotional metadata:

  • start_time/end_time: Temporal boundaries of the segment

  • text: Transcribed speech content

  • emotion/sub_emotion: Classified emotional states

  • intensity: Emotional intensity measurement

Return type:

list[dict]

Returns an empty list if essential processing steps fail.

Note

Creates necessary output directories automatically. All intermediate and final results are persisted to disk for reproducibility and further analysis.
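
Example (a usage sketch built directly from the result schema listed above):

from emotion_clf_pipeline.predict import process_youtube_url_and_predict

results = process_youtube_url_and_predict(
    "https://youtube.com/watch?v=VIDEO_ID",  # placeholder URL
    transcription_method="whisper",
)
for segment in results:  # empty list if an essential step failed
    print(f"[{segment['start_time']}-{segment['end_time']}] "
          f"{segment['emotion']}/{segment['sub_emotion']} "
          f"(intensity {segment['intensity']}): {segment['text']}")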

emotion_clf_pipeline.predict.speech_to_text(transcription_method, audio_file, output_file)[source]

Convert audio to text using configurable transcription services.

Implements a robust transcription strategy with automatic fallback:

  • Primary: AssemblyAI (cloud-based, high accuracy)

  • Fallback: Whisper (local processing, privacy-preserving)

This dual-service approach ensures pipeline reliability even when external services are unavailable or API limits are reached.

Parameters:
  • transcription_method – "assemblyAI" or "whisper" for the primary service

  • audio_file – Path to input audio file

  • output_file – Path where transcript will be saved

Raises:

ValueError – If transcription_method is not recognized

Note

AssemblyAI failures trigger automatic Whisper fallback. All transcription attempts are logged for debugging purposes.
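
The fallback pattern can be sketched generically; the primary and fallback callables here are hypothetical parameters, since the module's internals are not part of this documentation:

import logging
from typing import Callable

logger = logging.getLogger(__name__)

def transcribe_with_fallback(audio_file: str, output_file: str,
                             primary: Callable[[str, str], None],
                             fallback: Callable[[str, str], None]) -> None:
    try:
        primary(audio_file, output_file)   # e.g. AssemblyAI (cloud)
    except Exception as exc:
        # Log the failure for debugging, then fall back to local processing.
        logger.warning("Primary transcription failed (%s); using fallback", exc)
        fallback(audio_file, output_file)  # e.g. Whisper (local)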

emotion_clf_pipeline.predict.time_str_to_seconds(time_str)[source]

Convert time strings to seconds for numerical operations.

Handles multiple time formats commonly found in transcription outputs:

  • HH:MM:SS or HH:MM:SS.mmm (hours, minutes, seconds with optional milliseconds)

  • MM:SS or MM:SS.mmm (minutes, seconds with optional milliseconds)

  • Numeric values (already in seconds)

This conversion is essential for temporal analysis and synchronization between audio timestamps and emotion predictions.

Parameters:

time_str – Time in string format or numeric value

Returns:

Time converted to seconds, or 0.0 if parsing fails

Return type:

float

Note

Returns 0.0 for invalid inputs rather than raising exceptions to maintain pipeline robustness during batch processing.
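
A sketch that matches the documented behavior (the formats above, 0.0 on any parse failure):

def to_seconds(time_str) -> float:
    try:
        if isinstance(time_str, (int, float)):
            return float(time_str)          # already numeric
        parts = [float(p) for p in str(time_str).split(":")]
        if len(parts) == 3:                 # HH:MM:SS(.mmm)
            return parts[0] * 3600 + parts[1] * 60 + parts[2]
        if len(parts) == 2:                 # MM:SS(.mmm)
            return parts[0] * 60 + parts[1]
        return parts[0]                     # bare seconds given as a string
    except (ValueError, TypeError):
        return 0.0                          # robust default, never raises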

emotion_clf_pipeline.predict.transcribe_youtube_url(video_url, use_stt=False)[source]

Main transcription function that chooses between subtitle and STT methods.

Parameters:
  • video_url (str) – YouTube video URL

  • use_stt (bool) – If True, force use of speech-to-text. If False, try subtitles first.

Returns:

Dictionary containing transcript data and metadata

Return type:

Dict[str, Any]
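
Example (a usage sketch of the subtitle-first behavior):

from emotion_clf_pipeline.predict import transcribe_youtube_url

data = transcribe_youtube_url(
    "https://youtube.com/watch?v=VIDEO_ID",  # placeholder URL
    use_stt=False,  # try subtitles first; set True to force speech-to-text
)
print(data)  # dictionary of transcript data and metadata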