emotion_clf_pipeline.model

Emotion classification model components.

Provides multi-task DEBERTA-based emotion classification with sub-emotion mapping and intensity prediction. Supports both local and Azure ML model synchronization.

Classes

CustomPredictor(model, tokenizer, device[, ...])

Multi-task emotion prediction engine.

DEBERTAClassifier(*args, **kwargs)

Multi-task DEBERTA-based emotion classifier.

EmotionPredictor()

High-level interface for emotion classification.

ModelLoader([model_name, device])

Handles DEBERTA model and tokenizer loading with device management.

class emotion_clf_pipeline.model.CustomPredictor(model, tokenizer, device, encoders_dir='models/encoders', feature_config=None)[source]

Bases: object

Multi-task emotion prediction engine.

Handles emotion classification inference by combining the trained model with feature engineering and post-processing. Maps sub-emotions to main emotions for consistent predictions.

__init__(model, tokenizer, device, encoders_dir='models/encoders', feature_config=None)[source]

Initialize emotion predictor with model and supporting components.

Parameters:
  • model (nn.Module) – Trained emotion classification model

  • tokenizer – Tokenizer for text preprocessing

  • device (torch.device) – Target device for inference

  • encoders_dir (str) – Directory containing label encoder files

  • feature_config (dict, optional) – Feature extraction configuration

post_process(df)[source]

Refine predictions by aligning sub-emotions with main emotions.

Uses probability distributions to select sub-emotions that are consistent with predicted main emotions, improving classification coherence.

Parameters:

df (pd.DataFrame) – Predictions with sub_emotion_logits column

Returns:

Refined predictions with an emotion_pred_post_processed column

Return type:

pd.DataFrame
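The alignment can be pictured with the following sketch. It is illustrative only, not the library's implementation: the SUB_TO_MAIN mapping, the sub_emotion_labels list, and the assumption that the sub_emotion_logits column holds a per-row probability array are all placeholders.

    import numpy as np

    # Hypothetical parent mapping; the real one comes from the label encoders.
    SUB_TO_MAIN = {"joy": "happiness", "amusement": "happiness", "rage": "anger"}

    def align_sub_emotion(row, sub_emotion_labels):
        """Pick the most probable sub-emotion whose parent matches the
        predicted main emotion; fall back to the raw argmax otherwise."""
        probs = np.asarray(row["sub_emotion_logits"], dtype=float)
        for idx in np.argsort(probs)[::-1]:          # highest probability first
            candidate = sub_emotion_labels[idx]
            if SUB_TO_MAIN.get(candidate) == row["emotion_pred"]:
                return candidate
        return sub_emotion_labels[int(np.argmax(probs))]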

predict(texts, batch_size=16)[source]

Generate emotion predictions for text inputs.

Processes texts through feature extraction, model inference, and post-processing to produce final emotion classifications.

Parameters:
  • texts (list) – List of text strings to classify

  • batch_size (int) – Batch size for inference. Defaults to 16.

Returns:

Predictions with mapped emotions and confidence scores

Return type:

pd.DataFrame
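A minimal usage sketch, assuming the model, tokenizer, and device have already been prepared (for example via ModelLoader below); the example texts are placeholders.

    from emotion_clf_pipeline.model import CustomPredictor

    predictor = CustomPredictor(model, tokenizer, device, encoders_dir="models/encoders")
    results = predictor.predict(
        ["I can't stop smiling today!", "Why does this always happen to me?"],
        batch_size=16,
    )
    # One row per input text; post_process adds emotion_pred_post_processed.
    print(results.head())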

class emotion_clf_pipeline.model.DEBERTAClassifier(*args, **kwargs)[source]

Bases: Module

Multi-task DEBERTA-based emotion classifier.

Performs simultaneous classification for:

  • Main emotions (7 categories)

  • Sub-emotions (28 categories)

  • Emotion intensity (3 levels)

Combines DEBERTA embeddings with engineered features through projection layers.

__init__(model_name, feature_dim, num_classes, hidden_dim=256, dropout=0.1)[source]

Initialize the multi-task emotion classifier.

Parameters:
  • model_name (str) – Pretrained DEBERTA model identifier

  • feature_dim (int) – Dimension of engineered features

  • num_classes (dict) – Class counts for each task (emotion, sub_emotion, intensity)

  • hidden_dim (int) – Hidden layer dimension. Defaults to 256.

  • dropout (float) – Dropout probability. Defaults to 0.1.
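A construction sketch. The num_classes keys follow the task list above; feature_dim=121 is a placeholder and must match the engineered feature vector produced by the pipeline.

    from emotion_clf_pipeline.model import DEBERTAClassifier

    num_classes = {"emotion": 7, "sub_emotion": 28, "intensity": 3}

    model = DEBERTAClassifier(
        model_name="microsoft/deberta-v3-xsmall",  # ModelLoader's default backbone
        feature_dim=121,                           # placeholder; match your features
        num_classes=num_classes,
        hidden_dim=256,
        dropout=0.1,
    )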

forward(input_ids, attention_mask, features)[source]

Compute multi-task emotion predictions.

Parameters:
  • input_ids (torch.Tensor) – Tokenized input text

  • attention_mask (torch.Tensor) – Attention mask for input

  • features (torch.Tensor) – Engineered features

Returns:

Logits for each classification task

Return type:

dict
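A forward-pass sketch on dummy tensors, assuming the model built above and that the returned dict is keyed by task name (the documentation only states that it contains logits for each task).

    import torch

    input_ids = torch.randint(0, 1000, (2, 128))             # 2 texts, 128 tokens each
    attention_mask = torch.ones(2, 128, dtype=torch.long)
    features = torch.randn(2, 121)                           # placeholder feature_dim

    logits = model(input_ids, attention_mask, features)
    # e.g. logits["emotion"] is expected to have shape (2, 7)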

class emotion_clf_pipeline.model.EmotionPredictor[source]

Bases: object

High-level interface for emotion classification.

Provides a simple API for predicting emotions from text with automatic model loading, Azure ML synchronization, and feature configuration. Handles single texts or batches transparently.

__init__()[source]

Initialize predictor with lazy model loading.

ensure_best_baseline()[source]

Ensure we have the best available baseline model from Azure ML.

This is an alias for ensure_best_baseline_model() for backward compatibility. Checks Azure ML for models with better F1 scores than the current local baseline and downloads them if found.

Returns:

True if a better model was downloaded and loaded, False otherwise

Return type:

bool

ensure_best_baseline_model()[source]

Ensure we have the best available baseline model from Azure ML.

This method checks Azure ML for models with better F1 scores than the current local baseline and downloads them if found. It forces a reload of the prediction model to use the updated baseline.

Returns:

True if a better model was downloaded and loaded, False otherwise

Return type:

bool
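For example, a deployment script might refresh the baseline before serving; this assumes Azure ML credentials are already configured in the environment.

    from emotion_clf_pipeline.model import EmotionPredictor

    predictor = EmotionPredictor()
    if predictor.ensure_best_baseline_model():
        print("Downloaded a better baseline from Azure ML and reloaded it.")
    else:
        print("Local baseline is already the best available.")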

predict(texts, feature_config=None, reload_model=False)[source]

Predict emotions for single text or batch of texts.

Automatically handles model loading, feature extraction, and result formatting. Returns structured predictions with emotion, sub-emotion, and intensity classifications.

Parameters:
  • texts (str or list) – Text(s) to classify

  • feature_config (dict, optional) – Feature extraction settings. Defaults to tfidf=True, emolex=True, others=False.

  • reload_model (bool) – Force model reload. Defaults to False.

Returns:

Prediction dict for single text, list for batch

Return type:

dict or list
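A usage sketch; the exact keys of the returned dictionaries are not shown here, and the feature_config keys mirror the documented defaults.

    from emotion_clf_pipeline.model import EmotionPredictor

    predictor = EmotionPredictor()

    # Single text -> one prediction dict
    single = predictor.predict("I can't believe we won the finals!")

    # Batch -> list of prediction dicts
    batch = predictor.predict(
        ["I can't believe we won the finals!", "Traffic again. Great."],
        feature_config={"tfidf": True, "emolex": True},
    )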

class emotion_clf_pipeline.model.ModelLoader(model_name='microsoft/deberta-v3-xsmall', device=None)[source]

Bases: object

Handles DEBERTA model and tokenizer loading with device management.

Supports loading pretrained models, applying custom weights, and creating predictor instances. Provides automatic device selection (GPU/CPU).

__init__(model_name='microsoft/deberta-v3-xsmall', device=None)[source]

Initialize model loader with tokenizer.

Parameters:
  • model_name (str) – Pretrained model identifier. Defaults to ‘microsoft/deberta-v3-xsmall’.

  • device (torch.device, optional) – Target device. Auto-detects if None.

create_predictor(model, encoders_dir='models/encoders', feature_config=None)[source]

Create predictor instance for emotion classification.

Parameters:
  • model (nn.Module) – Trained emotion classification model

  • encoders_dir (str) – Directory containing label encoder files

  • feature_config (dict, optional) – Feature extraction configuration

Returns:

Ready-to-use predictor instance

Return type:

CustomPredictor
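A sketch that wires a loaded model into a predictor; it assumes model was obtained from one of the load_* methods documented below.

    from emotion_clf_pipeline.model import ModelLoader

    loader = ModelLoader()                       # auto-detects GPU/CPU
    predictor = loader.create_predictor(model, encoders_dir="models/encoders")
    results = predictor.predict(["The ending of that film was devastating."])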

ensure_best_baseline_model()[source]

Ensure we have the best available baseline model from Azure ML.

This method checks Azure ML for models with better F1 scores than the current local baseline and downloads them if found. It forces a reload of the prediction model to use the updated baseline.

Returns:

True if a better model was downloaded and loaded, False otherwise

Return type:

bool

load_baseline_model(weights_dir='models/weights', sync_azure=True)[source]

Load stable production model with optional Azure ML sync.

Parameters:
  • weights_dir (str) – Directory containing model weights

  • sync_azure (bool) – Whether to sync with Azure ML on startup

load_dynamic_model(weights_dir='models/weights', sync_azure=True)[source]

Load latest trained model with optional Azure ML sync.

Parameters:
  • weights_dir (str) – Directory containing model weights

  • sync_azure (bool) – Whether to sync with Azure ML on startup

load_model(feature_dim, num_classes, weights_path=None, hidden_dim=256, dropout=0.1)[source]

Create and optionally load pretrained model weights.

Parameters:
  • feature_dim (int) – Dimension of engineered features

  • num_classes (dict) – Class counts for each classification task

  • weights_path (str, optional) – Path to saved model weights

  • hidden_dim (int) – Hidden layer dimension. Defaults to 256.

  • dropout (float) – Dropout probability. Defaults to 0.1.

Returns:

Loaded model ready for inference or training

Return type:

DEBERTAClassifier
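A sketch of building the classifier through the loader. feature_dim=121 is a placeholder, and weights_path is omitted so the model starts from the pretrained backbone; pass a path to a saved checkpoint to restore trained weights.

    from emotion_clf_pipeline.model import ModelLoader

    loader = ModelLoader(model_name="microsoft/deberta-v3-xsmall")
    model = loader.load_model(
        feature_dim=121,                                     # placeholder
        num_classes={"emotion": 7, "sub_emotion": 28, "intensity": 3},
        hidden_dim=256,
        dropout=0.1,
    )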

Raises:

promote_dynamic_to_baseline(weights_dir='models/weights', sync_azure=True)[source]

Promote the dynamic model to baseline by copying its weights to the baseline location, optionally syncing with Azure ML.

Parameters:
  • weights_dir (str) – Directory containing model weights

  • sync_azure (bool) – Whether to sync the promoted baseline with Azure ML
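A promotion workflow sketch, assuming a newly trained dynamic model has already been validated.

    from emotion_clf_pipeline.model import ModelLoader

    loader = ModelLoader()

    # Promote the validated dynamic weights to the stable baseline slot,
    # syncing the result with Azure ML.
    loader.promote_dynamic_to_baseline(weights_dir="models/weights", sync_azure=True)

    # Production startups then pick up the promoted baseline.
    loader.load_baseline_model(weights_dir="models/weights", sync_azure=True)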