emotion_clf_pipeline.train

DeBERTa-based Multi-Task Emotion Classification Trainer.

This module implements a production-ready training framework for emotion classification models supporting multiple prediction tasks: emotion, sub-emotion, and intensity levels.

Key Features:
  • Multi-task learning with weighted loss functions
  • Automatic model checkpointing and validation-based selection
  • Azure ML integration for model versioning and deployment
  • Comprehensive evaluation metrics and visualization
  • Flexible feature engineering pipeline integration

The trainer handles end-to-end model lifecycle from training through evaluation and deployment, with built-in support for class imbalance and feature fusion.

Classes

CustomTrainer(model, train_dataloader, ...)

Production-ready trainer for multi-task emotion classification using DeBERTa.

class emotion_clf_pipeline.train.AzureMLLogger[source]

Bases: object

Comprehensive logging class for Azure ML integration.

Handles both MLflow and Azure ML native logging to ensure metrics and artifacts appear correctly in Azure ML job overview.

__init__()[source]

Initialize the Azure ML logger with environment detection.

complete_run(status='COMPLETED')[source]

Complete the Azure ML run.

Parameters:

status (str)

create_evaluation_plots(test_preds, test_labels, test_metrics, evaluation_dir, output_tasks)[source]

Create comprehensive evaluation plots for Azure ML visualization.

Parameters:
  • test_preds – Dictionary of test predictions per task

  • test_labels – Dictionary of test labels per task

  • test_metrics – Dictionary of test metrics per task

  • evaluation_dir – Directory to save plots

  • output_tasks – List of output tasks

end_logging()[source]

End logging session.

log_artifact(local_path, artifact_path=None)[source]

Log artifacts (files/images) to both Azure ML and MLflow.

Parameters:
  • local_path (str) – Path to local file

  • artifact_path (str) – Optional subdirectory in artifacts

log_evaluation_artifacts(evaluation_dir)[source]

Log all evaluation artifacts to Azure ML for visualization.

Parameters:

evaluation_dir – Directory containing evaluation artifacts

log_image(image_path, name=None)[source]

Log image specifically for Azure ML visualization.

Parameters:
  • image_path (str) – Path to image file

  • name (str) – Display name for the image

log_metric(key, value, step=None)[source]

Log metrics to both Azure ML and MLflow.

Parameters:
  • key (str) – Metric name

  • value (float) – Metric value

  • step (int) – Step/epoch number

log_param(key, value)[source]

Log parameters to both Azure ML and MLflow.

Parameters:
  • key (str) – Parameter name

  • value – Parameter value

log_table(name, data)[source]

Log table data to Azure ML.

Parameters:
  • name (str) – Name of the table

  • data – Table data to log

start_logging(run_name=None)[source]

Start logging session.

Parameters:

run_name (str)
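
A minimal logging-session sketch; the run name, metric keys, values, and file paths below are illustrative assumptions, not part of the API:

    from emotion_clf_pipeline.train import AzureMLLogger

    logger = AzureMLLogger()                        # detects the MLflow / Azure ML environment
    logger.start_logging(run_name="deberta-emotion-train")

    logger.log_param("learning_rate", 2e-5)
    for epoch in range(3):
        logger.log_metric("val_f1_emotion", 0.70 + 0.05 * epoch, step=epoch)  # placeholder values

    logger.log_image("outputs/confusion_matrix.png", name="confusion_matrix")  # assumed path
    logger.log_artifact("outputs/evaluation_report.csv", artifact_path="evaluation")
    logger.complete_run(status="COMPLETED")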

class emotion_clf_pipeline.train.AzureMLManager(weights_dir='models/weights')[source]

Bases: object

Unified Azure ML manager for emotion classification pipeline.

Handles all Azure ML operations including:
  • Model weight synchronization (download/upload)
  • Model promotion and versioning
  • Status reporting and configuration validation
  • Backup and recovery operations

__init__(weights_dir='models/weights')[source]

Initialize Azure ML manager.

Parameters:

weights_dir – Directory path for local model weights storage

create_backup(timestamp=None)[source]

Create timestamped backup of existing model weights.

Parameters:

timestamp – Optional timestamp string, defaults to current time

download_models(dry_run=False)[source]

Download models from Azure ML if they don’t exist locally.

Parameters:

dry_run – If True, only show what would be downloaded

Returns:

(baseline_downloaded, dynamic_downloaded)

Return type:

tuple
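
For example, one might preview the sync with a dry run and then perform the actual download (a sketch; the printed handling of the returned tuple is an assumption):

    from emotion_clf_pipeline.train import AzureMLManager

    manager = AzureMLManager(weights_dir="models/weights")

    manager.download_models(dry_run=True)          # only report what would be fetched
    baseline_downloaded, dynamic_downloaded = manager.download_models()
    print(f"baseline: {baseline_downloaded}, dynamic: {dynamic_downloaded}")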

get_status_info()[source]

Get comprehensive status information.

Returns:

Combined configuration and model status information

Return type:

dict

handle_post_training_sync(f1_score, auto_upload=False, auto_promote_threshold=0.85)[source]

Handle sync operations after training completion.

Parameters:
  • f1_score – F1 score from training

  • auto_upload – Whether to automatically upload dynamic model

  • auto_promote_threshold – F1 threshold for auto-promotion

Returns:

Results of sync operations

Return type:

dict
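
For instance, after a training run the manager can decide whether to upload and promote; the score and threshold values here are illustrative:

    manager = AzureMLManager()
    results = manager.handle_post_training_sync(
        f1_score=0.87,                 # F1 reported by the trainer (placeholder value)
        auto_upload=True,              # upload the dynamic model automatically
        auto_promote_threshold=0.85,   # promote to baseline if the score clears this bar
    )
    print(results)                     # dict describing which sync operations ran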

print_status_report(save_to_file=None)[source]

Generate and display comprehensive status report.

Parameters:

save_to_file – Optional file path to save status as JSON

promote_dynamic_to_baseline(dry_run=False)[source]

Promote dynamic model to baseline (locally and in Azure ML).

Parameters:

dry_run – If True, only show what would be promoted

Returns:

True if promotion successful, False otherwise

Return type:

bool

sync_on_startup()[source]

Perform automatic sync operations on startup.

upload_dynamic_model(f1_score, dry_run=False)[source]

Upload dynamic model to Azure ML with F1 score metadata.

Parameters:
  • f1_score – F1 score to tag the model with

  • dry_run – If True, only show what would be uploaded

Returns:

True if upload successful, False otherwise

Return type:

bool

validate_operation(operation, f1_score=None)[source]

Validate that the requested operation can be performed.

Parameters:
  • operation – Operation to validate (‘upload’, ‘promote’, etc.)

  • f1_score – F1 score for upload operations

Returns:

True if operation is valid, False otherwise

Return type:

bool
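
Putting the manager operations together, a hedged upload-and-promote sketch (the score value and the status-file path are placeholders; failure handling depends on the Azure ML configuration):

    manager = AzureMLManager(weights_dir="models/weights")

    f1 = 0.88  # placeholder score from a finished training run
    if manager.validate_operation("upload", f1_score=f1):
        uploaded = manager.upload_dynamic_model(f1_score=f1, dry_run=False)
        if uploaded and manager.validate_operation("promote"):
            manager.promote_dynamic_to_baseline(dry_run=False)

    manager.print_status_report(save_to_file="outputs/azure_status.json")  # assumed path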

class emotion_clf_pipeline.train.CustomTrainer(model, train_dataloader, val_dataloader, test_dataloader, device, test_set_df, class_weights_tensor, encoders_dir, output_tasks=None, learning_rate=2e-05, weight_decay=0.01, epochs=1, feature_config=None)[source]

Bases: object

Production-ready trainer for multi-task emotion classification using DeBERTa.

Manages the complete training lifecycle including data loading, model training, validation, checkpointing, and evaluation. Supports flexible task configuration and automatic model promotion based on performance thresholds.

Key Capabilities:
  • Multi-task learning with weighted loss aggregation
  • Automatic best model selection via validation metrics
  • Feature engineering pipeline integration
  • Azure ML model versioning and deployment
  • Class imbalance handling through weighted loss functions

Thread Safety: Not thread-safe. Use separate instances for concurrent training.

__init__(model, train_dataloader, val_dataloader, test_dataloader, device, test_set_df, class_weights_tensor, encoders_dir, output_tasks=None, learning_rate=2e-05, weight_decay=0.01, epochs=1, feature_config=None)[source]

Initialize the emotion classification trainer.

Sets up training infrastructure, loads encoders, validates model dimensions, and configures feature engineering pipeline. Automatically determines feature dimensions from training data.

Parameters:
  • model – DeBERTa classifier instance with multi-task heads

  • train_dataloader – PyTorch DataLoader for training data

  • val_dataloader – PyTorch DataLoader for validation data

  • test_dataloader – PyTorch DataLoader for test data

  • device – PyTorch device (cuda/cpu) for model execution

  • test_set_df – Pandas DataFrame containing original test data with text

  • class_weights_tensor – Tensor or dict of class weights for imbalanced data

  • encoders_dir – Directory path containing label encoder pickle files

  • output_tasks – List of prediction tasks [‘emotion’, ‘sub_emotion’, ‘intensity’]

  • learning_rate – AdamW optimizer learning rate (default: 2e-5)

  • weight_decay – L2 regularization coefficient (default: 0.01)

  • epochs – Number of training epochs (default: 1)

  • feature_config – Dict specifying which engineered features to use

Raises:
  • FileNotFoundError – If encoder files are missing from encoders_dir

  • ValueError – If model dimensions don’t match encoder classes

Side Effects:
  • Loads and validates label encoders

  • Configures task-specific loss weights

  • Logs initialization status and warnings
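
A construction sketch, assuming a multi-task DeBERTa classifier, PyTorch DataLoaders, class weights, and fitted label encoders already exist; every name on the right-hand side below is a placeholder prepared elsewhere:

    import torch
    from emotion_clf_pipeline.train import CustomTrainer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    trainer = CustomTrainer(
        model=model,                          # multi-task DeBERTa classifier
        train_dataloader=train_loader,        # PyTorch DataLoaders
        val_dataloader=val_loader,
        test_dataloader=test_loader,
        device=device,
        test_set_df=test_df,                  # original test texts for the evaluation report
        class_weights_tensor=class_weights,   # per-class weights for imbalanced data
        encoders_dir="models/encoders",       # assumed location of label encoder pickles
        output_tasks=["emotion", "sub_emotion", "intensity"],
        learning_rate=2e-5,
        weight_decay=0.01,
        epochs=3,
    )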

static calculate_metrics(preds, labels, task_name='')[source]

Compute comprehensive classification metrics for model evaluation.

Calculates accuracy, F1-score, precision, and recall using weighted averaging to handle class imbalance. Generates detailed classification report with per-class statistics.

Parameters:
  • preds – Model predictions as numeric class indices

  • labels – Ground truth labels as numeric class indices

  • task_name – Descriptive name for logging context

Returns:

Metrics dictionary containing:
  • acc: Accuracy score (0-1)

  • f1: Weighted F1-score (0-1)

  • prec: Weighted precision (0-1)

  • rec: Weighted recall (0-1)

  • report: Detailed classification report string

Return type:

dict

Handles edge cases like empty datasets and length mismatches gracefully by returning zero metrics with appropriate warnings.
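
Because it is a static method, it can be called directly on prediction and label sequences, e.g.:

    preds = [0, 1, 1, 2]
    labels = [0, 1, 2, 2]
    metrics = CustomTrainer.calculate_metrics(preds, labels, task_name="emotion")
    print(metrics["acc"], metrics["f1"])   # weighted metrics in the 0-1 range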

evaluate(dataloader, criterion_dict, is_test=False)[source]

Evaluate model performance on validation or test data.

Runs inference on provided dataset without gradient computation, collecting predictions and computing loss for all active tasks.

Parameters:
  • dataloader – PyTorch DataLoader containing evaluation data

  • criterion_dict – Task-specific loss functions for loss computation

  • is_test – Boolean flag for logging context (test vs validation)

Returns:

(avg_eval_loss, all_preds, all_labels) where:
  • avg_eval_loss: Mean loss across all evaluation batches

  • all_preds: Dict mapping task names to prediction lists

  • all_labels: Dict mapping task names to ground truth lists

Return type:

tuple

Side Effects:
  • Sets model to evaluation mode (disables dropout/batch norm)

  • Logs evaluation progress via tqdm progress bar

evaluate_final_model(model_path, evaluation_output_dir)[source]

Perform comprehensive evaluation of a trained model on test data.

Loads a trained model from disk, runs inference on the test dataset, and generates detailed evaluation reports including per-sample predictions, accuracy metrics, and exported results for analysis.

Parameters:
  • model_path – File path to saved model state dict (.pt file)

  • evaluation_output_dir – Directory for saving evaluation artifacts

Returns:

Comprehensive results with columns:
  • text: Original input text samples

  • true_{task}: Ground truth labels for each task

  • pred_{task}: Model predictions for each task

  • {task}_correct: Boolean correctness per task

  • all_correct: Boolean indicating all tasks correct (if multi-task)

Return type:

pd.DataFrame

Side Effects:
  • Loads model weights and sets to evaluation mode

  • Creates evaluation output directory if it doesn’t exist

  • Saves detailed evaluation report as CSV file

  • Logs progress and any warnings encountered
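
For example, after training one could evaluate the saved best model; the paths below are illustrative:

    results_df = trainer.evaluate_final_model(
        model_path="models/weights/dynamic_weights.pt",   # assumed checkpoint path
        evaluation_output_dir="outputs/evaluation",
    )
    print(results_df["all_correct"].mean())   # overall accuracy; column present when multiple tasks are configured
    trainer.plot_evaluation_results(results_df, "outputs/evaluation")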

plot_evaluation_results(results_df, output_dir)[source]

Generate comprehensive plots for the evaluation results.

Parameters:
  • results_df – DataFrame containing evaluation results

  • output_dir – Directory for saving plot artifacts

Side Effects:
  • Creates plots for per-task accuracy, confusion matrix, and sample predictions

  • Saves plots as image files in the specified directory

static print_metrics(metrics_dict, split, loss=None)[source]

Display formatted training metrics in a readable table format.

Renders metrics for all tasks in a visually appealing table with color-coded headers and consistent decimal formatting. Supports different contexts (train/validation/test) with appropriate styling.

Parameters:
  • metrics_dict – Dict mapping task names to metric dictionaries

  • split – Context string (‘Train’, ‘Val’, ‘Test’) for header styling

  • loss – Optional loss value to display above metrics table

Side Effects:
  • Prints colored headers and formatted tables to console

  • Uses tabulate library for professional table formatting

  • Applies context-appropriate terminal colors

static promote_dynamic_to_baseline(weights_dir='models/weights')[source]

Promote current dynamic model to baseline status for production use.

Copies dynamic_weights.pt to baseline_weights.pt, effectively making the current best-performing model the new production baseline. This operation is typically performed after validating model performance meets promotion criteria.

Parameters:

weights_dir (str) – Directory containing model weight files

Returns:

True if promotion successful, False if dynamic model missing

Return type:

bool

Side Effects:
  • Creates or overwrites baseline_weights.pt file

  • Logs promotion status and any errors encountered

setup_training()[source]

Initialize training components for multi-task learning.

Configures loss functions, optimizer, and learning rate scheduler for all active prediction tasks. Sets up class-weighted losses for imbalanced datasets and linear warmup scheduling.

Returns:

(criterion_dict, optimizer, scheduler) where:
  • criterion_dict: Task-specific CrossEntropyLoss functions

  • optimizer: AdamW optimizer with L2 regularization

  • scheduler: Linear warmup learning rate scheduler

Return type:

tuple

Side Effects:
  • Moves class weights to appropriate device

  • Logs successful setup completion

should_promote_to_baseline(dynamic_f1, baseline_f1, threshold=0.01)[source]

Determine whether dynamic model performance justifies baseline promotion.

Compares the dynamic model F1 score against the current baseline with a configurable improvement threshold to prevent frequent updates from marginal improvements. Implements a simple but effective promotion strategy based on a minimum-improvement margin rather than a formal statistical test.

Parameters:
  • dynamic_f1 – F1 score of the newly trained dynamic model

  • baseline_f1 – F1 score of the current production baseline model

  • threshold – Minimum improvement required for promotion (default: 0.01)

Returns:

True if dynamic model should replace baseline, False otherwise

Return type:

bool

Note

Uses emotion task F1 as the primary promotion criterion. In multi-task scenarios, consider weighted combinations of task performances.
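
For example (scores are placeholders):

    if trainer.should_promote_to_baseline(dynamic_f1=0.86, baseline_f1=0.84, threshold=0.01):
        CustomTrainer.promote_dynamic_to_baseline(weights_dir="models/weights")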

train_and_evaluate(trained_model_output_dir, metrics_output_file, weights_dir_base='models/weights')[source]

Execute complete training pipeline with validation-based model selection.

Orchestrates the full training workflow including epoch iteration, validation evaluation, best model tracking, and artifact persistence. Integrates with MLflow for experiment tracking and Azure ML for model deployment.

Parameters:
  • trained_model_output_dir – Directory path for saving the best model

  • metrics_output_file – JSON file path for training metrics storage

  • weights_dir_base – Base directory for temporary model checkpoints

Returns:

Best validation F1 scores for each task from optimal epoch

Return type:

dict

Side Effects:
  • Creates temporary directories for model checkpoints

  • Logs training progress and metrics to MLflow

  • Saves model configuration and state dict files

  • Attempts Azure ML model upload with auto-promotion

  • Cleans up temporary checkpoint files after completion
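
In the common case this single call runs the whole pipeline; the output paths below are assumptions:

    best_val_f1 = trainer.train_and_evaluate(
        trained_model_output_dir="models/trained",        # where the best model is saved
        metrics_output_file="outputs/train_metrics.json",
        weights_dir_base="models/weights",
    )
    print(best_val_f1)   # per-task best validation F1 scores, e.g. {"emotion": ..., ...}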

train_epoch(criterion_dict, optimizer, scheduler)[source]

Execute one complete training epoch across all batches.

Performs forward pass, loss computation, backpropagation, and optimizer updates for all configured tasks. Collects predictions and ground truth labels for comprehensive metric calculation.

Parameters:
  • criterion_dict – Task-specific loss functions (from setup_training)

  • optimizer – AdamW optimizer instance

  • scheduler – Learning rate scheduler instance

Returns:

(avg_train_loss, train_metrics_epoch) where:
  • avg_train_loss: Mean loss across all batches

  • train_metrics_epoch: Dict of metrics per task

Return type:

tuple

Side Effects:
  • Updates model parameters via backpropagation

  • Advances learning rate scheduler

  • Logs training progress via tqdm progress bar
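
A manual-loop sketch combining setup_training(), train_epoch(), and evaluate(); train_and_evaluate() normally wraps this, and val_loader here refers to the placeholder DataLoader from the construction sketch above:

    criterion_dict, optimizer, scheduler = trainer.setup_training()

    for epoch in range(3):   # placeholder epoch count matching the trainer configuration
        avg_train_loss, train_metrics = trainer.train_epoch(criterion_dict, optimizer, scheduler)
        CustomTrainer.print_metrics(train_metrics, split="Train", loss=avg_train_loss)

        val_loss, val_preds, val_labels = trainer.evaluate(val_loader, criterion_dict)
        val_metrics = {
            task: CustomTrainer.calculate_metrics(val_preds[task], val_labels[task], task_name=task)
            for task in val_preds
        }
        CustomTrainer.print_metrics(val_metrics, split="Val", loss=val_loss)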

emotion_clf_pipeline.train.main()[source]

Main function for training the model.

emotion_clf_pipeline.train.parse_arguments()[source]

Parse command line arguments for training configuration.

Returns:

Parsed arguments containing training parameters

Return type:

argparse.Namespace
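
When the module is run as a script (assuming it guards main() behind a __main__ check), the available flags are defined by parse_arguments(), so the generated help is the safest starting point:

    python -m emotion_clf_pipeline.train --help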