Evaluate Module

audiovisually.evaluate.compare_models(model_path_1, model_path_2, df, text_column='Sentence', label_column='True Emotion', label_list=None)

Compare two transformer models on the same labeled dataset.

Parameters:
  • model_path_1 (str) – Path to the first model directory or Hugging Face model name.

  • model_path_2 (str) – Path to the second model directory or Hugging Face model name.

  • df (pd.DataFrame) – DataFrame containing text and true labels.

  • text_column (str) – Name of the column with input text.

  • label_column (str) – Name of the column with true labels.

  • label_list (list, optional) – List of label names. Defaults to a standard set of 7 emotions.

Returns:

{
    "model_1": {
        "accuracy": float,
        "f1": float,
        "classification_report": dict or None,
        "classification_report_message": str or None,
        "confusion_matrix": np.ndarray,
        "result_df": pd.DataFrame
    },
    "model_2": {
        "accuracy": float,
        "f1": float,
        "classification_report": dict or None,
        "classification_report_message": str or None,
        "confusion_matrix": np.ndarray,
        "result_df": pd.DataFrame
    }
}

Return type:

dict

Example

>>> import pandas as pd
>>> from audiovisually.evaluate import compare_models
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'True Emotion': ['happiness', 'sadness']})
>>> results = compare_models("distilroberta-base", "cardiffnlp/twitter-roberta-base-emotion", df)
>>> print(results["model_1"]["accuracy"])
>>> print(results["model_2"]["result_df"].head())
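The nested dictionary above can be flattened into a side-by-side metrics table for quick comparison. A minimal sketch, using a mocked results dict in the documented shape (in practice the dict comes from compare_models; the numbers here are placeholders, not real output):

```python
import pandas as pd

# Mocked output in the shape documented above; in practice:
# results = compare_models(model_path_1, model_path_2, df)
results = {
    "model_1": {"accuracy": 0.90, "f1": 0.88},
    "model_2": {"accuracy": 0.85, "f1": 0.83},
}

# One row per model, one column per metric.
summary = pd.DataFrame(
    {name: {"accuracy": r["accuracy"], "f1": r["f1"]}
     for name, r in results.items()}
).T

print(summary)
best = summary["accuracy"].idxmax()  # name of the higher-accuracy model
```

The same pattern extends to the other keys (e.g. pulling each model's result_df for error analysis).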

audiovisually.evaluate.evaluate_model(model_path, df, text_column='Sentence', label_column='True Emotion', label_list=None, batch_size=8)

Evaluate a transformer model on a labeled dataset and return metrics and predictions.

Parameters:
  • model_path (str) – Path to the trained model directory or Hugging Face model name.

  • df (pd.DataFrame) – DataFrame containing text and true labels.

  • text_column (str) – Name of the column with input text.

  • label_column (str) – Name of the column with true labels.

  • label_list (list, optional) – List of label names. Defaults to a standard set of 7 emotions.

  • batch_size (int) – Batch size for prediction.

Returns:

{
    "accuracy": float,
    "f1": float,
    "classification_report": dict or None,
    "classification_report_message": str or None,
    "confusion_matrix": np.ndarray,
    "result_df": pd.DataFrame
}

Return type:

dict

Example

>>> import pandas as pd
>>> from audiovisually.evaluate import evaluate_model
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'True Emotion': ['happiness', 'sadness']})
>>> results = evaluate_model("distilroberta-base", df)
>>> print(results["accuracy"])
>>> print(results["result_df"].head())
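The accuracy and f1 fields summarize agreement between the true labels in label_column and the model's predictions. As a rough illustration of how such metrics are computed, here is a sketch on toy labels (not the library's internal code; it also assumes macro-averaged F1, which this reference does not specify):

```python
# Toy ground-truth and predicted labels.
true_labels = ["happiness", "sadness", "sadness", "anger"]
pred_labels = ["happiness", "sadness", "anger", "anger"]

# Accuracy: fraction of exact matches.
accuracy = sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)

# Per-class F1, then macro-average (one common convention).
labels = sorted(set(true_labels) | set(pred_labels))
f1_scores = []
for lab in labels:
    tp = sum(t == p == lab for t, p in zip(true_labels, pred_labels))
    fp = sum(p == lab and t != lab for t, p in zip(true_labels, pred_labels))
    fn = sum(t == lab and p != lab for t, p in zip(true_labels, pred_labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1_scores.append(
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
macro_f1 = sum(f1_scores) / len(f1_scores)

print(accuracy, macro_f1)  # 0.75 and ~0.778 for these toy labels
```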