Train Module

audiovisually.train.get_model_info(model_path_or_name)

Print and return key information about a transformer model and its tokenizer.

Parameters:

model_path_or_name (str) – Hugging Face model name or local path.

Returns:

Model and tokenizer information.

Return type:

dict

Example

>>> from audiovisually.train import get_model_info
>>> info = get_model_info("distilroberta-base")
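
The keys of the returned dict are not listed above. As a rough illustration of the kind of information involved, the sketch below reads the config.json file that a saved Hugging Face model directory contains and picks out a few common fields. The helper summarize_local_config and the demo config values are hypothetical and are not part of audiovisually; get_model_info may report different or additional fields.

```python
import json
import tempfile
from pathlib import Path


def summarize_local_config(model_dir):
    """Pick a few common fields out of <model_dir>/config.json.

    Hypothetical helper for illustration; not part of audiovisually.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # JSON object keys are strings, so class ids arrive as "0", "1", ...
    id2label = config.get("id2label", {})
    return {
        "model_type": config.get("model_type"),
        "architectures": config.get("architectures", []),
        "vocab_size": config.get("vocab_size"),
        "num_labels": len(id2label),
        "labels": [id2label[key] for key in sorted(id2label, key=int)],
    }


# Demo with a made-up config written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = {
        "model_type": "roberta",
        "architectures": ["RobertaForSequenceClassification"],
        "vocab_size": 50265,
        "id2label": {"0": "happiness", "1": "sadness"},
    }
    (Path(tmp) / "config.json").write_text(json.dumps(demo_config), encoding="utf-8")
    summary = summarize_local_config(tmp)

print(summary)
```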

audiovisually.train.retrain_existing_model(model_path, train_df, text_column='Sentence', label_column='Label', output_dir='./retrained_model', epochs=10, batch_size=8, learning_rate=2e-05, eval_split=0.1, label_list=None, patience=3, validation_df=None, validation_text_column=None, validation_label_column=None, **kwargs)

Fine-tune an existing transformer model on new data, with early stopping and best model saving.

Parameters:
  • model_path (str) – Path to the pre-trained model directory.

  • train_df (pd.DataFrame) – DataFrame with text and labels.

  • text_column (str) – Column with input text.

  • label_column (str) – Column with target labels.

  • output_dir (str) – Directory to save the retrained model.

  • epochs (int) – Number of epochs.

  • batch_size (int) – Batch size.

  • learning_rate (float) – Learning rate.

  • eval_split (float) – Fraction of train_df held out for validation.

  • label_list (list) – Optional list of label names.

  • patience (int) – Early stopping patience.

  • validation_df (pd.DataFrame) – Optional DataFrame for validation data.

  • validation_text_column (str) – Optional column name for validation text.

  • validation_label_column (str) – Optional column name for validation labels.

  • **kwargs – Additional keyword arguments passed to TrainingArguments.

Returns:

Hugging Face Trainer object (already trained).

Return type:

Trainer

Example

>>> from audiovisually.train import retrain_existing_model
>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'Label': ['happiness', 'sadness']})
>>> trainer = retrain_existing_model("./my_model", df, output_dir="./my_model_retrained")
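
The patience parameter follows the usual early-stopping convention: training ends once the monitored validation metric has failed to improve for patience consecutive evaluations. The sketch below illustrates that rule in plain Python. It assumes one evaluation per epoch and validation loss as the monitored metric, neither of which is stated above; early_stopping_epoch and the loss values are hypothetical and not part of audiovisually.

```python
def early_stopping_epoch(eval_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Hypothetical helper: assumes one evaluation per epoch and that a
    lower validation loss counts as an improvement.
    """
    best_loss = float("inf")
    evaluations_without_improvement = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_loss = loss
            evaluations_without_improvement = 0
        else:
            evaluations_without_improvement += 1
            if evaluations_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses for seven epochs.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
stop_epoch = early_stopping_epoch(losses, patience=3)

# The "best model" is the one from the epoch with the lowest loss seen so far.
seen = losses[:stop_epoch]
best_epoch = seen.index(min(seen)) + 1

print(stop_epoch, best_epoch)
```

In this made-up run, the loss last improves at epoch 3, so with patience=3 training stops after epoch 6 and the epoch-3 weights are the ones worth keeping. The later improvement at epoch 7 is never reached, which is the trade-off a small patience value makes.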

audiovisually.train.train_new_transformer_model(train_df, model_name='distilroberta-base', text_column='Sentence', label_column='Label', num_labels=7, output_dir='./new_model', epochs=10, batch_size=8, learning_rate=2e-05, eval_split=0.1, label_list=None, patience=3, validation_df=None, validation_text_column=None, validation_label_column=None, **kwargs)

Train a transformer model from pretrained weights with a new classification head, using early stopping and saving the best model.

Parameters:
  • train_df (pd.DataFrame) – DataFrame with text and labels.

  • model_name (str) – Hugging Face model name (default "distilroberta-base").

  • text_column (str) – Column with input text.

  • label_column (str) – Column with target labels.

  • num_labels (int) – Number of classes.

  • output_dir (str) – Directory to save the model.

  • epochs (int) – Number of epochs.

  • batch_size (int) – Batch size.

  • learning_rate (float) – Learning rate.

  • eval_split (float) – Fraction of train_df held out for validation.

  • label_list (list) – Optional list of label names.

  • patience (int) – Early stopping patience.

  • validation_df (pd.DataFrame) – Optional DataFrame for validation data.

  • validation_text_column (str) – Optional column name for validation text.

  • validation_label_column (str) – Optional column name for validation labels.

  • **kwargs – Additional keyword arguments passed to TrainingArguments.

Returns:

Hugging Face Trainer object (already trained).

Return type:

Trainer

Example

>>> from audiovisually.train import train_new_transformer_model
>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'Label': ['happiness', 'sadness']})
>>> trainer = train_new_transformer_model(df, model_name="distilroberta-base", output_dir="./my_model")
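
A classification head works on integer class ids, so the string values in label_column have to be mapped to ids at some point, and num_labels has to be at least the number of distinct classes. The sketch below shows one plausible way to build that mapping from an optional label_list. build_label_maps is a hypothetical helper, not part of audiovisually, and the library's own mapping and ordering may differ.

```python
def build_label_maps(labels, label_list=None):
    """Map string labels to integer ids and back.

    Hypothetical helper: uses label_list order when given, otherwise
    the sorted set of labels found in the data.
    """
    names = list(label_list) if label_list is not None else sorted(set(labels))
    unknown = set(labels) - set(names)
    if unknown:
        raise ValueError(f"labels missing from label_list: {sorted(unknown)}")
    label2id = {name: index for index, name in enumerate(names)}
    id2label = {index: name for name, index in label2id.items()}
    return label2id, id2label


# Made-up label column with two distinct classes.
labels = ["happiness", "sadness", "happiness"]
label2id, id2label = build_label_maps(labels)

encoded = [label2id[name] for name in labels]
num_labels = len(label2id)

print(label2id, encoded, num_labels)
```

With the two-class frame from the example above, this mapping yields two classes, while the documented default is num_labels=7. Whether the function reconciles the two automatically is not stated here, so passing num_labels (or a full label_list) explicitly is the safer choice.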