Train Module

audiovisually.train.get_model_info(model_path_or_name)

Print and return key information about a transformer model and its tokenizer.

Parameters:

model_path_or_name (str) – Hugging Face model name or local path.

Returns:

Model and tokenizer information.

Return type:

dict

Example

>>> from audiovisually.train import get_model_info
>>> info = get_model_info("distilroberta-base")

audiovisually.train.retrain_existing_model(model_path, train_df, text_column='Sentence', label_column='Label', output_dir='./retrained_model', epochs=10, batch_size=8, learning_rate=2e-05, eval_split=0.1, label_list=None, patience=3, validation_df=None, validation_text_column=None, validation_label_column=None, **kwargs)

Fine-tune an existing transformer model on new data, with early stopping and best model saving.

Parameters:

model_path (str) – Path to the pre-trained model directory.
train_df (pd.DataFrame) – DataFrame with text and labels.
text_column (str) – Column with input text.
label_column (str) – Column with target labels.
output_dir (str) – Directory to save the retrained model.
epochs (int) – Number of epochs.
batch_size (int) – Batch size.
learning_rate (float) – Learning rate.
eval_split (float) – Fraction of the training data held out for validation.
label_list (list) – Optional list of label names.
patience (int) – Early stopping patience.
validation_df (pd.DataFrame) – Optional DataFrame for validation data.
validation_text_column (str) – Optional column name for validation text.
validation_label_column (str) – Optional column name for validation labels.
**kwargs – Additional keyword arguments forwarded to TrainingArguments.
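
To make eval_split concrete, the sketch below holds out a fraction of rows for validation using only the standard library. It is an illustration under assumptions, not the library's code: the helper name split_rows, the shuffling, and the fixed seed are hypothetical, and how audiovisually actually performs the split (for example, whether it stratifies by label) is not stated in this section.

```python
import random

def split_rows(rows, eval_split=0.1, seed=42):
    """Hypothetical helper: hold out a fraction of rows for validation."""
    if not 0.0 < eval_split < 1.0:
        raise ValueError("eval_split must be between 0 and 1 (exclusive)")
    shuffled = list(rows)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle for reproducibility
    n_eval = max(1, round(len(shuffled) * eval_split))  # keep at least one eval row
    return shuffled[n_eval:], shuffled[:n_eval]

# Made-up rows using the default column names from the signature above.
rows = [{"Sentence": f"text {i}", "Label": "happiness"} for i in range(20)]
train_rows, eval_rows = split_rows(rows, eval_split=0.1)
print(len(train_rows), len(eval_rows))
```

With 20 rows and eval_split=0.1, 2 rows are held out and 18 remain for training.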

Returns:

Hugging Face Trainer object (already trained).

Return type:

Trainer

Example

>>> import pandas as pd
>>> from audiovisually.train import retrain_existing_model
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'Label': ['happiness', 'sadness']})
>>> trainer = retrain_existing_model("./my_model", df, output_dir="./my_model_retrained")
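
The patience parameter can be read as follows: training stops once the validation metric has failed to improve for patience consecutive evaluations, and the best checkpoint seen so far is the one kept. The sketch below simulates that rule on a list of per-epoch validation losses. It is a simplified, assumed model of the behaviour: the function name simulate_early_stopping and the loss values are made up for illustration, and the metric and callback the library actually uses are not shown in this section.

```python
def simulate_early_stopping(eval_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Hypothetical helper for illustration; epochs are numbered from 1.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_evals = 0  # consecutive evaluations without improvement
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # new best: this checkpoint would be saved
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_epoch, epoch         # patience exhausted: stop early
    return best_epoch, len(eval_losses)          # ran through every epoch

# Made-up validation losses for ten epochs.
losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.64, 0.60, 0.59, 0.58]
best_epoch, stop_epoch = simulate_early_stopping(losses, patience=3)
print(best_epoch, stop_epoch)
```

With these losses and patience=3, the simulation stops at epoch 6 and keeps the epoch-3 checkpoint, even though later epochs would have improved. A larger patience trades longer training for a lower chance of stopping on a temporary plateau.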

audiovisually.train.train_new_transformer_model(train_df, model_name='distilroberta-base', text_column='Sentence', label_column='Label', num_labels=7, output_dir='./new_model', epochs=10, batch_size=8, learning_rate=2e-05, eval_split=0.1, label_list=None, patience=3, validation_df=None, validation_text_column=None, validation_label_column=None, **kwargs)

Train a transformer model from pretrained weights with a new classification head, using early stopping and saving the best model.

Parameters:

train_df (pd.DataFrame) – DataFrame with text and labels.
model_name (str) – Hugging Face model name (default "distilroberta-base").
text_column (str) – Column with input text.
label_column (str) – Column with target labels.
num_labels (int) – Number of classes.
output_dir (str) – Directory to save the model.
epochs (int) – Number of epochs.
batch_size (int) – Batch size.
learning_rate (float) – Learning rate.
eval_split (float) – Fraction of the training data held out for validation.
label_list (list) – Optional list of label names.
patience (int) – Early stopping patience.
validation_df (pd.DataFrame) – Optional DataFrame for validation data.
validation_text_column (str) – Optional column name for validation text.
validation_label_column (str) – Optional column name for validation labels.
**kwargs – Additional keyword arguments forwarded to TrainingArguments.

Returns:

Hugging Face Trainer object (already trained).

Return type:

Trainer

Example

>>> import pandas as pd
>>> from audiovisually.train import train_new_transformer_model
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'Label': ['happiness', 'sadness']})
>>> trainer = train_new_transformer_model(df, model_name="distilroberta-base", output_dir="./my_model")
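
Because the new classification head has num_labels outputs, the string labels in label_column have to be mapped to integer ids at some point. The sketch below shows one way such a mapping could be built. It assumes that label_list, when given, fixes the id order and that the sorted unique labels are used otherwise; the helper build_label_maps is hypothetical, and the library's actual handling is not documented in this section.

```python
def build_label_maps(labels, label_list=None):
    """Hypothetical helper: map string labels to integer ids and back."""
    names = list(label_list) if label_list is not None else sorted(set(labels))
    unknown = set(labels) - set(names)
    if unknown:
        raise ValueError(f"labels missing from label_list: {sorted(unknown)}")
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label

# Made-up labels in the style of the example above.
labels = ["happiness", "sadness", "happiness"]
label2id, id2label = build_label_maps(labels)
encoded = [label2id[label] for label in labels]  # integer targets for the classifier
num_labels = len(label2id)                       # size the classification head would need
print(label2id, encoded, num_labels)
```

For a two-class dataset like the one in the example, this yields two labels, while the signature's default is num_labels=7, so it may be worth passing num_labels explicitly if the library does not infer it from the data.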