Train Module
- audiovisually.train.get_model_info(model_path_or_name)
Print and return key information about a transformer model and its tokenizer.
- Parameters:
model_path_or_name (str) – Hugging Face model name or local path.
- Returns:
Model and tokenizer information.
- Return type:
dict
Example
>>> from audiovisually.train import get_model_info
>>> info = get_model_info("distilroberta-base")
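The exact keys of the returned dict aren't specified above, so it is safest to iterate whatever keys it contains rather than hard-code them. A minimal sketch (the `sample` dict below is illustrative only, not the real return value of `get_model_info`):

```python
def summarize(info: dict) -> str:
    """Render a model-info dict as 'key: value' lines, whatever the keys are."""
    return "\n".join(f"{k}: {v}" for k, v in sorted(info.items()))

# illustrative stand-in; in practice the dict comes from get_model_info(...)
sample = {"model_type": "roberta", "vocab_size": 50265}
print(summarize(sample))
```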
- audiovisually.train.retrain_existing_model(model_path, train_df, text_column='Sentence', label_column='Label', output_dir='./retrained_model', epochs=10, batch_size=8, learning_rate=2e-05, eval_split=0.1, label_list=None, patience=3, validation_df=None, validation_text_column=None, validation_label_column=None, **kwargs)
Fine-tune an existing transformer model on new data, with early stopping and best model saving.
- Parameters:
model_path (str) – Path to the pre-trained model directory.
train_df (pd.DataFrame) – DataFrame with text and labels.
text_column (str) – Column with input text.
label_column (str) – Column with target labels.
output_dir (str) – Directory to save the retrained model.
epochs (int) – Number of epochs.
batch_size (int) – Batch size.
learning_rate (float) – Learning rate.
eval_split (float) – Fraction for validation split.
label_list (list) – Optional list of label names.
patience (int) – Early stopping patience.
validation_df (pd.DataFrame) – Optional DataFrame for validation data.
validation_text_column (str) – Optional column name for validation text.
validation_label_column (str) – Optional column name for validation labels.
**kwargs – Additional TrainingArguments.
- Returns:
Hugging Face Trainer object (already trained).
- Return type:
Trainer
Example
>>> import pandas as pd
>>> from audiovisually.train import retrain_existing_model
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'Label': ['happiness', 'sadness']})
>>> trainer = retrain_existing_model("./my_model", df, output_dir="./my_model_retrained")
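When no `validation_df` is supplied, `eval_split` carves a validation set out of `train_df`. A minimal sketch of providing an explicit split instead (the data, the 75/25 cut, and the `weight_decay` kwarg are illustrative; extra keyword arguments are forwarded to `TrainingArguments` via `**kwargs`):

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": ["I am happy", "I am sad", "I am angry", "I am calm"],
    "Label": ["happiness", "sadness", "anger", "neutral"],
})

# hold out the last 25% of rows as an explicit validation set
cut = int(len(df) * 0.75)
train_df, validation_df = df.iloc[:cut], df.iloc[cut:]
print(len(train_df), len(validation_df))

# trainer = retrain_existing_model(
#     "./my_model", train_df,
#     validation_df=validation_df,
#     validation_text_column="Sentence",
#     validation_label_column="Label",
#     weight_decay=0.01,  # forwarded to TrainingArguments via **kwargs
# )
```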
- audiovisually.train.train_new_transformer_model(train_df, model_name='distilroberta-base', text_column='Sentence', label_column='Label', num_labels=7, output_dir='./new_model', epochs=10, batch_size=8, learning_rate=2e-05, eval_split=0.1, label_list=None, patience=3, validation_df=None, validation_text_column=None, validation_label_column=None, **kwargs)
Train a transformer model from pretrained weights with a new classification head, using early stopping and saving the best model.
- Parameters:
train_df (pd.DataFrame) – DataFrame with text and labels.
model_name (str) – Hugging Face model name (default “distilroberta-base”).
text_column (str) – Column with input text.
label_column (str) – Column with target labels.
num_labels (int) – Number of classes.
output_dir (str) – Directory to save the model.
epochs (int) – Number of epochs.
batch_size (int) – Batch size.
learning_rate (float) – Learning rate.
eval_split (float) – Fraction for validation split.
label_list (list) – Optional list of label names.
patience (int) – Early stopping patience.
validation_df (pd.DataFrame) – Optional DataFrame for validation data.
validation_text_column (str) – Optional column name for validation text.
validation_label_column (str) – Optional column name for validation labels.
**kwargs – Additional TrainingArguments.
- Returns:
Hugging Face Trainer object (already trained).
- Return type:
Trainer
Example
>>> import pandas as pd
>>> from audiovisually.train import train_new_transformer_model
>>> df = pd.DataFrame({'Sentence': ['I am happy', 'I am sad'], 'Label': ['happiness', 'sadness']})
>>> trainer = train_new_transformer_model(df, model_name="distilroberta-base", output_dir="./my_model")
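How `label_list` relates to the integer class ids isn't spelled out above. A common convention (assumed here, not confirmed by this reference) is that a label's position in `label_list` becomes its id, which keeps the mapping stable across training runs; with the default `num_labels=7`, a seven-emotion list might look like this:

```python
# hypothetical seven-class emotion label list matching num_labels=7
label_list = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

# assumed convention: index in label_list becomes the integer class id
label2id = {label: i for i, label in enumerate(label_list)}
id2label = {i: label for label, i in label2id.items()}

print(label2id["happiness"])  # → 3
```

Passing the same `label_list` to `train_new_transformer_model` and later to `retrain_existing_model` would then keep predictions comparable between the two models.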