Fine-Tuning GPT-3.5: A Step-by-Step Guide

Nov 13, 2023



Fine-tuning a pre-trained model such as GPT-3.5 is a common way to get more specialized performance out of it. This guide walks through the fine-tuning process for GPT-3.5, explains its benefits, and provides a step-by-step tutorial with code.

Why Fine-Tune GPT-3.5?

Fine-tuning GPT-3.5 has several advantages:

  1. Improved Quality: Fine-tuning can produce higher-quality results on your task than using the base model with generic prompts.

  2. Customization: Fine-tuning allows the model to adapt to specific use cases or domains, which might not be effectively covered in the standard model.

  3. Efficiency: It can reduce the need for long, complex prompts by embedding domain knowledge directly into the model.

Preparing for Fine-Tuning

Before starting the fine-tuning process, it's crucial to prepare your dataset. This dataset should consist of examples relevant to the specific task or domain you're targeting.

A Step-by-Step Guide to Fine-Tuning GPT-3.5

Install OpenAI Library:

Begin by installing the OpenAI library in your Python environment. The leading ! runs the command from a notebook cell; omit it when working in a terminal.

!pip install -U openai

Prepare Your Dataset:

Load your dataset. This example uses a dataset of customer support queries, with columns for the query text, a top category, and a sub category:

import pandas as pd
df = pd.read_csv("your_dataset.csv")
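
Before formatting, it can help to confirm that the file actually has the columns the later steps read. The sketch below uses only the standard library. The column names `Support Query`, `Top Category`, and `Sub Category` come from the formatting code in this guide; the helper name `find_dataset_problems` is hypothetical, and the reported line numbers assume one record per line.

```python
import csv

# Columns read by the formatting step in this guide.
REQUIRED_COLUMNS = ("Support Query", "Top Category", "Sub Category")

def find_dataset_problems(csv_path, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in the CSV (empty if none)."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in required if name not in header]
        if missing:
            # Without the expected columns there is nothing more to check.
            return [f"missing column: {name}" for name in missing]
        # Data rows start on line 2 because line 1 is the header.
        for line_number, row in enumerate(reader, start=2):
            for name in required:
                if not (row[name] or "").strip():
                    problems.append(f"line {line_number}: empty '{name}'")
    return problems
```

Calling `find_dataset_problems("your_dataset.csv")` and getting an empty list back suggests the file is safe to pass to the formatting step.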

Format the Data:

Convert your data into a format suitable for GPT-3.5. This involves structuring your examples as a series of messages, emulating a conversation.

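import json

def convert_to_gpt35_format(dataset):
    """Turn each row into one chat-format training example."""
    fine_tuning_data = []
    for _, row in dataset.iterrows():
        # The target output is a JSON string holding both category labels.
        json_response = json.dumps({"Top Category": row['Top Category'], "Sub Category": row['Sub Category']})
        fine_tuning_data.append({
            "messages": [
                {"role": "user", "content": row['Support Query']},
                # The reply the model should learn must use the "assistant" role.
                {"role": "assistant", "content": json_response}
            ]
        })
    return fine_tuning_data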
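
A quick structural check on the formatted examples can catch problems before anything is uploaded. The following is a minimal sketch, not an official validator: `validate_example` is a hypothetical helper, and `ALLOWED_ROLES` is an assumed set limited to the basic chat roles. OpenAI's chat fine-tuning format expects the reply the model should learn to be in a message with the `assistant` role, so the sketch flags examples that have none. The sample categories in the usage lines are made up for illustration.

```python
import json

# Assumed set of roles for a plain chat dataset (no tool or function messages).
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_example(example):
    """Return a list of problems with one chat-format training example."""
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    has_assistant = False
    for position, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {position}: must be a dict")
            continue
        role = message.get("role")
        content = message.get("content")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {position}: unexpected role {role!r}")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {position}: content must be a non-empty string")
        if role == "assistant":
            has_assistant = True
    if not has_assistant:
        problems.append("no assistant message, so there is no target reply to learn")
    return problems

# Usage with a made-up example record:
example = {
    "messages": [
        {"role": "user", "content": "Where is my order?"},
        {"role": "assistant", "content": json.dumps({"Top Category": "Shipping", "Sub Category": "Tracking"})},
    ]
}
print(validate_example(example))  # prints []
```

Running the check over every item returned by `convert_to_gpt35_format` and stopping on the first non-empty result is usually enough for a small dataset.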

Creating Training and Validation Sets

After formatting the data, the next step is to split it into training and validation sets. This is crucial for training the model on a subset of data and then validating its performance on a different subset.

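from sklearn.model_selection import train_test_split

# Format every row first so each split already holds chat-format examples
converted_data = convert_to_gpt35_format(df)

# Stratified splitting. Assuming 'Top Category' can be used for stratification
train_data, val_data = train_test_split(
    converted_data,
    test_size=0.2,  # assumed 80/20 split; adjust as needed
    stratify=df['Top Category'],
    random_state=42  # for reproducibility
)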
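
To see whether stratification did its job, one option is to compare the share of each top category in the two splits. This is a sketch under the assumption that each formatted example stores its labels as a JSON string in the last message, as the formatting step in this guide does; `top_category_counts` and `compare_splits` are hypothetical helper names.

```python
import json
from collections import Counter

def top_category_counts(examples, key="Top Category"):
    """Count label values found in the last message of each formatted example."""
    counts = Counter()
    for example in examples:
        reply = example["messages"][-1]["content"]  # the JSON label string
        counts[json.loads(reply)[key]] += 1
    return counts

def compare_splits(train_examples, val_examples, key="Top Category"):
    """Return {label: (train_share, val_share)} so the two splits can be compared."""
    train_counts = top_category_counts(train_examples, key)
    val_counts = top_category_counts(val_examples, key)
    # Guard against division by zero when a split is empty.
    train_total = sum(train_counts.values()) or 1
    val_total = sum(val_counts.values()) or 1
    labels = sorted(set(train_counts) | set(val_counts))
    return {
        label: (train_counts[label] / train_total, val_counts[label] / val_total)
        for label in labels
    }
```

With a stratified split, the two shares reported by `compare_splits(train_data, val_data)` should be close for every label; a large gap suggests the split was not stratified on that column.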

Creating JSONL Files

Fine-tuning with OpenAI requires the data to be in JSONL format, meaning one complete JSON object per line. The code below writes the training and validation sets in this format and saves them as files.

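import json

def write_to_jsonl(data, file_path):
    """Write one JSON object per line, as the JSONL format requires."""
    with open(file_path, 'w') as file:
        for entry in data:
            json.dump(entry, file)
            file.write('\n')  # newline separates records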

training_file_name = "train.jsonl"
validation_file_name = "val.jsonl"

write_to_jsonl(train_data, training_file_name)
write_to_jsonl(val_data, validation_file_name)
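
Reading the files back is a cheap way to confirm they really are JSONL, with one parseable JSON object per line, before uploading them. The sketch below is one way to do that; `read_jsonl` is a hypothetical helper, not part of any library.

```python
import json

def read_jsonl(file_path):
    """Load a JSONL file, raising ValueError with the line number on bad input."""
    records = []
    with open(file_path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as error:
                raise ValueError(f"{file_path}, line {line_number}: {error}") from error
    return records
```

If `len(read_jsonl(training_file_name))` equals `len(train_data)`, every example made it to disk as its own line. A `ValueError` on line 1 typically means several records were written without newlines between them.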

Uploading Data and Starting the Fine-Tuning Job

With the JSONL files ready, you upload them to OpenAI and initiate the fine-tuning process.

from openai import OpenAI
client = OpenAI(api_key="your_open_ai_key")

# Upload Training and Validation Files
training_file = client.files.create(
    file=open(training_file_name, "rb"), purpose="fine-tune"
)
validation_file = client.files.create(
    file=open(validation_file_name, "rb"), purpose="fine-tune"
)

# Create Fine-Tuning Job
suffix_name = "yt_tutorial"
response = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    model="gpt-3.5-turbo",
    suffix=suffix_name,
)
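
A fine-tuning job runs asynchronously, so the model ID is only available once the job has finished. The sketch below polls the job until it reaches a terminal state. It assumes version 1 or later of the OpenAI Python library, where `client.fine_tuning.jobs.retrieve` returns a job object with `status` and `fine_tuned_model` fields; the helper name `wait_for_job`, the 30-second interval, and the poll limit are arbitrary choices for illustration.

```python
import time

# Job statuses after which nothing further will happen.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(client, job_id, poll_seconds=30, max_polls=240):
    """Poll a fine-tuning job until it finishes; return the final job object."""
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing the client and the ID of the job created in the previous step returns the finished job. If its `status` is `succeeded`, its `fine_tuned_model` attribute holds the model ID to use for testing.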

Testing the Fine-Tuned Model

Once fine-tuned, it's essential to test the model's performance. The provided code includes a function to format test queries, a prediction function using the fine-tuned model, and a method to store predictions.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def format_test(row):
    formatted_message = [{"role": "user", "content": row['Support Query']}]
    return formatted_message

def predict(test_messages, fine_tuned_model_id):
    # temperature=0 keeps the output as deterministic as possible
    response = client.chat.completions.create(
        model=fine_tuned_model_id, messages=test_messages, temperature=0, max_tokens=50
    )
    return response.choices[0].message.content

def store_predictions(test_df, fine_tuned_model_id):
    test_df['Prediction'] = None
    for index, row in test_df.iterrows():
        test_message = format_test(row)
        prediction_result = predict(test_message, fine_tuned_model_id)
        # Write the prediction back into the row it belongs to
        test_df.at[index, 'Prediction'] = prediction_result
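
Since the model is trained to reply with a JSON string, scoring it means parsing each stored prediction and comparing one field against the known label. The sketch below does this with the standard library only; `parse_prediction` and `field_accuracy` are hypothetical helpers, and it assumes the test data carries the same `Top Category` and `Sub Category` label columns as the training data. A reply that is not valid JSON is counted as wrong.

```python
import json

def parse_prediction(text):
    """Parse the model's JSON reply; return None if it is not a JSON object."""
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):
        return None
    return parsed if isinstance(parsed, dict) else None

def field_accuracy(predictions, expected_values, key):
    """Share of rows whose parsed prediction matches the expected value for `key`."""
    if len(predictions) != len(expected_values):
        raise ValueError("predictions and expected_values must have the same length")
    if not predictions:
        return 0.0
    correct = 0
    for text, expected in zip(predictions, expected_values):
        parsed = parse_prediction(text)
        if parsed is not None and parsed.get(key) == expected:
            correct += 1
    return correct / len(predictions)
```

For example, `field_accuracy(test_df['Prediction'].tolist(), test_df['Top Category'].tolist(), "Top Category")` gives the top-category accuracy, and the same call with `"Sub Category"` scores the finer-grained label.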



With just 100 examples, the model shows promising results, particularly in identifying top categories. This suggests a practical approach: start with a small dataset, evaluate the results, and add more data progressively to refine the model.


This guide has covered the full process of fine-tuning GPT-3.5: preparing and formatting the data, splitting it, writing JSONL files, launching the job, and testing the result. Fine-tuning lets a general-purpose model adapt to a specific domain and give more relevant responses there.

If you're curious about the latest in AI technology, I invite you to visit my project, AI Demos. It's a rich resource offering a wide array of video demos showcasing the most advanced AI tools. My goal with AI Demos is to educate and illuminate the diverse possibilities of AI.

For even more in-depth exploration, be sure to visit my YouTube channel. There, you'll find a wealth of content that delves into the exciting future of AI and its various applications.
