Upgraded LLM for Better Contextualisation (using Nvidia GPU) Part-2

LLM

Reading Time: 8 minutes

The previous post explained the basics of LLMs and walked you through a very simple implementation, but the final model performed only averagely on the task it was required to perform. So in this part 2 of the LLM series, I am going to introduce you to certain techniques and key ideas that enhance the model and give better results.

Areas of focus in this post:

  1. How to get better results for a specific task from an LLM?
  2. Setting up GPU for faster training.
  3. Cleaning the data
  4. Loading an advanced model and fine-tuning it.

Before starting the project, I looked for the right dataset to feed into the model so that it would be trained specifically on finance-based content. But the purely finance-based dataset I identified on Hugging Face had only about 4,000 rows, so I had to go with a bigger dataset in which only a few rows contextualise finance-based content.

I then decided to fine-tune the trained model on this new dataset, which has comparatively longer, contextually rich rows (it is available on Hugging Face under the name “Atharva07/hc3_finance”). I kept the same configuration (LoRA rank, alpha, training parameters, and even the same tokenising function). The final results were somewhat better than the previous model’s output.

You can follow the same procedure as in the last post on LLMs with the dataset I mentioned, and you will get decent outputs. But this post is not just me telling you to fine-tune over different datasets again and again to represent a domain that follows a specific style of answers or data design.

I would like to take this upgrade a notch higher: the end model should dynamically adapt to different prompt sizes, make logical sense of the given corpus, and then contextualise it.

So, where do we start? The first step, I believe, is to get a decent model with a good number of parameters downloaded onto your system. The second step lies in cleaning the data and building the tokenising function, through which the model is trained to handle the intricacies of each sentence and learn which keywords, and which connections between them, matter.

But before we get into the code and logic, I suggest switching to a computationally powerful machine to run heavy LLMs. If you have a local GPU, you can leverage it by installing the required drivers and libraries in your environment. Otherwise, you can subscribe to a service such as the premium version of Google Colab and use their environment to run powerful models and perform computationally heavy tasks.


Setting up the GPU

The GPU demonstrates significantly enhanced computational capabilities in comparison to the CPU. Comparing them would be like contrasting a motorcycle with a jet aircraft. In essence, a GPU (Graphics Processing Unit) efficiently divides tasks into segments, enabling parallel computation to achieve faster and more effective results.

I have an Nvidia GeForce RTX 3050 Ti laptop GPU running Windows 11. In hindsight it feels easy to get your GPU running, and I'm not trying to scare you, but it was genuinely difficult to set up. After a few Google searches and running some TensorFlow code, if you install the right libraries and print tensorflow.config.list_physical_devices in your IDE, you might get an output like this:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] # GPU is detected
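For reference, here is a minimal snippet that produces the output above (this uses the standard tf.config API; the list will simply be empty if the GPU is not detected):

import tensorflow as tf

# Lists the GPUs TensorFlow can see; an empty list means no GPU was detected.
print(tf.config.list_physical_devices('GPU'))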

Does that mean we can use the GPU for our program? Not yet. One important aspect to consider when running these LLMs from Hugging Face is that most of the models under the transformers library use PyTorch. And it's not as simple as pip install torch. Even if you try that, the IDE will just fill with errors when you call .to('cuda') while loading your model. I know how it feels to walk through almost every resource on the internet just to get the above output: hours of reading documentation, searching Stack Overflow, and asking ChatGPT for help.

So here is what I am going to do: I'll link an amazing YouTube video that will save you a lot of time installing the right drivers and libraries to set up the GPU, and then I'll tell you how to install the right libraries to get things running for our task.

Please be patient and download exactly the versions and application types Jeff uses; for example, installing Python 3.9 in Miniconda is a very important step to run the LLM successfully on an Nvidia GPU. As of now, it won't work in Anaconda or by directly calling any other Python kernel in your IDE.

Once you have completed all the instructions in the video above, you can also use the environment in Visual Studio Code instead of a Jupyter notebook.

As mentioned earlier, PyTorch is very important. There are a lot of versions out there, and even if you navigate to the correct one there might be inconsistencies when downloading. So the correct command to run is (make sure your Python version is between 3.8 and 3.11):

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

After installing it, you have to install the libraries below. I got this list by typing pip list in my Miniconda prompt. (I am including the version of each library so that you don't get any errors.)

First run this command to get the latest version of pip:

pip install --upgrade pip

Now install these libraries with the versions mentioned (for example: pip install package_name==version_number), except the torch-based libraries, which were already installed by the command above.

Package Version

accelerate 0.28.0
datasets 2.18.0
evaluate 0.4.1
keras 2.10.0
nltk 3.8.1
numpy (latest version)
peft 0.9.0
pickleshare 0.7.5
pip 24.0
sentencepiece 0.2.0
tensorflow 2.10.1
torch 2.2.1+cu118
torchaudio 2.2.1+cu118
torchvision 0.17.1+cu118
transformers 4.38.2
wordcloud (latest version)

I have kept only the main libraries you need to install; you will see a bigger list once the installation process is complete, as the dependent libraries will be installed as well.
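As an optional sanity check, you can confirm the installed versions from Python using the standard importlib.metadata module; the package list below is just the subset we rely on in this post:

import importlib.metadata as md

# Print the installed version of each core package we rely on
for pkg in ["torch", "transformers", "peft", "accelerate", "datasets", "sentencepiece"]:
    print(pkg, md.version(pkg))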

Once all the installation is successfully done, run the following code; if it prints your GPU's name as shown below, you are good to go.

import torch
print(torch.cuda.get_device_name(0))

# output: 'NVIDIA GeForce RTX 3050 Ti Laptop GPU' # you'll get your GPU's name

But if any error comes up (it mostly won't if you follow the exact instructions up to this point), do not give up. It took me a week to find the video and install everything properly. I am not saying it will take a week for you, it could take more than that ……..😅 just kidding. If you get an error, keep trying new alternatives and look for solutions on Stack Overflow, ChatGPT and other online programming communities. Or you can put your doubts in the comment section once you have tried all the solutions.
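When debugging, these one-liners (standard PyTorch attributes) help narrow down whether the problem lies in the PyTorch build or in the CUDA setup:

import torch

print(torch.__version__)          # should end with +cu118 for the build installed above
print(torch.version.cuda)         # CUDA version the PyTorch build was compiled against
print(torch.cuda.is_available())  # True means PyTorch can actually reach the GPU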

Before we set out to load and train the model, there is one important step we need to perform on our data, and that is EDA (exploratory data analysis). It is actually quite a big process consisting of multiple sub-routines, but here we'll focus on visualising the data and chopping off the unnecessary data that could degrade the performance of the model.


Cleaning the Data

from datasets import load_dataset

dataset = load_dataset("gbharti/finance-alpaca")

# Drop the unused columns, then swap the remaining two: the long answer text
# becomes the model 'input' and the short question/instruction becomes the
# target 'output' the model learns to generate.
dataset = dataset.remove_columns(['input', 'text'])
dataset = dataset.rename_column('output', 'input').rename_column('instruction', 'output')
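To sanity-check the column swap, you can print the dataset structure and peek at one example (purely illustrative; the slice below just truncates the long text for display):

# Inspect the structure and one example after the column swap
print(dataset)
print(dataset['train'][0]['input'][:200])   # long passage used as model input
print(dataset['train'][0]['output'])        # short target text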
[Histograms of input and output word counts before cleaning]

Above are the histograms of the inputs and outputs, where the x-axis is the number of words in each sentence. Here is the Python code to plot them:

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("darkgrid")

# Word counts of the outputs
avg_op = [len(text.split()) for text in dataset['train']['output']]

# Word counts of the inputs
avg_ip = [len(text.split()) for text in dataset['train']['input']]

# Create subplots for side-by-side histograms
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

# Plot histogram for questions
sns.histplot(avg_ip, bins=20, kde=True, ax=axes[0], color='blue')
axes[0].set_title('Average Length of Inputs before cleaning')
axes[0].set_xlabel('Length')
axes[0].set_ylabel('Frequency')

# Plot histogram for answers
sns.histplot(avg_op, bins=20, kde=True, ax=axes[1], color='green')
axes[1].set_title('Average Length of Outputs before cleaning')
axes[1].set_xlabel('Length')
axes[1].set_ylabel('Frequency')

# Show plot
plt.tight_layout()
plt.show()

As you can see, there are a huge number of outliers in the data: only a few inputs are longer than 500 words, and only a few outputs exceed 20 words. We need to filter them out in order to use a standard max_length padding in our tokenising function, so that the model learns properly (evenly from all examples). If we don't remove them, we would have to set the padding length to around 2,500 for inputs and 80 for outputs, which would badly hurt the quality of the output and make the tokenising function take far too long to process (your kernel could even crash).

# Removing the unwanted data
from datasets import DatasetDict

train_df_filtered = DatasetDict()
# Keep only examples with 15-200-word inputs and 5-17-word outputs
train_df_filtered['train'] = dataset['train'].filter(
    lambda example: 15 <= len(example['input'].split()) <= 200
    and 5 <= len(example['output'].split()) <= 17
)

# Function to clean sentences
def clean_sentences(example):
    example['input'] = example['input'].replace('\n', '').replace(',', ', ').strip()
    example['output'] = example['output'].replace('\n', '').replace(',', ', ').strip()
    return example

# Clean the train split
cleaned_train = train_df_filtered['train'].map(clean_sentences)

# Updated dataset
updated_dataset_dict = DatasetDict({'train': cleaned_train})
train_df_filtered = cleaned_train

Run the same plotting code as above with train_df_filtered to get the histograms of the cleaned data.

[Histograms of input and output word counts after cleaning]

Now that we have restricted the size of the inputs and outputs, the learning process during training becomes much more efficient. When tokenising all the rows in our cleaned dataset, the inputs will be padded to a length of 200 and the outputs to 17. This basically means that if an input has 130 words, roughly 70 extra pad tokens will be appended to the converted tensor so the trainer sees uniformly shaped examples.
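Here is a minimal sketch of such a tokenising function, assuming the FLAN-T5 tokenizer that we load in the next section. Note that the tokenizer counts subword tokens rather than words, so in practice the max_length values may need to be a little higher than the word counts used for filtering:

# Sketch of the tokenising step described above (names are illustrative)
def tokenize_function(example):
    model_inputs = tokenizer(
        example['input'],
        max_length=200,        # inputs padded/truncated to a fixed length
        padding='max_length',
        truncation=True,
    )
    labels = tokenizer(
        example['output'],
        max_length=17,         # outputs padded/truncated to a fixed length
        padding='max_length',
        truncation=True,
    )
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

# tokenized_dataset = train_df_filtered.map(tokenize_function, batched=True)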


Crashing your GPU 💥

Your GPU can run out of memory and crash when you run highly intensive tokenising functions or train with a high batch size.

So before setting up the model and training, you need to know how to force-stop your GPU activity and start again if the memory is fully utilised. One option is to restart the Miniconda terminal; another is to open the command prompt on your computer by typing cmd in the search bar.

Once the prompt is open, type the following to view all the processes that are running on the GPU:

nvidia-smi

Copy the PIDs (process IDs) of Python and of Visual Studio Code (if you have changed the Visual Studio Code settings to use the GPU for graphics), and terminate each process separately using:

taskkill /pid <your process id> /f

The /f flag stands for force-quit: it forcefully terminates the process.
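Alternatively, if your Python kernel is still responsive, you can often free GPU memory from within Python before resorting to killing the process. A minimal sketch using standard PyTorch calls (assuming model is the large object occupying the GPU):

import gc
import torch

# Drop references to the large objects, then release PyTorch's cached GPU memory
del model
gc.collect()
torch.cuda.empty_cache()

# Check how much GPU memory is still allocated / reserved by PyTorch
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved")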

Now that you have installed everything necessary, let's load the model and try a simple example to see how it performs on the given data.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xl"

# Load the model & tokenizer and move the model to the GPU
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to('cuda')
tokenizer = AutoTokenizer.from_pretrained(model_name)

I have set the datatype to bfloat16 purely as a precision reduction (in the spirit of quantisation) to save memory on my GPU for further processing and training.
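To see roughly how much GPU memory those bfloat16 weights occupy, here is a quick sketch using plain PyTorch (number of parameters times bytes per element, 2 bytes for bfloat16):

# Approximate size of the model weights resident in memory
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"Model weights: {param_bytes / 1024**3:.2f} GiB")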

The task now is to summarise this given baseline content.

baseline_context = """The market is currently navigating through a phase of uncertainty.
Despite a persistent dominance by growth trades, overall economic growth expectations remain subdued.
The forthcoming economic data, particularly jobless claims and nonfarm payroll figures, are anticipated to be critical in shaping market sentiments."""

Code to run the model for summarisation:

prompt = f"""summarise this content:

{baseline_context}

\n Answer:
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')

original_model_outputs = model.generate(input_ids=input_ids)
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)


print(f'ORIGINAL MODEL:\n{original_model_text_output}')

This is the output produced:

ORIGINAL MODEL:
The market is currently navigating through a phase of uncertainty. Despite a persistent domin
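Part of the abrupt cut-off happens because generate() uses a short default maximum length when none is given; you can lift that cap with max_new_tokens, a standard generate() argument, although a longer generation alone does not turn the untuned model into a good summariser, which is exactly why we fine-tune next:

# Allow a longer generation by lifting the default length cap
longer_outputs = model.generate(input_ids=input_ids, max_new_tokens=60)
print(tokenizer.decode(longer_outputs[0], skip_special_tokens=True))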

As you can see, the result is very poor, and the model needs fine-tuning to work for our task of contextualisation. Check out my post "4 simple steps to setup LLM on your machine and make it work for your use case!!!" to fine-tune the LLM we loaded on the GPU, using the cleaned data, and achieve good results.

That's it folks, thank you for reading all along. It's really great that you have managed to get all the configurations correct and finally load the model without crashing your GPU. I'm really eager to hear your thoughts, questions, and your experience of leveraging the GPU. Drop your comments below, and do subscribe to Sapiencespace. Stay curious, stay engaged, and unlock a world of continuous insights by enabling notifications.

Title image and cover picture credits – unsplash content creators

