Introduction

Hi! In this (my first ever 😅) blog post I’m going to show you how to run Cabuxa-7B. I recommend you take a look at the working notes (see the references below). But if you want a very quick intro, let me tell you that Cabuxa-7B is a LoRA instruction-tuned LLaMA-7B model for Galician that can answer instructions in the Alpaca format. Galician is my native language and, like many other low-resource languages in the world, it is underrepresented in the current (impressive) landscape of AI technologies. So this is a humble attempt to contribute to the inclusion of all linguistic communities in the development of Large Language Models (LLMs).

Setting Up the Environment

Before jumping into the demonstration, ensure you have the required dependencies installed by executing the following command:

pip install transformers==4.34.0 peft==0.6.0 sentencepiece==0.1.99 accelerate
(accelerate is needed because we will load the model with a device_map below.)

Now, let’s import the necessary components:

import torch

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer, GenerationConfig

Loading the Model and Tokenizer

Low-Rank Adaptation (LoRA) is a technique that makes fine-tuning LLMs far more efficient: the pretrained weights are frozen, and only a pair of small low-rank matrices injected into each adapted layer is trained, which drastically reduces the number of trainable parameters and the memory needed for fine-tuning. In this way, Cabuxa-7B works as an adapter on top of LLaMA-7B (its base model).
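To make that concrete, here is a minimal sketch of what a single LoRA layer computes (toy shapes, not Cabuxa’s actual configuration), reusing the torch module we just imported:

d, r = 4096, 8                    # hidden size and LoRA rank (r << d)
W = torch.randn(d, d)             # frozen pretrained weight
A = torch.randn(r, d) * 0.01      # small trainable down-projection
B = torch.zeros(d, r)             # trainable up-projection, initialized to zero
alpha = 16                        # LoRA scaling hyperparameter

x = torch.randn(1, d)             # a dummy activation
y = x @ W.T + (alpha / r) * (x @ A.T @ B.T)   # frozen path + low-rank update

Training updates only A and B: 2 × 4096 × 8 = 65,536 parameters for this layer, instead of the 4096 × 4096 ≈ 16.8 million in W.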

About 7 billion parameters × 4 bytes ≈ 28 GB of GPU memory are required to run the 7B model in full (fp32) precision. To halve that to roughly 14 GB, which fits in a T4’s 16 GB (see the link to the Google Colab below), we set torch_dtype=torch.float16.
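A quick back-of-the-envelope check of that arithmetic:

params = 7e9                                # LLaMA-7B parameter count
print(f"fp32: {params * 4 / 1e9:.0f} GB")   # 28 GB
print(f"fp16: {params * 2 / 1e9:.0f} GB")   # 14 GB

With that in mind, we load the base model and attach the adapter: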

config = PeftConfig.from_pretrained("irlab-udc/cabuxa-7b")  # adapter config (stores the base model name)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", device_map="cuda", torch_dtype=torch.float16  # base model in fp16
)
model = PeftModel.from_pretrained(model, "irlab-udc/cabuxa-7b")  # attach the LoRA adapter
tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")  # tokenizer comes from the base model
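Note that the adapter weights are applied on the fly at every forward pass. If you prefer, PEFT can fold them into the base weights with merge_and_unload; this is entirely optional, and the demo below works either way:

model = model.merge_and_unload()  # optional: merge the LoRA update into the base weights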

Prompt Generation and Evaluation

We define two functions: generate_prompt, which builds an Alpaca-style prompt with or without an additional input, and evaluate, which takes an instruction and an optional input, builds the corresponding prompt, runs generation, and prints the model’s response.

def generate_prompt(instruction, input=None):
    # Alpaca-style template in Galician. Roughly: "Below is an instruction
    # that describes a task[, paired with an input that provides further
    # context]. Write a response that appropriately addresses the input."
    # Triple-quoted strings keep leading whitespace, so the template is
    # deliberately left-aligned.
    if input:
        return f"""Abaixo está unha instrución que describe unha tarefa, xunto cunha entrada que proporciona máis contexto.
Escribe unha resposta que responda adecuadamente a entrada.
### Instrución:
{instruction}
### Entrada:
{input}
### Resposta:"""
    else:
        return f"""Abaixo está unha instrución que describe unha tarefa.
Escribe unha resposta que responda adecuadamente a entrada.
### Instrución:
{instruction}
### Resposta:"""


def evaluate(instruction, input=None):
    prompt = generate_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()  # move the prompt to the GPU
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=GenerationConfig(do_sample=True),
        return_dict_in_generate=True,  # needed to access .sequences below
        output_scores=True,  # also return per-token scores (not used here)
        max_new_tokens=256,
    )
    for s in generation_output.sequences:
        output = tokenizer.decode(s, skip_special_tokens=True)
        # The decoded sequence contains the prompt; keep only the answer
        print("Resposta:", output.split("### Resposta:")[1].strip())

Example Usages

And finally, let’s see Cabuxa-7B in action with a couple of examples. The outputs are not perfect, but there is plenty of room for improvement, since the instruction-tuning data was very limited.

evaluate("Cal é a fórmula química da auga?")
>>> Resposta: A fórmula química da auga é H2O.
evaluate(
    "Convence ao lector por que é importante un determinado tema.",
    "Por que é esencial priorizar o sono?",
)
>>> Resposta: Comezar co dormir corrixidamente asegura unha boa almacenaxe de enerxía, mellora o proceso de aprendizaxe e aumenta a lonxitude de sangra. Desgraciadamente, a maioría das persoas non prioriza o seu sono e, consecuentemente, eles non disfrutan o maior beneficio potencial. Sería importante xestionar un haupaxeo de tiempo para dormir para aumentar a xestion de enerxía e a eficiencia, aforrar tempo para aprender e disfrutar do son, así como aforrar eficiencia física e mental. Devolverá resultados positivos ao teu organismo e, en consecuencia, ao teu benestar!

Try It Yourself!

For an interactive experience, you can access the Google Colab notebook here. Remember to set GPU as the hardware accelerator in the runtime options.

Feel free to explore and experiment with Cabuxa-7B, and don’t hesitate to provide feedback or suggestions for improvement. See you!