Mastering LlamaIndex : Create, Save & Load Indexes, Customize LLMs, Prompts & Embeddings


If you're venturing into the domain of natural language processing, you're likely to come across an abundance of tools and libraries designed to help you understand and generate human-like text. One such toolkit is LlamaIndex, a robust indexing tool that connects large language models (LLMs) with your external data. In this blog post, we'll explore LlamaIndex in depth, discussing how to create and query an index, save and load an index, and customize the LLM, prompt, and embeddings.


Before we begin, ensure that you have installed the necessary Python packages. We use LlamaIndex, PyPDF to handle PDF files, and Sentence Transformers to create embeddings. You can install these packages by running the following command:

!pip install llama-index pypdf sentence_transformers -q

Next, you will need an OpenAI API key to access their GPT models. Be sure to replace the empty strings with your OpenAI key:

import os
import openai
openai.api_key = ""  # Replace with your OpenAI API key
os.environ["OPENAI_API_KEY"] = ""  # Replace with your OpenAI API key

Creating and Querying an Index

With the prerequisites out of the way, let's dive into LlamaIndex. First, we'll create an index using a document set and then query it. In this example, we assume that we have a directory called 'book' containing our documents.

from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader('book').load_data()

# Create an index from the documents
index = VectorStoreIndex.from_documents(documents)

# Create a query engine from the index
query_engine = index.as_query_engine()

# Query the engine
response = query_engine.query("What is this text about?")

The VectorStoreIndex.from_documents() function takes our loaded documents and creates an index. We then create a query engine from this index using the as_query_engine() function. The query engine allows us to ask questions about our indexed documents and get responses based on the content of the documents.
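To build intuition for what the query engine is doing, here's a toy sketch of similarity-based retrieval. This is an illustration only, not LlamaIndex's actual implementation: it stands in bag-of-words counts for a real embedding model, then picks the chunk whose vector is most similar to the query's.

```python
import math
from collections import Counter

def embed(text):
    # Hypothetical stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Naval talks about wealth and happiness.",
    "The index stores embeddings of document chunks.",
]
vectors = [embed(c) for c in chunks]

query = "what does the index store"
best = max(range(len(chunks)), key=lambda i: cosine(embed(query), vectors[i]))
print(chunks[best])  # retrieves the chunk most similar to the query
```

In the real pipeline, the retrieved chunks are then passed to the LLM as context so it can answer grounded in your documents.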

Saving and Loading an Index

LlamaIndex allows you to save an index for later use. This is particularly helpful when dealing with large document sets where creating an index can take considerable time. Let's see how to save and load an index:

# Persist index to disk
index.storage_context.persist(persist_dir="naval_index")

from llama_index import StorageContext, load_index_from_storage

# Rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="naval_index")

# Load index from the storage context
new_index = load_index_from_storage(storage_context)

new_query_engine = new_index.as_query_engine()
response = new_query_engine.query("who is this text about?")

Here, we've saved our index to a directory called "naval_index". Later, we can rebuild our storage context and load the index from it.

Customizing LLMs

One of the powerful features of LlamaIndex is the ability to customize the underlying LLM. In this example, we'll use LangChain's ChatOpenAI model and customize its parameters.

from llama_index import LLMPredictor, ServiceContext
from langchain.chat_models import ChatOpenAI

# Create a predictor using a custom model
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))

# Create a service context with the custom predictor
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Create an index using the service context
custom_llm_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

custom_llm_query_engine = custom_llm_index.as_query_engine()
response = custom_llm_query_engine.query("who is this text about?")

The LLMPredictor allows us to utilize different language models and change their parameters.

Custom Prompt

By creating a custom prompt, we can provide more structured questions and responses. This allows us to guide the language model to give more specific answers.

from llama_index import Prompt

# Define a custom prompt
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question and each answer "
    "should start with code word AI Demos: {query_str}\n"
)
qa_template = Prompt(template)

# Use the custom prompt when querying
query_engine = custom_llm_index.as_query_engine(text_qa_template=qa_template)
response = query_engine.query("who is this text about?")

This provides a more structured conversation with the LLM, which can be helpful in certain use cases.
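A QA template typically carries a {context_str} placeholder for the retrieved chunks alongside {query_str} for the question. Before the prompt reaches the model, these placeholders are filled in, roughly the way plain str.format works. Here's a simplified illustration (the real pipeline performs this substitution internally):

```python
# Simplified view of how prompt placeholders are filled before the
# text is sent to the LLM.
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question and each answer "
    "should start with code word AI Demos: {query_str}\n"
)

filled = template.format(
    context_str="Naval Ravikant shares ideas on wealth and happiness.",
    query_str="who is this text about?",
)
print(filled)
```

Inspecting the filled string is a handy way to debug a custom prompt before running real queries against it.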

Custom Embedding

LlamaIndex also allows us to customize the embeddings used in our index. This can be helpful if you want to use a specific embedding model or if the default embeddings do not provide satisfactory results.

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

# Load in a specific embedding model
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2'))

# Create a service context with the custom embedding model
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# Create an index using the service context
new_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = new_index.as_query_engine()
response = query_engine.query("list 5 important points from this book")

We've used the sentence-transformers/all-MiniLM-L6-v2 embedding model, but you could use any model that suits your requirements.

And that's a wrap! We've explored various functionalities of the LlamaIndex toolkit, and I hope it helps you in building and customizing your search engine.

If you'd like to see these steps in action, we've also created a YouTube video tutorial explaining the entire process. Feel free to check it out and don't hesitate to ask if you have any queries. Happy coding!
