Develop a RAG-based application using Amazon Aurora with Amazon Kendra


Generative AI and large language models (LLMs) are helping organizations across diverse sectors enhance the customer experience at a pace that would traditionally have taken years. Nearly every organization has data in data stores, either on premises or with cloud providers.

You can embrace generative AI and enhance the customer experience by converting your existing data into an index that generative AI can search. When you ask an open source LLM a question, you get publicly available information as a response. Although this is helpful, generative AI becomes far more useful when it can combine an understanding of your own data with the additional context an LLM provides. This is achieved through Retrieval Augmented Generation (RAG).

RAG retrieves data from a preexisting knowledge base (your data), combines it with the LLM’s knowledge, and generates responses with more human-like language. However, in order for generative AI to understand your data, some amount of data preparation is required, which involves a big learning curve.

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud. Aurora combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.

In this post, we walk you through how to convert your existing Aurora data into an index without needing data preparation for Amazon Kendra to perform data search and implement RAG that combines your data along with LLM knowledge to produce accurate responses.

Solution overview

In this solution, you use your existing data in Aurora as a data source, create an intelligent search service by connecting and syncing that data source to an Amazon Kendra index, and perform a generative AI data search that uses RAG to produce accurate responses by combining your data with the LLM’s knowledge. For this post, we use Anthropic’s Claude on Amazon Bedrock as our LLM.

The following are the high-level steps for the solution:

  1. Create an Aurora PostgreSQL cluster and ingest your data.
  2. Create an Amazon Kendra index.
  3. Set up the Amazon Kendra Aurora PostgreSQL connector and sync your data source.
  4. Invoke the RAG application to query your data using Anthropic’s Claude on Amazon Bedrock.

The following diagram illustrates the solution architecture.

[Solution architecture diagram]

Prerequisites

To follow this post, the following prerequisites are required:

  • An AWS account with access to Amazon Aurora, Amazon Kendra, and Amazon Bedrock (with access to Anthropic’s Claude enabled)
  • The AWS Command Line Interface (AWS CLI) installed and configured
  • The pgAdmin client to connect to the Aurora PostgreSQL database
  • A Python environment with Boto3 and LangChain installed

Create an Aurora PostgreSQL cluster

Run the following AWS CLI commands to create an Aurora PostgreSQL Serverless v2 cluster:

aws rds create-db-cluster \
--engine aurora-postgresql \
--engine-version 15.4 \
--db-cluster-identifier genai-kendra-ragdb \
--master-username postgres \
--master-user-password XXXXX \
--db-subnet-group-name dbsubnet \
--vpc-security-group-ids "sg-XXXXX" \
--serverless-v2-scaling-configuration "MinCapacity=2,MaxCapacity=64" \
--enable-http-endpoint \
--region us-east-2

aws rds create-db-instance \
--db-cluster-identifier genai-kendra-ragdb \
--db-instance-identifier genai-kendra-ragdb-instance \
--db-instance-class db.serverless \
--engine aurora-postgresql

The following screenshot shows the created instance.

[Screenshot: the genai-kendra-ragdb Aurora PostgreSQL Serverless v2 instance on the Amazon RDS console]
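Cluster and instance creation can take several minutes. As an optional check, the following is a minimal Boto3 sketch (assuming the identifiers from the preceding commands and the us-east-2 Region) that waits for the instance to become available and prints the writer endpoint, which you will need later when you configure the Amazon Kendra connector:

import boto3

# Wait for the Aurora instance created above to become available
rds = boto3.client("rds", region_name="us-east-2")
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="genai-kendra-ragdb-instance")

# Fetch the cluster status and writer endpoint (used as the Host value for the Kendra connector)
cluster = rds.describe_db_clusters(DBClusterIdentifier="genai-kendra-ragdb")["DBClusters"][0]
print("Cluster status:", cluster["Status"])
print("Writer endpoint:", cluster["Endpoint"])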

Ingest data to Aurora PostgreSQL-Compatible

Connect to the Aurora instance using the pgAdmin tool. Refer to Connecting to a DB instance running the PostgreSQL database engine for more information. To ingest your data, complete the following steps:

  1. Run the following PostgreSQL statements in pgAdmin to create the database, schema, and table:
    CREATE DATABASE genai;
    
    -- Connect to the genai database, then create and set the schema
    CREATE SCHEMA employees;
    SET SCHEMA 'employees';
    
    CREATE TABLE employees.amazon_review(
    pk int GENERATED ALWAYS AS IDENTITY NOT NULL,
    id varchar(50) NOT NULL,
    name varchar(300) NULL,
    asins Text NULL,
    brand Text NULL,
    categories Text NULL,
    keys Text NULL,
    manufacturer Text NULL,
    reviews_date Text NULL,
    reviews_dateAdded Text NULL,
    reviews_dateSeen Text NULL,
    reviews_didPurchase Text NULL,
    reviews_doRecommend varchar(100) NULL,
    reviews_id varchar(150) NULL,
    reviews_numHelpful varchar(150) NULL,
    reviews_rating varchar(150) NULL,
    reviews_sourceURLs Text NULL,
    reviews_text Text NULL,
    reviews_title Text NULL,
    reviews_userCity varchar(100) NULL,
    reviews_userProvince varchar(100) NULL,
    reviews_username Text NULL,
    PRIMARY KEY (pk)
    );
  2. In your pgAdmin Aurora PostgreSQL connection, navigate to Databases, genai, Schemas, employees, Tables.
  3. Choose (right-click) Tables and choose PSQL Tool to open a PSQL client connection.
  4. Place the amazon_review.csv file in your pgAdmin runtime location and run the following command (alternatively, see the Python sketch after this list):
    \copy employees.amazon_review (id, name, asins, brand, categories, keys, manufacturer, reviews_date, reviews_dateadded, reviews_dateseen, reviews_didpurchase, reviews_dorecommend, reviews_id, reviews_numhelpful, reviews_rating, reviews_sourceurls, reviews_text, reviews_title, reviews_usercity, reviews_userprovince, reviews_username) FROM 'C:\Program Files\pgAdmin 4\runtime\amazon_review.csv' DELIMITER ',' CSV HEADER ENCODING 'utf8';

  5. Run the following PSQL query to verify the number of records copied:
    SELECT count(*) FROM employees.amazon_review;
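If you prefer to load the data programmatically instead of through pgAdmin, the following is a minimal sketch using psycopg2; the endpoint, credentials, and local file path are placeholders you would replace with your own:

import psycopg2

# Placeholder connection details; use your cluster writer endpoint and credentials
conn = psycopg2.connect(
    host="genai-kendra-ragdb.cluster-XXXXX.us-east-2.rds.amazonaws.com",
    port=5432,
    dbname="genai",
    user="postgres",
    password="XXXXX",
)

copy_sql = """
    COPY employees.amazon_review (id, name, asins, brand, categories, keys, manufacturer,
        reviews_date, reviews_dateadded, reviews_dateseen, reviews_didpurchase,
        reviews_dorecommend, reviews_id, reviews_numhelpful, reviews_rating,
        reviews_sourceurls, reviews_text, reviews_title, reviews_usercity,
        reviews_userprovince, reviews_username)
    FROM STDIN WITH (FORMAT csv, HEADER true, ENCODING 'utf8')
"""

# Stream the local CSV file into the table and commit on success
with conn, conn.cursor() as cur, open("amazon_review.csv", encoding="utf8") as f:
    cur.copy_expert(copy_sql, f)

# Verify the row count
with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM employees.amazon_review;")
    print("Rows loaded:", cur.fetchone()[0])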

Create an Amazon Kendra index

The Amazon Kendra index holds the contents of your documents and is structured in a way to make the documents searchable. It has three index types:

  • Generative AI Enterprise Edition index – Offers the highest accuracy for the Retrieve API operation and for RAG use cases (recommended)
  • Enterprise Edition index – Provides semantic search capabilities and offers a high-availability service that is suitable for production workloads
  • Developer Edition index – Provides semantic search capabilities for you to test your use cases

To create an Amazon Kendra index, complete the following steps:

  1. On the Amazon Kendra console, choose Indexes in the navigation pane.
  2. Choose Create an index.
  3. On the Specify index details page, provide the following information:
    • For Index name, enter a name (for example, genai-kendra-index).
    • For IAM role, choose Create a new role (Recommended).
    • For Role name, enter an IAM role name (for example, genai-kendra). Your role name will be prefixed with AmazonKendra-<region>- (for example, AmazonKendra-us-east-2-genai-kendra).
  4. Choose Next.
  5. On the Add additional capacity page, select Developer edition (for this demo) and choose Next.
  6. On the Configure user access control page, provide the following information:
    • Under Access control settings, select No.
    • Under User-group expansion, select None.
  7. Choose Next.
  8. On the Review and create page, verify the details and choose Create.

It might take some time for the index to be created. Check the list of indexes to watch the progress of your index creation. When the status of the index is ACTIVE, your index is ready to use.
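If you would rather script this step, the following is a minimal sketch using the Boto3 Kendra client; the role ARN is an assumption and must point to an IAM role that Amazon Kendra can assume, such as the one the console creates for you:

import time
import boto3

kendra = boto3.client("kendra", region_name="us-east-2")

# Create a Developer Edition index; the RoleArn below is a placeholder
response = kendra.create_index(
    Name="genai-kendra-index",
    Edition="DEVELOPER_EDITION",
    RoleArn="arn:aws:iam::111122223333:role/AmazonKendra-us-east-2-genai-kendra",
)
index_id = response["Id"]

# Poll until the index leaves the CREATING state
while kendra.describe_index(Id=index_id)["Status"] == "CREATING":
    time.sleep(60)
print("Index", index_id, "status:", kendra.describe_index(Id=index_id)["Status"])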

Set up the Amazon Kendra Aurora PostgreSQL connector

Complete the following steps to set up your data source connector:

  1. On the Amazon Kendra console, choose Data sources in the navigation pane.
  2. Choose Add data source.
  3. Choose Aurora PostgreSQL connector as the data source type.
  4. On the Specify data source details page, provide the following information:
    • For Data source name, enter a name (for example, data_source_genai_kendra_postgresql).
    • For Default language, choose English (en).
    • Choose Next.
  5. On the Define access and security page, under Source, provide the following information:
    • For Host, enter the host name of the PostgreSQL instance (cvgupdj47zsh.us-east-2.rds.amazonaws.com).
    • For Port, enter the port number of the PostgreSQL instance (5432).
    • For Instance, enter the database name of the PostgreSQL instance (genai).
  6. Under Authentication, if you already have credentials stored in AWS Secrets Manager, choose it on the dropdown menu. Otherwise, choose Create and add new secret.
  7. In the Create an AWS Secrets Manager secret pop-up window, provide the following information:
    • For Secret name, enter a name (for example, AmazonKendra-Aurora-PostgreSQL-genai-kendra-secret).
    • For Database user name, enter the name of your database user.
    • For Password, enter the user’s password.
  8. Choose Add Secret.
  9. Under Configure VPC and security group, provide the following information:
    • For Virtual Private Cloud, choose your virtual private cloud (VPC).
    • For Subnet, choose your subnet.
    • For VPC security groups, choose the VPC security group to allow access to your data source.
  10. Under IAM role, if you have an existing role, choose it on the dropdown menu. Otherwise, choose Create a new role.
  11. On the Configure sync settings page, under Sync scope, provide the following information:
    • For SQL query, enter the SQL query and column values as follows: select * from employees.amazon_review.
    • For Primary key, enter the primary key column (pk).
    • For Title, enter the title column that provides the name of the document title within your database table (reviews_title).
    • For Body, enter the body column on which your Amazon Kendra search will happen (reviews_text).
  12. Under Sync mode, select Full sync to convert the entire table data into a searchable index.
  13. Under Sync run schedule, choose Run on demand.
  14. Choose Next.
  15. On the Set field mappings page, leave the default settings and choose Next.
  16. Review your settings and choose Add data source.

Your data source will appear on the Data sources page after it has been created successfully. After the sync completes successfully, your Amazon Kendra index will contain the data from the specified Aurora PostgreSQL table, and you can use the index for intelligent search and RAG applications.

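Because the sync schedule is set to Run on demand, you can also trigger and monitor the sync programmatically. The following is a minimal sketch using the Boto3 Kendra client; the index and data source IDs are placeholders for the values from the previous steps:

import time
import boto3

kendra = boto3.client("kendra", region_name="us-east-2")

index_id = "Kendra_Index_ID"              # the index created earlier
data_source_id = "Kendra_Data_Source_ID"  # the data source you just added

# Start a sync of the Aurora PostgreSQL data source
sync = kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
print("Started sync job:", sync["ExecutionId"])

# Poll the sync job history until this job reaches a terminal state
while True:
    history = kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)["History"]
    job = next((j for j in history if j["ExecutionId"] == sync["ExecutionId"]), None)
    if job and job["Status"] in ("SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"):
        print("Sync finished with status:", job["Status"])
        break
    time.sleep(60)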

Invoke the RAG application

The Amazon Kendra index sync can take minutes to hours depending on the volume of your data. When the sync completes without error, you are ready to develop your RAG solution in your preferred IDE. Complete the following steps:

  1. Install the required packages (pip install boto3 langchain langchain-community), then configure your AWS credentials to allow Boto3 to interact with AWS services. You can do this by setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or by using the ~/.aws/credentials file:
    import boto3
    
    # Create a Boto3 session
    session = boto3.Session(
       aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
       aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY',
       region_name='YOUR_AWS_REGION'
    )
  2. Import LangChain and the necessary components:
    from langchain_community.llms import Bedrock
    from langchain_community.retrievers import AmazonKendraRetriever
    from langchain.chains import RetrievalQA
  3. Create an instance of the LLM (Anthropic’s Claude):
    llm = Bedrock(
        region_name="bedrock_region_name",
        model_id="anthropic.claude-v2",
        model_kwargs={
            "max_tokens_to_sample": 300,
            "temperature": 1,
            "top_k": 250,
            "top_p": 0.999,
            "anthropic_version": "bedrock-2023-05-31"
        }
    )
  4. Create your prompt template, which provides instructions for the LLM:
    from langchain_core.prompts import PromptTemplate
    
    prompt_template = """
    You are a Product Review Specialist, and you provide detail product review insights.
    You have access to the product reviews in the  XML tags below and nothing else.
    
    
    {context}
    
    
    
    {question}
    
    """
    
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
  5. Initialize the AmazonKendraRetriever with your Amazon Kendra index ID (replace Kendra_Index_ID with the ID of the index you created earlier) and the Amazon Kendra client:
    session = boto3.Session(region_name='Kendra_region_name')
    kendra_client = session.client('kendra')
    # Create an instance of AmazonKendraRetriever
    kendra_retriever = AmazonKendraRetriever(
        client=kendra_client,
        index_id="Kendra_Index_ID"
    )
  6. Combine Anthropic’s Claude and the Amazon Kendra retriever into a RetrievalQA chain:
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=kendra_retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt},
    )
  7. Invoke the chain with your own query:
    query = "What are some products that has bad quality reviews, summarize the reviews"
    result_ = qa.invoke(
    query
    )
    result_

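Because the chain was created with return_source_documents=True, the response is a dictionary that contains the generated answer along with the Amazon Kendra passages used as context. A small sketch for inspecting both:

# Print the generated answer
print(result_["result"])

# Inspect the Kendra passages the answer was grounded on
for doc in result_["source_documents"]:
    print(doc.metadata.get("source", ""), "-", doc.page_content[:200])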

Clean up

To avoid incurring future charges, delete the resources you created as part of this post:

  1. Delete the Aurora DB cluster and DB instance.
  2. Delete the Amazon Kendra index and its data source.
  3. Delete the AWS Secrets Manager secret you created for the Amazon Kendra connector.
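If you created the resources from the command line, you can remove them the same way. The following is a minimal Boto3 sketch, assuming the identifiers used earlier and that skipping the final snapshot is acceptable for this demo data:

import boto3

rds = boto3.client("rds", region_name="us-east-2")
kendra = boto3.client("kendra", region_name="us-east-2")

# Delete the Aurora instance, wait for it to be gone, then delete the cluster (no final snapshot for demo data)
rds.delete_db_instance(DBInstanceIdentifier="genai-kendra-ragdb-instance", SkipFinalSnapshot=True)
rds.get_waiter("db_instance_deleted").wait(DBInstanceIdentifier="genai-kendra-ragdb-instance")
rds.delete_db_cluster(DBClusterIdentifier="genai-kendra-ragdb", SkipFinalSnapshot=True)

# Delete the Amazon Kendra index (its data sources are removed with it)
kendra.delete_index(Id="Kendra_Index_ID")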

Conclusion

In this post, we discussed how to convert your existing Aurora data into an Amazon Kendra index and implement a RAG-based solution for data search. This solution drastically reduces the data preparation needed for Amazon Kendra search. It also increases the speed of generative AI application development by reducing the learning curve behind data preparation.

Try out the solution, and if you have any comments or questions, leave them in the comments section.


About the Authors

Aravind Hariharaputran is a Data Consultant with the Professional Services team at Amazon Web Services. He is passionate about data and AI/ML in general, with extensive experience managing database technologies. He helps customers transform legacy databases and applications into modern data platforms and generative AI applications. He enjoys spending time with family and playing cricket.

Ivan Cui is a Data Science Lead with AWS Professional Services, where he helps customers build and deploy solutions using ML and generative AI on AWS. He has worked with customers across diverse industries, including software, finance, pharmaceutical, healthcare, IoT, and entertainment and media. In his free time, he enjoys reading, spending time with his family, and traveling.
