Revolutionizing drug data analysis using Amazon Bedrock multimodal RAG capabilities
In this post, we explore how Amazon Bedrock's multimodal RAG capabilities revolutionize drug data analysis by efficiently processing complex medical documentation containing text, images, graphs, and tables.

In the pharmaceutical industry, biotechnology and healthcare companies face an unprecedented challenge in efficiently managing and analyzing vast amounts of drug-related data from diverse sources. Traditional data analysis methods prove inadequate for processing complex medical documentation that includes a mix of text, images, graphs, and tables. Amazon Bedrock offers features like multimodal retrieval, advanced chunking capabilities, and citations to help organizations get high-accuracy responses.
Pharmaceutical and healthcare organizations process vast volumes of complex document formats and unstructured data that pose analytical challenges. Clinical study documents and related research papers typically blend technical text, detailed tables, and sophisticated statistical graphs, making automated data extraction particularly difficult. Clinical study documents add further complexity through non-standardized formatting and varied data presentation styles across research institutions. This post showcases a sample application that extracts data-driven insights from complex research documents with high-accuracy responses. The application analyzes clinical trial data, patient outcomes, molecular diagrams, and safety reports from the research documents, helping pharmaceutical companies accelerate their research process. The solution also provides citations from the source documents, reducing hallucinations and improving the accuracy of the responses.
Solution overview
The sample application uses Amazon Bedrock to create an intelligent AI assistant that analyzes and summarizes research documents containing text, graphs, and unstructured data. Amazon Bedrock is a fully managed service that offers a choice of industry-leading foundation models (FMs) along with a broad set of capabilities to build generative AI applications, simplifying development with security, privacy, and responsible AI.
To equip FMs with up-to-date and proprietary information, organizations use Retrieval Augmented Generation (RAG), a technique that fetches data from company data sources and enriches the prompt to provide relevant and accurate responses.
Amazon Bedrock Knowledge Bases is a fully managed RAG capability within Amazon Bedrock with in-built session context management and source attribution that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources and manage data flows.
Amazon Bedrock Knowledge Bases introduces powerful document parsing capabilities, including Amazon Bedrock Data Automation powered parsing and FM parsing, revolutionizing how we handle complex documents. Amazon Bedrock Data Automation is a fully managed service that processes multimodal data effectively, without the need to provide additional prompting. The FM option parses multimodal data using an FM. This parser provides the option to customize the default prompt used for data extraction. This advanced feature goes beyond basic text extraction by intelligently breaking down documents into distinct components, including text, tables, images, and metadata, while preserving document structure and context. When working with supported formats like PDF, specialized FMs interpret and extract tabular data, charts, and complex document layouts. Additionally, the service provides advanced chunking strategies like semantic chunking, which intelligently divides text into meaningful segments based on semantic similarity calculated by the embedding model. Unlike traditional syntactic chunking methods, this approach preserves the context and meaning of the content, improving the quality and relevance of information retrieval.
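As an illustration of how these options fit together, the following sketch configures a knowledge base data source with FM parsing and semantic chunking using boto3. The knowledge base ID, bucket ARN, and parameter values are placeholders, and the field names reflect the Amazon Bedrock CreateDataSource API at the time of writing; verify them against the current boto3 documentation before use.

```python
import boto3

# Placeholder values for illustration; replace with your own resources.
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

response = bedrock_agent.create_data_source(
    knowledgeBaseId="YOUR_KB_ID",
    name="clinical-research-documents",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::your-source-bucket"},
    },
    vectorIngestionConfiguration={
        # FM parsing: a foundation model interprets tables, charts, and complex layouts
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": {
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
            },
        },
        # Semantic chunking: split text where embedding similarity drops, not at fixed sizes
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": 300,
                "bufferSize": 1,
                "breakpointPercentileThreshold": 95,
            },
        },
    },
)
print(response["dataSource"]["dataSourceId"])
```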
The solution architecture implements these capabilities through a seamless workflow that begins with administrators securely uploading knowledge base documents to an Amazon Simple Storage Service (Amazon S3) bucket. These documents are then ingested into Amazon Bedrock Knowledge Bases, where a large language model (LLM) processes and parses the ingested data. The solution employs semantic chunking to store document embeddings efficiently in Amazon OpenSearch Service for optimized retrieval.

The solution features a user-friendly interface built with Streamlit, providing an intuitive chat experience for end-users. When users interact with the Streamlit application, it triggers AWS Lambda functions that handle the requests, retrieving relevant context from the knowledge base and generating appropriate responses. The architecture is secured through AWS Identity and Access Management (IAM), maintaining proper access control throughout the workflow. Amazon Bedrock uses AWS Key Management Service (AWS KMS) to encrypt resources related to your knowledge bases. By default, Amazon Bedrock encrypts this data using an AWS managed key. Optionally, you can encrypt the model artifacts using a customer managed key. This end-to-end solution provides efficient document processing, context-aware information retrieval, and secure user interactions, delivering accurate and comprehensive responses through a seamless chat interface.
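The following is a minimal sketch of what the Lambda request handler could look like, using the Amazon Bedrock Knowledge Bases RetrieveAndGenerate API. The environment variable names and event shape are assumptions for illustration; the sample application's actual handler in the GitHub repository may differ.

```python
import json
import os

import boto3

# Minimal handler sketch: KNOWLEDGE_BASE_ID and MODEL_ARN are assumed environment variables.
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")


def lambda_handler(event, context):
    question = json.loads(event["body"])["question"]

    # Retrieve semantically similar chunks from the knowledge base and generate a grounded answer
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": os.environ["KNOWLEDGE_BASE_ID"],
                "modelArn": os.environ["MODEL_ARN"],
            },
        },
    )

    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "answer": response["output"]["text"],
                # Citations include the source chunks used to ground the answer
                "citations": response.get("citations", []),
            },
            default=str,  # citations can include non-JSON-native types
        ),
    }
```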
The following diagram illustrates the solution architecture.
This solution uses the following additional services and features:
- The Anthropic Claude 3 family offers Opus, Sonnet, and Haiku models that accept text and image inputs and generate text output. They provide a range of options across capability, accuracy, speed, and cost. These models can understand complex research documents that include charts, graphs, tables, diagrams, and reports.
- AWS Lambda is a serverless compute service that lets you run code cost-effectively without provisioning or managing servers.
- Amazon S3 is a highly scalable, durable, and secure object storage service.
- Amazon OpenSearch Service is a fully managed search and analytics engine for efficient document retrieval. The OpenSearch Service vector database capabilities enable semantic search, RAG with LLMs, recommendation engines, and rich media search.
- Streamlit is an open source framework for building and sharing interactive, web-based data applications in pure Python. A minimal chat interface sketch follows this list.
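As a rough illustration of the Streamlit front end, the following sketch sends a user question to a Lambda function and renders the answer. The function name and payload shape are hypothetical and simply mirror the handler sketched earlier; the deployed sample application wires these pieces together for you.

```python
import json

import boto3
import streamlit as st

# Hypothetical function name; the deployed stack creates its own Lambda function.
FUNCTION_NAME = "rag-chat-handler"
lambda_client = boto3.client("lambda", region_name="us-east-1")

st.title("Research document assistant")

if question := st.chat_input("Ask about the uploaded research documents"):
    st.chat_message("user").write(question)

    # Invoke the Lambda handler sketched earlier with the payload shape it expects
    result = lambda_client.invoke(
        FunctionName=FUNCTION_NAME,
        Payload=json.dumps({"body": json.dumps({"question": question})}),
    )
    body = json.loads(json.loads(result["Payload"].read())["body"])
    st.chat_message("assistant").write(body["answer"])
```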
Prerequisites
The following prerequisites are needed to proceed with this solution. For this post, we use the us-east-1 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.
- An AWS account with an IAM user who has permissions for AWS Lambda, Amazon Bedrock, Amazon S3, and IAM.
- Access to Anthropic’s Claude model in Amazon Bedrock. For instructions, see Access Amazon Bedrock foundation models.
Deploy the solution
Refer to the GitHub repository for the deployment steps listed under the deployment guide section. We use an AWS CloudFormation template to deploy solution resources, including S3 buckets to store the source data and knowledge base data.
Test the sample application
Imagine you are a member of the R&D department at a biotechnology firm, and your job requires you to derive insights from drug- and vaccine-related information from diverse sources like research studies, drug specifications, and industry papers. You are performing research on cancer vaccines and want to gain insights based on cancer research publications. You can upload the documents listed in the References section to the S3 bucket and sync the knowledge base. Let’s explore example interactions that demonstrate the application’s capabilities. The responses generated by the AI assistant are based on the documents uploaded to the S3 bucket connected with the knowledge base. Due to the non-deterministic nature of machine learning (ML), your responses might differ slightly from the ones presented in this post.
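The following sketch shows one way to upload a document and trigger a knowledge base sync programmatically. The bucket name, file name, and IDs are placeholders rather than values from the deployed stack, and you can accomplish the same steps through the AWS Management Console.

```python
import boto3

# Placeholder bucket, file, and ID values; replace with the resources created by the stack.
s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Upload a research paper to the knowledge base bucket
s3.upload_file("cancer_vaccine_review.pdf", "your-knowledgebase-bucket", "cancer_vaccine_review.pdf")

# Start an ingestion job to sync the knowledge base with the new document
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="YOUR_KB_ID",
    dataSourceId="YOUR_DATA_SOURCE_ID",
)
print(job["ingestionJob"]["status"])  # poll get_ingestion_job until the status is COMPLETE
```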
Understanding historical context
We use the following query: “Create a timeline of major developments in mRNA vaccine technology for cancer treatment based on the information provided in the historical background sections.”

The assistant analyzes multiple documents and presents a chronological progression of mRNA vaccine development, including key milestones based on the chunks of information retrieved from the OpenSearch Service vector database.
The following screenshot shows the AI assistant’s response.
Complex data analysis
We use the following query: “Synthesize the information from the text, figures, and tables to provide a comprehensive overview of the current state and future prospects of therapeutic cancer vaccines.”
The AI assistant can provide insights from complex data types, enabled by FM parsing when the data is ingested into OpenSearch Service. It can also provide images in the source attribution using the multimodal data capabilities of Amazon Bedrock Knowledge Bases.
The following screenshot shows the AI assistant’s response.
The following screenshot shows the visuals provided in the citations when the mouse hovers over the question mark icon.
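To surface source attribution like this in your own UI, you can read the citations returned by the RetrieveAndGenerate API. The following helper is a sketch based on the documented response structure (citations containing retrieved references with S3 locations); the exact fields returned for image references may vary, so treat it as a starting point.

```python
# Sketch of collecting source document URIs from a RetrieveAndGenerate response
def extract_sources(response):
    sources = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri:
                sources.append(uri)
    return sources
```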
Comparative analysis
We use the following query: “Compare the efficacy and safety profiles of MAGE-A3 and NY-ESO-1 based vaccines as described in the text and any relevant tables or figures.”
The AI assistant used the semantically similar chunks returned from the OpenSearch Service vector database and added this context to the user’s question, which enabled the FM to provide a relevant answer.
The following screenshot shows the AI assistant’s response.
Technical deep dive
We use the following query: “Summarize the potential advantages of mRNA vaccines over DNA vaccines for targeting tumor angiogenesis, as described in the review.”
With the semantic chunking feature of the knowledge base, the AI assistant was able to get the relevant context from the OpenSearch Service database with higher accuracy.
The following screenshot shows the AI assistant’s response.
The following screenshot shows the diagram that was used for the answer as one of the citations.
The sample application demonstrates the following:
- Accurate interpretation of complex scientific diagrams
- Precise extraction of data from tables and graphs
- Context-aware responses that maintain scientific accuracy
- Source attribution for provided information
- Ability to synthesize information across multiple documents
This application can help you quickly analyze vast amounts of complex scientific literature, extracting meaningful insights from diverse data types while maintaining accuracy and providing proper attribution to source materials. This is enabled by the advanced features of Amazon Bedrock Knowledge Bases: FM parsing, which aids in interpreting complex scientific diagrams and extracting data from tables and graphs; semantic chunking, which aids in generating high-accuracy, context-aware responses; and multimodal data capabilities, which aid in providing relevant images as source attribution.
These are some of the many new features added to Amazon Bedrock, empowering you to generate high-accuracy results depending on your use case. To learn more, see New Amazon Bedrock capabilities enhance data processing and retrieval.
Production readiness
The proposed solution accelerates time to value in the project development process. Solutions built on the AWS Cloud benefit from inherent scalability while maintaining robust security and privacy controls.
The security and privacy framework includes fine-grained user access controls using IAM for both OpenSearch Service and Amazon Bedrock services. In addition, Amazon Bedrock enhances security by providing encryption at rest and in transit, and private networking options using virtual private cloud (VPC) endpoints. Data protection is achieved using KMS keys, and API calls and usage are tracked through Amazon CloudWatch logs and metrics. For specific compliance validation for Amazon Bedrock, see Compliance validation for Amazon Bedrock.
For additional details on moving RAG applications to production, refer to From concept to reality: Navigating the Journey of RAG from proof of concept to production.
Clean up
Complete the following steps to clean up your resources.
- Empty the SourceS3Bucket and KnowledgeBaseS3BucketName buckets.
- Delete the main CloudFormation stack.
Conclusion
This post demonstrated powerful multimodal document analysis across text, graphs, and images using the advanced parsing and chunking features of Amazon Bedrock Knowledge Bases. By combining the powerful capabilities of Amazon Bedrock FMs, OpenSearch Service, and intelligent chunking strategies through Amazon Bedrock Knowledge Bases, organizations can transform their complex research documents into searchable, actionable insights. The integration of semantic chunking makes sure that document context and relationships are preserved, and the user-friendly Streamlit interface makes the system accessible to end-users through an intuitive chat experience. This solution not only streamlines the process of analyzing research documents, but also demonstrates the practical application of AI/ML technologies in enhancing knowledge discovery and information retrieval. As organizations continue to grapple with increasing volumes of complex documents, this scalable and intelligent system provides a robust framework for extracting maximum value from their document repositories.
Although our demonstration focused on the healthcare industry, the versatility of this technology extends beyond a single industry. RAG on Amazon Bedrock has proven its value across diverse sectors. Notable adopters include global brands like Adidas in retail, Empolis in information management, Fractal Analytics in AI solutions, Georgia Pacific in manufacturing, and Nasdaq in financial services. These examples illustrate the broad applicability and transformative potential of RAG technology across various business domains, highlighting its ability to drive innovation and efficiency in multiple industries.
Refer to the GitHub repo for the agentic RAG application, including samples and components for building agentic RAG solutions. Be on the lookout for additional features and samples in the repository in the coming months.
To learn more about Amazon Bedrock Knowledge Bases, check out the RAG workshop using Amazon Bedrock. Get started with Amazon Bedrock Knowledge Bases, and let us know your thoughts in the comments section.
References
The following sample research documents are available as open access publications distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/):
- Vaccine Therapies for Cancer: Then and Now
- Therapeutic cancer vaccines: advancements, challenges and prospects
- Cancer Vaccines: Adjuvant Potency, Importance of Age, Lifestyle, and Treatments
- Recent advances in mRNA cancer vaccines: meeting challenges and embracing opportunities
- Nucleic acid cancer vaccines targeting tumor related angiogenesis. Could mRNA vaccines constitute a game changer?
- Recent Findings on Therapeutic Cancer Vaccines: An Updated Review
- Cancer Vaccines: Antigen Selection Strategy
About the authors
Vivek Mittal is a Solution Architect at Amazon Web Services, where he helps organizations architect and implement cutting-edge cloud solutions. With a deep passion for Generative AI, Machine Learning, and Serverless technologies, he specializes in helping customers harness these innovations to drive business transformation. He finds particular satisfaction in collaborating with customers to turn their ambitious technological visions into reality.
Shamika Ariyawansa, serving as a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences division at Amazon Web Services (AWS), has a keen focus on Generative AI. He assists customers in integrating Generative AI into their projects, emphasizing the importance of explainability within their AI-driven initiatives. Beyond his professional commitments, Shamika passionately pursues skiing and off-roading adventures.
Shaik Abdulla is a Sr. Solutions Architect who specializes in architecting enterprise-scale cloud solutions with a focus on Analytics, Generative AI, and emerging technologies. His technical expertise is validated by his achievement of all 12 AWS certifications and the prestigious Golden Jacket recognition. He has a passion for architecting and implementing innovative cloud solutions that drive business transformation. He speaks at major industry events like AWS re:Invent and regional AWS Summits, where he shares insights on cloud architecture and emerging technologies.