Accelerating AI innovation: Scale MCP servers for enterprise workloads with Amazon Bedrock

In this post, we present a centralized Model Context Protocol (MCP) server implementation using Amazon Bedrock that provides shared access to tools and resources for enterprise AI workloads. The solution enables organizations to accelerate AI innovation by standardizing access to resources and tools through MCP, while maintaining security and governance through a centralized approach.

Jat AI

Jul 1, 2025 - 20:00


Generative AI has been moving at a rapid pace, with new tools, offerings, and models released frequently. According to Gartner, agentic AI is one of the top technology trends of 2025, and organizations are prototyping how to use agents in their enterprise environments. Agents depend on tools, and each tool might have its own mechanism to send and receive information. Model Context Protocol (MCP) by Anthropic is an open source protocol that attempts to solve this challenge. It provides a protocol and communication standard that is cross-compatible with different tools, and can be used by an agentic application’s large language model (LLM) to connect to enterprise APIs or external tools using a standard mechanism. However, large enterprises, such as those in financial services, tend to have complex data governance and operating models, which makes it challenging to implement agents working with MCP.

One major challenge is the siloed approach in which individual teams build their own tools, leading to duplication of efforts and wasted resources. This approach slows down innovation and creates inconsistencies in integrations and enterprise design. Furthermore, managing multiple disconnected MCP tools across teams makes it difficult to scale AI initiatives effectively. These inefficiencies hinder enterprises from fully taking advantage of generative AI for tasks like post-trade processing, customer service automation, and regulatory compliance.

In this post, we present a centralized MCP server implementation using Amazon Bedrock that offers an innovative approach by providing shared access to tools and resources. With this approach, teams can focus on building AI capabilities rather than spending time developing or maintaining tools. By standardizing access to resources and tools through MCP, organizations can accelerate the development of AI agents, so teams can reach production faster. Additionally, a centralized approach provides consistency and standardization and reduces operational overhead, because the tools are managed by a dedicated team rather than across individual teams. It also enables centralized governance that enforces controlled access to MCP servers, which reduces the risk of data exfiltration and prevents unauthorized or insecure tool use across the organization.

Solution overview

The following figure illustrates a proposed solution based on a financial services use case that uses MCP servers across multiple lines of business (LoBs), such as compliance, trading, operations, and risk management. Each LoB performs distinct functions tailored to their specific business. For instance, the trading LoB focuses on trade execution, whereas the risk LoB performs risk limit checks. For performing these functions, each division provides a set of MCP servers that facilitate actions and access to relevant data within their LoBs. These servers are accessible to agents developed within the respective LoBs and can also be exposed to agents outside LoBs.

The development of MCP servers is decentralized. Each LoB is responsible for developing the servers that support their specific functions. When the development of a server is complete, it’s hosted centrally and accessible across LoBs. This central hub takes the form of a registry or marketplace that facilitates integration of AI-driven solutions across divisions while maintaining control and governance over shared resources.

In the following sections, we explore what the solution looks like on a conceptual level.

Agentic application interaction with a central MCP server hub

The following flow diagram showcases how an agentic application built using Amazon Bedrock interacts with one of the MCP servers located in the MCP server hub.

The flow consists of the following steps:

The application connects to the central MCP hub through the load balancer and requests a list of available tools from the specific MCP server. This can be fine-grained based on what servers the agentic application has access to.
The trade server responds with a list of available tools, including details such as tool name, description, and required input parameters.
The agentic application invokes an Amazon Bedrock agent and provides the list of tools available.
Using this information, the agent determines what to do next based on the given task and the list of tools available to it.
The agent chooses the most suitable tool and responds with the tool name and input parameters. The control comes back to the agentic application.
The agentic application calls for the execution of the tool through the MCP server using the tool name and input parameters.
The trade MCP server executes the tool and returns the results of the execution back to the application.
The application returns the results of the tool execution back to the Amazon Bedrock agent.
The agent observes the tool execution results and determines the next step.
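
The steps above can be sketched as a small loop. This is a minimal, in-memory illustration rather than the implementation from the repository: StubMcpServer, stub_agent, and run_agent_loop are hypothetical names, the agent is a hard-coded stand-in for an Amazon Bedrock agent, and no network calls are made.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class StubMcpServer:
    """In-memory stand-in for one MCP server in the hub (hypothetical)."""
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)

    def list_tools(self) -> list[dict]:
        # Steps 1-2: the server describes the tools it exposes.
        return [
            {"name": name, "description": fn.__doc__ or ""}
            for name, fn in self.tools.items()
        ]

    def call_tool(self, name: str, arguments: dict) -> Any:
        # Steps 6-7: the server executes the tool and returns the result.
        return self.tools[name](**arguments)


def stub_agent(task: str, tools: list[dict], observations: list[dict]) -> Optional[dict]:
    """Hard-coded stand-in for the agent: returns a tool request, or None when done."""
    # Steps 4-5: pick a tool and its input parameters based on the task so far.
    if not observations:
        return {
            "name": "executeTrade",
            "arguments": {"ticker": "AMZN", "quantity": 100, "price": 186},
        }
    return None


def run_agent_loop(task: str, server: StubMcpServer, agent, max_steps: int = 5) -> list[dict]:
    tools = server.list_tools()                      # steps 1-2
    observations: list[dict] = []
    for _ in range(max_steps):
        request = agent(task, tools, observations)   # steps 3-5
        if request is None:
            break
        result = server.call_tool(request["name"], request["arguments"])   # steps 6-7
        observations.append({"tool": request["name"], "result": result})   # steps 8-9
    return observations


def execute_trade(ticker: str, quantity: int, price: float) -> dict:
    """Simulated trade execution."""
    return {"tradeId": "T12345", "status": "Executed"}


server = StubMcpServer(tools={"executeTrade": execute_trade})
trace = run_agent_loop("Buy 100 shares of AMZN at USD 186", server, stub_agent)
```

The max_steps bound is a design choice of this sketch: it keeps the loop from running indefinitely if the agent never signals that it is done.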

Let’s dive into the technical architecture of the solution.

Architecture overview

The following diagram illustrates the architecture to host the centralized cluster of MCP servers for an LoB.

The architecture can be split into four sections:

MCP server discovery API
Agentic applications
Central MCP server hub
Tools and resources

Let’s explore each section in detail:

MCP server discovery API – This API is a dedicated endpoint for discovering various MCP servers. Different teams can call this API to find what MCP servers are available in the registry; read their description, tool, and resource details; and decide which MCP server would be the right one for their agentic application. When a new MCP server is published, it’s added to an Amazon DynamoDB database. MCP server owners are responsible for keeping the registry information up-to-date.
Agentic application – The agentic applications are hosted on AWS Fargate for Amazon Elastic Container Service (Amazon ECS) and built using Amazon Bedrock Agents. Teams can also use the newly released open source AWS Strands Agents SDK, or other agentic frameworks of choice, to build the agentic application and their own containerized solution to host the agentic application. The agentic applications access Amazon Bedrock through a private virtual private cloud (VPC) endpoint. They also use private VPC endpoints to access MCP servers.
Central MCP server hub – This is where the MCP servers are hosted. Access to servers is enabled through an AWS Network Load Balancer. Technically, each server is a Docker container that is hosted on Amazon ECS, but you can choose your own container deployment solution. These servers can scale individually without impacting the other servers. These servers in turn connect to one or more tools using private VPC endpoints.
Tools and resources – This component holds the tools, such as databases, another application, Amazon Simple Storage Service (Amazon S3), or other tools. For enterprises, access to the tools and resources is provided only through private VPC endpoints.
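
As a rough illustration of how a newly published server might be added to the registry, the following sketch writes one record with the three fields that the discovery code in this post reads (id, description, and server). The helper names, the use of id as the key, and the assumption that server holds the endpoint URL are ours, not confirmed by the repository. The table argument can be a boto3 DynamoDB Table resource or, as here, an in-memory stand-in.

```python
from typing import Any, Protocol


class SupportsPutItem(Protocol):
    """Anything with a DynamoDB-style put_item, such as a boto3 Table resource."""
    def put_item(self, *, Item: dict) -> Any: ...


def build_registry_item(server_id: str, description: str, server_url: str) -> dict:
    # Field names mirror those read by the discovery API: id, description, server.
    if not server_id or not server_url:
        raise ValueError("server_id and server_url are required")
    return {"id": server_id, "description": description, "server": server_url}


def register_mcp_server(table: SupportsPutItem, server_id: str,
                        description: str, server_url: str) -> dict:
    item = build_registry_item(server_id, description, server_url)
    table.put_item(Item=item)
    return item


class InMemoryTable:
    """Stand-in for a DynamoDB table so the sketch runs without AWS access."""
    def __init__(self) -> None:
        self.items: dict[str, dict] = {}

    def put_item(self, *, Item: dict) -> None:
        self.items[Item["id"]] = Item


table = InMemoryTable()
registered = register_mcp_server(
    table,
    "trade-execution",                      # hypothetical server id
    "Executes equity trades",               # hypothetical description
    "https://mcp-hub.example.internal/trade",  # placeholder URL
)
```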

Benefits of the solution

The solution offers the following key benefits:

Scalability and resilience – Because you’re using Amazon ECS on Fargate, you get scalability out of the box without managing infrastructure and handling scaling concerns. Amazon ECS automatically detects and recovers from failures by restarting failed MCP server tasks locally or reprovisioning containers, minimizing downtime. It can also redirect traffic away from unhealthy Availability Zones and rebalance tasks across healthy Availability Zones to provide uninterrupted access to the server.
Security – Access to MCP servers is secured at the network level through network controls such as AWS PrivateLink. This makes sure the agentic application only connects to trusted MCP servers hosted by the organization, and vice versa. Each Fargate workload runs in an isolated environment, which prevents resource sharing between tasks. For application authentication and authorization, we propose using an MCP Auth Server (refer to the following GitHub repo) to hand off those tasks to a dedicated component that can scale independently.

At the time of writing, the MCP protocol doesn’t provide built-in mechanisms for user-level access control or authorization. Organizations requiring user-specific access restrictions must implement additional security layers on top of the MCP protocol. For a reference implementation, refer to the following GitHub repo.
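
One possible shape for such an additional layer is a role check that runs before a tool call is forwarded to the MCP server. The sketch below is only an illustration under our own assumptions: ToolPolicy, authorize_tool_call, filter_tools, and the role names are hypothetical and are not taken from the referenced repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Roles allowed to invoke one tool (hypothetical policy model)."""
    tool_name: str
    allowed_roles: frozenset[str]


class AuthorizationError(PermissionError):
    pass


def authorize_tool_call(user_roles: set[str], tool_name: str,
                        policies: dict[str, ToolPolicy]) -> None:
    # Deny by default: a tool without a policy is not callable by anyone.
    policy = policies.get(tool_name)
    if policy is None or not (policy.allowed_roles & user_roles):
        raise AuthorizationError(f"Not allowed to call tool '{tool_name}'")


def filter_tools(tools: list[dict], user_roles: set[str],
                 policies: dict[str, ToolPolicy]) -> list[dict]:
    # Only advertise the tools the user could actually invoke.
    visible = []
    for tool in tools:
        policy = policies.get(tool["name"])
        if policy is not None and policy.allowed_roles & user_roles:
            visible.append(tool)
    return visible


policies = {
    "executeTrade": ToolPolicy("executeTrade", frozenset({"trader"})),
    "sendTradeDetails": ToolPolicy("sendTradeDetails", frozenset({"trader", "operations"})),
}
all_tools = [{"name": "executeTrade"}, {"name": "sendTradeDetails"}]
ops_tools = filter_tools(all_tools, {"operations"}, policies)
```

Filtering the advertised tool list and checking again at call time are complementary: the first keeps unauthorized tools out of the agent's context, and the second guards against a caller that requests a tool by name anyway.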

Let’s dive deeper into the implementation of this solution.

Use case

The implementation is based on a financial services use case featuring post-trade execution. Post-trade execution refers to the processes and steps that take place after an equity buy/sell order has been placed by a customer. It involves many steps, including verifying trade details, transferring the assets, providing a detailed report of the execution, running fraud checks, and more. To keep the demo simple, we focus on the order execution step.

Although this use case is tailored to the financial industry, you can apply the architecture and the approach to other enterprise workloads as well. The entire code of this implementation is available on GitHub. We use the AWS Cloud Development Kit (AWS CDK) for Python to deploy this solution, which creates an agentic application connected to tools through the MCP server. It also creates a Streamlit UI to interact with the agentic application.

The following code snippet provides access to the MCP discovery API:

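import json
import os

import boto3

# Assumption: in the full Lambda code these two names are defined elsewhere.
# The environment variable, default table name, and header values shown here
# are placeholders for illustration only.
DDBTBL_MCP_SERVER_REGISTRY = os.environ.get('DDBTBL_MCP_SERVER_REGISTRY', 'mcp-server-registry')
cors_headers = {'Access-Control-Allow-Origin': '*'}


def get_server_registry():
    # Initialize the DynamoDB table resource
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(DDBTBL_MCP_SERVER_REGISTRY)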
    
    try:
        # Scan the table to get all items
        response = table.scan()
        items = response.get('Items', [])
        
        # Format the items to include only id, description, server
        formatted_items = []
        for item in items:
            formatted_item = {
                'id': item.get('id', ''),
                'description': item.get('description', ''),
                'server': item.get('server', ''),
            }
            formatted_items.append(formatted_item)
        
        # Return the formatted items as JSON
        return {
            'statusCode': 200,
            'headers': cors_headers,
            'body': json.dumps(formatted_items)
        }
    except Exception as e:
        # Handle any errors
        return {
            'statusCode': 500,
            'headers': cors_headers,
            'body': json.dumps({'error': str(e)})
        }

The preceding code is invoked through an AWS Lambda function. The complete code is available in the GitHub repository. The following graphic shows the response of the discovery API.
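
On the consuming side, a team might parse this response and pick a server by its id. The following is a minimal sketch that assumes only the response shape produced by the preceding code (statusCode, headers, and a JSON body); the helper names and the sample entries are hypothetical.

```python
import json
from typing import Optional


def parse_registry_response(response: dict) -> list[dict]:
    """Return the list of registry entries, or raise if the API reported an error."""
    if response.get("statusCode") != 200:
        error = json.loads(response.get("body", "{}")).get("error", "unknown error")
        raise RuntimeError(f"Discovery API failed: {error}")
    return json.loads(response["body"])


def find_server(servers: list[dict], server_id: str) -> Optional[dict]:
    """Look up one registry entry by id; returns None if it is not registered."""
    return next((s for s in servers if s["id"] == server_id), None)


# Sample response in the shape returned by get_server_registry (values are made up).
sample_response = {
    "statusCode": 200,
    "headers": {},
    "body": json.dumps([
        {"id": "trade-execution", "description": "Executes trades", "server": "https://example.internal/trade"},
        {"id": "risk-checks", "description": "Runs risk limit checks", "server": "https://example.internal/risk"},
    ]),
}

servers = parse_registry_response(sample_response)
trade_server = find_server(servers, "trade-execution")
```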

Let’s explore a scenario where the user submits a question: “Buy 100 shares of AMZN at USD 186, to be distributed equally between accounts A31 and B12.” To execute this task, the agentic application invokes the trade-execution MCP server. The following code is the sample implementation of the MCP server for trade execution:

from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse

mcp = FastMCP("server")


# Simple health check route that returns OK
@mcp.custom_route("/", methods=["GET"])
async def health_check(request: Request) -> PlainTextResponse:
    return PlainTextResponse("OK")


# Type hints let FastMCP describe each input parameter in the tool schema
@mcp.tool()
async def executeTrade(ticker: str, quantity: int, price: float) -> dict:
    """
    Execute a trade for the given ticker, quantity, and price.
    
    Sample input:
    {
        "ticker": "AMZN",
        "quantity": 1000,
        "price": 150.25
    }
    """
    # Simulate trade execution
    return {
        "tradeId": "T12345",
        "status": "Executed",
        "timestamp": "2025-04-09T22:58:00"
    }
    

@mcp.tool()
async def sendTradeDetails(tradeId: str) -> dict:
    """
    Send trade details for the given tradeId.
    Sample input:
    {
        "tradeId": "T12345"
    }
    """
    return {
        "status": "Details Sent",
        "recipientSystem": "MiddleOffice",
        "timestamp": "2025-04-09T22:59:00"
    }


if __name__ == "__main__":
    # Serve the MCP server over streamable HTTP on all interfaces
    mcp.run(host="0.0.0.0", transport="streamable-http")

The complete code is available in the following GitHub repo.

The following graphic shows the MCP server execution in action.

This is a sample implementation of the use case focusing on the deployment step. For a production scenario, we strongly recommend adding a human oversight workflow to monitor the execution and provide input at various steps of the trade execution.

Now you’re ready to deploy this solution.

Prerequisites

Prerequisites for the solution are available in the README.md of the GitHub repository.

Deploy the application

Complete the following steps to run this solution:

Navigate to the README.md file of the GitHub repository to find the instructions to deploy the solution. Follow these steps to complete deployment.

The successful deployment will exit with a message similar to the one shown in the following screenshot.

When the deployment is complete, access the Streamlit application.

You can find the Streamlit URL in the terminal output, similar to the following screenshot.

Enter the URL of the Streamlit application in a browser to open the application console.

On the application console, different sets of MCP servers are listed in the left pane under MCP Server Registry. Each set corresponds to an MCP server and includes the definition of the tools, such as the name, description, and input parameters.

In the right pane, Agentic App, a request is pre-populated: “Buy 100 shares of AMZN at USD 186, to be distributed equally between accounts A31 and B12.” This request is ready to be submitted to the agent for execution.

Choose Submit to invoke an Amazon Bedrock agent to process the request.

The agentic application will evaluate the request together with the list of tools it has access to, and iterate through a series of tool executions and evaluations to fulfill the request. You can view the trace output to see the tools that the agent used. For each tool used, you can see the values of the input parameters, followed by the corresponding results. In this case, the agent operated as follows:

The agent first used the executeTrade tool with input parameters of ticker=AMZN, quantity=100, and price=186.
After the trade was executed, the agent used the allocateTrade tool to allocate the trade position between two portfolio accounts.
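
The arithmetic behind an equal allocation can be sketched as follows. This is not the allocateTrade implementation from the repository, whose signature is not shown in this post; allocate_equally is a hypothetical helper, and giving any leftover shares to the first accounts is one of several possible conventions.

```python
def allocate_equally(quantity: int, accounts: list[str]) -> dict[str, int]:
    """Split a share quantity across accounts as evenly as whole shares allow."""
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    if not accounts or len(set(accounts)) != len(accounts):
        raise ValueError("accounts must be a non-empty list of unique ids")
    base, remainder = divmod(quantity, len(accounts))
    # The first `remainder` accounts each receive one extra share.
    return {
        account: base + (1 if index < remainder else 0)
        for index, account in enumerate(accounts)
    }


allocation = allocate_equally(100, ["A31", "B12"])
```

For the request in this walkthrough, 100 shares across accounts A31 and B12 gives 50 shares to each account.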

Clean up

You will incur charges when you consume the services used in this solution. Instructions to clean up the resources are available in the README.md of the GitHub repository.

Summary

This solution offers a straightforward and enterprise-ready approach to implementing MCP servers on AWS. With this centralized operating model, teams can focus on building their applications rather than maintaining the MCP servers. As enterprises continue to embrace agentic workflows, centralized MCP servers offer a practical solution for overcoming operational silos and inefficiencies. With scalable AWS infrastructure and advanced tools like Amazon Bedrock Agents and Amazon ECS, enterprises can accelerate their journey toward smarter workflows and better customer outcomes.

Check out the GitHub repository to replicate the solution in your own AWS environment.

To learn more about how to run MCP servers on AWS, refer to the following resources:

About the authors

Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan dedicates most of his free time to his family, where he lovingly takes direction from his two young daughters, aged one and four. You can find Xan on LinkedIn: https://www.linkedin.com/in/xanhuang/

Vikesh Pandey is a Principal GenAI/ML Specialist Solutions Architect at AWS helping large financial institutions adopt and scale generative AI and ML workloads. He is the author of the book “Generative AI for financial services.” He has more than a decade of experience building enterprise-grade applications on generative AI/ML and related technologies. In his spare time, he plays an unnamed sport with his son that lies somewhere between football and rugby.
