Accenture scales video analysis with Amazon Nova and Amazon Bedrock Agents

This post was written with Ilan Geller, Kamal Mannar, Debasmita Ghosh, and Nakul Aggarwal of Accenture. Video highlights offer a powerful way to boost audience engagement and extend content value for content publishers. These short, high-impact clips capture key moments that drive viewer retention, amplify reach across social media, reinforce brand identity, and open new […]

Jul 17, 2025 - 00:00
Accenture scales video analysis with Amazon Nova and Amazon Bedrock Agents

This post was written with Ilan Geller, Kamal Mannar, Debasmita Ghosh, and Nakul Aggarwal of Accenture.

Video highlights offer a powerful way to boost audience engagement and extend content value for content publishers. These short, high-impact clips capture key moments that drive viewer retention, amplify reach across social media, reinforce brand identity, and open new avenues for monetization. However, traditional highlight creation workflows are slow and labor-intensive. Editors must manually review footage, identify significant moments, cut clips, and add transitions or narration—followed by manual quality checks and formatting for distribution. Although this provides editorial control, it creates bottlenecks that don’t scale efficiently.

This post showcases how Accenture Spotlight delivers a scalable, cost-effective video highlight generation solution using Amazon Nova and Amazon Bedrock Agents. Amazon Nova foundation models (FMs) deliver frontier intelligence and industry-leading price-performance. With Spotlight, content owners can configure AI models and agents to support diverse use cases across the media industry while offering a human-in-the-loop option for quality assurance and collaborative refinement. This maintains accuracy, editorial oversight, and alignment with brand guidelines—without compromising on speed or scalability.

Real-world use cases

Spotlight has been applied across a range of industry scenarios, including:

  • Personalized short-form video generation – Spotlight’s specialized agents analyze popular short-form content (such as video reels and other social media) to identify patterns of high-performing content. The agents then apply this understanding to long-form video to generate personalized short clips, with built-in checks for brand alignment and content standards.
  • Sports editing and highlights – Spotlight automates creation of video highlights for sports like soccer, Formula 1, and rugby, tailoring them to specific user preferences and interests. It also validates each highlight’s quality and accuracy, streamlining editorial workflows as a result.
  • Content matching for stakeholders – Using enriched metadata, Spotlight matches archived or live video content to audience demographics, optimizing distribution strategies and maximizing advertiser value through precise targeting.
  • Real-time retail offer generation – In retail environments such as gas stations, Spotlight processes live CCTV footage to infer customer profiles using data (such as vehicle type or transaction history), and then dynamically generates personalized product offers. These offers consider contextual factors such as time of day and weather; and they are delivered with custom visuals in near real time.

Spotlight’s architecture

Spotlight’s architecture addresses the challenge of scalable video processing, efficiently analyzing and generating content while maintaining speed and quality. It incorporates both task-specific models and Amazon Nova FMs that are orchestrated by specialized Amazon Bedrock agents. Key architectural highlights include:

  • Task-driven model selection – Spotlight dynamically selects between traditional AI models and Amazon Nova FMs based on a given task’s complexity and latency requirements. This intelligent orchestration enables fast inference for time-sensitive operations while deploying deeper multimodal reasoning where sophisticated analysis is needed—balancing speed and intelligence across applications from real-time retail offers to complex video processing.
  • Agent orchestration – Specialized agents, each purpose-built for specific analysis tasks, operate across the end-to-end workflow under the direction of a central orchestrator agent. The orchestrator agent manages task breakdown, data flow, and inter-agent communication.
  • Scalable and adaptable – By using AWS capabilities, Spotlight’s architecture is configurable to support different workloads—from high-throughput video highlight generation to low-latency offer personalization at the edge.

Spotlight uses a multi-layered agent workflow to automate video processing and generation while maintaining quality control. For example, to generate dynamic video highlights, Spotlight uses three specialized “super agents” that work in coordination under a central orchestrator agent’s supervision. Each super agent is powered by Amazon Nova models, and is supported by a collection of utility agents (see the following diagram). These agents work together to understand video content, generate high-quality highlights, and maintain alignment with user requirements and brand standards.

The workflow consists of the following super agents and utility agents:

  • Video processing agent – This agent analyzes long-form video and generates detailed metadata to guide short-form video creation. It uses the following utility agents:
    • Research agent – Analyzes popular short-form videos to identify key components that create video virality, and creates recipes for successful short-form content. For example, in music videos, it can highlight choreographed dance sequences with the lead performer as essential segments and a recipe based on this insight.
    • Visual analysis agent – Applies the research agent’s findings to new long-form content. It identifies matching segments, tags key individuals, and timestamps relevant moments. It uses traditional AI models (such as person recognition and tracking) to capture fine-grained details for segment identification.
    • Audio analysis agent – Performs speech diarization and transcription to support both the research and visual analysis agents with deeper context from the video’s audio track.
  • Short video generation agent – This agent orchestrates the actual creation of the short-form video by integrating relevant segments and refining the sequence. Its utility agents include:
    • Section of interest (SOI) agent – Identifies potential segments based on video genre, target length, featured performers, and JSON metadata from the visual analysis agent. This agent prioritizes logical flow and viewer engagement.
    • Video generation agent – Constructs video using segment recommendations and component patterns from the video processing agent. For example, influencer videos might follow a structure of an attention-grabbing hook, key messages, and a call to action. The process will be iteratively improved based on feedback from the reviewer agent.
    • Video postprocessing agent – Refines the final output for publishing by performing tasks like cropping to mobile-friendly aspect ratios, or adding subtitles, background music, and brand overlays.
  • Reviewer agent – This agent works iteratively with the generation agent to maintain video quality and relevance. Its utility agents include:
    • Relevance check agent – Evaluates alignment with user-defined content guidelines, audience expectations, and desired themes.
    • Abruptness check agent – Provides smooth transitions between segments to avoid jarring cuts, enhancing viewer experience and professionalism.

See Spotlight in action:

Solution overview

To interact with Spotlight, users access a frontend UI where they provide natural language input to specify their objective. Spotlight then employs its agentic workflow powered by Amazon Nova to achieve its given task. The following diagram illustrates the solution architecture for video highlight generation.

The workflow consists of the following key components (as numbered in the preceding diagram):

  1. Frontend UI for user interaction:
    1. Users interact through a web portal secured by Amazon Cognito authentication and delivered using Amazon CloudFront.
    2. Amazon API Gateway serves a restful endpoint for video processing and highlight generation services.
  2. Live video stream processing:
    1. AWS Elemental MediaLive processes incoming video stream and triggers AWS Lambda to initiate workflows. (Spotlight also accepts video archive content as media files for processing and highlight generation.)
  3. Video processing workflow orchestrated with AWS Step Functions:
    1. Open source models hosted on Amazon SageMaker enable speech analysis and computer vision for person and object detection.
    2. The video processing agent powered by Amazon Nova Pro analyzes video and generates fine-grained metadata (for example, identifying patterns from viral videos).
    3. The reviewer agent powered by Amazon Nova Premier maintains alignment with brand standards.
    4. Open source utility tooling is used for pre-analysis tasks.
  4. Highlight generation workflow orchestrated with Step Functions:
    1. Amazon Nova Pro analyzes the user query for clips of interest to understand intent, and reformulates the query for downstream processing.
    2. The short video generation agent powered by Amazon Nova Pro constructs a video highlight using segment recommendations.
    3. The reviewer agent powered by Amazon Nova Premier makes sure the constructed highlight aligns with quality, brand, and contextual expectations.
    4. AWS Elemental Media Convert and open source tooling enable video highlight construction and postprocessing (such as subtitle layover, aspect ratio change, and transitions).
  5. Storage and monitoring:
    1. Amazon Simple Storage Service (Amazon S3) stores metadata extracted from processing workflows, reference content (such as scripts and brand guidelines), and generated outputs.
    2. Amazon CloudWatch maintains end-to-end system health and monitors performance.

Key benefits

Spotlight’s approach to video processing and generation creates dynamic value. Additionally, its technical design using Amazon Nova and an integrated agentic workflow helps content owners realize gains in their video processing and editorial operations. Key benefits for Spotlight include:

  • Cross-industry application – Spotlight’s modular design allows it to be applied seamlessly across industries—from media and entertainment to retail
  • Real-time processing – It supports both live stream feeds and pre-recorded video, with custom highlight generation happening in minutes, reducing from hours or days
  • Cost-efficient deployment – It is entirely serverless and on-demand, minimizing idle infrastructure costs and maximizing utilization
  • Efficiency – Accenture’s review of costs using Amazon Nova models showed that Amazon Nova-powered agents deliver over 10 times better cost savings over traditional highlight creation methods

The following table provides is a comparative analysis of Spotlight’s video processing approach to conventional approaches for video highlight creation.

Metric Spotlight Performance Conventional Approach
Video Processing Latency Minutes for 2–3-hour sessions Hours to days
Highlight Review Cost (3–5 minutes) 10 times lower with Amazon Nova High cost using conventional approaches
Overall Highlight Generation Cost 10 times lower using serverless and on-demand LLM deployment Manual workflows with high operational overhead
Deployment Architecture Fully serverless with scalable LLM invocation Typically resource-heavy and statically provisioned
Use Case Flexibility Sports, media editing, retail personalization, and more Often tailored to a single use case

Conclusion

Spotlight represents a cutting-edge agentic solution designed to tackle complex media processing and customer personalization challenges using generative AI. With modular, multi-agent workflows built on Amazon Nova, Spotlight seamlessly enables dynamic short-form video generation. The solution’s core framework is also extensible to diverse industry use cases that require multimodal content analysis at scale.

As an AWS Premier Tier Services Partner and Managed Services Provider (MSP), Accenture brings deep cloud and industry expertise. Accenture and AWS have worked together for more than a decade to help organizations realize value from their applications and data. Accenture brings its industry understanding and generative AI specialists to build and adapt generative AI solutions to client needs. Together with AWS, through the Accenture AWS Business Group (AABG), we help enterprises unlock business value by rapidly scaling generative AI solutions tailored to their needs—driving innovation and transformation in the cloud.

Try out Spotlight for your own use case, and share your feedback in the comments.


About the authors

Ilan Geller is a Managing Director in the Data and AI practice at Accenture. He is the Global AWS Partner Lead for Data and AI and the Center for Advanced AI. His roles at Accenture have primarily been focused on the design, development, and delivery of complex data, AI/ML, and most recently Generative AI solutions.

Dr. Kamal Mannar is a Global Computer Vision Lead at Accenture’s Center for Advanced AI, with over 20 years of experience applying AI across industries like agriculture, healthcare, energy, and telecom. He has led large-scale AI transformations, built scalable GenAI and computer vision solutions, and holds 10+ patents in areas including deep learning, wearable AI, and vision transformers. Previously, he headed AI at Vulcan AI, driving cutting-edge innovation in precision agriculture. Kamal holds a Ph.D. in Industrial & Systems Engineering from the University of Wisconsin–Madison.

Debasmita Ghosh is working as Associate Director in Accenture with 21 years of experience in Information Technology (8 years in AI/Gen AI capability), who currently among multiple responsibilities leads Computer Vision practice in India. She has presented her paper on Handwritten Text Recognition in multiple conferences including MCPR 2020, GHCI 2020. She has patent granted on Handwritten Text Recognition solution and received recognition from Accenture under the Accenture Inventor Award Program being named as an inventor on a granted patent. She has multiple papers on Computer Visions solutions like Table Extraction including non-uniform and borderless tables accepted and presented in the ComPE 2021 and CCVPR 2021 international conferences. She has managed projects across multiple technologies (Oracle Apps, SAP). As a programmer, she has worked during various phases of SDLC with experience on Oracle Apps Development across CRM, Procurement, Receivables, SCM, SAP Professional Services, SAP CRM. Debasmita holds M.Sc. in Statistics from Calcutta University.

Nakul Aggarwal is a Subject Matter Expert in Computer Vision and Generative AI at Accenture, with around 7 years of experience in developing and delivering cutting-edge solutions across computer vision, multimodal AI, and agentic systems. He holds a Master’s degree from the Indian Institute of Technology (IIT) Delhi and has authored several research papers presented at international conferences. He holds two patents in AI and currently leads multiple projects focused on multimodal and agentic AI. Beyond technical delivery, he plays a key role in mentoring teams and driving innovation by bridging advanced research with real-world enterprise applications.

Aramide Kehinde is Global Partner Solutions Architect for Amazon Nova at AWS. She works with high growth companies to build and deliver forward thinking technology solutions using AWS Generative AI. Her experience spans multiple industries, including Media & Entertainment, Financial Services, and Healthcare. Aramide enjoys building the intersection of AI and creative arenas and spending time with her family.

Rajdeep Banerjee is a Senior Partner Solutions Architect at AWS helping strategic partners and clients in the AWS cloud migration and digital transformation journey. Rajdeep focuses on working with partners to provide technical guidance on AWS, collaborate with them to understand their technical requirements, and designing solutions to meet their specific needs. He is a member of Serverless technical field community. Rajdeep is based out of Richmond, Virginia.

Jat AI Stay informed with the latest in artificial intelligence. Jat AI News Portal is your go-to source for AI trends, breakthroughs, and industry analysis. Connect with the community of technologists and business professionals shaping the future.