From beginner to champion: A student’s journey through the AWS AI League ASEAN finals


The AWS AI League, launched by Amazon Web Services (AWS), expanded its reach to the Association of Southeast Asian Nations (ASEAN) last year, welcoming student participants from Singapore, Indonesia, Malaysia, Thailand, Vietnam, and the Philippines. The goal was to introduce students of all backgrounds and experience levels to the exciting world of generative AI through a gamified, hands-on challenge focused on fine-tuning large language models (LLMs).

In this blog post, you’ll hear directly from the AWS AI League champion, Blix D. Foryasen, as he shares his reflection on the challenges, breakthroughs, and key lessons discovered throughout the competition.

Behind the competition

The AWS AI League competition began with a tutorial session led by the AWS team and the Gen-C Generative AI Learning Community, featuring two powerful, user-friendly services: Amazon SageMaker JumpStart and PartyRock.

  • SageMaker JumpStart enabled participants to run the LLM fine-tuning process in a cloud-based environment, offering flexibility to adjust hyperparameters and optimize performance.
  • PartyRock, powered by Amazon Bedrock, provided an intuitive playground and interface to curate the dataset used in fine-tuning a Llama 3.2 3B Instruct model. Amazon Bedrock offers a comprehensive selection of high-performing foundation models from leading AI companies, including Anthropic Claude, Meta Llama, Mistral, and more, all accessible through a single API.

With the goal of outperforming a larger LLM reference model in a quiz-based evaluation, participants engaged with three core domains of generative AI: foundation models, responsible AI, and prompt engineering. The preliminary round featured an open leaderboard ranking the best-performing fine-tuned models from across the region. Each submitted model was tested against a larger baseline LLM using an automated, quiz-style evaluation of generative AI-related questions. The evaluation, conducted by an undisclosed LLM judge, prioritized both accuracy and comprehensiveness. A model’s win rate improved each time it outperformed the baseline LLM.

Beyond its technical nature, the challenge also required strategic planning. Participants had to maximize their limited training hours on SageMaker JumpStart while carefully managing a restricted number of leaderboard submissions. Initially capped at 5 hours, the training limit was later expanded to 30 hours in response to community feedback. Submission count also influenced tiebreakers for finalist selection.
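
The exact evaluation pipeline and judge model were never disclosed, but conceptually the leaderboard metric reduces to a pairwise, LLM-judged win rate. The sketch below is only an illustration of that idea; judge_prefers is a hypothetical placeholder, not the organizers’ actual implementation.

# Conceptual sketch of the leaderboard metric: a pairwise win rate decided by an
# LLM judge. judge_prefers() is a hypothetical placeholder, not the real pipeline.

def judge_prefers(question: str, candidate_answer: str, baseline_answer: str) -> bool:
    """Hypothetical call to a judge LLM that returns True when the fine-tuned
    model's answer beats the larger baseline on accuracy and comprehensiveness."""
    raise NotImplementedError("Replace with an actual LLM-as-a-judge call.")

def win_rate(questions, candidate_answers, baseline_answers) -> float:
    wins = sum(
        judge_prefers(q, c, b)
        for q, c, b in zip(questions, candidate_answers, baseline_answers)
    )
    return wins / len(questions)  # e.g. 0.53 corresponds to a 53% leaderboard win rate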

The top tuner from each country advanced to the Regional Grand Finale, held on May 29, 2025, in Singapore. There, finalists competed head-to-head, each presenting their fine-tuned model’s responses to a new set of questions. Final scores were determined by a weighted judging system:

  • 40% by an LLM-as-a-judge
  • 40% by experts
  • 20% by a live audience

A pragmatic approach to fine-tuning

Before diving into the technical details, a quick disclaimer: the approaches shared in the following sections are largely experimental and born from trial and error. They’re not necessarily the optimal methods for fine-tuning, nor do they represent a definitive guide. Other finalists took different approaches, shaped by their different technical backgrounds. What ultimately helped me succeed wasn’t just technical precision, but collaboration, resourcefulness, and a willingness to explore how the competition might unfold based on insights from previous iterations. I hope this account can serve as a baseline or inspiration for future participants navigating similar constraints. Even if you’re starting from scratch, as I did, there’s real value in being strategic, curious, and community-driven.

One of the biggest hurdles I faced was time, or the lack of it. Because of a late confirmation of my participation, I joined the competition 2 weeks after it had already begun. That left me with only 2 weeks to plan, train, and iterate. Given the tight timeline and limited compute hours on SageMaker JumpStart, I knew I had to make every training session count. Rather than attempting exhaustive experiments, I focused my efforts on curating a strong dataset and tweaking select hyperparameters. Along the way, I drew inspiration from academic papers and existing approaches in LLM fine-tuning, adjusting what I could within the constraints.

Crafting synthetic brilliance

As mentioned earlier, one of the key learning sessions at the start of the competition introduced participants to SageMaker JumpStart and PartyRock, tools that make fine-tuning and synthetic data generation both accessible and intuitive. In particular, PartyRock allowed us to clone and customize apps to control how synthetic datasets were generated. We could tweak parameters such as the prompt structure, creativity level (temperature), and token sampling strategy (top-p). PartyRock also gave us access to a wide range of foundation models.

From the start, I opted to generate my datasets using Claude 3.5 Sonnet, aiming for broad and balanced coverage across all three core sub-domains of the competition. To minimize bias and ensure fair representation across topics, I curated multiple dataset versions, each ranging from 1,500 to 12,000 Q&A pairs, carefully maintaining balanced distributions across sub-domains. The following are a few example themes that I focused on, with a small balancing sketch after the list:

  • Prompt engineering: Zero-shot prompting, chain-of-thought (CoT) prompting, evaluating prompt effectiveness
  • Foundation models: Transformer architectures, distinctions between pretraining and fine-tuning
  • Responsible AI: Dataset bias, representation fairness, and data protection in AI systems
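
To make the balancing idea concrete, here is a minimal sketch of how evenly splitting a target row count across the three sub-domains might look; generate_pairs is a hypothetical stand-in for the PartyRock/Bedrock generation step, not the app I actually used.

# Minimal sketch of keeping a synthetic dataset balanced across the three
# sub-domains. generate_pairs() is a hypothetical stand-in for the actual
# PartyRock/Bedrock generation step.
import json
import random

DOMAINS = ["prompt_engineering", "foundation_models", "responsible_ai"]

def generate_pairs(domain: str, n: int) -> list[dict]:
    """Hypothetical: return n {'instruction': ..., 'response': ...} pairs for a domain."""
    raise NotImplementedError

def build_balanced_dataset(total_rows: int, path: str = "train.jsonl") -> None:
    per_domain = total_rows // len(DOMAINS)            # e.g. 1,500 rows -> 500 per domain
    rows = [pair for d in DOMAINS for pair in generate_pairs(d, per_domain)]
    random.shuffle(rows)                               # avoid domain-ordered training batches
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")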

To maintain data quality, I fine-tuned the dataset generator to emphasize factual accuracy, uniqueness, and applied knowledge. Each generation batch consisted of 10 Q&A pairs, with prompts specifically designed to encourage depth and clarity.

Question prompt:

You are a quiz master in an AI competition preparing a set of challenging quiz bee questions about [Topic to generate]. The purpose of these questions is to determine the better LLM between a fine-tuned LLaMA 3.2 3B Instruct and larger LLMs. Generate [Number of data rows to generate] questions on [Topic to generate], covering:
	* Basic Questions (1/3) → Direct Q&A without reasoning. Must require a clear explanation, example, or real-world application. Avoid one-word fact-based questions.
	* Hybrid Questions (1/3) → Requires a short analytical breakdown (e.g., comparisons, trade-offs, weaknesses, implications). Prioritize scenario-based or real-world dilemma questions.
	* Chain-of-thought (CoT) Questions (1/3) → Requires multi-step logical deductions. Focus on evaluating existing AI methods, identifying risks, and critiquing trade-offs. Avoid open-ended "Design/Propose/Create" questions. Instead, use "Compare, Evaluate, Critique, Assess, Analyze, What are the trade-offs of…" 

Ensure the questions on [Topic to generate]: 
	* Are specific, non-trivial, and informative.
	* Avoid overly simple questions (e.g., mere definitions or fact-based queries).
	* Encourage applied reasoning (i.e., linking theoretical concepts to real-world AI challenges).

Answer prompt:

You are an AI expert specializing in generative AI, foundation models, agentic AI, prompt engineering, and responsible AI. Your task is to generate well-structured, logically reasoned responses to a list of [Questions], ensuring that all responses follow a chain-of-thought (CoT) approach, regardless of complexity, and formatted in valid JSONL. Here are the answering guidelines: 
	* Every response must be comprehensive, factually accurate, and well-reasoned.
 	* Every response must use a step-by-step logical breakdown, even for seemingly direct questions.
For all questions, use structured reasoning:
	* For basic Questions, use a concise yet structured explanation. Simple Q&As should still follow CoT reasoning, explaining why the answer is correct rather than just stating facts.
	* For hybrid and CoT questions, use Chain of Thought and analyze the problem logically before providing a concluding statement.
	* If applicable, use real-world examples or research references to enhance explanations.
	* If applicable, include trade-offs between different AI techniques.
	* Draw logical connections between subtopics to reinforce deep understanding.

Answering prompt examples:

	* Basic question (direct Q&A without reasoning) → Use concise yet comprehensive, structured responses that provide a clear, well-explained, and well-structured definition and explanation without unnecessary verbosity.
	* Hybrid question (short analytical breakdown) → Highlight key points step-by-step in a few comprehensive sentences, covering comparisons, trade-offs, and applications.
	* Complex CoT question (multi-step reasoning) → Use CoT naturally, solving each step explicitly, with in-depth reasoning.

For question generation, I set the temperature to 0.7, favoring creative and novel phrasing without drifting too far from factual grounding. For answer generation, I used a lower temperature of 0.2, targeting precision and correctness. In both cases, I applied top-p = 0.9, allowing the model to sample from a focused yet diverse range of likely tokens, encouraging nuanced outputs.

One important strategic assumption I made throughout the competition was that the evaluator LLM would prefer more structured, informative, and complete responses over overly creative or brief ones. To align with this, I included reasoning steps in my answers to make them longer and more comprehensive. Research has shown that LLM-based evaluators often score detailed, well-explained answers higher, and I leaned into that insight during dataset generation.
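
PartyRock handled all of this through its UI, but for readers who prefer an API view, the sketch below shows roughly how the same sampling settings could be applied to Claude 3.5 Sonnet through the Amazon Bedrock Converse API. The model ID, region, and prompt variables are assumptions on my part, not a record of the competition setup.

# Rough API analogue of the PartyRock setup: Claude 3.5 Sonnet on Amazon Bedrock
# with the sampling settings described above (model ID and region are assumptions).
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate(prompt: str, temperature: float) -> str:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": temperature, "topP": 0.9, "maxTokens": 2048},
    )
    return response["output"]["message"]["content"][0]["text"]

question_prompt = "..."  # the question prompt shown earlier
answer_prompt = "..."    # the answer prompt shown earlier

questions = generate(question_prompt, temperature=0.7)  # creative phrasing
answers = generate(answer_prompt, temperature=0.2)      # precision and correctness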

Refining the submissions

SageMaker JumpStart offers a wide array of hyperparameters to configure, which can feel overwhelming, especially when you’re racing against time and unsure of what to prioritize. Fortunately, the organizers emphasized focusing primarily on epochs and learning rate, so I homed in on those variables. Each training job with a single epoch took approximately 10–15 minutes, making time management critical. To avoid wasting valuable compute hours, I began with a baseline dataset of 1,500 rows to test combinations of epochs and learning rates (a minimal sketch of a single run follows the list). I explored:

  • Epochs: 1 to 4
  • Learning rates: 0.0001, 0.0002, 0.0003, and 0.0004
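
For readers curious what one of these runs looks like outside the competition console, here is a rough sketch using the SageMaker Python SDK. The JumpStart model ID, the exact hyperparameter names, and the S3 path are assumptions on my part, since the competition environment pre-wired much of this.

# Sketch of one fine-tuning run with the SageMaker Python SDK. The JumpStart model
# ID, hyperparameter names, and S3 path are assumptions, not the competition setup.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-2-3b-instruct",  # assumed JumpStart model ID
    environment={"accept_eula": "true"},
)
estimator.set_hyperparameters(
    epoch="1",                 # one point in the grid above
    learning_rate="0.0001",
    instruction_tuned="True",  # assumed flag for instruction-format datasets
)
estimator.fit({"training": "s3://my-bucket/ai-league/train/"})  # train.jsonl (+ template.json)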

After multiple iterations, the combination of two epochs and a learning rate of 0.0003 yielded the best result, achieving a 53% win rate on my 13th leaderboard submission. Encouraged by this, I continued using this combination for several subsequent experiments, even as I expanded my dataset. Initially, this strategy appeared to work. With a dataset of approximately 3,500 rows, my model reached a 57% win rate by my 16th submission. However, as I further increased the dataset to 5,500, 6,700, 8,500, and eventually 12,000 rows, my win rate steadily declined to 53%, 51%, 45%, and 42%, respectively.

At that point, it was clear that increasing dataset size alone wasn’t enough; in fact, it might have been counterproductive without revisiting the hyperparameters. With only 5 training hours remaining and 54 submissions logged, I found myself stuck at 57%, while peers like the top tuner from the Philippines were already reaching a 71% win rate.

Lessons from the field

With limited time left, both for training and leaderboard submissions, I turned to cross-country collaboration for support. One of the most insightful conversations I had was with Michael Ismail Febrian, the top tuner from Indonesia and the highest scorer in the elimination round. He encouraged me to explore LoRA (low-rank adaptation) hyperparameters, specifically:

  • lora_r
  • lora_alpha
  • target_modules

Michael also suggested enriching my dataset by using API-generated responses from more capable teacher models, especially for answering PartyRock-generated questions. Looking back at my existing fine-tuning pipeline, I realized a critical weakness: the generated answers were often too concise or shallow. Here’s an example of a typical Q&A pair from my earlier dataset:

Instruction: What factors influence model hallucination?

Response: Let's analyze this step by step:
	* Training factors:
		* Data quality and coverage
		* Training methodology
		* Model architecture
	* Deployment factors:
		* Prompt design
		* Temperature settings
		* Context window utilization
	* Mitigation strategies:
		* Retrieval augmentation
		* Fact-checking mechanisms
		* Confidence thresholds
		
Example: GPT-3 showing lower hallucination rates (15% vs 21%) when using retrieval-augmented generation compared to base model.

While this structure is clean and organized, it lacked deeper explanation for each point, something models like ChatGPT and Gemini typically do well. I suspect this limitation came from token constraints when generating multiple responses in bulk. In my case, I generated 10 responses at a time in JSONL format under a single prompt, which might have led PartyRock to truncate outputs.

Not wanting to spend on paid APIs, I discovered OpenRouter.ai, which offers access to large models, albeit rate-limited. With a cap of roughly 200 Q&A pairs per day per account, I got creative and created multiple accounts to support my expanded dataset. My teacher model of choice was DeepSeek R1, a popular option known for its effectiveness in training smaller, specialized models. It was a bit of a gamble, but one that paid off in terms of output quality.
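
For context, this is roughly what that teacher-model loop looked like. OpenRouter exposes an OpenAI-compatible endpoint; the model slug, system prompt, and surrounding details below are assumptions rather than an exact record of my script.

# Sketch of generating "teacher" answers with DeepSeek R1 via OpenRouter's
# OpenAI-compatible API (model slug and surrounding details are assumptions).
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def teacher_answer(question: str) -> str:
    completion = client.chat.completions.create(
        model="deepseek/deepseek-r1",  # assumed slug; a ":free" variant may also exist
        messages=[
            {"role": "system", "content": "You are an AI expert. Reason step by step."},
            {"role": "user", "content": question},
        ],
        temperature=0.2,
    )
    return completion.choices[0].message.content

partyrock_questions = ["..."]  # questions generated earlier in PartyRock
with open("train.jsonl", "a") as f:
    for q in partyrock_questions:
        f.write(json.dumps({"instruction": q, "response": teacher_answer(q)}) + "\n")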

As for LoRA tuning, here’s what I learned (a conceptual sketch follows the list):

  • lora_r and lora_alpha determine how much capacity the adapter has to absorb new, more complex information and how strongly its updates are weighted. A common rule of thumb is setting lora_alpha to 1x or 2x of lora_r.
  • target_modules defines which parts of the model are updated, often the attention layers or the feed-forward network.
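
As a conceptual illustration only (JumpStart exposes these as plain hyperparameters, so I never wrote this code during the competition), the same three knobs map onto a Hugging Face PEFT configuration like this, with illustrative values:

# Conceptual mapping of lora_r, lora_alpha, and target_modules onto a Hugging Face
# PEFT LoraConfig; illustrative values, not the competition pipeline.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                       # lora_r: adapter rank, i.e. how much capacity it has
    lora_alpha=128,             # lora_alpha: scaling factor, often 1x-2x of r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # which layers get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)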

I also consulted Kim, the top tuner from Vietnam, who flagged my 0.0003 learning rate as potentially too high. He, along with Michael, suggested a different strategy: increase the number of epochs and reduce the learning rate. This would allow the model to better capture complex relationships and subtle patterns, especially as dataset size grows.

Our conversations underscored a hard-learned truth: data quality is more important than data quantity. There’s a point of diminishing returns when increasing dataset size without adjusting hyperparameters or validating quality, something I directly experienced. In hindsight, I realized I had underestimated how vital fine-grained hyperparameter tuning is, especially when scaling data. More data demands more precise tuning to match the growing complexity of what the model needs to learn.

Last-minute gambits

Armed with fresh insights from my collaborators and hard-won lessons from previous iterations, I knew it was time to pivot my entire fine-tuning pipeline. The most significant change was in how I generated my dataset. Instead of using PartyRock to produce both questions and answers, I opted to generate only the questions in PartyRock, then feed those prompts into the DeepSeek-R1 API to generate high-quality responses. Each answer was saved in JSONL format and, crucially, included detailed reasoning. This shift significantly increased the depth and length of each answer, averaging around 900 tokens per response, compared to the much shorter outputs from PartyRock.

Given that my earlier dataset of approximately 1,500 high-quality rows produced promising results, I stuck with that size for my final dataset. Rather than scale up in quantity, I doubled down on quality and complexity. For this final round, I made bold, blind tweaks to my hyperparameters (sketched after the list):

  • Dropped the learning rate to 0.00008
  • Increased the LoRA parameters:
    • lora_r = 256
    • lora_alpha = 256
  • Expanded LoRA target modules to cover both attention and feed-forward layers:
    q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
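
Mapped onto the same estimator sketch from earlier (hyperparameter names again assumed rather than confirmed), the final configuration looked roughly like this:

# Final run, reusing the estimator from the earlier sketch. Hyperparameter names
# and the S3 path are assumptions; epochs 3 and 4 were split across the two runs.
estimator.set_hyperparameters(
    epoch="4",
    learning_rate="0.00008",
    lora_r="256",
    lora_alpha="256",
    target_modules="q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj",
)
estimator.fit({"training": "s3://my-bucket/ai-league/train-deepseek/"})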

These changes were made with one assumption: longer, more complex answers require more capacity to absorb and generalize nuanced patterns. I hoped that these settings would enable the model to fully use the high-quality, reasoning-rich data from DeepSeek-R1.

With only 5 hours of training time remaining, I had just enough for two full training runs, each using different epoch settings (3 and 4). It was a make-or-break moment. If the first run underperformed, I had one last chance to redeem it. Thankfully, my first test run achieved a 65% win rate, a massive improvement, but still behind the current leader from the Philippines and trailing Michael’s impressive 89%.

Everything now hinged on my final training job. It had to run smoothly, avoid errors, and outperform everything I had tried before. And it did. That final submission achieved a 77% win rate, pushing me to the top of the leaderboard and securing my slot for the Grand Finale. After weeks of experimentation, sleepless nights, setbacks, and late-game adjustments, the journey, from a two-week-late entrant to national champion, was complete.

What I wish I had known sooner

I won’t pretend that my success in the elimination round was purely technical—luck played a big part. Still, the journey revealed several insights that could save future participants valuable time, training hours, and submissions. Here are some key takeaways I wish I had known from the start:

  • Quality is more important than quantity: More data doesn’t always mean better results. Whether you’re adding rows or increasing context length, you’re also increasing the complexity that the model must learn from. Focus on crafting high-quality, well-structured examples rather than blindly scaling up.
  • Fast learner versus slow learner: If you’re avoiding deep dives into LoRA or other advanced tweaks, understanding the trade-off between learning rate and epochs is essential. A higher learning rate with fewer epochs might converge faster, but could miss the subtle patterns captured by a lower learning rate over more epochs. Choose carefully based on your data’s complexity.
  • Don’t neglect hyperparameters: One of my biggest missteps was treating hyperparameters as static, regardless of changes in dataset size or complexity. As your data evolves, your model settings should too. Hyperparameters should scale with your data.
  • Do your homework: Avoid excessive guesswork by reading relevant research papers, documentation, or blog posts. Late in the competition, I stumbled upon helpful resources that I could have used to make better decisions earlier. A little reading can go a long way.
  • Track everything: When experimenting, it’s easy to forget what worked and what didn’t. Maintain a log of your datasets, hyperparameter combinations, and performance outcomes (a minimal sketch follows this list). This helps optimize your runs and aids in debugging.
  • Collaboration is a superpower: While it’s a competition, it’s also a chance to learn. Connecting with other participants, whether they’re ahead or behind, gave me invaluable insights. You might not always walk away with a trophy, but you’ll leave with knowledge, relationships, and real growth.
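
On the tracking point, even something as small as the sketch below would have saved me from re-running configurations I had already tried. The file name, columns, and values are my own illustrative choices, not a record of my actual log.

# Minimal run log for the "Track everything" takeaway: one CSV row per training job.
import csv
from datetime import datetime

def log_run(path, dataset_rows, epochs, learning_rate, lora_r, win_rate):
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now().isoformat(), dataset_rows, epochs, learning_rate, lora_r, win_rate]
        )

# Illustrative values only
log_run("runs.csv", dataset_rows=1500, epochs=2, learning_rate=0.0003, lora_r=8, win_rate=0.53)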

Grand Finale

The Grand Finale took place on the second day of the National AI Student Challenge, serving as the culmination of weeks of experimentation, strategy, and collaboration. Before the final showdown, all national champions had the opportunity to engage in the AI Student Developer Conference, where we shared insights, exchanged lessons, and built connections with fellow finalists from across the ASEAN region.

During our conversations, I was struck by how remarkably similar many of our fine-tuning strategies were. Across the board, participants had used a mix of external APIs, dataset curation techniques, and cloud-based training systems like SageMaker JumpStart. It became clear that tool selection and creative problem-solving played just as big a role as raw technical knowledge. One particularly eye-opening insight came from a finalist who achieved an 85% win rate despite using a large dataset, something I had initially assumed might hurt performance. Their secret was training over a higher number of epochs while maintaining a lower learning rate of 0.0001. However, this came at the cost of longer training times and fewer leaderboard submissions, which highlights an important trade-off:

With enough training time, a carefully tuned model, even one trained on a large dataset, can outperform faster, leaner models.

This reinforced a powerful lesson: there’s no single correct approach to fine-tuning LLMs. What matters most is how well your strategy aligns with the time, tools, and constraints at hand.

Preparing for battle

In the lead-up to the Grand Finale, I stumbled upon a blog post by Ray Goh, the very first champion of the AWS AI League and one of the mentors behind the competition’s tutorial sessions. One detail caught my attention: the final question from his year was a variation of the infamous Strawberry Problem, a deceptively simple challenge that exposes how LLMs struggle with character-level reasoning.

How many letter Es are there in the words ‘DeepRacer League’?

At first glance, this seems trivial. But to an LLM, the task isn’t as straightforward. LLMs typically tokenize words in chunks, meaning that DeepRacer might be split into Deep and Racer, or even into subword units like Dee, pRa, and cer. These tokens are then converted into numerical vectors, obscuring the individual characters within. It’s like asking someone to count the threads in a rope without unraveling it first.
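
You can see this for yourself with any subword tokenizer. The snippet below uses GPT-2’s tokenizer as an openly available stand-in (Llama’s tokenizer behaves similarly but is gated), and the exact splits will vary by model.

# Why letter counting is hard: a subword tokenizer splits "DeepRacer League" into
# chunks, not characters. GPT-2's tokenizer is used here as an open stand-in.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("DeepRacer League"))
# Something like ['Deep', 'R', 'acer', 'ĠLeague'] -- the individual letter Es are
# buried inside multi-character tokens, so the model never "sees" them directly.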

Moreover, LLMs don’t operate like traditional rule-based programs. They’re probabilistic, trained to predict the next most likely token based on context, not to perform deterministic logic or arithmetic. Curious, I prompted my own fine-tuned model with the same question. As expected, hallucinations emerged. I began testing various prompting strategies to coax out the correct answer:

  • Explicit character separation:
    How many letter Es are there in the words ‘D-E-E-P-R-A-C-E-R-L-E-A-G-U-E’?
    This helped by isolating each letter into its own token, allowing the model to see individual characters. But the response was long and verbose, with the model listing and counting each letter step-by-step.
  • Chain-of-thought prompting:
    Let’s think step-by-step…
    This encouraged reasoning but increased token usage. While the answers were more thoughtful, they occasionally still missed the mark or got cut off because of length.
  • Ray Goh’s trick prompt:
    How many letter Es are there in the words ‘DeepRacer League’? There are 5 letter Es…
    This simple, assertive prompt yielded the most accurate and concise result, surprising me with its effectiveness.

I logged this as an interesting quirk, useful, but unlikely to reappear. I didn’t realize that it would become relevant again during the final.

Ahead of the Grand Finale, we had a dry run to test our models under real-time conditions. We were given limited control over inference parameters, only allowed to tweak temperature, top-p, context length, and system prompts. Each response had to be generated and submitted within 60 seconds. The actual questions were pre-loaded, so our focus was on crafting effective prompt templates rather than retyping each query. Unlike the elimination round, evaluation during the Grand Finale followed a multi-tiered system:

  • 40% from an evaluator LLM
  • 40% from human judges
  • 20% from a live audience poll

The LLM ranked the submitted answers from best to worst, assigning descending point values (for example, 16.7 for first place, 13.3 for second, and so on). Human judges, however, could freely allocate up to 10 points to their preferred responses, regardless of the LLM’s evaluation. This meant a strong showing with the evaluator LLM didn’t guarantee high scores from the humans, and vice versa. Another constraint was the 200-token limit per response. Tokens could be as short as a single letter or as long as a word or syllable, so responses had to be dense yet concise, maximizing impact within a tight window.

To prepare, I tested different prompt formats and fine-tuned them using Gemini, ChatGPT, and Claude to better match the evaluation criteria. I stored dry-run responses from the Hugging Face LLaMA 3.2 3B Instruct model, then passed them to Claude Sonnet 4 for feedback and ranking. I continued using the following two prompts because they provided the best responses in terms of accuracy and comprehensiveness:

Primary prompt:

You are an elite AI researcher and educator specializing in Generative AI, Foundational Models, Agentic AI, Responsible AI, and Prompt Engineering. Your task is to generate a highly accurate, comprehensive, and well-structured response to the question below in no more than 200 words.

Evaluation will be performed by Claude Sonnet 4, which prioritizes:
	* Factual Accuracy – All claims must be correct and verifiable. Avoid speculation.
	* Comprehensiveness – Cover all essential dimensions, including interrelated concepts or mechanisms.
	* Clarity & Structure – Use concise, well-organized sections (e.g., brief intro, bullet points, and/or transitions). Markdown formatting (headings/lists) is optional.
	* Efficiency – Every sentence must deliver unique insight. Avoid filler.
	* Tone – Maintain a professional, neutral, and objective tone.
	
Your response should be dense with value while remaining readable and precise.

Backup prompt:

You are a competitive AI practitioner with deep expertise in [Insert domain: e.g., Agentic AI or Prompt Engineering], answering a technical question evaluated by Claude Sonnet 4 for accuracy and comprehensiveness. You must respond in exactly 200 words.

Format your answer as follows: 
	* Direct Answer (1–2 sentences) – Immediately state the core conclusion or definition.
	* Key Technical Points (3–4 bullet points) – Essential mechanisms, distinctions, or principles.
	* Practical Application (1–2 sentences) – Specific real-world use cases or design implications.
	* Critical Insight (1 sentence) – Mention a key challenge, trade-off, or future direction.

Additional requirements:

  • Use precise technical language and terminology.
  • Include specific tools, frameworks, or metrics if relevant.
  • Every sentence must contribute uniquely—no redundancy.
  • Maintain a formal tone and answer density without over-compression.

In terms of inference hyperparameters, I used the following (a minimal sketch applying them appears after the list):

  • Top-p = 0.9
  • Max tokens = 200
  • Temperature = 0.2, to prioritize accuracy over creativity
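
The competition platform handled serving, but the same settings applied to the open-weights model through the transformers library would look roughly like the sketch below. Access to the model is gated, and the prompt variables are placeholders rather than the real finale inputs.

# Sketch of the finale inference settings applied to Llama 3.2 3B Instruct with
# transformers (model access is gated; prompt variables are placeholders).
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-3.2-3B-Instruct")

primary_prompt = "..."  # the primary system prompt shown earlier
question = "..."        # the pre-loaded finale question

messages = [
    {"role": "system", "content": primary_prompt},
    {"role": "user", "content": question},
]
output = generator(
    messages,
    max_new_tokens=200,  # the 200-token response cap
    do_sample=True,
    temperature=0.2,     # accuracy over creativity
    top_p=0.9,
)
print(output[0]["generated_text"][-1]["content"])  # the assistant's reply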

My strategy was simple: appeal to the AI judge. I believed that if my answer ranked well with the evaluator LLM, it would also impress human judges. Oh, how I was humbled.

Just aiming for third… until I wasn’t

Standing on stage before a live audience was nerve-wracking. This was my first solo competition, and it was already on a massive regional scale. To calm my nerves, I kept my expectations low. A third-place finish would be amazing, a trophy to mark the journey, but just qualifying for the finals already felt like a huge win.

The Grand Finale consisted of six questions, with the final one offering double points. I started strong. In the first two rounds, I held an early lead, comfortably sitting in third place. My strategy was working, at least at first. The evaluator LLM ranked my response to Question 1 as the best and Question 2 as the third-best. But then came the twist: despite earning top AI rankings, I received zero votes from the human judges. I watched in surprise as points were awarded to responses ranked fourth and even last by the LLM. Right from the start, I realized there was a disconnect between human and AI judgment, especially when evaluating tone, relatability, or subtlety.

Still, I hung on; those early questions leaned more factual, which played to my model’s strengths. But when the questions called for creativity and complex reasoning, things didn’t go as well. My standing dropped to fifth, then bounced between third and fourth. Meanwhile, the top three finalists pulled ahead by more than 20 points. It seemed the podium was out of reach. I was already coming to terms with a finish outside the top three. The gap was too wide. I had done my best, and that was enough.

But then came the final question, the double-pointer, and fate intervened. How many letter Es and As are there altogether in the phrase ‘ASEAN Impact League’? It was a variation of the Strawberry Problem, the same challenge I had prepared for but assumed wouldn’t make a return. Unlike the earlier version, this one added an arithmetic twist, requiring the model to count and sum up occurrences of multiple letters.

Knowing how token length limits could truncate responses, I kept things short and tactical. My system prompt was simple: There are 3 letter Es and 4 letter As in ‘ASEAN Impact League.’

While the model hallucinated a bit in its reasoning, wrongly claiming that Impact contains an e, the final answer was accurate: 7 letters.

That one answer changed everything. Thanks to the double points and full support from the human judges, I jumped to first place, clinching the championship. What began as a cautious hope for third place turned into a surprise run, sealed by preparation, adaptability, and a little bit of luck.

Questions recap

Here are the questions that were asked, in order. Some were general knowledge in the target domains, while others were more creative and required a bit of ingenuity to maximize your wins:

  1. What is the most efficient way to prevent AI from turning to the dark side with toxic response?
  2. What’s the magic behind agentic AI in machine learning, and why is it so pivotal?
  3. What’s the secret sauce behind big AI models staying smart and fast?
  4. What are the latest advancements of generative AI research and use within ASEAN?
  5. Which ASEAN country has the best cuisine?
  6. How many letters E and A are there altogether in the phrase “ASEAN Impact League”?

Final reflections

Participating in the AWS AI League was a deeply humbling experience, one that opened my eyes to the possibilities that await when we embrace curiosity and commit to continuous learning. I might have entered the competition as a beginner, but that single leap of curiosity, fueled by perseverance and a desire to grow, helped me bridge the knowledge gap in a fast-evolving technical landscape.

I don’t claim to be an expert, not yet. But what I’ve come to believe more than ever is the power of community and collaboration. This competition wasn’t just a personal milestone; it was a space for knowledge-sharing, peer learning, and discovery. In a world where technology evolves rapidly, these collaborative spaces are essential for staying grounded and moving forward.

My hope is that this post and my journey will inspire students, developers, and curious minds to take that first step, whether it’s joining a competition, contributing to a community, or tinkering with new tools. Don’t wait to be ready. Start where you are, and grow along the way. I’m excited to connect with more passionate individuals in the global AI community. If another LLM League comes around, maybe I’ll see you there.

Conclusion

As we conclude this insight into Blix’s journey to becoming the AWS AI League ASEAN champion, we hope his story inspires you to explore the exciting possibilities at the intersection of AI and innovation. Discover the AWS services that powered this competition: Amazon Bedrock, Amazon SageMaker JumpStart, and PartyRock, and visit the official AWS AI League page to join the next generation of AI innovators.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.


About the authors

Noor Khan is a Solutions Architect at AWS supporting Singapore’s public sector education and research landscape. She works closely with academic and research institutions, leading technical engagements and designing secure, scalable architectures. As part of the core AWS AI League team, she architected and built the backend for the platform, enabling customers to explore real-world AI use cases through gamified learning. Her passions include AI/ML, generative AI, web development and empowering women in tech!

Vincent Oh is the Principal Solutions Architect in AWS for Data & AI. He works with public sector customers across ASEAN, owning technical engagements and helping them design scalable cloud solutions. He created the AI League in the midst of helping customers harness the power of AI in their use cases through gamified learning. He also serves as an Adjunct Professor in Singapore Management University (SMU), teaching computer science modules under School of Computer & Information Systems (SCIS). Prior to joining Amazon, he worked as Senior Principal Digital Architect at Accenture and Cloud Engineering Practice Lead at UST.

Blix Foryasen is a Computer Science student specializing in Machine Learning at National University – Manila. He is passionate about data science, AI for social good, and civic technology, with a strong focus on solving real-world problems through competitions, research, and community-driven innovation. Blix is also deeply engaged with emerging technological trends, particularly in AI and its evolving applications across industries, specifically in finance, healthcare, and education.
