Episode

Hardening Agents for E-commerce Scale: From RL Alignment to Reliability // Panel 2

Podcast
MLOps.community
Published
Dec 2, 2025
Duration seconds
1756
Processing state
failed
Canonical source
https://podcasters.spotify.com/pod/show/mlops/episodes/Hardening-Agents-for-E-commerce-Scale-From-RL-Alignment-to-Reliability--Panel-2-e3bp2j1
Audio
https://anchor.fm/s/174cb1b8/podcast/play/112019489/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-11-2%2F413593912-44100-2-c4bd04f78b8e6.mp3
JSON
/v1/public/podcasts/mlops-community/episodes/hardening-agents-for-e-commerce-scale-from-rl-alignment-to-reliability-panel-2
Markdown
/podcast/mlops-community/hardening-agents-for-e-commerce-scale-from-rl-alignment-to-reliability-panel-2.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/mlops-community/episodes/hardening-agents-for-e-commerce-scale-from-rl-alignment-to-reliability-panel-2/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/mlops-community/hardening-agents-for-e-commerce-scale-from-rl-alignment-to-reliability-panel-2.md
    Read the agent-friendly Markdown representation of this episode resource.
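
A minimal sketch of how a client might prepare the two actions above, using only the Python standard library. The URLs are taken from the listing; everything else is an assumption for illustration: the helper names are hypothetical, the POST is assumed to need no body and no authentication, and the response format is not documented here, so the sketch returns the raw status and text rather than parsing anything.

```python
import urllib.request

# Values copied from the Actions listing above.
BASE = "https://stenobird.com"
EPISODE_SLUG = (
    "hardening-agents-for-e-commerce-scale-"
    "from-rl-alignment-to-reliability-panel-2"
)


def build_transcription_request() -> urllib.request.Request:
    """Prepare the POST that requests low-priority transcript generation.

    Assumption: an empty body is sufficient; the listing does not say
    whether a payload or auth header is required.
    """
    url = (
        f"{BASE}/v1/public/podcasts/mlops-community/episodes/"
        f"{EPISODE_SLUG}/transcription-requests"
    )
    return urllib.request.Request(url, data=b"", method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Prepare the GET for the Markdown representation of the episode."""
    url = f"{BASE}/podcast/mlops-community/{EPISODE_SLUG}.md"
    return urllib.request.Request(url, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, response text).

    Raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Because the listing describes the POST as idempotent, calling `send(build_transcription_request())` more than once should not queue duplicate work, which makes a simple retry on a timeout reasonable. Building the requests separately from sending them keeps the URLs easy to inspect without touching the network.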

Summary

Thanks to Prosus Group for collaborating on the Agents in Production Virtual Conference 2025.

Abstract // The discussion centers on highly technical yet practical themes, such as the use of advanced post-training techniques like Direct Preference Optimization (DPO) and Parameter-Efficient Fine-Tuning (PEFT) to ensure LLMs maintain stability while specializing for e-commerce domains. We compare the implementation challenges of Computer-Using Agents in automating legacy enterprise systems versus the stability issues faced by conversational agents when inputs become unpredictable in production. We will analyze the role of cloud infrastructure in supporting the continuous, iterative training loops required by Reinforcement Learning-based agents for e-commerce!

Bio //

Paul van der Boor (Panel Host) // Paul van der Boor is a Senior Director of Data Science at Prosus and a member of its internal AI group.

Arushi Jain (Panelist) // Arushi is a Senior Applied Scientist at Microsoft, working on LLM post-training for Computer-Using Agent (CUA) through Reinforcement Learning. She previously completed Microsoft’s competitive 2-year AI Rotational Program (MAIDAP), building and shipping AI-powered features across four product teams. She holds a Master’s in Machine Learning from the University of Michigan, Ann Arbor, and a Dual Degree in Economics from IIT Kanpur. At Michigan, she led the NLG efforts for the Alexa Prize Team, securing a $250K research grant to develop a personalized, active-listening socialbot. Her research spans collaborations with Rutgers School of Information, Virginia Tech’s Economics Department, and UCLA’s Center for Digital Behavior. Beyond her technical work, Arushi is a passionate advocate for gender equity in AI. She leads the Women in Data Science (WiDS) Cam…