The Office AI Science team is part of OPG. We build systems that are used across M365, particularly within Word, Excel, and PowerPoint. The team's recent projects include PPT Summarization, Audio Overviews (Podcast), SPOCK Eval, Data Pipeline, Natural Language to Office JS, and CUA.
PPT Summarization: The Office AI Science team built the first fine-tuned SLM within M365. The fine-tuned Phi-3 Vision SLM improved the p95 latency of the PPT Visual Summary feature from 13 seconds to 2 seconds while maintaining quality on par with GPT-4o-v. This optimization resulted in 75 times fewer GPUs being used compared to GPT-4o-v and nearly 9 times the number of PowerPoint users receiving a visual summary. The fine-tuned SLM also powers PPT Visual Q&A, making it both faster and cheaper. The team also launched PPT Interactive Summary, which allows users to drill into visual summaries in more detail, leading to an over 50% decline in thumbs down per 100k tries over 3 months, 30% interactivity (clicking the chevron to go deeper), and a 17.6% increase in weekly return rate. The team is currently fine-tuning 4o-mini-vision with the goal of replacing GPT-4o-v for the remaining non-English traffic with this smaller model, and is evaluating Phi-4 Vision for English.
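As a rough illustration of the two metrics quoted above, the following sketch computes a p95 latency with the nearest-rank method and a thumbs-down rate per 100k tries. The function names and the sample data are hypothetical and exist only for this example; the team's actual telemetry definitions are not described in the source and may differ.

```python
def p95(latencies_s):
    """Nearest-rank 95th percentile of latencies in seconds (assumed definition)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # 1-based nearest rank: ceil(0.95 * n), done in integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def per_100k(events, tries):
    """Rate of events (for example thumbs down) per 100,000 tries."""
    if tries <= 0:
        raise ValueError("tries must be positive")
    return events * 100_000 / tries


# Made-up sample: 94 fast requests, 4 at two seconds, 2 slow outliers.
samples = [1.0] * 94 + [2.0] * 4 + [9.0, 12.0]
print(p95(samples))

# Speedup implied by the figures in the text (13 s down to 2 s).
print(13 / 2)
```

With this definition, a handful of slow outliers above the 95th rank do not move the reported p95, which is why it is a common choice for user-facing latency targets.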
Audio Overviews: The team is building the Audio Overview Skill, which introduces a podcast-like experience for consuming documents and artifacts. The feature is currently in the dogfood phase for MSIT, with production rollout scheduled from May 7 onwards. Users will be able to generate Audio Overviews from App Chat entry points in Word Win32 & Web, Copilot Notebooks (including OneNote), and other apps such as Outlook Web, OneDrive Web, and ODSP Mobile. The latest human evaluation rates overall transcript quality for the single-file audio overview at 4.08/5.00, compared to 3.76/5.00 for NotebookLM. With automated evaluation, the team improved the overall score from an initial 4.09 to 4.65 using a two-step design leveraging GPT-4o and o3-mini. More details, including evaluation against multiple files for the Copilot Notebooks scenario and gains from moving to GPT-4.1, can be found here.
SPOCK (AugLoop Eval): In collaboration with AugLoop, the Office AI Science team developed several key features that enable agility in evaluating App Copilot scenario quality metrics. By the end of FY25Q3, 22 scenarios had been onboarded across Word, PPT, Office AI, and SharePoint, with Excel onboarding in progress. The platform currently runs 300 eval jobs and 30,000 tests daily, reliably. The turnaround time for automated scenario evaluation, compared to a manual run, has decreased significantly, from days to 2-4 hours. SPOCK now supports intent detection, Leo Metrics, BizChat 1K Query, Python, and TypeScript customer evaluators; model swap and FlexV3 eval are coming in Q4. Additionally, the v-team is automating the App Copilot Quality Dashboard (ÆVAL – Copilot Evaluation), providing a comprehensive overview of the quality of App Copilot scenarios.
Data Pipeline: The team also created an online, self-serve, on-demand ADF pipeline for mining Office documents from the web. This allows partners to kick off large-scale data mining jobs for specific languages and document types, and features custom metadata extractors for extracting task-dependent document representations. By leveraging Bing's precrawled 40B URL RetroIndex, document discovery is fast and efficient. OAI Science and several partner teams (Word+Editor, PPT Science, Word Designer, Designer, MSAI) are already using the data for fine-tuning and test set creation.
Natural Language to Office JS: The Office AI Science team is working to fine-tune an o* family model for common Office scenarios such as inserting slides from another PowerPoint file, inserting headers and footers in Word, or creating and finding merged ranges in Excel.
CUA: The team also recently embarked on an exploration of Computer User Agent (CUA), focused on understanding user intent and adapting in real time. Leveraging plan assistance with the Office knowledge base, the team roughly doubled the task completion rate against OSWorld PPT scenarios. The team is working on fine-tuning the CUA model to improve task completions for Office apps.
For more information, contact: Amanda Gunnemo or Vishal Chowdhary