
Thoughtworks Research

 

Thoughtworks Research aims to advance the field of AI reliability by developing new theories, algorithms, methodologies and tools. Our open-source libraries, accelerators and research publications will empower you to build more reliable AI systems.

Q4 2025

This article introduces a Lie algebra framework for evaluating LLM-generated summaries, offering a mathematically grounded way to assess how well summaries preserve semantic structure and meaning.

Q2 2025

Calculating uncertainty in generative AI

Runyan Tan, Parag Mahajani

Uncertainty is inherent in AI systems. This article explores how models can quantify and manage uncertainty — including both statistical (aleatoric) and epistemic sources — using tools like Bayesian neural networks and dropout to improve reliability and trust.
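The dropout technique the article mentions can be sketched in a few lines: keep dropout active at inference time and run many stochastic forward passes, so the spread of predictions approximates the model's epistemic uncertainty. This is a minimal NumPy illustration with a toy one-layer network, not the article's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer network: y = relu(x @ W) @ v (weights are illustrative)
W = rng.normal(size=(4, 16))
v = rng.normal(size=16)

def forward(x, dropout_p=0.5):
    """One stochastic forward pass with dropout kept ON at inference."""
    h = np.maximum(x @ W, 0.0)
    mask = rng.random(h.shape) > dropout_p   # drop hidden units at random
    h = h * mask / (1.0 - dropout_p)         # inverted-dropout scaling
    return h @ v

x = rng.normal(size=4)
samples = np.array([forward(x) for _ in range(200)])  # Monte Carlo dropout

mean_pred = samples.mean()   # point prediction
epistemic = samples.std()    # spread across passes approximates model uncertainty
```

A full Bayesian neural network would place distributions over the weights themselves; Monte Carlo dropout is a cheap approximation of that idea.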

Drawing on recent keynotes from AI’s top minds, this article outlines the emerging frontiers in the field — from exploring alternatives to transformer models and evolving hardware, to agentic and physical AI, open-source momentum, and AI ‘factories.’

Q1 2025

Evaluating LLMs using semantic entropy

Karrtik Iyer, Parag Mahajani

Exploring how semantic entropy, a meaning-based uncertainty metric, offers a more reliable way to evaluate LLMs than traditional lexical or token-based measures, especially for detecting confabulations.
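The core idea can be sketched simply: sample several answers, cluster them by meaning rather than by exact string, and compute entropy over the clusters. The `same_meaning` predicate below is a hypothetical stand-in (a real system would use an entailment model); this is a sketch of the metric, not the article's implementation.

```python
import math

def semantic_entropy(answers, same_meaning):
    """Entropy over meaning clusters rather than surface strings."""
    clusters = []
    for a in answers:
        for c in clusters:
            if same_meaning(a, c[0]):   # join the first matching cluster
                c.append(a)
                break
        else:
            clusters.append([a])        # no match: start a new cluster
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical equivalence: answers with the same normalized form mean the same
norm = lambda s: s.lower().strip(".")
same = lambda a, b: norm(a) == norm(b)

answers = ["Paris", "paris.", "Paris", "Lyon"]
# Lexically there are three distinct strings, but semantically only two
# clusters ({Paris-variants}, {Lyon}), so the entropy is lower than a
# naive string-level count would suggest.
print(semantic_entropy(answers, same))
```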

Q4 2024

LLM benchmarks, evals, and tests

Shayan Mohanty, John Singleton, Parag Mahajani

Understanding benchmarks, evals, and tests in the context of LLMs: benchmarks compare models, evals probe real-world behavior, and tests validate system reliability.

Q4 2023

Examines structural and conceptual uncertainties in LLMs, offering methods to better predict model behavior and improve the reliability of generated responses.

Q3 2023

This article presents a surprisingly effective embedding-based method for estimating the importance of individual tokens in LLM prompts, a lightweight proxy for attribution that compares favorably to more complex techniques.
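One way to realize an embedding-based importance proxy is leave-one-out ablation: embed the full prompt, embed it again with each token removed, and score each token by the resulting similarity drop. The toy random-vector embedding below is a stand-in for a real embedding model; this is an illustrative sketch, not the article's method.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {}

def embed(tokens):
    """Toy sentence embedding: mean of random-but-fixed per-token vectors
    (a stand-in for a real embedding model)."""
    for t in tokens:
        if t not in VOCAB:
            VOCAB[t] = rng.normal(size=32)
    return np.mean([VOCAB[t] for t in tokens], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def token_importance(tokens):
    """Score each token by the similarity drop when it is ablated."""
    full = embed(tokens)
    scores = []
    for i in range(len(tokens)):
        ablated = embed(tokens[:i] + tokens[i + 1:])
        scores.append(1.0 - cosine(full, ablated))
    return scores

prompt = "translate this sentence into french".split()
scores = token_importance(prompt)  # one score per token; larger = more important
```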

Q3 2021

A gentle introduction to machine teaching, a paradigm that shifts focus from making models smarter to empowering experts to teach them more efficiently, easing the bottleneck of domain expertise in AI workflows.

Explains how probabilistic machine learning and weak supervision enable subject matter experts to collaboratively label data using heuristics, enhancing model performance through iterative refinement.
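The weak-supervision workflow can be sketched as a set of expert-written heuristics ("labeling functions") whose noisy votes are combined into a single label. The spam-detection heuristics below are hypothetical examples, and majority voting stands in for the probabilistic label model a real system would learn.

```python
from collections import Counter

ABSTAIN = None

# Hypothetical expert heuristics for spam detection; each may abstain
def lf_has_link(text):
    return "spam" if "http" in text else ABSTAIN

def lf_shouting(text):
    return "spam" if text.isupper() else ABSTAIN

def lf_greeting(text):
    return "ham" if text.lower().startswith(("hi", "hello")) else ABSTAIN

LFS = [lf_has_link, lf_shouting, lf_greeting]

def label(text):
    """Combine noisy heuristic votes by majority; abstentions are ignored.
    (A real weak-supervision system would instead learn each heuristic's
    accuracy and weight its vote probabilistically.)"""
    votes = [lf(text) for lf in LFS if lf(text) is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

print(label("BUY NOW http://deals.example"))  # link heuristic fires: "spam"
print(label("hello, lunch tomorrow?"))        # greeting heuristic fires: "ham"
```

Iterative refinement then means inspecting where heuristics conflict or abstain and adding or adjusting labeling functions accordingly.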