Thoughtworks Research
Thoughtworks Research aims to advance the field of AI reliability by developing new theories, algorithms, methodologies, and tools. Our open-source libraries, accelerators, and research publications will empower you to build more reliable AI systems.
Q4 2025
This article introduces a Lie algebra framework for evaluating LLM-generated summaries, offering a mathematically grounded way to assess how well summaries preserve semantic structure and meaning.
Q2 2025
Uncertainty is inherent in AI systems. This article explores how models can quantify and manage uncertainty — including both statistical (aleatoric) and epistemic sources — using tools like Bayesian neural networks and dropout to improve reliability and trust.
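As a minimal sketch of the dropout-based idea mentioned above (Monte Carlo dropout), the snippet below keeps dropout active at inference time and treats the spread of repeated stochastic predictions as an epistemic uncertainty estimate. The toy linear model and data are illustrative, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(16, 1))   # weights of a toy linear "layer"
x = rng.normal(size=(1, 16))   # one input example

def mc_dropout_predict(x, W, p=0.5, n_samples=100):
    """Run many stochastic forward passes with dropout left on."""
    preds = []
    for _ in range(n_samples):
        mask = rng.random(W.shape) > p                  # drop weights at random
        preds.append((x @ (W * mask) / (1 - p)).item()) # inverted-dropout scaling
    preds = np.array(preds)
    # mean = point prediction, std = epistemic uncertainty estimate
    return preds.mean(), preds.std()

mean, std = mc_dropout_predict(x, W)
```

A larger `std` signals inputs the model is less sure about; in practice the same loop is wrapped around a real network with its dropout layers enabled.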
Drawing on recent keynotes from AI’s top minds, this article outlines the emerging frontiers in the field — from exploring alternatives to transformer models and evolving hardware, to agentic and physical AI, open-source momentum, and AI ‘factories.’
Q1 2025
Exploring how Semantic Entropy, a meaning-based uncertainty metric, offers a more reliable way to evaluate LLMs — especially for detecting confabulations — than traditional lexical or token-based measures.
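The core move behind semantic entropy can be sketched in a few lines: sample several answers, group them into meaning clusters, then compute entropy over clusters rather than over raw strings. The `cluster_fn` below is a stand-in exact-match on normalised text; the actual method uses bidirectional entailment to decide whether two answers mean the same thing.

```python
import math
from collections import Counter

def semantic_entropy(answers, cluster_fn=lambda a: a.strip().lower()):
    """Entropy over meaning clusters of sampled answers."""
    clusters = Counter(cluster_fn(a) for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

# Lexically different but semantically identical answers collapse into
# one cluster, so a confidently repeated answer scores low entropy.
consistent = ["Paris", "paris", "Paris "]      # one meaning cluster: low entropy
confabulated = ["Paris", "Lyon", "Marseille"]  # three clusters: high entropy
low = semantic_entropy(consistent)
high = semantic_entropy(confabulated)
```

High semantic entropy flags questions where the model's sampled answers disagree in meaning, a useful confabulation signal that token-level entropy misses.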
Q4 2024
Understanding benchmarks, evals, and tests in the context of LLMs — arguing that benchmarks compare models, evals probe real-world behavior, and tests validate system reliability.
Q4 2023
Examines structural and conceptual uncertainties in LLMs, offering methods to better predict model behavior and improve the reliability of generated responses.
Q3 2023
This article presents a surprisingly effective embedding-based method to estimate the importance of individual tokens in LLM prompts — a lightweight proxy for attribution that compares favorably to more complex techniques.
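One simple way to realise an embedding-based importance proxy is leave-one-out ablation: embed the full prompt, re-embed it with each token removed, and score a token by how far its removal moves the embedding. The hash-based `embed` below is a self-contained stand-in for a real sentence-embedding model, so only the ablation loop, not the embedder, reflects the technique.

```python
import hashlib
import numpy as np

def embed(tokens, dim=64):
    """Toy bag-of-hashed-words embedding, L2-normalised (stand-in only)."""
    v = np.zeros(dim)
    for t in tokens:
        h = int(hashlib.md5(t.encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def token_importance(tokens):
    """Score each token by the cosine distance its removal causes."""
    full = embed(tokens)
    scores = []
    for i in range(len(tokens)):
        ablated = embed(tokens[:i] + tokens[i + 1:])
        scores.append(1.0 - float(full @ ablated))  # cosine distance
    return scores

scores = token_importance("summarise the quarterly revenue report".split())
```

Because it needs only embedding calls, this proxy avoids the gradients or repeated model generations that heavier attribution methods require.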
Q3 2021
A gentle introduction to machine teaching, a paradigm that shifts focus from making models smarter to empowering experts to teach them more efficiently — easing the bottleneck of domain expertise in AI workflows.
Explains how probabilistic machine learning and weak supervision enable subject matter experts to collaboratively label data using heuristics, enhancing model performance through iterative refinement.
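The weak-supervision workflow can be sketched as experts writing small labelling heuristics ("labelling functions") whose noisy votes are combined into training labels. The combiner below is a plain majority vote for brevity; probabilistic systems instead learn each function's accuracy and correlations. The example functions and texts are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a heuristic stays silent when it does not apply

def lf_keyword_spam(text):
    return "spam" if "win money" in text else ABSTAIN

def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 3 else ABSTAIN

def lf_greeting(text):
    return "ham" if text.startswith("hi ") else ABSTAIN

def majority_label(text, lfs):
    """Combine non-abstaining votes by majority (probabilistic in practice)."""
    votes = [v for lf in lfs if (v := lf(text)) is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

lfs = [lf_keyword_spam, lf_many_exclamations, lf_greeting]
label = majority_label("win money now!!!", lfs)  # both spam heuristics fire
```

Iterative refinement then amounts to inspecting where the heuristics disagree or abstain and adding or tightening functions, which is far cheaper for experts than hand-labelling each example.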