Understanding, and shaping, the growing role of artificial intelligence in legal reasoning, adjudication, and regulation
Artificial intelligence is rapidly becoming a central actor in the legal domain. Even when large language models (LLMs) do not replace human decision-makers, they increasingly support them, and their outputs can function as a powerful benchmark that influences judges, lawyers, regulators, and the public. This raises a foundational question: What happens to the rule of law when “legal judgment” is partly outsourced to systems that reason differently than humans do?
A core theme of my work in this area is how AI handles legal questions that lack a single, demonstrably correct answer, especially cases that pit formal legal rules against considerations of justice and equity. In Law, Justice, and Artificial Intelligence, we compare the decisions of GPT-4o, Claude Sonnet 4, and Gemini 2.5 Flash with those of laypersons and legal professionals (including judges) across six vignette-based experiments comprising roughly 50,000 decisions. We show that, unlike humans, LLMs do not “balance” law and equity: when instructed to follow the law, they largely ignore justice; when instructed to decide on the basis of justice, they tend to disregard the legal rules. Requiring reasons or providing precedents has little effect on their responses. We also demonstrate that certain prompts can reduce this formalism somewhat, but the models remain substantially more rigid than human decision-makers.
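To make the experimental design concrete, the following is a minimal sketch of the kind of vignette-and-instruction manipulation described above. The vignette text, prompt wording, model choice, and response parsing are illustrative assumptions, not the study's actual materials.

```python
# Illustrative sketch only: the vignette, prompts, and parsing below are
# hypothetical stand-ins for the study's materials.
from openai import OpenAI

client = OpenAI()

VIGNETTE = (
    "A tenant missed a lease deadline by one day because of a medical "
    "emergency. The landlord seeks the full contractual penalty. "
    "Decide: enforce the penalty or excuse the breach? "
    "Answer with ENFORCE or EXCUSE."
)

INSTRUCTIONS = {
    "law": "Decide strictly according to the applicable legal rules.",
    "justice": "Decide according to what justice and equity require.",
}

def collect_decisions(condition: str, n_runs: int = 20) -> list[str]:
    """Query the model repeatedly under one instruction condition."""
    decisions = []
    for _ in range(n_runs):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": INSTRUCTIONS[condition]},
                {"role": "user", "content": VIGNETTE},
            ],
            temperature=1.0,
        )
        decisions.append(resp.choices[0].message.content.strip().upper())
    return decisions

# Compare decision distributions across the two instruction conditions.
for condition in INSTRUCTIONS:
    answers = collect_decisions(condition)
    enforce_rate = sum(a.startswith("ENFORCE") for a in answers) / len(answers)
    print(f"{condition}: enforcement rate = {enforce_rate:.0%}")
```

In the human arm of such a design, the same vignettes and instructions are administered to lay and professional participants, so that the model's decision distributions can be compared directly with human ones.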
Beyond documenting these behavioral patterns, my research seeks to develop a principled framework for evaluating AI-driven legal decision-making in discretionary contexts—where legal systems rely on standards, competing principles, and judgment rather than mechanical rule application. I propose evaluating such systems along five dimensions: accuracy, consistency, bias, sensitivity, and persuasiveness.
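As one way such dimensions might be operationalized, the sketch below gives simplified, assumed metrics for consistency and sensitivity applied to repeated decisions on a vignette; it is not the framework itself, and bias or persuasiveness would call for analogous measures (for example, varying normatively irrelevant party attributes, or measuring effects on human readers).

```python
# Simplified, assumed operationalizations of two of the five dimensions.
from collections import Counter

def consistency(decisions: list[str]) -> float:
    """Share of repeated runs on the same vignette that agree with the
    modal decision (1.0 = perfectly consistent)."""
    counts = Counter(decisions)
    return counts.most_common(1)[0][1] / len(decisions)

def sensitivity(baseline: list[str], varied: list[str], outcome: str) -> float:
    """Shift in the rate of a given outcome when a normatively relevant
    fact in the vignette is changed (larger = more sensitive)."""
    rate = lambda ds: sum(d == outcome for d in ds) / len(ds)
    return rate(varied) - rate(baseline)

# Hypothetical decisions under a baseline vignette and a variant that adds
# a hardship fact the decision-maker should arguably weigh.
baseline = ["ENFORCE"] * 18 + ["EXCUSE"] * 2
hardship = ["ENFORCE"] * 17 + ["EXCUSE"] * 3

print(f"consistency (baseline): {consistency(baseline):.2f}")
print(f"sensitivity to hardship: {sensitivity(baseline, hardship, 'EXCUSE'):+.2f}")
```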
A related strand of my work examines how AI affects legal behavior and legal norms outside the courtroom—including in contract settings where people often weigh formal entitlements against relational and moral considerations. In experimental work on contractual empathy, we find that human decision-makers sometimes waive rights and even share losses when a counterparty faces genuine hardship. By contrast, GPT-4o almost never engages in genuine loss-sharing and treats strict enforcement as both legally and normatively correct—revealing an “empathy gap” between human and AI judgment.