Generative AI systems (“GenAI”) can provide auditors with natural-language recommendations that resemble professional advice. Such tools have the potential to support audit judgments. However, a lack of transparency in their processes and reasoning also raises practical questions: when, and to what extent, should auditors rely on AI-generated advice?
Because GenAI recommendations are not directly explainable, auditors must rely on indirect cues to assess credibility. In practice, a key indirect cue is AI performance. Firms and software providers commonly disclose stated accuracy levels, either framed in terms of accuracy (“the AI system is 95% accurate”) or in terms of error (“the AI system has a 5% error rate”).
Framing AI performance in terms of either “accuracy” or “error” may affect auditors’ reliance in unanticipated ways. The key issue for audit practice is whether these performance cues support appropriate calibration: auditors should rely on sound advice while remaining skeptical of weak output.
In this practitioner report, we summarize evidence from an experimental study with practicing auditors examining the effects of stated accuracy and performance framing on reliance on high-quality and low-quality GenAI advice. Our findings show that performance communication influences reliance decisions, with implications for the design and implementation of GenAI in judgment-intensive audit tasks.