Document

Human-Centred Evaluation Approaches for Autonomous Agents

Authors
Benjamin Bowers, Catherine Harvey, Robert Houghton
Abstract
This research examines the approaches and methods used to study and evaluate novel system components with increasingly autonomous capabilities, such as AI agents, in safety-critical domains. We report a large-scale review of the human–autonomy teaming (HAT) literature and extend the input-mediator-output (IMO) model of team effectiveness to guide human-centred assessment, culminating in the IMO-A framework. We then test existing metrics, and examine how Human Factors researchers approach evaluation, using a novel method that leverages an AI video generation tool to develop underwater maritime scenarios across levels of autonomy. Our final recommendations serve as a roadmap for progressing HAT evaluation from fragmented, study-specific measurement choices toward a standardised, IMO-A-guided, autonomy-appropriate, multi-method evidence base that can be translated into practitioner-ready, early-phase human systems integration (HSI) protocols for the safe integration of autonomous agents in safety-critical systems.