| Document | Author Benjamin Bowers, Catherine Harvey, Robert Houghton |
| Abstract This research explores existing approaches and methods for studying and evaluating novel system components with increasingly autonomous capabilities, such as AI agents, in safety-critical domains. We report a large-scale review of the human–autonomy teaming (HAT) literature and extend the input-mediator-output model of team effectiveness to guide human-centred assessment, culminating in the IMO-A framework. We then test existing metrics and examine the approaches to evaluation used by Human Factors researchers, using a novel method that leverages an AI video generation tool to develop underwater maritime scenarios across levels of autonomy. Our final recommendations serve as a roadmap for progressing HAT evaluation from fragmented, study-specific measurement choices toward a standardised, IMO-A-guided, autonomy-appropriate, multi-method evidence base that can be translated into practitioner-ready, early-phase HSI protocols for the safe integration of autonomous agents in safety-critical systems. |