When we look at a visual scene, salient features, such as a region of high contrast or a familiar object, attract our attention. Here, we examined how image saliency (based on computational models developed to predict human eye movements) influences how human observers expect the visual attention of others to be directed. We created animations in which a cone-shaped object (the agent) 'looks' at complex images displayed in a 3D scene. The agent's looking behaviour was controlled by eye movements recorded from human observers while they viewed either the same images as the agent or different images. Expectations about looking behaviour were assessed by asking participants (N = 24) to judge whether the agent's gaze was matched to the scene. We find that participants can detect a mismatch between the agent's looking behaviour and its visual environment, based on how the pattern of fixations displayed by the agent aligns with the visual content of the scene. Discrimination sensitivity was modulated by the overlap between the agent's gaze and salient image features: for example, participants struggled to identify a mismatch when the mismatched gaze aligned by chance with salient features of the displayed image. Further analysis suggests that participants are likely using a combination of low- to high-level image features to determine the expected gaze behaviour of the agent. Our findings highlight that image saliency models are useful for understanding not only how a person engages with their environment but also their expectations for how others should interact with the environment.