of 0.989 and recall of 0.888 on this task.
While previous approaches focused on artificially
generated corruption, our study presents a workflow
for collecting real display corruption of various types
from a vast range of video games while using Struc-
tural Similarity Index Measure (SSIM) to ensure the
collected images are visually diverse. We presented
a two-stage training procedure and demonstrated its
effectiveness through a variety of neural networks
achieving high validation performance. We presented
the distribution of correct and incorrect predictions on
corrupted images for our top-performing model and
argued that, with sufficient training data per corrup-
tion type, a DCNN can be successfully trained to de-
tect a wide variety of graphical malfunctions. Finally,
we showed that Grad-CAM can be leveraged to pro-
vide interpretability in our neural net’s predictions.
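
As an illustration of the SSIM-based diversity check mentioned above, the following is a minimal sketch and not the pipeline used in this study. It assumes grayscale images supplied as flat lists of pixel intensities, computes a simplified single-window (global) SSIM rather than the usual sliding-window variant, and uses a hypothetical similarity threshold `max_ssim`; the function names are likewise illustrative.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equally sized grayscale images.

    Simplification: statistics are computed over the whole image instead of
    over local sliding windows, so this only approximates standard SSIM.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def select_diverse(images, max_ssim=0.9):
    """Greedily keep indices of images that are dissimilar to all kept ones.

    max_ssim is a hypothetical threshold: a candidate is dropped when its
    SSIM with any already kept image reaches or exceeds this value.
    """
    kept = []
    for idx, img in enumerate(images):
        if all(global_ssim(img, images[k]) < max_ssim for k in kept):
            kept.append(idx)
    return kept
```

A greedy pass of this kind depends on the order of the input images and compares each candidate against every kept image, so it scales quadratically in the worst case; a production workflow would likely rely on a windowed SSIM from an image-processing library.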
The main shortcoming of our pipeline is that the gameplay
is highly scripted throughout the game tests. In
the future, we plan on extending our method to pro-
vide a gameplay testing experience close to human
behavior. This would allow for more in-game exploration,
thus providing more visually varied gameplay scenarios
and a potentially more diverse source of corrupted images.
Moreover, our current method detects corruption
in individual images without accounting for adja-
cent game frames. Incorporating video understanding
with Long-term Recurrent Convolutional Networks
(J. Donahue and Darrell, 2017) could provide valu-
able insights on the game context and potentially im-
prove the overall performance.
In this study, we treated the task at hand as a bi-
nary classification problem. Given sufficient data per
corruption category, a DCNN could be trained as a
multi-label classifier to effectively detect each corruption
subtype separately. In a GPU testing workflow, this
could further minimize the amount of manual triage
required upon the detection of visual corruption.
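
To make the difference between the binary and multi-label settings concrete, the sketch below shows how per-subtype outputs could be decoded. It is an assumption-laden illustration rather than part of this study: the label names in `LABELS` are placeholders (the corruption subtypes are not enumerated here), the decision threshold of 0.5 is hypothetical, and the logits are assumed to come from a network with one independent sigmoid output per subtype.

```python
import math

# Placeholder subtype names; hypothetical, not taken from the study.
LABELS = ["subtype_a", "subtype_b", "subtype_c"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_subtypes(logits, threshold=0.5):
    """Return the subtypes whose independent probability meets the threshold.

    Unlike a softmax, each label is decided on its own, so a single frame
    can be flagged with zero, one, or several corruption subtypes.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def is_corrupted(logits, threshold=0.5):
    """Recover the binary decision: corrupted if any subtype is detected."""
    return bool(predict_subtypes(logits, threshold))
```

For example, `predict_subtypes([2.0, -3.0, 0.1])` flags the first and third placeholder subtypes, which is the kind of per-category output that could route a failing frame directly to the relevant triage queue.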
