Although the acquisition of tones by second or foreign language learners has attracted some scholarly attention, detailed knowledge of the factors that promote efficient learning is lacking. In this article, we examine the effect of visual cues (comparing audio-only with audio-visual presentations) and speaking style (comparing a natural speaking style with a teaching speaking style) on the perception of Mandarin tones by non-native listeners, considering both the relative strength of these two factors and their possible interactions. Both the accuracy and the reaction time of the listeners were measured in a tone identification task. Results showed that participants in the audio-visual condition distinguished tones more accurately than participants in the audio-only condition. Interestingly, this audio-visual advantage varied as a function of speaking style, but only for stimuli from specific speakers. Additionally, some tones (notably tone 3) were recognized more quickly and accurately than others.