Your Multimodal Speech Model Says I Have a Face for Radio
Researchers pair identical audio with different faces and show that speech models' transcription quality changes depending on who appears on screen. Error rates shift with the gender and ethnicity of the face shown, even though the audio never changes. If you build audio-visual systems, you now have to test for bias introduced by the visual modality, not just the audio.
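One way to run that kind of test yourself is to feed the same audio clips to your model once per face group and compare word error rate (WER) across groups. Because the audio is held fixed, any gap between groups comes from the visual input. The sketch below shows only the scoring step. It is not the authors' evaluation code. The function names, the group labels, and the transcripts are hypothetical, and you would collect the `(group, reference, hypothesis)` tuples by running your own model.

```python
from collections import defaultdict


def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to hyp[:j]
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single clip: edit distance divided by reference length."""
    return word_edit_distance(reference, hypothesis) / max(len(reference.split()), 1)


def wer_by_face_group(results):
    """Corpus-level WER per face group.

    results: iterable of (face_group, reference, hypothesis) tuples.
    Assumes every group was shown with the SAME set of audio clips,
    so differences between groups reflect the visual input only.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in results:
        errors[group] += word_edit_distance(ref, hyp)
        words[group] += len(ref.split())
    return {g: errors[g] / max(words[g], 1) for g in errors}


def max_wer_gap(per_group):
    """Largest difference in WER between any two face groups."""
    return max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    # Hypothetical transcripts of one clip, paired with two different faces.
    results = [
        ("face_a", "turn the lights off", "turn the lights off"),
        ("face_b", "turn the lights off", "turn the light of"),
    ]
    per_group = wer_by_face_group(results)
    print(per_group)               # {'face_a': 0.0, 'face_b': 0.5}
    print(max_wer_gap(per_group))  # 0.5
```

In practice you would run many clips per group and check whether the gap holds up across them before treating it as a bias signal rather than noise.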
Maya K. Nachesa, Vlad Niculae