Advanced seminar on audio information processing

Assistant: Dipl.-Ing. Matthieu Kuntz and the AIP team
Cycle: Winter and summer semester
Target Group: Elective module for subject-specific supplementation (Master EI)
Doctoral seminar
Schedule: 2 SWS (weekly hours per semester)
Exam: oral
Time & Place: Thursday, 09:45 - 11:15 hours, N6507
Dates: Start on 25.04.2019 (topic selection)


The seminar is targeted at advanced students, PhD candidates and post-docs in the field of audio-information processing. Scientific publications on current topics in audio-information processing are presented in a small group and discussed in depth ("journal club"). Each participant will present at least one publication (usually two) and lead the discussion. To prepare for the discussion, each participant will read the material prior to each seminar meeting. The focus of the seminar is on understanding and discussing the content. Participants get to know current topics in audio-information processing, train their comprehension of English-language scientific publications, and practice scientific discourse as well as leading a discussion.

Previous knowledge expected: Lecture Psychoacoustics and Audiological Applications

Every student is responsible for proper registration for the exam in TUMOnline.

Seminar topics for summer semester 2019

Spectro-temporal templates unify the pitch percepts of resolved and unresolved harmonics

Advisor: Dr. Clara Hollomey
Publication: Shamma, S., & Dutta, K. (2019). Spectro-temporal templates unify the pitch percepts of resolved and unresolved harmonics. The Journal of the Acoustical Society of America 145(2), 615-629
Abstract: Pitch is a fundamental attribute in auditory perception involved in source identification and segregation, music, and speech understanding. Pitch percepts are intimately related to harmonic resolvability of sound. When harmonics are well-resolved, the induced pitch is usually salient and precise, and several models relying on autocorrelations or harmonic spectral templates can account for these percepts. However, when harmonics are not completely resolved, the pitch percept becomes less salient, poorly discriminated, with upper range limited to a few hundred hertz, and spectral templates fail to convey percept since only temporal cues are available. Here, a biologically-motivated model is presented that combines spectral and temporal cues to account for both percepts. The model explains how temporal analysis to estimate the pitch of the unresolved harmonics is performed by bandpass filters implemented by resonances in dendritic trees of neurons in the early auditory pathway. It is demonstrated that organizing and exploiting such dendritic tuning can occur spontaneously in response to white noise. This paper then shows how temporal cues of unresolved harmonics may be integrated with spectrally resolved harmonics, creating spectro-temporal harmonic templates for all pitch percepts. Finally, the model extends its account of monaural pitch percepts to pitches evoked by dichotic binaural stimuli.

Localization of broadband sounds carrying interaural time differences: Effects of frequency, reference location, and interaural coherence

Advisor: Dr. Clara Hollomey
Publication: Buchholz, J., Le Goff, N., Dau, T. (2018). Localization of broadband sounds carrying interaural time differences: Effects of frequency, reference location, and interaural coherence. The Journal of the Acoustical Society of America 144, 2225.
Abstract: The auditory processes involved in the localization of sounds in rooms are still poorly understood. The present study investigated the auditory system's across-frequency processing of interaural time differences (ITDs) and the impact of the interaural coherence (IC) of the stimuli in ITD discrimination and localization. First, ITD discrimination thresholds were measured as a function of signal frequency, reference ITD, and IC using critical-band wide noises. The resulting data were fitted with a set of analytical functions and ITD weights were derived using concepts from signal detection theory. Inspired by the weighted-image model [Stern, Zeiberg, and Trahiotis. (1988). J. Acoust. Soc. Am. 84, 156–165], the derived ITD weights were then integrated in a simplified localization model using an optimal combination of ITD information across frequency. To verify this model, a series of localization experiments were conducted using broadband noise in which ITD and IC were varied across frequency. The model predictions were in good agreement with the experimental data, supporting the assumption that the auditory system performs a weighted integration of ITD information across frequency to localize a sound source. The results could be valuable for the design of new paradigms to measure localization in more complex acoustic conditions and may provide constraints for future localization models.

Objective analysis of ambisonics for hearing aid applications: Effect of listener’s head, room reverberation, and directional microphones

Advisor: Dr. Clara Hollomey
Publication: Oreinos, C., Buchholz, J. (2015). Objective analysis of ambisonics for hearing aid applications: Effect of listener’s head, room reverberation, and directional microphones. The Journal of the Acoustical Society of America, 137: 3447
Abstract: Recently, an increased interest has been demonstrated in evaluating hearing aids (HAs) inside controlled, but at the same time, realistic sound environments. A promising candidate that employs loudspeakers for realizing such sound environments is the listener-centered method of higher-order ambisonics (HOA). Although the accuracy of HOA has been widely studied, it remains unclear to what extent the results can be generalized when (1) a listener wearing HAs that may feature multimicrophone directional algorithms is considered inside the reconstructed sound field and (2) reverberant scenes are recorded and reconstructed. For the purpose of objectively validating HOA for listening tests involving HAs, a framework was developed to simulate the entire path of sounds presented in a modeled room, recorded by a HOA microphone array, decoded to a loudspeaker array, and finally received at the ears and HA microphones of a dummy listener fitted with HAs. Reproduction errors at the ear signals and at the output of a cardioid HA microphone were analysed for different anechoic and reverberant scenes. It was found that the diffuse reverberation reduces the considered time-averaged HOA reconstruction errors which, depending on the considered application, suggests that reverberation can increase the usable frequency range of a HOA system.

The relationship between speech recognition, behavioural listening effort, and subjective ratings

Advisor: Dr. Ľuboš Hládek
Publication: Erin M. Picou & Todd A. Ricketts (2018) The relationship between speech recognition, behavioural listening effort, and subjective ratings, International Journal of Audiology, 57:6, 457-467
Abstract: Objective: The purpose of this study was to evaluate the reliability and validity of four subjective questions related to listening effort. A secondary purpose of this study was to evaluate the effects of hearing aid beamforming microphone arrays on word recognition and listening effort. Design: Participants answered subjective questions immediately following testing in a dual-task paradigm with three microphone settings in a moderately reverberant laboratory environment in two noise configurations. Participants rated their: (1) mental work, (2) desire to improve the situation, (3) tiredness, and (4) desire to give up. Data were analysed using repeated measures and reliability analyses. Study sample: Eighteen adults with symmetrical sensorineural hearing loss participated. Results: Beamforming differentially affected word recognition and listening effort. Analysis revealed the same pattern of results for behavioural listening effort and subjective ratings of desire to improve the situation. Conversely, ratings of work revealed the same pattern of results as word recognition performance. Ratings of tiredness and desire to give up were unaffected by hearing aid microphone or noise configuration. Conclusions: Participant ratings of their desire to control the listening situation appear to be reliable subjective indicators of listening effort that align with results from a behavioural measure of listening effort.

Categorization of Natural Dynamic Audiovisual Scenes

Advisor: Dr. Ľuboš Hládek
Publication: Rummukainen O, Radun J, Virtanen T, Pulkki V (2014) Categorization of Natural Dynamic Audiovisual Scenes. PLOS ONE 9(5): e95848.

This work analyzed the perceptual attributes of natural dynamic audiovisual scenes in two consecutive experiments. First, we presented 30 naive participants with 19 natural scenes depicting urban environments reproduced with an immersive audiovisual display utilizing surrounding visual projections and spatial audio reproduction. The aim was to assess the perceptual dimensionality of natural scenes, and to identify significant perceptual attributes by means of a similarity categorization task and an interview. A two-dimensional perceptual map of the stimulus scenes and perceptual attributes was formed, and the exploratory results show the amount of movement and perceived noisiness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found the scene gist properties openness and expansion to remain as important factors in scenes with no salient auditory or visual events. Our second experiment was organized with 23 naive participants to assess the modality contributions in three salient perceptual attributes through pairwise unimodal and bimodal scene discrimination tasks with short (< 500 ms) natural scene exposures. The chosen attributes were movement, noisiness and openness. Both visual and auditory information were found to affect scene discrimination in all the attributes, and bimodal discrimination was found superior to either of the unimodal accuracies in most cases. We propose that the study of natural scene perception should move forward to understand better the processes behind multimodal scene processing in real-world environments. The stimulus scenes are available as a public database of spherical video recordings and A-format audio recordings.

How aging impacts the encoding of binaural cues and the perception of auditory space

Advisor: Dr. Ľuboš Hládek
Publication: Eddins, A.C., Ozmeral, E.J., Eddins, D.A., How aging impacts the encoding of binaural cues and the perception of auditory space, Hearing Research (2018), doi: 10.1016/j.heares.2018.05.001

Over the years, the effect of aging on auditory function has been investigated in animal models and humans in an effort to characterize age-related changes in both perception and physiology. Here, we review how aging may impact neural encoding and processing of binaural and spatial cues in human listeners with a focus on recent work by the authors as well as others. Age-related declines in monaural temporal processing, as estimated from measures of gap detection and temporal fine structure discrimination, have been associated with poorer performance on binaural tasks that require precise temporal processing. In lateralization and localization tasks, as well as in the detection of signals in noise, marked age-related changes have been demonstrated in both behavioral and electrophysiological measures and have been attributed to declines in neural synchrony and reduced central inhibition with advancing age. Evidence for such mechanisms, however, are influenced by the task (passive vs. attending) and the stimulus paradigm (e.g., static vs. continuous with dynamic change). That is, cortical auditory evoked potentials (CAEP) measured in response to static interaural time differences (ITDs) are larger in older versus younger listeners, consistent with reduced inhibition, while continuous stimuli with dynamic ITD changes lead to smaller responses in older compared to younger adults, suggestive of poorer neural synchrony. Additionally, the distribution of cortical activity is broader and less asymmetric in older than younger adults, consistent with the hemispheric asymmetry reduction in older adults model of cognitive aging. When older listeners attend to selected target locations in the free field, their CAEP components (N1, P2, P3) are again consistently smaller relative to younger listeners, and the reduced asymmetry in the distribution of cortical activity is maintained. 
As this research matures, proper neural biomarkers for changes in spatial hearing can provide objective evidence of impairment and targets for remediation. Future research should focus on the development and evaluation of effective approaches for remediating these spatial processing deficits associated with aging and hearing loss.

The auditory system and human sound localization behavior

Advisor: Dr. Ľuboš Hládek
Publication: Book by John van Opstal

The Auditory System and Human Sound-Localization Behavior provides a comprehensive account of the full action-perception cycle underlying spatial hearing. It highlights the interesting properties of the auditory system, such as its organization in azimuth and elevation coordinates. Readers will appreciate that sound localization is inherently a neuro-computational process (it needs to process on implicit and independent acoustic cues). The localization problem of which sound location gave rise to a particular sensory acoustic input cannot be uniquely solved, and therefore requires some clever strategies to cope with everyday situations. The reader is guided through the full interdisciplinary repertoire of the natural sciences: not only neurobiology, but also physics and mathematics, and current theories on sensorimotor integration (e.g. Bayesian approaches to deal with uncertain information) and neural encoding.

Influence of working memory and attention on sound-quality ratings

Advisor: Dipl.-Ing. Matthieu Kuntz
Publication: Huber, R., Rählmann, S., Bisitz, T., Meis, M., Steinhauser, S., Meister, H., (2019). Influence of working memory and attention on sound-quality ratings. The Journal of the Acoustical Society of America 145(3), 1283-1292

This study investigated the potential influence of cognitive factors on subjective sound-quality ratings. To this end, 34 older subjects (ages 61–79) with near-normal hearing thresholds rated the perceived sound quality of speech and music stimuli that had been distorted by linear filtering, nonlinear processing, and multiband dynamic compression. In addition, all subjects performed the Reading Span Test (RST) to assess working memory capacity (WMC), and the test d2-R (a visual test of letter and symbol identification) was used to assess the subjects’ selective and sustained attention. The quality-rating scores, which reflected the susceptibility to signal distortions, were characterized by large interindividual variances. Linear mixed modelling with age, high-frequency pure tone threshold, RST, and d2-R results as independent variables showed that individual speech-quality ratings were significantly related to age and attention. Music-quality ratings were significantly related to WMC. Taking these factors into account might lead to improved sound-quality prediction models. Future studies should, however, address the question of whether these effects are due to procedural mechanisms or actually do show that cognitive abilities mediate sensitivity to sound-quality modifications.

Crispness, speech intelligibility, and coloration of reverberant recordings played back in another reverberant room (Room-In-Room)

Advisor: Dipl.-Ing. Matthieu Kuntz
Publication: Haeussler, van de Par (2019). Crispness, speech intelligibility, and coloration of reverberant recordings played back in another reverberant room (Room-In-Room). The Journal of the Acoustical Society of America 145(2), 931-944

This work examines the acoustical and perceptual consequences that can be found in a transfer chain consisting of a sound recorded in one room which is played back over a loudspeaker in another room. The total resulting “Room-In-Room” (RinR) response can be modelled as a convolution of the Room Impulse Response of the first and second room. Due to the convolution an increase in the reverberation time, pulse density, and a change of the temporal envelope of the early reflections can be observed, compared to a single room. In the spectral domain, the convolution results in an increase in spectral modulation strength, responsible for coloration. The listening test investigating the perceptual consequences of RinR found a decrease in perceived crispness due to reproduction in a playback room, especially for highly reverberant conditions. When within normal sized rooms the reverberation time and total source-receiver distance were kept constant, RinR and a single room condition showed no reduction in crispness. On the other hand, a strong increase in the perceived coloration was measured. Furthermore, a decrease in speech intelligibility has been found for RinR conditions, compared to single rooms (Speech Reception Threshold of 2–3 dB).

Object-based attention in complex, naturalistic auditory streams

Advisor: Norbert Kolotzek, M.Sc.
Publication: Marinato, G., Baldauf, D. (2019). Object-based attention in complex, naturalistic auditory streams. Scientific Reports 9:2854

In vision, objects have been described as the ‘units’ on which non-spatial attention operates in many natural settings. Here, we test the idea of object-based attention in the auditory domain within ecologically valid auditory scenes, composed of two spatially and temporally overlapping sound streams (speech signal vs. environmental soundscapes in Experiment 1 and two speech signals in Experiment 2). Top-down attention was directed to one or the other auditory stream by a non-spatial cue. To test for high-level, object-based attention effects we introduce an auditory repetition detection task in which participants have to detect brief repetitions of auditory objects, ruling out any possible confounds with spatial or feature-based attention. The participants’ responses were significantly faster and more accurate in the valid cue condition compared to the invalid cue condition, indicating a robust cue-validity effect of high-level, object-based auditory attention.

Temporal dynamics and uncertainty in binaural hearing revealed by anticipatory eye movements

Advisor: Norbert Kolotzek, M.Sc.
Publication: Winn, M.B., Kan, A., Litovsky, R. (2019) Temporal dynamics and uncertainty in binaural hearing revealed by anticipatory eye movements. The Journal of the Acoustical Society of America 145(2), 676-691

Accurate perception of binaural cues is essential for left-right sound localization. Much literature focuses on threshold measures of perceptual acuity and accuracy. This study focused on supra-threshold perception using an anticipatory eye movement (AEM) paradigm designed to capture subtle aspects of perception that might not emerge in behavioral-motor responses, such as the accumulation of certainty, and rapid revisions in decision-making. Participants heard interaural timing differences (ITDs) or interaural level differences in correlated or uncorrelated narrowband noises, respectively. A cartoon ball moved behind an occluder and then emerged from the left or right side, consistent with the binaural cue. Participants anticipated the correct answer (before it appeared) by looking where the ball would emerge. Results showed quicker and more steadfast gaze fixations for stimuli with larger cue magnitudes. More difficult stimuli elicited a wider distribution of saccade times and greater number of corrective saccades before final judgment, implying perceptual uncertainty or competition. Cue levels above threshold elicited some wrong-way saccades that were quickly corrected. Saccades to ITDs were earlier and more reliable for low-frequency noises. The AEM paradigm reveals the time course of uncertainty and changes in perceptual decision-making for supra-threshold binaural stimuli even when behavioral responses are consistently correct.


Registration for the seminar is done via TUMOnline.