Hauptseminar Audio-Informationsverarbeitung

Lecturer:
Assistants: Dr. Ľuboš Hládek and AIP staff
Offered: winter and summer semester
Target group: elective module Hauptseminar, Master EI, EI7764
ECTS: 5
Extent: 2 SWS
Examination: graded presentation and contribution to the discussion, plus a written report (term paper)
Time & location: Wednesday, 14:00-15:30, room N0116
Dates: 24.04.2019 topic assignment
08.05.2019 seminar on presentation techniques
Dates for student presentations: 26.06.2019, 03.07.2019, 10.07.2019

Content

Changing focus topics on current questions in audio information processing, for example signal processing of music and speech, psychoacoustics, auditory models, or room acoustics.
Students prepare a talk on a selected topic, present it, and practice answering questions in a setting similar to a conference. The aim of the seminar is to train students in the subject matter and, in particular, in presentation techniques and literature research. Introductory material for each topic is provided, from which the students are expected to research the topic in more depth. In this way, participants become familiar with current research questions in audio information processing and practice reading scientific publications in English. In addition, students prepare a written report on their presentation topic, which is handed in together with the annotated slides and graded.

Seminar topics SS 2019

HRTF individualization: Approaches and perceptual benefits

Supervisor: Dr. Clara Hollomey
Description:

As sound arrives at the listener, the size and shape of the head, the outer ears, the ear canal, and the nasal and oral cavities all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Humans use this information for the localization of sounds. The head-related transfer functions (HRTFs) summarize these transformations and can thus be used to describe how a sound from a specific point will arrive at the ears. Therefore, they are useful for simulating virtual acoustic environments and 3D sound.
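As a minimal illustration of how HRTFs are used for rendering, the following sketch convolves a mono signal with a measured pair of head-related impulse responses (HRIRs). The input arrays are hypothetical placeholders for any measured HRTF set.

import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    # Place a mono signal at the direction for which the HRIR pair was measured.
    left = fftconvolve(mono, hrir_left)      # left-ear signal
    right = fftconvolve(mono, hrir_right)    # right-ear signal
    return np.stack([left, right], axis=0)   # 2 x N binaural signal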

However, HRTFs are highly individual, and the perceived realism of the virtual acoustic environment depends strongly on how closely the HRTFs used for rendering the sound scene match those of the listener. Therefore, several approaches to HRTF individualization have been proposed.

The presentation should give an overview of these approaches and their advantages and drawbacks. Moreover, their computational requirements should be contrasted with their degree of achievable perceptual accuracy.

This presentation will preferably be given in English. The student who accepts this topic will not be graded on his/her English speaking ability. The student will, however, be graded on how well he/she understands the topic and how well the presentation is structured to convey the intended message clearly (as would be the case when presenting in any language).

References:

Guezenoc, C. and Séguier, R. (2018). HRTF individualization: A survey. Proceedings of the 145th Audio Engineering Society Convention.

Mokhtari, P., Nishimura, R., and Takemoto, H., "Toward HRTF personalization: an auditory-perceptual evaluation of simulated and measured HRTFs," Proceedings of the 14th International Conference on Auditory Display, Paris, France, 2008.

Hölzl, J., “A Global Model for HRTF Individualization by Adjustment of Principal Component Weights,” Master Thesis, Graz University of Technology, 2014.

Seeber, B. U. and Fastl, H., "Subjective selection of non-individual head-related transfer functions," International Conference on Auditory Display (ICAD), Boston, MA, USA, 2003.

Interpolation of Head-Related Transfer Functions (HRTFs): Approaches and perceptual implications

Supervisor: Dr. Clara Hollomey
Description:

As sound arrives at the listener, the size and shape of the head, the outer ears, the ear canal, and the nasal and oral cavities all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Humans use this information for the localization of sounds. The head-related transfer functions (HRTFs) summarize these transformations and can thus be used to describe how a sound from a specific point will arrive at the ears. Therefore, they are useful for simulating virtual acoustic environments and 3D sound.

However, realistic synthesis in virtual auditory displays requires the interpolation of the frequency and phase responses of HRTFs to enable a rendering of small changes in sound source position. Several methods for approximating spatially continuous HRTFs have been proposed, comprising statistical (e.g., PCA-based), numerical, and analytical approaches (e.g., interpolation via spherical harmonics), each with their own advantages and drawbacks.
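As an illustration of the analytical approach, the following sketch fits spherical-harmonic coefficients to HRTF magnitudes measured at a discrete set of directions (for one frequency bin) and evaluates the fit at unmeasured directions. All variable names (azi, col, h_mag, ...) are hypothetical placeholders, and scipy's complex spherical harmonics are used for brevity, whereas e.g. Romigh et al. work with real spherical harmonics.

import numpy as np
from scipy.special import sph_harm

def sh_basis(order, azi, col):
    # One column per (n, m) pair up to the given order; azimuth/colatitude in radians.
    cols = [sph_harm(m, n, azi, col)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)                        # shape (P, (order+1)^2)

def fit_and_interpolate(order, azi, col, h_mag, azi_new, col_new):
    Y = sh_basis(order, azi, col)
    coeffs, *_ = np.linalg.lstsq(Y, h_mag, rcond=None)   # least-squares SH fit
    return np.real(sh_basis(order, azi_new, col_new) @ coeffs)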

The presentation should give an overview of these approaches and their advantages and drawbacks. Moreover, their computational requirements should be contrasted with their degree of achievable perceptual accuracy.

This presentation will preferably be given in English. The student who accepts this topic will not be graded on his/her English speaking ability. The student will, however, be graded on how well he/she understands the topic and how well the presentation is structured to convey the intended message clearly (as would be the case when presenting in any language).

References:

Kistler, D. J. and Wightman, F. L. (1992). A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. The Journal of the Acoustical Society of America, 91, 1637.

Ziegelwanger, H., Majdak, P., and Kreuzer, W. (2015). Numerical calculation of listener-specific head-related transfer functions and sound localization: Microphone model and mesh discretization. The Journal of the Acoustical Society of America, 138, 208.

Romigh et al. (2015). Efficient real spherical harmonic representations of head-related transfer functions. IEEE Journal of Selected Topics in Signal Processing, 9(5), 921-930.

Blauert, J. (1997). Spatial Hearing, 2nd ed. MIT Press, Cambridge, MA (the spherical head model).

What we know and do not know about listener envelopment

Supervisor: Dr. Clara Hollomey
Description:

The perceived spatial impression, or spaciousness, is an important aspect of describing the perceptual characteristics of rooms. It is usually defined by two attributes: the apparent source width (ASW) and the listener envelopment (LEV). While the concept of ASW refers to the perceived width of distinct sound sources, listener envelopment addresses whether, and how, the listener feels surrounded by the sound rather than listening to it as if through a window.

While ASW perception has been the subject of extensive research, and consequently is reasonably well understood, this is not the case for LEV. Objective measures have been suggested for both, such as the Lateral Energy Fraction (LF), the Interaural Cross-Correlation Coefficient (IACC), and, specifically targeted at the LEV, the Late Lateral Level (GLL).
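As a concrete example of such an objective measure, the sketch below computes the early lateral energy fraction (LF, as defined in ISO 3382-1) from a pair of room impulse responses; the inputs (an omnidirectional and a figure-of-eight response, time-aligned to the direct sound) are hypothetical. The late lateral level is obtained analogously from the lateral energy arriving after 80 ms, normalized to a free-field reference.

import numpy as np

def lateral_energy_fraction(p_omni, p_fig8, fs):
    i5, i80 = int(0.005 * fs), int(0.080 * fs)   # 5 ms and 80 ms after the direct sound
    lateral_early = np.sum(p_fig8[i5:i80] ** 2)  # lateral energy, 5-80 ms
    total_early = np.sum(p_omni[:i80] ** 2)      # total energy, 0-80 ms
    return lateral_early / total_early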

Yet, what exactly designates listener envelopment, and where the boundaries between ASW and LEV lie, remains largely elusive. Research is still ongoing about which factors influence LEV and which objective measure gauges spatial impression best. The presentation should comprise an outline of the research performed on LEV, the associated challenges, and how the results confirm and/or contradict each other.

This presentation will preferably be given in English. The student who accepts this topic will not be graded on his/her English speaking ability. The student will, however, be graded on how well he/she understands the topic and how well the presentation is structured to convey the intended message clearly (as would be the case when presenting in any language).

References:

Berg, Jan; Nyberg, Dan (2008) Listener Envelopment — What Has Been Done and What Future Research Is Needed? AES 124th Convention, Amsterdam, The Netherlands

J. Bradley and G. Soulodre, “Objective Measures of Listener Envelopment,” J. Acoust. Soc. Am., vol. 98, pp. 2590-2597 (1995).

Morimoto, M., Jinya, M., and Nakagawa, K. (2007). Effects of frequency characteristics of reverberation time on listener envelopment. Journal of the Acoustical Society of America, 122(3), 1611-1615.

Stirnat and Ziemer (2019) Spaciousness in Music: the Tonmeister's Intention and the Listener's Perception. Proceedings of the International Symposium on Sound pp 42-51

Auditory perception of velocity

Supervisor: Norbert Kolotzek, M.Sc.
Description:

In everyday life we are frequently confronted with moving objects that we can perceive both visually and acoustically. To move safely in road traffic, for example, it is very important to be able to judge the velocity of approaching objects. Moving objects provide a variety of time-varying acoustic cues that the human auditory system can use to perceive and estimate velocity, such as temporally changing interaural time and level differences between the two ear signals, or frequency changes known as the Doppler effect.
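The following sketch illustrates two of these time-varying cues for a source that passes the listener on a straight line: the Doppler-shifted frequency and a simple Woodworth-model estimate of the interaural time difference. Parameter values and the geometry are purely illustrative.

import numpy as np

c = 343.0      # speed of sound in m/s
a = 0.0875     # assumed head radius in m

def passing_source_cues(f_src, v, d, t):
    # f_src: emitted frequency (Hz); v: source speed (m/s);
    # d: closest distance to the listener (m); t: time re closest approach (s)
    x = v * t                                     # source position along its path
    r = np.sqrt(d ** 2 + x ** 2)                  # source-listener distance
    dr_dt = v ** 2 * t / r                        # radial velocity (receding > 0)
    f_obs = f_src * c / (c + dr_dt)               # Doppler-shifted frequency
    azimuth = np.arctan2(x, d)                    # angle re the median plane
    itd = (a / c) * (azimuth + np.sin(azimuth))   # Woodworth approximation of the ITD
    return f_obs, itd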

The presentation should give an overview of the most important acoustic cues for velocity perception in normal-hearing listeners and also address how the individual acoustic features are weighted in human velocity perception.

References:

Carlile, S., Best, C. (2002). Discrimination of sound velocity in human listeners. J. Acoust. Soc. Am., 111, 1026 – 1035

Jorasz, U., Dooley, G. (1996). The perceptibility of the frequency drop caused by the Doppler effect for simulated sound source motion in the median plane. Archives of Acoustics, 21, 149 – 157

Kaczmarek, T. (2005). Auditory perception of sound source velocity. J. Acoust. Soc. Am., 117(5), 3149 – 3156.

Lutfi, R., Wang, W. (1999). Correlational analysis of cues for the discrimination of auditory motion. J. Acoust. Soc. Am., 106, 919 – 928

Automatic environment classification in hearing aids

Supervisor: Norbert Kolotzek, M.Sc.
Description:

In everyday life we are exposed to many different listening situations, such as a conversation on a busy street, a phone call in a quiet office, dinner in a loud restaurant, or a concert in a very reverberant room. Our hearing adapts to these situations. Hearing-impaired listeners, however, depend on their hearing aid to take over this task. Through feature extraction and classification, modern hearing aids are able to recognize different acoustic environments automatically and to adapt their processing algorithms accordingly.
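The sketch below illustrates this feature-extraction and classification chain in its simplest form: a few spectral features per signal frame and a nearest-centroid classifier. The feature set, class names, and classifier are illustrative placeholders; actual hearing aids use far more constrained, low-power implementations.

import numpy as np

def frame_features(frame, fs):
    # A few simple features of one short signal frame.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    level = 10 * np.log10(np.sum(spec) + 1e-12)                # frame level in dB
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)   # spectral centroid
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2         # zero-crossing rate
    return np.array([level, centroid, zcr])

def classify(features, class_centroids):
    # class_centroids: dict mapping a class name ('speech in quiet', 'babble', ...)
    # to a mean feature vector estimated from labeled training frames.
    return min(class_centroids,
               key=lambda name: np.linalg.norm(features - class_centroids[name]))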

The aim of this presentation is first to present and explain the features and classification algorithms used in hearing aids. It should also address how the automatic adaptation to the current listening situation affects auditory perception.

References:

Karbasi, M., Ahadi, S.M., Bahmanian, M. (2011). Environmental Sound Classification using Spectral Dynamic Features. IEEE Information, Commun. and Signal Processing, 8, 1-5.

Lamarche, L., Giguère, C., Gueaieb, W., Aboulnasr, T., Othman, H. (2010). Adaptive environment classification system for hearing aids. J. Acoust. Soc. Am., 127, 3124-3135.

Rennies, J., Schepker, H., Holube, I., and Kollmeier, B. (2014). Listening effort and speech intelligibility in listening situations affected by noise and reverberation. J. Acoust. Soc. Am., 136, 2642–2653.

Xia, J., Xu, B., Pentony, S., Xu, J., and Swaminathan, J. (2018). Effects of reverberation and noise on speech intelligibility in normal-hearing and aided hearing-impaired listeners. J. Acoust. Soc. Am., 143, 1523-1533.

Binaural speech intelligibility in noisy environments

Supervisor: Norbert Kolotzek, M.Sc.
Description:

Understanding speech is one of the most important components of communication. Speech intelligibility can be impaired, however, for example in a loud environment with many interfering sounds. Several modeling approaches have been presented in the literature that predict speech intelligibility in specific listening situations for both normal-hearing and hearing-impaired listeners.

The aim of the presentation is to give an overview of physiologically motivated models of speech perception and to compare their predictions with results from listening experiments.
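As a very rough illustration of the structure shared by several of the models cited below, the sketch computes a per-band effective SNR from better-ear listening plus a binaural unmasking advantage (BMLD). It is not a reimplementation of any specific model, and all inputs are hypothetical.

import numpy as np

def effective_snr(snr_left_db, snr_right_db, bmld_db):
    # Per frequency band: the better-ear SNR improved by the binaural
    # unmasking advantage; all arrays in dB.
    better_ear = np.maximum(snr_left_db, snr_right_db)
    return better_ear + bmld_db

# Example: three bands, target favored at the right ear, some unmasking
# in the low-frequency bands where target and masker differ in interaural phase.
snr = effective_snr(np.array([-8.0, -6.0, -5.0]),
                    np.array([-4.0, -5.0, -6.0]),
                    np.array([3.0, 1.5, 0.0]))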

References:

Hauth, C.F., Brand, T. (2018). Modeling sluggishness in binaural unmasking of speech for maskers with time-varying interaural phase differences. Trends in Hearing, 22, 1-10.

Lavandier, M. and Culling, J. F. (2010). Prediction of binaural speech intelligibility against noise in rooms. J. Acoust. Soc. Am., 127, 387-399.

Lavandier, M., Jelfs, S., Culling, J.F., Watkins, A.J., Raimond, A.P., and Makin, S.J. (2012) Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources. J. Acoust. Soc. Am., 131, 218-231.

Rhebergen, K.S., Lyzenga, J., Dreschler, W.A., and Festen, J.M. (2010). Modelling speech intelligibility in quiet and noise in listeners with normal and impaired hearing. J. Acoust. Soc. Am., 127, 1570-1583.

How do we perceive auditory distance?

Supervisor: Dr. Ľuboš Hládek
Description:

People perceive the direction of a sound from tiny differences in the level and time of arrival of the sound at the two ears. They have a sense of sound elevation from the spectral filtering of the outer ears, which produces a specific coloration when the sound comes from above or from behind. But how do we perceive egocentric auditory distance, that is, the distance of a sound source from the listener? What are the psychoacoustic cues for distance? The brain can use many cues to estimate the distance of sounds, but how it uses them is a matter of lively scientific debate. Here we focus on a few studies that investigate reverberation-related cues.
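One of the reverberation-related cues examined in the papers below is the direct-to-reverberant energy ratio (DRR). The sketch estimates it from a room impulse response; the 2.5 ms window separating the direct sound from the reverberant tail is a common but arbitrary choice, and the variable names are illustrative.

import numpy as np

def direct_to_reverberant_ratio(rir, fs, window_ms=2.5):
    t0 = int(np.argmax(np.abs(rir)))              # arrival of the direct sound
    split = t0 + int(window_ms * 1e-3 * fs)
    direct = np.sum(rir[:split] ** 2)             # direct-sound energy
    reverberant = np.sum(rir[split:] ** 2)        # reverberant energy
    return 10 * np.log10(direct / reverberant)    # DRR in dB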

The presentation should give an overview of the psychoacoustics of auditory distance perception and a more detailed description of the selected papers.

The presentation and the report will be in English.

References:

Bronkhorst, A. W., & Houtgast, T. (1999). Auditory distance perception in rooms. Nature, 397(11 February), 517–520.

Zahorik, P. (2002). Assessing auditory distance perception using virtual acoustics. The Journal of the Acoustical Society of America, 111(4), 1832.

Larsen, E., Iyer, N., Lansing, C. R., & Feng, A. S. (2008). On the minimum audible difference in direct-to-reverberant energy ratio. The Journal of the Acoustical Society of America, 124(1), 450–461.

Kolarik, A. J., Moore, B. C. J., Zahorik, P., Cirstea, S., & Pardhan, S. (2016). Auditory distance perception in humans: a review of cues, development, neuronal bases, and effects of sensory loss. Attention, Perception, and Psychophysics, 78(2), 373–395.

Gaze-controlled hearing aids

Supervisor: Dr. Ľuboš Hládek
Description:

Listening to speech in a noisy restaurant is a challenge for hearing-impaired people, since they often need a higher signal-to-noise ratio (SNR) than their audiogram would predict. One way to improve the SNR for hearing aid users is to use directional microphones. These, however, come with a constraint: they amplify only sounds arriving from directly in front of the listener. In conversations, people often want to listen in the direction in which they look, since gaze may indicate listening attention.
This topic will focus on several previous studies that investigated gaze-controlled directional microphones for hearing aids, evaluated with different speech tests.
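As a minimal illustration of the underlying idea, the sketch below steers a two-microphone delay-and-sum beamformer towards the current gaze direction; names, geometry, and the integer-sample alignment are illustrative simplifications of what real devices do.

import numpy as np

c = 343.0   # speed of sound in m/s

def delay_and_sum(mic_front, mic_rear, fs, mic_spacing, gaze_azimuth_rad):
    # Delay the rear-microphone signal so that sound from the gaze direction
    # adds coherently, then average the two microphone signals.
    tau = mic_spacing * np.cos(gaze_azimuth_rad) / c   # inter-microphone delay
    shift = int(round(tau * fs))
    aligned = np.roll(mic_rear, -shift)                # crude integer-sample alignment
    return 0.5 * (mic_front + aligned)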

The presentation and the report will be in English.

References:

Hart, J., Onceanu, D., Sohn, C., Wightman, D., & Vertegaal, R. (2009). The Attentive Hearing Aid: Eye Selection of Auditory Sources for Hearing Impaired Users. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5726, pp. 19–35).

Roverud, E., Best, V., Mason, C. R., Streeter, T., & Kidd, G. (2017). Evaluating the Performance of a Visually Guided Hearing Aid Using a Dynamic Auditory-Visual Word Congruence Task. Ear and Hearing, 1–14.

Best, V., Roverud, E., Streeter, T., Mason, C. R., & Kidd, G. (2017). The Benefit of a Visually Guided Beamformer in a Dynamic Speech Task. Trends in Hearing, 21, 1–11.

Effect of head movements on sound localization

Supervisor: Dr. Ľuboš Hládek
Description:

Self-motion provides essential cues for orienting behavior, such as sound localization and speech perception, in everyday acoustic scenes. For instance, self-motion helps to organize sounds in the median plane because of the pinna filtering effects. This has been studied in the so-called front-back illusion paradigm, in which sounds of restricted spectral content can limit the ability to resolve front from back, leading to the illusory percept of a sound in front of the person even though the source is behind. However, this happens only for sounds with a specific spectral content and when the sound moves with the head in a specific way. Moreover, head movements are also important for speech perception because of binaural and energetic effects. To what extent people exploit head orientation in everyday listening is a matter of debate.

The presentation and the report will be in English.

References:

Brimijoin, W. O., & Akeroyd, M. A. (2012). The role of head movements and signal spectrum in an auditory front/back illusion. I-Perception, 3(3), 179–181. doi.org/10.1068/i7173sas

Brimijoin, W. O., McShefferty, D., & Akeroyd, M. A. (2012). Undirected head movements of listeners with asymmetrical hearing impairment during a speech-in-noise task. Hearing Research, 283(1–2), 162–168. doi.org/10.1016/j.heares.2011.10.009

Brimijoin, W. O. (2018). Angle-Dependent Distortions in the Perceptual Topology of Acoustic Space. Trends in Hearing, 22, 1–11.

Grange, J. A., & Culling, J. F. (2016). The benefit of head orientation to speech intelligibility in noise. The Journal of the Acoustical Society of America, 139(2), 703–712. doi.org/10.1121/1.4941655

Causal inference modeling of the ventriloquism effect

Supervisor: Dr. Ľuboš Hládek
Description:

Audio-visual spatial integration is the perceptual mechanism by which a sound and a visual stimulus presented at two different locations are perceived as one object. The perception of such stimuli depends on their spatial and temporal proximity and on other perceptual parameters, such as the salience of the individual components of the audio-visual stimulus. One way of understanding this phenomenon is to assume that the brain combines the stimuli in an optimal way only under certain conditions. Causal inference modeling is a framework inspired by Bayesian statistics that takes into account the perceptual noise of the underlying stimuli (or their salience), which inherently determines whether the two components of the audio-visual complex are perceived as one event or two.
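The sketch below is a minimal numerical implementation of the core of such a model, in the spirit of Körding et al. (2007): it computes the posterior probability of a common cause by integrating Gaussian likelihoods over a grid of candidate source locations. All parameter values are illustrative.

import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p_common(x_a, x_v, sigma_a=8.0, sigma_v=2.0, sigma_p=15.0, prior_common=0.5):
    # x_a, x_v: noisy auditory/visual location estimates (deg);
    # sigma_a, sigma_v: sensory noise; sigma_p: width of the spatial prior.
    s = np.linspace(-90.0, 90.0, 2001)                 # candidate source locations
    ds = s[1] - s[0]
    prior_s = gauss(s, 0.0, sigma_p)
    # C = 1: a single source location generates both cues
    like_c1 = np.sum(gauss(x_a, s, sigma_a) * gauss(x_v, s, sigma_v) * prior_s) * ds
    # C = 2: independent source locations generate the two cues
    like_c2 = (np.sum(gauss(x_a, s, sigma_a) * prior_s) * ds *
               np.sum(gauss(x_v, s, sigma_v) * prior_s) * ds)
    return (like_c1 * prior_common /
            (like_c1 * prior_common + like_c2 * (1 - prior_common)))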

The presentation and the report will be in English.

References:

Körding, K. P., Beierholm, U. R., Ma, W. J., Quartz, S. R., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS One, 2(9), e943.

Wozny, D. R., & Shams, L. (2011). Computational characterization of visually induced auditory spatial adaptation. Frontiers in Integrative Neuroscience, 5(November), 75.

Odegaard, B., Wozny, D. R., & Shams, L. (2016). The effects of selective and divided attention on sensory precision and integration. Neuroscience Letters, 614, 24–28.

Mendonça, C., Mandelli, P., & Pulkki, V. (2016). Modeling the perception of audiovisual distance: Bayesian causal inference and other models. PLoS ONE, 11(12), 1–18.

Multiple sound source localization using microphone arrays

Supervisor: Dipl.-Ing. Matthieu Kuntz
Description:

Microphone arrays consist of several microphones that can be set up along a line, on a circle, or on the surface of a sphere. They can be used, for example, to record spatial audio, to reproduce it binaurally or via loudspeaker arrays, or to filter out a specific signal based on its location. Spatial filtering algorithms are present in most antenna systems in order to enhance a specific signal in a noisy environment. For these algorithms, the location of the source either has to be known beforehand or must be estimated 'blindly', the microphone signals being the only available source of information. Many techniques have been developed for localizing a single source, but they do not give good results when several sources are present.
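One building block used by the TDOA-based methods cited below is the estimation of the time difference of arrival between two microphones, commonly done with the generalized cross-correlation with phase transform (GCC-PHAT). The following sketch shows this step for a single microphone pair; variable names are illustrative.

import numpy as np

def gcc_phat_tdoa(sig_a, sig_b, fs):
    # Estimated delay of sig_a relative to sig_b, in seconds.
    n = len(sig_a) + len(sig_b)
    A = np.fft.rfft(sig_a, n)
    B = np.fft.rfft(sig_b, n)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep only phase information
    cc = np.fft.irfft(cross, n)
    lag = int(np.argmax(np.abs(cc)))
    if lag > n // 2:                    # map circular lags to negative delays
        lag -= n
    return lag / fs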

The goal of this presentation is to give an overview of different techniques used for multiple sound source localization and to compare their accuracy, with a focus on circular microphone arrays.

This presentation will preferably be given in English. The student who accepts this topic will not be graded on his/her English speaking ability. The student will, however, be graded on how well he/she understands the topic and how well the presentation is structured to convey the intended message clearly (as would be the case when presenting in any language).

References:

Pavlidi, D., Griffin, A., Puigt, M., Mouchtaris, A. (2013). Real-time multiple sound source localization and counting using a circular microphone array. IEEE/ACM Transactions on Audio, Speech and Language Processing, 21(10), 2193-2206.

Liu, H., Yang, B., Pang, C. (2017). Multiple sound source localization based on TDOA clustering and multi-path matching pursuit. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 3241-3245.

Sundar, H., Sreenivas, T.V., Seelamantula, C.S. (2018). TDOA-based multiple acoustic source localization without association ambiguity. IEEE/ACM Transactions on Audio, Speech and Language Processing, 26(11), 1976-1990.

Registration

Registration for the Hauptseminar Audio-Informationsverarbeitung takes place via TUM-Online. The seminar is limited to nine participants. If there are more applicants, you will initially be placed on the waiting list.
