<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE GmsArticle SYSTEM "http://www.egms.de/dtd/2.0.34/GmsArticle.dtd">
<GmsArticle xmlns:xlink="http://www.w3.org/1999/xlink">
  <MetaData>
    <Identifier>zaud000063</Identifier>
    <IdentifierDoi>10.3205/zaud000063</IdentifierDoi>
    <IdentifierUrn>urn:nbn:de:0183-zaud0000639</IdentifierUrn>
    <ArticleType>Short Report</ArticleType>
    <TitleGroup>
      <Title language="en">Hearing aids in the era of foundation models</Title>
      <TitleTranslated language="de">H&#246;rger&#228;te im Zeitalter der Grundmodelle</TitleTranslated>
    </TitleGroup>
    <CreatorList>
      <Creator>
        <PersonNames>
          <Lastname>Triantafyllopoulos</Lastname>
          <LastnameHeading>Triantafyllopoulos</LastnameHeading>
          <Firstname>Andreas</Firstname>
          <Initials>A</Initials>
        </PersonNames>
        <Address>Department for Clinical Medicine, TUM School of Medicine and Health, Klinikum rechts der Isar (Public Sector Institution), Technical University of Munich, Ismaninger Str. 22, 81675 Munich, Germany<Affiliation>CHI &#8211; Chair of Health Informatics, Technical University of Munich, MRI, Munich, Germany</Affiliation><Affiliation>MCML &#8211; Munich Center for Machine Learning, Munich, Germany</Affiliation></Address>
        <Email>andreas.triantafyllopoulos&#64;tum.de</Email>
        <Creatorrole corresponding="yes" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Schuller</Lastname>
          <LastnameHeading>Schuller</LastnameHeading>
          <Firstname>Bj&#246;rn W.</Firstname>
          <Initials>BW</Initials>
        </PersonNames>
        <Address>
          <Affiliation>CHI &#8211; Chair of Health Informatics, Technical University of Munich, MRI, Munich, Germany</Affiliation>
          <Affiliation>MCML &#8211; Munich Center for Machine Learning, Munich, Germany</Affiliation>
          <Affiliation>MDSI &#8211; Munich Data Science Institute, Munich, Germany</Affiliation>
          <Affiliation>GLAM &#8211; Group on Language, Audio &#38; Music, Imperial College, London, United Kingdom</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
    </CreatorList>
    <PublisherList>
      <Publisher>
        <Corporation>
          <Corporatename>German Medical Science GMS Publishing House</Corporatename>
        </Corporation>
        <Address>D&#252;sseldorf</Address>
      </Publisher>
    </PublisherList>
    <SubjectGroup>
      <SubjectheadingDDB>610</SubjectheadingDDB>
    </SubjectGroup>
    <DatePublishedList>
      <DatePublished>20241217</DatePublished>
    </DatePublishedList>
    <Language>engl</Language>
    <License license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
      <AltText language="en">This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.</AltText>
      <AltText language="de">Dieser Artikel ist ein Open-Access-Artikel und steht unter den Lizenzbedingungen der Creative Commons Attribution 4.0 License (Namensnennung).</AltText>
    </License>
    <SourceGroup>
      <Journal>
        <ISSN>2628-9083</ISSN>
        <Volume>6</Volume>
        <JournalTitle>GMS Zeitschrift f&#252;r Audiologie - Audiological Acoustics</JournalTitle>
        <JournalTitleAbbr>GMS Z Audiol (Audiol Acoust)</JournalTitleAbbr>
      </Journal>
    </SourceGroup>
    <ArticleNo>28</ArticleNo>
  </MetaData>
  <OrigData>
    <Abstract language="de" linked="yes"><Pgraph>Die j&#252;ngste Einf&#252;hrung von Grundmodellen (FMs) hat die Welt im Sturm erobert. Von gro&#223;en Sprachmodellen (LLMs) bis hin zur Analyse und Generierung von Bild- und Audiodateien haben FMs einen Paradigmenwechsel in der k&#252;nstlichen Intelligenz (KI) hervorgerufen, bei dem Anwender vom herk&#246;mmlichen &#252;berwachten maschinellen Lernen zu Textanfragen und kontextbezogenem Lernen &#252;bergehen. Dies hat ebenfalls Auswirkungen auf die H&#246;rger&#228;teforschung, insbesondere auf die Verwendung solcher Modelle zur Ger&#228;uschunterdr&#252;ckung und zur Verbesserung der Sprachqualit&#228;t. Obwohl die Anwendung von FMs in diesen Kontext bisher minimal bis nicht existent ist, haupts&#228;chlich aufgrund der prohibitiven Rechenkomplexit&#228;t der Modelle, gibt es dennoch M&#246;glichkeiten, von den Fortschritten durch FMs auf indirekte Weise zu profitieren. Wir &#252;berpr&#252;fen diese Ans&#228;tze in dem vorliegenden Beitrag.</Pgraph></Abstract>
    <Abstract language="en" linked="yes"><Pgraph>The recent introduction of foundation models (FMs) has taken the world by storm. Ranging from large language models (LLMs) to image and audio analysis and generation, FMs have introduced a new paradigm in artificial intelligence (AI), one where practitioners transition from standard supervised machine learning to prompting and in-context learning. This has implications for hearing aid research, and specifically for the use of such models for noise attenuation and speech enhancement. Even though the uptake of FMs is minimal to non-existent for this application domain, mainly due to the prohibitive computational complexity of those models, there are nevertheless ways to benefit from FM advances in an indirect way. We review these approaches in the present contribution.</Pgraph></Abstract>
    <TextBlock linked="yes" name="Introduction">
      <MainHeadline>Introduction</MainHeadline><Pgraph>Hearing aids aim to compensate for hearing loss by processing the input audio stream and manipulating it in such a way as to partially recover lost hearing. While recovering hearing covers multiple facets of the human experience, such as being able to partake in conversations or enjoy music, recovering the ability to understand human speech is understandably one of the main priorities for hearing aid devices. Their key operating principles leverage advances in a wide array of fields, from physics to electronics, (psycho)acoustics, digital signal processing (DSP), statistics, and &#8211; increasingly &#8211; artificial intelligence (AI) <TextLink reference="1"></TextLink>. In particular, AI features prominently as a complement, or even substitute, to DSP components <TextLink reference="2"></TextLink>, primarily the ones tackling noise reduction and attenuation <TextLink reference="3"></TextLink>, <TextLink reference="4"></TextLink>. In a new frontier, <Mark2>foundation models</Mark2> (FMs) have appeared as a novel class of models in the broader AI community <TextLink reference="5"></TextLink>, but have not yet found their way into hearing aid research. FMs differ from traditional deep neural networks (DNNs) in that they exhibit <Mark2>emergent properties</Mark2>, i.e., capabilities that they were not explicitly trained to perform but that can be uncovered through the successful use of <Mark2>prompting</Mark2> <TextLink reference="6"></TextLink>.</Pgraph><Pgraph>Prompts can be thought of as a mixture of <Mark2>cues</Mark2> and <Mark2>instructions</Mark2> provided to a model. Instructions pertain to the task that should be solved; cues add additional context that can be leveraged to improve performance. For instance, a large language model (LLM) might be asked to classify the sentiment of a target sentence (&#8220;The weather is nice today.&#8221;). On top of the sentence to be classified, the input query must be prefaced with an instruction (&#8220;Predict the sentiment of the following sentence.&#8221;) and can be further constrained according to the specifications of the user (&#8220;Predict the sentiment of the following sentence. Select one from positive, negative, neutral. Answer in one word.&#8221;).</Pgraph><Pgraph>Auditory FMs operate on similar principles to LLMs, albeit with audio as a primary or secondary input <TextLink reference="7"></TextLink>. Text prompts now become audio prompts. The input may be an audio snippet, while instructions are typically provided as text; this offers an intuitive interface for downstream users. A typical application is audio manipulation (e.g., for denoising, inpainting, or voice transformation). The input query corresponds to an <Mark2>&#91;AUDIO&#93;</Mark2> snippet, with <Mark2>&#91;AUDIO&#93;</Mark2> symbolising a discretised and compressed representation of the audio that needs to be manipulated by the foundation model. This is then prefaced or followed by a textual instruction and additional cues. For example, the input may be: &#8220;The following audio consists of one target male speaker and background music. Remove all background music and keep all target speech intact. <Mark2>&#91;AUDIO&#93;</Mark2>.&#8221; Importantly, this interleaving of audio and language (or even other modalities) relies on mapping all inputs to a joint embedding space that the FM can process.
As most existing off-the-shelf FMs are LLMs, this is typically achieved by adding an audio module that processes the input <Mark2>&#91;AUDIO&#93;</Mark2>, followed by a mapping module that translates the output of the audio module into the space shared by text (i.e., <Mark2>tokens</Mark2>); the entire prompt is then fed into the FM for further processing. Similarly, the output is treated as an audio stream (and may or may not be fed into additional audio modules for decoding).</Pgraph>
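<Pgraph>To make the above concrete, the following minimal Python sketch illustrates how a textual instruction and a discretised audio snippet could be assembled into a single prompt sequence. It is purely illustrative: encode_audio, project_to_token_space, and tokenise are simplified stand-ins, not the interface of any actual foundation model.</Pgraph><Pgraph>def encode_audio(waveform):
    # Stand-in for the audio module: compress the waveform into coarse frame-level embeddings.
    frame = 160  # hypothetical hop size (samples per frame)
    return [sum(waveform[i:i + frame]) / frame for i in range(0, len(waveform), frame)]

def project_to_token_space(audio_embeddings):
    # Stand-in for the mapping module: translate each audio embedding into a discrete token.
    return ["AUDIO_{}".format(round(abs(e) * 100) % 1024) for e in audio_embeddings]

def tokenise(text):
    # Stand-in for the FM's text tokeniser.
    return text.split()

instruction = ("The following audio consists of one target male speaker and background "
               "music. Remove all background music and keep all target speech intact.")
waveform = [0.0] * 16000  # one second of (silent) dummy audio at 16 kHz
prompt = tokenise(instruction) + project_to_token_space(encode_audio(waveform))
print(prompt[:3] + ["..."] + prompt[-3:])  # text tokens followed by audio tokens, ready for the FM</Pgraph></TextBlock>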
    <TextBlock linked="yes" name="Foundation models for hearing aids">
      <MainHeadline>Foundation models for hearing aids</MainHeadline><Pgraph>Having reviewed the basic principles of FMs, we now turn to the critical question of how they can be used in the context of hearing aids. At first glance, computational complexity is an obvious factor prohibiting their uptake. FMs employ billions of model parameters <TextLink reference="5"></TextLink>, with the &#8220;small&#8221; versions of those models typically featuring 7 billion parameters. Even at the most extreme level of quantisation currently possible for AI models (4 bits), this still results in 3.5 GB of memory just to load the model, without accounting for the storage of intermediate computations, or, indeed, the runtime to pass an input through the model. Obviously, the deployment of such a model in a hearing aid is a long time away. However, there are ways to leverage FMs that circumvent the hardware showstopper discussed above. The key insight lies in offloading those compute-intensive models to external devices. Given the proliferation of smartphones and their integration into the hearing-aid ecosystem, as well as the emergence of new, distributed sensing paradigms like &#8220;Auracast&#8221; <TextLink reference="8"></TextLink>, there are nowadays complementary devices that can record and process audio with significantly higher compute capabilities than hearing-aid devices <TextLink reference="9"></TextLink>.</Pgraph><Pgraph>An overview of the process is shown in Figure 1 <ImgLink imgNo="1" imgType="figure"/>. In a nutshell, FMs are employed to do what they excel at &#8211; general world understanding &#8211; which is then co-opted to improve denoising performance. The key motivation for using FMs in this way is that the world changes slowly, at least in the respects relevant to a hearing aid user. Coupled with the fact that most people nowadays carry a smartphone connected to the Internet, this allows the running of the FM to be offloaded to a device outside the hearing aid. That device can relay the necessary information &#8211; essentially a model of the surrounding environment &#8211; from the FM back to the hearing aid, which can then utilise that information to improve its signal processing. While exotic at first glance, this procedure can enable us to leverage the advances in FMs without waiting for accompanying improvements in hardware. In the following, we conceptualise how that might become possible, offering a perspective on how audio FMs can be employed in hearing aid practice.</Pgraph><Pgraph>Naturally, running these models externally introduces an additional latency that precludes online usage. However, there lies immense potential in their ability to understand the underlying environment even in an offline setting. In particular, the ultimate goal of a hearing aid is hearing loss compensation, which, when it comes to speech understanding, is partially achieved through speech enhancement and noise attenuation. The latter is contingent on the type of noise that is prevalent. Oftentimes, this noise is quasi-stationary, as in the typical examples of babble noise, restaurants, transportation, or music. These types of noise change slowly &#8211; slowly enough that a large foundation model only needs to sense them sporadically (e.g., every few seconds or even minutes).
Such a model can be applied to periodic recordings of the environment to generate a detailed characterisation of it, which can be provided to the hearing aid to condition its denoising algorithm.</Pgraph><Pgraph>Examples of this type of conditioning have already proven successful for general speech denoising <TextLink reference="10"></TextLink>, whereby a <Mark2>fingerprint</Mark2> of the background noise is used as additional information to improve noise attenuation. This fingerprint is processed by a separate encoder &#8211; which, in principle, can be more complex than the main branch as it only needs to be run rarely &#8211; and its output is used for the conditioning of a main denoising network (a minimal, illustrative sketch of this two-time-scale scheme is given at the end of this section). While previous works have used standard neural networks for this fingerprint encoder, performance could be substantially improved by relying on the more advanced class of FMs now available. Similarly, this process can be used to enrol the target speaker to be enhanced &#8211; a form of personalisation that is well-known in the literature.</Pgraph><Pgraph>Beyond automatically understanding the background audio type, however, FMs can be used to foster a more intuitive and adaptive interaction with the user of the hearing aid. As mentioned, auditory FMs can seamlessly combine audio with linguistic queries &#8211; the latter can be provided in real time by the user, who could dynamically adjust the parameters of the hearing aid to match their current needs. We note that such &#8220;profiles&#8221; are already available as part of smartphone apps that allow for the configuration of a hearing aid &#8211; however, the use of descriptive, natural language can provide a more timely and granular adaptation, as well as introduce a trial-and-error component, with the user iterating through queries.</Pgraph><Pgraph>In summary, we expect FMs to gradually make their way into the next generations of hearing aids as supplements that run on external devices. They have the capacity to serve as a powerful sidekick to the speech enhancement and denoising capabilities of hearing aids, thus paving the way for better hearing loss compensation.</Pgraph>
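<Pgraph>As a purely illustrative sketch of the fingerprint-based conditioning discussed above, the following toy Python example separates the two time scales involved: a trivial fingerprint estimate stands in for the sporadic, off-device FM analysis, and a lightweight per-frame attenuation stands in for the on-device denoiser that consumes it. Neither function reflects an actual system; <TextLink reference="10"></TextLink> describes a neural realisation of the same general idea.</Pgraph><Pgraph>def estimate_fingerprint(background_frames):
    # Stand-in for the off-device FM: characterise the quasi-stationary background noise,
    # here simply as its average per-sample magnitude. Runs only every few seconds or minutes.
    total = sum(abs(x) for frame in background_frames for x in frame)
    count = sum(len(frame) for frame in background_frames)
    return total / count

def denoise_frame(frame, fingerprint, strength=2.0):
    # Stand-in for the on-device denoiser: attenuate samples that the fingerprint marks as
    # likely noise-dominated, while leaving stronger (speech-like) samples intact.
    threshold = strength * fingerprint
    return [x if abs(x) > threshold else 0.1 * x for x in frame]

background = [[0.05, -0.04, 0.03, -0.02]]  # sporadic recording of the environment
fp = estimate_fingerprint(background)  # relayed from the external device to the hearing aid
noisy = [[0.04, 0.6, -0.03, 0.5], [-0.05, 0.55, 0.02, -0.6]]
clean = [denoise_frame(frame, fp) for frame in noisy]
print(clean)</Pgraph></TextBlock>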
    <TextBlock linked="yes" name="Notes">
      <MainHeadline>Notes</MainHeadline><SubHeadline>Conference presentation</SubHeadline><Pgraph>This contribution was presented at the 26<Superscript>th</Superscript> Annual Conference of the German Society of Audiology.</Pgraph><SubHeadline>Competing interests</SubHeadline><Pgraph>The authors declare that they have no competing interests.</Pgraph></TextBlock>
    <References linked="yes">
      <Reference refNo="5">
        <RefAuthor>Bommasani R</RefAuthor>
        <RefAuthor>Hudson DA</RefAuthor>
        <RefAuthor>Adeli E</RefAuthor>
        <RefAuthor>Altman R</RefAuthor>
        <RefAuthor>Arora S</RefAuthor>
        <RefAuthor>von Arx S</RefAuthor>
        <RefAuthor>Bernstein MS</RefAuthor>
        <RefAuthor>Bohg J</RefAuthor>
        <RefAuthor>Bosselut A</RefAuthor>
        <RefAuthor>Brunskill E</RefAuthor>
        <RefAuthor>Brynjolfsson E</RefAuthor>
        <RefAuthor>Buch S</RefAuthor>
        <RefAuthor>Card D</RefAuthor>
        <RefAuthor>Castellon R</RefAuthor>
        <RefAuthor>Chatterji N</RefAuthor>
        <RefAuthor>Chen A</RefAuthor>
        <RefAuthor>Creel K</RefAuthor>
        <RefAuthor>Davis JQ</RefAuthor>
        <RefAuthor>Demszky D</RefAuthor>
        <RefAuthor>Donahue C</RefAuthor>
        <RefAuthor>Doumbouya M</RefAuthor>
        <RefAuthor>Durmus E</RefAuthor>
        <RefAuthor>Ermon S</RefAuthor>
        <RefAuthor>Etchemendy J</RefAuthor>
        <RefAuthor>Ethayarajh K</RefAuthor>
        <RefAuthor>Fei-Fei L</RefAuthor>
        <RefAuthor>Finn C</RefAuthor>
        <RefAuthor>Gale T</RefAuthor>
        <RefAuthor>Gillespie L</RefAuthor>
        <RefAuthor>Goel K</RefAuthor>
        <RefAuthor>Goodman N</RefAuthor>
        <RefAuthor>Grossman S</RefAuthor>
        <RefAuthor>Guha N</RefAuthor>
        <RefAuthor>Hashimoto T</RefAuthor>
        <RefAuthor>Henderson P</RefAuthor>
        <RefAuthor>Hewitt J</RefAuthor>
        <RefAuthor>Ho DE</RefAuthor>
        <RefAuthor>Hong J</RefAuthor>
        <RefAuthor>Hsu K</RefAuthor>
        <RefAuthor>Huang J</RefAuthor>
        <RefAuthor>Icard T</RefAuthor>
        <RefAuthor>Jain S</RefAuthor>
        <RefAuthor>Jurafsky D</RefAuthor>
        <RefAuthor>Kalluri P</RefAuthor>
        <RefAuthor>Karamcheti S</RefAuthor>
        <RefAuthor>Keeling G</RefAuthor>
        <RefAuthor>Khani F</RefAuthor>
        <RefAuthor>Khattab O</RefAuthor>
        <RefAuthor>Koh PW</RefAuthor>
        <RefAuthor>Krass M</RefAuthor>
        <RefAuthor>Krishna R</RefAuthor>
        <RefAuthor>Kuditipudi R</RefAuthor>
        <RefAuthor>Kumar A</RefAuthor>
        <RefAuthor>Ladhak F</RefAuthor>
        <RefAuthor>Lee M</RefAuthor>
        <RefAuthor>Lee T</RefAuthor>
        <RefAuthor>Leskovec J</RefAuthor>
        <RefAuthor>Levent I</RefAuthor>
        <RefAuthor>Li XL</RefAuthor>
        <RefAuthor>Li X</RefAuthor>
        <RefAuthor>Ma T</RefAuthor>
        <RefAuthor>Malik A</RefAuthor>
        <RefAuthor>Manning CD</RefAuthor>
        <RefAuthor>Mirchandani S</RefAuthor>
        <RefAuthor>Mitchell E</RefAuthor>
        <RefAuthor>Munyikwa Z</RefAuthor>
        <RefAuthor>Nair S</RefAuthor>
        <RefAuthor>Narayan A</RefAuthor>
        <RefAuthor>Narayanan D</RefAuthor>
        <RefAuthor>Newman B</RefAuthor>
        <RefAuthor>Nie A</RefAuthor>
        <RefAuthor>Niebles JC</RefAuthor>
        <RefAuthor>Nilforoshan H</RefAuthor>
        <RefAuthor>Nyarko J</RefAuthor>
        <RefAuthor>Ogut G</RefAuthor>
        <RefAuthor>Orr L</RefAuthor>
        <RefAuthor>Papadimitriou I</RefAuthor>
        <RefAuthor>Park JS</RefAuthor>
        <RefAuthor>Piech C</RefAuthor>
        <RefAuthor>Portelance E</RefAuthor>
        <RefAuthor>Potts C</RefAuthor>
        <RefAuthor>Raghunathan A</RefAuthor>
        <RefAuthor>Reich R</RefAuthor>
        <RefAuthor>Ren H</RefAuthor>
        <RefAuthor>Rong F</RefAuthor>
        <RefAuthor>Roohani Y</RefAuthor>
        <RefAuthor>Ruiz C</RefAuthor>
        <RefAuthor>Ryan J</RefAuthor>
        <RefAuthor>R&#233; C</RefAuthor>
        <RefAuthor>Sadigh D</RefAuthor>
        <RefAuthor>Sagawa S</RefAuthor>
        <RefAuthor>Santhanam K</RefAuthor>
        <RefAuthor>Shih A</RefAuthor>
        <RefAuthor>Srinivasan K</RefAuthor>
        <RefAuthor>Tamkin A</RefAuthor>
        <RefAuthor>Taori R</RefAuthor>
        <RefAuthor>Thomas AW</RefAuthor>
        <RefAuthor>Tram&#232;r F</RefAuthor>
        <RefAuthor>Wang RE</RefAuthor>
        <RefAuthor>Wang W</RefAuthor>
        <RefAuthor>Wu B</RefAuthor>
        <RefAuthor>Wu J</RefAuthor>
        <RefAuthor>Wu Y</RefAuthor>
        <RefAuthor>Xie SM</RefAuthor>
        <RefAuthor>Yasunaga M</RefAuthor>
        <RefAuthor>You J</RefAuthor>
        <RefAuthor>Zaharia M</RefAuthor>
        <RefAuthor>Zhang M</RefAuthor>
        <RefAuthor>Zhang T</RefAuthor>
        <RefAuthor>Zhang X</RefAuthor>
        <RefAuthor>Zhang Y</RefAuthor>
        <RefAuthor>Zheng L</RefAuthor>
        <RefAuthor>Zhou K</RefAuthor>
        <RefAuthor>Liang P</RefAuthor>
        <RefTitle>On the Opportunities and Risks of Foundation Models</RefTitle>
        <RefYear>2021</RefYear>
        <RefJournal>ArXiv</RefJournal>
        <RefPage>arXiv:2108.07258</RefPage>
        <RefTotal>Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, Brynjolfsson E, Buch S, Card D, Castellon R, Niladri C, Chen A, Creel K, Davis JQ, Dorottya D, Demszky, Donahue C, Doumbouya M, Durmus E, Ermon S, Etchemendy J, Ethayarajh K,  Fei-Fei L, Finn C, Gale T, Gillespie L, Goel K, Goodman N, Grossman S, Guha N, Hashimoto T, Henderson P, Hewitt J, Ho DE, Hong J, Hsu K, Huang J, Icard T, Jain S, Jurafsky D, Kalluri P, Karamcheti S, Keeling G, Khani F, Khattab O, Koh PW, Krass M, Krishna R, Kuditipudi R, Kumar  A, Ladhak F, Lee M, Lee T, Leskovec J, Levent I, Li XL, Li X, Ma T, Malik A, Manning CD, Mirchandani S,  Mitchell E, Munyikwa Z, Nair S, Narayan A, Narayanan D, Newman B, Nie A, Niebles JC, Nilforoshan H, Nyarko J, Ogut G, Orr L, Papadimitriou I, Park JS, Piech C, Portelance E, Potts C, Raghunathan A, Reich R, Ren H, Rong F, Roohani Y, Ruiz C Ryan J, R&#233; C, Sadigh D, Sagawa S, Santhanam K, Shih A, Srinivasan K, Tamkin A, Taori R, Thomas AW, Tram&#232;r F, Wang RE, Wang W, Wu B, Wu J, Yuhuai W, Xie SM, Yasunaga M, You J, Zaharia M, Zhang M, Zhang T, Zhang X, Zhang Y, Zheng L, Zhou K, Liang P. On the Opportunities and Risks of Foundation Models. ArXiv. 2021:arXiv:2108.07258. 
DOI: 10.48550&#47;arXiv.2108.07258</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.48550&#47;arXiv.2108.07258</RefLink>
      </Reference>
      <Reference refNo="1">
        <RefAuthor>Dillon H</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>2001</RefYear>
        <RefBookTitle>Hearing aids</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Dillon H. Hearing aids. New York: Thieme Medical Publishers Inc.; 2001.</RefTotal>
      </Reference>
      <Reference refNo="3">
        <RefAuthor>Hamacher V</RefAuthor>
        <RefAuthor>Chalupper J</RefAuthor>
        <RefAuthor>Eggers J</RefAuthor>
        <RefAuthor>Fischer E</RefAuthor>
        <RefAuthor>Kornagel U</RefAuthor>
        <RefAuthor>Puder H</RefAuthor>
        <RefAuthor>Rass U</RefAuthor>
        <RefTitle>Signal processing in high-end hearing aids: State of the art, challenges, and future trends</RefTitle>
        <RefYear>2005</RefYear>
        <RefJournal>EURASIP Journal on Advances in Signal Processing</RefJournal>
        <RefPage>1-15</RefPage>
        <RefTotal>Hamacher V, Chalupper J, Eggers J, Fischer E, Kornagel U, Puder H, Rass U. Signal processing in high-end hearing aids: State of the art, challenges, and future trends. EURASIP Journal on Advances in Signal Processing. 2005;18:1-15. 
DOI: 10.1155&#47;ASP.2005.2915</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.1155&#47;ASP.2005.2915</RefLink>
      </Reference>
      <Reference refNo="9">
        <RefAuthor>Kaufmann TB</RefAuthor>
        <RefAuthor>Foroogozar M</RefAuthor>
        <RefAuthor>Liss J</RefAuthor>
        <RefAuthor>Berisha V</RefAuthor>
        <RefTitle>Requirements For Mass Adoption Of Assistive Listening Technology By The General Public</RefTitle>
        <RefYear>2023</RefYear>
        <RefBookTitle>Proceedings of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)</RefBookTitle>
        <RefPage>1-5</RefPage>
        <RefTotal>Kaufmann TB, Foroogozar M, Liss J, Berisha V. Requirements For Mass Adoption Of Assistive Listening Technology By The General Public. In: IEEE, editor. Proceedings of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). Piscataway, NJ: IEEE; 2023. p. 1-5. DOI: 10.1109&#47;ICASSPW59220.2023.10193566</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.1109&#47;ICASSPW59220.2023.10193566</RefLink>
      </Reference>
      <Reference refNo="7">
        <RefAuthor>Liu H</RefAuthor>
        <RefAuthor>Chen Z</RefAuthor>
        <RefAuthor>Yuan Y</RefAuthor>
        <RefAuthor>Mei X</RefAuthor>
        <RefAuthor>Liu X</RefAuthor>
        <RefAuthor>Mandic D</RefAuthor>
        <RefAuthor>Wang W</RefAuthor>
        <RefAuthor>Plumbley MD</RefAuthor>
        <RefTitle>AudioLDM: Text-to-Audio Generation with Latent Diffusion Models</RefTitle>
        <RefYear>2023</RefYear>
        <RefBookTitle>Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA</RefBookTitle>
        <RefPage>21450-74</RefPage>
        <RefTotal>Liu H, Chen Z, Yuan Y, Mei X, Liu X, Mandic D, Wang W, Plumbley MD. AudioLDM: Text-to-Audio Generation with Latent Diffusion Models. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, editors. Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. MLResearchPress; 2023. p. 21450-74</RefTotal>
      </Reference>
      <Reference refNo="10">
        <RefAuthor>Liu S</RefAuthor>
        <RefAuthor>Keren G</RefAuthor>
        <RefAuthor>Parada-Cabaleiro E</RefAuthor>
        <RefAuthor>Schuller B</RefAuthor>
        <RefTitle>N-HANS: A neural network-based toolkit for in-the-wild audio enhancement</RefTitle>
        <RefYear>2021</RefYear>
        <RefJournal>Multimed Tools App</RefJournal>
        <RefPage>28365-89</RefPage>
        <RefTotal>Liu S, Keren G, Parada-Cabaleiro E, Schuller B. N-HANS: A neural network-based toolkit for in-the-wild audio enhancement. Multimed Tools App. 2021 Jul;80(6):28365-89. 
DOI: 10.1007&#47;s11042-021-11080-y</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.1007&#47;s11042-021-11080-y</RefLink>
      </Reference>
      <Reference refNo="4">
        <RefAuthor>Schr&#246;ter H</RefAuthor>
        <RefAuthor>Rosenkranz T</RefAuthor>
        <RefAuthor>Escalante-B AN</RefAuthor>
        <RefAuthor>Maier A</RefAuthor>
        <RefTitle>Low latency speech enhancement for hearing aids using deep filtering</RefTitle>
        <RefYear>2022</RefYear>
        <RefJournal>IEEE&#47;ACM Transactions on Audio, Speech, and Language Processing</RefJournal>
        <RefPage>2716-28</RefPage>
        <RefTotal>Schr&#246;ter H, Rosenkranz T, Escalante-B AN, Maier A. Low latency speech enhancement for hearing aids using deep filtering. IEEE&#47;ACM Transactions on Audio, Speech, and Language Processing. 2022;30:2716-28. 
DOI: 10.1109&#47;TASLP.2022.3198548</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.1109&#47;TASLP.2022.3198548</RefLink>
      </Reference>
      <Reference refNo="2">
        <RefAuthor>Wang D</RefAuthor>
        <RefTitle>Deep Learning Reinvents the Hearing Aid: Finally, wearers of hearing aids can pick out a voice in a crowded room</RefTitle>
        <RefYear>2017</RefYear>
        <RefJournal>IEEE Spectr</RefJournal>
        <RefPage>32-7</RefPage>
        <RefTotal>Wang D. Deep Learning Reinvents the Hearing Aid: Finally, wearers of hearing aids can pick out a voice in a crowded room. IEEE Spectr. 2017 Mar;54(3):32-7. 
DOI: 10.1109&#47;MSPEC.2017.7864754</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.1109&#47;MSPEC.2017.7864754</RefLink>
      </Reference>
      <Reference refNo="6">
        <RefAuthor>Wei J</RefAuthor>
        <RefAuthor>Tay Y</RefAuthor>
        <RefAuthor>Bommasani R</RefAuthor>
        <RefAuthor>Raffel C</RefAuthor>
        <RefAuthor>Zoph B</RefAuthor>
        <RefAuthor>Borgeaud S</RefAuthor>
        <RefAuthor>Yogatama D</RefAuthor>
        <RefAuthor>Bosma M</RefAuthor>
        <RefAuthor>Zhou D</RefAuthor>
        <RefAuthor>Metzler D</RefAuthor>
        <RefAuthor>Chi EH</RefAuthor>
        <RefAuthor>Hashimoto T</RefAuthor>
        <RefAuthor>Vinyals O</RefAuthor>
        <RefAuthor>Liang P</RefAuthor>
        <RefAuthor>Dean J</RefAuthor>
        <RefAuthor>Fedus W</RefAuthor>
        <RefTitle>Emergent Abilities of Large Language Models</RefTitle>
        <RefYear>2022</RefYear>
        <RefJournal>Transactions on Machine Learning Research</RefJournal>
        <RefPage>1-30</RefPage>
        <RefTotal>Wei J, Tay Y, Bommasani R, Raffel C, Zoph B, Borgeaud S, Yogatama D, Bosma M, Zhou D, Metzler D, Chi EH, Hashimoto T, Vinyals O, Liang P, Dean J, Fedus W. Emergent Abilities of Large Language Models. Transactions on Machine Learning Research. 2022 Jun 26;(08):1-30. 
DOI: 10.48550&#47;arXiv.2206.07682</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.48550&#47;arXiv.2206.07682</RefLink>
      </Reference>
      <Reference refNo="8">
        <RefAuthor>Bluetooth Market Development</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>2024</RefYear>
        <RefBookTitle>An Overview of Auracast&#8482; Broadcast Audio</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Bluetooth Market Development. An Overview of Auracast&#8482; Broadcast Audio. Bluetooth SIG, Inc.; 2024 &#91;last accessed 2024 Sep 16&#93;. Available from: https:&#47;&#47;www.bluetooth.com&#47;bluetooth-resources&#47;overview-of-auracast-broadcast-audio&#47;</RefTotal>
        <RefLink>https:&#47;&#47;www.bluetooth.com&#47;bluetooth-resources&#47;overview-of-auracast-broadcast-audio&#47;</RefLink>
      </Reference>
    </References>
    <Media>
      <Tables>
        <NoOfTables>0</NoOfTables>
      </Tables>
      <Figures>
        <Figure format="png" height="426" width="756">
          <MediaNo>1</MediaNo>
          <MediaID>1</MediaID>
          <Caption><Pgraph><Mark1>Figure 1: Schematic overview of how multimodal foundation models can be used in conjunction with hearing aids. The auditory component of the FM continuously monitors the background audio, while the textual component receives user feedback. The FM processes both types of inputs periodically and sends information to the hearing aid, which uses it to adapt its denoising process.</Mark1></Pgraph></Caption>
        </Figure>
        <NoOfPictures>1</NoOfPictures>
      </Figures>
      <InlineFigures>
        <NoOfPictures>0</NoOfPictures>
      </InlineFigures>
      <Attachments>
        <NoOfAttachments>0</NoOfAttachments>
      </Attachments>
    </Media>
  </OrigData>
</GmsArticle>