AI3611 Deep Learning: A Practical Course on Perception and Cognition
Basic Information
Associate Professor
X-LANCE Lab, Department of Computer Science and Engineering
Shanghai Jiao Tong University
Address: 3-225 SEIEE Building, 800 Dongchuan Road, Shanghai 200240, China
Email: mengyuewu@sjtu.edu.cn
Teaching
Research
SJTU X-LANCE Lab (Shanghai Jiao Tong University Cross-Media Language Intelligence Lab), Rich Audio Research Group
Environmental Sound:
- Sound event and scene detection
- Audio captioning: bridging the gap between audio analysis and natural language description
- Audio-visual event detection
Human Speech (medical applications):
- Speech emotion analysis
- Detection of depression, Parkinson's disease, and Alzheimer's disease
- Acoustic-based disease diagnosis, e.g. coughing, voice, heart sounds…
Publications
Selected Journal Papers
- Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu and Kai Yu. OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue. Transactions of the Association for Computational Linguistics, 2022.
- Heinrich Dinkel, Shuai Wang, Xuenan Xu, Mengyue Wu and Kai Yu. Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1542-1555, 2021.
- Heinrich Dinkel, Mengyue Wu and Kai Yu. Towards Duration Robust Weakly Supervised Sound Event Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 887-900, 2021.
Selected Conference Papers
- Guangwei Li, Xuenan Xu, Mengyue Wu and Kai Yu. Category-Adapted Sound Event Enhancement with Weakly Labeled Data. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 851-855.
- Guangwei Li, Xuenan Xu, Mengyue Wu and Kai Yu. Navigating Audio-Visual Event Detection Across Mismatched Modalities. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 1975-1979.
- Siyu Lou, Xuenan Xu, Mengyue Wu and Kai Yu. Audio-Text Retrieval in Context. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 4793-4797.
- Xuenan Xu, Mengyue Wu and Kai Yu. Diversity-controllable and Accurate Audio Captioning Based on Neural Condition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 971-975.
- Wen Wu, Mengyue Wu and Kai Yu. Climate and Weather: Inspecting Depression Detection via Emotion Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 6262-6266.
- Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu and Kenny Q. Zhu. Can Audio Captions Be Evaluated with Image Caption Metrics? IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 981-985.
- Zhiling Zhang, Siyuan Chen, Mengyue Wu and Kenny Zhu. Symptom Identification for Interpretable Detection of Multiple Mental Disorders on Social Media. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
- Zhiling Zhang, Siyuan Chen, Mengyue Wu and Kenny Zhu. Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22).
- Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang and Kai Yu. D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi, United Arab Emirates, 2022, pp. 2438–2459. Association for Computational Linguistics.
- Pingyue Zhang, Mengyue Wu, Heinrich Dinkel and Kai Yu. DEPA: Self-Supervised Audio Embedding for Depression Detection. In Proceedings of the 29th ACM International Conference on Multimedia (ACM-MM), Virtual Event, China, 2021, pp. 135-143.
- Zhiling Zhang, Zelin Zhou, Haifeng Tang, Guangwei Li, Mengyue Wu and Kenny Q. Zhu. Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM), Queensland, Australia, 2021, pp. 3652-3656.
- Xuenan Xu, Heinrich Dinkel, Mengyue Wu and Kai Yu. A Lightweight Framework for Online Voice Activity Detection in the Wild. Proc. Interspeech 2021, pp. 371-375, doi: 10.21437/Interspeech.2021-1977.
- Zhi Chen, Lu Chen, Hanqi Li, Ruisheng Cao, Da Ma, Mengyue Wu and Kai Yu. Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 3063–3074, August 1–6, 2021.
- Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Zeyu Xie and Kai Yu. Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada, 2021, pp. 905-909.
- Xuenan Xu, Heinrich Dinkel, Mengyue Wu and Kai Yu. Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada, 2021, pp. 606-610.
- Xuenan Xu, Heinrich Dinkel, Mengyue Wu and Kai Yu. Audio Caption in a Car Setting with a Sentence-Level Loss. In The 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), Hong Kong, China, 2021, pp. 1-5.
- Yefei Chen, Heinrich Dinkel, Mengyue Wu and Kai Yu. Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection. In 21st Annual Conference of the International Speech Communication Association (Interspeech), Shanghai, China, 2020, pp. 3665-3669.
- Xuenan Xu, Heinrich Dinkel, Mengyue Wu and Kai Yu. A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning. In The 5th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), Tokyo, Japan, 2020, pp. 225-229.
- Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu and Weiyao Lin. Multiple Sound Sources Localization from Coarse to Fine. The European Conference on Computer Vision (ECCV), Glasgow, 2020.
- Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu and Weiyao Lin. A Two-Stage Framework for Multiple Sound-Source Localization. CVPR Sight and Sound Workshop, 2020.
- Mengyue Wu, Heinrich Dinkel and Kai Yu. Audio Caption: Listen and Tell. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 830-834.