Natural Language Processing for Q&A in Indigenous/Vernacular Languages

Te reo Māori, Malay, and Singlish all play central roles in the culture and identity of the Māori and Singaporean communities. By revitalising these languages and promoting their understanding, we can preserve and provide access to the vast indigenous and local knowledge they express.

Recent advances in data science and deep learning for natural language processing (NLP) have opened up exciting new possibilities in major languages such as English and Chinese. However, there is no software system that systematically integrates listening, speaking, and reading for less widely used languages such as te reo Māori, Malay, or Singlish.

This project will develop novel speech processing and NLP techniques for machine translation and Q&A to create an intelligent conversational Q&A system in te reo Māori, Malay, and Singlish. The multilingual Q&A system will integrate listening, speaking, and reading in te reo Māori, Malay, and Singlish to benefit learners and users. The project will also explore how data science and AI systems can help people gain effective access to the knowledge expressed in these languages, thereby using technology to broaden both the preservation and the impact of our cultural heritage.

This project will be led by Prof See-Kiong Ng and conducted as a collaboration between the Singapore data science research team at the Institute of Data Science and the School of Computing at the National University of Singapore (NUS), and the multi-institutional NZ data science research team led by professors from Massey University, the University of Auckland, and the University of Waikato. The jointly developed intelligent conversational Q&A system will provide state-of-the-art speech recognition, machine translation, Q&A, and text-to-speech synthesis to support language learning and revitalisation efforts, as well as more effective public systems and outreach services. The application could then be extended to the many other indigenous and vernacular languages in the world.

Singapore Host Institution: National University of Singapore (NUS)
New Zealand Host Institution: Massey University
Principal Investigators: Professor See-Kiong Ng, School of Computing, NUS, and Professor Ruili Wang, Massey University
Co-Investigators: Associate Professor Stéphane Bressan, School of Computing, NUS (SG); Professor Michael Witbrock, The University of Auckland (NZ); Associate Professor Te Taka Keegan, The University of Waikato (NZ)