The significance of data excellence has long been under-emphasized in Machine Learning (ML) research. Historically, ML research has focused primarily on developing effective and efficient models for a given benchmark dataset. Far less emphasis has been placed on the data itself, arguably one of the most important ingredients of ML. Data scientists working with real-world datasets, who spend the majority of their time cleaning data, may argue that a small set of high-quality data can matter more than the choice of ML model.
Zhaoxuan works on data-centric technologies that boost data quality and data excellence for impactful ML. He developed data valuation methods that assess a dataset’s contribution to predictive performance, which can be useful for data cleaning, summarization, and selection. Data valuation is also a core component of Collaborative Machine Learning (CML), which encourages multiple organizations to participate in solving a common ML task. Designing mechanisms that incentivize cooperation among self-interested parties could increase overall social welfare and potentially change how organizations that practice ML interact with one another.