December 16: Divesh Srivastava
Posted: 2017-12-10

Title: Data Glitches = Constraint Violations – Empirical Explanations

Speaker: Divesh Srivastava, ACM Fellow, Head of Database Research, AT&T Labs-Research

Host: Professor Xuemin Lin

Time: December 16, 2017, 09:50-10:40

Venue: Room A510, Science Building, Zhongbei Campus, East China Normal University

Abstract:

Data glitches are unusual observations that do not conform to data quality expectations, be they semantic or syntactic, logical or statistical. Naively applying integrity constraints could flag potentially large amounts of data as violations, and ignoring or repairing significant amounts of the data could fundamentally bias the results and conclusions drawn from analyses. In the context of Big Data, where large volumes and varieties of data from disparate sources are integrated, it is likely that significant portions of these violations are actually legitimate, usable data. We conjecture that empirical glitch explanations (concise characterizations of subsets of violating data) could be used to (a) identify legitimate data and release them back into the pool of clean data, thereby reducing cleaning-related statistical distortion of the data; and (b) refine existing integrity constraints and generate improved domain knowledge. We present a few real-world case studies in support of our conjecture, outline scalable techniques to address the challenges of discovering explanations, and demonstrate the utility of the explanations in reclaiming over 99% of the violating data.
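The abstract's core idea, that a concise predicate over violating rows can justify releasing them back into the clean pool, can be illustrated with a deliberately small Python sketch. It is not the speaker's method and does not reproduce the scalable techniques mentioned in the talk: the toy call records, the `duration > 0` constraint, the restriction of an "explanation" to a single attribute=value predicate, and the helper names `find_violations`, `candidate_explanations` and `reclaim` are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

# Hypothetical toy call records; "duration" is in seconds.
rows = [
    {"id": 1, "status": "answered", "region": "east", "duration": 120},
    {"id": 2, "status": "answered", "region": "west", "duration": 45},
    {"id": 3, "status": "missed", "region": "east", "duration": 0},
    {"id": 4, "status": "missed", "region": "west", "duration": 0},
    {"id": 5, "status": "missed", "region": "east", "duration": 0},
    {"id": 6, "status": "answered", "region": "east", "duration": 0},
    {"id": 7, "status": "answered", "region": "west", "duration": 300},
]


def find_violations(rows: list[dict], constraint: Callable[[dict], bool]) -> list[dict]:
    """Return the rows that a (naively applied) integrity constraint flags."""
    return [r for r in rows if not constraint(r)]


def candidate_explanations(violations, rows, attrs, min_coverage=0.5):
    """Rank single attribute=value predicates as candidate explanations.

    coverage  = fraction of violating rows the predicate matches
    precision = fraction of all rows matching the predicate that are violations
    """
    if not violations:
        return []
    total = Counter((a, r[a]) for r in rows for a in attrs)
    viol = Counter((a, r[a]) for r in violations for a in attrs)
    ranked = []
    for (a, v), n_viol in viol.items():
        coverage = n_viol / len(violations)
        precision = n_viol / total[(a, v)]
        if coverage >= min_coverage:
            ranked.append((a, v, coverage, precision))
    # Prefer predicates that explain more violations, then purer ones.
    ranked.sort(key=lambda t: (t[2], t[3]), reverse=True)
    return ranked


def reclaim(violations, attr, value):
    """Split violations into rows released by an accepted explanation and the rest."""
    released = [r for r in violations if r[attr] == value]
    remaining = [r for r in violations if r[attr] != value]
    return released, remaining


violations = find_violations(rows, lambda r: r["duration"] > 0)
ranked = candidate_explanations(violations, rows, ["status", "region"])
attr, value, coverage, precision = ranked[0]
# Assumption: a domain expert accepts "missed calls have zero duration" as legitimate.
released, remaining = reclaim(violations, attr, value)
print(attr, value, coverage, precision)  # status missed 0.75 1.0
print([r["id"] for r in released])  # [3, 4, 5]
print([r["id"] for r in remaining])  # [6]
```

Here the naive constraint flags four of the seven records. The top-ranked explanation, `status == "missed"`, covers three of them and matches no clean rows, so if an analyst judges zero-duration missed calls to be legitimate, those rows return to the clean pool and only record 6 stays flagged. The same explanation also suggests a refined constraint, for example requiring `duration > 0` only for answered calls.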

Speaker Bio:

Divesh Srivastava is the head of Database Research at AT&T Labs-Research. He is a Fellow of the Association for Computing Machinery (ACM) and the managing editor of the Proceedings of the VLDB Endowment (PVLDB). His research interests and publications span a variety of topics in data management. He received his Ph.D. from the University of Wisconsin, Madison, USA, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India.

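School of Computer Science and Software Engineering, East China Normal University
www.sei.ecnu.edu.cn | Copyright School of Computer Science and Software Engineering
Dean's mailbox: yuanzhang@sei.ecnu.edu.cn | School office phone: 021-62232550 | Address: Science Building, 3663 North Zhongshan Road, Shanghai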