Great video. (1) The message about data quality applies to all data analysis, including the simplest forms of analysis. (2) It is not obvious that some of the examples, such as an AI chatbot producing incorrect answers, are solely due to bad data; they may stem from an inadequate model too. (3) There is a large literature in statistics and econometrics on how to work with contaminated data, given that's sometimes the best we can do. It will be interesting to see whether the AI/ML literature also develops methods for working with bad data when building these sophisticated LLMs, etc.