John Rauser keynote: "Statistics Without the Agonizing Pain" -- Strata + Hadoop 2014

  • Published on Jul 30, 2024
  • From the 2014 Strata Conference + Hadoop World in New York City.
    There are two essential skills for the data scientist: engineering and statistics. A great many data scientists are very strong engineers but feel like impostors when it comes to statistics. In this talk John will argue that the ability to program a computer gives you special access to the deepest and most fundamental ideas in statistics. John’s goal is to convince the non-statistician engineers in the audience that the road to statistical fluency is much, much shorter than they think.
    About John Rauser:
    John has been extracting value from large datasets for over 20 years at hedge funds, small data-driven startups, Amazon, and now Pinterest. He has deep experience in machine learning, data visualization, on-line experimentation, website performance and real-time fault analysis. An empiricist at heart, “Just do the experiment!” is his favorite call to arms.
    Watch more from Strata + Hadoop World 2014: goo.gl/UUfrR7
    Find out more about the conference: strataconf.com/stratany2014
  • Science & Technology

Comments • 18

  • @harmagician1 9 years ago +7

    I would add that permutation testing (resampling without replacement) is more suited to smaller samples, and bootstrapping (resampling with replacement) is suitable for larger samples. Ronald Fisher used permutation-testing logic to support his argument for the t-test. He urged people to use the t-test because no devices available back then could generate the resamples needed to run a permutation test. (Collingridge, D.S. (2012). A primer on quantitized data analysis and permutation testing. Journal of Mixed Methods Research)
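
To make the distinction in the comment above concrete, here is a minimal sketch in Python (standard library only) of both resampling schemes applied to a difference in means. The data, the function names `permutation_p_value` and `bootstrap_ci`, and the choice of 10,000 resamples are illustrative assumptions, not taken from the talk or from the cited paper.

```python
import random
from statistics import mean

# Made-up measurements for two groups (illustrative only).
group_a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 16.0, 14.4]
group_b = [11.0, 12.7, 12.2, 13.1, 11.8, 12.9, 13.4, 12.0]


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(a) > mean(b).

    Shuffling the pooled data reassigns group labels, which is
    resampling WITHOUT replacement: every value is used exactly once.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        # Small tolerance so float rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_resamples


def bootstrap_ci(a, b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Each group is resampled WITH replacement, so a value can appear
    several times (or not at all) in a given resample.
    """
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


observed_diff = mean(group_a) - mean(group_b)
p = permutation_p_value(group_a, group_b)
lo, hi = bootstrap_ci(group_a, group_b)
print(f"observed difference: {observed_diff:.3f}")
print(f"permutation p-value (one-sided): {p:.4f}")
print(f"95% bootstrap interval: ({lo:.3f}, {hi:.3f})")
```

The permutation test answers "how often would label shuffling alone produce a gap this large?", while the bootstrap estimates how much the gap itself would vary from sample to sample.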

  • @jmrbear 9 years ago +18

    Bill Venables, to clarify: I think mathematical statistics is beautiful and useful; it's just a terrible way to introduce statistical thinking (given modern computational options). I had only 10 minutes, and so the talk had to stay totally on the rails; otherwise I'd have expanded on the origins of the analytical approach, why it was invented, and why it is still useful. For more on this topic see George Cobb's lovely paper, The Introductory Statistics Course: A Ptolemaic Curriculum: escholarship.org/uc/item/6hb3k0nz.

    • @WNVenables 9 years ago

      You got their attention and raised the profile of statistics with an audience that probably needs to know much more about it than they currently do. That's an achievement I applaud. Many, if not most, modern approaches to statistics would use your route, actually. You do see my point, though, I hope. You can't really let it stop there. You need to understand what is going on as well as to appreciate it from a computational demonstration. And let's face it, it really isn't all that tough to do so once you know where you are going.
      My point about Bayesian approaches is that even to get to first base you need the concept of likelihood, which, unless you have a theoretical understanding, really remains inaccessible.

    • @smarksc47 8 years ago

      Cobb's paper is excellent. It goes well with your keynote. It should be noted, though, that many of the problems we have in statistics education are not unique to statistics, and can be found in many disciplines, courses and textbooks across academia. I believe this will change with time, but as we all know, change happens very slowly in academia.

  • @DaveJacoby 9 years ago +5

    I've been saying for years that, when I was in CS, they changed the curriculum to make Statistics an elective, and I felt at the time I was dodging a bullet, but now I feel I shot myself in the foot. This video makes me think I'm more in the dodge category again.

    • @MarkSenn 9 years ago +2

      From your tagline at plus.google.com/+DaveJacoby/about:
      "If you can not measure it, you can not improve it." Statistics is a tool to, among other things, interpret what measurements really mean.

    • @DaveJacoby 9 years ago +1

      Yeah, and his point is that there's a bunch of complexity in traditional statistics that we can sidestep and work with in ways we as programmers can more easily understand.

    • @diewithyourboots0n 9 years ago +1

      I had stats my senior year in undergrad, and it was awful. I cried, literally. But you are a coder. I think you would do well in it.

    • @DaveJacoby 9 years ago

      I should try it.

  • @madhousetoobah 9 years ago +1

    I love this talk - spot on. Indeed, a solid understanding of the real stats behind things is highly desirable, but in terms of getting a better sense of the problem under consideration, this is a great approach for those with decent programming skills.

    • @madhousetoobah 9 years ago

      Here is a simple example of me following the approach recently - bit.ly/1wCcAnJ

  • @geocarvalhont 2 years ago

    Absurd. I laughed my ass off, and when it ended I thought the talk had only lasted 3 minutes.

  • @kelyr5368 8 years ago

    Hi all. Can anyone tell me the name of the girl's electronics tinkering toy at 11:17?

  • @autophile525i 5 years ago +1

    How do you get the mosquitoes to drink the beer?

  • @giusbe8792 2 years ago +1

    Well, hell

  • @danieltorridoverde528 9 years ago

    There is something that is not clearly said here. You don't need to remember the formula for the t-test; that's a use for a computer. What is really important is the concept of the density function and the distribution function. In ten minutes or so one can explain those concepts, and they will be applicable to many types of tests. Understanding the gist of what a statistical test is and how to interpret the outcome is not difficult, and you don't have to remember difficult formulas. That is why what you see here is a straw man argument. If you only want to know the essence of a statistical test, you can grasp it easily, and one way to understand it is that formulas are like simulations. Ask any statistician whether you really need to use the formula that he shows here, and the answer is no way: you just use your computer. But when you know the fundamental concepts of density function, distribution function and sample mean, everything is clearer. Anyway, I understand that simulation is a good way of understanding problems and an easy way of testing difficult problems, but the essential concepts involved in a test for the mean are in no way difficult.
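
One way to read the "formulas are like simulations" remark above: a distribution function is the closed-form answer to a question that could also be answered by brute-force sampling. The following is a minimal sketch in Python, assuming a standard normal distribution and a cutoff of 1.5 chosen purely for illustration; `simulated_cdf` is a hypothetical helper, while `statistics.NormalDist` is part of the standard library (Python 3.8+).

```python
from statistics import NormalDist


def simulated_cdf(dist, x, draws=100_000, seed=0):
    """Estimate P(X <= x) by drawing samples and counting.

    This is the simulation counterpart of the distribution function:
    the fraction of random draws that fall at or below x.
    """
    samples = dist.samples(draws, seed=seed)
    return sum(s <= x for s in samples) / draws


dist = NormalDist(mu=0.0, sigma=1.0)  # assumed distribution
x = 1.5                               # assumed cutoff

exact = dist.cdf(x)                   # distribution function (formula)
approx = simulated_cdf(dist, x)       # same quantity by simulation
density = dist.pdf(x)                 # density function at x

print(f"P(X <= {x}) from the formula:    {exact:.4f}")
print(f"P(X <= {x}) from the simulation: {approx:.4f}")
print(f"density at {x}:                  {density:.4f}")
```

With enough draws the two probabilities agree to a couple of decimal places, which suggests the two approaches are describing the same object rather than competing with each other.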

  • @chrisstehlik7927 5 years ago +2

    Instead of consulting Wikipedia, you could just spend an hour learning about t-tests at Khan Academy.

  • @marcosantonioeuzebiodeoliv8515 2 years ago

    At 3:12 I tried to google plusone and found a sex shop...
    That must be why I'm bad at statistics: I don't know how to google.