Dealing with nonlinear data: Polynomial regression and log transformations

  • Published 24 Nov 2024

Comments • 54

  • @toryreads
    @toryreads 8 months ago +18

    Consider this me asking nicely (BEGGING) for the non-linear regression/Bayesian video! :D Also, arm twist! Arm twist!

  • @derekcaramella8730
    @derekcaramella8730 4 months ago +2

    How have I not found this channel sooner? Amazing stuff; binge-watching this channel.

  • @AurelODJO
    @AurelODJO 8 months ago +1

    Hello, I am a statistician. I live in Africa and really appreciate these lessons. So fun, thanks to the teacher.

  • @jackelsey7656
    @jackelsey7656 8 months ago +4

    Here's another vote for a nonlinear regression analysis video. That approach made sense for my dissertation research (inverse problems with mechanistic time-series models), and I'm curious what your perspective is. It seems to me like weighted least squares can work well in many heteroscedastic contexts if you assume the residuals are independent and have a constant coefficient of variation (CoV).

    • @galenseilis5971
      @galenseilis5971 8 months ago

      I agree, there are some heteroskedastic processes with parameters that can be estimated with weighted least squares.
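
      A minimal R sketch of the constant-CoV weighting idea from this thread (simulated data; all names and values are assumptions, not from the video):

        # Constant coefficient of variation: sd proportional to the mean,
        # so weight each observation by the inverse of its squared fitted mean.
        set.seed(1)
        x <- runif(200, 1, 10)
        mu <- 2 + 3 * x
        y <- rnorm(200, mean = mu, sd = 0.2 * mu)
        fit_ols <- lm(y ~ x)
        fit_wls <- lm(y ~ x, weights = 1 / fitted(fit_ols)^2)
        summary(fit_wls)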

  • @galenseilis5971
    @galenseilis5971 8 months ago +1

    I look at non-linear regression and Bayesian regression as logically independent classes of models. They can both involve working from more fundamental principles rather than just grabbing a recipe off the shelf, which I think is a valuable skill for a statistician to have.

  • @dominicl6712
    @dominicl6712 8 months ago +3

    Well done! Looking forward to the GLM video. I still did not fully understand the link functions there.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      FWIW the Wikipedia page on generalized linear models discusses the role of the link function explicitly.

    • @QuantPsych
      @QuantPsych  8 months ago

      I've already made a video on GLMs: th-cam.com/video/SqN-qlQOM5A/w-d-xo.html

  • @galenseilis5971
    @galenseilis5971 8 months ago

    Plotting the residuals can be very beneficial for learning about the performance of a predictive model. There is a common pitfall worth mentioning though. The distribution of the residuals is not in general the likelihood distribution.
    Take for example the equation
    Y = X + epsilon
    where
    Y ~ Poisson(lambda)
    X ~ Poisson(mu)
    and
    epsilon ~ Poisson(tau).
    If you compute the residuals you will obtain a Skellam random variable rather than a Poisson random variable.
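
    To illustrate the general point that raw residuals need not follow the likelihood distribution, here is a small simulation sketch (the setup is assumed, not the exact example above):

      # Poisson likelihood, but the raw residuals are centred near zero and
      # take negative, non-integer values, so they are clearly not Poisson.
      set.seed(1)
      x <- rpois(1000, lambda = 4)
      y <- rpois(1000, lambda = exp(0.2 + 0.1 * x))
      fit <- glm(y ~ x, family = poisson)
      res <- y - fitted(fit)
      summary(res)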

  • @Lello991
    @Lello991 8 months ago +3

    Hey Dustin! Speaking of non-linear data, what about a video on Generalized Additive (Mixed) Models? GA(M)Ms?! I'm sure it'd be sooooo useful for many of us!!!

    • @galenseilis5971
      @galenseilis5971 8 months ago

      Data itself is almost never linear, although in special cases it can be. For a relation (a subset of a Cartesian product of two sets) to be linear it must be a function (i.e. left-total and right-unique) satisfying homogeneity of degree one and additivity. The only data sets I have encountered that were linear were synthetic examples.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      I've used GAMs with time series data, and I can readily see GAMMs being similarly useful for hierarchical time series.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      I take it back. I don't think any finite sample can be linear since for any maximum point there will exist a scalar multiple of it in the real numbers that is not in the data set.

    • @jackelsey7656
      @jackelsey7656 8 months ago

      Bottom of the Heap has a nice video on it.
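
      For anyone curious, a minimal GAM sketch using the mgcv package (simulated data; names are illustrative, not from the video):

        library(mgcv)
        set.seed(1)
        x <- runif(300, 0, 10)
        y <- sin(x) + rnorm(300, sd = 0.3)
        fit <- gam(y ~ s(x))   # s() requests a penalized smooth term
        summary(fit)
        plot(fit)              # visualizes the estimated smooth of x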

  • @i9iveup
    @i9iveup 8 months ago +1

    I would recommend fractional polynomial models, which identify the best transformations of the covariates, with the obvious risks of overfitting and ambiguity in interpreting the coefficients.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      Oh neat, I didn't know that approach by name. I agree that overfitting is the largest risk with fractional polynomial models since they're a natural superset of polynomial models.
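
      A hand-rolled sketch of the first-degree fractional-polynomial idea, using the conventional power set with 0 standing in for log (data and names are illustrative assumptions):

        set.seed(1)
        x <- runif(200, 0.5, 5)
        y <- 3 * sqrt(x) + rnorm(200, sd = 0.3)
        powers <- c(-2, -1, -0.5, 0, 0.5, 1, 2, 3)
        aics <- sapply(powers, function(p) {
          xp <- if (p == 0) log(x) else x^p
          AIC(lm(y ~ xp))
        })
        powers[which.min(aics)]   # selects the power with the best (lowest) AIC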

  • @galenseilis5971
    @galenseilis5971 8 months ago +1

    I have noticed the word "line" in "linear", but unfortunately the terminology is more complicated than Dustin presented. I'll give a couple of reasons:
    The first is that they are not synonyms in mathematics. All lines are linear, but not all linear functions are lines. For example, the derivative operator is linear on the space of analytic functions, but it is not a line per se.
    The second is that statisticians were focused on the parameters when they coined the term "linear model". Conventionally, "linear model" refers to a regression model whose conditional expectation is linear in the unknown parameters. This makes both the polynomial regression and the log-transformed regression model in the video special cases of linear models.
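
    A short R sketch of that last point: both the polynomial and the log-transformed model are fit with lm(), i.e. they are linear in their coefficients (simulated data; names are assumptions, not from the video):

      set.seed(1)
      dat <- data.frame(x = runif(200, 0.5, 5))
      dat$y <- exp(0.3 + 0.4 * dat$x + rnorm(200, sd = 0.2))
      fit_poly <- lm(y ~ x + I(x^2), data = dat)   # polynomial terms, still linear in b0, b1, b2
      fit_log  <- lm(log(y) ~ x, data = dat)       # log-transformed outcome, linear in b0, b1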

  • @StatisticsSupreme
    @StatisticsSupreme 8 months ago +2

    On the log transform, you say the estimate b is now on a log scale, yes. But that is not a problem for interpretation when you transform it back to where it came from: exponentiate that value and you can interpret it as usual. So there is no real "cost" there. But overall nice video, as always :D

    • @galenseilis5971
      @galenseilis5971 8 months ago +1

      I agree.

    • @galenseilis5971
      @galenseilis5971 8 months ago +1

      Also, since the logarithm is monotonic we can readily anticipate the direction of change in the conditional expectation when we consider a change in one of the predictors.

    • @galenseilis5971
      @galenseilis5971 8 months ago +1

      Supposing for example the conditional expectation
      E[Y|X=x] = exp(m * x + b)
      then it is straightforward to take the derivative with respect to x via the chain rule of calculus:
      dE[Y|X=x]/dx = m * exp(m * x + b)
      Thus we can calculate how much Y is changing on average with respect to a change in x by knowing m, x, and b.

    • @QuantPsych
      @QuantPsych  8 months ago +1

      Except that it's not a constant change anymore, meaning we can't say "for every change in our predictor, there is an x-point change in y."

    • @galenseilis5971
      @galenseilis5971 8 months ago

      @@QuantPsych That's true.
      Since not everything is linear (indeed, most things aren't), it is wise not to recoil from introducing non-linearity into models when it is warranted.
      Reading Kit Yates' "How to Expect the Unexpected", I came across the term "linearity bias". It is informally defined as a cognitive bias toward assuming that changes are linear. One concern I have about only (or predominantly) teaching models that are linear in the predictors is that it may inculcate or reinforce linearity bias in students. But I'm not read up enough on the psychology and education literature to say whether that concern has been addressed; it's just a concern for now.
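
      A small R sketch tying together the back-transformation and the non-constant change discussed in this thread (simulated data; all names and values are assumptions, not from the video):

        set.seed(1)
        x <- runif(200, 0, 5)
        y <- exp(0.3 + 0.4 * x + rnorm(200, sd = 0.2))
        fit <- lm(log(y) ~ x)
        exp(coef(fit)["x"])            # multiplicative change in y per one-unit increase in x
        m <- coef(fit)["x"]; b <- coef(fit)["(Intercept)"]
        m * exp(m * c(1, 2, 3) + b)    # dE[Y|X=x]/dx evaluated at x = 1, 2, 3: it differs with x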

  • @galenseilis5971
    @galenseilis5971 8 months ago

    It has been a long while since I have really thought about semi-partial correlation coefficients. But if memory serves, the semi-partial correlation is not in general equal to the conditional correlation coefficient, except under certain families of distributions. A sufficient condition for the partial correlation to equal the conditional correlation is that the joint distribution belongs to an exponential parametric family of distributions.

  • @Alias.Nicht.Verfügbar
    @Alias.Nicht.Verfügbar 1 month ago +1

    thanks!

  • @TheMrSodo91
    @TheMrSodo91 8 months ago

    Thanks for your work as always. I am approaching Bayesian statistics, so it would be great to see you go into Bayesian regression. Please please please!

    • @QuantPsych
      @QuantPsych  8 months ago +1

      It's on my to-do list :)

  • @adrianor.397
    @adrianor.397 8 months ago +1

    What package has the visualize() function? Great explanations, as usual!

    • @QuantPsych
      @QuantPsych  8 months ago +1

      flexplot

  • @1982757
    @1982757 8 months ago +1

    How do you interpret the coefficients of the polynomial model?

    • @galenseilis5971
      @galenseilis5971 8 months ago

      Hint: The conditional expectation of the predicted variable on a predictor will be monotonic in the coefficients under mild assumptions.

    • @QuantPsych
      @QuantPsych  8 months ago +1

      It's not intuitive. It's the expected change in Y when the square of X increases by one unit. The only thing that's really intuitive is the sign (positive versus negative, indicating whether it's concave upward or downward, respectively). I usually don't bother interpreting it. I just look at the plot.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      @@QuantPsych In the case of a quadratic polynomial the sign of the leading coefficient tells us about concavity/convexity. This works because a twice-differentiable function of a single variable is convex if and only if its second derivative is nonnegative on its entire domain; a similar result holds for concavity. Polynomials are always twice-differentiable, but many polynomials are neither convex nor concave over their entire domain. The second derivative of a quadratic is constant, equal to twice the leading coefficient (and therefore of the same sign), which is why the inference is straightforward in this case. The leading term of higher-degree polynomials cannot reliably be used this way.
      For single-variable functions I'd give the same recommendation: just look at a plot. When you get into multivariable systems (which is typical of realistic systems) it is much more difficult to eyeball convexity/concavity. Trying to infer it visually from PCA plots or parallel-axis plots is unlikely to be reliable, for example. If you're lucky enough to have a function that is twice-differentiable in all its inputs, you can generalize the single-variable result: find the stationary points using the gradient, then use the Hessian to (1) determine which points are optima and (2) evaluate the signs of its eigenvalues at those points to assess convexity, concavity, or neither.

    • @1982757
      @1982757 8 months ago

      In other words, it depends...@@QuantPsych
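
      A small sketch of the quadratic case discussed in this thread (simulated data; names are assumptions, not from the video):

        set.seed(1)
        x <- runif(200, -3, 3)
        y <- 1 + 2 * x - 0.5 * x^2 + rnorm(200)
        fit <- lm(y ~ x + I(x^2))
        coef(fit)["I(x^2)"]   # the leading coefficient comes out negative here
        # the fitted curve's second derivative is 2 * coef(fit)["I(x^2)"], so its sign
        # tells you whether the parabola opens downward (concave) or upward (convex)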

  • @igoryakovenko1343
    @igoryakovenko1343 8 months ago

    Great video! Any chance you'd be open to sharing a link to the dataset used, so we can re-create the exercise and try it ourselves? Thank you!

    • @galenseilis5971
      @galenseilis5971 8 months ago

      If the data are not available, you can readily simulate data suitable for these cases if you just need them for practice.

    • @QuantPsych
      @QuantPsych  8 months ago

      Most of my datasets are here:
      quantpsych.net/data/
      That particular one is called depression_wide

    • @igoryakovenko1343
      @igoryakovenko1343 8 months ago

      Thank you for directing me to the link. Am I completely blind, or does the depression_wide set not contain any of the variables in the video (i.e., cancer-related or rizz)? Sorry if I'm missing it somewhere. @@QuantPsych
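
      On the earlier suggestion to simulate practice data, a quick sketch covering both cases in the video (made-up variable names; not the video's dataset):

        set.seed(1)
        n <- 300
        x <- runif(n, 0, 10)
        poly_dat <- data.frame(x = x, y = 5 + 2 * x - 0.3 * x^2 + rnorm(n, sd = 2))  # curved relationship
        log_dat  <- data.frame(x = x, y = exp(0.5 + 0.2 * x + rnorm(n, sd = 0.3)))   # multiplicative relationship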

  • @galenseilis5971
    @galenseilis5971 8 months ago

    The biggest limitation of polynomial regression is overfitting. Via the Stone-Weierstrass theorem, a polynomial with enough terms can approximate any continuous function on a closed interval as closely as we like. In fact, many functions (including the exponential function; wink wink) have a Taylor series, which is basically a polynomial with an infinite number of terms.
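
    A short sketch of that overfitting risk (simulated data; the train/test split and the degrees are arbitrary choices, not from the video):

      set.seed(1)
      n <- 60
      x <- runif(n, 0, 1)
      y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
      train <- 1:40; test <- 41:60
      for (d in c(1, 3, 9, 15)) {
        fit <- lm(y ~ poly(x, d), data = data.frame(x = x[train], y = y[train]))
        pred <- predict(fit, newdata = data.frame(x = x[test]))
        cat("degree", d, "test MSE:", round(mean((y[test] - pred)^2), 3), "\n")
      }
      # higher-degree fits track the training data ever more closely,
      # but the out-of-sample error typically stops improving and then degrades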

  • @1997aaditya
    @1997aaditya 6 months ago

    Why don't you use poly(var_name, n) instead, for orthogonal polynomials?

    • @QuantPsych
      @QuantPsych  5 months ago

      Because I can never remember how to do that.
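
      For reference, a small sketch showing that the raw and orthogonal parameterizations give the same fitted curve (simulated data; names are assumptions):

        set.seed(1)
        x <- runif(100, 0, 4)
        y <- 1 + 0.5 * x + 0.25 * x^2 + rnorm(100)
        fit_raw  <- lm(y ~ x + I(x^2))    # raw polynomial terms
        fit_orth <- lm(y ~ poly(x, 2))    # orthogonal polynomial terms
        all.equal(fitted(fit_raw), fitted(fit_orth))   # TRUE: same predictions, different coefficients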

  • @realdragon
    @realdragon 1 month ago

    Still don't know how to actually calculate polynomial regression

    • @QuantPsych
      @QuantPsych  1 month ago

      3:38
      See line 25 of my R code
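
      For the curious, a sketch of what the calculation amounts to, namely ordinary least squares on a design matrix that includes the polynomial columns (simulated data; not the video's code):

        set.seed(1)
        x <- runif(50, 0, 3)
        y <- 2 + x - 0.4 * x^2 + rnorm(50, sd = 0.5)
        X <- cbind(1, x, x^2)                   # intercept, linear, quadratic columns
        beta <- solve(t(X) %*% X, t(X) %*% y)   # least-squares normal equations
        cbind(ols = drop(beta), lm = coef(lm(y ~ x + I(x^2))))   # same estimates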

  • @galenseilis5971
    @galenseilis5971 8 months ago

    The phrasing "y=x^2 gives a polygon" must have been a brain fart. It happens. Polygons and polynomials are distinct mathematical concepts.

    • @QuantPsych
      @QuantPsych  8 months ago

      Yes, the brain did indeed fart.

  • @bobbrian1641
    @bobbrian1641 8 months ago +1

    You ARE hot. And entertaining. Great video. I will have to learn this stuff eventually... Subscribed.

    • @QuantPsych
      @QuantPsych  8 months ago +1

      Ha! Flattered again :)