Spark in R Masterclass | Featuring sparklyr & shiny

  • Published 26 Aug 2024

Comments • 19

  • @user-xg4yq9uc9u · 1 year ago

    Correction at 49:51: it should be mutate(returns = (adjusted - lag) / lag), not mutate(returns = adjusted - lag / lag). Without the parentheses, R divides lag by itself before subtracting.
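A minimal base-R sketch of why the parentheses matter, using made-up prices (adjusted and lag_adj are hypothetical stand-ins for the video's adjusted column and its lagged copy, not the actual data). Division binds tighter than subtraction, so the unparenthesised form effectively computes adjusted - 1.

```r
# Made-up adjusted closing prices (hypothetical values, not the video's data)
adjusted <- c(100, 110, 99)

# Previous day's price, equivalent to what dplyr::lag(adjusted) returns
lag_adj <- c(NA, head(adjusted, -1))

# Without parentheses R parses this as adjusted - (lag_adj / lag_adj),
# i.e. adjusted - 1 wherever lag_adj is non-zero
wrong <- adjusted - lag_adj / lag_adj

# Simple return: change relative to the previous price
right <- (adjusted - lag_adj) / lag_adj

print(wrong)  # values: NA, 109, 98
print(right)  # values: NA, 0.1, -0.1
```

The first element is NA in both cases because there is no previous price for the first day.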

  • @ousmanesybodian5329 · 2 years ago · +1

    Thanks for sharing, Matt!

  • @siriyakcr · 2 years ago · +1

    Super cool, thanks 😁😁😁😁

  • @brendenmorley2643 · 2 years ago · +1

    Thank you, I will be using this. Quick question: around 46:00 you pull in your data set and then push it over to Spark. Can you load it directly into your local Spark cluster without pulling it into your R session first?

    • @BusinessScience · 2 years ago

      You can; there is no need to pull anything into R. For the tutorial, though, doing so sped things up, since running Zoom slows my Mac down considerably.
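A hedged sketch of reading a file straight into Spark, so R only ever holds a reference to the table rather than the data itself. It assumes the sparklyr package and a local Spark installation (for example via sparklyr::spark_install()); the helper names connect_local_spark and read_csv_into_spark, the "8G" default, the table name and any path you pass are hypothetical, not taken from the video.

```r
# Sketch: load a large local CSV directly into Spark without reading it into R.
# Assumes sparklyr is installed and a local Spark is available.
# Helper names, the memory default and the table name are hypothetical.

connect_local_spark <- function(driver_memory = "8G") {
  conf <- sparklyr::spark_config()
  # In local mode the driver JVM does all the work, so give it more memory
  conf$`sparklyr.shell.driver-memory` <- driver_memory
  sparklyr::spark_connect(master = "local", config = conf)
}

read_csv_into_spark <- function(sc, path, name = "nasdaq") {
  # Spark reads the file itself; R only receives a reference to the table
  sparklyr::spark_read_csv(
    sc,
    name   = name,
    path   = path,
    header = TRUE,
    memory = FALSE  # don't cache the whole table in Spark memory up front
  )
}
```

Usage would look like sc <- connect_local_spark() followed by nasdaq_tbl <- read_csv_into_spark(sc, "path/to/file.csv"); the returned table can then be queried with dplyr verbs. Setting memory = FALSE trades slower repeated queries for not having to fit the whole file in memory at load time.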

  • @elchinmustafayev1826 · 1 year ago

    Hello, and thank you very much for the tutorial. I am struggling to connect to a large database on Snowflake (larger than my machine can handle), and my attempts so far have been unsuccessful. I am new to the topic; could you help?

  • @eugenepawan · 2 years ago · +1

    Superb

  • @samritpramanik2962 · 1 year ago

    How can I add a 20 GB CSV file stored on my local disk to this Spark connection for data analysis?

  • @vineetsansi · 1 year ago

    I was getting an error while uploading the Nasdaq data to the Spark connection:
    Warning: NAs introduced by coercion to integer range
    Warning: problem writing to connection
    Using version 3.30
    Number of rows: 24,197,442
    It worked with mtcars but not with the Nasdaq data I collected from Kaggle.

    • @vineetsansi · 1 year ago

      It kept running for 30 minutes and still didn't complete. Is this expected behavior for Spark?

    • @vineetsansi · 1 year ago

      I even let it run for 3 hours and it still didn't complete. Any suggestions?

    • @BusinessScience · 1 year ago

      No clue what you are doing. I suspect there is an issue with your setup.

    • @vineetsansi · 1 year ago

      @BusinessScience I think RAM was the problem: when I reduced the data from 20+ million rows to 1-2 million, it worked.
      But now I notice slightly different results when I summarise the data in plain R versus with sparklyr.
      I think R is generating Inf and NaN but the sparklyr code is not. Code I tried:
      library(dplyr)

      nasdaq_data_tbl %>%
        mutate(returns = (Close - Open) / Open * 100) %>%
        group_by(ticker) %>%
        summarise(
          mean      = round(mean(returns, na.rm = TRUE), 2),
          sd        = round(sd(returns, na.rm = TRUE), 2),
          count     = n(),
          last_date = max(Date, na.rm = TRUE),
          .groups   = "drop"
        ) %>%
        arrange(desc(mean))
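A likely source of the difference, offered as a hedged guess rather than a confirmed diagnosis: when Open is 0, (Close - Open) / Open is Inf in R (or NaN for 0/0), and na.rm = TRUE drops NaN but not Inf, so mean() returns Inf and sd() returns NaN. Spark SQL in its default non-ANSI mode (Spark 3.x) returns NULL for division by zero, and the aggregates then skip those NULLs. The base-R sketch below uses a hypothetical toy data frame (prices) to reproduce the R side and shows one way to align the two results: turning non-finite returns into NA before summarising.

```r
# Hypothetical toy prices: ticker "B" has one day with Open == 0
prices <- data.frame(
  ticker = c("A", "A", "B", "B"),
  Open   = c(10, 20, 0, 5),
  Close  = c(11, 18, 2, 5)
)

# Same formula as in the comment; the Open == 0 row becomes Inf
prices$returns <- (prices$Close - prices$Open) / prices$Open * 100

# Per-ticker summaries; na.rm = TRUE removes NA/NaN but keeps Inf
raw_mean <- sapply(split(prices$returns, prices$ticker), mean, na.rm = TRUE)
raw_sd   <- sapply(split(prices$returns, prices$ticker), sd,   na.rm = TRUE)
# raw_mean[["B"]] is Inf and raw_sd[["B"]] is NaN

# Guard: treat non-finite returns as missing, roughly mimicking Spark's NULL
prices$returns_clean <- ifelse(is.finite(prices$returns), prices$returns, NA)
clean_mean <- sapply(split(prices$returns_clean, prices$ticker), mean, na.rm = TRUE)
# clean_mean is 0 for both tickers
```

Which behavior is "right" depends on the analysis; dropping zero-Open rows explicitly with filter(Open > 0) before computing returns would make both engines agree without relying on either one's division-by-zero rules.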

    • @BusinessScience · 1 year ago · +1

      @vineetsansi Yes, RAM could've been the problem. It sounds like you are on the right track.

  • @eugenepawan · 2 years ago

    Could you please share the Nasdaq data file?

    • @BusinessScience · 2 years ago

      Hi Pawan, sorry, the data is part of Learning Labs PRO. You are welcome to join: university.business-science.io/p/learning-labs-pro