What is AWS EMR | Extract and Transform Redfin data with AWS EMR | EMR Studio | Pyspark Notebook

  • Published on 24 Oct 2023
  • #dataengineering #emr #spark #pyspark #jupyterlab #jupyternotebook #aws #emrstudio #etlpipeline #redfin
    In this video, I explained what Amazon EMR (Elastic MapReduce) is and how it helps with processing big data. I then showed how to create a VPC and spin up EMR clusters within it. Next, I showed how to create an Amazon EMR Studio and JupyterLab, and attached the Jupyter notebook to the provisioned cluster. Finally, I showed how to write PySpark code in that notebook to extract data from the Redfin data source, process it, and load the transformed data as a Parquet file into an S3 bucket.
    Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.
    *Books I recommend*
    1. Grit: The Power of Passion and Perseverance amzn.to/3EZKSgb
    2. Think and Grow Rich!: The Original Version, Restored and Revised: amzn.to/3Q2K68s
    3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing: amzn.to/3LLpXRy
    4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started: amzn.to/48RbuOb
    5. Introducing Python: Modern Computing in Simple Packages amzn.to/3Q4driR
    6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition: amzn.to/3rGF73G
    **************** Commands used in this video ****************
    Check out my GitHub repo
    github.com/YemiOla/data_engin...
    **************** USEFUL LINKS ****************
    1. Redfin Analytics|python ETL pipeline with airflow|Data Engineering Project|Snowpipe|Snowflake|Part 1 • Redfin Analytics|pytho...
    2. Redfin Analytics|python ETL pipeline with airflow|Data Engineering Project|Snowpipe|Snowflake|Part 2 • Redfin Analytics|pytho...
    3. Zillow Data Analytics (RapidAPI) | End-To-End Python ETL Pipeline | Data Engineering Project |Part 1 • Zillow Data Analytics ...
    4. www.redfin.com/news/data-center/
    5. docs.aws.amazon.com/emr/lates...
    6. PostgreSQL Playlist: • Tutorial 1 - What is D...
    7. Apache Airflow Playlist • How to build and autom...
    DISCLAIMER: This video and description contain affiliate links. When you buy through one of these links, we receive a small commission at no cost to you. This helps support us so we can continue making awesome and valuable content for you.
  • Science & Technology
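
The extract, transform, and load flow outlined in the description can be sketched in PySpark roughly as follows. This is a minimal sketch, not the code from the video: the S3 paths, the column names, and the helper names `build_select_exprs` and `run_etl` are assumptions made for illustration, and it assumes the Redfin download is a tab-separated file with a header row. It also assumes the `spark` session that a PySpark notebook kernel attached to an EMR cluster normally provides.

```python
# Hypothetical locations: replace them with your own source file and bucket.
SOURCE_PATH = "s3://my-example-bucket/raw/redfin_city_market_tracker.tsv.gz"
TARGET_PATH = "s3://my-example-bucket/transformed/redfin_parquet/"

# Assumed column names: check them against the header of the file you use.
KEEP_COLUMNS = [
    "period_begin",
    "period_end",
    "city",
    "state",
    "property_type",
    "median_sale_price",
    "homes_sold",
    "inventory",
]


def build_select_exprs(columns):
    """Return SQL expressions that keep `columns` and derive the year and
    month of `period_end` as two extra columns."""
    exprs = list(columns)
    exprs.append("year(to_date(period_end)) AS period_end_year")
    exprs.append("month(to_date(period_end)) AS period_end_month")
    return exprs


def run_etl(spark, source_path=SOURCE_PATH, target_path=TARGET_PATH):
    """Read the raw file, keep a subset of columns, and write Parquet."""
    # Extract: read the tab-separated file, using the first row as the header.
    raw = spark.read.csv(source_path, sep="\t", header=True, inferSchema=True)

    # Transform: keep the chosen columns, add year/month columns, and drop
    # every row that has a null in any of the remaining columns.
    cleaned = raw.selectExpr(*build_select_exprs(KEEP_COLUMNS)).dropna()

    # Load: write the result as Parquet, replacing any earlier output.
    cleaned.write.mode("overwrite").parquet(target_path)
    return cleaned
```

In a notebook attached to the cluster, calling `run_etl(spark)` would run all three steps. Passing the session and both paths as arguments keeps the function easy to point at a smaller sample file while experimenting.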

Comments • 23

  • @jeremyyap3474 7 months ago +1

    Your videos really helped me a lot in learning about cloud computing platforms. Please make more tutorials like this!!

    • @tuplespectra 7 months ago

      Awesome! I'm glad our video is helping you learn about cloud computing platforms. I encourage you to explore my other data engineering project videos, where we used other technologies in addition to AWS.

  • @oyekanemmanuel5636 8 months ago +2

    Thanks for this, sir. I always look forward to your mentorship.

    • @tuplespectra 8 months ago +1

      My pleasure. Thank you!

  • @sayedsamimahamed5324 5 months ago

    Another masterpiece!!! ♥

    • @tuplespectra 5 months ago

      Thanks for your comment.

  • @sudippandit7051 several months ago +1

    Great sessions!

  • @Hellotesttest 8 months ago

    Good tutorial for my learning. Thank you.

    • @tuplespectra 8 months ago

      You are welcome! I'm glad it was helpful.

  • @maverick6111 7 months ago +1

    Hi tuplespectra, great to see this project; I learnt a lot. One suggestion from my side: please use a small dataset so that the cost on AWS is reduced.

    • @tuplespectra 6 months ago

      Thanks for your comment. Suggestion noted.

  • @bindubala9560 8 months ago

    Hello. I have been following all your videos and learning to build data pipelines as a beginner. Thanks; your videos are a blessing for beginners. I request that you do a video on how to run a job on Airflow that spins up an EMR cluster, does the ETL process, and terminates the cluster, as we do in companies. This would really help.

    • @oyekanemmanuel5636 8 months ago

      I agree. I wish he could hold a full end-to-end data warehousing course. I really love how well he breaks down the processes. There are not many facilitators like this on this platform.

    • @tuplespectra 8 months ago +1

      Thanks for your comment. Appreciate it! Your request is exactly what will be released in the next 1 or 2 weeks. Work on it is already under way. Thanks!

    • @tuplespectra 8 months ago +2

      This comment really means a lot to me. It shows that our content is valuable even to beginners. We hope to start creating courses next year, including data warehousing, as time permits. Thanks so much.

    • @oyekanemmanuel5636 8 months ago

      @tuplespectra I really can't wait, sir. Thanks so much for granting my request.

    • @avinash390 3 months ago

      What did this project cost you guys in AWS?

  • @koladearisekola2246 7 months ago

    Great video. I am really learning a lot from your channel. Can you please make a video using Apache Kafka?

    • @tuplespectra 7 months ago

      Thanks! I'm glad you are learning a lot from us.

  • @yinpingguo1184 5 months ago +1

    Hi Tuplespectra, great video! Can you let us know how much AWS charged for this demo project? I plan to follow your video and do one in my personal AWS account.

    • @avinash390 3 months ago

      Did you get the cost?

  • @narendrajatti 3 months ago

    How do you perform an update in PySpark?