My Favorite Books For Data Engineers - From Streaming To Software Engineering

  • Published on 5 Sep 2024
  • Data engineers have a lot of different resources they can learn from.
    One great example is books.
    There are tons of great data engineering books that you can read to learn about Spark, streaming, Redshift, Snowflake, and more.
    Here are the books from this video
    The Data Warehouse Toolkit
    aatinegar.com/...
    Learning PySpark
    amzn.to/31MQqun
    Transaction Processing
    amzn.to/3oGIm6W
    Streaming Systems
    amzn.to/3IHh7RF
    Data Pipelines Pocket Reference
    amzn.to/3oQHk8N
    If you enjoyed this video, check out some of my other top videos.
    What Skills Do Data Engineers Need?
    • What Skills Do Data En...
    Data Engineering Project Ideas
    • 5 Data Sources for You...
    If you want to learn more about machine learning, check out DataCamp's Machine Learning Course
    bit.ly/3BeLEml
    If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
    seattledataguy...
    Or check out my blog
    www.theseattle...
    Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
    _____________________________________________________________
    Subscribe: / @seattledataguy
    _____________________________________________________________
    About me:
    I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems, both solo and with a company called Acheron Analytics. I have experience both working hands-on with technical problems and helping leadership teams develop strategies to maximize their data.
    *I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

Comments • 56

  • @SeattleDataGuy
    @SeattleDataGuy 2 years ago +4

    If you want to keep up with data technologies and get advice on how to set up your data stack, then sign up for my newsletter - seattledataguy.substack.com/

  • @djmali
    @djmali 2 years ago +16

    This is a good list of material. One book I would recommend is "Creating a Data-Driven Organization" by Carl Anderson.

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +1

      I read a few blurbs from that book. I will need to dive in deeper!

  • @seoexperimentations6933
    @seoexperimentations6933 2 years ago +15

    Hey SDG, I've been learning Snowflake for the past two months and got certified last week.
    The 'Data Pipelines Pocket Reference' is an amazing book, even if it covers Redshift concepts,
    since it introduces the COPY command, which is used in all major data warehouses, and the final chapters on
    monitoring and scheduling cover the basics of Airflow too. All around my favorite data engineering book.
    For Databricks/Azure I would recommend the newest "Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way". This book is insanely good!
    It goes through an end-to-end real-life project using Delta Lake and explains everything really well.
    I'm sad you didn't include the data bible: "Designing Data-Intensive Applications". It's a staple too.

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +1

      Thanks for adding in all the books! Yeah, my desire to not be the same might have bitten me a bit for not including "Designing Data-Intensive Applications". But if we spam it enough in the comments, I think people will get that it's worth the buy or read too.

    • @JimRohn-u8c
      @JimRohn-u8c 2 years ago +2

      Are the books you recommended good if I’ll be working with Azure Synapse, Azure SQL Database, Azure Data Factory, and Azure Data Lake, but not Databricks?

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +1

      @JimRohn-u8c I don't have a book. But this guy's channel is amazing.
      th-cam.com/users/Azure4Everyonevideos

  • @alexanderpotts8425
    @alexanderpotts8425 2 years ago +2

    I have most of these but bought the Pipelines Pocket Reference straight away! That's a gem, thanks for the rec.

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +2

      Glad to hear my books aren't way out of whack!

  • @djsadsa2933
    @djsadsa2933 2 years ago +7

    May God bless you...

  • @itsallinyourhead3593
    @itsallinyourhead3593 2 years ago +11

    How about 'Designing Data-Intensive Applications' by Martin Kleppmann?

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +5

      I actually own this book too. I feel like everyone references it, so I was like..do I really say that one too hahahaha

    • @itsallinyourhead3593
      @itsallinyourhead3593 2 years ago +2

      makes sense then 😁

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +3

      @itsallinyourhead3593 Everyone loves that book

    • @danemarjanovic2415
      @danemarjanovic2415 2 years ago +2

      @SeattleDataGuy And I was like...why didn't he say one word about the DE Holy Bible...what an outrage! :)

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +3

      @danemarjanovic2415 It's so cliché...got to be a little different hahaha.

  • @danielhooverc
    @danielhooverc 2 years ago +5

    Hi bro, your videos are excellent. Why don't you make some tutorial videos on topics like data pipelines, data modeling, etc.? I believe they will be more popular.

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +3

      Hey Daniel! I do plan to. I have been juggling a full-time role at Facebook, consulting, and content creation. I do plan to do this though.

  • @otofori9802
    @otofori9802 2 years ago +1

    Thanks. I've been waiting for this!!!

  • @youtuber253
    @youtuber253 2 years ago +3

    You should make a video about the Snowflake-Databricks drama

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +2

      The keemstar of the tech world...(I almost wrote teamstar)

  • @preethievelynsadanandan5770
    @preethievelynsadanandan5770 1 year ago

    Great video! Very helpful book recommendations. Could you also make a video on the Snowflake vs Databricks drama? lol

  • @focusEngineered
    @focusEngineered 2 years ago +1

    Thanks so much.
    It's such a useful list.

  • @oresttokovenko
    @oresttokovenko 2 years ago +7

    Hey Ben, any news on the resume review video? I'm eager for you to review my resume

    • @user-nd7yg9gv9t
      @user-nd7yg9gv9t 2 years ago

      +1

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +4

      I did say I would..ok. I have the next few video ideas slotted out...but let me make a post about asking for resumes.

  • @pygeekrfoo820
    @pygeekrfoo820 2 years ago +5

    O'Reilly and Amazon reviews are not kind to Learning PySpark

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago

      I just saw that. I bought this book because of Denny Lee and thought it was fine. I should also check out the O'Reilly Spark book

  • @yinggamonkulsarapitak7948
    @yinggamonkulsarapitak7948 2 years ago +2

    Great vid! Thanks for sharing, bro.
    Could you suggest books for someone who is starting to build data pipelines and ETL for analytics products using AWS services? My company just started migrating its on-prem DWH to the cloud, so I need to ramp up my AWS skills from zero, which is really hard for someone transitioning from analyst to data engineer. :( Thanks!

  • @RamirezGold
    @RamirezGold 2 years ago +1

    Hey Benjamin, you're advocating The Data Warehouse Toolkit. While researching it, I found the blog post "Learn from Google’s Data Engineers: Dimensional Data Modeling is Dead" by Galen B, presumably from Google. His thesis is basically that the resources saved by dimensional data modeling have become marginal as modern technology has gotten cheaper, compared to the business value that engineers can generate elsewhere. How do you feel about this?

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +1

      Hahaha, I actually reference his article in one of my videos. It's a little more nuanced than just "don't learn data modeling". I think it will continue to provide value even if we change some of the steps.

  • @prasanthnarayanan
    @prasanthnarayanan 2 years ago +4

    Are there any good books (or other resources) that cover data observability and DataOps for modern data pipelines?

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +1

      You know, I haven't looked into a book on that specific subject

  • @JimRohn-u8c
    @JimRohn-u8c 2 years ago +2

    I'm an analyst and I wanna transition to data engineering, but I feel so overwhelmed with all the stuff I have to learn and all the books I need to read 😭

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +1

      Just start slowly. I put together a data engineering roadmap video to try to help people figure this out. Have you seen it?

    • @JimRohn-u8c
      @JimRohn-u8c 2 years ago +1

      @SeattleDataGuy I haven’t. I’ll have to find that video!

  • @ankittjindal
    @ankittjindal 3 months ago

    Please recommend some books. I only have a basic idea of Python and SQL, so which book is best for me as a beginner in the data engineering field?

  • @phuongdo8269
    @phuongdo8269 2 years ago +1

    Great video! I'm new to the field, so I wonder which cloud-based engine you would recommend learning?

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago

      It shouldn't matter too much. Both AWS and GCP should give you credits to start.

  • @Pharaoization
    @Pharaoization 2 years ago +3

    What's the Databricks drama about?

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +2

      Here is the drama databricks.com/blog/2021/11/15/snowflake-claims-similar-price-performance-to-databricks-but-not-so-fast.html

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +2

      I mean...this shade: "Snowflake’s response: “lacking integrity”?"

    • @Pharaoization
      @Pharaoization 2 years ago +1

      Thanks!

  • @ass2412
    @ass2412 2 years ago +2

    Did you actually finish reading Transaction Processing? Coz damn, that's a lot of pages.

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago +1

      Not yet...currently use it more as a reference guide.

  • @0yustas0
    @0yustas0 2 years ago +1

    Wow... 2021 December.... Spark 2.0 ^_^

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago

      Guess you're not the only old person here 😅

  • @Crazy8xxx
    @Crazy8xxx 2 years ago +3

    What the job equates to is you being responsible for loading data into a dimensional or transactional data model. You need to understand the methods of loading data. A data or integration architect will have already designed the system and you’ve been hired to fill it with data.
    Learn how to tell managers and stakeholders how long it’s going to take you. All these books are great but they aren’t going to help you in reality.

    • @rainwave5
      @rainwave5 2 years ago +2

      Ok I feel like your comment makes absolutely no sense whatsoever and hopefully you can plug in some gaps in my understanding if there are any 🙂👍.
      So you say one must "understand the methods of loading data"
      Ok a couple of problems I find there.
      1) A data engineer doesn't just build pipelines and if so what proof is there of this?
      2) Were there not any books mentioned that have pipelines involved? I remember one called "Data Pipelines" that I even own myself.
      Also it could be possible as a data engineer that you're asked to modify a source system because the data or integration architect or whoever built it was trash. So I could see how some little tidbits of transaction processing could come in handy.
      Or maybe you build an API it could come in handy.
      So with that in mind your final statement that all these books are great yet won't help in reality really doesn't make any sense to me at all

  • @yogeshs9809
    @yogeshs9809 1 year ago

    Any Scala Spark Book recommendation?

  • @RamirezGold
    @RamirezGold 2 years ago +1

    Random KNOOOOOWLEDGE guy appears. That came as a surprise.