Python Libraries You Should Know As A Data Engineer - Python For Beginners


Comments

  • @SeattleDataGuy
    @SeattleDataGuy  1 year ago

    If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/

  • @shravanshenoy3873
    @shravanshenoy3873 1 year ago +19

    Beginner -
    1. Requests (and sftp)
    2. Psycopg2 and similar database libraries
    3. Beautifulsoup and scrapy
    4. Datetime
    5. Virtualenv
    Intermediate -
    6. Airflow
    7. Boto3 and similar libraries to interact with cloud
    8. Flask/Django
    Advanced (based on need to know) -
    9. Pyspark
    10. Pyarrow

    • @ChatGPT-ef6sr
      @ChatGPT-ef6sr 1 year ago

      Up

    • @seya2183
      @seya2183 10 months ago

      Warnings and logging too
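
A minimal sketch of how the standard-library `logging` and `warnings` modules mentioned in this reply might appear in a pipeline step. The logger name `pipeline`, the function `load_rows`, and the `legacy_mode` flag are hypothetical, chosen only for illustration.

```python
import logging
import warnings

# Basic configuration: INFO and above go to stderr with a simple format.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("pipeline")  # hypothetical logger name


def load_rows(rows, legacy_mode=False):
    """Pretend load step: drop None rows and log how many were kept."""
    if legacy_mode:
        # warnings are aimed at the developer calling this code,
        # while logging records what happened during the run.
        warnings.warn("legacy_mode is deprecated", DeprecationWarning, stacklevel=2)
    kept = [row for row in rows if row is not None]
    logger.info("kept %d of %d rows", len(kept), len(rows))
    return kept
```

Roughly speaking, `logging` is for runtime events you want in your job logs, and `warnings` is for telling other developers that the way they call your code should change.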

  • @RSKriegs
    @RSKriegs 1 year ago +11

    Some other cool libraries from my side:
    - Pandas - you mentioned it, but I don't think you framed it as something one should know (see the case from your Facebook interviews) - I think it's essential for any sort of data wrangling with Python.
    - NumPy - essential stuff for any sort of algebra if you want to dive deeper into ML
    - MyPy/Pydantic - for data validation & static typing
    - Pytest - for testing
    - matplotlib & seaborn - for data visualization in Python
    - any sort of file libraries for specific file formats like json, csv, avro-python etc.
    - ML libraries like scikit-learn
    - FastAPI as an alternative to Django/Flask
    - Selenium
    - argparse for scripting
    Although I haven't used most of these in my job on a regular basis - I think it doesn't hurt to know them :)
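
As a rough illustration of two items from the list above (`argparse` for scripting, plus the `csv` and `json` file-format modules), here is a small sketch of a command-line converter. The function names, the `csv_path` argument, and the `--indent` option are assumptions made for this example, not anything from the video.

```python
import argparse
import csv
import io
import json


def build_parser():
    """Command-line interface: one positional path and one optional flag."""
    parser = argparse.ArgumentParser(description="Convert a CSV file to JSON records.")
    parser.add_argument("csv_path", help="path to the input CSV file")
    parser.add_argument("--indent", type=int, default=None, help="JSON indent width")
    return parser


def csv_to_json(csv_text, indent=None):
    """Turn CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=indent)


def main(argv=None):
    """Entry point; call main() at the bottom of a script to use it."""
    args = build_parser().parse_args(argv)
    with open(args.csv_path, newline="", encoding="utf-8") as handle:
        print(csv_to_json(handle.read(), indent=args.indent))
```

Note that `csv.DictReader` returns every value as a string, so any type conversion (for example to `int` or `datetime`) would still be up to you.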

    • @data-dylan
      @data-dylan 1 year ago

      sympy is more of an algebra library. I think you meant numpy is a linear algebra library. This can be a good way of thinking about it for a beginner who wants to learn ML, but I find it gets used a lot for stuff where you want to try and represent continuous mathematics as closely as possible on a computer. For example, numpy would also be good for stuff like signal processing or creating a function of best fit for your data that can be plotted.

  • @matthewwiese6972
    @matthewwiese6972 1 year ago

    Psycho pg2 is how I've heard folks say it too!

  • @hdr-tech4350
    @hdr-tech4350 1 year ago +1

    Requests
    Psycopg
    Bigquery
    Beautifulsoup & scrapy
    Datetime
    Boto 3
    Flask
    Virtualenv
    Spark
    Pyarrow
    Pykafka
    Snowflake

    • @SeattleDataGuy
      @SeattleDataGuy  1 year ago

      Thanks! I finally added in the agenda so these are now included.

  • @EH-it8pj
    @EH-it8pj 1 year ago +7

    I'm stuck in a "data engineer" position where all my boss will let me do is debug SQL scripts and it's killing me

    • @gavinkalaher7314
      @gavinkalaher7314 1 year ago

      how long have you been there?

    • @jeffGordon852
      @jeffGordon852 1 year ago +2

      QUIT

    • @playea123
      @playea123 1 year ago +2

      Leave if you can. You are doing yourself no favors by wasting years at a job you don’t like and especially one that isn’t improving your skills

  • @lkellermann
    @lkellermann 1 year ago

    Watching the premiere... expecting to hear about the tenacity library here xD

  • @luizhenriquecudo125
    @luizhenriquecudo125 1 year ago

    Great content as usual! I'd add the json library to that

  • @shashankemani1609
    @shashankemani1609 1 year ago +1

    amazing thank you!

  • @EbeneezerGumb
    @EbeneezerGumb 1 year ago

    good list, but most of your psycopg2 stuff prob would have been easier with sqlalchemy

  • @redrum4486
    @redrum4486 1 year ago

    I have to use a shell script to execute mysql queries, then pass the result as an argument to my python scripts >_< wish i could just use mysql connector
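
A rough sketch of the receiving side of the workaround described above, assuming the shell wrapper runs the `mysql` client in batch mode (tab-separated output with a header row) and passes the whole output as a single argument. The function name `parse_mysql_batch` and the `result` argument are hypothetical.

```python
import argparse
import csv
import io


def parse_mysql_batch(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    # QUOTE_NONE: treat quote characters as ordinary data, since the
    # assumed input separates fields with tabs and does not quote them.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def main(argv=None):
    """Entry point; call main() at the bottom of a script to use it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("result", help="query output passed in by the shell wrapper")
    args = parser.parse_args(argv)
    for row in parse_mysql_batch(args.result):
        print(row)
```

For large result sets, piping the output into the script on standard input would likely be more robust than a command-line argument, since operating systems limit argument length.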

  • @data-dylan
    @data-dylan 1 year ago

    How can you know pandas every which direction, but not understand a dictionary? You wouldn't know how to construct a dataframe from a dictionary of lists (often my approach when webscraping) or know how to use the map function to change categorical names. Wes McKinney (who created pandas) even says that a pandas series data structure is similar to an ordered dictionary.

  • @SanjeevKumar-dr6qj
    @SanjeevKumar-dr6qj 1 year ago +1

    You are awesome.

  • @pcargolo1
    @pcargolo1 1 year ago

    I've gone through possibly all the Python courses on Udemy but have never seen a course focused on Data Engineering and the good-to-know libraries. Sometimes there is one short chapter about one of them, but nothing complete. Does anyone have any tips?

  • @gabrielkolletalves493
    @gabrielkolletalves493 1 year ago +1

    Regarding APIs, I always thought we should learn how to pull from them, not actually create them. So where does Flask fit into all that?

    • @playea123
      @playea123 1 year ago

      Depends on what product is built on top of your db/dw. You might need to build an api on top of your warehouse to power your product.

    • @gabrielkolletalves493
      @gabrielkolletalves493 1 year ago

      @playea123 Cool. And do you know what kind of custom API could run over a DW? I could only think of such a case in an OLTP context...

    • @playea123
      @playea123 1 year ago +1

      @gabrielkolletalves493 depends on how you model your DW. If you want something similar to an OLTP, Snowflake rolled out hybrid tables a few months ago
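
To make the "API on top of your warehouse" idea in this thread a little more concrete, here is a minimal sketch of what the body of one read-only endpoint might do. It uses an in-memory `sqlite3` database purely as a stand-in for a real warehouse, and the table `daily_sales`, the demo rows, and both function names are invented for illustration. In practice a framework such as Flask would route a request to a function like this.

```python
import json
import sqlite3


def make_demo_warehouse():
    """Build a tiny in-memory table standing in for a warehouse (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (day TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?)",
        [("2024-01-01", 120.5), ("2024-01-02", 98.0)],
    )
    return conn


def get_daily_sales(conn, day):
    """Return (JSON body, HTTP-style status code) for one day's total."""
    # Parameterized query: the day value is never spliced into the SQL text.
    row = conn.execute(
        "SELECT day, total FROM daily_sales WHERE day = ?", (day,)
    ).fetchone()
    if row is None:
        return json.dumps({"error": "not found"}), 404
    return json.dumps({"day": row[0], "total": row[1]}), 200
```

The point of the sketch is only that such an endpoint usually serves small, pre-aggregated lookups to a product, which is a different access pattern from the large analytical scans a warehouse is typically modeled for.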

  • @alexanderpotts8425
    @alexanderpotts8425 1 year ago

    hey! leave gcp libs alone 😂