Harry's Data Journey
End-to-End ML/Data Science Project (with XGBoost) | Car Insurance Claims Prediction
My Code: github.com/harryallum/Data-Science-Projects/tree/main/Car%20Insurance%20Claim%20Prediction
Dataset: www.kaggle.com/datasets/xiaomengsun/car-insurance-claim-data
📖 Relevant Articles 📖
Stratified Sampling: medium.com/analytics-vidhya/stratified-sampling-in-machine-learning-f5112b5b9cfe
KNN Imputation: medium.com/@kyawsawhtoon/a-guide-to-knn-imputation-95e2dc496e
⏳ TIMESTAMPS ⏳
00:00 | Intro
01:03 | Basic Data Cleaning
04:33 | Create Train/Test Split
07:45 | Exploratory Data Analysis
10:19 | Advanced Data Cleaning & Preprocessing
18:10 | Classification Model Selection
20:16 | Feature Engineering
22:49 | Creating the Model Pipeline
26:56 | Model Tuning
29:13 | Classification Model Evaluation
30:28 | Regression
🔗 KEEP IN TOUCH 🔗
📸 Instagram: harrysdatajourney
💻 GitHub: github.com/harryallum
📝 LinkedIn: www.linkedin.com/in/harry-allum/
WHO AM I?
My name is Harry 👋 I'm an Electro-Mechanical Engineer and aspiring Data Scientist, documenting my journey to land my first job in Data Science. Come and follow along! Along the way, I'll be talking about my favourite learning resources, online course reviews and original tutorials.
⭐️ Tags ⭐️
#XGBoost #ML #DataScienceProjects #DataScience #DataScienceForBeginners #pythonprogramming #Python #SQL #DataAnalyst #Beginners #Tutorial #Data #Analysis #Programming #Coding #DataJourney
Views: 2,840

Videos

How to Create and Deploy a Multi-Page Python Dashboard with Plotly Dash | Data Portfolio Project
6K views, 6 months ago
My Dashboard: www.thepropertydashboard.co.uk/
Project GitHub: github.com/harryallum/Dash-Property-Dashboard
Dataset: www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Plotly Dash: dash.plotly.com/
Dash Bootstrap Components: dash-bootstrap-components.opensource.faculty.ai/
⏳ TIMESTAMPS ⏳
00:00 | Intro
01:51 | Data Processing
11:46 | Creating Single Page Dashboards
22:48 ...
I want to be a DATA SCIENTIST.
385 views, 7 months ago
The first video in a series documenting my journey of trying to break into Data Science as someone with no background in the field.
⏳ TIMESTAMPS ⏳
00:00 | Introduction
03:20 | Education
08:06 | Experience
09:06 | Why Data Science?
11:00 | The Challenge
12:03 | Progress So Far
🔗 KEEP IN TOUCH 🔗
📸 Instagram: harrysdatajourney
💻 GitHub: github.com/harryallum
📝 LinkedIn: www.link...

Comments

  • @unknown-gj2sl
    @unknown-gj2sl 2 days ago

    Did you ever land that DS role?

  • @michaelgeorgiou7738
    @michaelgeorgiou7738 18 days ago

    I'm completely new to all this and only started working with Python this month. I'm amazed: how did you make your VS Code work like this? Is this setup specific to data engineering? Specifically, when you execute a function, the output appears below it with a processing time indicator.

    • @harrysdatajourney
      @harrysdatajourney 17 days ago

      I think you might be talking about Jupyter notebooks, or ipynb files. These are used a lot in different data fields. They let you run sections of code with annotations. Give it a search, I hope it helps!

    • @michaelgeorgiou7738
      @michaelgeorgiou7738 17 days ago

      @harrysdatajourney Thanks a million for the reply, I'll check out Jupyter notebooks. I'm sure that's what I was looking for!

  • @clipstok788
    @clipstok788 24 days ago

    What is the name of the VS Code theme?

  • @ejiroerhue
    @ejiroerhue a month ago

    I’m a fresh mechanical engineering graduate with an interest in data science. I really enjoyed your story and I look forward to witnessing your journey.

  • @user-vd9nd3gm4n
    @user-vd9nd3gm4n a month ago

    Are you actually typing that fast? 😮

  • @TheMISBlog
    @TheMISBlog a month ago

    Good Luck Harry, just Subscribed

  • @imfinitiamusic.4632
    @imfinitiamusic.4632 a month ago

    Respect deserves a sub!!!

  • @vishukumar6477
    @vishukumar6477 a month ago

    I am an aspiring Data Scientist. It's very helpful and awesome ✌

  • @thelogiclabio
    @thelogiclabio 2 months ago

    Great stuff Allum

  • @bilal-khan
    @bilal-khan 2 months ago

    A question: if you have a large number of features, how do you choose between different categorical encodings? Do you look at each feature individually and then decide which encoding to use?

  • @pent1162
    @pent1162 2 months ago

    I got this error: "All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. 'Pipeline(steps=[('col_dropper', ColumnDropper(columns_to_drop=['red_vehicle']))])' (type <class 'sklearn.pipeline.Pipeline'>) doesn't." From ChatGPT: "According to the error message, the issue lies with the cols_to_drop_pipeline in the ColumnTransformer. In the ColumnTransformer, the output of cols_to_drop_pipeline should be directly discarded rather than being processed as a complete transformer." Has anyone else run into this error?

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      Your pipeline code looks fine. Can you share your code for the custom transformer?

    • @pent1162
      @pent1162 2 months ago

      I found the typo in the class: "The ColumnDropper class in this code has a spelling error; transfrom should be changed to transform. This spelling error occurs in the initial definition of the ColumnDropper class." Once I corrected it, everything worked. :P

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      @pent1162 Glad to hear it!
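
The thread above comes down to a misspelled method name on a custom transformer. A minimal sketch of why that produces the error, assuming a ColumnDropper similar to the one named in the error message (the class bodies here are illustrative, not the video's actual code, and MisspelledDropper is a hypothetical name used only for the comparison): a scikit-learn Pipeline only exposes transform when its final step has one, so a final step that defines transfrom instead leaves the whole pipeline without a transform method, which is what ColumnTransformer complains about.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Illustrative custom transformer that drops the named DataFrame columns."""

    def __init__(self, columns_to_drop):
        self.columns_to_drop = columns_to_drop

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step can be chained.
        return self

    def transform(self, X):
        return X.drop(columns=self.columns_to_drop)


class MisspelledDropper(BaseEstimator, TransformerMixin):
    """Hypothetical copy of the class with the typo from the thread."""

    def __init__(self, columns_to_drop):
        self.columns_to_drop = columns_to_drop

    def fit(self, X, y=None):
        return self

    def transfrom(self, X):  # typo: scikit-learn looks for `transform`
        return X.drop(columns=self.columns_to_drop)


good = Pipeline([("col_dropper", ColumnDropper(["red_vehicle"]))])
bad = Pipeline([("col_dropper", MisspelledDropper(["red_vehicle"]))])

# Toy data; the column names other than red_vehicle are made up.
df = pd.DataFrame({"red_vehicle": ["yes", "no"], "age": [30, 45]})
result = good.fit_transform(df)

print(hasattr(good, "transform"))  # the correctly spelled class exposes transform
print(hasattr(bad, "transform"))   # the misspelled one does not
print(list(result.columns))
```

Checking hasattr(pipeline, "transform") like this is a quick way to spot the problem before the pipeline goes into a ColumnTransformer.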

  • @learner8324
    @learner8324 2 months ago

    Great content, eagerly waiting for the deployment part...

  • @abhinavmallick3413
    @abhinavmallick3413 2 months ago

    It would be amazing if you shared the Python resources you mentioned you were studying from! Best of luck, and thank you in advance!

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      I’m planning on doing a video on this soon 😀

  • @hoangha6680
    @hoangha6680 2 months ago

    Thanks for the video. Just a small suggestion: an "end-to-end" data science project also includes model deployment, such as in a web app. I hope your future "end-to-end" DS projects will include this part too.

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      You're right! I wanted to look at covering model deployment in a separate video as this one was already quite long. Thanks for the suggestion!

    • @hoangha6680
      @hoangha6680 2 months ago

      @harrysdatajourney No problem. In my opinion, long and detailed videos attract me the most since they cover the full picture. It doesn't matter if your videos run longer than an hour ^^

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      @hoangha6680 Good to know! Thanks for the feedback 😀

  • @DarkOceanShark
    @DarkOceanShark 2 months ago

    Thanks Harry, I am looking forward to more such content from you. :)

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      Thanks! Let me know if you have any suggestions on what you’d like me to cover next 😀

  • @shailendra_kunwar
    @shailendra_kunwar 2 months ago

    I was just learning about classification models and then I got this recommendation from YouTube. Awesome video Harry.

    • @harrysdatajourney
      @harrysdatajourney 2 months ago

      Thanks Shailendra! Hope you found it helpful

  • @candypopz7865
    @candypopz7865 2 months ago

    As someone who wants to become a business insights analyst, this is very helpful. Thanks Harry! ❤

  • @santiagotabordagiraldo7759
    @santiagotabordagiraldo7759 3 months ago

    Hi Harry, thanks for your video. I have a question that came up while watching: could those 'pages' be used like ordinary pages in another web framework, something like a Django project, and still work the same?

  • @Louis-cm4er
    @Louis-cm4er 5 months ago

    Thanks for the vid !! it was perfect :)

  • @datawithtess
    @datawithtess 6 months ago

    Harry, you just got a subscribe from me

  • @John-xi2im
    @John-xi2im 6 months ago

    The data (4.7 GB overall) is too large for my laptop processor (AMD Athlon Silver 3050U with Radeon Graphics × 2; graphics: RAVEN (raven, LLVM 15.0.7, DRM 3.54, 6.5.0-18-generic)) to complete the ddf.compute() step, as the kernel keeps dying at that stage. I guess I have to download 3 or 4 individual year files from gov.uk, concat them and then follow the Plotly tutorial, as the really interesting thing is how to create a multi-page Plotly dashboard!

    • @John-xi2im
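      @John-xi2im 6 months ago

      import glob
      import pandas as pd

      def collating_yearly_data():
          # `path` is a glob pattern, defined elsewhere, that matches the yearly CSV files
          raw_data_df = pd.DataFrame()
          for fname in glob.glob(path):
              # Append each yearly file to the running DataFrame
              raw_data_df = pd.concat([raw_data_df, pd.read_csv(fname)])
          return raw_data_df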

    • @John-xi2im
      @John-xi2im 6 months ago

      Using the above function (glob is the only new thing in it), I am using 6 years of data to move ahead with the project 👍

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      Yep! It's a very large dataset. Using just a few select years is a great approach if you're more interested in creating the dashboard itself!

  • @nanshibukawa7576
    @nanshibukawa7576 6 months ago

    Have you ever used Streamlit? If so, which do you think is better: Streamlit or Plotly Dash?

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      I haven't tried using Streamlit yet. I do plan on trying at some point soon, so I may cover it in a future video!

    • @ntran04299
      @ntran04299 a month ago

      @harrysdatajourney Yes please! I'm looking into Streamlit myself too

  • @elio3232
    @elio3232 6 months ago

    Hi!!! Thanks for doing this, going from the simple to the complex. It really helps. I have a question about a multi-page app. I need a click on the y-axis of a figure on page A to trigger an update of a figure on another page B. What do I need to do that?

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      Thanks! For your situation, I'd use dcc.Store, part of Plotly Dash. You can use this to store your selection on page A in the browser as JSON, then load the plot using this on page B. You can find the documentation on dcc.Store here: dash.plotly.com/dash-core-components/store

    • @elio3232
      @elio3232 6 months ago

      @harrysdatajourney Thank you so much. I will try it and then let you know.

    • @elio3232
      @elio3232 6 months ago

      @harrysdatajourney Hi again, I'm trying to use dcc.Store and dcc.Link on page A to trigger page B when a click event occurs in a figure on page A. If I use target='_self' in dcc.Link, page B loads successfully and prints the value stored from the click, but with target='_blank' page B opens in a new tab (which is what I want) and the stored value seems to be None. I don't know why.

  • @John-xi2im
    @John-xi2im 6 months ago

    After installing dask, running dd.read_csv() raises a pyarrow>=10.0.1 import error, even though pyarrow versions above 10 (all from 11 to 15) are already installed. Could it be the effect of an upstream problem (there is a deprecation warning while importing dask, which is replaced by dask-expr)?

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      It might be worth just trying to install the version the error message is asking for with pip install pyarrow==10.0.1

    • @John-xi2im
      @John-xi2im 6 months ago

      @harrysdatajourney Thanks for your kind response. I had to reinstall Ubuntu; I will try and let you know!

    • @John-xi2im
      @John-xi2im 6 months ago

      This time I tried and no pyarrow error occurred (the Spyder IDE was probably the issue). 👍

    • @harrysdatajourney
      @harrysdatajourney 6 months ago

      Glad to hear it!

  • @thelogiclabio
    @thelogiclabio 7 months ago

    Great stuff Harry!