Hugging Face Datasets #1 | Hosting Your Datasets (for Beginners)

  • Published on 18 Aug 2024
  • An introduction to Hugging Face Datasets, how it works, and how to host your own simple datasets (JSONL, TSV, CSV, etc.) for free via the Hugging Face Datasets Hub.
    Warp download:
    app.warp.dev/r...
    Git LFS Install:
    Mac:
    $ brew install git-lfs
    Debian/Ubuntu:
    $ curl -s packagecloud.i... | sudo bash
    $ sudo apt-get install git-lfs
    Windows:
    Get the installer from github.com/git...
    🤖 70% Discount on the NLP With Transformers in Python course:
    bit.ly/3DFvvY5
    🎉 Subscribe for Article and Video Updates!
    / subscribe
    / membership
    👾 Discord:
    / discord
    00:00 Intro
    04:36 Creating our own Datasets
    08:29 Creating JSONL for Hugging Face
    15:15 Uploading Datasets for Git
    19:10 LFS for Large Files
    21:56 Closing Notes
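
As a rough illustration of the "Creating JSONL for Hugging Face" step, here is a minimal sketch of writing records to a JSON Lines file, one JSON object per line. It uses only the Python standard library. The field names, the records, and the file name `train.jsonl` are hypothetical and are not taken from the video.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example records; the field names are illustrative only.
records = [
    {"id": 0, "text": "hello world", "label": "greeting"},
    {"id": 1, "text": "see you later", "label": "farewell"},
]


def write_jsonl(rows, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the sketch does not touch the working tree.
out_path = Path(tempfile.mkdtemp()) / "train.jsonl"  # hypothetical file name
write_jsonl(records, out_path)
loaded = read_jsonl(out_path)
```

A file in this layout can typically be read with the `datasets` library via `load_dataset("json", data_files=...)`, though whether that matches the exact workflow in the video is an assumption.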

Comments • 17

  • @Sara-he1fz 1 year ago +9

    I sincerely appreciate this series of videos. There are a lot of tutorials for Hugging Face, but none of them explain how to use our own datasets. They all use the benchmarks, which makes them useless for applying the models to our own datasets. I would appreciate it if you could explain how to upload a private data file to Hugging Face rather than the public version you showed in the video. Because it requires authentication, it is really worth explaining. Thank you so much.

    • @jamesbriggs 1 year ago

      That's a really good idea. I will see if I can include it in an upcoming video or just add another quick one on authentication. Thank you!

  • @warock3058 3 months ago

    Thank you very much, you helped me massively in uploading my custom dataset for the Fill-Mask task :D

  • @KrisTC 4 months ago

    Thanks... jumping to the next video :)

  • @sanatbek819 1 year ago

    The video is useful. Keep going, brother.

  • @RaviTeja-zk4lb 10 days ago

    Can we load a dataset from our private cloud? (I don't want to upload this data to Hugging Face.) I can't find any examples.

  • @lutune 1 year ago

    Hey James! I don't have enough time to watch all of these incredible videos! Can you please advise on the importance of learning Hugging Face?
    Also, that browser is awesome! A friend suggested something similar back in the day, but I never had much time to look into it. You seem to use it well for your projects! Can you put together some more shorts on your setup, the tools you use day to day for your job, and maybe snippets of advice on how to stay focused? I have a bit of shiny object syndrome right now with all the AI stuff coming out.

  • @fizipcfx 1 year ago

    I am trying to upload my dataset to Hugging Face right now. You are so helpful.

  • @hervezossou5413 9 months ago

    Please, I have a dataset that contains audio files and a metadata file in CSV format. How do I upload all of this and put it in the Hugging Face Datasets format? One column for input_id, one for audio, and others for the transcription or text and the normalized text?

  • @susmitajaigade-gi4mn 5 months ago

    Please create a video on how to create a text-to-image generator model in Hugging Face.

  • @mohammedal-hitawi4667 1 year ago

    Thanks, that is great.

  • @user-hd9li6df4r 6 months ago

    Hey James, thank you for your big efforts. Can you tell me about jobs at online platforms like Hugging Face, LangChain, or Together AI?

  • @ariramkilowan8051 1 year ago

    Do you know of best practices for hosting one's own dataset (in a GCP bucket, for instance) but in the HF-compatible Apache Arrow format? That is, private data that can still easily be ingested by HF models without storing it on the Hub.

    • @jamesbriggs 1 year ago +2

      Yes, I will go through some of this in the next video and in number 3.