Efficient Training of Neural Networks with TPUs in Google Colab - Step-by-Step Setup with Code

  • Published on Feb 27, 2021
  • Inside my school and program, I teach you my system for becoming an AI engineer or freelancer. You get lifetime access and personal help from me, and I will show you exactly how I went from a below-average student to making $250/hr. Join the High Earner AI Career Program here 👉 www.nicolai-nielsen.com/aicareer (PRICES WILL INCREASE SOON)
    You will also get access to all the technical courses inside the program, also the ones I plan to make in the future! Check out the technical courses below 👇
    _____________________________________________________________
    In this video 📝 I'll show you how to train neural networks on TPUs in Google Colab. First, we will talk about what a TPU is and why it is well suited to training neural networks. We will then go over how to set up and initialize the TPU in Google Colab. Once everything is set up, we will create a CNN, train it on the MNIST dataset, and look at the training performance on the TPU.
    If you enjoyed this video, be sure to press the 👍 button so that I know what content you guys like to see.
    _____________________________________________________________
    🛠️ Freelance Work: www.nicolai-nielsen.com/nncode
    _____________________________________________________________
    💻💰🛠️ High Earner AI Career Program: www.nicolai-nielsen.com/aicareer
    ⚙️ Real-world AI Technical Courses: (www.nicos-school.com)
    📗 OpenCV GPU in Python: www.nicos-school.com/p/opencv...
    📕 YOLOv7 Object Detection: www.nicos-school.com/p/yolov7...
    📒 Transformer & Segmentation: www.nicos-school.com/p/transf...
    📙 YOLOv8 Object Tracking: www.nicos-school.com/p/yolov8...
    📘 Research Paper Implementation: www.nicos-school.com/p/resear...
    📔 CustomGPT: www.nicos-school.com/p/custom...
    _____________________________________________________________
    📞 Connect with Me:
    🌳 linktr.ee/nicolainielsen
    🌍 My Website: www.nicolai-nielsen.com/
    🤖 GitHub: github.com/niconielsen32
    👉 LinkedIn: / nicolaiai
    🐦 X/Twitter: / nielsencv_ai
    🌆 Instagram: / nicolaihoeirup
    _____________________________________________________________
    🎮 My Gear (Affiliate links):
    💻 Laptop: amzn.to/49LJkTW
    🖥️ Desktop PC:
    NVIDIA RTX 4090 24GB: amzn.to/3Uc7yAM
    Intel I9-14900K: amzn.to/3W4Z5Cb
    Motherboard: amzn.to/4aR6wBC
    32GB RAM: amzn.to/3Jt2XVR
    🖥️ Monitor: amzn.to/4aLP8hh
    🖱️ Mouse: amzn.to/3W501GH
    ⌨️ Keyboard: amzn.to/3xUGz5b
    🎙️ Microphone: amzn.to/3w1F1WK
    📷 Camera: amzn.to/4b4Ryr9
    _____________________________________________________________
    Tags:
    #TrainingNeuralNetwork #TPU #NeuralNetworks #DeepLearning #NeuralNetworksPython #NeuralNetworksTutorial #DeepLearningTutorial #Keras #Tensorflow #GoogleColab
  • Science & Technology

Comments • 21

  • @NicolaiAI
    @NicolaiAI  1 year ago +1

    Join My AI Career Program
    www.nicolai-nielsen.com/aicareer
    Enroll in My School and Technical Courses
    www.nicos-school.com

  • @bigredwatermelon
    @bigredwatermelon 1 year ago

    Great content Nicolai. I had some issues when I tried setting up a TPU model, and your video has really helped me. Thank you for sharing this knowledge

  • @abhishekprajapat415
    @abhishekprajapat415 3 years ago +7

    Before you start using TPUs, let me tell you a few things:
    1) A TPU has multiple cores (a TPU v3-8, for example, has 8 cores), so you should choose a batch size that is a multiple of 8, preferably 128.
    2) A TPU has a huge amount of memory (a TPU v3-8, for example, has 128 GB), which means it can train an EfficientNetB7, which is a beast of a model.
    3) When training on TPUs with custom data, the data should be in GCS (Google Cloud Storage), as that is the only storage the TPU is registered to read from.
    4) You can't use the ModelCheckpoint callback to save your model with a TPU, as the TPU is not located on your local runtime (save the model weights after training instead).
    5) When using a TPU, your learning rate can also be scaled by 8: if you used 0.001 on a GPU, use 0.008 on a TPU with 8 cores.
    6) When using custom data, always make prefetch calls so that the TPU doesn't sit idle.
    7) Last but not least: use TPUs only when you truly need them.
    Thanks.
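
The arithmetic behind points 1 and 5 can be sketched in a few lines of plain Python. This is only an illustration of the rule of thumb given in the comment, not a guarantee that multiplying the learning rate by 8 is right for every model; the function name `scale_for_replicas` and the per-core batch size of 16 are assumptions made for the example.

```python
# Hypothetical helper; the factor of 8 assumes an 8-core TPU such as a v3-8.
NUM_TPU_CORES = 8


def scale_for_replicas(per_core_batch_size, base_learning_rate,
                       num_replicas=NUM_TPU_CORES):
    """Return (global_batch_size, scaled_learning_rate).

    The global batch is split evenly across the cores, so it is always a
    multiple of num_replicas. The learning rate is multiplied by the same
    factor, following the heuristic described in the comment above.
    """
    if per_core_batch_size <= 0 or num_replicas <= 0:
        raise ValueError("batch size and replica count must be positive")
    global_batch_size = per_core_batch_size * num_replicas
    scaled_learning_rate = base_learning_rate * num_replicas
    return global_batch_size, scaled_learning_rate


# 16 examples per core on 8 cores, starting from a single-device rate of 0.001.
global_batch, learning_rate = scale_for_replicas(16, 0.001)
print(global_batch, learning_rate)
```

With these inputs the global batch size is 128 and the learning rate is 0.008, matching the numbers in the comment.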

    • @NicolaiAI
      @NicolaiAI  3 years ago +2

      Thank you very much for a detailed list! I'm sure a lot of people can use that

    • @abhishekprajapat415
      @abhishekprajapat415 3 years ago

      @NicolaiAI I wanted to ask: is it necessary to upload your data to GCS to use a TPU with custom data? I haven't used TPUs on Google; I have only used them on Kaggle, where it's quite easy. If I have my data on Drive and copy it to the session, can I use it with TPUs?
      Edit: I cross-checked it, and yes, the data should be registered in GCS.

    • @NicolaiAI
      @NicolaiAI  3 years ago

      @abhishekprajapat415 I don't think so. I guess you have to use GCS, which is not easy to use at all! It can also get very expensive over time if you don't know what you are doing and don't stop the processes correctly.

    • @jonathanloganmoran
      @jonathanloganmoran 1 year ago

      @NicolaiAI Hey everyone - fantastic info here. Are there any updates to @Abhishek Prajapat's question? I'm about to start a project with the Cloud TPU API and want to avoid as much GCP cost as possible. My project entails fetching data from a GCS bucket that I don't own, and I want to use Colab as the host VM to avoid Compute Engine fees. Is this possible?

  • @moneyadmin
    @moneyadmin 3 years ago +1

    You are really awesome!!! Thank you very much. Please continue with this excellent work.

    • @NicolaiAI
      @NicolaiAI  3 years ago

      Thank you very much! It motivates me to keep going

  • @derkertherblack6177
    @derkertherblack6177 3 years ago +1

    Every video is amazing, thank you very much

    • @NicolaiAI
      @NicolaiAI  3 years ago

      Thank you so much! I really appreciate it

  • @sarahrozas6485
    @sarahrozas6485 2 years ago

    Hello, I get the missing TPU error even though I changed the hardware accelerator to TPU. Do you have any idea why?

  • @TheInsightBytes
    @TheInsightBytes 2 years ago

    My parameter space only has 30K features, and when I get to the training part, it keeps saying "session crashed"... Can you help me, please?

  • @TheFireblitz
    @TheFireblitz 3 years ago +1

    Thanks for sharing your awesome tutorial.
    I followed your video step by step, but I get an error when I try to train my model with the TPU strategy in the free version of Google Colab:
    Unavailable: failed to connect to all addresses
    Is the TPU strategy only available in Google Colab Pro?

    • @NicolaiAI
      @NicolaiAI  3 years ago +1

      Nope, you can use the TPU on the free Colab as well. Are you using a built-in dataset from TensorFlow? If you have your own dataset, you need to have it in Google Cloud Storage. All the data should be in Google's cloud so it can be loaded from there onto the TPUs.

    • @TheFireblitz
      @TheFireblitz 3 years ago

      @NicolaiAI I'm using a Google Cloud Storage bucket to store my dataset.
      I compressed my dataset into a zip file and stored it in the bucket.
      I wrote a custom data generator that reads the dataset directly from the zip file, without unzipping it, using the Python zipfile library, and feeds it to my tf.keras model.
      Could reading the dataset directly from the zip file, without unzipping it, be causing the error I got?
      It works fine if I use the CPU or GPU runtime in Google Colab.
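
For readers unfamiliar with the pattern described in this comment, here is a rough sketch of a generator that reads files out of a zip archive without extracting it. It is not the commenter's code: the function name `iter_zip_members`, the file suffixes, and the in-memory archive are all assumptions made so the example runs on its own. The thread does not establish whether this pattern is what triggers the "failed to connect" error.

```python
import io
import zipfile


def iter_zip_members(zip_source, suffixes=(".png", ".jpg")):
    """Yield (member_name, raw_bytes) for each matching file in the archive.

    Nothing is extracted to disk; each member is read into memory on demand.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith(suffixes):
                yield name, archive.read(name)


# Build a tiny in-memory archive so the sketch runs without a real dataset.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("cats/0.png", b"fake-png-bytes")
    archive.writestr("dogs/1.jpg", b"fake-jpg-bytes")
    archive.writestr("README.txt", b"not an image")
buffer.seek(0)

samples = list(iter_zip_members(buffer))
print([name for name, _ in samples])
```

A generator like this runs as ordinary Python on the machine that executes the notebook, which is worth keeping in mind when comparing CPU/GPU runs with TPU runs.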

    • @NicolaiAI
      @NicolaiAI  3 years ago

      Then it should work with the setup from the video. When do you get the error? If you do the setup, you should be able to see all available TPUs and their addresses, as in the video.

  • @pulkitratnaganjeer6950
    @pulkitratnaganjeer6950 3 years ago

    Getting this error:
    (0) Unavailable: {{function_node __inference_train_function_100262}} failed to connect to all addresses

  • @hoaxuan7074
    @hoaxuan7074 3 years ago

    AI462 neural networks.

  • @user-hq4ks4ot9n
    @user-hq4ks4ot9n 8 months ago

    Hi, my session keeps crashing and restarting when I run this code:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu_address)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    Please help.
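
The thread does not resolve this crash. For reference, the connection sequence quoted in the comment can be wrapped as in the sketch below. It assumes an older Colab TPU runtime that exposed the TPU address in the `COLAB_TPU_ADDR` environment variable; newer runtimes may not set it. The helper names `colab_tpu_address` and `make_strategy` are hypothetical, and the fallback to the default strategy is a choice made for this example rather than something shown in the video.

```python
import os


def colab_tpu_address(environ=os.environ):
    """Return 'grpc://host:port' built from COLAB_TPU_ADDR, or None if unset."""
    addr = environ.get("COLAB_TPU_ADDR")
    return f"grpc://{addr}" if addr else None


def make_strategy(tpu_address=None):
    """Return a TPUStrategy when an address is given, else the default strategy."""
    # Imported lazily so the address helper above works without TensorFlow.
    import tensorflow as tf

    if tpu_address is None:
        # No TPU address available: fall back to the default (CPU/GPU) strategy.
        return tf.distribute.get_strategy()

    # Same three calls as in the comment, followed by creating the strategy.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


# The address helper can be checked without a TPU (the IP here is made up).
print(colab_tpu_address({"COLAB_TPU_ADDR": "10.0.0.2:8470"}))
```

In a notebook, one would then call `strategy = make_strategy(colab_tpu_address())` and build and compile the model inside `with strategy.scope():`.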