220 - What is the best loss function for semantic segmentation?

  • Published on 1 Jun 2021
  • IoU and Binary Cross-Entropy are good loss functions for binary semantic segmentation, but focal loss may be better.
    Focal loss is well suited to multiclass problems where some classes are easy and others are difficult to classify. It is just an extension of the cross-entropy loss: it down-weights easy classes and focuses training on hard-to-classify classes. In summary, focal loss turns the model's attention toward the difficult-to-classify examples. (A minimal Keras sketch follows below.)
  • Science & Technology
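
A minimal Keras sketch of the binary focal loss described above. This is an illustrative implementation, not the one used in the video; comments below rely on the ready-made sm.losses.BinaryFocalLoss from the segmentation_models library. gamma is the focusing parameter and alpha the class-balancing weight from the focal loss paper.

    from tensorflow.keras import backend as K

    def binary_focal_loss(gamma=2.0, alpha=0.25):
        # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
        # gamma down-weights easy examples; alpha balances the two classes.
        def loss(y_true, y_pred):
            y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
            p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)    # probability of the true class
            alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
            return K.mean(-alpha_t * K.pow(1.0 - p_t, gamma) * K.log(p_t))
        return loss

    # Usage: model.compile(optimizer='adam', loss=binary_focal_loss(gamma=2.0, alpha=0.25))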

Comments • 77

  • @Avalaxy
    @Avalaxy 2 years ago +11

    Probably the best explanation I have ever seen on this topic. I love the simple clear code samples, clear visual examples and the good explanation. Thank you!

  • @rajpulapakura001
    @rajpulapakura001 4 months ago +1

    Best explanation for segmentation loss ever!

  • @maryamomar4106
    @maryamomar4106 2 years ago +1

    I could say thank you a hundred times, and that wouldn't be enough. Thank you! You made my life a breeze.

    • @DigitalSreeni
      @DigitalSreeni  2 years ago

      Glad I could help! Keep watching :)

  • @alcasla90
    @alcasla90 2 years ago

    Great work and great explanation. Thanks

  • @gactve2110
    @gactve2110 2 years ago

    Great video! Clear and on point!
    You have a new sub :)
    Thank you

  • @rbhambriiit
    @rbhambriiit 6 months ago

    Nice lecture. One suggestion: You should add the links to the research papers referred to in the description.

  • @bijoyalala5685
    @bijoyalala5685 3 years ago +1

    Hello Sreeni Sir, thank you for your informative video explaining focal loss. In my semantic segmentation model I am using
    dice_loss = sm.losses.DiceLoss()
    focal_loss = sm.losses.BinaryFocalLoss()
    total_loss = (1 * focal_loss) + dice_loss, with Adam as the optimizer, a 0.0001 learning rate, and momentum=0.9. The loss is decreasing and keeps approaching negative values. After 100 epochs, the training loss shows -51.3400. Is a negative loss value incorrect? Please suggest a solution for when the loss goes negative.

  • @diegozegarra4973
    @diegozegarra4973 3 years ago

    Thank you so much!

  • @utei9502
    @utei9502 2 years ago +1

    Hi Sir, thank you so much for a very clean and easy-to-understand explanation!
    I wonder, for multi-class semantic segmentation, do you have results that compare segmentation networks' performance on actual data using CE and FL? Does FL help improve the overall accuracy as well as the speed of convergence? Actually, for accuracy assessment, is average IoU a good metric, or is there a better one, especially for data with imbalanced samples?

    • @DigitalSreeni
      @DigitalSreeni  2 years ago

      It would be a good test to see how fast the model converges with CE vs FL and the effect of each loss function on the overall and individual IoU values. This work has already been done by the original authors of the paper so I try not to repeat much work. I will trust their observations.
      I find IoU to be the best metric to evaluate semantic segmentation. I look at mean IoU during training, but to evaluate, I always look at the IoU for each class. That is the only way to find out the effectiveness of the model at segmenting various classes.
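
A minimal numpy sketch of the per-class IoU evaluation described in the reply above, assuming y_true and y_pred are integer label maps of the same shape (during training, tf.keras.metrics.MeanIoU can report the mean):

    import numpy as np

    def per_class_iou(y_true, y_pred, num_classes):
        # IoU = |A ∩ B| / |A ∪ B|, computed separately for each class label.
        ious = []
        for c in range(num_classes):
            intersection = np.logical_and(y_true == c, y_pred == c).sum()
            union = np.logical_or(y_true == c, y_pred == c).sum()
            ious.append(intersection / union if union > 0 else np.nan)
        return ious  # mean IoU = np.nanmean(ious)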

  • @emanalajrami1919
    @emanalajrami1919 2 years ago +1

    Hi Sir, thank you for this amazing explanation. I am using a U-Net for segmentation. How can I add dropout to it? Also, I use the Hausdorff distance as a metric, but it did not improve much even when I increased the size of the training dataset. Do you have any idea that could help? Thanks

  • @ryoungseobkwon9660
    @ryoungseobkwon9660 3 years ago

    Thank you!

  • @krocodilnaohote1412
    @krocodilnaohote1412 2 years ago

    Great stuff, thank you!

  • @onlyjimmy4ever391
    @onlyjimmy4ever391 2 years ago

    LOVE UR CLASS SO MUCH SIR!

  • @arnabmishra827
    @arnabmishra827 1 year ago

    What are the relative merits and demerits of using IoU vs. BCE loss for semantic segmentation, Sir? Recently many research papers have used pixel-wise BCE loss as the primary loss function for semantic segmentation / salient object detection tasks. Can you please explain?

  • @umeshpathak825
    @umeshpathak825 3 years ago

    thank you so much sir

  • @moazeldefrawy4379
    @moazeldefrawy4379 2 years ago

    Thank You!

  • @caiyu538
    @caiyu538 2 years ago

    I understand the meaning of focal loss, what its purpose is, and the library you mentioned. It looks like we do not need to learn the details of how to implement it (adding more weight to low-probability classifications). I think most of these are like black boxes: we only need to know what they are for and their inputs and outputs. Thanks

  • @bikram2955
    @bikram2955 11 months ago

    Great explanation!

    • @DigitalSreeni
      @DigitalSreeni  11 months ago

      Glad you think so!

  • @rezatabrizi4390
    @rezatabrizi4390 3 years ago

    Thank you. I have a question about the Lovász loss function for binary segmentation: can we have a video about the application of this loss function, please?
    Thank you so much

  • @davidvc4560
    @davidvc4560 1 year ago

    excellent!

  • @reemawangkheirakpam8165
    @reemawangkheirakpam8165 3 years ago

    thank you sir

  • @tonihullzer1611
    @tonihullzer1611 2 years ago

    Is there a reason you do not threshold your Y pred values for the metric? Because at the end of the day, they have to be binarized in order to get a mask.

  • @user-lx9xw6fb5q
    @user-lx9xw6fb5q 3 years ago

    Thank you for the nice video. I have a simple question: for training a U-Net architecture (semantic segmentation), do we have to prepare images of the same size, or is it okay to train with images of diverse sizes? (I am using CVAT for making training images.) Thank you so much :)

    • @utei9502
      @utei9502 2 years ago

      For training, you'll need to crop the larger images to the same size as the smaller images, so that they can be concatenated into a 4D array and fed into the network.
      You may also want to crop images into smaller tiles, if you have limited GPU memory. If properly done, tiling also improves training performance.
      For inferencing though, you can feed images of different sizes into the network, one at a time.
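
A minimal numpy sketch of the tiling idea from the reply above, assuming non-overlapping square tiles and a channels-last (H, W, C) image; the tile size is an illustrative choice (the patchify library offers similar functionality):

    import numpy as np

    def tile_image(img, tile=256):
        # Split an (H, W, C) image into non-overlapping tile x tile patches,
        # dropping any partial tiles at the right and bottom edges.
        tiles = []
        for y in range(0, img.shape[0] - tile + 1, tile):
            for x in range(0, img.shape[1] - tile + 1, tile):
                tiles.append(img[y:y + tile, x:x + tile])
        return np.stack(tiles)  # (N, tile, tile, C) array, ready to use as a 4D batch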

  • @mingjuhe773
    @mingjuhe773 1 year ago

    Thanks bro!

  • @pietheijn-vo1gt
    @pietheijn-vo1gt 2 years ago +1

    7:15 I want to add something here. I have had very poor performance on datasets with class imbalance due to this +1 term. I think it's because it works kind of like a 'smoothing' factor for your loss function. Only when I dropped this number to something like 0.1 or even lower could I get any representation of the smallest classes in my predictor; it was a huge revelation when I figured this out after a few days of headaches.

    • @DigitalSreeni
      @DigitalSreeni  2 years ago +3

      For class imbalance data, you may find focal loss to do a better job.

    • @AnwerShahzaib
      @AnwerShahzaib 1 year ago

      @@DigitalSreeni In one of your playlists you used the BraTS2020 dataset for multi-class semantic segmentation. I have been working on a project that is specific to only t1ce and its mask (mask[mask==2] = 1), basically a binary segmentation model to detect the tumor core. The class imbalance problem persists after experimenting with BinaryFocalLoss and a total loss of dice_loss + (1 * focal_loss). The model overfits to class 0 (background), and it overfits in the initial phase of training.

  • @lazy.researcher
    @lazy.researcher 2 years ago

    You are a genius.

  • @marcospaulobatista2375
    @marcospaulobatista2375 2 years ago

    Excellent video! Have you seen focal loss with GAN pix2pix segmentation?

    • @DigitalSreeni
      @DigitalSreeni  2 years ago +1

      No, I only saw binary cross entropy.

  • @PauloZiemer
    @PauloZiemer 3 years ago

    Thanks

  • @user-gb6py4re3o
    @user-gb6py4re3o 3 years ago

    Good!

  • @valeras9416
    @valeras9416 2 years ago

    Thanks for the awesome explanation. You saved my day!

  • @learnmore3647
    @learnmore3647 2 years ago

    Hi,
    Please, I want to know what the correct implementation of the dice coefficient is.
    This code:
        def dice_coef1(y_true, y_pred, smooth=1):
            intersection = K.sum(y_true * y_pred, axis=[1,2,3])
            union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
            dice = K.mean((2. * intersection + smooth)/(union + smooth), axis=0)
            return dice
    gives me 0.82, while this code:
        def dice_coef2(target, prediction, smooth=1):
            numerator = 2.0 * K.sum(target * prediction) + smooth
            denominator = K.sum(target) + K.sum(prediction) + smooth
            coef = numerator / denominator
            return coef
    gives me 0.94.
    Thank you.

  • @ExV6120
    @ExV6120 2 years ago +1

    How about Focal loss vs Dice loss?

  • @florianhofstetter6859
    @florianhofstetter6859 3 years ago

    Is there a loss function that takes the perspective of an image into account? For example, when segmenting lanes, pixels that are further away, near the horizon, should be weighted higher because each one covers a larger area of the road.

    • @DigitalSreeni
      @DigitalSreeni  3 years ago +1

      As you know, an image is just a bunch of numbers and deep learning algorithms are application agnostic. This means, they do not understand perspective as we see it. There must be some analogous data that reflects the perspective. If so, we can create a custom loss function that reflects the perspective in the image. In your example, focal loss may be a good function as it is designed to focus more on wrongly classified data.

    • @florianhofstetter6859
      @florianhofstetter6859 3 years ago

      @@DigitalSreeni Yes, I have information about the perspective; I could calculate the homography matrix. Because the images were always taken with the same camera, it does not change over the dataset. I thought focal loss would be different for different classes, but my problem is binary classification: lane or not lane on the road.
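
A hypothetical sketch of the kind of custom, perspective-aware loss discussed in this thread (not from the video): per-pixel binary cross-entropy weighted by image row, so rows near the top of the frame, i.e. near the horizon, count more. The linear weighting and the max_weight parameter are illustrative assumptions.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import backend as K

    def row_weighted_bce(height, max_weight=5.0):
        # Weight decreases linearly from max_weight at the top row to 1.0 at the bottom row.
        row_w = np.linspace(max_weight, 1.0, height, dtype='float32').reshape(1, height, 1, 1)
        row_w = tf.constant(row_w)
        def loss(y_true, y_pred):
            bce = K.binary_crossentropy(y_true, y_pred)  # per-pixel BCE, shape (batch, H, W, 1)
            return K.mean(bce * row_w)
        return loss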

  • @briandavis9296
    @briandavis9296 2 years ago

    Your code example may be oversimplifying the IoU loss. You use `x.sum()` which is generally going to be summing over the batch dimension. This gives a different result than the "proper" way of summing only in the spatial dimensions and then averaging/summing across the batch after the IoU is computed.

    • @DigitalSreeni
      @DigitalSreeni  2 years ago +1

      Yes, this is a simplification, but it works fine as a loss function for the optimizer to minimize. You may want to get down to pixel granularity to compute IoU metrics, but for a loss function this works okay. But I do not recommend using IoU loss, as there are much better approaches, for example focal loss.
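
A minimal sketch of the per-sample variant described in the comment above, assuming channels-last 4D tensors: sum over the spatial and channel axes first, then average the IoU across the batch.

    from tensorflow.keras import backend as K

    def iou_loss_per_sample(y_true, y_pred, smooth=1.0):
        # One IoU per sample (axes 1-3 are H, W, C), then the mean over the batch axis.
        intersection = K.sum(y_true * y_pred, axis=[1, 2, 3])
        union = K.sum(y_true, axis=[1, 2, 3]) + K.sum(y_pred, axis=[1, 2, 3]) - intersection
        iou = (intersection + smooth) / (union + smooth)
        return 1.0 - K.mean(iou)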

  • @rs9130
    @rs9130 2 years ago

    Hello Sreeni,
    Did you face the problem of a low validation score but a high training score while using cross-entropy loss, or any other loss? Why does this happen? I even tried shuffling my data. I am using an FCN model for multi-class segmentation.

    • @DigitalSreeni
      @DigitalSreeni  2 years ago +1

      If you are getting a low validation score and a high training score then your model is overfitting. I don't think it has to do with the loss function; it could, but you need to check other factors first. See if you can simplify the model (not too deep), use augmentation to generalize it, try early stopping, add dropout layers, etc.
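
A minimal Keras sketch of the early-stopping remedy mentioned above; the patience value and variable names are illustrative assumptions.

    from tensorflow.keras.callbacks import EarlyStopping

    # Stop training when validation loss stops improving, restoring the best weights seen so far.
    early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
    # model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=[early_stop])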

  • @nouhamejri1698
    @nouhamejri1698 3 years ago

    I'm using focal + dice loss for multi-class semantic segmentation. I'm getting good results, but the loss is always 0.7 and doesn't decrease, whereas if I use cross-entropy loss I get a loss of 0.01. What does this mean?

    • @DigitalSreeni
      @DigitalSreeni  3 years ago +1

      Please try a different optimizer or change the learning rate and also try changing the way you initialize the weights.

  • @salmahayani2710
    @salmahayani2710 3 years ago

    Hello, firstly thanks for this useful video. I want to ask something about dice coefficient loss. I'm doing semantic segmentation on 3D CT scans (the LUNA16 database) using a 3D U-Net, and I have a problem: my dice loss gets stuck at 50% and doesn't decrease anymore, for both training and validation. Do you have any idea what the problem could be?
    Waiting for your answer :)

    • @petercappetto928
      @petercappetto928 2 years ago

      Have you normalized your data? I am working with Luna 16 and a 3D U-Net, as well. I forgot to normalize the data and experienced a very high validation loss. Once I normalized the data, the network performance improved drastically.

    • @salmahayani2710
      @salmahayani2710 2 years ago

      @@petercappetto928 Yeah, of course I did that (what I understand by normalized is that all pixels are between 0 and 1, or should I transform the images to binary 0 or 1?). The problem is that once my loss reaches 0.50 it doesn't go down anymore. I'm really stuck and don't know if the problem is with the data or with my network. I would be grateful if you could show me how to deal with this.

    • @talha_anwar
      @talha_anwar 2 years ago

      @@petercappetto928 It is not mandatory for the loss function to go below 0.5; it's not something between 0 and 1.

  • @nagavenik4862
    @nagavenik4862 1 year ago

    Sir, what are the main loss, attention loss, and inter-class loss?

  • @jithinnetticadan4958
    @jithinnetticadan4958 3 years ago

    Do you know why the validation accuracy is initially higher than the training accuracy (a gap of approximately 15%), but after reaching 65% the validation accuracy starts decreasing even though training accuracy improves?

    • @chiragchauhan8429
      @chiragchauhan8429 3 years ago

      I would suggest you use a smaller network or try decreasing your learning rate to something like 0.00001 or 0.0008.

    • @jithinnetticadan4958
      @jithinnetticadan4958 3 years ago

      @@chiragchauhan8429 The learning rate is set to 1e-6. If I increase it, these values keep increasing and decreasing abruptly. By a smaller network, do you mean I should start at 8 and go up to 256 (i.e., the bottleneck layer)?

    • @chiragchauhan8429
      @chiragchauhan8429 3 years ago

      @@jithinnetticadan4958 By a smaller network I mean decreasing the number of layers. Keep the network simple and try experimenting with max pooling and average pooling. How many layers are you using?

    • @jithinnetticadan4958
      @jithinnetticadan4958 3 years ago

      Right now, 5 layers (including the encoder and bottleneck), going from 16 up to 256, with the learning rate set to 1e-6. I will also give average pooling a try. 👍

    • @DigitalSreeni
      @DigitalSreeni  3 years ago +1

      I recommend changing the network architecture only if you see an overfitting problem or other issues at the end of training. The first few epochs of training may look weird, but that is nothing to worry about. Validation accuracy may be higher than training accuracy if the model represents the validation data better; this happens with small datasets, or sometimes when the validation data is indeed easy to segment. Try changing the random seed used to split training and validation data; this gives new validation data that may not be as easy to segment.

  • @ashwiniyadav464
    @ashwiniyadav464 2 years ago

    Sir, what is the best loss function for classification of X-rays?

    • @DigitalSreeni
      @DigitalSreeni  2 years ago

      Loss functions do not care about the application; it can be x-rays or CT or satellite images. I recommend focal loss for semantic segmentation.

    • @ashwiniyadav464
      @ashwiniyadav464 2 years ago

      Sir, please suggest the latest and best techniques for denoising medical images.

  • @muhammadroshan7315
    @muhammadroshan7315 2 years ago

    Is focal loss best for the binary case too?

    • @DigitalSreeni
      @DigitalSreeni  2 years ago

      Yes. Of course, it depends on the problem itself, but I have no reason to question it for the binary case.

  • @gunjannaik7575
    @gunjannaik7575 2 years ago

    Why do some people write the loss as (1 - coefficient value) and others as just -coefficient value?

    • @DigitalSreeni
      @DigitalSreeni  2 years ago

      If the coefficient value is a number smaller than 1 but still positive (e.g., 0.005, 0.1, 0.5, 0.9, etc.) then (1 - coeff) makes sense. If the coefficient is a negative value (e.g., -5, -4, -3, etc.) then (-coeff) makes sense. The optimizer's job is to minimize this loss function.
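
A minimal sketch of the two conventions discussed above, using the dice coefficient as an example and assuming Keras backend tensors:

    from tensorflow.keras import backend as K

    def dice_coef(y_true, y_pred, smooth=1.0):
        intersection = K.sum(y_true * y_pred)
        return (2.0 * intersection + smooth) / (K.sum(y_true) + K.sum(y_pred) + smooth)

    # The dice coefficient lies in (0, 1], so both forms decrease as the coefficient improves;
    # the optimizer minimizes either one.
    def dice_loss_one_minus(y_true, y_pred):
        return 1.0 - dice_coef(y_true, y_pred)

    def dice_loss_negative(y_true, y_pred):
        return -dice_coef(y_true, y_pred)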

  • @talha_anwar
    @talha_anwar 2 years ago

    Please cover other loss functions as well.