José, you're a beacon of light for the Java community! You (when it comes to Java) and Josh Long (when it comes to Spring Boot) never cease to amaze me with your interesting and in-depth explanations of all kinds of modern stuff in the Java world. Please never stop!
Thank you for your kind words, I really appreciate them! And I'll pass the compliment to Josh, I'm sure he will appreciate it also.
Thanks for this great presentation! It's really amazing how it goes into small details while the overall structure remains absolutely clear. The consistent pacing that keeps up the interest and makes it easy to follow for more than 30 minutes is true craftsmanship.
It would have been interesting to also see the equals method that is generated by Lombok. It's of course impossible to measure in a reliable way but I wouldn't be surprised if there are more Lombok than IDE generated equals methods out there in the world of Java business applications.
Hi José, you correctly point out that one has to measure and not guess. I know that a lot of mistakes can be made when writing JMH tests too. Could you also show the important parts of those tests in your examples? I think this could also be very informative. Besides that, it is always a great source of information!
Of course. There is no trick, the code is in the video at 15:18. All the classes I use are records with the equals() methods that are also shown. So there are 9 record classes with the different equals() methods. And then two runs, one with no glitch (4 data sets) and another one with glitches (6 data sets).
The JMH configuration is the following: 5 warmup iterations, 10 measurement iterations, fork is 3, and each iteration is 400 milliseconds. All this depends on your machine and the errors you have. If they are too high, then you need to either have longer runs, or increase the measurement iterations.
What may change your result is the locality of your data. These are records, so rather small objects, and I use ArrayList to store them. So the locality shouldn't be too bad.
Brilliant video, as always. Thank you!
The Apache Commons's Builder-based code is hardly more "readable" and definitely seems about 20-times slower than the rest.
Any dependency is a potential source of bugs, attack vectors, or end of life problems like using deprecated or even removed API.
Take care, you are posting a lot of videos, that is plenty of coffee.... Anyway, thank you for the great content!
José I also wanted to thank you for the cafe theme here. I'm in a position where I still need to avoid actual cafes due to Covid, and I really miss real cafe chats with colleagues. I know it's just a gimmick, but the cafe setting really resonates with me.
Thank you, and sorry to read that. I hope you'll be better soon enough!
Thanks a lot for this material, really relevant. Hats off to Mister José Paumard.
Thank you Khaled! 👍
Many thanks José for these informative explanations. Great video again.
Thank you!
I like the format of the JEP Café, so first of all thank you for all the input you gave me for my daily work. In the last months the videos became longer and longer, so now it is more a lunch break than a coffee break for me... For me, it is a bit difficult to spend half an hour during work to watch a video, whilst I can often take ten minutes for a "coffee break". Dear José and team, do you think that maybe it is possible to come "back to the roots" and make shorter JEP Café videos?
Oh, dear.. José, I find it mildly dangerous that you're hyping versions of the equals method that break its contract.
Example: if B extends A, and B just inherits the fields of A and just adds functionality, then using instanceof or pattern matching, objects of class B can be equal to objects of class A, but not the other way round, which is a breach of the equals contract and can cause hard-to-find bugs, for example in searching and sorting algorithms.
Remember when we had to implement the hashmap by hand in Modula-2 at uni? So long ago. No maps, no lists in the language. Only arrays. Now we have AI-assisted code generation... which I'm gonna try soon... I'm not trusting it but let's see what the little stochastic parrot tells me ...
It's a bad Java design that every object has hashCode. It should be an interface called Hashable, and then Set requiring an instance of hashable.
Right now, you have to implement hashCode defensively, just in case somebody is going to put it into a hash set.
Hi José. Thanks for this valuable insight. I wonder if the same result would be achieved with a more complicated object to compare (with Strings maybe?). Then first checking for equality may be wise.
You are comparing apples with oranges when you compare implementations of equals that check for same class and others that use instanceof. This will give different results when you have subclasses!
We are using records here, so no subclass. And yes, when you subclass a class that has an equals() method, you should always carefully check it, and override it when needed. Replacing instanceof with a class check may look like you are solving your problem, but if you add state in your subclass, it will probably not.
@JosePaumard You are of course right for the records 👍🏻
For common classes I learned to compare class-instances if there may be subclasses. If an instance of a subclass is equal to an instance of its superclass it may probably break the transitivity of equals. So only use instanceof in final classes.
@dirkj.3234 I agree, this is the kind of thing you need to have in mind when you are designing your object model.
If you cannot have a final class for some reason, then you can also protect yourself by making your equals() method final (this is what is done with the JEP). But it is still a weak protection, as someone else can easily remove it. And you'll end up with objects that are equal when they are not of the same type.
What's important to keep in mind imho is that instanceof (and pattern matching) are not only checking the exact type, they'll be true for the subtypes. Learning a solution is nice, but it's better to understand the root cause of the problem ;)
@JosePaumard Thanks. I'm pretty sure that I know how it's working 😉
instanceof gives true for every instance of a subclass and that can be a problem if the subclass contains additional fields.
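To make the subclass problem discussed in this thread concrete, here is a minimal sketch. The Point and ColorPoint classes are hypothetical and are not the records benchmarked in the video; the sketch assumes the subclass adds a field and overrides equals() with its own instanceof check.

```java
// Hypothetical classes for illustration only; not code from the video.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        // instanceof is also true for instances of any subclass of Point
        if (!(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class ColorPoint extends Point {
    final String color; // additional state added by the subclass

    ColorPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object o) {
        // A plain Point is never an instance of ColorPoint
        if (!(o instanceof ColorPoint)) {
            return false;
        }
        ColorPoint c = (ColorPoint) o;
        return super.equals(c) && color.equals(c.color);
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + color.hashCode();
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        ColorPoint cp = new ColorPoint(1, 2, "red");
        System.out.println(p.equals(cp)); // true: cp is an instance of Point
        System.out.println(cp.equals(p)); // false: p is not a ColorPoint
    }
}
```

The two calls disagree, so equals() is no longer symmetric. Records avoid this situation because they are implicitly final.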
Is try-catch as expensive as if statements? What happens if you eliminate all if statements?
public boolean equals(Object o) {
    try {
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }
    catch (ClassCastException e) {
        return false;
    }
}
It's usually more expensive. You can try to bench it though.
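Beyond speed, the try-catch variant has a correctness pitfall that a short sketch can show. The classes below are hypothetical and only mirror the snippet in the question. Casting null never throws ClassCastException, so the field access that follows throws NullPointerException, whereas the contract of Object.equals() requires x.equals(null) to return false.

```java
// Hypothetical class mirroring the try-catch equals() from the question.
final class TryCatchPoint {
    final int x;
    final int y;

    TryCatchPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        try {
            TryCatchPoint p = (TryCatchPoint) o; // casting null succeeds
            return p.x == x && p.y == y;         // throws NullPointerException when o is null
        } catch (ClassCastException e) {
            return false;                        // only reached for a non-null object of another type
        }
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

// Same state, but with the usual instanceof test, which is false for null.
final class InstanceofPoint {
    final int x;
    final int y;

    InstanceofPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof InstanceofPoint)) {
            return false;
        }
        InstanceofPoint p = (InstanceofPoint) o;
        return p.x == x && p.y == y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class TryCatchDemo {
    // Returns true when candidate.equals(null) throws instead of returning false.
    static boolean throwsOnNull(Object candidate) {
        try {
            candidate.equals(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnNull(new TryCatchPoint(1, 2)));
        System.out.println(throwsOnNull(new InstanceofPoint(1, 2)));
    }
}
```

Catching NullPointerException as well would restore the contract, but every mismatch would then pay for creating an exception, which is the cost the reply above refers to. Measuring it with JMH is the only way to know how much that is on a given machine.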
The object identity check is important for larger records, especially if they contain nested records. Checking the contents would be O(n) in the total number of fields, while checking identity is O(1). I suspect checking the identity is only slow with records composed entirely of primitive fields.
That's a good point. It would be great to have the source code for this benchmark so it would be easy to extend it to more complex objects and see if it makes a difference.
Great deep dive! Who would have thought that the equals method, visited by so many people, could have new insights? Thank you.
Optimizing for what?
Pattern matching is faster than if's
@corinnarust sounds like an opportunity for compiler optimization.
My equals methods are always of the form of return this == obj || obj instanceof MyClass other && x == other.x && y.equals(other.y);
I hope there is a special section for JPA entities too. I've been following JPA Buddy's advice when generating equals/hashCode, but would José deliver us that too?
Sorry to disappoint you but no, this point is not covered.
@JosePaumard Then would you be able to cover that one day?
@VuLinhAssassin It's mostly an ill-posed problem, because of the life cycle of an entity, and because you can observe all the steps of this life cycle. So I'm not sure that there is any satisfying answer to that question.
For instance: you create an entity, its primary key is not set yet. At some point its primary key is set. For some reason you need to store this entity in a HashSet. If you add it before it has its primary key set, and check if it's there with a contains() when its primary key has been set, you'll be happy not to have taken into account the primary key in the equals / hashCode implementations. Is this what you would expect?
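The HashSet scenario described above can be reproduced without any persistence layer. The two classes below are hypothetical stand-ins for an entity (no JPA annotations, and the "primary key" is just a field set by hand), so this is only a sketch of the effect, not a recommendation for real entities.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity stand-in whose equals/hashCode depend on the primary key.
class IdHashedEntity {
    Long id; // null until the entity is "persisted"
    final String name;

    IdHashedEntity(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IdHashedEntity
                && Objects.equals(id, ((IdHashedEntity) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null
    }
}

// Hypothetical entity stand-in whose equals/hashCode ignore the primary key.
class NameHashedEntity {
    Long id;
    final String name;

    NameHashedEntity(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NameHashedEntity
                && name.equals(((NameHashedEntity) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class EntityLifecycleDemo {
    // Adds the entity before its key is set, then looks it up afterwards.
    static boolean idHashedFoundAfterKeyAssigned() {
        Set<IdHashedEntity> set = new HashSet<>();
        IdHashedEntity e = new IdHashedEntity("alice");
        set.add(e);  // stored under the hash computed with id == null
        e.id = 42L;  // simulates the primary key being assigned
        return set.contains(e);
    }

    static boolean nameHashedFoundAfterKeyAssigned() {
        Set<NameHashedEntity> set = new HashSet<>();
        NameHashedEntity e = new NameHashedEntity("alice");
        set.add(e);
        e.id = 42L;  // does not take part in equals/hashCode
        return set.contains(e);
    }

    public static void main(String[] args) {
        System.out.println(idHashedFoundAfterKeyAssigned());
        System.out.println(nameHashedFoundAfterKeyAssigned());
    }
}
```

With the key in hashCode(), the set can no longer find an element it still holds, because the hash changed after insertion. Leaving the key out keeps the lookup working, at the price of equality that ignores the database identity.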
@JosePaumard The JPA Buddy plugin suggested I use the hashCode of the class (it can be a normal entity class or a Hibernate proxy class, so a check instanceof HibernateProxy is needed), like getClass().hashCode(). What do you think of this implementation?
The problem is that branch predictors vary greatly between CPUs, so it would have been great to run all of this on different CPUs from different vendors.
With these numbers, would it be better to remove the instance equals check in the default record implementation?
Source: jdk/src/java.base/share/classes/java/lang/runtime/ObjectMethods.java:225
2:58 In this example, we iterate on an empty set and add its own elements to it, isn't this weird?
😆 Indeed it is. Maybe a this somewhere could fix that?
Keeping the instance check can also be valuable when memory is tight. If you use interning on your records, then identical records are represented by the same object (which can save memory) and then the instance check will not only succeed more often, but will also avoid loading the fields from memory, thereby reducing cache footprint.
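As a rough illustration of the interning idea in this comment, here is a sketch. PointInterner is a hypothetical helper, not a JDK facility, and the Point record is assumed for the example. It keeps one canonical instance per value, so that equal records obtained through it are the same object and this == other succeeds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed record for the example; the video uses similar small records.
record Point(int x, int y) {}

// Hypothetical interner: returns one canonical instance per distinct value.
final class PointInterner {
    private static final Map<Point, Point> CANONICAL = new ConcurrentHashMap<>();

    private PointInterner() {
    }

    static Point intern(Point p) {
        // putIfAbsent returns the previously stored instance, or null if p was just stored
        Point existing = CANONICAL.putIfAbsent(p, p);
        return existing != null ? existing : p;
    }
}

class InternDemo {
    public static void main(String[] args) {
        Point a = PointInterner.intern(new Point(1, 2));
        Point b = PointInterner.intern(new Point(1, 2));
        System.out.println(a == b); // same canonical object
    }
}
```

Note that this sketch holds strong references forever, so it only saves memory when the same values really are shared many times. Whether the identity check then pays off in equals() is something to measure rather than assume.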
I appreciate the attention to the lower-level performance considerations of Java. A lot of people seem to neglect that and just leave it to "the JIT will fix it" without actually understanding what the JIT does.
Does the number of instance fields matter for comparing these implementations?
It could, it really depends on the complexity of the states you need to compare. Especially when you compare objects that are equal. In that case, you execute all the tests before knowing that the result is true.
How do you measure performance? Was that a dependency or?
You want to use JMH for that -> github.com/openjdk/jmh
I think the hash bucket becomes a tree only if the value type implements Comparable. Otherwise you're stuck with a list.
No, it also works with keys that are not Comparable. It uses System.identityHashCode() in that case.
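A small sketch can confirm that a HashMap stays correct when many keys that are not Comparable collide in one bucket. CollidingKey is a hypothetical class written for this example. The sketch checks only the observable behavior, it does not inspect whether the bucket is internally a list or a tree.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key: not Comparable, and every instance has the same hash code.
final class CollidingKey {
    final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42; // forces all keys into the same bucket
    }
}

class CollisionDemo {
    // Inserts n colliding keys, then looks one of them up through a fresh, equal key.
    static int lookupAfterFill(int n, int id) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map.get(new CollidingKey(id));
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterFill(1000, 777));
    }
}
```

Without Comparable keys the map has no value-based ordering to guide a lookup inside such a bucket, so implementing Comparable on keys that may collide heavily is still worth considering.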
@JosePaumard thanks! I stand corrected.
Yeah, I have seen this example before, where checking if (this == o) is just slower.
My mentality is fairly simple in that regard.
Simply walk through the checks yourself in your head, and if it seems slower, then you want to compare.
Though at some point these few nanoseconds do not make a difference anymore.
If you are comparing collections, it is a good idea to have this slower check in place because it can save you milliseconds if the size is large enough or the comparison logic is just slower due to the type of implementation. (Sets are faster to compare than lists, for example.)
The problem is that branch prediction is not the only issue. this == other also messes with your GC. So for large objects and in a real application, things get more complicated, and not in favor of the instance check.
@JosePaumard how does it mess with your GC? Could you elaborate on that?
Well yeah, in most real-world applications it doesn't help, but there are scenarios where the small check pays off if the comparison of the object itself is expensive enough..
@Speiger th-cam.com/video/tWonozjIE-s/w-d-xo.html There is a more elaborate answer in a talk we did in French.
Jaava 🎉
why not just return (this == other || (other instanceof Point(int x, int y) && this.x == x && this.y == y)) there's no branching just short-circuiting
You can just do the branching here, and trust me, the compiler WILL optimize it for you, that's why you will see your code and decompiled code different, for example, turning preemptive return into nested if-else.
why not add support of operator overloading