💡 Learn how to design great software in 7 steps: arjan.codes/designguide.
I’m a big fan of the Rule of Three - you don’t know what the abstraction should look like until you have three examples, so duplicating something once is good, then refactor instead of duplicating it again.
One more case where DRY does not apply is accidental duplication. It happens when two independent tasks could, at the moment, be implemented with identical code but have different reasons to change.
That's a huge point, the principle of bounded context helps here.
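To make the accidental-duplication point concrete, here is a minimal sketch. The two functions and the 10% figure are hypothetical and not taken from the video; they only illustrate code that is identical today but owned by different parts of the business.

```python
def sales_tax(order_total: float) -> float:
    """Changes when tax law changes (hypothetical 10% rate)."""
    return round(order_total * 0.10, 2)


def loyalty_discount(order_total: float) -> float:
    """Changes when marketing adjusts the loyalty programme (hypothetical 10%)."""
    return round(order_total * 0.10, 2)


# Merging both into one `ten_percent(order_total)` helper would remove the
# textual duplication, but it would couple two rules that evolve independently:
# the day the discount becomes 15%, the shared helper has to be split again.
```

The bodies match only by coincidence, so keeping them separate is arguably the lower-coupling choice.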
Your example of the calculate_any_volume()-function reminded me of what my old professor used to call "control coupling". It was part of his lesson about "low coupling, high cohesion". I still consider that to be the basic principle of code design.
Nice video, as always, Arjan. You're certainly right that it can be hard to figure out how to extract the commonality from similar-but-not-quite-identical blocks of code. One technique I sometimes use is to slowly *introduce* code to increase the similarity of the blocks until they're true duplicates at which time extracting the duplication is straightforward. (I also like the example of trying too hard to eliminate duplication.)
At my company we started to avoid "DRY" quite a while ago, because less experienced developers took it to heart and prematurely abstracted EVERYTHING, to the point where it was not obvious whether a function was the right fit for a case or not.
We now rather follow "WET" -> write everything twice. It's fine to have some code duplication. The cases might be similar but not similar enough to be worth an extraction, and it might not even be worth the time to properly abstract it (yet). Do you come by the same code a third time? Great, now is a good time to abstract it and reuse it a bit.
As a software engineer I can argue that the "calculate_any_volume" function is not a good generalisation. It's way harder to read/understand than the simple ones. In that specific example, I'd accept a bit of duplication in exchange for fully understanding what the function is doing.
Someone in the Golang community jokingly suggested the WET principle: Write Everything Thrice to understand the patterns before removing duplication. Not a bad idea 😁
I like that! 😊
I just saw your comment. I basically wrote the same in a different comment and we do the same (even though we call it "write everything twice") and only abstract the third time it comes around. It makes things easier and helps especially with inexperienced developers who "see the possibility of reuse" in something and immediately abstract everything up 2 levels, which leads to large functions with edge cases that never happen. :D
It's a really good principle. I like WET over DRY every day now.
I have got myself into trouble removing duplicate code and doing abstractions way too early before I really knew what they should look like. Being a neat freak has its issues. Code always runs though!
Always revisiting your code is key, especially when starting out and learning a language. As I got better with functions, I was able to de-dupe hundreds of lines of code. I found the quickest and easiest deduping related to plotting/matplotlib.
As someone who is not a python novice, i appreciate the more complex example.
Nice! I hope you'll do another video about partial functions vs DRY principles: the `calculate_any_volume` of this video could be turned into the other volume functions as partials.
In real-world code, what would be some techniques, advantages and drawbacks to approaching DRY with partial functions? (But calculating a volume is laughably too contrived to illustrate the point!)
Hi Arjan, you once mentioned using pathlib more :P. I just remembered that when I watched your video. Maybe there is even more refactoring possible. Greetings from Germany.
Duplication becomes a problem around deadlines: people generally copy-paste code to make the deadline, and the focus on clean code is really, really low.
One trick that we use: after every release (generally every 3-4 weeks), the next 3 days are spent just reading the code and spotting these issues. I feel that's a good way to spot where things are complicated / repeated.
Deadlines kill creativity and lower cognitive ability. I do not like deadlines and stress during the development process.
@@radeksmola3422 I hear you, but without deadlines I don't think I would ever end up shipping a release. There will always be something that I can improve / iterate on. :)
@@radeksmola3422 But deadlines are necessary for productivity. That’s how things get done in time.
So many programmers duplicate code for loops over and over again, when those loops are already available in map, reduce, and filter.
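A small sketch of that idea in Python. The `dimensions` data is made up for illustration, and in idiomatic Python a comprehension often reads better than `map`/`filter`.

```python
from functools import reduce
from operator import mul

dimensions = [2.0, 3.0, 4.0]  # hypothetical sample data

# The hand-written accumulation loop that tends to get copy-pasted around...
volume = 1.0
for d in dimensions:
    volume *= d

# ...and the same accumulation expressed with reduce.
volume_reduced = reduce(mul, dimensions, 1.0)

# filter keeps the items that pass a predicate; map transforms each item.
positive = list(filter(lambda d: d > 0, [2.0, -1.0, 4.0]))
doubled = list(map(lambda d: d * 2, dimensions))
```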
Many devs (people in general) have a hard time telling the difference between contract and coincidence. That’s driven too many PR complaints about duplicated code, even across boundaries that need to be maintained.
But now the *validate_dimensions()* call is duplicated many times. Is it not cleaner to validate the inputs before calling the function?
A different take on this: You could also argue that making code too generic often results in a function/module/whatever that simply gets too many responsibilities. Then it either gets too complex, or too abstract, or both. So then it violates the single responsibility principle. So you could say that a function that calculates any shape has multiple responsibilities. And in case of conflicting principles, it becomes a balancing act between them.
Another issue with the last example is the evaluation time of the function, since boolean ops are the worst in terms of big O timings
What is doing the inline autocomplete here for Arjan?
Just an observation. In the book, I wrote that every piece of _knowledge_ should have a single place of expression. Code is one representation of knowledge, but not the only one. It just happens to be the easiest one to spot :)
The video starts but I don’t see the IDE giving you any hints about code duplication. Why is that?
Great concepts thanks Arjan
And when you end up duplicating code between projects it's time to start your own library ;)
Thank you so much for this video!
You’re welcome!
Love your work
Thank you, glad it’s helpful!
It's a great video, as usual, but the audio has poor quality this time!
when you remove duplication, you can make things cleaner, but if you refactor things too much you increase coupling and decrease explainability
Don't use DRY KISS and stay SOLID!
It’s pretty clear that you truly GRASP design principles!
remove this flag there..come on
I use WET KISS
You can say that again!
Removing duplication and unifying things is great when all of that stuff changes together. Otherwise, be careful
Great video!
Thanks!
I like the abstraction for the area calculation, but it is hard to tell from the function call what is happening because of the function's general name. And working it out from the parameters is not easy.
Ha ! you are explaining it ...
timestamp?
@@DrDeuteron it was an example of bad deduplication. I commented too early.
Don’t use f-strings with logging! String manipulation takes a lot of resources, and if you use an f-string, the result of that operation is often just discarded because your logger is not configured or the given log level is not enabled. If you do this in a loop, it really adds up.
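A minimal sketch of the difference, using only the standard `logging` module. The `Expensive` class and the logger name `dry_demo` are hypothetical; the class just counts how often it gets converted to a string.

```python
import logging

logger = logging.getLogger("dry_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)        # DEBUG messages are disabled


class Expensive:
    """Counts how many times it is turned into a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive value"


eager = Expensive()
lazy = Expensive()

# The f-string is evaluated before debug() is even called,
# so the formatting work happens although nothing is logged.
logger.debug(f"value: {eager}")

# With %-style arguments, logging defers formatting until a record is
# actually emitted; with DEBUG disabled, __str__ is never called.
logger.debug("value: %s", lazy)
```

After running this, `eager.calls` is 1 and `lazy.calls` is 0. Whether the saving matters depends on how costly the formatting is and how hot the code path is.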
You could remove the duplication in your example of calculating the volume / area without losing readability.
1) Create a function that calculates the volume of a shape given its bounding box dimensions and its ratio volume/volume of bounding box. You can also insert checks here.
2) Define the function to calculate the area of known shapes by calling the function above, or using functools.partial and giving them intuitive names.
This makes it also very easy to extend to new shapes.
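One way the two steps above could look, as a sketch rather than the code from the video. The names `volume_from_bounding_box` and `fill_ratio` are made up here; the ratios π/4 (cylinder) and π/6 (sphere) follow from dividing each shape's volume by the volume of its bounding box.

```python
import math
from functools import partial


def volume_from_bounding_box(
    width: float, height: float, depth: float, *, fill_ratio: float
) -> float:
    """Volume = fill_ratio * volume of the bounding box (hypothetical helper)."""
    if any(d < 0 for d in (width, height, depth)):
        raise ValueError("Dimensions must be non-negative.")
    return fill_ratio * width * height * depth


# A box fills its bounding box completely, so a partial is enough.
box_volume = partial(volume_from_bounding_box, fill_ratio=1.0)


def cylinder_volume(radius: float, height: float) -> float:
    # pi * r^2 * h divided by (2r * 2r * h) gives pi / 4.
    diameter = 2 * radius
    return volume_from_bounding_box(
        diameter, diameter, height, fill_ratio=math.pi / 4
    )


def sphere_volume(radius: float) -> float:
    # (4/3) * pi * r^3 divided by (2r)^3 gives pi / 6.
    diameter = 2 * radius
    return volume_from_bounding_box(
        diameter, diameter, diameter, fill_ratio=math.pi / 6
    )
```

The call sites keep intuitive names such as `sphere_volume(radius)`, while the validation and the multiplication live in one place. Whether that beats three one-line functions is the judgment call the video is about.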
I think the validate_dimensions function is too unclear. If it crossed my way, I wouldn't know that it raises an exception. I would have read the explicit implementation quicker. In part, this might be solved with a different name like 'raise_if_any_negative'.
And that suggests the shorter implementation
if any(x > 0 for x in dimensions):
    raise ValueError(...)
Less than zero I think. Also a decorator would fit better here IMO
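Putting the rename, the corrected comparison and the decorator idea together, a sketch could look like this. `raise_if_any_negative` is the name suggested above; `non_negative_args` and this `box_volume` are hypothetical, and the decorator assumes positional numeric arguments only.

```python
from functools import wraps


def raise_if_any_negative(*dimensions: float) -> None:
    """The name says what happens: raise ValueError on any negative value."""
    if any(x < 0 for x in dimensions):
        raise ValueError(f"Dimensions must be non-negative, got {dimensions}")


def non_negative_args(func):
    """Hypothetical decorator: validate positional arguments before the call."""

    @wraps(func)
    def wrapper(*args: float) -> float:
        raise_if_any_negative(*args)
        return func(*args)

    return wrapper


@non_negative_args
def box_volume(width: float, height: float, depth: float) -> float:
    return width * height * depth
```

The decorator removes the repeated validation call from each function body, at the cost of hiding the check one level away from the reader.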
When I started web development with Django, I would often shoot myself in the foot by trying to create mixins to avoid duplication in views or forms. I always regretted it later 😂
I definitely copy multiple instances of the same code in my projects (copy, paste, alter), which results in 'quick and dirty' solutions and of course bugs. After that I look at my code from a 'what is it doing exactly' perspective and then optimize it by removing duplicate code.
Someone said: Abstraction is DRY, the "compression of code".
I think DRY is even more important for test code to avoid duplication in the test setup logic. You should show that in another episode.
For production code, IMO a vertical slice architecture with proper BDD tests is fine. Some duplication buys the ability to throw the feature away without any coupling to consider. In big systems, coupling is the true painpoint anyways.
DRY, like many rules, should be subject to the quote from Douglas Bader "Rules are for the obedience of fools and the guidance of wise men."
Unfortunately, everyone thinks they are wise.
I most commonly encounter code duplication between large functions. The two functions do something different, but have a common set of steps between them. And there's usually one small but difficult thing that prevents easy de-duplication. For this reason, don't let a function's size get out of hand during its creation. Once it reaches a screen height, time to factor something out. Smaller functions are easier to re-use than larger ones, preventing future duplicate code from appearing.
If the functions do something different, they should be kept separate. And maybe the common steps you can abstract into a different thing.
People are way too scared of long code; having it all in the same place is okay, as long as it is understandable.
The constant camera zooming is awful. I know someone is trying to draw me in or something, but it's waaaay over used.
So, DRY, unless it breaks SRP.
The example of refactoring the calculations of area was interesting. Not only did it make the code more complicated, you inserted a for loop which makes it much slower. If you think multiplying a few things together needs to be refactored, you are the type that may overuse the DRY principle. Let me say that again...
Are you really this dense?
I am learning multi-agent AI to develop an earlier sepsis detection system using a Bayes network. I have a good understanding of database distributed systems and network communication. What are my odds?
I'm stealing DAMP! 😀
Go right ahead 😁
DRR. Don't Repeat Repository ;)
🎉
"...not what you see in production code." Wanna bet?
True - "get it working and get it out" is often more important than optimizing the code. *Tip:* overestimate your maintenance jobs. It buys you some time to reevaluate your choices. And never violate the basic architecture. It'll save you a lot of time later on.
@@HansBezemer During sprint planning, and _especially_ when assigning points, I make it clear that I'll be doing testing and cleanup as part of the ticket.
An extra hour of thoughtful refactoring and cleanup can (and likely will) save you days of debugging and fixing.
I like the joke. 😊
There is a reason it is called DRY not DRC
So there are exceptions to the guide.
But if you find yourself repeating a lot.
Either
A.) There is no solution for that language.
Or
B.) The current code is much better than the solution.
Also because that would be the Democratic Republic of the Congo
@@efovex haha true
If your "generalization" needs too many parameters - or if it requires suspect constructions like lots of references (instead of values), reevaluate your choices.
@@HansBezemer Adequately enough parameters. And who uses too many references in a construction? I guess people who want problems.
The only problem is that your examples are too complex, often involving video processing, and that kind of stuff that not everyone is familiar with. So, it makes it more difficult to grasp the design concept you want to put forward. I know it is certainly more difficult to craft simpler examples, but it will be more understandable for me, at least. I hope you will consider that as a constructive comment as I recognize and appreciate your expertise.
On the contrary, I appreciate more complex examples over easy ones for beginners.
I agree. Sometimes the details of these more complex examples do complicate the understanding of the concepts. Maybe the solution should be alternating between a simpler straightforward example and a more realistic example.
@radeksmola3422 Not being familiar with video processing doesn't imply being a beginner in programming...
@@JeanMarieGalliot I understand, but the example was processing some text files for subtitles.
I also really appreciate the slightly more complex example here, as it highlights the different kinds of duplication (especially the not immediately obvious duplication).
IMO you don't really have to know anything about video processing to understand that the duplication is found in the way that files / directories are handled :)
Complex code using libraries not everyone would use plus the scrolling jerking around... not very helpful for someone trying to learn to code.
This is a video about the pitfalls of the "DRY" mantra in complicated real-life scenarios. If you're a beginner, you can watch beginner level stuff...
WET: Write Everything Twice. can get
WETTT: Write Everything Ten Thousand Times.
Rule of Three notwithstanding.