It might not work very well. Specialized code is... specialized. Here they use the fact that they know the type of the array you are working on, so they are able to marshal that memory and then use SIMD vectorization. For the DateTime struct it is still possible, because it is composed of a long, so you can marshal a DateTime struct to an array of long values. For custom structs with several fields you will need to write your own methods, or you will need to create an array with the primitives inside. So you end up with a trade-off between memory bandwidth (copying all the values from your custom struct to a primitive array) and SIMD vectorization. For large caches (~256) it might be worth it; for smaller ones I doubt it will really make sense... It should be benchmarked, and it will be really dependent on your hardware :)
You often see lists of integers in programming examples. But has anyone ever used that in real life? I don't think I ever have - it is always a list of objects. I think your examples could use a list of objects instead, perhaps a list of persons with an age property? And then find the min/max/avg of the persons' ages... do the new optimizations work in that scenario too?
No. As you can see in the docs, only primitives are supported by the Vector class, which is used here: docs.microsoft.com/en-us/dotnet/api/system.numerics.vector-1.-ctor?f1url=%3FappId%3DDev16IDEF1%26l%3DEN-US%26k%3Dk(System.Numerics.Vector%25601.%2523ctor)%3Bk(DevLang-csharp)%26rd%3Dtrue&view=net-6.0
Well, the whole idea of vectors is built around specialized CPU operations that can perform numeric operations on multiple items at once, so you need some sort of numeric type there. But it comes down to your project's performance needs. If you are really worried about nanoseconds of traversing an array, then you shouldn't use a list of objects anyway. You would use structs at least, but wherever possible you'd actually go for primitives.
@@Otto-the-Autopilot to get a 'vectorable' enumerable, you need a materialized collection, so at least '.Select(...).ToList()'. But for a single search that would most probably be slower than the old LINQ implementation. Maybe if you're interested in getting all of min/max/sum/avg at once it might be beneficial to materialize it, but it'd have to be tested.
I knew it after seeing the first benchmark results that it uses SIMD under the hood now ;-) SIMD is the best way to optimize any code that relies on mathematical operations and their results - but it only makes sense for medium/large amounts of data. Also, you can even filter out values using Vector.ConditionalSelect(), which makes it much more versatile.
Quick question: is there a way to implement a console app to run benchmarks against your unit-of-work repository pattern data calls from your web project? I'm running into dependency issues and I'm also not sure how I would implement that in a testing console app.
I personally wouldn't use Blazor for the stuff I'm building so I don't think I will be making a video on that topic. I am more of a React/TypeScript user
I have personally noticed that using foreach over a simple for doubles the execution time of my code. If the .NET compiler could gain the ability to convert foreach into for where possible, it would be a significant performance win. On that same note, I would appreciate it if you could include this version of 'Max' in your benchmark: public int Max_Custom() { if(_items.Length
I usually deal with IReadOnlyCollection when I need to do things like this, rather than raw arrays. .NET arrays happen to implement that as well, though I'm not sure if that interface mandates an indexer right at this moment; I think it might just add a Count onto IEnumerable.
In the middle of the video I thought they had added a cast check to IList (I saw this optimization in the Count method), but that could not provide so much improvement. Now I'm interested in how they handle List modifications in the middle of evaluating Max. The old code could throw a "collection has been modified" error, but this new approach can only check the list version at the end of the evaluation.
From what I understand, it won't notice the modification. That protection lives in the Enumerator layer, but this implementation goes through ReadOnlySpan, so it accesses the memory at a pretty low level.
@@Z3rgatul I'd rather say it was always a race condition - if you actually fit a change between creating an enumerator and enumerating the last element, you get an exception. But if you make changes to the same list on another thread, you can always time them so they don't result in an exception, and you'd get either a result from before the change or after it. Now you'll always get a result from some point in time. Maybe some result possibilities change, but... List was never approved for use across multiple threads, so it's still "undefined result" territory.
If anyone is like me and wanted to find the source for these new LINQ methods, they can be found in the dotnet/runtime repo under 'tree/main/src/libraries/System.Linq/src/System/Linq'. I tried seeing if the classes could just be used as-is in earlier versions, but they seem to rely on features introduced in C# 11 (e.g. static virtual members in interfaces).
@@nickchapsas Nice. I get the feels that order matters sometimes because of obvious optimizations that happen when reordering is not necessary. Intrinsics and Vectors are craaaaaayzee. Glad to see them being used here! Thanks for the update!
Wow. But are there plans to enable less memory-heavy desugaring of the LINQ query syntax? Because currently `from ... in ... from ... in ... select` desugars into `.SelectMany(..., (.., ..) => new { ... }).SelectMany...`, which wastes memory on anonymous class instance allocations instead of using `ValueTuple` with named items.
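For reference, the two shapes the comment above contrasts can be sketched like this (my own example; with three `from` clauses the compiler uses a transparent identifier, which is where the anonymous type allocation comes from, while the hand-desugared version uses a ValueTuple instead):

```csharp
using System;
using System.Linq;

static class Desugar
{
    public static (string Query, string Desugared) Run()
    {
        var xs = new[] { 1, 2 };
        var ys = new[] { 10, 20 };
        var zs = new[] { 100 };

        // Three 'from' clauses: the first SelectMany pairs (x, y) up in a
        // compiler-generated anonymous class (a heap allocation per pair).
        var q1 = from x in xs from y in ys from z in zs select x + y + z;

        // Hand-desugared with a ValueTuple (a struct, no heap allocation):
        var q2 = xs.SelectMany(_ => ys, (x, y) => (x, y))
                   .SelectMany(_ => zs, (t, z) => t.x + t.y + z);

        return (string.Join(",", q1), string.Join(",", q2));
    }

    static void Main()
    {
        var (a, b) = Run();
        Console.WriteLine(a); // 111,121,112,122
        Console.WriteLine(b); // 111,121,112,122
    }
}
```

Both pipelines produce the same sequence; only the intermediate pair representation differs.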
@@nickchapsas hmm, interesting... I even saw GitHub issues for RC2, but I don't see any updated nugets for RC1 or RC2, and also no new docker image for RC1 or RC2. My builds are still pulling preview7, so that's what confused me.
If I'm not wrong, right now (up to .NET 6) ReadOnlySpans can't be used in async methods or tasks. Are they going to remove that limitation in .NET 7? If I do the same LINQ stuff in async methods or tasks, will it fall back to the older, slower implementation? Or will it still allow the newer version because it's "safe", since they wrote it and not us?
You can use spans in async methods, you just can't make variables for them. The code below is completely valid:
async Task<int> IndexOfSubstringInFile(string filename, string substring)
{
    string contents = await File.ReadAllTextAsync(filename);
    return contents.AsSpan().IndexOf(substring.AsSpan());
}
@@petrusion2827 this code doesn't improve performance or memory allocation; IndexOf doesn't allocate a new string anyway... You would only save memory here if you were allowed to pass the parameters to the method as spans - then the substring would be a slice - but that isn't allowed.
@@dusrdev You are hyper-focusing on the example, the specifics of which are not important. The entire point of my comment was to *show that you can use spans in async methods if you just don't create a variable for them* - not how to optimize performance or memory with them.
If I already have some Linq implemented in a .NET 6 project, and I change the target framework to .NET 7, will I see these performance increases without doing anything else?
Is this speed optimization restricted to simple types? A List of objects that has had some updates - would that still pass TryGetSpan? Or will it fall back to the slower default implementation?
I haven't watched the video, but a list of objects is just a list of managed references which are a fixed width, so I imagine these can be vectorised too.
I just looked at the source code of TryGetSpan. The current implementation requires the target to be either a T[] or a List<T>, and requires T to be a struct. But the comments indicate they may remove this constraint in the future.
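The shape of that fast path can be approximated in user code (a loose sketch, names are mine; the real TryGetSpan also checks that T is a value type, and `CollectionsMarshal.AsSpan` is the actual .NET 5+ API that the List<T> branch relies on):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

static class SpanFastPath
{
    // Hand back a span when the source's backing storage is visible
    // (array or List<T>); otherwise report failure so the caller can
    // fall through to the plain enumerator-based path.
    public static bool TryGetSpan<T>(IEnumerable<T> source, out ReadOnlySpan<T> span)
    {
        switch (source)
        {
            case T[] array:
                span = array;
                return true;
            case List<T> list:
                span = CollectionsMarshal.AsSpan(list);
                return true;
            default:
                span = default;
                return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryGetSpan(new[] { 1, 2, 3 }, out var s) ? s.Length : -1);  // 3
        Console.WriteLine(TryGetSpan(new List<int> { 4, 5 }, out s) ? s.Length : -1); // 2
        Console.WriteLine(TryGetSpan(Enumerable.Range(0, 3), out s) ? s.Length : -1); // -1
    }
}
```

Note `CollectionsMarshal.AsSpan` bypasses the list's version check, which is exactly why the "collection was modified" protection discussed above disappears on this path.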
Once they support .NET 7 then you're at the mercy of whatever CPUs they choose to run it on. I've heard Azure operates more AMD than Intel, but they'll both support the instructions. However, I would speculate that if your performance requirements were so important that you're using this optimised code; then those services may not be for you! 🙂
Thank you again for your excellent videos. Could I ask, though - to anyone - what is the difference between C# and .NET? I thought LINQ was part of C#. Thanks in advance.
LINQ's query syntax is part of C#, while the LINQ extension methods live in the .NET base class library. C# is one of the languages available that run on top of the virtual machine (the Common Language Runtime). .NET is the marketing term for the overall tech. The dotnet runtime is what enables your dotnet application to run; the dotnet SDK is what gives you the tools to work on and build your dotnet application.
I think it's fair to say that this is the premium C# channel on youtube
This video is a masterclass on how to benchmark this kind of improvement 👏
Wow!🤯
Thank you so much for the rich content,
I didn't know about the multiple targeting. This is why I love your videos: besides the specific feature you're talking about, which is awesome by itself, there are always some additional features you mention casually 😁
Came here as soon as I saw "performance boost" in the notification.
same bhai same
Very nice - I have a startup that's 99.9% LINQ, as I need to write so much in so little time (not premature optimization) - and I use Sum()/Min()/Max() a lot. The move to 6 was good and my customers always love the performance of my app - but this will just turbocharge my reports/queries - and take a load off the backend!! Whoo hoo... a perfect upgrade for me... thanks, I will do it over the Christmas 'break'...
Thank you so much for the update. I always enjoy your content and specifically the videos that include performance improvements. Keep up the good work.
Excuse me what? This is absolutely great news. I didn't even know one could somehow hardware accelerate a min function
Thanks for letting us know that it's not too bad to write functional style LINQ code 😀
The SIMD vectorization features in C# are really nice. I'm having so much fun with it 😁
It is still bad. This optimization works only on plain arrays/lists, not in the case of typical functional-style LINQ chain calls.
@@Z3rgatul I know. But you see what's possible with a bit of SIMD effort and some array pointer hacks. 45x speed-up 😂 And there's a lot more potential with other intrinsics to optimize for e.g. branchless code as well 😄
Eventually, someone will sit down and figure out what the machine actually needs to do instead of writing tons of inefficient abstractions 😅😅😅 Maybe it's finally happening 🤔
Wow, this is awesome. Thanks Nick!
Just Wow! 😯 Thanks once again for showing.
This performance is truly insane! In fact, I also read the very, very long performance post on the MS blog and my mind was blown!
I'd be interested in 1) Seeing how a foreach compares in performance to a for (int i = 0) situation and 2) seeing the performance of Linq Select and Where
Yes. That they optimized one or two built-in library functions is not as exciting as it would have been if they had somehow eliminated the enumerator in foreach loops when the type is known.
@@lucbloom Didn't read everything but they made a blog post about all that: "Performance Improvements in .NET 7". There's talk about foreach too.
It's because foreach creates its own IEnumerator, which kind of sucks 😢 If you compare with a for loop, the difference would be way bigger. And one more thing - arrays in dotnet suck in comparison with Java. And worst of all, dictionaries in dotnet are way slower than in Python.
@@lshnk Nick showed it in the video. The performance hit of foreach becomes negligible the larger the amount of data to loop over becomes.
@@cheebadigga4092 I was talking about the comparison with a simple for loop. As you remember, foreach leads to the creation of a new IEnumerator instance and a while loop with a call to the virtual method MoveNext. That is not as fast as a simple for. A for loop shouldn't produce a memory allocation and should be reasonably fast even for a loop over 100 items, I believe.
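For reference, a foreach over something only known as IEnumerable<int> lowers to roughly this (a sketch; for arrays the compiler instead emits an index-based for loop, which is why the allocation and virtual calls can disappear):

```csharp
using System;
using System.Collections.Generic;

static class ForeachLowering
{
    // What 'foreach (var x in source)' becomes when only IEnumerable<int>
    // is known: an enumerator allocation plus a virtual MoveNext/Current
    // call pair per element.
    public static int Sum(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
                sum += e.Current;
        }
        return sum;
    }

    static void Main() => Console.WriteLine(Sum(new[] { 1, 2, 3, 4 })); // 10
}
```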
BROTHER, YOU ARE THE BEST!!! You oooh really helped me!! THANK YOU VERY MUCH!
Finally. 🚀blazingly fast 🚀
Vector SIMD and aligning data is what Unity does with its Burst compiler. Although instead of recoding a common library (which is very useful), they cross-compile IL/.NET bytecode to native SIMD code using LLVM. Also blindingly fast, but you are rolling your own functions.
Thanks! I've been searching how to get it and this is brilliant :D
Mind blowing 🤯 thank for sharing
Thanks Nick for the demo. That is insane!!
Great news!
Great video as always!
WOOOOOOOO! This is amazing! :D
Excellent video
Really great video. Thanks Nick.
Very nice! That might actually prompt me to use Linq more often, though I still dislike how much it clutters autocomplete. For that kind of performance gain, though, I can definitely put up with it.
Thanks bro that was really helpful
Thank you, Nick. Good job, as always =)
I like how we've come full circle. Performance was the bottom line when we wrote in assembler back in the day. Now in 2022 performance is the fashion again!
Performance is always important, but people also understood that products have to get to market at a reasonable pace, and assembler is hard and slow to write well. Now our tools are getting efficient enough to get us back to C speeds.
I started reading the title and immediately took a guess that span or readonlyspan or vector or Memory will be used in the linq to make it faster.
The idea used to make the code faster is awesome and will now be used in every possible place. Awesome.
The link you mentioned about the Vector article by Steve is missing in the description.
Many Thanks.
I can see it there, is it still missing?
@@nickchapsas yes i see now
A very good video Nick, thank you.
It does actually print in the output at 6:58 that you are using AVX2 with .NET 7.0, which means this is using vectorization and is going to be the fastest possible on most current CPU architectures. But this is specifically for integral math with data stored in consecutive memory, such as in arrays.
Again a very interesting and informative video, thx. With the hardware acceleration (and platform-dependent) requirements needed for the performance boost to take effect, it would be interesting to know whether they are active in containerized environments or Azure Functions / AWS Lambda. Do you have any information on that?
I'd love to see a follow up video of even a comment on how good this is in containers
As others already mentioned - it will work if your CPU supports it, no matter whether it's inside or outside a container.
In the case of Azure Functions, they don't brag about what CPU they use, and from what I read you might get different equipment from time to time. They usually use some Xeon, which have supported AVX for years now, so most probably it'll work.
You can also use a premium plan, which guarantees you Dv2-series machines. I think that will guarantee AVX availability, but you'd have to check all possible hardware in their docs (I checked some random ones and it looks like they all support AVX and AVX2).
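One way to remove the guesswork on any given host (container, Functions sandbox, Lambda) is to log what the runtime actually sees at startup. A small diagnostic sketch (names are mine; these are real APIs from System.Numerics and System.Runtime.Intrinsics.X86):

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

static class SimdProbe
{
    // On x86/x64 these properties are resolved by the JIT against the actual
    // CPU the process runs on - inside a container, that is the host's CPU.
    public static string Report() =>
        $"accelerated={Vector.IsHardwareAccelerated} " +
        $"lanes={Vector<int>.Count} " +       // 8 under AVX2, 4 under SSE2
        $"avx={Avx.IsSupported} avx2={Avx2.IsSupported}";

    static void Main() => Console.WriteLine(Report());
}
```

Running this once in the target environment tells you which code path the vectorized LINQ methods will actually take there.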
Another great video. I'd love to see your approach for a fully dynamic expression tree builder.
so cool, thank you
Always an interesting video!
Very interesting Video, thanks😌
Ah, so AVX finally makes its way into standard libraries! Good call by the C# team. Java introduced the Vector API in 17 (I believe) but I'm not aware of any usages within the JDK. Vector APIs are the future for any sort of batch processing task. My dream would be the JIT being smart enough to utilize AVX on its own from "regular" code without me having to tell it to do so explicitly. But we're still quite far away from that. Standard library integration is a good first step.
100% agree, hoping to see more AVX utilisation in standard library.
Please, Explain me what is AVX ?
First of all, it is not always AVX; these methods can also use instructions from older extensions like SSE if AVX is not supported on the host. Secondly, those vector APIs were exposed back in .NET Core 3.0 and have been used in a lot of places since then. The runtime team (not the C# one, as you said) has been doing its best to vectorize everything possible for a long time now, and with each release we see more and more stuff vectorized, so this is not something new.
@@sps014 -- oops.. meant to reply to @Tural Aliyev
@@turalaliyev1764 AVX is Advanced Vector eXtensions, a set of SIMD instructions and wide registers. SIMD - Single Instruction Multiple Data.
It can do operations on pairs of 4 or 8 Int32 values at a time.
E.g. Sum: (pseudocode)
var v1 = values[0..4]
var v2 = values[4..8]
var v3 = VecAdd( v1, v2 )
var res = v3[0] + v3[1] + v3[2] + v3[3]
1 vector addition + 3 additions
vs
7 additions
And this is with 128 bit vectors. Most, if not all, current CPUs can use 256 bit vectors.
It obviously gets much better if you are summing more than 8 values.
For 1024 values you do 255 vector additions and 3 normal additions with 128b SIMD, with 256b it's 127 vector additions and 7 normal additions. Without SIMD you have to do 1023 normal additions.
Since the CPU can do 1 SIMD instruction much faster than 7-8 normal ones, it's *much* faster.
Intel CPUs (Skylake or newer) can add two 256 bit (8 Int32) vectors in 3 clocks.
Part of the reason GPUs are so fast is because they're optimized to do SIMD/SIMT operations on 512 bit vectors. The T in SIMT is for Thread. Think of it as 1 thread per pixel or vertex.
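The pairwise-add idea from the pseudocode above can be sketched in C# with Vector<int> (my own example; assumes .NET Core 3.0+, where Vector<T> picks 128- or 256-bit lanes depending on the CPU):

```csharp
using System;
using System.Numerics;

static class SimdSum
{
    // Sums an int array with one SIMD add per chunk of Vector<int>.Count
    // elements, then folds the per-lane partial sums and the scalar tail.
    public static int Sum(int[] values)
    {
        var width = Vector<int>.Count;               // 4 lanes with SSE, 8 with AVX2
        var partial = Vector<int>.Zero;
        int i = 0;
        for (; i <= values.Length - width; i += width)
            partial += new Vector<int>(values, i);   // one SIMD add per chunk

        int sum = 0;
        for (int lane = 0; lane < width; lane++)     // fold the lanes
            sum += partial[lane];
        for (; i < values.Length; i++)               // scalar tail
            sum += values[i];
        return sum;
    }

    static void Main()
    {
        var data = new int[1024];
        for (int i = 0; i < data.Length; i++) data[i] = i;
        Console.WriteLine(Sum(data)); // 523776
    }
}
```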
I was expecting this to end with ".. they implemented it in C, and now its faster". But perhaps vectorization hooks into something even deeper, like SSE?
It does. C# has supported vectorization for a long time, but they made some changes in .NET 7 to make it easier to write vectorized code that runs as fast as possible on multiple architectures. They had an example where previously it was easier to just target SSE, and targeting ARM equivalents was more tedious. But now it's supposed to be easier to target both.
Vectorization is more like a way of letting the JIT know that data is properly aligned to be able to be processed by SIMD instruction sets, like SSE or AVX. The newer instruction sets go to ever wider vector widths, allowing for more parallelization on large vectors, and larger speed ups.
In the old days you'd have to code this in C with inline assembly instructions, and test CPU instruction set bits to take different code paths for the best available SIMD (e.g. prefer AVX, then SSE3, then SSE2, etc.). A JIT can do this much smarter by doing those checks before JIT compilation and only including the 'best' SIMD code path. And in theory, this should also work for the SIMD extensions available on other architectures like ARM or RISC-V.
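That "check once, keep only the best path" dispatch can be sketched like this (my own example; the IsSupported properties are real .NET intrinsics APIs, and because the JIT treats them as constants, the untaken branches are dropped from the generated native code):

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

static class IsaReport
{
    // Each IsSupported is a JIT-time constant for the CPU being run on,
    // so this whole method folds down to a single return at runtime.
    public static string BestPath()
    {
        if (Avx2.IsSupported) return "AVX2 (256-bit)";
        if (Sse2.IsSupported) return "SSE2 (128-bit)";
        if (AdvSimd.IsSupported) return "ARM AdvSimd (128-bit)";
        return "scalar fallback";
    }

    static void Main() => Console.WriteLine(BestPath());
}
```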
Thanks for sharing. The link in the description is missing though.
Added the link, sorry
Works good, tnx
Awesome content. 👌
Thank you very much for sharing this info!
It'd be interesting to see a more complex select statement and how much better that is.
Yeah, a simple int array is not really saying much, when usually we use LINQ for large lists of deep objects with selects etc. (for example). I wonder if it is also 48x faster for enumerables of complex classes.
Bro, thx!
One can implement the "Vectoring" logic even on .NET Framework 4.8 to achieve the same result:
[Benchmark]
public int VectoringMax()
{
    var maxes = new Vector<int>(_intItems);
    var count = Vector<int>.Count;
    var index = count;
    do
    {
        maxes = Vector.Max(maxes, new Vector<int>(_intItems, index));
        index += count;
    } while (index + count <= _intItems.Length);
    var value = int.MinValue;
    for (; index < _intItems.Length; index++)
    {
        if (_intItems[index] > value)
        {
            value = _intItems[index];
        }
    }
    for (var i = 0; i < count; i++)
    {
        if (maxes[i] > value)
        {
            value = maxes[i];
        }
    }
    return value;
}
Here you have to pass an array... maybe that's a disadvantage.
4:38 foreach creates an enumerator (unless compiler magic kicks in); you need a for loop for this test with an array.
It's unbelievable bro
You should use a size of 101 instead of 100, because the compiler has optimizations for even collection sizes. The creator of BenchmarkDotNet wrote about this.
I really want to see how Vector class works. Please, if you can do a tutorial, i would be very thankful.
oh that will be very useful
Very nice improvements.
Hopefully next round they manage to get the compiler to optimize away the lambda call per item in a Select call and turn the foreach into a for loop.
As most code is just selecting over lists and arrays, the compiler might generate smarter code.
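Until the compiler does that, the fusion can be done by hand; the wished-for transformation is roughly this (a sketch, names are mine):

```csharp
using System;
using System.Linq;

static class SelectFusion
{
    // What we write: a delegate invocation per element via the Select iterator.
    public static int SumOfSquares(int[] items) => items.Select(x => x * x).Sum();

    // What a smarter compiler could emit: one indexed loop, lambda inlined,
    // no iterator object and no delegate calls.
    public static int SumOfSquaresFused(int[] items)
    {
        var sum = 0;
        for (int i = 0; i < items.Length; i++)
            sum += items[i] * items[i];
        return sum;
    }

    static void Main()
    {
        var data = new[] { 1, 2, 3, 4 };
        Console.WriteLine($"{SumOfSquares(data)} {SumOfSquaresFused(data)}"); // 30 30
    }
}
```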
The main disadvantage of LINQ is when you need both Max and Min - with LINQ you get two complete iterations over the enumerable, while with your own code it can be done in one.
What do you expect as a result of your operations?
@@nanvlad a tuple.
@@nanvlad
int min = arr[0], max = arr[0];
for (int i = 1; i < arr.Length; i++)
{
    if (arr[i] > max)
        max = arr[i];
    if (arr[i] < min)
        min = arr[i];
}
So, instead of two passes, you have one.
And if you also need an average and a sum as well - you reduce 4 complete iterations over array to 1. Which is quite significant.
@@zakirsobirov6926 Tuple as a response is not a good choice, because we need to know the exact positions of result items. For example, If I call source.Max() I get a single number which is obviously max value. But what if I call source.MaxMinAvgSum()? I receive a single result with a tuple type and then I have to know that Item1 is Max, Item2 is min and so on. Even if you give them names it doesn't make it easier. And then what if I want to mix calls, e.g. source.MaxSumAvg() or source.AvgMinMaxSum()? That makes method api unreadable
@@nanvlad Tuple elements can have names in C#.
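A small sketch of how that single-pass combined operation could look with named tuple elements, so the call site reads result.Min / result.Max rather than Item1 / Item2 (MinMax is a hypothetical method name, not part of the BCL):

```csharp
using System;

// Hypothetical single-pass helper returning a tuple with named elements.
static (int Min, int Max) MinMax(int[] source)
{
    if (source.Length == 0)
        throw new InvalidOperationException("Sequence contains no elements");

    int min = source[0], max = source[0];
    for (var i = 1; i < source.Length; i++)
    {
        if (source[i] < min) min = source[i];
        if (source[i] > max) max = source[i];
    }
    return (min, max); // element names come from the declared return type
}

var result = MinMax(new[] { 3, -1, 7, 4 });
Console.WriteLine($"{result.Min}, {result.Max}"); // -1, 7
```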
Please do a video about Span
It's just awesome
I'd be curious to see these benchmark results on various platforms, especially iOS and Android, if the results are similar there then I can't wait for Unity to adopt .NET 7. Although Unity also converts mobile to C++ code, so I wonder if those gains would hold up after that translation.
Thanks, Nick. Vectors - one more topic for more profound research. 👍
Madness!
Insane in the membrane, insane in the brain!
I was about to nail you to the wall for using foreach instead of for, but then I remembered that for arrays, the compiler already turns foreach on arrays into for loops. (ICollection in general I think?)
So nevermind, great video. I'm going to use this as hard evidence on why switching to 7 is a good idea. (Along with the "don't just LTS" video)
What's wrong with foreach?
The compiler turns it into 'for' loops when it knows that it's an array at compile time.
Not here though, since we have an IEnumerable<int>
@@CantonGregory if the compiler is able to convert it to a for loop, nothing, but since the benchmark showed that the Max_Own allocated 32 bytes, it didn't in that case.
A plain for loop over an ICollection is much faster than a foreach over an IEnumerable, because the foreach compiles down to:
var en = arr.GetEnumerator();
while (en.MoveNext())
{
    var curr = en.Current;
}
Nice. I had the idea to vectorize summing once. Damnit, I could have been the first.
Insaneeeee!!!!
Is this only for the integer based linq operation? or are all lookups faster?
Only for numeric primitive types (not sure if it's all of them - I'd need to check whether decimal counts too)
Very cool. Also has anybody noticed that the vectorized Average was twice as fast as the vector sum? I need to figure out how…
Yeah I figured Average would be Sum divided by Count meaning it has to be a tiny bit slower than Sum
Yeah, I came here to say this. It's unexpected.
If lists had a property that indicated whether the list was sorted and how, library functions could return median, min and max pretty much instantly.
The moment you went to analyze the source code of C# itself was the moment I noped the fuck out
Awesome.
Interesting how it works with other structs, e.g. DateTime or custom-ones
It might not work very well. Specialized code is ... specialized.
Here they use the fact that they know the type of the array you're working on, so they can reinterpret that memory and then use SIMD vectorization.
For the DateTime struct it's still possible because it's backed by a single long, so you can reinterpret a DateTime as a long value.
For custom structs with several fields you'd need to write your own methods, or create an array of the primitives inside. So you end up with a trade-off between memory bandwidth (copying all the values from your custom struct into a primitive array) and SIMD vectorization. For larger collections (~256) it might be worth it; for smaller ones I doubt it will really make sense... It should be benchmarked, and it will depend heavily on your hardware :)
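To make that trade-off concrete, here's a hedged sketch (MaxDate is a hypothetical helper, not BCL code) that vectorizes Max over DateTime by paying for a copy of the Ticks values first:

```csharp
using System;
using System.Numerics;

// Hedged sketch: pay the copy (memory bandwidth) to get SIMD lanes.
static DateTime MaxDate(DateTime[] dates)
{
    if (dates.Length == 0)
        throw new InvalidOperationException("Sequence contains no elements");

    // Copy the backing long (Ticks) of each DateTime into a primitive array.
    var ticks = new long[dates.Length];
    for (var i = 0; i < dates.Length; i++) ticks[i] = dates[i].Ticks;

    var width = Vector<long>.Count;
    var max = long.MinValue;
    var index = 0;
    if (ticks.Length >= width)
    {
        var maxes = new Vector<long>(ticks);
        for (index = width; index + width <= ticks.Length; index += width)
        {
            maxes = Vector.Max(maxes, new Vector<long>(ticks, index));
        }
        for (var lane = 0; lane < width; lane++) // reduce the lanes
        {
            if (maxes[lane] > max) max = maxes[lane];
        }
    }
    for (; index < ticks.Length; index++) // scalar tail
    {
        if (ticks[index] > max) max = ticks[index];
    }
    return new DateTime(max);
}
```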
You often see lists of integers in programming examples. But has anyone ever used that in real life? I don't think I ever have - it's always a list of objects. I think your examples could use a list of objects instead, perhaps a list of persons with an age property? And then find the min/max/avg of the persons' ages... do the new optimizations work in that scenario too?
Interesting question. Worst case: use a .Select(...) first.
I guess: It doesn't.
No. As you can see in the Doc, only primitives are supported for the Vector Class, which is used here.
docs.microsoft.com/en-us/dotnet/api/system.numerics.vector-1.-ctor?f1url=%3FappId%3DDev16IDEF1%26l%3DEN-US%26k%3Dk(System.Numerics.Vector%25601.%2523ctor)%3Bk(DevLang-csharp)%26rd%3Dtrue&view=net-6.0
Well, the whole idea of vectors is built around specialized CPU operations that can perform numeric operations on multiple items at once. So you need to have some sort of numeric type there.
But it depends on the project's performance needs. If you're really worried about nanoseconds of traversing an array, then you shouldn't use a list of objects anyway. You'd use structs at least, but whenever possible you'd actually go for primitives.
@@Otto-the-Autopilot to get a 'vectorable' enumerable, you need a materialized collection, so at least '.Select(...).ToList()'. But for a single search it would most probably be slower than the old LINQ implementation
Maybe if you're interested in getting all min/max/sum/avg at once, it might be beneficial to materialize it, but it'd have to be tested
I knew it, after seeing the first benchmarks results that it uses SIMD under the hood now ;-)
SIMD is the best way to optimize any code that relies on mathematical operations and their results - but it only makes sense for medium/large amounts of data. You can even filter out values using Vector.ConditionalSelect(), which makes it much more versatile.
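As an illustration of Vector.ConditionalSelect (my own example, not from the video): build a per-lane mask, then pick between two vectors lane by lane. Here every negative lane is replaced with zero:

```csharp
using System;
using System.Numerics;

var data = new[] { 3, -1, 4, -5, 9, -2, 6, -7 };
var result = new int[data.Length];
var width = Vector<int>.Count;
var i = 0;
for (; i + width <= data.Length; i += width)
{
    var v = new Vector<int>(data, i);
    var mask = Vector.GreaterThanOrEqual(v, Vector<int>.Zero); // all-ones lanes where v >= 0
    Vector.ConditionalSelect(mask, v, Vector<int>.Zero).CopyTo(result, i);
}
for (; i < data.Length; i++) // scalar tail for leftover elements
{
    result[i] = data[i] >= 0 ? data[i] : 0;
}
Console.WriteLine(string.Join(", ", result)); // 3, 0, 4, 0, 9, 0, 6, 0
```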
What is the link to the Vector article?
Sorry, added
Would be nice if there was some kind of package that implemented this hardware acceleration into older versions of .Net.
Great.
Is there any download size improvement for Blazor WASM with .NET 7?
Nice, they implemented the LinqFaster way
Wow😵
Quick question: is there a way to set up a console app to run benchmarks against the unit-of-work/repository-pattern data calls from your web project? I'm running into dependency issues, and I'm also not sure how I would implement that in a test console app.
what is your point of view about Blazor WebAssembly and Blazor Server? can you please create a video on this topic?
I personally wouldn't use Blazor for the stuff I'm building so I don't think I will be making a video on that topic. I am more of a React/TypeScript user
I don't see the link to the post about vectors
So where's the blog you said you'll put in the description?
It's there now
Sorry, added it
I have personally noticed that using foreach over a simple for doubles the execution time of my code.
If the .NET compiler could gain the ability to convert foreach into for where possible, it would be a significant performance win.
On that same note, I would appreciate it if you could include this version of 'Max' in your benchmark:
public int Max_Custom()
{
    if (_items.Length == 0)
        throw new InvalidOperationException("Sequence contains no elements");
    var max = _items[0];
    for (var i = 1; i < _items.Length; i++)
    {
        if (_items[i] > max)
            max = _items[i];
    }
    return max;
}
This won't compile since _items is an IEnumerable and you're treating it as an array
@@nickchapsas ah, true, I was blinded by you instantiating it with .ToArray() and ToList().
@@nickchapsas true, but you could attempt to cast it to an IList; if that fails, fall back to the foreach.
@@ever-modern it's technically both; int[] implements IEnumerable<int>
I usually deal with IReadOnlyCollection when I need to do things like this, rather than raw arrays. .NET arrays happen to implement that as well, though I'm not sure if that interface mandates an indexer right at this moment; I think it might just add a Count onto IEnumerable.
How can you run C# without a Main method and without defining a class? And what editor is that?
Since C# 9 you don't need a Main method (top-level statements). Also, the editor is called JetBrains Rider.
In the middle of the video I thought they'd added a cast check to IList (I saw this optimization in the Count method), but that couldn't provide so much improvement.
Now I'm interested in how they handle List modifications in the middle of evaluating Max. The old code could throw a "collection has been modified" error, but this new approach could only check the list version at the end of the evaluation.
From what I understand, it won't notice the modification. This protection lives in the Enumerator layer, but this implementation goes through ReadOnlySpan, so it accesses memory at a pretty low level.
@@qj0n if it won't notice the modification, then this is a breaking change; some crazy code may rely on those exceptions
@@Z3rgatul I'd rather say it was always a race condition - if you actually fit a change between creating an enumerator and enumerating the last element, you get an exception. But if you make changes to the same list on another thread, you can always time them so that no exception results, and you'd get either a result from before the change or after it.
Now you'll always get a result from some point in time. Maybe some result possibilities change, but... lists were never meant to be used from multiple threads, so it's still "undefined result" territory.
@@Z3rgatul Then let it be a breaking change. Code like that should be actively discouraged.
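For reference, a minimal repro of the exception being discussed - List<T>'s enumerator snapshots an internal version number and every MoveNext checks it, so mutating mid-foreach throws:

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };
var threw = false;
try
{
    foreach (var item in list)
    {
        list.Add(item); // bumps the list's version, invalidating the enumerator
    }
}
catch (InvalidOperationException)
{
    threw = true; // "Collection was modified; enumeration operation may not execute."
}
Console.WriteLine(threw); // True
```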
If anyone is like me and wanted to find the source for these new Linq methods, they can be found in the DotNet Runtime repo under 'tree/main/src/libraries/System.Linq/src/System/Linq'
I tried seeing if the classes could just be used as-is in earlier versions, but they seem to rely on features introduced in C# 11 (e.g. static virtual members in interfaces).
Oh its there now.
Yeah sorry I forgot to save the edit
Which editor are you using right now?
Curious to see benchmark results for AOT-compiled apps.
Great video
Wouldn't shuffling the item array potentially have an effect on the speed of the results?
There isn't a noticeable difference. Here are the results: gist.github.com/Elfocrash/dd33ad39d2e1e7a5a7ae056500a442ab
@@nickchapsas Nice. I get the feeling that order matters sometimes because of obvious optimizations that happen when reordering isn't necessary. Intrinsics and Vectors are craaaaaayzee. Glad to see them being used here! Thanks for the update!
YES!!! But how did it not consume any memory???
It used readonly spans and the return value is a value type so it’s stack allocated
Ok... no heap allocation... but consumed some stack memory ;-)
please add link to this vector thing to description :)
It's there
Wow. But are there plans to enable less memory-heavy desugaring of LINQ query syntax? Currently a query like `from .. in .. from .. in .. select ..` desugars into `.SelectMany(..., (.., ..) => new { ... }).SelectMany...`, which wastes memory on anonymous-class allocations instead of using `ValueTuple` with named items.
I thought it was a hoax, but everything works!
Does it apply to other Linq methods as well?
The demoed methods are the only ones affected for now
Where do you get the RC1 version of .NET 7? The latest version available to me is preview 7.
From the dotnet/installers repository
@@nickchapsas Hmm, interesting... I even saw GitHub issues mentioning RC2, but I don't see any updated NuGets for RC1 or RC2, and no new Docker image for RC1 or RC2 either. My builds are still pulling preview 7, so that's what confused me.
@@syzuna_ There is even a .NET 8 alpha SDK that is downloadable
If I'm not wrong, up to .NET 6 ReadOnlySpans could not be used in async methods or tasks. Are they going to remove that limitation in .NET 7? If I do the same LINQ stuff in async methods or tasks, will it fall back to the older, slower implementation? Or will it still allow the newer version because it's "safe", since they wrote it and not us?
Spans cannot be used directly in async methods, but there's no limitation on non-async methods that async code calls into.
You can use spans in async methods, you just can't put them in local variables. The code below is completely valid:
async Task<int> IndexOfSubstringInFile(string filename, string substring)
{
    string contents = await File.ReadAllTextAsync(filename);
    return contents.AsSpan().IndexOf(substring.AsSpan());
}
@@petrusion2827 this code doesn't improve performance or memory allocation, indexOf doesn't allocate a new string anyway... You will only save memory here if you were allowed to pass the parameters to the method a spans, then the substring would be a slice, but this isn't allowed.
@@dusrdev You are hyper-focusing on the example, the specifics of which are not important. The entire point of my comment was to *show that you can use spans in async methods if you just don't create a variable for them* - not how to optimize performance or memory with them.
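The usual workaround for the span-local limitation, sketched here with illustrative names: keep the span work inside a synchronous helper and call it from the async method.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Spans are fine in a fully synchronous method, including locals.
static int CountChar(ReadOnlySpan<char> text, char target)
{
    var count = 0;
    foreach (var c in text)
    {
        if (c == target) count++;
    }
    return count;
}

// The async method only holds the string; the span lives in the helper.
static async Task<int> CountCharInFileAsync(string filename, char target)
{
    var contents = await File.ReadAllTextAsync(filename);
    return CountChar(contents, target); // implicit string -> ReadOnlySpan<char>
}
```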
If I already have some Linq implemented in a .NET 6 project, and I change the target framework to .NET 7, will I see these performance increases without doing anything else?
If you’re using any of those methods then yeah
So... Would this be ~160x faster if you performed min/max/sum/average on an array of 100 bytes instead of ints?
Depends on the hardware implementation, but yeah in theory it would be 4x as fast with bytes
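The 4x figure comes straight from the lane counts: Vector<T> has a fixed hardware width (e.g. 256 bits with AVX2), so it holds four times as many byte lanes as int lanes on any machine. A quick check (the exact counts depend on the hardware):

```csharp
using System;
using System.Numerics;

Console.WriteLine(Vector<int>.Count);   // e.g. 8 on a 256-bit machine
Console.WriteLine(Vector<byte>.Count);  // e.g. 32 on the same machine
Console.WriteLine(Vector<byte>.Count == 4 * Vector<int>.Count); // True
```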
Back in '99/'00 there was something called Vision DSP or DST or something, and it didn't quite work the way this does, but this video helped a lot.
Maths is nuts in .Net 7 and C# 11
Is this speed optimization restricted to simple types? Would a List of objects that has had some updates still pass TryGetSpan, or will it fall back to the slower default implementation?
I haven't watched the video, but a list of objects is just a list of managed references which are a fixed width, so I imagine these can be vectorised too.
I just looked at the source code of TryGetSpan. The current implementation requires the target to be either a T[] or a List<T>, and requires T to be a struct. But the comments indicate they may remove this constraint in the future.
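A hedged sketch of that shape of check (illustrative, not the runtime's actual code; CollectionsMarshal.AsSpan requires .NET 5+): pattern-match the enumerable to a T[] or List<T> and expose the backing memory as a span.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

static bool TryGetSpan(IEnumerable<int> source, out ReadOnlySpan<int> span)
{
    switch (source)
    {
        case int[] array:
            span = array;                           // arrays convert implicitly
            return true;
        case List<int> list:
            span = CollectionsMarshal.AsSpan(list); // view over the list's backing array
            return true;
        default:
            span = default;                         // lazy sequences have no contiguous memory
            return false;
    }
}

Console.WriteLine(TryGetSpan(new[] { 1, 2, 3 }, out var s1) && s1.Length == 3); // True
Console.WriteLine(TryGetSpan(Enumerable.Range(1, 3), out _));                   // False
```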
But do you have hardware acceleration in Azure app Service or Functions?
Give it a go, see what the jit says
Once they support .NET 7 then you're at the mercy of whatever CPUs they choose to run it on. I've heard Azure operates more AMD than Intel, but they'll both support the instructions.
However, I would speculate that if your performance requirements were so important that you're using this optimised code; then those services may not be for you! 🙂
So is this a Linq wide performance boost, not just focused on these Linq methods?
It is focused on these methods, for now
Thank you again for your excellent videos. Could I ask, though - to anyone - what is the difference between C# and .NET? I thought LINQ was part of C#. Thanks in advance.
The LINQ query syntax is part of C#; the LINQ methods themselves live in the .NET base libraries.
C# is one of the languages that runs on top of the virtual machine (the Common Language Runtime).
.NET is the umbrella term for the overall tech.
The .NET runtime is what enables your .NET application to run.
The .NET SDK is what gives you the tools to work on and build your .NET application.