You can see the joy on his face when he starts explaining the overlap in wait time when calling refresh.
The CSAIL talk on evmap was how I originally found your channel and decided to buckle down and learn rust even if it took some time and borrow checker arguing.
Yep, same here. :)
Now I have a new favourite language and the greatest yet intermediate learning content.
I'm just here to like the video and bookmark so one day when I'm hopefully smart enough to understand Jon I can come back to this and be like "oh wow that makes sense"
This is super cool! I sorta feel like “publish” could be a more suitable name for “refresh” 🤔
Edit: oops here I am bikeshedding
Publish is a pretty good name, it's true! I like it. Could you make a comment on the PR (linked in the video description) and I'll update it when I make the next batch of changes?
Bikeshedding, the classic Rust way :)
@@megumin4625 I think it should be "bike-shedding", as they are two distinct words. Sure, joining them makes sense as they describe one phenomenon, but that's why the hyphen is more appropriate: it shows the reader that it's one word made up of two different words. That will definitely make it more understandable, as I've encountered the English language before and therefore have some insights into this. I'm quite confident in my opinion, and some day I'll look up what that word means.
@@naxaes7889 nice
@1:25:04 totally muted it to hear the sirens on the street outside. hah. Love these streams! This problem has a nice density, spiciness.
Hey, the announcement you were talking about at 45:10 was in 2015, and the full name of the paper is "Left-Right: A Concurrency Control Technique with Wait-Free Population Oblivious Reads"
thanks for sharing the paper name 😄
The notable difference between these and the Cambridge videos is all the sirens. The song of 'Murica.
Very cool. However, there's a minor terminology problem. "Eventually consistent" means the readers can see updates in the middle of a transaction. That's what you get if the writer switches maps on every write. If you want to debit Mary $10 and credit Bill $10 and you send it off to the readers in between, that's eventually consistent. If you make both changes before you send the updates off to the readers, then you have isolated transactions, which is the ACID property that "eventually consistent" violates. If you just push updates on every change, then you have no transactional updates and it's just "most up to date," not "consistent." :-)
However, it is a very cool concurrency principle. I learned a lot.
A rope might be a good candidate for an alternative type that could use this as a concurrency primitive. (One might consider it a collection of bytes though.)
Oh, yeah, that'd be cool!
For the discussion at 34:00 on different ways for the 'Writer' to determine that all 'Readers' have moved on to the other map clone: Could you not simply store the address of the map they are reading from or zero if they are not reading?
This should work because:
1. All readers that are not reading store null; they will get the new pointer when they start reading, so they can be ignored.
2. All readers that have the new pointer can be ignored.
3. All readers that have the old pointer must be waited on (so filter and goto 1).
This should be similarly efficient because we're just updating a single usize. Since you are updating a single u64 right now, that shouldn't hurt.
Alternatively, if leaking the pointer is scary for any reason or you want to use a smaller storage option than usize then just annotate each map copy with a number, so map 1 and map 2, then store 1 or 2 or 0.
Really it's just a tristate: not reading, reading map 1, reading map 2. If the readers also know which map is the 'newer' map, they could even optimise this to a boolean: 'reading the old map' versus 'reading the new map or not reading'. If that was done you'd basically be trading the increment for some comparisons. The tristate option seems more intuitive to me than the counter or the boolean optimization.
Nevermind, at 35:20 someone suggests this idea lol.
My idea of resetting to 0 at the end of a read does address the race condition discussed later though.
Take a look at the other comment thread on the video with 9 comments (YouTube won't let me link to it, it seems). Basically, the challenge is that there now exists a race, since readers must learn which pointer value to write. An integer tri-state with pre-distributed pointers might work, though I'm not sure how much it really simplifies the design in practice. It might save you from eventual overflow on 32-bit platforms though.
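For concreteness, the tri-state idea from this thread could be sketched roughly like this (all names here are illustrative, not evmap's actual API):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 = not reading, 1 = reading map 1, 2 = reading map 2.
// Each reader owns one slot; the writer scans all slots.
struct ReaderSlot(AtomicUsize);

impl ReaderSlot {
    fn start_read(&self, map_id: usize) {
        self.0.store(map_id, Ordering::Release);
    }
    fn finish_read(&self) {
        self.0.store(0, Ordering::Release);
    }
}

// Writer side: after swapping, check whether any reader still claims
// to be in the *old* map. (This sketch ignores the race discussed
// elsewhere in the comments: a reader must publish its slot value
// before it loads the map pointer for this to be sound.)
fn old_map_quiescent(slots: &[ReaderSlot], old_map_id: usize) -> bool {
    slots
        .iter()
        .all(|s| s.0.load(Ordering::Acquire) != old_map_id)
}
```

The writer would call `old_map_quiescent` in a loop (filtering, as step 3 above says) until it returns true.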
If Absorb had its operation as an associated type, then the impl for HashMap would have to be in the left-right crate due to orphan rules. You wouldn't be able to impl Absorb for HashMap with a custom operation type in the evmap crate, because even though the operation type is local to the crate, it does not help to uniquely identify the impl - it is not part of the signature. Both Absorb and HashMap in `impl Absorb for HashMap` are foreign relative to evmap, which would violate orphan rules.
Yup, exactly. That's what I was getting at (but perhaps somewhat unhelpfully) when talking about how this affects coherence.
Can you not derive a newtype with its traits and methods, and impl Absorb for that?
@@solenskinerable You could, though newtypes in Rust are still relatively painful. There isn't a nice general way to forward traits, so you end up writing a lot of forwarding boilerplate that just bloats your code unnecessarily. There is a proc macro crate that takes care of that, I believe, but I haven't looked into it too much.
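To make the coherence point above concrete, here is a sketch of why a generic operation parameter (the approach left-right takes) lets a downstream crate pick its own operation type, where an associated type would not. Trait and type names are simplified stand-ins, not the real left-right API:

```rust
use std::collections::HashMap;

// With the operation as a generic parameter, a downstream crate can
// write `impl Absorb<LocalOp> for HashMap<..>`, because the local
// operation type appears in the impl header. With an associated type
// instead, the impl header would mention only foreign items (the trait
// and HashMap), which the orphan rules forbid.
trait Absorb<O> {
    fn absorb(&mut self, op: O);
}

// A "local" operation type, as evmap would define one.
enum MapOp {
    Insert(String, i32),
    Remove(String),
}

impl Absorb<MapOp> for HashMap<String, i32> {
    fn absorb(&mut self, op: MapOp) {
        match op {
            MapOp::Insert(k, v) => {
                self.insert(k, v);
            }
            MapOp::Remove(k) => {
                self.remove(&k);
            }
        }
    }
}
```

(In this single-file sketch the trait is local too, so everything compiles either way; the orphan-rule failure only appears when `Absorb` and `HashMap` both come from other crates.)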
The Vim window background colour keeps flashing between black and dark gray. I particularly noticed it during the hasher discussion at 1:45:00. Don't know whether that's your Vim window or the video, though.
It's the video encoding. I'm not quite sure why it happened, because it hasn't been a thing in the past. Hopefully I'll have it fixed by next time!
Am I right that this is CQRS? The segregation, the event log, the possibility to replay or rebuild the map from the log... it sounds familiar.
So to forward the trait, can you not write the following?
impl<O, D> Absorb<O> for D
where
    D: std::ops::DerefMut,
    D::Target: Absorb<O>,
{
    // add all the forwarding methods once
}
Ahhh, you answered this
What drawing app is that? I can't find a good whiteboard app for Linux.
About the collection-specific-ness of LeftRight: it solves a shared memory concurrency challenge. Usually when you want to share data across threads over a prolonged period of time (as opposed to e.g. sending a message onto an agent's queue) you want to share more than a single object, i.e. you want to share a collection.
In 99% of the remaining 1% of cases, you probably just want an atomic number or pointer, for which LeftRight is probably overkill.
TL;DR: my gut feeling is that LeftRight is not a collection-specific solution, but it is a solution to an almost-but-only-almost-exclusively-collection-specific problem.
Thank you for the great content! It was a really good stream.
I found a small problem with the recording (for example 3:07:20 - 3:07:25): Vim's background color changes from dark to almost-dark. Is that some specific configuration of yours, or is the problem in the stream recording software?
Thank you! Yes, I'm aware of it - it seems to be something weird about the video encoder I was using. I'm hoping to have it fixed by the next stream :)
6 hours :) I need to overclock my mind
Awesome, make more. Thank you!
Could we use bits to represent whether each reader is busy or not, maybe 1 for busy and 0 for free (or 2 bits per reader if we need more states)? Then the writer only needs to compare against 0 across all the readers' bits to check that no readers are in the working state (with a bit mask or something, I'm no expert). Maybe the bit operations would be faster than looping over all the readers and checking that each counter is even. Just my 2 cents while watching at 39:34.
Unfortunately that doesn't work very well, because you'd then end up with all the threads trying to update the one shared value, which is going to lead to a performance collapse as the number of threads goes up. It would speed things up for the writer, but at the cost of significantly worse performance for many readers :)
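For concreteness, the shared-bitmask scheme would look roughly like this (illustrative names only). It is functionally correct, but the contention problem is that every reader's read-modify-write hits the same atomic, and therefore the same cache line:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One shared word holding a busy bit per reader (max 64 readers).
// Every enter/exit is a read-modify-write on the *same* atomic, so
// all reader threads bounce one cache line between cores; per-reader
// counters in separate cache lines avoid that.
struct BusyBits(AtomicU64);

impl BusyBits {
    fn enter(&self, reader_id: u32) {
        self.0.fetch_or(1u64 << reader_id, Ordering::AcqRel);
    }
    fn exit(&self, reader_id: u32) {
        self.0.fetch_and(!(1u64 << reader_id), Ordering::AcqRel);
    }
    // The writer's check is indeed a single comparison against zero.
    fn no_readers(&self) -> bool {
        self.0.load(Ordering::Acquire) == 0
    }
}
```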
Hat off to this guy!
WOOO
Thank you!
Can you maybe publish your dotfiles/vim setup?
I actually have a video up already on my desktop setup, and my dotfiles are public at github.com/jonhoo/configs
Is the problem described at 36:30 a real problem?
I imagine the scenario as follows:
1. reader R reads the shared atomic pointer to the new map
2. writer W reads R's state (atomic pointer) and sees an _old_ pointer, thinking that R is still blocking it
3. R stores the pointer it just read into its state (atomic pointer) and begins to read the new map
4. W re-reads R's atomic pointer and sees the new pointer; now it can write to the old map (in the case of R being the only reader; otherwise it needs to wait on the others too)
Am I missing something?
Yes and no. The issue you have described is less of a problem, though I'm sure there is a problem we are overlooking.
The real issue comes from starvation, since the writer never knows if you are finished. Say you do a read and then sleep or go work on something else, never reading in the new pointer.
But that issue is solved by setting your pointer to null post-read. It's basically the same thing we do now when we increment post-read.
So, the problem is actually the gap that now exists between when you read the pointer you're about to use, and when you write that into the slot that the writer checks. During that gap, the writer will think that you're _not_ in the data structure (since the value was null), and will assume that it's fine to write. But, you could have read the pointer before the writer's swap, and then just been descheduled or something so that the writer had time to see you still being null, but then you go on to use that old pointer.
You _need_ to update the slot strictly before you read the pointer. You could probably fix this by writing a known dummy value before you read the pointer, then write the pointer, and then write null when you're done, but that's becoming pretty inefficient.
I think I'd prefer to go the route of handling epoch wrap-around instead, especially since it seems like it's just a matter of using equality instead of greater than!
@@jonhoo I believe that's the best option: the chances are very low that a reader has time to do a complete round trip on the epoch count, so that the writer reads the same value again and gets tricked into thinking it hadn't moved on, and even lower that this happens multiple times.
@@jonhoo I think I understand. Is this more or less correct?
We would need a 3rd state, "Pending". So in order to use pointers we'd need 3 states: the pointer itself, not accessing the map, and some kind of pending.
This would cost an extra write (pending) that we don't need with the epoch count. It could offer a space improvement though, since we don't need to store the old epoch values for each reader, I think? (You may have talked about that; I didn't get to finish the stream. I'll go take a look, so this may be edited with an answer.)
The other issue with pointers as a space optimization is that a reader could be pending and then get put to sleep. The writer has no idea whether it read the new or the old pointer, or whether it even read at all.
Whereas with the counter we just need odd and even: even tells us the reader is "idle", while odd tells us it has read the pointer and is in the map. We don't care what state it is in as long as we see it increment.
Yes, that seems about right!
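A rough sketch of the odd/even epoch protocol this thread converges on, with the writer comparing against its snapshot by equality (so counter wrap-around is harmless, per the "equality instead of greater than" point above). Names are mine, not evmap's internals:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Each reader owns one epoch counter: even = outside the map,
// odd = currently reading. One increment on entry, one on exit.
struct Epoch(AtomicUsize);

impl Epoch {
    fn enter(&self) {
        self.0.fetch_add(1, Ordering::AcqRel); // counter becomes odd
    }
    fn leave(&self) {
        self.0.fetch_add(1, Ordering::AcqRel); // counter even again
    }
}

// Writer: after swapping the maps, snapshot every epoch, then wait
// only for readers whose snapshot was odd (mid-read at swap time).
// Checking `!=` against the snapshot, rather than `>`, means a
// wrapped-around counter still registers as "moved on".
fn wait_for_readers(epochs: &[Epoch]) {
    let snapshot: Vec<usize> = epochs
        .iter()
        .map(|e| e.0.load(Ordering::Acquire))
        .collect();
    for (e, &seen) in epochs.iter().zip(&snapshot) {
        if seen % 2 == 0 {
            continue; // reader wasn't in the map when we swapped
        }
        while e.0.load(Ordering::Acquire) == seen {
            std::hint::spin_loop();
        }
    }
}
```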
How is Noria going?
I don't think he's still working on it since he finished his PhD.
I think he mentioned he might return to it at some point in the future (in the PhD defense talks? maybe?), but is looking forward to pursuing some other things around the ecosystem for a bit.
Don't tell me I'm the only one who clicked biobreak 3:01:01
Can you please tell me how to get your colorscheme? I know it's a version of gruvbox, but how do I get it?
I get it through base16. It's specifically the version called gruvbox-dark-hard
I wish I could learn Rust ... there are no good tutorials in this area ... just an incomplete book.