Thank you for this presentation, Rishi. It was very informative. I've been messing with bit-packing types since the start of the summer and you gave me some ideas!
Glad you found the presentation helpful for your project!
Around 19:00, Datum "May allocate memory on heap", yet it is also "Bitwise copyable" and has a "Trivial destructor". How can all of this be true?
Datum has a trivial constructor and destructor, but it provides static methods that create Datum objects holding different types of values. It also has a "destroy" method that deallocates any memory that may have been allocated while creating a Datum holding a type that requires heap allocation. When you copy a Datum, only the bits are copied, so both Datum values point to the same heap memory (e.g., if you were storing a long string in the Datum).
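To make the ownership model described above concrete, here is a minimal sketch of a Datum-like type. `ToyDatum`, `createDouble`, `copyString`, `destroy` and `copyAliasesBuffer` are hypothetical names chosen for illustration, not the `bdld::Datum` API, and the sketch makes no attempt to bit-pack values into 8 bytes. It only reproduces the properties under discussion: trivial special members, static factory functions, an explicit `destroy`, and bitwise copies that alias the same heap buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical toy type, NOT the bdld::Datum API: it mimics only the
// ownership model (trivial special members, static factories, explicit destroy).
class ToyDatum {
  public:
    enum Type { e_DOUBLE, e_STRING };

    // Factory for a value that needs no heap memory.
    static ToyDatum createDouble(double value) {
        ToyDatum d;
        d.d_type   = e_DOUBLE;
        d.d_double = value;
        return d;
    }

    // Factory that copies the string into a new heap buffer.
    static ToyDatum copyString(const char *str) {
        std::size_t len = std::strlen(str);
        char *buf = static_cast<char *>(std::malloc(len + 1));
        if (!buf) {
            std::abort();
        }
        std::memcpy(buf, str, len + 1);
        ToyDatum d;
        d.d_type   = e_STRING;
        d.d_string = buf;
        return d;
    }

    // Frees heap memory, if any. Must be called exactly once per
    // allocation, no matter how many bitwise copies exist.
    static void destroy(const ToyDatum &d) {
        if (d.d_type == e_STRING) {
            std::free(d.d_string);
        }
    }

    Type        type()      const { return d_type; }
    double      theDouble() const { return d_double; }
    const char *theString() const { return d_string; }

  private:
    Type d_type;
    union {
        double d_double;
        char  *d_string;
    };
};

// The compiler-generated copy operations and destructor are all trivial.
static_assert(std::is_trivially_copyable<ToyDatum>::value, "bitwise copyable");
static_assert(std::is_trivially_destructible<ToyDatum>::value, "trivial destructor");

// A copy aliases the heap buffer instead of duplicating it.
bool copyAliasesBuffer() {
    ToyDatum a = ToyDatum::copyString("a string too long for any inline buffer");
    ToyDatum b = a;                           // copies the bits, no allocation
    bool same = a.theString() == b.theString();
    ToyDatum::destroy(a);                     // frees the shared buffer once
    // 'b' now dangles; destroying it as well would be a double free.
    return same;
}
```

Nothing in the type tracks how many copies alias the buffer, so the code using it must ensure `destroy` is called exactly once per allocation.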
@@rishinmotion So who owns the data if two Datums point at it? Do I have to do some kind of manual reference counting? From my "modern C++ mindset", Datum makes no sense; it feels prone to memory leaks if not handled with care. For the example in the talk, I would just use std::variant with a custom string class whose short-string optimization is small enough that it takes around 8 bytes of memory.
A good way to understand the relationship between a 'Datum' object and its data is by analogy: it is like the relationship between a raw pointer and the data to which it points. You could certainly implement a custom string class, but once std::variant adds its discriminator, you are still going to exceed 8 bytes. Datum also supports other "large" types besides strings, and you would have to write custom classes for those too. Datum is a very low-level type, but there is a value-semantic type called 'bdld::ManagedDatum' that holds a Datum and a pointer to an allocator, and it behaves the way you would expect any "modern C++ type" to behave:
bbgithub.dev.bloomberg.com/uiinf/bde/blob/master/groups/bdl/bdld/bdld_manageddatum.h
@@rishinmotion Thanks, now it makes more sense to me.
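The size argument about std::variant can be checked at compile time. The sketch below assumes a hypothetical 8-byte `SmallString` with a purely inline buffer (a real short-string type would also need a heap fallback for longer strings); `SmallString`, `Value` and `makeSmall` are illustrative names. Because `double` and `SmallString` each already use all 8 bytes, the variant's discriminator has to be stored somewhere else, so the variant cannot fit in 8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <variant>

// Hypothetical 8-byte string: up to 7 characters inline plus a terminator.
struct SmallString {
    char d_buf[8];
};

using Value = std::variant<double, SmallString>;

// Both alternatives fill all 8 bytes, so the index needs extra storage.
static_assert(sizeof(double) == 8, "assumes a 64-bit double");
static_assert(sizeof(SmallString) == 8, "payload is exactly 8 bytes");
static_assert(sizeof(Value) > 8, "the discriminator pushes the variant past 8 bytes");

// Builds an inline string. Precondition: 's' has at most 7 characters.
SmallString makeSmall(const char *s) {
    SmallString out{};
    std::size_t len = std::strlen(s);
    assert(len < sizeof out.d_buf);
    std::memcpy(out.d_buf, s, len + 1);
    return out;
}
```

On mainstream compilers alignment rounds such a variant up to 16 bytes, but the static_assert only relies on it being larger than 8.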
Some other points: the use cases mention "spreadsheets" and data usage as important, but you do not discuss why you don't use float (or even a clipped float such as FP16). Is precision so important that you need double over float, but not decimal?
No, we do need the higher range provided by doubles for some fields.
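The range difference mentioned in the reply is easy to quantify: the largest finite float is roughly 3.4e38 and the largest finite double roughly 1.8e308, while IEEE binary16 (FP16) tops out at 65504. The sketch below is a conservative range check only; it says nothing about precision or decimal representation, and `fitsInFloatRange`, `fitsInHalfRange` and `kHalfMax` are hypothetical helpers. C++ has no portable half-precision type before C++23's optional `std::float16_t`, so the FP16 limit is written out by hand.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Largest finite IEEE 754 binary16 (FP16) value, written out by hand.
constexpr double kHalfMax = 65504.0;

// True if |x| is within FP16's largest finite magnitude.
bool fitsInHalfRange(double x) {
    return std::fabs(x) <= kHalfMax;
}

// True if |x| is within float's largest finite magnitude (about 3.4e38).
bool fitsInFloatRange(double x) {
    return std::fabs(x) <= std::numeric_limits<float>::max();
}
```

A field whose values can exceed about 3.4e38 in magnitude fits comfortably in a double but is out of range for a float, and FP16 runs out of range far sooner.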
I think it's available here: github.com/bloomberg/bde/blob/master/groups/bdl/bdld/bdld_datum.h
Unless I missed it, the video did not show a slide with the link.
rishinmotion, thank you for answering questions. With respect to Datum, I feel it is quite an ugly data type to work with, but your performance requirements make it mandatory. If I were implementing it, I would make it a move-only type that requires you to make copies manually, similar to how unique_ptr has little overhead over a raw pointer but does not leak.
That is a fair point.
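The move-only design suggested in the comment above can be sketched in a few lines by letting std::unique_ptr do the ownership work. `OwnedString`, `clone` and `hasValue` are hypothetical names, and the sketch owns only a heap string rather than wrapping a Datum: copying is deleted, moving transfers ownership, and `clone()` is the explicit deep copy.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical move-only owner of a heap string; not part of BDE.
class OwnedString {
  public:
    explicit OwnedString(const char *s)
        : d_buf(new char[std::strlen(s) + 1]) {
        std::strcpy(d_buf.get(), s);
    }

    // Implicit copies are disabled, so ownership is never silently shared...
    OwnedString(const OwnedString &) = delete;
    OwnedString &operator=(const OwnedString &) = delete;

    // ...but moves transfer it, just like std::unique_ptr.
    OwnedString(OwnedString &&) noexcept = default;
    OwnedString &operator=(OwnedString &&) noexcept = default;

    // Explicit deep copy for when a second owner is really wanted.
    OwnedString clone() const {
        assert(hasValue());
        return OwnedString(d_buf.get());
    }

    bool        hasValue() const { return d_buf != nullptr; }
    const char *c_str()    const { return d_buf.get(); }

  private:
    std::unique_ptr<char[]> d_buf;  // frees the buffer automatically
};

static_assert(!std::is_copy_constructible<OwnedString>::value, "copies must be explicit");
static_assert(std::is_nothrow_move_constructible<OwnedString>::value, "moves are cheap");
```

The trade-off is that such a type is neither bitwise copyable nor trivially destructible, which are exactly the properties listed for Datum in the talk.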
cool
And then someone comes along and copy-pastes it into Excel.
The destruction data you see in Excel corresponds to calls to the static "destroy" method, which manually releases the memory held by Datum objects when they store pointers to long strings on the heap.