Hi Raphaël Walter, could you please help me understand the following?
1. What happens when the data doesn't fit in memory?
2. Can the data source be a Kafka topic?
3. How is the data versioned?
Thanks in advance
Hello Ranjeet,
Thank you for your comment. :)
For your first question:
- I'm loading the data from the S3 bucket into SAP HANA, but of course I could have done the whole process without HANA: leave the data in the bucket, or load it into a data lake or a database that is not in-memory, and run the Python code on it. The model creation process would simply have been slower.
- If you still want to use HANA like me: as you know, HANA is a column store, so your data is compressed, with a compression ratio of about 5 to 7. If you still don't have enough memory, you can keep the data in the bucket or another data lake and use HANA's SDA (Smart Data Access), which creates a virtual version of the data and fetches it from the remote source as needed.
- You can also sample your data and keep only a subset of the rows, perform data quality analysis to find flaws in your data, and keep only the relevant records. It's only recently that we started using all the data to create data science models. Finally, you could also perform feature engineering (keep only a certain number of columns and leave out the ones that are not correlated) on the largest subset that fits within your memory limit, and then load the data accordingly.
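To make the sampling and column-selection idea above a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the video: the row generator, the column names (`signal`, `noise`, `target`) and the 0.3 correlation threshold are all made up for illustration. It assumes rows arrive as a stream too large to hold in memory, keeps a uniform random subset with reservoir sampling, then keeps only the columns that correlate with the target.

```python
import math
import random


def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k rows from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def select_features(sample, target, threshold=0.3):
    """Return the column names whose |correlation| with the target reaches the threshold."""
    ys = [row[target] for row in sample]
    kept = []
    for name in sample[0]:
        if name == target:
            continue
        xs = [row[name] for row in sample]
        if abs(pearson(xs, ys)) >= threshold:
            kept.append(name)
    return kept


def generate_rows(n, seed=1):
    """Hypothetical data source: one informative column, one pure-noise column."""
    rng = random.Random(seed)
    for _ in range(n):
        signal = rng.random()
        yield {
            "signal": signal,
            "noise": rng.random(),
            "target": 2 * signal + 0.01 * rng.random(),
        }


# Only 500 rows are ever held in memory, however long the stream is.
sample = reservoir_sample(generate_rows(10_000), k=500)
kept = select_features(sample, target="target")
```

Once the useful columns are known from the sample, you would load only those columns from the full data set, which is what keeps the memory footprint down.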
For your second question:
- The source can be a Kafka topic. SAP Data Intelligence has a Kafka consumer component for data orchestration. You simply define your system in the connections and then you can use it in your flows.
For your third question:
- In the ML Scenario Manager of SAP Data Intelligence you define several datasets, notebooks, pipelines, executions and deployments. This is where you handle the versioning of all these elements. The good thing is that with SAP Data Intelligence the whole process is greatly industrialized and you won't have to extract your data for model creation: once the pipelines are done, you can use them to create updated versions of the model, switch back to another version that had better results, and deploy them quickly. :)
Best of luck!
Take care,
Raphaël
@raphaelwalter7412 Thanks Raphaël!!
Hi Raphaël Walter, I am an SAP BI consultant. Can you tell me which features differentiate SAP Data Intelligence from other solutions, such as Azure and AWS?
Hello Francisco,
Difficult question to answer in a few lines... :)
If you want to know more about SAP Data Intelligence, a free openSAP course on it is actually launching today:
open.sap.com/courses/di1/items/38zdH2qkuqFSWC3zojPNYc
If you want to go through this video, turn on the subtitles; with them we are able to follow some of it.
This is very strange indeed. There used to be sound on this video, and the subtitles are still there.
I'll look into it.
Thank you for pointing this out. :)
Take care,
R.
I am unable to hear anything.