Comments •

  • @HypaxBE
    @HypaxBE 3 years ago +47

    The first and immediate remark I want to make is that the guy presenting this is the CEO of Fivetran, which operates within the ELT spectrum. So remain sceptical of what he's saying, as he has his motives as well. My personal view on this:
    I do agree that the big ELT advantage is making large volumes of data accessible much faster, in their native form, in the data warehouse. Once there, you would transform them into a dimensional model for operational reporting (the BI, self-service BI, ... sphere, where you need aggregated data) and/or replicate them into the data lake as raw data, which is already interesting in itself (for example, finding patterns in transactions in the data science sphere, where you need this detailed grain of the data).
    I don't agree that analysts will make the T happen (normalized -> dimensional), as this is often not their specialty. To me this remains in the data engineering realm. But once it's in a dimensional model, analysts will happily query it and look for insights.
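    The raw-load-then-transform flow described above can be sketched as follows; a minimal illustration using Python's built-in sqlite3 as a stand-in warehouse (all table and column names here are invented for the example, not taken from the talk):

```python
import sqlite3

# Stand-in "warehouse": load (EL) the raw transactions untouched,
# then do the transform (T) inside the warehouse with plain SQL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_transactions (id INTEGER, customer TEXT, amount REAL)")
con.executemany(
    "INSERT INTO raw_transactions VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "acme", 5.0), (3, "globex", 7.5)],
)

# The "T" step: derive a small dimensional-style aggregate from the raw copy.
# The raw table stays as-is, so other consumers (e.g. data science) still
# have the detailed grain available.
con.execute("""
    CREATE TABLE fct_sales_by_customer AS
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM raw_transactions
    GROUP BY customer
""")
print(con.execute("SELECT * FROM fct_sales_by_customer ORDER BY customer").fetchall())
# → [('acme', 2, 15.0), ('globex', 1, 7.5)]
```

    The point of the sketch: the aggregated model is derived from, and lives alongside, the untouched raw copy rather than replacing it.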

  • @RaYaN-uc4ms
    @RaYaN-uc4ms 3 years ago

    This is really useful information. Please make another big-data process explanation.

  • @emmanuelhernandezolvera
    @emmanuelhernandezolvera 3 years ago +1

    That smiley face at the end got me lol

  • @olaoyeanthonysomide9528
    @olaoyeanthonysomide9528 3 years ago

    Thanks for sharing

  • @_xcelpro
    @_xcelpro 3 years ago

    A really clear explanation! This is awesome!

  • @sarveshpandey1125
    @sarveshpandey1125 3 years ago

    Made it crystal clear. Thanks!!

  • @marlonbariuad7177
    @marlonbariuad7177 1 year ago

    Best explanation... 10 stars

  • @selvakumar2984
    @selvakumar2984 1 year ago

    Clear explanation.

  • @i3bdallah
    @i3bdallah 3 years ago +1

    Well presented, thanks for the insight.

  • @donchichiumelo2762
    @donchichiumelo2762 1 year ago

    The 2 major issues with the ELT approach are performance and cumbersome SQL queries. Users have to hit huge volumes of data every time and build large, complex SQL queries. This is what ETL tries to solve. It's true ETL has its own setbacks, but those are more specific to each company. Choosing between ELT and ETL will ultimately depend on the company's data, the IT infrastructure in place, and budget.

  • @doctorbower1989
    @doctorbower1989 3 years ago +4

    Isn’t the way ELT is explained here just a staging table? What’s the difference?

  • @benjaminchai4205
    @benjaminchai4205 3 years ago

    This idea is cool! Thanks for sharing.

  • @papachoudhary5482
    @papachoudhary5482 3 years ago

    GREAT

  • @camiaz09
    @camiaz09 1 year ago

    See you at Tech Week SF!

  • @mro5858
    @mro5858 2 years ago

    Niceeee!

  • @rodneychan1227
    @rodneychan1227 3 years ago +1

    This is awesome! The only tool I'm aware of that does ELT is Oracle Data Integrator. Are there any other tools?

    • @rodneychan1227
      @rodneychan1227 3 years ago

      @@mikepickett5498 Thanks so much for the reply!

    • @carnotantonioromero3024
      @carnotantonioromero3024 3 years ago +1

      Used to be on the ODI team, so I know whereof I speak... ELT goes back a ways (Oracle Warehouse Builder used it for practically everything as early as 2004, and OWB 11.2 had complex ETLT type topologies), and many products like Talend (was also on that DI team), Informatica, etc. had ELT as well. There's also "query pushdown" from federated query products like Tibco Information Server, Denodo, and others that can in effect do ELT by moving source data into a target system, processing it there, and possibly landing it there. So... not at all unique, but definitely good.

  • @abhishekpharkya6234
      @abhishekpharkya6234 2 years ago +2

    Are you really saying that we should go back 20 years and again start writing the long, complex queries and packages to achieve the required transformations? What about data cleansing, standardization & MDM? Should all these be done on the target database using queries? lol. And how can you say that this ELT solution can be reused? Companies still have their own custom business requirements, for which custom code and a custom database model would be needed. And how will transformation changes be easy after production deployment? ELT does not have a great future.

  • @yashdahenkar8289
    @yashdahenkar8289 3 years ago

    It's insightful, thanks for sharing.

  • @Eliassausaur
    @Eliassausaur 3 years ago +1

    Interesting, but how does it optimize costs? To me it feels like what was the work of the data engineers is now the work of the data analysts (this is a genuine question, not sarcasm).

    • @MrMulletman21
      @MrMulletman21 3 years ago +1

      Data analysts are cheaper, and it's much easier to write SQL and amend SQL here and there.
      You also just refresh the query and everything updates, rather than having to delete the original data, re-transform it and bring it in. When you change things in the new world, you don't change the raw data/code tables, only the end result.
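      The "rebuild only the end result" idea can be sketched as follows; a minimal illustration with Python's built-in sqlite3 standing in for the warehouse (table and column names are invented for the example):

```python
import sqlite3

# ELT-style iteration: the replicated raw data is loaded once and never
# edited; only the derived result is rebuilt when the transform SQL changes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_events (customer TEXT, amount REAL)")
con.executemany("INSERT INTO raw_events VALUES (?, ?)",
                [("a", 1.0), ("a", 2.0), ("b", 4.0)])

def rebuild_report(transform_sql):
    # Amending the transform just replaces the end-result table;
    # raw_events itself is left untouched, and nothing is re-extracted.
    con.execute("DROP TABLE IF EXISTS report")
    con.execute("CREATE TABLE report AS " + transform_sql)
    return con.execute("SELECT * FROM report ORDER BY 1").fetchall()

# First version of the transform, then an amended one: only the SQL changes.
v1 = rebuild_report("SELECT customer, SUM(amount) AS total FROM raw_events GROUP BY customer")
v2 = rebuild_report("SELECT customer, COUNT(*) AS n FROM raw_events GROUP BY customer")
print(v1)  # [('a', 3.0), ('b', 4.0)]
print(v2)  # [('a', 2), ('b', 1)]
```

      The contrast with traditional ETL is that changing the second query did not require deleting and re-loading anything from the source system.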

    • @cosmos177
      @cosmos177 3 years ago

      I agree with @Elias Ozor in this regard.
      @@MrMulletman21 Fivetran is in the business of automating replication, so the Transform angle is non-existent in their solution. It's kicking the can further down the road. This approach brings data integration issues into the realm of the data analyst instead of handling them at the time of building the data lake or data warehouse. One can even take the operational data and then do the transforms and analysis within a custom analytical solution like SAS or SPSS if the dimensional schema is not needed.
      This simply makes the source system reside in-house via the EL automation. However, the integration problems are deferred due to a non-attempt to solve them. In some simpler cases this makes sense and may be easy to maintain by data analysts/non-specialists, but for a large, mature enterprise, the SQL skills that the T in ETL (i.e. transformation) requires are very different from the SQL skills that data analysts employ using BI tools. You may need a team of SQL specialists supporting the dimensional schemas ... part of the T work ... before the data scientists or data analysts play with the data (unless the data scientists are good at SQL too, not just Python).
      Typically there will be multiple stages of data transformation/storage before the UI gets to play with modifying things and seeing how everything changes (reset and check results afresh). SQL changes won't just re-run reports but will indeed reset the entire analysis environment, including dropping/recreating the data/delta lakes/analytical marts by re-running the transforms, and only then re-running the reports.
      This will fit the notebook use case, where all transforms and analytics are done in the same space, but the skill sets needed now encompass data integration as well as data analysis, including the AI/ML that data scientists may need to do.

    • @georgewfraser
      @georgewfraser 3 years ago +5

      Working with data that's already been replicated to a data warehouse is much faster to iterate on. You can run a new SQL query against all your historical data in a few seconds, whereas a traditional ETL tool has to "replay" from the source.

    • @carnotantonioromero3024
      @carnotantonioromero3024 3 years ago

      Part of optimizing costs is that you are doing the heavy crunching in a system you already own that's scaled for it. Older ETL products required a separate processing cluster for the transformation: you'd have a whole cluster doing nothing but data pipeline transformation work, and then a separate cluster that had to scale to support the needed analytics workload on the transformed result. And you had to move the data twice, once from the sources into the transform cluster, and again from the transform cluster into the target. If you can do all that inside the destination, it may well be cheaper.

  • @z911empire
    @z911empire 3 years ago +2

    Poor guy had so many cameras on him

  • @michalpesicka8152
    @michalpesicka8152 1 year ago

    I find it hard to agree with half of the things he said.

  • @cordularaecke
    @cordularaecke 3 years ago

    So you are replicating and archiving every production transaction first... of course, there's ZERO cost in having to maintain historical data in this replicated database, right? I mean, there's no performance hit there at all, so you can just play with SQL to transform the data. Anyone who's actually been involved in ETL or ELT will tell you what a challenge it is to replicate production databases, keep them in sync and maintain good performance in the DWH. I'd say you'd probably need more DevOps engineers and DBAs to keep the replicated DB in shape. IMHO

  • @joyo2122
    @joyo2122 3 years ago

    wrong

  • @wida12
    @wida12 3 years ago +1

    I'm not 100% convinced here. It's like you have a document, copy-paste the document, then change the copy, right? That gives the data analyst more work, because they have to understand the complex transactional databases first before actually analyzing the transactional data. BTW, you keep looking sideways; that makes you look like you're unsure about what you're presenting.

    • @carnotantonioromero3024
      @carnotantonioromero3024 3 years ago

      The "changes" to the copy are very computationally intense and therefore better performed in a system already scaled to handle that workload. If your sources are modest-sized operational databases, SaaS services, files etc. and then your destination is a big Spark cluster on S3 or something, the efficiencies can be enormous.