Good presentation. We can also include the database CDC logs use. We can move the on-prem transactional database data to the DW on cloud using CDC and KAFKA.
"we can use a time window to restrict how we are going to join a dataset [...] very hard to do that inside a database" select from T1 join T2 on T1.key = T2.key where T1.datetime between @datetime1 and @datetime2 and T2.datetime between @datetime1 and @datetime2 Hm.
That wouldn't produce the same result as it is a fixed, retrospective query. In theory, you could keep running such a query over and over but that would be computationally expensive and wouldn't give you correct results as you would miss any records that came in late (beyond your configured window). Stream processors handle the latter and do the whole thing more efficiently.
@@stopfob Yes, since database don't expose streams, we can't do stream processing in a database. But we can certainly "use a time window to restrict how we are going to join a dataset". And we might be using temporal tables... a database isn't something which can only contain current state, it can contain history too. I'm not saying "never use stream processing", my point was merely to show that some of the assertions being made here about databases in general are questionable.
@@allmhuran Indeed you can, but windowing does more than limit the data in a join. Take the late data case. Record R5 from stream S1 comes in late. It belongs in window W2. The current window is W4. It also needs to join with another stream/table as at W2. Now you'd need to construct a query that only picks up unprocessed records like R5, but knows to join it to data in a previous window. It's theoretically possible but it a fair bit more complex than your statement above and to be honest it gets pretty messy.
Good presentation. We can also include the database CDC logs use. We can move the on-prem transactional database data to the DW on cloud using CDC and KAFKA.
"we can use a time window to restrict how we are going to join a dataset [...] very hard to do that inside a database"
select from T1 join T2 on T1.key = T2.key where T1.datetime between @datetime1 and @datetime2 and T2.datetime between @datetime1 and @datetime2
Hm.
That wouldn't produce the same result as it is a fixed, retrospective query. In theory, you could keep running such a query over and over but that would be computationally expensive and wouldn't give you correct results as you would miss any records that came in late (beyond your configured window). Stream processors handle the latter and do the whole thing more efficiently.
@@stopfob Yes, since database don't expose streams, we can't do stream processing in a database. But we can certainly "use a time window to restrict how we are going to join a dataset". And we might be using temporal tables... a database isn't something which can only contain current state, it can contain history too. I'm not saying "never use stream processing", my point was merely to show that some of the assertions being made here about databases in general are questionable.
@@allmhuran Indeed you can, but windowing does more than limit the data in a join. Take the late data case. Record R5 from stream S1 comes in late. It belongs in window W2. The current window is W4. It also needs to join with another stream/table as at W2. Now you'd need to construct a query that only picks up unprocessed records like R5, but knows to join it to data in a previous window. It's theoretically possible but it a fair bit more complex than your statement above and to be honest it gets pretty messy.