thx mate, i'm just an intern and my boss put me onto designing and building a POC pipeline from Postgres to Kafka to Flink and then to target db. My first time playing with these technologies and learning about them but it's cool to get into the architecture world of things
Hey Jordan, great video. One question: since the job scheduler is stateless (assuming worker node health info and checkpoint details are kept in S3), can't we simply run replicas of the job scheduler to handle the SPOF? Do we really need ZooKeeper and a leader/standby mechanism?
I think the reason you want a single job manager is to ensure consistency with snapshotting. Having two job managers can be bad, as they'll contradict each other! That's generally why we need to get consensus involved somewhere: to avoid split brain, not just for fault tolerance :)
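To make the split brain point a bit more concrete, here's a rough sketch of the idea, not how Flink or ZooKeeper actually implement it. `CoordinationStore` is a made-up in-memory stand-in for a consensus-backed store, and `JobManager`, `campaign`, and `trigger_checkpoint` are hypothetical names for illustration only. The store hands out leadership to one candidate at a time and bumps an epoch number on every handover, so an old leader that comes back can be told apart from the current one.

```python
import threading


class CoordinationStore:
    """Toy in-memory stand-in for a consensus-backed store (hypothetical, not a real API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._leader = None
        self._epoch = 0

    def try_acquire(self, candidate):
        """Grant leadership only if nobody holds it; return the new epoch, else None."""
        with self._lock:
            if self._leader is None:
                self._leader = candidate
                self._epoch += 1
                return self._epoch
            return None

    def release(self, candidate):
        """Give up leadership (here also used to simulate the leader's session expiring)."""
        with self._lock:
            if self._leader == candidate:
                self._leader = None

    def current_epoch(self):
        with self._lock:
            return self._epoch


class JobManager:
    """Hypothetical job manager that may only trigger checkpoints while it is the leader."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.epoch = None  # None means standby

    def campaign(self):
        self.epoch = self.store.try_acquire(self.name)
        return self.epoch is not None

    def trigger_checkpoint(self, checkpoint_id):
        # Refuse if we are a standby, or if a newer leader has been elected since.
        if self.epoch is None or self.epoch != self.store.current_epoch():
            return None
        return (self.name, self.epoch, checkpoint_id)


store = CoordinationStore()
jm_a, jm_b = JobManager("jm-a", store), JobManager("jm-b", store)

a_won = jm_a.campaign()                    # jm-a becomes leader at epoch 1
b_won = jm_b.campaign()                    # jm-b stays standby
first = jm_a.trigger_checkpoint(7)         # accepted
standby = jm_b.trigger_checkpoint(7)       # rejected, jm-b is not the leader

store.release("jm-a")                      # simulate jm-a losing its session
b_won_later = jm_b.campaign()              # jm-b takes over at epoch 2
stale = jm_a.trigger_checkpoint(8)         # rejected, jm-a holds a stale epoch
second = jm_b.trigger_checkpoint(8)        # accepted
```

With plain replicas and no coordination, both `jm-a` and `jm-b` would happily trigger their own checkpoints, which is exactly the contradiction described above.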
Hey Jordan! Quick question - on your slide at 04:51, how come we would need Apache Flink versus a regular message queue to ingest changes from a stream? Or would they both be sufficient for this particular use case?
I think that you're slightly misinterpreting the point of Flink here. Flink isn't actually a message broker, but works with a message broker. It takes in messages from a message queue, and has local storage that is useful for performing computations on the messages that it has received in the past. The message broker is just responsible for getting messages from producers and delivering them to consumers.
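Here's a tiny sketch of that split in responsibilities. It's a toy in plain Python, not the Flink or Kafka API: `Broker`, `RunningTotalProcessor`, and `run` are made-up names. The broker only moves messages along, while the processor keeps local state (a running total per key) built from the messages it has already seen.

```python
from collections import defaultdict, deque


class Broker:
    """Toy message broker: accepts messages from producers and hands them to a consumer."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        self._queue.append(message)

    def poll(self):
        # Return the next message in arrival order, or None if the queue is empty.
        return self._queue.popleft() if self._queue else None


class RunningTotalProcessor:
    """Toy stateful consumer: remembers a per-key total across messages."""

    def __init__(self):
        self.totals = defaultdict(int)  # the "local storage" in this sketch

    def process(self, message):
        key, amount = message
        self.totals[key] += amount
        return key, self.totals[key]


def run(messages):
    """Publish all messages to the broker, then drain them through the processor."""
    broker = Broker()
    for message in messages:
        broker.publish(message)

    processor = RunningTotalProcessor()
    results = []
    while (message := broker.poll()) is not None:
        results.append(processor.process(message))
    return results


results = run([("a", 1), ("b", 2), ("a", 3)])
```

The second `"a"` message produces a total of 4 only because the processor remembered the first one. The broker on its own has no notion of that history, it just delivers each message.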
Gonna finish this playlist by end of Oct!
thank you!
You explain it so well
How do we create checkpoints? I have seen checkpoints being used in all kinds of cases. Maybe a new video idea for you?
Look up Chandy-Lamport checkpointing, may be a bit beyond the scope of the channel lol
@jordanhasnolife5163 Maybe you can cover it
oh no. i am dumb.
That's a lie!