Workflow Orchestration for Building Resilient Software Systems

  • Published on Jul 22, 2024
  • Building resilient software systems can be difficult, especially when they are distributed. Executing a long-running business process or workflow that spans multiple service boundaries requires a lot of resiliency. Everything needs to be available and functioning, because if something fails midway through you can be left in an inconsistent state. So what's the solution? Removing direct service-to-service communication and temporal coupling.
    🔗 Solace
    solace.com/codeopinion
    🔔 Subscribe: / @codeopinion
    💥 Join this channel to get access to source code & demos!
    / @codeopinion
    🔥 Don't have the JOIN button? Support me on Patreon!
    / codeopinion
    📝 Blog: codeopinion.com
    👋 Twitter: / codeopinion
    ✨ LinkedIn: / dcomartin
    📧 Weekly Updates: mailchi.mp/63c7a0b3ff38/codeo...
    0:00 Intro
    1:03 Distributed Monolith
    3:53 Temporal Coupling
    5:50 Orchestration
    #softwarearchitecture #softwaredesign #codeopinion
  • Science & Technology

Comments • 87

  • @Pretence01
    @Pretence01 2 years ago +9

    Increased resilience is one of the added benefits of using messaging, another important one would be that it effectively enables you to emulate a distributed transaction in a microservice environment by using an outbox on the sender side that defers publishing the outgoing message until its own state was successfully persisted.
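
The outbox idea described above can be sketched roughly as follows. This is a minimal illustration, not any particular library's API: the table names, `place_order`, `relay_outbox`, and the `OrderPlaced` payload are all hypothetical, and SQLite stands in for the service's own database.

```python
import json
import sqlite3

def place_order(conn, order_id, total):
    """Persist the order and its outgoing message in ONE local transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (payload, published) VALUES (?, 0)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )

def relay_outbox(conn, publish):
    """Publish pending outbox rows, marking each one as published afterwards."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

# Hypothetical schema, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER)"
)

sent = []  # stand-in for a broker client
place_order(conn, "order-1", 42.0)
relay_outbox(conn, sent.append)
```

Because a row is marked as published only after `publish` returns, a crash between the two steps leads to a redelivery rather than a lost message, so consumers should tolerate duplicates.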

  • @charlesopuoro5295
    @charlesopuoro5295 1 year ago

    Thank you very much for this hands-on, pragmatic approach to understanding Workflow Orchestration.

  • @shashanksingh9238
    @shashanksingh9238 2 years ago

    Thank you so much for explaining this. My company is using Netflix Conductor, and the documentation for it is very complex. Thank my Tech Gods that I landed here :-)

  • @avineshwar
    @avineshwar 2 years ago

    I usually think about services that are being built as not just one service, but, a pair of services:
    - main service ("the very core" (perhaps, an even refined version) of business logic code)
    - main's supporter service (for development simplification, e.g. steer back a database to a consistent state)
    - any other infra piece needed to support this (e.g. object store)
    Once we could model and generalize, we should have a working model for a broad class of issues.
    I was hoping you think about it.

  • @buildingphase9712
    @buildingphase9712 1 year ago +1

    I get the point however payments are probably going to happen client side with a redirect to the payment gateway and a success message call. But point taken in terms of async messages.

  • @rcosta551
    @rcosta551 2 years ago +2

    Thank you for this video. I am working on a specific project which has a few use cases to be orchestrated. I created a native solution, something like a "Workflow of use cases". I am not sure if that is a good practice, but it is working very well.
    I have been looking at Temporal.

    • @CodeOpinion
      @CodeOpinion  2 years ago +1

      Let me know how you make out with Temporal if you end up using it.

  • @JosueIbarraNinja
    @JosueIbarraNinja 4 months ago

    Great explanation of workflow orchestration! I’m part of the Cadence team at Uber

    • @CodeOpinion
      @CodeOpinion  4 months ago

      Nice! Thanks for the comment and some validation 😀

  • @sathyajithps013
    @sathyajithps013 2 years ago +3

    Cool vid. My first exposure to something like this was MassTransit Sagas. I think this example can be done using MassTransit Sagas + Courier. Perfect for these kinds of scenarios.

    • @TheRak00792
      @TheRak00792 2 years ago +1

      Routing slip can definitely work for the provided example. I'd prefer coupling it with a state machine for complex scenarios though.

    • @morespinach9832
      @morespinach9832 4 months ago

      As in choreography?

  • @loren-sr
    @loren-sr 2 years ago +7

    Great video, thanks for making! I love Temporal, and work there, if you have any questions! Main differentiator is orchestrating via code instead of via JSON/YAML, and we have SDKs for Go, Java, PHP, TS/JS. Python/Ruby/.NET/Rust SDKs are in development.

    • @CodeOpinion
      @CodeOpinion  2 years ago +1

      Was just recently looking at the GitHub repo for the .NET SDK for point of reference to see what it looked like.

    • @avineshwar
      @avineshwar 2 years ago

      So, just to be sure how Temporal works, if an "n" step process (Workflow) is executed and if somewhere a step (Activity) "x" is considered (through business logic code) to be failing, then we c/would re-attempt our workflow from the step "x" / "x-1" (assuming idempotency)?
      If yes, does that mean Temporal is trying to influence 2 (arguably huge) scenarios:
      - simplify failure management pattern (reduce code/implementation branching depth on left or right)
      - simplify developer lives by letting them not worry about simple enough things
      Those are big things, but, maybe someday we will get:
      - detect an underlying reason for a certain observation (e.g. exception due to socket close) and deal in some standard way

    • @loren-sr
      @loren-sr 2 years ago +1

      @@avineshwar yes and yes. For the last, you can certainly handle in code different errors in different ways, and can do so across all Activities and Workflows. It would be nice someday to have some automated ML-based error handling, and a system like Temporal is a necessary base for that, since we not only have information on all failures, but are also the orchestrator deciding what to do next.

    • @avineshwar
      @avineshwar 2 years ago

      @@loren-sr I see (when I think about "fast af" operations and AI, seems like apple/oranges by today's standard). Thanks. All good information.

    • @morespinach9832
      @morespinach9832 4 months ago

      Temporal doesn’t have a visual flow charter like Camunda. Correct?

  • @softcoda
    @softcoda 1 year ago

    How do you know if the service is down when you would like to take an action in response?

  • @rafaspimenta
    @rafaspimenta 2 years ago

    Hi Derek, thank you for the great content as usual. Could you talk about the tools that you use to draw architectural diagrams and your thinking process for building one?

    • @CodeOpinion
      @CodeOpinion  2 years ago +4

      I just use PowerPoint, nothing special. In terms of how I make them, specifically for a video, I've been thinking about making a video about that

  • @nav201182
    @nav201182 3 months ago

    In case we want to buy a tool from the market to perform workflow orchestration, which tool would you recommend?

  • @morgadoapi4431
    @morgadoapi4431 2 years ago

    Kafka provides transactions, but in a more technical sense. These transactions guarantee that a collection of messages is either written to many topics or not written at all.

  • @alexsiuwh
    @alexsiuwh 2 years ago +1

    I have been working with workflow orchestration for 20 years and am 100% with you technically, but what makes projects fail is mainly human factors and work politics in the user environment.

    • @CodeOpinion
      @CodeOpinion  2 years ago

      Ya, pretty much the case with everything.

  • @rockmanjacky
    @rockmanjacky 2 years ago

    That's a very good video to explain the message queue system design, but how can we prevent the single point of failure if the message queue is down?

    • @CodeOpinion
      @CodeOpinion  2 years ago +1

      Good question! It's core infrastructure, no different than a database. What do you do to prevent your database from being a single point of failure? Generally, a cluster for high availability and also use the outbox pattern: th-cam.com/video/u8fOnxAxKHk/w-d-xo.html

  • @rcts3761
    @rcts3761 2 years ago

    Do you know some good strategies to make sure that services which publish commands reliably process all possible "response" events? For example, a developer might add a new response type to the responding service and forget to update the event handler in the commanding service.

    • @b1ueocean
      @b1ueocean 2 years ago +1

      responding service doesn’t care about the commanding service (or any other) in your scenario above - responding service simply lands a response event in the broker.
      commanding service emits messages without knowing the who and the how regarding response events.
      if a specific event handler is missing from the commanding service how has it been released while falling short of the Definition of Done? 👈
      Even if you rely on a list of supported events in the commanding service’s configuration and hand roll verification to ensure supporting handlers are available/registered, such configuration needs to be kept up-to-date.
      easiest strategy is good testing practices 😋
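
One way to turn "good testing practices" into something concrete is a test that compares the known response event types against the registered handlers. The sketch below assumes the responding service exposes its event types through some shared contract; `PaymentResponse`, `handles`, and the handler functions are hypothetical names made up for illustration.

```python
from enum import Enum

class PaymentResponse(Enum):
    # Hypothetical response events published by the responding service.
    PAYMENT_SUCCEEDED = "PaymentSucceeded"
    PAYMENT_DECLINED = "PaymentDeclined"
    PAYMENT_TIMED_OUT = "PaymentTimedOut"  # newly added, no handler yet

HANDLERS = {}

def handles(event_type):
    """Decorator that registers a handler for one response event type."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register

@handles(PaymentResponse.PAYMENT_SUCCEEDED)
def on_succeeded(event):
    return "ship order"

@handles(PaymentResponse.PAYMENT_DECLINED)
def on_declined(event):
    return "cancel order"

def unhandled_events():
    """Return every known response type that has no registered handler."""
    return [e for e in PaymentResponse if e not in HANDLERS]
```

A test in the commanding service's build could assert that `unhandled_events()` is empty, so adding `PAYMENT_TIMED_OUT` without a handler fails the build instead of failing in production. As noted above, this only helps if the shared list of event types is itself kept up to date.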

  • @jalalalmutawa4889
    @jalalalmutawa4889 1 year ago

    Hi Derek, how should we handle the failure of a service that has updated its database but failed before sending an event/reply?

    • @CodeOpinion
      @CodeOpinion  1 year ago

      One option is the outbox pattern: th-cam.com/video/u8fOnxAxKHk/w-d-xo.html

  • @BonnakChea
    @BonnakChea 2 years ago +2

    Thanks for the video. It helps me a lot. However, I couldn't find one that supports orchestration for Node.js. I'd really appreciate it if I could get a stable one.

    • @loren-sr
      @loren-sr 2 years ago +1

      Temporal's Node/TypeScript SDK is pretty stable, and hitting v1 soon (which won't have any major breaking changes).

    • @BonnakChea
      @BonnakChea 2 years ago

      @@loren-sr Thanks a lot.

  • @dmsanz_youtube
    @dmsanz_youtube 1 year ago

    To keep that workflow "state" is it necessary to use some kind of saga or similar? Or is it enough with having service Ordering state (i.e: aggregate) capturing all the "distributed state" in order to react with compensating actions, etc? Or would that be too aware of other domains and a violation of separation of concerns?
    I suppose sagas (with MassTransit, NServiceBus) help a lot with these things if we want to have this external orchestration. But what if we don't use a message broker and we simply have event streams the services are subscribed to?

    • @CodeOpinion
      @CodeOpinion  1 year ago

      What you're referring to is more event choreography, where you don't have a centralized orchestrator, but each boundary is consuming events and publishing events. Check out: th-cam.com/video/TA12e2ZJcGg/w-d-xo.html

  • @saharis811
    @saharis811 1 year ago

    Great video as usual, Derek!!
    I have one question. Suppose one of the services in the workflow is unavailable and becomes a bottleneck. It's not a failure at that point in time, but we don't know the outcome until it is available again, processes the request, and publishes success or failure. That can happen instantly or after days; we don't really know.
    We need to respond to the client instantly. How can we handle this from the client side?

    • @CodeOpinion
      @CodeOpinion  1 year ago

      Well the key is you don't/shouldn't need to respond to the client instantly. The simplest example is placing an order. You don't need the entire workflow to succeed/fail to respond to the client. You accept the request to place the order, you do so and return to the client. The payment processing and everything else involved can be async/out of process. If the payment service is unavailable or the payment fails, you'd email the customer to notify them. I talk about this a bit more in this video: th-cam.com/video/wEUTMuRSZT0/w-d-xo.html

  • @ItzukiTheDemon
    @ItzukiTheDemon 2 years ago

    How would you notify the client about the success or failure? Is your depiction here a fire and forget from the “client” to “ordering”?

    • @CodeOpinion
      @CodeOpinion  2 years ago +2

      It depends on what the workflow is and if the client is "aware" of what's actually happening. In the case of an Order, you simply tell them the order was placed initially. If there is an issue with their credit card or payment, you can email them etc. It really depends on the exact use case. Long running process means it can take milliseconds or it can take days or weeks even. It all depends on what the workflow is. Check out the video I did on using WebSockets as a means to push down to the client. th-cam.com/video/Tu1GEIhkIqU/w-d-xo.html

    • @ItzukiTheDemon
      @ItzukiTheDemon 2 years ago

      @@CodeOpinion I’ll definitely check out the video. Thank you!

  • @maxkomarow
    @maxkomarow 2 years ago +1

    Thanks for the video. Isn't it a saga orchestration pattern that you described? And also I wonder how the orchestrator can be implemented without frameworks. It seems like it has to have its own tables in a service database to check completed events. If so, we have to have something to get and save the orchestrator. Repository? But the orchestrator doesn't seem like a part of the domain model, more like a part of an application service. Would be glad to hear your thoughts on this.

    • @CodeOpinion
      @CodeOpinion  2 years ago +4

      It fits more on top of messaging infrastructure. There can be state involved which is needed if you want to keep track of which events have occurred or something in their contents to then send a command. If state is involved, you'll likely use a library to handle this. The alternative is event choreography, which I will cover independently in a separate video.

    • @juliancyanid
      @juliancyanid 2 years ago

      @Petar Vucetin aka "Routing Slip"-pattern. My first goto when flow is linear, especially if team/codebase has no existing orchestration (it's not only "stateless", it's also dead simple).
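
For a linear flow, the routing slip idea mentioned here can be sketched in a few lines. This is a toy, in-process illustration rather than MassTransit Courier's API; `RoutingSlip`, `process_next`, and the two step functions are hypothetical. In a real system each step would live in a different service, which would forward the slip through the broker.

```python
from dataclasses import dataclass, field

@dataclass
class RoutingSlip:
    itinerary: list                          # step names still to run, in order
    log: list = field(default_factory=list)  # step names already completed

def charge_payment(data):
    data["paid"] = True

def reserve_stock(data):
    data["reserved"] = True

# Hypothetical registry mapping step names to the code that performs them.
STEPS = {"charge_payment": charge_payment, "reserve_stock": reserve_stock}

def process_next(slip, data):
    """Run the next step on the slip; return True while steps remain."""
    if not slip.itinerary:
        return False
    step = slip.itinerary.pop(0)
    STEPS[step](data)
    slip.log.append(step)
    return bool(slip.itinerary)

slip = RoutingSlip(itinerary=["charge_payment", "reserve_stock"])
order = {"id": "order-1"}
while process_next(slip, order):
    pass
```

The slip carries the whole plan with the message, which is why no central orchestrator state is needed for the linear case.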

  • @maartenlouage
    @maartenlouage 2 years ago

    How about using Azure Durable Functions for orchestration?

    • @katjejoek
      @katjejoek 1 year ago

      That's what I was thinking as well. I guess it depends on how the asynchronous communication is done between the parts. What happens if something is not available? Will it be picked up once it is back or will it fail immediately?

  • @AntonioRonde
    @AntonioRonde 2 years ago +1

    Why do we place the workflow inside a service like ordering instead of in the broker? This way the start of a workflow would go the same route as any ongoing workflow. Don't know if this is possible / advisable, just wondering about any tradeoffs.

    • @CodeOpinion
      @CodeOpinion  2 years ago

      Not entirely sure what you mean by "in the broker". Ultimately you're using a broker in more of a "hub and spoke" way to broker messages between services. How you interact and handle those messages will be something you need to write in code. Can't say I'm a fan of using anything that's based more on markup/metadata like JSON to define the workflow.

    • @juliancyanid
      @juliancyanid 2 years ago +1

      My 2 cents:
      In this example, probably the _business_ of 'ordering' depends on the (business) capabilities of payment and warehouse. If your business had a "sales" boundary, maybe _that_ would talk to payment. Business workflows belong to some business context, and are bound to change with the way the business works. That's also why you don't want the flow logic (orchestration) in some generic component.

    • @AntonioRonde
      @AntonioRonde 2 years ago

      @@CodeOpinion Yes, we use the broker as a "hub and spoke" for messages. Except for the invocation of the workflow at 6:15, there the client directly invokes the workflow at a service and doesn't follow the route via the broker. My question is why not use the broker here? Would it make sense to also relay this first message/call that invokes the workflow via the broker?

  • @pelaoinfo
    @pelaoinfo 1 year ago

    I'm studying this in depth since I'm modeling a whole system, but I still can't understand whether orchestration is async (message queues) or sync (req/res approach). Most of the info I've found says orchestration is sync req/res.

    • @CodeOpinion
      @CodeOpinion  1 year ago

      I'd say more often than not it's async because of resilience and failures. Check out th-cam.com/video/LMKVzguhFw4/w-d-xo.html

  • @lilivier
    @lilivier 9 months ago

    But what if, after creating your order in the database, you never receive a message back from the next service? You have a 'ghost' order still committed in the database. How do you handle that?

    • @JosueIbarraNinja
      @JosueIbarraNinja 4 months ago

      You generally have a timeout and retry limits set as parameters when starting your workflow or calling your next activity.
      In Cadence, the cadence service (broker) ensures to follow up on the pending workflow. Even if the codebase changes, you can replay the workflow in its entirety
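
The retry-limit part of this answer can be illustrated with a small helper. This is a generic sketch and not the Cadence or Temporal API: `run_with_retries`, `ActivityFailed`, and `flaky_reserve_stock` are hypothetical names, and timeouts are left out for brevity.

```python
import time

class ActivityFailed(Exception):
    """Raised when an activity exhausts its retry limit."""

def run_with_retries(activity, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call activity(), retrying with exponential backoff up to max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except Exception as error:  # a real system would retry only transient errors
            last_error = error
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise ActivityFailed(f"gave up after {max_attempts} attempts") from last_error

calls = {"count": 0}

def flaky_reserve_stock():
    """Hypothetical activity that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("warehouse unavailable")
    return "reserved"

# The sleep function is injected so the example runs without waiting.
result = run_with_retries(flaky_reserve_stock, sleep=lambda seconds: None)
```

Once the limit is exhausted, the workflow can decide what to do with the 'ghost' order, for example cancelling it or notifying the customer.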

  • @vicradon
    @vicradon 2 years ago

    So an example of such a workflow orchestration tool would be Airflow, right?

    • @CodeOpinion
      @CodeOpinion  2 years ago

      Yes as well as other tools like Temporal or messaging libraries that provide a stateful saga.

  • @andreashe36
    @andreashe36 2 years ago

    Can you make a video on how to aggregate when models from a different domain are needed? E.g. if I need summary sales of a product as a projection. But product and order may be processed in different domains and different event stores. How does the order get info about product details?

    • @CodeOpinion
      @CodeOpinion  2 years ago +1

      Ya good suggestion.

    • @mrxscheia
      @mrxscheia 2 years ago

      There’s a great video on this topic from Mauro Servienti th-cam.com/video/hev65ozmYPI/w-d-xo.html

  • @morgadoapi4431
    @morgadoapi4431 2 years ago +1

    Thanks for the video

  • @ThugLifeModafocah
    @ThugLifeModafocah 2 years ago

    But what happens when the broker is broken?

    • @CodeOpinion
      @CodeOpinion  2 years ago

      Use an outbox: th-cam.com/video/u8fOnxAxKHk/w-d-xo.html

  • @sangmin7648
    @sangmin7648 2 years ago

    The comparison always seems to be between synchronous api call and asynchronous messaging. But how about asynchronous api call? Could there be any advantage of using async api call over messaging (other than ease of infra management maybe)?

    • @CodeOpinion
      @CodeOpinion  2 years ago +1

      Meaning something like gRPC's non-blocking async? It's still service to service direct calls. If you call a service and it's unavailable, or you call a service async non-blocking, and your own service fails and doesn't get the reply. Adding messaging that's durable in the mix removes services needing to be online.

    • @sangmin7648
      @sangmin7648 2 years ago

      More like a REST API call on a separate thread. I was thinking about something like an outbox pattern, but instead of sending the event to the queue, use a REST API call on a separate thread. With this approach, other services don’t have to be online.

    • @CodeOpinion
      @CodeOpinion  2 years ago +1

      Yes to a degree. The problem with service to service is if you do it, it can become viral. A calls B that calls C that calls D. At some point that becomes unreliable. Rather deliver a message to a broker and then you're done. I have a video coming out in a few weeks that illustrates where I think rpc and service to service is viable.

    • @sangmin7648
      @sangmin7648 2 years ago

      Thanks for the reply. I’ll think about how separate thread rest api call can(or cannot) work in the situation you mentioned. I was wondering about this because my company’s going the separate thread route. Having watched your videos, it felt awkward, but as a clueless junior couldn’t make an argument against it. Thanks again for the great video as always

    • @evilroxxx
      @evilroxxx 1 year ago

      My application currently works exactly as you described. One API asynchronously awaits another API's response and, once it is received, continues the rest of its own execution. However, this is called temporal coupling and must be avoided. Your first API cannot proceed with the success flow or error flow until it receives a response, and it throws an exception if it times out. So it’s not exactly an asynchronous execution if it’s waiting for something to finish. Hence this complex event-driven architecture is favorable, because not only can you control the execution flow, but you can also pick up where you left off if the consumer of a message is momentarily unavailable and then comes back alive after a short period.

  • @selvakumars6487
    @selvakumars6487 2 years ago +1

    Hi Derek, is this not choreography, as there is no master process coordinating the flow like in orchestration?

    • @CodeOpinion
      @CodeOpinion  2 years ago

      This is orchestration as there is something controlling the workflow. I'm using a combination of sending commands and consuming events (or replies) in the orchestration. Choreography would be each service consuming events and publishing events without any central knowledge of the workflow.

    • @chengchen9032
      @chengchen9032 2 years ago

      @@CodeOpinion I have the same question about the video, because I was assuming there would be a master process with central knowledge of the workflow, which would use that knowledge to pick up events from the MQ, convert those events to specific commands, and then put them back on the MQ. So the chain would go like
      service A -> Event -> MQ -> Orchestrator -> Command -> MQ -> service B.
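
The chain above can be simulated in a few lines to make the roles concrete. This is a toy sketch under stated assumptions: an in-memory `deque` stands in for the MQ, and the event and command names (`OrderPlaced`, `ProcessPayment`, and so on) are hypothetical.

```python
from collections import deque

broker = deque()  # stand-in for a durable message queue

# Central knowledge of the workflow: which command follows which event.
NEXT_COMMAND = {
    "OrderPlaced": "ProcessPayment",
    "PaymentProcessed": "ReserveStock",
    "StockReserved": "CompleteOrder",
}

# Each hypothetical service turns one command into one event.
SERVICE_REPLIES = {
    "ProcessPayment": "PaymentProcessed",
    "ReserveStock": "StockReserved",
}

def orchestrate(event):
    """Orchestrator: consume an event, put the follow-up command on the broker."""
    command = NEXT_COMMAND.get(event["name"])
    if command:
        broker.append({"kind": "command", "name": command, "order_id": event["order_id"]})

def handle_command(command):
    """Service: execute a command and publish the resulting event, if any."""
    reply = SERVICE_REPLIES.get(command["name"])
    if reply:
        broker.append({"kind": "event", "name": reply, "order_id": command["order_id"]})

def run(first_event):
    """Drain the broker, recording the order in which messages are consumed."""
    trace = []
    broker.append(first_event)
    while broker:
        message = broker.popleft()
        trace.append(message["name"])
        if message["kind"] == "event":
            orchestrate(message)
        else:
            handle_command(message)
    return trace

trace = run({"kind": "event", "name": "OrderPlaced", "order_id": "order-1"})
```

Only `NEXT_COMMAND` knows the order of the steps; the services react to commands without knowing what comes before or after them, which is the difference from choreography.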

    • @CodeOpinion
      @CodeOpinion  2 years ago +1

      @@chengchen9032 Exactly. There are different tools/libraries/frameworks that can help you orchestrate. In a bunch of my videos I use NServiceBus, which is a .NET library, but you can also use other tooling such as Temporal (temporal.io) that has SDKs for a bunch of languages/platforms.

    • @selvakumars6487
      @selvakumars6487 2 years ago

      I got it: the ordering service (the one with the workflow label) owns the workflow and orchestrates it. I assume that if there is any failure, the same flow is expected to issue compensations for all the involved parties.

    • @CodeOpinion
      @CodeOpinion  2 years ago

      Correct
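
The compensation behaviour confirmed here can be sketched as a list of (action, compensation) pairs, where a failure undoes the completed steps in reverse order. This is a simplified, synchronous illustration with hypothetical names (`run_workflow`, `fail_reserve_stock`); a real orchestrator would send compensating commands through the broker.

```python
def run_workflow(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        # Compensate in reverse order, only for steps that actually completed.
        for compensation in reversed(completed):
            compensation()
        return "compensated"
    return "completed"

log = []

def fail_reserve_stock():
    """Hypothetical warehouse step that fails."""
    raise RuntimeError("out of stock")

outcome = run_workflow([
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (fail_reserve_stock, lambda: log.append("release stock")),
])
```

Note that the failed step itself is not compensated, since it never completed; only the earlier payment step is undone.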

  • @thedacian123
    @thedacian123 1 year ago

    As far as I understood, the orchestrator is a physical app from a given boundary, which is dumb and cannot contain business logic. Aren't you going to run into a single point of failure issue if this app were to fail? Thank you!

    • @CodeOpinion
      @CodeOpinion  1 year ago

      Generally the logic is around consuming events and sending commands. Generally no I/O or db access. If the service owning the orchestration is down, then yes it won't execute until it's back up

  • @evilroxxx
    @evilroxxx 1 year ago

    What is the definition of a service here? Are Ordering, Payment, and Warehouse all single API endpoints? Are they console applications running processes in the background?
    If they’re APIs, then how do they pick up generated events and process them?
    If they’re background processes, how’s the client going to communicate with them?
    Let’s say the Ordering service is going to communicate with products and/or pricing and/or coupons or inventory services. Is it going to send messages on the broker and wait for a response or an event from them, and will that event contain the required data, or will it need to do something else to get the data to prepare the order?
    How’s all this going to work at a scale of, say, millions of orders daily? Won’t that overwhelm the broker? What if the queues get so full that they cannot accept any more commands? What’s going to happen to the data then?
    Can you please shed some light on these points?
    As always, I’m a super fan of your videos, Derek. Thanks so much for enhancing the community’s knowledge🎉

    • @CodeOpinion
      @CodeOpinion  1 year ago

      My definition of a service is "the authority of a set of business capabilities." They could be a combination of HTTP APIs as well as separate processes that are consuming messages off a queue/broker. A lot of your questions I actually cover in a ton of videos. You're about to go down a rabbit hole :) Here are some that might be helpful.
      re: scaling message processing - th-cam.com/video/xv6Ljbq6me8/w-d-xo.html
      re: communicating to client from background processing: th-cam.com/video/Tu1GEIhkIqU/w-d-xo.html

  • @marcelbricman
    @marcelbricman 11 days ago

    your line of argument is that it's impossible to roll back in a distributed way and then magically with messages it works - calling BS: the problem is still exactly the same. don't get me wrong, temporal decoupling is good, also the centralised workflow engine can be a great benefit, but your argument does not illustrate the point of all that. if every service can perform rollbacks asynchronously, there are plenty of other ways to resolve this problem

  •  1 year ago +1

    You forgot idempotence.

    • @CodeOpinion
      @CodeOpinion  1 year ago +2

      Messaging is a broad topic. I've got many videos that focus specifically on various aspects, one of them being idempotence.
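
For readers unfamiliar with the term, an idempotent consumer can be sketched as follows. It assumes every message carries a unique ID; `handle_payment_received`, `processed_ids`, and `balance` are hypothetical, and in a real service the processed IDs would be stored in the same database transaction as the side effect.

```python
processed_ids = set()       # in production this would live in the service's database
balance = {"order-1": 0}

def handle_payment_received(message):
    """Apply a message at most once, keyed by its unique message ID."""
    if message["id"] in processed_ids:
        return False  # duplicate delivery: skip the side effect
    balance[message["order_id"]] += message["amount"]
    processed_ids.add(message["id"])
    return True

msg = {"id": "msg-1", "order_id": "order-1", "amount": 50}
first = handle_payment_received(msg)
second = handle_payment_received(msg)  # the broker redelivers the same message
```

This matters because brokers and retries typically give at-least-once delivery, so the same message can arrive more than once.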

  • @haskell3702
    @haskell3702 1 year ago

    In this case the Ordering will be the orchestrator? Does this mean that Ordering has to know Payment and Warehouse (which does not look good)? Or the orchestrator should be an independent microservice of these 3?

    • @maximfateev2369
      @maximfateev2369 1 year ago

      It depends on the orchestrator's implementation. It can be separate, or each of the services can own its own orchestration and execute its own operations and compensations.
