We were literally just having this conversation with another team within our company. We have separate, distributed "feature slices" with their own data. And their product is a monolithic code base that suffers from all of the issues that we expect. I had to field lots of "but what if" questions. I shall be forwarding this video to them as a much better explanation! Thanks!
Glad it helped! Thanks for sharing!
An interesting alternative that I love is using the Saga orchestration pattern (not the choreography one) to handle a situation like your ordering and payment example. You can still use eventing as the way of communication between your bounded contexts, but it helps a lot with tracking/monitoring a full business use case. And naturally, it helps with troubleshooting and data/process repair if required, for example when a business flow halts halfway through. Big contrast compared to just multiple systems raising events and cascading all the effects everywhere, where it's too easy to lose track of everything.
Agreed. I compare and contrast both choreography and orchestration using a code example here: th-cam.com/video/rO9BXsl4AMQ/w-d-xo.html
I also have a few more recent videos talking about both explicitly.
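To make the orchestration idea above concrete, here is a minimal sketch of a saga orchestrator coordinating the ordering and payment boundaries. The service clients and method names are hypothetical; a real saga would persist its own state and communicate over a broker rather than through in-process calls.

# Minimal saga-orchestrator sketch (hypothetical services, in-process only).
# A real implementation would persist saga state and use a message broker.

class PaymentFailed(Exception):
    pass


class CheckoutSaga:
    def __init__(self, ordering, payment):
        self.ordering = ordering  # owns order data
        self.payment = payment    # owns payment data

    def run(self, order_id):
        # Step 1: mark the order as ready for payment collection.
        self.ordering.mark_ready_for_payment(order_id)
        try:
            # Step 2: ask the payment boundary to charge the saved payment method.
            self.payment.charge(order_id)
        except PaymentFailed:
            # Compensating action: the orchestrator knows the whole flow,
            # so a halted business process is easy to spot and repair.
            self.ordering.cancel(order_id)
            raise
        # Step 3: complete the order.
        self.ordering.complete(order_id)

Because one component drives the whole use case, monitoring and repairing a half-finished flow is a matter of inspecting that saga's state rather than chasing cascading events.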
Awesome video. This is the best explanation of databases in a microservice architecture. Love the different patterns and examples, especially the BFF and the order-payment scenario. Learned so much in this short video. Thank you!
Glad it was helpful!
We are currently in the process of decomposing our monolith. We have individual services but one database. In order to act like we have separate databases, we have separate schemas per service. Every service has shared code for accessing the authorization databases. Eventually, when we actually split apart the database, this will change, but for the time being it is a nice solution.
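A rough sketch of that interim setup, assuming Postgres and psycopg2; the schema and role names are invented for illustration, and the service roles are assumed to already exist.

# Sketch: one physical Postgres database, one schema per service,
# each service role can only write to the schema it owns.
# Schema/role names (ordering/payment) are hypothetical.
import psycopg2

ddl = """
CREATE SCHEMA IF NOT EXISTS ordering;
CREATE SCHEMA IF NOT EXISTS payment;

-- Each service connects with its own role (roles assumed to exist).
GRANT USAGE, CREATE ON SCHEMA ordering TO ordering_svc;
GRANT USAGE, CREATE ON SCHEMA payment  TO payment_svc;

-- No cross-schema grants: payment_svc cannot touch ordering tables,
-- which keeps a later physical split mostly a connection-string change.
"""

with psycopg2.connect("dbname=app user=admin") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)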
I really like this approach, but one thing I still have doubts over is the last part. I'm genuinely asking, as I'm unclear on the best way to do this; any advice appreciated.
In the checkout flow described at around 10:00, the client has to pass order information to the order service, but also has to pass some of that information to the payment service, which puts the responsibility of that extra step on to the client and thus the developers.
You say the order service can pass the ID to the payment service to mark the order as ready to have payment collected, but rather than passing the ID, wouldn't it be better to pass the order summary information, so that when the client then calls the payment service it already has that locally?
That way the client has to call the order service once in a normal flow, and then the payment service, but doesn't need to have multiple calls to each - easier for everyone? Have I missed something? Advice appreciated, and thanks for all your awesome videos!
You have options such as an API gateway to do the routing, or my preferred method, which is to use hypermedia to guide the client so that the URIs become opaque. Check out: th-cam.com/video/OcWa0WJBF2U/w-d-xo.html
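To illustrate the hypermedia suggestion, here is a small sketch of what the ordering boundary might return so the client simply follows links instead of knowing which service owns which URI. The link relations, field names, and URIs are made up.

# Sketch: the ordering boundary returns hypermedia links telling the client
# what it can do next; URIs stay opaque and can point at any service.
# Field names and URIs are hypothetical.

def order_representation(order_id: str, ready_for_payment: bool) -> dict:
    links = {"self": {"href": f"/orders/{order_id}"}}
    if ready_for_payment:
        # The client doesn't need to know this is served by the payment service.
        links["collect-payment"] = {"href": f"/payments/for-order/{order_id}", "method": "POST"}
    return {"orderId": order_id, "_links": links}


print(order_representation("abc-123", ready_for_payment=True))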
Thanks, we had a similar situation, so we created boundaries in our DB with a schema for each microservice. A service can read, but not write to, a schema it doesn't own.
On the command side, if the client needs to save data to both boundaries, for example on a single button click, how would you handle data inconsistency if the request to another boundary fails? Handling transactions spanning multiple HTTP requests in the UI, or creating compensating actions in the backend, seems like a pain in this case.
Something like a reservation. Check out this video: th-cam.com/video/PZm0RQGcs38/w-d-xo.html
@@CodeOpinion I need to watch that again, thx :)
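For context, a very rough sketch of the reservation idea referenced in that reply, assuming the first boundary holds a time-limited reservation that only becomes permanent once the second boundary succeeds; the store and method names are hypothetical.

# Sketch of a reservation: boundary A reserves (with an expiry) instead of
# committing outright, and only confirms once boundary B succeeds.
# inventory / payment objects and their methods are hypothetical stand-ins.
import time
import uuid

RESERVATION_TTL_SECONDS = 15 * 60

def place_order(inventory, payment, sku: str, card: dict):
    reservation_id = str(uuid.uuid4())
    # Boundary A: reserve stock; it expires on its own if never confirmed,
    # so there's no dangling state to clean up if the next step fails.
    inventory.reserve(reservation_id, sku, expires_at=time.time() + RESERVATION_TTL_SECONDS)
    try:
        payment.charge(card)                  # Boundary B
    except Exception:
        inventory.release(reservation_id)     # best-effort compensation
        raise
    inventory.confirm(reservation_id)         # flips the reservation into a real allocation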
You didn't describe the scenario where: i) The Schema and DB Write Access is limited to a single service; and, ii) Other services are given read access through VIEWS, enabling JOINs and orchestration in the RDBMS.
Wouldn't recommend integration via the DB.
@@CodeOpinion why?
@@ugoryny4 It's all about context and data ownership. You want to ensure each service boundary only accesses its own schema.
@@CodeOpinion While my own intuitions would advise against this as well, this is an extremely unsatisfying answer. Yes, why indeed not? There might be good reasons, but none has been presented here.
We've already accepted that sharing infrastructure is fine. So e.g. potential resource contention is ignored.
The point regarding data ownership is also moot. How is a view (materialized or not) constructed in this way purely for integration purposes conceptually different, say, from a schema designed for an API on the service layer? A different access protocol obviously, but what else?
In my view this is also similar to an outbox pattern in an event-driven architecture. You still own your internal database schema (data on the inside), the view is just a public interface to that schema which you can evolve separately (data on the outside). The view can even live in a schema of its own.
Furthermore, the monolithic nature of a gateway / BFF was also brushed aside.
How you persist data is an implementation detail of your logical boundary. Tying consumers to it then limits everything tied to it. Sharing infrastructure is very different than exposing internals. Now if you want to expose a DB that's a projection explicitly for consumers, that's an option. Versioning might be more of a nightmare, but that's a different story. Regarding the outbox pattern, that's about consistency; I think you're more referring to CDC. Lastly, about a BFF, that's not glossed over in the grand scheme of my channel. Unfortunately I can't talk about everything related in a single video, so some details are intentionally glossed over.
@3:00 I'm currently building a browser-based game using a micro-service architecture with a shared database. I don't really see the issue, because each service is doing something different, right? I have a service that handles anything related to accounts and authentication, I have a service that handles everything related to the concept of the player's "character". I have a service for in-game items and combat. When it comes to the database there isn't much crossover. Even though it's a single database, the responsibilities of each service are clearly defined and so that service will always be the one to make changes to the database as they relate to each service. Service B will never try to update an account's email address. Service A will never try to alter a character's level.
Why would anyone ever build services that each attempt to alter the same data in a shared database? The point of microservices is to say "This service does this, specifically"
It's a good question why people would do it, but they do. I'd argue more large systems are built that way than not, with hundreds to thousands of tables and a free-for-all.
I work in such a company. Data ownership is nonexistent. Then management wonders why there are so many bugs in the app and why it's so hard to extend.
The BFF approach is the way to go when composition is needed, but in some cases we need to show data in a grid or some other format that is heavily dependent on how the data is structured, thus impacting performance. In that case I tend to aggregate (by listening to events) in the BFF service itself; there we can optimize indexes, cache data, and so on. What do you think about that?
Yes replicating data via events is the way to go. A service should be able to function on its own and have all data it needs to do this. If you need ALL data then your boundaries are incorrect.
There isn't a one-size-fits-all approach. Usually it's a combination depending on needs. The statement above, "A service should be able to function on its own and have all data it needs to do this," I think is valid on the command side, not so much the query side. Hence needing composition.
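As a concrete illustration of composition on the query side, here is a minimal sketch of a BFF stitching a view model together from two boundaries (shown with httpx; the endpoints and field names are invented). In practice you might swap the live calls for the locally aggregated read model the commenter describes.

# Sketch: a BFF composing a single view model from two boundaries.
# Endpoints/fields are hypothetical; error handling and caching omitted.
import asyncio
import httpx

async def get_order_view(order_id: str) -> dict:
    async with httpx.AsyncClient() as client:
        order_resp, payment_resp = await asyncio.gather(
            client.get(f"http://ordering/orders/{order_id}"),
            client.get(f"http://payment/payments/by-order/{order_id}"),
        )
    order, payment = order_resp.json(), payment_resp.json()
    # Each boundary only contributes the data it owns.
    return {
        "orderId": order_id,
        "items": order.get("items", []),
        "paymentStatus": payment.get("status"),
    }

# e.g. asyncio.run(get_order_view("abc-123")) from a request handler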
The event-driven approach is fairly acceptable. However, there are situations where it can pose challenges:
1. Any event published as a side effect requires the recipient service to manage the state of the affected entity, in-memory or using some other persistence (additional cost).
2. If the recipient service restarts for any reason, the state built by the recipient service is either lost entirely or becomes inconsistent. If it were to rebuild the state from a persistent queue of events, then the startup of the recipient service needs to synchronize state rebuilding with readiness to accept its own requests from external clients (sketched below).
3. Also, events should carry enough context (setting aside the schema of events changing across releases), so that the recipient service isn't forced to do a lookup elsewhere to build its internal state.
So it's never a silver bullet. Event-based data transfer shines in use cases where events are relatively infrequent and the footprint and mutation rate of the state maintained by the recipient service are minimal. For example, if the recipient manages a map of IP ranges for a given geography, it seldom changes, and updating such a map via events is practical, whereas if the map is the real-time stock price of a given stock code, it would be a big issue.
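A minimal sketch of point 2, assuming the recipient keeps a local projection and replays a durable event log on startup before reporting itself ready; the event shape and the log source are hypothetical.

# Sketch: a consumer rebuilds its local state from a durable event log on
# startup and only then starts serving its own clients.
# `event_log` (an iterable of dicts) and the event shape are hypothetical.

class GeoIpProjection:
    def __init__(self):
        self.ranges_by_region = {}
        self.ready = False

    def apply(self, event: dict):
        if event["type"] == "IpRangeAssigned":
            self.ranges_by_region.setdefault(event["region"], []).append(event["cidr"])

    def rebuild(self, event_log):
        # Replay everything before accepting traffic, so reads are consistent.
        for event in event_log:
            self.apply(event)
        self.ready = True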
Thank you ! Great content and super clear explanations ! Really made it click!
One question I have is regarding the relationship between schemas belonging to different services.
Should relationships like one-to-one or one-to-many still be used across services' schemas to get cascading or other handy features?
One to one. A service owns its schema and the data in that schema.
Hi, thanks for the great content. At 4:15, what do you do to prevent circular dependencies when a service needs to call another service? I can imagine a case where a service needs data from another service, and that service in turn calls the first service...
This is exactly why you don't want direct service-to-service communication.
Really nice video. Thanks for sharing; I really like the way you explained it. Wondering what the disadvantages would be of having a single physical instance with a different schema per service, before we move on to a different physical instance and schema for each microservice?
One disadvantage is you can have one service performing queries that consume all the resources of the DB instance (e.g., CPU), which then degrades the other services because their queries are slow or time out.
Why not use the aggregator pattern for this? You pull data from each service individually and, using a composite microservice, combine the data the way the user wants and send it back. Since it is business data, an API gateway wouldn't be the right implementation compared to the aggregator pattern with a composite microservice.
Curious about your example. If instead you had two payment type options with 2 services which have shared core fields like amount, account, etc. but different "specialized" fields associated with them, like Zelle wants 'email' and credit card wants 'ccv'… would you just agree on a canonical model for the core values and still deploy them separately?
Or would you make a composite operation that uses a single service for core values
I wouldn't "shared core fields like amount, account, etc.". Each service should own specific pieces of data that related to the underlying capabilities it provides.
Great video, as always!
Question, if I'm using CQRS and my service only has permission (database grant) to execute queries on the shared tables/databases, would it still be a problem?
For schema/data your service owns, that's fine; for schema/data it doesn't own, it's still a problem.
I'm not quite sure what you're asking. CQRS suggests that your services _don't_ share tables, ie you've created data representations for different services so it isn't clear to me why shared tables are then mentioned. So I'll answer generally.
Allowing service B to query a schema owned by service A is going to cause problems (unless the schemas are _very_ stable).
A compromise I have considered (assuming relational DB) is for service A to expose _views_ to which service B is granted access. That allows A to evolve as long as the view can be mapped to the new representation of data. It isn't complete decoupling, ie isn't as elegant as CQRS-style solutions but might be suitable for some environments.
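To make that compromise concrete, here is a sketch of service A publishing a dedicated view in its own integration schema and granting service B read-only access; all schema, view, and role names are invented, and it's shown as Postgres DDL issued from Python.

# Sketch: service A keeps its internal tables private but exposes a stable
# view in a separate "integration" schema that service B may only SELECT from.
# All names are hypothetical.
import psycopg2

ddl = """
CREATE SCHEMA IF NOT EXISTS ordering_public;

CREATE OR REPLACE VIEW ordering_public.orders_summary AS
SELECT o.id AS order_id, o.status, o.total_amount
FROM ordering.orders o;            -- internal table can change shape freely

GRANT USAGE  ON SCHEMA ordering_public TO payment_svc;
GRANT SELECT ON ordering_public.orders_summary TO payment_svc;
"""

with psycopg2.connect("dbname=app user=ordering_owner") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)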
For the communication between the BFF and services, what would you use? At the end of the day the BFF is still a service, isn't it?
I don't view it as a service. I view a service as a collection of grouped functionality that has explicit data ownership behind it. BFF is often just a proxy as described for things like view composition, routing, etc.
@@CodeOpinion Got it. But is the communication between them done via HTTP requests too?
Likely something synchronous request/response like HTTP or gRPC.
Hi, do you have any video about communication between domains in DDD? I found something like "shared kernel" but it doesn't really answer my concern. Could you make a video about this? Thanks.
I tend to leverage events to communicate between boundaries. I have a bunch of videos on them but as a primer related to Events: th-cam.com/video/qKD2YUTJAXM/w-d-xo.html
So if I suddenly have 2 DBs in a single microservice... should I then split the microservice into 2 microservices?
A lot of people suggest using GraphQL to manage those boundaries. Are there other things that help with working out boundaries?
Not sure why you'd split it? If you have two DBs within a boundary, I'd assume the reason is because they are for different use-cases. E.g., you might be using an event store for writes and a document or relational DB for reads. Or perhaps you have a document store for some data that fits more into that model for both reads/writes, and a relational DB for another subset of data. If the capabilities provided by the service are cohesive and they use both databases, then so be it.
@@CodeOpinion So it sounds like manageability trumps whether you have to cross a boundary or not (where splitting boundary decisions are extremely valuable).
Could this be the same for shared DBs? If so it comes back to the 2nd part of that question... What tools are good for helping to determine value in boundaries? Graphql doesn't seem to sound like a catch-all
@@baseman00 Not sure I totally understand the question. GraphQL used as a tool for composing data from multiple boundaries?
@@CodeOpinion what part of the question would you like clarification around?
Not sure what GraphQL has to do with determining boundaries.
Suppose I have a service that exposes an API which allows me to make some state changes. The data is then used by another service (a "worker") that acts upon that data to do something when it receives an event from elsewhere. The API service and worker service are deployed separately purely for scaling reasons - say I want to have just 1 instance of the API and 10 instances of the worker. Do you think this is a valid use case for having two services access the same schema and database?
If API and worker belong to the same domain, then yes, they should share the same schema
Exactly. @Ronul Don't confuse physical boundaries with logical boundaries. Check out this video: th-cam.com/video/Uc7SLJbKAGo/w-d-xo.html
Thanks both! Bettered my understanding today.
Great video, but I don't think we should dismiss updating all of the microservices at the same time. In that case, ops takes ownership of the data. I think it's a valid approach in some situations.
If you have to change/deploy services together, you have a distributed monolith. I'd argue that's worse on many levels (observability, deployment, versioning). What would be a valid use case for it?
@@CodeOpinion For one thing, that's just where we are in our product's lifecycle. Maybe it's not ideal, but it works well for our needs, and we don't seem to have problems with observability, deployment, or versioning. Maybe we'll never know how much we're actually suffering until we experience the benefits of a true microservice architecture. For the time being, this feels natural and sustainable, and we don't have to worry about the pain points that you made this video about.
@@CodeOpinion And by the way, your channel has the best architectural content on TH-cam, and I've come a long way recently in part because of it.
I think that's fair. However my concern would be as it evolves and how that could hold you back. Appreciate the comment!
The problems will probably become apparent once you have lots of teams all working on their own services, and each needing to deploy at different rates, where one team may have to wait for another team to deploy
What happens if you want to update your DB...
"Update" it how? Change it? Then change it. Move data for the logical boundary you want to move. Leave the rest intact
@@CodeOpinion Update the version, or something where you need to turn it off and then on. It'd be better handled by having 1 DB per microservice; only that microservice would be affected.
@ That's the trade-off. Less infrastructure, less granularity.
Distributed Event Driven Baby, all the good stuff!
Great video, thanks!
Glad you liked it!
This is great!
"a large system" is subjective
As long as you're sharing data you will always have a shared database. What are the alternatives to sharing tables in a Postgres instance, for example? You can ask a microservice synchronously for the data it owns via RPC, as shown in your first example. This is practically the same as reading from the Postgres server directly: you have to make the same kind of synchronous request to read from a database and I would argue that calling a mature database product is actually much more reliable than some homegrown microservice. If you use RPC for data integration, then the callee microservice will become your unreliable database server. The other option is to use some kind of event-driven paradigm as outlined in your second example. Whether you use Kafka, EventStoreDB or whatever, they are all databases too. Kafka is a log-structured database with limited querying capabilities, but it is still a database. They say that Kafka is asynchronous, but that is also misleading: when you want to read an event from the Kafka log you still have to make a TCP connection to the Kafka cluster to retrieve your event and thus you are synchronously coupled to the uptime of your Kafka cluster (i.e. database server) just the same as with postgres.
We have established that both shared relational databases and event-carried state transfer over a message bus are integration through a database system. So what is the real difference? The key point is that Kafka's limited querying capabilities _force_ you to implement some kind of CQRS where readers have a materialized view of the event stream. This does two important things: Firstly, it decouples the reader from the availability of the Kafka cluster (i.e. database server) since the reader can continue to read from its materialized view/cache even when the Kafka cluster is down. Secondly, the writer who owns the data also needs a materialized view if he wants to read it. This naturally leads to decoupling of the writer's internal data model and the schema of the integration events; the event schema can evolve independently of the writer's business requirements, so you do not have the problem of needing to redeploy all readers at the same time when a field/column is renamed.
In conclusion, the advantages of event-driven state transfer are more accidental rather than a fundamental problem with integration through relational databases. You can achieve almost the same result on top of Postgres: Manage access rights to tables so each microservice can only write the tables it owns; Carefully segregate the writer's own internal data model from the schema of the shared "integration table"; Use read replicas with materialized views of the "integration table" for each reader. Under the hood, changes to the "integration table" are written as change events to the Postgres write-ahead log and the change events are then propagated to read replicas. We can see that the write-ahead log and asynchronous replication even work the same as event-carried state transfer via Kafka.
Making an RPC call is not the same as making a call to the database directly. With a database, you can generally use transactions with the correct isolation level (serializable) to prevent dirty reads. This gives you consistency. With an RPC call, you have no distributed transaction. The moment you make an RPC call and get data back, that data is potentially stale.
To your conclusion, I do think that is a solution if you want to use the database as an integration point, as long as it's clearly defined who owns what (as you mentioned).
@@CodeOpinion With "RPC" your microservice basically acts like a database, not necessarily a good database, or you could even use the JDBC protocol and simply forward every request to Postgres. That would still get you the same transaction guarantees. My point is that "integration through database is bad" is not a very valuable rule because the spectrum of databases is very broad (comprising many different transaction, querying and distribution capabilities) and even the supposed alternatives are still databases in some sense. The important thing is to understand what are the issues with the typical ball-of-mud shared-everything relational databases and how does event-carried state transfer avoid those problems. If you understand that, then you can also use a shared Postgres or Redis database successfully.
A shared database is a clear indication of badly designed service boundaries.
Well done for confusing a really simple issue with unnecessary ambiguity. The answer is really simple; just don't share databases - ever...
Sure, just never confuse logical and physical boundaries, or acknowledge they are different things. I can't imagine how doing so would end up poorly. Oh wait... our current trend of (micro)service-to-service RPC, all in the name of not realizing logical and physical boundaries aren't the same thing.
So let me try to paraphrase to understand correctly. If it's an aggregated view, use a gateway / BFF. If it's a command, let the client make different command calls to the different services. And in between services, they communicate with each other via broker where suitable. In your use case, it's
1. client to ordering: hey i want this order. ordering returns order_id
2. client to payment: hey i want to pay for this order. Here's the order id. payment returns payment_id meant for that order_id
3. client to ordering: hey this is the payment_id for this order order_id please mark order as complete. ordering marks order complete with the payment_id. ordering returns confirmation to client, order is complete.
4. ordering to payment via broker: after step 3, ordering tells payment, i have marked this order as complete with this payment_id. Here's the payment_id and order_id
5. payment after notified by ordering via broker: okay since ordering marked this order_id as complete with this payment_id, i am going to charge the credit card.
Did I get this correct?
Updated after Derek's reply about the shared id.
Refactored from the above and taking his reply into account, the easiest way is to treat the order id as the shared id.
1. client to ordering: hey i want this order. ordering returns order_id (no change)
2. client to payment: hey i want to pay for this order. Here's the order id. payment may create its own payment_id for its own purposes. payment just acknowledged to client, order received.
3. client to ordering: hey payment ack the order. please mark order as complete. ordering marks order complete. ordering returns confirmation to client, order is complete.
4. ordering to payment via broker: after step 3, ordering tells payment, i have marked this order as complete. Here's the order_id.
5. payment after notified by ordering via broker: okay since ordering marked this order_id as complete, i am going to charge the credit card.
Did I get this correct?
Close. With #3, the client isn't telling ordering what the payment_id is. Both ordering and payment would be using the same ID, or some way to correlate the two together. So when ordering publishes an event (or cmd, depending on your workflow) that payments picks up, it knows, based on the ID in the event, what payment information belongs to that order. So then in #4, ordering isn't telling payments anything other than a shared ID.
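A tiny sketch of that correlation, assuming both boundaries key off the same order id and the payment boundary has already stored the card details the client gave it; the handler, storage, and event shape are all hypothetical.

# Sketch: ordering and payment correlate purely on a shared order_id.
# Payment has already stored the card details it collected from the client,
# keyed by that same id; names and the broker wiring are hypothetical.

payment_methods_by_order = {}   # payment boundary's own storage

def handle_order_placed(event: dict):
    # Published by ordering; carries only the shared id, not payment data.
    order_id = event["order_id"]
    method = payment_methods_by_order[order_id]   # payment owns this data
    charge(method)

def charge(method: dict):
    print(f"charging card ending in {method['last4']}")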
Ok who generates the shared ID?
id for userA --> 1
id for userA's shopping cart -> 2
id of userA's intent -> 1 xor 2 = 3 (published as an event from ordering to payment)
Now, if you do 2 xor 3, you get 1. Or, if you get 1 xor 3, you get to 2.
Given the e-commerce context, imagine having to implement protection against a scenario where, in a flash sale, you programmatically try to trick the system into making double purchases when the intended business requirement was to restrict 1 item per user.
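Purely to illustrate the arithmetic in the comment above (this isn't from the video): with integer ids, XOR lets either input be recovered from the other input and the combined value.

# The commenter's XOR example with small integers: 1 ^ 2 == 3,
# and XOR-ing the result with either input recovers the other one.
user_id, cart_id = 1, 2
intent_id = user_id ^ cart_id      # 3, published in the event

assert intent_id ^ cart_id == user_id
assert intent_id ^ user_id == cart_id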
@@avineshwar
Thank you
Apologies for my skepticism, but I find this a bit hard to believe. Are the ids for the shopping cart (which I assume is equivalent to the order id in my paraphrase) and the user still autogenerated by the database using either big integers or UUIDs?
if not, then what are they?
And if they are either big integers or UUIDs what is the XOR in this case?
if not, then what's the XOR function?
Can Derek shed some light on whether the shared id in his context is exactly the same concept as illustrated by Avineshwar here?
@@kimstacks
Derek can reply as he feels fit.
Finally, think about it like this: if upon measuring your performance statistics say that certain kinds of changes must be made, then it is those changes that we are representing here as "XOR". It is extremely simplified and made pure if you may, of course. The idea also is, if all you must (or rather can at some huge scale) do (due to perf. and cost), is prefer communicating via messages, without introducing side-effect type coupling between services, what would that integration/interaction via an event (and by extension, the business logic on some destination service) look like?
This is, as it goes without saying, one way of implementing an overall system where comprising services can interact with each other without the need for any external triggers, interactions, and whatnot.
So, another way of representing the XOR proposal could be: can a micro-service self-deduce based on some input (event) the arguments (e.g. parameter values of a function) it requires to perform whatever it is that that micro-service must do.
^ If it could indeed self-deduce, then it doesn't have to wait for anyone to do its thing.
As always, if there is a devil, and if it is in the details, then implementing a micro-service with such self-deducing capability also means that it is likely going to be more useful only if it can be approached in a way where such an idea can be generalized among micro-services (everyone will be able to self-deduce whatever they want!). So, for a very heavily used e-commerce (e.g. Amazon) this may be more useful than for someone else.
And, with all that, we are definitely a bit off-topic.