This was super-helpful for me José, thank you so much! This is by far the best video on YouTube for repository pattern in Python. I've known the abstract pattern for some time but struggled to interpret it into a Python context. This totally did the trick for me, thanks again so much!
Thank you for your kind words Steve! I'm so glad to hear you found it useful!
great and comprehensive video, thanks!
Thank you for your kind words Kevin 🙏!
Hola Jose, thank you for your video. As a .NET developer currently working with Python, I feel uneasy when I see global dependencies, loosely typed parameters... I find your more professional demos to be particularly useful.
Have a question: could you explain the purpose of the RepositoriesRegistry? I'm wondering if it would be possible to simply pass the BookingsRepository as a parameter instead of creating a RepositoriesRegistry. What is the benefit of using RepositoriesRegistry in this context?
Great video, very well explained, thanks!! 😀
Thank you for your nice feedback ❤❤!
Ok, this looks good for CRUD. What about beyond CRUD? How do we do joins with this repo pattern? Or a simple subquery? How can we group queries into one instead of hitting the DB several times?
I mean at least you're still tightly coupled to SQLAlchemy and you're returning the sqlalchemy instances as is. I've seen some repo patterns where people serialized those into intermediate objects, creating even bigger problems down the line cause of the need to re-fetch that sqlalchemy object in different functions.
I feel like everyone goes for repository patterns for simple projects which ends up creating way more problems than it solves.
Also, I love the repository registry idea, but am not loving the fact that it's injected into the request object. I think it would be way cleaner as a dependency!😁
Thanks for your comment @RamiAwar! This is a great discussion. First of all 💯 repository isn't suitable in all situations - if it doesn't help it has no place in our codebase.
Re joins and subqueries - I think this is where the repository shines. You are simply encapsulating the complexity of those queries away from your business and other layers. The example in the video may give the wrong impression that repository is a 1-to-1 between classes and tables, but that's not how it works in practice. Domain models usually pull data from multiple tables and the repository should serve those needs.
Ideally, repository doesn't return SQLAlchemy objects, but DTOs or something similar like you say. In the tutorial, the repo's add() method returns an instance of a plain Booking object (github.com/abunuwas/repository-pattern-tutorial/blob/master/data_access/repository.py#L24) and the list() method returns a list of dictionaries (github.com/abunuwas/repository-pattern-tutorial/blob/master/data_access/repository.py#L14). A plain and simple DTO would be a better choice.
Personally, I only use the repository pattern when I want to enforce a clear separation between data access and other layers for testing and other purposes, and when queries are growing complex and I want to abstract them away from the business layer.
Hope these comments help and thanks again for sharing your thoughts!
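To make the point about joins concrete, here is a minimal sketch of a repository method that hides a multi-table query and hands back plain DTOs. It uses the standard library's sqlite3 instead of SQLAlchemy so it stays self-contained, and the table layout, the BookingSummary DTO and the list_summaries() method are all made up for illustration; none of them come from the tutorial repo.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class BookingSummary:
    """Plain DTO handed to the business layer (hypothetical shape)."""
    booking_id: int
    guest_name: str
    room_number: str


class BookingsRepository:
    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def list_summaries(self) -> list:
        # The join lives here, so callers get the data in one round trip
        # and never see table names or SQL.
        rows = self.connection.execute(
            """
            SELECT b.id, g.name, r.number
            FROM bookings b
            JOIN guests g ON g.id = b.guest_id
            JOIN rooms r ON r.id = b.room_id
            ORDER BY b.id
            """
        ).fetchall()
        return [BookingSummary(*row) for row in rows]


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative rows."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(
        """
        CREATE TABLE guests (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE rooms (id INTEGER PRIMARY KEY, number TEXT);
        CREATE TABLE bookings (
            id INTEGER PRIMARY KEY, guest_id INTEGER, room_id INTEGER
        );
        INSERT INTO guests VALUES (1, 'Ada'), (2, 'Linus');
        INSERT INTO rooms VALUES (1, '101'), (2, '202');
        INSERT INTO bookings VALUES (1, 1, 2), (2, 2, 1);
        """
    )
    return connection
```

The business layer would only call BookingsRepository(connection).list_summaries() and work with BookingSummary objects, which is the separation described above.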
great video. tks
Thanks for checking and for your kind comment 🙏!
Really good video. I'm starting to learn about FastAPI. One question I have about your design is related to the dependency injection.
Why are you injecting the dependency as part of the "app" object (create_server method) instead of adding them to the dependencies list offered when creating a new "FastAPI" and then using them somehow in the routes of the controller?
Shouldn't the Repository be added only at the Router level for Bookings instead of at the level of the entire application?
Let me know.
Hi Lucas thank you for your kind feedback and for your questions 🎉! This is a very good question about dependency injection in FastAPI, so let me analyze each option separately:
👉 Using 𝐠𝐥𝐨𝐛𝐚𝐥 𝐝𝐞𝐩𝐞𝐧𝐝𝐞𝐧𝐜𝐢𝐞𝐬 (fastapi.tiangolo.com/tutorial/dependencies/global-dependencies/): Unfortunately, global dependencies can't return a value, so we can't register the repositories as global dependencies and access them in the routes. There's currently a feature request open in FastAPI to make this possible (github.com/tiangolo/fastapi/issues/4246). It would be great if this gets done, because this would be the proper way of registering dependencies.
👉 Using 𝐫𝐨𝐮𝐭𝐞𝐫 𝐝𝐞𝐩𝐞𝐧𝐝𝐞𝐧𝐜𝐢𝐞𝐬: the APIRouter class has the same problem, so we can't register repositories as dependencies for a whole group of routes.
👉 Using 𝐫𝐨𝐮𝐭𝐞-𝐛𝐚𝐬𝐞𝐝 𝐝𝐞𝐩𝐞𝐧𝐝𝐞𝐧𝐜𝐢𝐞𝐬: finally, we could inject the repo directly on the routes, but then we would be importing a specific implementation of the repo in the controller, which means we tightly couple the repo with the controller and we wouldn't be able to easily inject test repositories.
To fully take advantage of dependency injection, we want to be able to inject our dependencies from an entry point which is fully under our control, which in this case is the create_server() function. Through that function, we control how FastAPI is initialized and which dependencies must be injected into it.
Hope this makes sense! Let me know if you have more questions!
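A rough sketch of the idea, in plain Python so it runs without FastAPI installed. The App class is only a stand-in for the FastAPI instance, and the attribute names, the registry dictionary and both repository classes are assumptions for illustration rather than the tutorial's actual code.

```python
class SQLBookingsRepository:
    """Placeholder for a real database-backed repository."""

    def __init__(self, session):
        self.session = session

    def list(self):
        raise NotImplementedError("needs a real database session")


class InMemoryBookingsRepository:
    """Test double with the same constructor and interface."""

    def __init__(self, session):
        self.session = session  # unused, kept so both classes are interchangeable

    def list(self):
        return [{"id": 1, "status": "confirmed"}]


class App:
    """Hypothetical stand-in for the FastAPI application object."""

    def __init__(self):
        self.session_maker = None
        self.repositories = {}


def create_server(session_maker=lambda: None, repositories=None):
    # Single entry point: everything the routes depend on is decided here.
    app = App()
    app.session_maker = session_maker
    app.repositories = repositories or {"bookings": SQLBookingsRepository}
    return app


def list_bookings(app):
    # The route knows the registry key, not the concrete class, and it
    # only opens a session when it is actually called.
    session = app.session_maker()
    repo = app.repositories["bookings"](session)
    return repo.list()
```

In a test, create_server(repositories={"bookings": InMemoryBookingsRepository}) swaps the implementation without patching any import path in the router module.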
Hi @@microapis , thank you so much for your response and sorry for my delay.
As far as I've seen, there is a parameter that you can specify at the Router or App level called "dependencies", which is a list.
The problem with doing this is that in order to access a certain dependency from the route itself (the endpoint of the controller), you would need to know the order in which such dependency was injected, which to be honest sucks. I don't understand why FastAPI does not have anything better for this situation.
To me, registering everything at the application level and then having to know inside each controller endpoint how to access those dependencies doesn't make much sense either; it breaks the idea of having each controller receive only what it needs. I'd rather inject the repository at each route, despite the duplication of code, than have to register them at the application level. Why do you mention that if you inject the Repository at the controller's route level then you wouldn't be able to test it? Can't you just mock that same repository in the test and that's it?
Let me know.
@@lucasalvarezlacasa2098 Hi Lucas thank you for your answer! In this case, it's all about tradeoffs. My solution isn't perfect, but it's a common use of dependency injection. The main benefit of my approach is it gives you control over which dependencies must be injected at load time, which is when you have most control over your app configuration. Notice that, although repositories and sessions are available to all routes, you still need to instantiate them, so you're not unnecessarily opening database sessions and so on. I like this approach because it allows you to set the application in its desired state from the moment you create it, and it avoids duplication.
Injecting directly on the routes is doable, but it means you'll be importing the session maker from SQLAlchemy and the repositories directly in your router module. It means any test on any route within that module needs to mock those import paths if you want to keep the test isolated. The downside of this for me is that if I'm doing a bunch of tests that have nothing to do with the database, I still need to mock those imports. It kind of defeats the purpose of dependency injection.
The trickiest part really is SQLAlchemy. You need to create the database connection and get the session factory somewhere. Ideally, that happens outside of your routes. So one compromise could be to set up SQLAlchemy in the server factory function ("create_server()"), and inject the repositories in the routes.
As you say, and as many devs have requested, ideally we would be able to inject the dependencies at the app or router level and access them as parameters in the routes. Hopefully we'll see this feature available soon.
As I say in the video description, my solution here is opinionated, and I'm sure it won't necessarily work well in all cases. Like all things in software, there's hardly a universally right way of doing things and it's all about use cases and the needs of your project.
Is this approach for putting a session object into an app going to work for an async session? I was struggling to make this work but without luck.
Hi thank you for your question and sorry for my late reply! Bear in mind what we're injecting is the session factory, not the session itself. Do you have an example of the code you're struggling with?
@@microapis Good point, I'll be back to it then. Thanks.
When I try to use the context manager protocol, I get the error "TypeError: 'AsyncSession' object does not support the context manager protocol". On the other hand, session = request.app.session_maker() works as expected and I get a result, but Postgres is complaining about pools: "Please ensure that SQLAlchemy pooled connections are returned to the pool explicitly, either by calling ``close()`` or by using appropriate context managers to manage their lifecycle." Calling await session.close() does not seem to solve the problem.
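One possible cause, assuming session_maker is SQLAlchemy's async session factory: AsyncSession is an asynchronous context manager, so it has to be entered with "async with" rather than a plain "with". The sketch below reproduces the behaviour with a stand-in class instead of SQLAlchemy itself; FakeAsyncSession, session_maker and handler are hypothetical names for illustration only.

```python
import asyncio


class FakeAsyncSession:
    """Stand-in that, like an async session, only supports 'async with'."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Leaving the block closes the session, which is what returns
        # the underlying connection to the pool in a real setup.
        await self.close()

    async def close(self):
        self.closed = True


def session_maker():
    return FakeAsyncSession()


async def handler():
    async with session_maker() as session:
        pass  # run queries here
    return session.closed


def plain_with_is_rejected():
    try:
        with session_maker():
            pass
    except (TypeError, AttributeError):
        # TypeError on recent Python versions, AttributeError on older ones.
        return True
    return False
```

With "async with", the session is closed on exit even if the block raises, so there is no separate close() call to forget.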
What if I wanted to save a booking to the DB and also to the filesystem? It seems that because you bind the repo inside the API route, there's no way to change it or have multiple.
Thanks for your question @yslx and apologies for the late reply! This is actually a great question. In this case, you'd create another repository for the file system, and list this repo in the registry together with the others.
The only complication is when we initialise a repository, we typically pass the session. In a file system repository, a database session doesn't make sense - we'd probably pass a file name, or perhaps a buffer. So the business layer needs to distinguish between different types of repositories and know which type of argument to pass, or preferably we encapsulate this knowledge within the registry itself.
What is important is ensuring that we either commit or roll back all the operations together. If we're saving data to the db and to the filesystem at the same time, we'd want to make sure both writes fail or succeed together - otherwise we'd corrupt our records. So the filesystem repository would have normal commit() and rollback() methods.
I'd personally implement the file handling logic using the pathlib library and use either write_text() or write_bytes() (depending on the requirements) to save data to the file. I'd commit() using write_text(). If rolling back, then nothing needs to happen - just close the file if you opened it earlier and/or flush the buffer if you were using one.
Hope this helps!
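A minimal sketch of what such a filesystem repository could look like, under a few assumptions: bookings are plain dictionaries, they are stored as a JSON list in a single file, and writes are buffered in memory until commit(). The class name and the JSON format are illustrative choices, not code from the tutorial.

```python
import json
import tempfile
from pathlib import Path


class FileSystemBookingsRepository:
    def __init__(self, path: Path):
        self.path = path
        self._pending = []  # bookings added but not yet committed

    def add(self, booking: dict) -> dict:
        self._pending.append(booking)
        return booking

    def commit(self) -> None:
        # Nothing touches the disk until commit(), mirroring a db session.
        existing = json.loads(self.path.read_text()) if self.path.exists() else []
        self.path.write_text(json.dumps(existing + self._pending))
        self._pending.clear()

    def rollback(self) -> None:
        # Nothing was written yet, so discarding the buffer is enough.
        self._pending.clear()


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "bookings.json"
        repo = FileSystemBookingsRepository(path)
        repo.add({"id": 1})
        repo.rollback()  # booking 1 is dropped and no file is created
        repo.add({"id": 2})
        repo.commit()
        return json.loads(path.read_text())
```

Note that committing the database and the file one after the other is not truly atomic: if the file write fails after the db commit has succeeded, the caller still has to compensate. The sketch only shows the shape of the commit()/rollback() interface.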
@@microapis thanks a lot for the detailed reply! Some food for thought
Thank you :)
Great! Thanks!
Thank you for your kind feedback!
what is the difference between a repository pattern and DAO ? When to use each ?
Thank you for your question! Technically, Data Access Objects (DAOs) are objects that handle the implementation details of saving or retrieving data from persistent storage. For example, if you use a SQL database, a DAO would take care of translating code to SQL statements.
It's kind of related to Data Mapper, tho the idea in Data Mapper is to map data from persistent storage into domain objects. So a Data Mapper may use a DAO to handle the low-level interactions with the db. Repository represents a collection of domain objects. It encapsulates the logic required to map db records to domain objects.
In most situations, you don't have to handle DAOs directly. You'll normally use a SQLAlchemy model to write db queries using Python code. The SQLAlchemy model is your data mapper, and it'll use internal logic akin to a data access object to translate the code into SQL statements. I always recommend not introducing any business logic within SQLAlchemy models, and to keep them strictly as a data representation layer. So the way I use SQLAlchemy models is closer to a DAO.
If you need to write very complex queries not worth doing with SQLAlchemy, then you'd write your own DAO. It could be a function that puts together the right SQL statement to retrieve some data. When you map that data to a domain object, you're writing a data mapper. All that logic would be encapsulated behind a repository.
Note that I say a DAO could be a function despite its name saying data access OBJECT. Historically, the concept comes from the Java community, where everything used to be objects, so that was the only possible implementation. In Python, it can be anything you want that suits your needs.
Hope this helps 🚀🚀!
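To show how the three ideas can stack, here is a small sketch using the standard library's sqlite3: a function acting as the DAO, a function acting as the data mapper, and a repository in front of both. The bookings table, the Booking fields and the long_stays() method are invented for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Booking:
    """Domain object (hypothetical fields)."""
    id: int
    guest: str
    nights: int


def fetch_booking_rows(connection, min_nights):
    """DAO as a plain function: knows the SQL, returns raw rows."""
    return connection.execute(
        "SELECT id, guest, nights FROM bookings WHERE nights >= ? ORDER BY id",
        (min_nights,),
    ).fetchall()


def row_to_booking(row) -> Booking:
    """Data mapper: turns a raw row into a domain object."""
    return Booking(id=row[0], guest=row[1], nights=row[2])


class BookingsRepository:
    """Repository: collection-like facade hiding the DAO and the mapper."""

    def __init__(self, connection):
        self.connection = connection

    def long_stays(self, min_nights=7):
        rows = fetch_booking_rows(self.connection, min_nights)
        return [row_to_booking(row) for row in rows]


def demo():
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE bookings (id INTEGER PRIMARY KEY, guest TEXT, nights INTEGER)"
    )
    connection.executemany(
        "INSERT INTO bookings VALUES (?, ?, ?)",
        [(1, "Ada", 2), (2, "Linus", 10)],
    )
    return BookingsRepository(connection).long_stays()
```

The business layer only ever calls long_stays() and receives Booking objects; the SQL and the row mapping can change without it noticing.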
Why do you call _id a class property instead of a private property?
Your repository looks like a simple DAO to me. There is like a one-to-one mapping between your (domain) model and the database. A repository can achieve more complex things and more custom mappings between the domain objects and the database. In particular, if the Aggregate Root has multiple Value Objects, the repository is responsible for creating or removing these VOs (and for hiding this complexity). Thanks for the video.
Thanks for your comment @Angély! You're absolutely right, I totally forgot to bring up this point! I guess I was too focused on encapsulation 😅. Maybe I should cover repository more in-depth in another video!
@@microapis Well, I used the Repository pattern in a recent Python project (also based on FastAPI), hence my comment 😊 I have a case where the database differs from the domain objects, so I saw a benefit in applying the Repository pattern. But when using ORM and CRUD apps (one-to-one mapping between entities and the database), it seems to me it just adds another layer for nothing really. I just published the project on GitHub. If interested, I'd gladly share it here. It is rather simple and it doesn't include an ORM (SQL queries not being complex).
@@angely9783 would love to take a look at the repo!
@@microapis Wow, YouTube has a nice feature called "we silently delete comments with a link in them." 😄 I cannot post the full link, so for those interested: *angely-dev/freeradius-api* after the base URL of GitHub. Conceptual approach at the end. The repository implementations are in the *src/pyfreeradius* file. It surely is not perfect (e.g., committing inside the repositories) but I think it could be a good example of "objects reconstitution" as per the DDD. Thanks again for the video, glad I'm not the only one trying this pattern in Python.
@@angely9783 Thanks for the code! That's a great example of how to encapsulate complex database operations behind a repository, which is when the pattern becomes truly useful. Great job!
Your coverage is below 20% 😮
It’s not repository pattern, it’s trash, gl
Happy to help if you have any questions or difficulties understanding how it works!
@@microapis u need to read Martin Fowler, why u try to teach someone?
@@jcatstreams8550 Fowler is a good start, but it only scratches the surface. To learn more about repositories, check out Eric Evans's "Domain-Driven Design". In addition to reading these books, I always recommend working through the examples and thinking about how to apply them to different situations. Again, if you have any questions or difficulties working out the repository pattern I'm here to help!