This is the first vid that truly explained this N + 1 problem so easy a newbie would understand, and showed clearly the examples with and without. Nice job!
5 ปีที่แล้ว +4
Been looking for documentation about N+1 problem and couldn't find anything practical about it. This video just saved my life. Thanks.
True story, i was asked this question in an interview, the interviewer just asked how would you fetch a foreign key in a one to many database. I didnt understand the question then, gave a non optimized answer , 6 months later i realize why i was rejected. 😢 Btw great video, thanks man. You are the first person i subscribe to on youtube. 😁
This is the reason I'm always 100% advising people do not use GraphQL unless they know what they're doing. This is the most-basic thing about GraphQL and the thing I've had trouble with myself in the past. It's difficult to solve unless you understand how GraphQL is querying and managing resolvers. I didn't know it had a name, but I'm glad someone else is pointing this out. It's gotta be the one thing that I've consistently noticed in GraphQL services.
What I like about DataLoader is two fold; one is that it can be used with any data source you may have, as explained in the video. The other is separating the concerns of fetching that data into a loader, rather than the resolver. Separating the two makes it easy to maintain and substitute the underlying fetching logic and it can even have multiple loaders for one resolver, based on some condition. While DataLoader may provide caching capabilities, it is not actually the main purpose of it; it is a data loading mechanism and it doesn't matter where the data comes from, it just ensure that a signal value by its key always gives you the same value (like what _batching_ does).
@@ascensionblade indeed, sir. It's common practice in our industry to adopt measures to improve performance without even knowing whether we need it all. Developer time is very expensive.
In NoSQL DBs is common to use denormalized schemas, so in your example, we could just store the author data within the book records themselves and we wouldn't have this problem. Obviously we would then need to update the books on author changes, but this is fine if we assume that the number of reads is way larger than the number of writes. A safe assumption for most use-cases.
Thankyou so much. It's super helpful. I think example with dataloader ( can be seen at 8:00 minute ) to solve this n+1 is better, because in this way we have the power of resolvers i.e. reusability and dry code. Other than this, where we are doing every thing in single resolver, no possibility of reuse, wet wet code everywhere.
What's the point with the "plumbing to get back the authors in order"? Is it the case that simply returning "authors" does not ensure they are returned in order?
One solution I use is to have the AuthorLoader (because your gql api can also have a authorById or any others "entrypoints") then in the hydrate function you can fill the AuthorLoader cache with its prime() function. It's a mix between your join and dataloader solution, to get only one database query and keep the dataloader advantages in complexes gql queries.
I'm from the REST world. I respect GraphQL and I'm pretty new in it because I didnt' have a chance to use it in my projecects. I saw a lot from Ben's channel and Ben is doubtless great teacher and explanator. But I also saw a lot of things from GraphQL those are 'reinvent the wheel'. As much as I understood, the point of GQL is to optimize req/res between client and server, to make more comfortable env for developers... Internet is more and more faster so there is a question when the speed becomes criticatl? And I saw a lot of complexity in the code for GQL implementations. So I didn't see quite enough reasons to jump to GQL in some of my projects, when it is much more simpler and readable with the REST, without all those packages and libraries like Apollo. But from new guys in web development, I suppose it's a good idea to start and continue with GraphQL. And finally I hope next tool for communication between server and client will be much simpler than this one. Why something is new and maybe quite good when it makes the things much confusinge and a lot of headache.
At 10:29 there is an example with JOIN. I think there is a hidden bug in addition to the inconvenience of the extra data being pulled. Before the proper mapping is done using hydrate(...), there is an object of shape { id, title, authorId, name }. Does this not mean that any column names which overlap between the joined tables will get overwritten? i.e. what if the book has a `name` column and so does the author - how does the hydrate(...) function distinguish the two?
This is incredibly cool and still relevant 2 years later. I do have a question though: the hydrate function is doing the job of a field resolver, meaning e.g. if we want to add another sub-field on the author in the future, and that field is a relational field in a third table, the complexity of that hydrate function increases. Is it possible to keep the field resolvers but still have conditional joins? It seems like it might be possible because the context function takes in the request object... but I'm not sure how hard it would be to create that doesPathExist function when the request object is the parameter, or whether that fully solves the problem in the first place
Nice presentation of what I would call "resolving sub-queries". The next problem might be to resolve entities/items in use multiple. Eg think of a collection of books each having author(s) and reviews. And the reviews in themselves having authors as well. How would we make sure to "dataload" authors last?
this is an excellent video as usual, but I'm wondering how to do a generic dataloader factory, especially when it comes to re-ordering the result of the query (the trick of AuthorMap in your video). The way you are doing it assumes that the type of the keys is a string or a number, as after, you are creating your AuthMap object which is a Record where, in your example typeof Key is the value of you Author.id. If your Author.id is a string or a number, no issue. But what about if the field of the author you want to "cache' is not a number of a string ? Below is a piece of code where I'm struggling to define the refMap record. I had to transform the type of the key in string to make it works (need also to enforce TTypeOfKey to extends string|number) export const buildLoader = < TEntity extends MyBaseEntity, TTypeOfKey >( entity: TypeObject, keyName: keyof TEntity ): DataLoader => { const loader = new DataLoader( async (valuesofkey: Readonly) => { const vals: TTypeOfKey[] = valuesofkey.map((val) => val); const unsortedValues = await getConnection() .getRepository(entity.name) .find({ [keyName]: In(vals) }); const refMap: Record = {}; { const theValue: TTypeOfKey= (val[keyName] as unknown) as TTypeOfKey; refMap[theValue] = val; }); return valuesofkey.map((val) => refMap[val]); } ); return loader; }; usage : const AuthorLoaderId = buildLoader(Author, "id"); const AuthorLoaderAge = buildLoader(Author, "age"); const AuthorLoaderWhatever = buildLoader(Author, whatever);
If your data is stored in sql, you can also make use of views and transactions (or procedures) which is way easier to scale off and prevent redundant code.
What if you are using a document database like MongoDB and you have nesting? Say for example, I have Bands with Albums but I am also nesting Songs in there. In this case, I could use includes on the Albums but I would have to use $in for fetching the Songs in a different query. See another way?
Do Prisma resolvers already solve N+1 problem? I saw this video and went ahead to try out my GraphQL server with SQL debug from prisma and it looks that when I Query a 1:N relationship it does not trigger n+1 SQL queries. Very good video
You solve this by simply limiting the amount of data that can be requested, like you should do anyway in a real-world project. Otherwise you have bigger problems, such as opening yourself to DDoS by someone requesting a lot of data from GraphQL in a single request for example. The most common way to avoid this is to use pagination. There's cursor pagination which returns nodes and edges (like a graph) and a cursor, that can be used to get the next graph of data. You can also just use a regular pagination, which takes a page and offset and returns a list of items within the thresholds you have set (like max items that can be returned)
hi, could you make a video about handling error in graphql mutations or queries, I'm having an issue about connection reset on graphql server due to a lot of timeout sockets are being hanged when throwing exception in case of error provided input. Any idea how to solve it? Thanks
I know you mentioned Join Monster, but why use Apollo Server at all rather than an automated db frontend (postgraphile, edgedb) or other graphql server (hasura)?
One reason would be that there are multiple data stores within your organisation. Say you are a company with some data in salesforce, some in a sql db, some in mongo, some data accessible from 3rd party API. Now imagine you have a team making some dashboard or Web app that needs data from these disparate sources. You can present all of these different data sources to multi different front end graphql client projects via a single graphql server. If your organisation has all its data in a single database, and only a limited, predictable number of clients of that data, such that just writing a few rest endpoints does the trick, graphql is likely to be overkill.
Looking at first sight to your authorLoader implementation, you could write a generic function with that implementation but receiving the name of the table to be called with knex and use it in every relation you need within the context, am i right?
One question, if author has something like company, and we want to do this query { books { author { company , Is it possible to use the last approach for getting the companies? It seems that for nested relations we need to go with Data Loader
But with joins, you can't do circular references, like get a book > all authors -> books (e.g. you need books of the second author) and so on, as you don't have a field resolver anymore. And that's basically one of the important GraphQL ideas/features (without it, it's basically a kind of RPC).
Hey Ben, excellent work, but for the conditional approach (best of both worlds) how do we deal with more nested types and fields without it getting too messy? And what about "hydrating" one-to-many relationships at arbitrary level of nesting (from flat table of query result)? E.g. Books > Authors > Reviews (note plural)
> without it getting too messy 1. Building an abstraction to deal with the mess 2. Use an abstraction like Hasura/Prisma/Postgraphile to deal with the mess for you > And what about "hydrating" one-to-many relationships at arbitrary level of nesting You basically do a bunch of group by's or you can skip joins and use subqueries: twitter.com/benawad/status/1205879524573007875
Wouldnt eager loading or doing a left join on books and author fix the n+1? Or it really needs to be on a key resolver, kinda make sense, if you'd eager load Author on the Books Query function it'll load it whether you asked for author or not. Did I understand it correctly?
@@bawad apologies, i realized you covered everything in the video, I commented this way before you mention left join but I left away and continued later.
Brilliant video Ben, thoroughly enjoyed it and really well explained. Am I right in saying that TypeOrm using cascade and populating relations in a user query with nested relations e.g. User.find({ relations: ['recipes']}) will do this automatically in a single SQL query compiled through LEFT JOIN?
Ben, field resolvers are cool, but is there a way to use Eager loading (JOINs) instead of field resolvers? I mean, use eager loading only when necessary using libraries like Typegraphql + Typeorm?
I think you could pass some custom builder abstraction into the graphql ctx. All the resolvers use that, but only at the very root does it query the database from the builder. Haven't really used graphql so not sure if possible
Great presentation but when you look at the number of ms you saved it doesn't really change/get better. Is this because the DB is so close and there is no ping?
IMO, n+1 isn’t a real issue when we have to implement pagination. With pagination, the performance gain by eager loading doesn’t trade off the complexity. Eager loading is generally not scalable.
Can you explain why dataLoader need to be created per relation. I created dataLoader per field and it works nicely. So for example all my entities have createdBy field, which is ID to user table/collection, I use dataLoader for all relations that need to get user by id and they all seem to be batched into single query.
You should not use await without try catch if you don't want errors being swallowed. You should return the queries directly without assigning the result to a local variable.
> You should not use await without try catch Not always true. Using Try/catch means that you want to handle errors that happens as close to the source as possible, which can be fine sometimes, but this makes error handling very cumbersome. It is recommended to let errors pass through until it gets to a point where it can get handled. This of course, depends on your use-case and the type of errors you want to handle, but in those times it makes sense to rethrow the error so that it can be handled a more appropriate place.
Thanks for this video on a very common problem. For many-to-one relationships, I'm doing joins like you are. But for one-to-many relationships, I still don't have a way to do it in only one SQL query. Do you have any insight on this?
@@bawad if you have a query "authors" and you query { id, name, Books { id, title} } ; with something like "SELECT * FROM author JOIN book ON book.author_id = author.id" your sql query will return too many lines and I believe your Query will return too many authors?
yeah you just need to format the data you get back, something like: const rows = knex(); const books = []; rows.forEach(row => { books.push({ id: row.bookId, title: row.bookTitle }) }) return { id: row[0].id, name: row[0].name, books }
@@bawad I think that's still going to give you back the authors repeated, no? For Authors with many Books the result rows would be Authors x Books. For a DB with JSON functions it would be easy to transform related rows to one row but without such a function one needs to loop through as many nested relationships as queried to check if the parent has changed. Maybe a "nested map" function? But that's probably not a thing. Anyway, this is a general result transformation problem, not N+1, but would still be great if you could tackle it and show us!
yeah this doesn't work for arbitrary levels, I haven't created a function for that before usually I take it case by case I'll have to think about how an arbitrary one would work
why not create views table for relationship table and make query to view table. in that way we don't have to join every time or use any data loaders...
I really don't understand why people always need to find clever ways of working around SQL. Back in the Ruby on Rails days they also had this N+1 Problem when they were using ORM. This was 2011, but Relational DB's have solved this N+1 Problem in like the 90s or even earlier. At this point i have to ask: why not access the db directly from the client with SQL. Graph Ql seems to me like "not invented here" methodology.
My young friend you just start scratching the surface of the challenges of serious application development. I was experimenting with GraphQL in early 2019, since I am open to learn everything new and innovative, and quickly realized that it is great for "hello-world-sized" demo applications, but when you are working on real business applications from the arena of warehousing, manufacturing, logistics, and others; this technology is a total dead-end and waste of time. SQL is with us for 40 or so years, nothing can replace it. Do you remember ODATA, for example? With your nifty resolvers you are just overcomplicating your system and make them dead slow, and enormously hard to maintain or debug, especially with typeless languages like JavaScript, Python. If you want a flexible query engine, simply receive SQL queries from your clients and return back JSON structures, dead easy and the most performant solution, the simplest to maintain and debug. No-SQL databases are anyway totally unusable for complex business database systems/applications.
@@iAmCracky Facebook has 8500 programmers to maintain the system. Facebook is not a business application, it is a good business, but it is an entertaining system and it is dead simple vs a manufacturing or ERP system.
Hi, i want to use dataloader with mongoose, i am using this www.instapainting.com/blog/use-dataloader-to-batch-and-cache-arbitrary-mongodb-queries examle, it works, but how can i use projection.is there any easy solution which i didn't know
Be careful with dataloader. As with any caching mechanism, your implementation should NOT rely on some cache representation. That's exactly how to get stale reads and buggy logic
I think uploading in Nest should be similar to how you do it in apollo since nest-graphql is just a wrapper around apollo-server. I haven't tried it but I think the pain in uploading is usually in encoding a multipart request and you need to put your upload type in your input as a top-level parameter if it's nested inside another object it never works at least in my experience with apollo-express-server. so your mutation should be like the update profile example below type Mutation { updateProfile(input: ProfileUpdateInput, avatorFile: Upload, coverFile: Upload): ProfilePayload! } notice the files with type 'Upload' they are top-level if you embed them inside another structure they don't work at least on apollo-express-server. I hope it helps
@@ebundala Ok, thank you. I will try it. Then what about the resolver? is it using Interceptor like in Rest? can you write the function? I'm sorry. I just used Nest
A kitten dies every time an open 'where in (.....) is used' on a relational DB. If you select always the fields you want to return, the DB will not do the Join if is not needed. So, you don't need to do anything on the code. DB's are clever enough to not to do the join if you are not returning any value from the right side. Always use Joins, even for 1 result. Don't be stupid. Sincerely, your DBA.
> A kitten dies every time an open 'where in (.....) is used' on a relational DB. Is that slower than a query with a join? I assumed the performance profile is something like: select * from users u inner join articles a on u.id = a.userId; => 10 ms select * from articles where id (1, 2, 3); => 10 ms worst case > If you select always the fields you want to return, the DB will not do the Join if is not needed Oh, good to know, thanks!
@@bawad Roundtrips are not the only metering to consider here. You have to consider also the work on the DB.Join is the best, because it's all an internal DB issue. When having a Join, DB just have to play with indexes to get the right records, and then return the right data once all is resolved. That's very quick (Playing with indexes is very quick, the best O() value you can achieve on a DB). The IN clause is very unoptimizable..., the DB can do either a select for each id internally, and then return the results, or if there are more than certain amount, it just selects from lower to bigger and filters the results, which can be even worse. So at best, you are just repeating the individual selects, just on the DB side (It can last a lot of time, so you will have a single roundtrip but much much slower) and at worst, when you have 100 or more ids inside the IN, the DB may do a select of thousands of hundreds or records and then loop while filtering.
Hi, Ben! I wrote some code about preparing graphql query to DB query (3 level depth query makes 1 DB query with fields you exactly want to have) and wrote post for it: could you tell your opinion please, thanks: post: rootcomponent.com github: github.com/AleksandrNi/apollographql-query-compiler
This is the first vid that truly explained this N + 1 problem so easy a newbie would understand, and showed clearly the examples with and without. Nice job!
Been looking for documentation about N+1 problem and couldn't find anything practical about it. This video just saved my life. Thanks.
True story, i was asked this question in an interview, the interviewer just asked how would you fetch a foreign key in a one to many database.
I didnt understand the question then, gave a non optimized answer , 6 months later i realize why i was rejected. 😢
Btw great video, thanks man.
You are the first person i subscribe to on youtube. 😁
12:35 No need to reinvent the wheel. There are a couple of projects who do that. Notably graphql-parse-resolve-info
good point
This is the reason I'm always 100% advising people do not use GraphQL unless they know what they're doing. This is the most-basic thing about GraphQL and the thing I've had trouble with myself in the past. It's difficult to solve unless you understand how GraphQL is querying and managing resolvers. I didn't know it had a name, but I'm glad someone else is pointing this out. It's gotta be the one thing that I've consistently noticed in GraphQL services.
The importance of this video cannot be overstated. Well done.
What I like about DataLoader is two fold; one is that it can be used with any data source you may have, as explained in the video. The other is separating the concerns of fetching that data into a loader, rather than the resolver. Separating the two makes it easy to maintain and substitute the underlying fetching logic and it can even have multiple loaders for one resolver, based on some condition.
While DataLoader may provide caching capabilities, it is not actually the main purpose of it; it is a data loading mechanism and it doesn't matter where the data comes from, it just ensure that a signal value by its key always gives you the same value (like what _batching_ does).
You continue to have some of the best content on TH-cam. Well done Ben.
That's the best and simple explanation of N+1 problem I have ever encountered.
Love the pragmatism. As always, measure before doing any optimisation.
Hi! I don't understand what you mean. Is the idea that one should always measure before optimizing a commonly-held belief?
@@ascensionblade indeed, sir. It's common practice in our industry to adopt measures to improve performance without even knowing whether we need it all. Developer time is very expensive.
@@eliseumds it depends, a year ago i wouldve worked for u fulltime without expecting 50-100k a year. Today thats a different story.
In NoSQL DBs is common to use denormalized schemas, so in your example, we could just store the author data within the book records themselves and we wouldn't have this problem. Obviously we would then need to update the books on author changes, but this is fine if we assume that the number of reads is way larger than the number of writes. A safe assumption for most use-cases.
Wow, couldn't have demonstrated the topic in a simpler way. Thanks so much, great pace and great use of examples.
thanks!
Everything you need to know about n+1 problem and its solution options. Incredible to have achieved all of this in 16 min.
Anyone that knows SQL intimately will watch this video and hopefully say: "WTF! This is progress? So much code and META to do such a simple task".
😂
Yep. All so we can pick what fields we need
This was actually my drawback using GraphQL coming from SQL, but the alternative is REST.
Thankyou so much. It's super helpful. I think example with dataloader ( can be seen at 8:00 minute ) to solve this n+1 is better, because in this way we have the power of resolvers i.e. reusability and dry code.
Other than this, where we are doing every thing in single resolver, no possibility of reuse, wet wet code everywhere.
Brilliant presentation on using dataloaders and joins! 👍🏾
What's the point with the "plumbing to get back the authors in order"? Is it the case that simply returning "authors" does not ensure they are returned in order?
(in addition to the sorting, it ensures the returned array is the same length as the input array of keys - also a requirement of dataloader)
Ben gives very clear explanation of N+1 problem. Thank you, Ben!
Java jpa solves n+1 using entity graph that tells it how/when to fetch. Have to say I like that solution better than writing full sql or dataresolver.
Thanks Ben, This was awesome. especially learning about the 4th GraphQL argument.
One solution I use is to have the AuthorLoader (because your gql api can also have a authorById or any others "entrypoints") then in the hydrate function you can fill the AuthorLoader cache with its prime() function. It's a mix between your join and dataloader solution, to get only one database query and keep the dataloader advantages in complexes gql queries.
interesting, that's kinda of cool
I'm from the REST world. I respect GraphQL and I'm pretty new in it because I didnt' have a chance to use it in my projecects. I saw a lot from Ben's channel and Ben is doubtless great teacher and explanator. But I also saw a lot of things from GraphQL those are 'reinvent the wheel'. As much as I understood, the point of GQL is to optimize req/res between client and server, to make more comfortable env for developers... Internet is more and more faster so there is a question when the speed becomes criticatl? And I saw a lot of complexity in the code for GQL implementations. So I didn't see quite enough reasons to jump to GQL in some of my projects, when it is much more simpler and readable with the REST, without all those packages and libraries like Apollo. But from new guys in web development, I suppose it's a good idea to start and continue with GraphQL. And finally I hope next tool for communication between server and client will be much simpler than this one. Why something is new and maybe quite good when it makes the things much confusinge and a lot of headache.
At 10:29 there is an example with JOIN. I think there is a hidden bug in addition to the inconvenience of the extra data being pulled.
Before the proper mapping is done using hydrate(...), there is an object of shape { id, title, authorId, name }. Does this not mean that any column names which overlap between the joined tables will get overwritten? i.e. what if the book has a `name` column and so does the author - how does the hydrate(...) function distinguish the two?
You'll want to use aliases in that case
This is incredibly cool and still relevant 2 years later. I do have a question though: the hydrate function is doing the job of a field resolver, meaning e.g. if we want to add another sub-field on the author in the future, and that field is a relational field in a third table, the complexity of that hydrate function increases. Is it possible to keep the field resolvers but still have conditional joins? It seems like it might be possible because the context function takes in the request object... but I'm not sure how hard it would be to create that doesPathExist function when the request object is the parameter, or whether that fully solves the problem in the first place
Nice presentation of what I would call "resolving sub-queries". The next problem might be to resolve entities/items in use multiple. Eg think of a collection of books each having author(s) and reviews. And the reviews in themselves having authors as well. How would we make sure to "dataload" authors last?
Had to implement DataLoader and needed exactly this! Thanks
Could you please tell me how to log the query like you?
with knex you use debug environment variable github.com/benawad/graphql-n-plus-one-example/blob/master/package.json#L7
@@bawad I already tried it with dotenv before and it was't working but now looking at your code I tried it with corss-env and it works, I wonder why?
Not sure, I would imagine it works with dotenv too
this is an excellent video as usual, but I'm wondering how to do a generic dataloader factory, especially when it comes to re-ordering the result of the query (the trick of AuthorMap in your video). The way you are doing it assumes that the type of the keys is a string or a number, as after, you are creating your AuthMap object which is a Record where, in your example typeof Key is the value of you Author.id.
If your Author.id is a string or a number, no issue. But what about if the field of the author you want to "cache' is not a number of a string ?
Below is a piece of code where I'm struggling to define the refMap record. I had to transform the type of the key in string to make it works (need also to enforce TTypeOfKey to extends string|number)
export const buildLoader = <
TEntity extends MyBaseEntity,
TTypeOfKey
>(
entity: TypeObject,
keyName: keyof TEntity
): DataLoader => {
const loader = new DataLoader(
async (valuesofkey: Readonly) => {
const vals: TTypeOfKey[] = valuesofkey.map((val) => val);
const unsortedValues = await getConnection()
.getRepository(entity.name)
.find({ [keyName]: In(vals) });
const refMap: Record = {};
{
const theValue: TTypeOfKey= (val[keyName] as unknown) as TTypeOfKey;
refMap[theValue] = val;
});
return valuesofkey.map((val) => refMap[val]);
}
);
return loader;
};
usage :
const AuthorLoaderId = buildLoader(Author, "id");
const AuthorLoaderAge = buildLoader(Author, "age");
const AuthorLoaderWhatever = buildLoader(Author, whatever);
If your data is stored in sql, you can also make use of views and transactions (or procedures) which is way easier to scale off and prevent redundant code.
Data is in different databases?
SQL doesn't allow more than 999 params in the "in" clause.
Excelent content!!! Simple in describing the problem and then showing different solutions
Hi Ben,I am facing the same issue in spring Boot.Do you have any reference or course you have done on N+1 problem in GraphQL spring boot?
What if you are using a document database like MongoDB and you have nesting? Say for example, I have Bands with Albums but I am also nesting Songs in there. In this case, I could use includes on the Albums but I would have to use $in for fetching the Songs in a different query. See another way?
Do Prisma resolvers already solve N+1 problem?
I saw this video and went ahead to try out my GraphQL server with SQL debug from prisma and it looks that when I Query a 1:N relationship it does not trigger n+1 SQL queries.
Very good video
how to able to see info value, just cannot console.log to read the value
the dataloader version can make Long sql queries that can fail if you hit the limit of the database, the join version do not get this problem
You solve this by simply limiting the amount of data that can be requested, like you should do anyway in a real-world project. Otherwise you have bigger problems, such as opening yourself to DDoS by someone requesting a lot of data from GraphQL in a single request for example.
The most common way to avoid this is to use pagination. There's cursor pagination which returns nodes and edges (like a graph) and a cursor, that can be used to get the next graph of data. You can also just use a regular pagination, which takes a page and offset and returns a list of items within the thresholds you have set (like max items that can be returned)
hi, could you make a video about handling error in graphql mutations or queries, I'm having an issue about connection reset on graphql server due to a lot of timeout sockets are being hanged when throwing exception in case of error provided input. Any idea how to solve it?
Thanks
I know you mentioned Join Monster, but why use Apollo Server at all rather than an automated db frontend (postgraphile, edgedb) or other graphql server (hasura)?
One reason would be that there are multiple data stores within your organisation. Say you are a company with some data in salesforce, some in a sql db, some in mongo, some data accessible from 3rd party API. Now imagine you have a team making some dashboard or Web app that needs data from these disparate sources. You can present all of these different data sources to multi different front end graphql client projects via a single graphql server.
If your organisation has all its data in a single database, and only a limited, predictable number of clients of that data, such that just writing a few rest endpoints does the trick, graphql is likely to be overkill.
Will be nice to see how do you take advantages of graphql-parse-resolve-info. Great job! Thanks
Looking at first sight to your authorLoader implementation, you could write a generic function with that implementation but receiving the name of the table to be called with knex and use it in every relation you need within the context, am i right?
Yes
One question, if author has something like company, and we want to do this query { books { author { company , Is it possible to use the last approach for getting the companies? It seems that for nested relations we need to go with Data Loader
you can do 2 joins to fetch all that in 1 request
@@bawad but doing that would mean that i would had to check the path [books, author, company] in the books query. Am i right ?
yeah
But with joins, you can't do circular references, like get a book > all authors -> books (e.g. you need books of the second author) and so on, as you don't have a field resolver anymore. And that's basically one of the important GraphQL ideas/features (without it, it's basically a kind of RPC).
Hey Ben, excellent work, but for the conditional approach (best of both worlds) how do we deal with more nested types and fields without it getting too messy? And what about "hydrating" one-to-many relationships at arbitrary level of nesting (from flat table of query result)? E.g. Books > Authors > Reviews (note plural)
> without it getting too messy
1. Building an abstraction to deal with the mess
2. Use an abstraction like Hasura/Prisma/Postgraphile to deal with the mess for you
> And what about "hydrating" one-to-many relationships at arbitrary level of nesting
You basically do a bunch of group by's or you can skip joins and use subqueries: twitter.com/benawad/status/1205879524573007875
Wouldnt eager loading or doing a left join on books and author fix the n+1? Or it really needs to be on a key resolver, kinda make sense, if you'd eager load Author on the Books Query function it'll load it whether you asked for author or not. Did I understand it correctly?
yes
@@bawad apologies, i realized you covered everything in the video, I commented this way before you mention left join but I left away and continued later.
no worries
Good description of the problem.
Brilliant video Ben, thoroughly enjoyed it and really well explained. Am I right in saying that TypeOrm using cascade and populating relations in a user query with nested relations e.g. User.find({ relations: ['recipes']}) will do this automatically in a single SQL query compiled through LEFT JOIN?
yep
Ben, field resolvers are cool, but is there a way to use Eager loading (JOINs) instead of field resolvers? I mean, use eager loading only when necessary using libraries like Typegraphql + Typeorm?
You've actually covered my concerns until the end of the video. Thank you.
Thank you a lot, Really easy for newbie to grasp the concept
I think you could pass some custom builder abstraction into the graphql ctx. All the resolvers use that, but only at the very root does it query the database from the builder. Haven't really used graphql so not sure if possible
Great presentation but when you look at the number of ms you saved it doesn't really change/get better. Is this because the DB is so close and there is no ping?
yeah the performance is going to vary depending on the query and how much data is being fetched
IMO, n+1 isn’t a real issue when we have to implement pagination. With pagination, the performance gain by eager loading doesn’t trade off the complexity. Eager loading is generally not scalable.
Can you explain why dataLoader need to be created per relation. I created dataLoader per field and it works nicely. So for example all my entities have createdBy field, which is ID to user table/collection, I use dataLoader for all relations that need to get user by id and they all seem to be batched into single query.
That's what I mean, a dataLoader per relational field
Great video - what is the VS Code plugin that helps highlight brackets/parentheses pairs?
marketplace.visualstudio.com/items?itemName=CoenraadS.bracket-pair-colorizer-2
Very neatly explained. Thank you!
Hi Ben, Do you know how to access "info" in a type-graphql resolver?
I think its:
@Info() info
You should not use await without try catch if you don't want errors being swallowed. You should return the queries directly without assigning the result to a local variable.
agreed, just wanted to make it a little more readable
> You should not use await without try catch
Not always true. Using Try/catch means that you want to handle errors that happens as close to the source as possible, which can be fine sometimes, but this makes error handling very cumbersome. It is recommended to let errors pass through until it gets to a point where it can get handled. This of course, depends on your use-case and the type of errors you want to handle, but in those times it makes sense to rethrow the error so that it can be handled a more appropriate place.
Very well presented sir.
Thanks for this video on a very common problem.
For many-to-one relationships, I'm doing joins like you are. But for one-to-many relationships, I still don't have a way to do it in only one SQL query. Do you have any insight on this?
Why doesn't a join work for a one-to-many relationship?
@@bawad if you have a query "authors" and you query { id, name, Books { id, title} } ; with something like "SELECT * FROM author JOIN book ON book.author_id = author.id" your sql query will return too many lines and I believe your Query will return too many authors?
yeah you just need to format the data you get back, something like:
const rows = knex();
const books = [];
rows.forEach(row => {
books.push({ id: row.bookId, title: row.bookTitle })
})
return {
id: row[0].id,
name: row[0].name,
books
}
@@bawad I think that's still going to give you back the authors repeated, no? For Authors with many Books the result rows would be Authors x Books. For a DB with JSON functions it would be easy to transform related rows to one row but without such a function one needs to loop through as many nested relationships as queried to check if the parent has changed. Maybe a "nested map" function? But that's probably not a thing. Anyway, this is a general result transformation problem, not N+1, but would still be great if you could tackle it and show us!
yeah this doesn't work for arbitrary levels, I haven't created a function for that before
usually I take it case by case
I'll have to think about how an arbitrary one would work
what is knex in your code
why not create views table for relationship table and make query to view table. in that way we don't have to join every time or use any data loaders...
no matter what you choose, you can always put a caching layer in front
I love the idea of graphql but this was the first thing that put me off it... the number of independent network requests
Yeah it's definitely possible to optimize it, but it is a little involved
I'm working on a graphQL clone - I'm gonna have to tackle the N+1 problem eventually.
Why do you have question marks instead of IDs ?
I'm not sure
It's logging the prepared statement, which is the preferred way of logging SQL queries for both security and readability concerns.
well explained and a good overview of the solutions.
Hi Ben, you are right about join-monster.. too bad though... seemed promising.
Where you are printing these query request ?
console
@@AliYasir5247 no bro its not console
Great explanation! Thank you.
Superb tut Ben!
Great video as always ben
I really don't understand why people always need to find clever ways of working around SQL. Back in the Ruby on Rails days they also had this N+1 Problem when they were using ORM. This was 2011, but Relational DB's have solved this N+1 Problem in like the 90s or even earlier. At this point i have to ask: why not access the db directly from the client with SQL. Graph Ql seems to me like "not invented here" methodology.
Hi Ben,
Let we know what's you gonna learn in 2020?
I actually don't know yet
this is very useful for me thank you so much
Is there a reason you used knex for this video? I usually see you using typeorm.
I'm panicking, thinking you know something I don't...
I was wondering the same thing especially as there are rumours that typeorm is waning?
I still use TypeORM
just wanted to make this video in javascript for the devs that aren't familiar with type-graphql syntax
@@bawad 👌Now I can sleep tonight.
Great video, as always!
What is wrong with the sort function? :'( It's right there
My young friend you just start scratching the surface of the challenges of serious application development. I was experimenting with GraphQL in early 2019, since I am open to learn everything new and innovative, and quickly realized that it is great for "hello-world-sized" demo applications, but when you are working on real business applications from the arena of warehousing, manufacturing, logistics, and others; this technology is a total dead-end and waste of time. SQL is with us for 40 or so years, nothing can replace it. Do you remember ODATA, for example? With your nifty resolvers you are just overcomplicating your system and make them dead slow, and enormously hard to maintain or debug, especially with typeless languages like JavaScript, Python.
If you want a flexible query engine, simply receive SQL queries from your clients and return back JSON structures, dead easy and the most performant solution, the simplest to maintain and debug. No-SQL databases are anyway totally unusable for complex business database systems/applications.
I think Facebook does quite wel using GraphQL
@@iAmCracky Facebook has 8500 programmers to maintain the system. Facebook is not a business application, it is a good business, but it is an entertaining system and it is dead simple vs a manufacturing or ERP system.
@@miklosnemeth8566 If Facebook is dead simple, they wouldn't need 8500 programmers.
Does any of these techniques work in 3+ step queries?
user {
post {
comments { }
}
}
Very well explained
It's great, Thanks! 👍
Only in Javascript would 5 months be considered dead.
True
Great video. Thanks
If you watching this in 2022, you can use prisma and graphQL using nexus with apollo server. Done! 🙂
exactly that's what I use, no more n+1 issue only the data I need
somehow I only see cons of graphql..
Great video Ben! Please consider a video on Apollo federation. Thanks!
I'm a bit skeptical th-cam.com/video/j7LAhP608RU/w-d-xo.html
Hi, i want to use dataloader with mongoose, i am using this www.instapainting.com/blog/use-dataloader-to-batch-and-cache-arbitrary-mongodb-queries examle, it works, but how can i use projection.is there any easy solution which i didn't know
Thanks buddy ❤
Awesome! Thank you!
I think you had 'Books' & 'Authors' mixed up, but the rest was very good :p
It is pretty cool, to take a look how the DL is implemented using some of nodejs(event loop) quirks -> github.com/graphql/dataloader
Be careful with dataloader. As with any caching mechanism, your implementation should NOT rely on some cache representation. That's exactly how to get stale reads and buggy logic
There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton
[And also a favorite of Martin Fowler].
Hello... Can you make a video upload file on Nest and Graphql? thank you
haven't tried uploading files with Nest, but when I do, sure
@@bawad Oh, okay. Thank you, I will always wait for the good news ... :)
I think uploading in Nest should be similar to how you do it in apollo since nest-graphql is just a wrapper around apollo-server. I haven't tried it but I think the pain in uploading is usually in encoding a multipart request and you need to put your upload type in your input as a top-level parameter if it's nested inside another object it never works at least in my experience with apollo-express-server. so your mutation should be like the update profile example below
type Mutation {
updateProfile(input: ProfileUpdateInput, avatorFile: Upload, coverFile: Upload): ProfilePayload!
}
notice the files with type 'Upload' they are top-level if you embed them inside another structure they don't work at least on apollo-express-server.
I hope it helps
@@ebundala Ok, thank you. I will try it. Then what about the resolver? is it using Interceptor like in Rest? can you write the function?
I'm sorry. I just used Nest
@@ebundala And should I use a store like this? stackoverflow.com/questions/49034156/nestjs-upload-using-graphql
A kitten dies every time an open 'where in (.....) is used' on a relational DB.
If you select always the fields you want to return, the DB will not do the Join if is not needed. So, you don't need to do anything on the code. DB's are clever enough to not to do the join if you are not returning any value from the right side.
Always use Joins, even for 1 result. Don't be stupid. Sincerely, your DBA.
> A kitten dies every time an open 'where in (.....) is used' on a relational DB.
Is that slower than a query with a join?
I assumed the performance profile is something like:
select * from users u inner join articles a on u.id = a.userId; => 10 ms
select * from articles where id (1, 2, 3); => 10 ms worst case
> If you select always the fields you want to return, the DB will not do the Join if is not needed
Oh, good to know, thanks!
@@bawad Roundtrips are not the only metering to consider here. You have to consider also the work on the DB.Join is the best, because it's all an internal DB issue. When having a Join, DB just have to play with indexes to get the right records, and then return the right data once all is resolved. That's very quick (Playing with indexes is very quick, the best O() value you can achieve on a DB).
The IN clause is very unoptimizable..., the DB can do either a select for each id internally, and then return the results, or if there are more than certain amount, it just selects from lower to bigger and filters the results, which can be even worse. So at best, you are just repeating the individual selects, just on the DB side (It can last a lot of time, so you will have a single roundtrip but much much slower) and at worst, when you have 100 or more ids inside the IN, the DB may do a select of thousands of hundreds or records and then loop while filtering.
Thanks for not shouting.
I see VIM keys, I upvote
looks like you haven't sleep well in a while
yeah my eyes usually look like that even when I sleep +9 hours
@@bawad It's like you've been sleeping less for months.
Hi, Ben!
I wrote some code about preparing graphql query to DB query (3 level depth query makes 1 DB query with fields you exactly want to have) and wrote post for it:
could you tell your opinion please, thanks:
post: rootcomponent.com
github: github.com/AleksandrNi/apollographql-query-compiler
So much extra stuff to avoid writing a join.
😻😻😻😻
Thanks
Thank you, I learn a lot from your videos. Looking forward to some videos and tips regarding Apollo type generation. `apollo-codegen`
AMazing bro!!!!