Whether the system needs to be near real time is a clarifying question to ask the interviewer instead of assuming.
Great video. Kudos to the interviewer for making the environment so comfortable. A few things:
1. I could not understand the choice of database. Ideally this should be a combination of a time series DB plus a data warehouse?
2. Certain key components, like a rule engine (for acting on events) and a notification system (for notifying interested subscribers), were missing.
3. 90 days is a very short retention SLA. Down-sampling the data to lower the volume and store it long term could have been discussed.
4. I thought the interviewer wanted to go beyond just visualization, to automated actions (alarms etc.) and analytics too.
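On point 3, a down-sampling pass for long-term retention could be sketched like this (a minimal, hypothetical rollup: raw points older than the 90-day window are collapsed into hourly averages; the data layout is an assumption, not from the video):

```python
from collections import defaultdict

RETENTION_SECONDS = 90 * 24 * 3600  # 90-day raw retention from the video

def downsample_hourly(points, now):
    """Collapse raw (timestamp, value) points older than the retention
    window into one averaged point per hour; keep recent points raw."""
    recent, buckets = [], defaultdict(list)
    for ts, value in points:
        if now - ts < RETENTION_SECONDS:
            recent.append((ts, value))
        else:
            buckets[ts // 3600].append(value)  # hour-sized buckets
    rolled_up = [(hour * 3600, sum(vals) / len(vals))
                 for hour, vals in sorted(buckets.items())]
    return rolled_up + recent
```

Old data shrinks from one point per event to one point per hour, which is the storage-vs-resolution trade the comment is asking about.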
Good observation. Felt like this interview was going in the wrong direction the moment a NoSQL DB was chosen.
I would say a time series DB or an Elasticsearch cluster would have been a good choice.
The key takeaway for me was how well Hozefa communicated his thoughts and solutions. Very good communicator.
This interview is simply too short. At least need another 10 minutes for the design discussion.
I have a suggestion for all videos on this platform.
First, they are super helpful when it comes to how to carry out the whole design, like how to estimate or where to begin.
But all of them lack cross-questioning from the interviewer's side. In real life we can be bombarded with minute, detail-level questions.
Like in this video: how does data enrichment work, or how come we make data collection configurable without knowing the whole business use case?
Bottom line: make them tougher :)
Candidate doing what they showed in video would get downleveled in best case scenario 😅
Metrics and logging sound like two “separate” design questions 🤷🏻♂️
The interviewer mentioned as part of the requirements that the system could scale up to a billion users, meaning events could be at least double that (depending on the metrics being tracked). In that case a NoSQL store probably wouldn't be the best persistent storage decision. An OLAP database (like ClickHouse) would likely have a drastic positive impact on query time for both visualization (reads) and event ingestion (writes), and would also make aggregates far faster to compute.
Another improvement to the design: the queue could come after the validation/scrubbing service rather than before. That would save space in the queues and keep them from being overwhelmed, since only validated data would enter the queue and invalid events would be discarded. The only case where the queue should come first is when we validate very large batches of payloads at a time; validation might then take a while, and the queue buffers the backlog.
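The validate-then-enqueue ordering argued for here could look roughly like this (a minimal sketch; the in-process `queue.Queue` and the `REQUIRED_FIELDS` schema are stand-ins for whatever broker and validation rules the real design uses):

```python
import queue

event_queue = queue.Queue()  # stand-in for Kafka/SQS/etc.

REQUIRED_FIELDS = {"user_id", "event_type", "timestamp"}  # assumed schema

def is_valid(event):
    """Cheap schema/sanity check done before the event hits the queue."""
    return isinstance(event, dict) and REQUIRED_FIELDS <= event.keys()

def ingest(event):
    """Drop invalid events up front so the queue only buffers clean data."""
    if not is_valid(event):
        return False  # discarded; could also go to a dead-letter store
    event_queue.put(event)
    return True
```

The trade-off discussed above shows up here directly: a slow `is_valid` would stall producers, which is the case for putting the queue first.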
It is beneficial to put the queue first when the clients are trusted and validation is minimal or not required. This applies to internal services, when you are building an infrastructure solution for internal use within the company.
I would probably write all non-real-time events straight to a data lake at high throughput, and later ingest them using distributed data processing platforms like Spark or Hadoop. Only gold- or silver-stage data should be stored in a DB for analytical purposes.
@@opencompare Exactly this. A distributed schemaless DB would require hundreds of instances just to handle the writes, and querying that much data would take a lot of CPU and time. So query the whole thing once per hour/day/etc. and aggregate it into tables in a relational DB where the aggregates can be queried quickly and cached.
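The periodic-aggregation idea could be sketched with SQLite standing in for the relational store (the table and column names are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the relational store
db.execute("CREATE TABLE raw_events (ts INTEGER, metric TEXT, value REAL)")
db.execute("""CREATE TABLE hourly_agg (
    hour INTEGER, metric TEXT, cnt INTEGER, avg_value REAL,
    PRIMARY KEY (hour, metric))""")

def rollup():
    """Periodic job: fold raw events into per-hour aggregates that
    dashboards can query cheaply instead of scanning raw data."""
    db.execute("""
        INSERT OR REPLACE INTO hourly_agg
        SELECT ts / 3600 AS hour, metric, COUNT(*), AVG(value)
        FROM raw_events
        GROUP BY hour, metric""")
    db.commit()
```

Dashboards then read `hourly_agg` only, which is the "query the aggregates quickly/cached" part of the comment.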
Great video. Very natural and realistic, unlike the rehearsed, phony ones in many other videos on YT.
Event Sourcing and projections for visualization would have been amazing here
Introducing a Data Catalog would really help with managing PII and auditing where and how sensitive data is being used through data lineage.
I am a system design interviewer and a hiring manager and I will probably give him a NO.
Too many application-level assumptions, too few hard-core technical details.
And why does he talk about a cache at all? No cache is needed in this system.
Not satisfied with the discussion. The scope of the question is not clear: are we building a system for analytics computed out of logging data, or a system that has logging and analytics as separate components?
The interviewee could have discussed:
1. The grain of data sent by the logging system. Is it individual events or aggregated counts?
2. A database design optimized for analyzing time series data.
3. Expanding on machine-generated vs. user-generated events and treating those datasets differently down the line.
Question to people proficient in designing backend systems: is this a good example of an interview, or of the design? I personally found this to be checking boxes in an interview. There isn't enough trade-off discussion or building toward a solution. This seems like an inconsistent brain dump of a known solution.
I agree. The interviewer did a great job (real interviews wouldn't ask this broad a question, tbh), but I doubt this candidate would make it to the next level. If the question were specifically about real-time user events, the answer might pass, but this is not a valid solution for big data. An actual solution requires many services and multiple databases to aggregate the data for various use cases, not one giant DB that handles all writes and all queries. Storage is cheap, so a single-DB solution doesn't really make sense for big data/analytics.
From my experience, this is not a real-world interview. It was only drawing circles and rectangles, without talking about the data model, time series databases, database schema, failure detection, or monitoring. The candidate would be bombarded with questions right away. This is a "nice" design to draw, but it won't take you through an onsite with any serious company.
How can you stay focused during the interview when the interviewer is so attractive?
This interview is too short, so there wasn't enough time to cover some of the details. A design interview should be at least 40 minutes; the candidate only had 21. There wasn't enough time for a deep dive and it seemed rushed, with no time to discuss scaling the individual components. Sampling is not scaling. A longer interview would let the candidate cover how many servers are needed, how much disk space is required for X years/months, how many requests can be served per second, etc.
The requirements list is too short; I feel we didn't spend enough time on the requirements.
How would you determine the candidate's level based on this interview performance?
This guy interviewed himself
Feedback:
1. Should have talked in more detail about data storage and how it would support fast queries. Some sample queries should have been shown, along with how they would be served.
2. No mention of how logs are stored and indexed for faster search.
3. Didn't justify the usage of the queue.
Hey Rahul, thanks for watching and leaving your feedback! Appreciate it!
Ok good interview with Imran Hashmi :p
I think this interview will not fly. Lots of flaws.
Very good... both the interviewer and interviewee did an excellent job... a lot to learn from this video... I have an interview tomorrow with Amazon, hope this helps...
How did it go? I have mine in the second week of Jan.
@@designpathy Good luck, y'all. Can I chat with you? I have one coming up soon.
@@designpathy How did yours go?
It's surprising that there was no discussion of OLAP storage solutions, since analyzing these metrics is the end product.
Not in depth at all
I think for time series data we should use an RDBMS with sharding, or better yet, generate the graphs from an in-memory DB.
A load balancer is redundant if you're using a queue. Events should be published to the queue right away, and available consumers (the validation service) will handle events as they become available.
I believe you don't expose the queue directly; it has to sit behind a service that actually pushes the data onto the queue. And since this service needs to scale up and down, we need LBs in front of the front-end servers.
Yup, exposing an implementation detail like the queue directly to the client will hurt the system in the long term when a requirement comes along to modify the design.
You shouldn't expose the queue directly to the event producers; a fronting service also helps with "load balancing" 😃, especially at the scale the interviewer mentioned (you'd definitely have multiple queues).
wrong.
Queues usually benefit from fast protocols like TCP and UDP (when you don't care about data loss), and exposing those protocols to the end user is not safe.
What does the archiving process look like? How/who is going to move data from the main DB to the archive DB, and what happens to the precomputed visualization data?
The design is a little superficial. In the context of monitoring systems, the crucial "dive deep" question pertains to data aggregation and the trade-offs between storage capacity and performance.
Real-world monitoring systems like CloudWatch and Prometheus (push vs. pull) should have been mentioned during the interview as well.
Dear team, please provide the name of the tool being used to draw the architecture.
I think it is Whimsical
@@leandrovieira2981 Thanks Bro !!
Hozefa is a beast!!!!
A NoSQL DB for time series data. What a joke!!! Can't believe an FB EM is giving this sort of design.
This could be a case for Kafka as the message processing queue, with an event-driven API in mind.
Quiet.
Small correction at 6:00:
For a money/banking system, consistency should be prioritized over availability.
Why was visualization such a big piece of the discussion? The design was for metrics and logging, which lacked depth. The whole blob of logging data coming in could be stored in a time series DB, or even an object store like S3, then moved to a DW like Redshift. Why was a NoSQL DB needed in this case?
Not sure where you get the idea that it's one whole blob. Imagine a web app: you would want to log individual events so that if the browser is closed you don't lose any. S3 would work but isn't the best choice: imagine a Lambda writing to S3 for every single user event.
Low latency as an NFR didn't make sense to me. Nothing high-priority or transactional like money is involved. This data is passive and will be used later to make business decisions.
This interview is gonna fail. Bad example.
Why? Can you please explain?
This interview went well imo. The system he described is what we use in my org. They took 3 years to develop but our boss designed it in 20 mins.
May I know the name of the tool they used?
Can a load balancer directly insert to a queue?
Yeah, good point. Isn't an LB by default part of an MQ?
I mean, the number of partitions or consumers can do the same thing.
Does Meta ask system design questions for the SDE1 role?
Hey AnushkaVijay-cv7tk! Typically SDE1 candidates will not be asked system design questions
uhm...uhm...uhm...uhm...uhm...
...is this how it works in a real system???
"What if..." that's what happens all the time in reality.
18:32 This question was given no clear answer.
Nice:)
This is not a great solution. It leaves out a lot of technical bits and unnecessarily assumes a lot of things. Not the right way.
Not very successful interview
17:10
WOW. I mean, there were nitpicks before this point, but this is a big NO. An analytics platform HAS to save each and every event, no matter what. It doesn't matter whether it serves one user or a trillion users; you have to store every event. The response to the scale problem would be to scale out the queue and ingestion service as the number of events increases.
@@joed5714 Actually, it's not. We use sampling heavily to keep up with upstream, and it's standard practice when volumes get exorbitantly high (like 10000+ B events per second). We have a data pipeline that ingests packet headers from all the routers. A router can process 5 Gbps and there are 1000+ routers; it is impossible to ingest all those events without sampling, unless of course you provision 10000+ 32-core instances.
I think whether it is a BIG NO or an absolute YES depends on the use case. Depending on the metrics and the purpose for which we collect this data, it may not be necessary to collect metrics from every user. Sampling is an important statistical method that gives the expected results without going through each and every input. I know we have tools, methods, and frameworks that can collect every input without a miss, but whether we need to do that must be decided first; otherwise you are jumping in to solve a problem that doesn't exist.
Sampling is done by all the major tech players for large applications. It is completely valid to suggest.
Stopped the video at the same timestamp to process what was said :\ Agree that it's a big NO. For example, sampling user conversions for ads analytics is not acceptable.
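For context on the thread above: when sampling is acceptable, it is usually done deterministically per key, and counts are scaled back up by the sampling rate. A hypothetical sketch, not from the video:

```python
import hashlib

SAMPLE_RATE = 0.01  # keep roughly 1% of traffic

def sampled_in(key: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministic hash-based sampling: the same key always gets the
    same keep/drop decision, so one user's events stay together."""
    digest = hashlib.sha256(key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate

def estimate_total(sampled_count: int, rate: float = SAMPLE_RATE) -> float:
    """Scale a sampled count back up to an estimate of the true total."""
    return sampled_count / rate
```

Exact-count use cases (billing, ad conversions, as the comment above notes) would bypass this path entirely; sampling only fits metrics where a statistical estimate is good enough.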
How is this guy a manager? He'd probably fail intern interviews.
Who is the interviewer? Please include her LinkedIn ID.
yikes