Twitter System Design | System Design Interview Question

  • Published on 31 May 2024
  • This is a solution to a system design interview question in which you need to design a social network like Twitter.
    This is one of the most common Design Interview questions of all time.
    Recommended Videos:
    How to select the right Database for a Large Scale System: • Database Design Tips |...
    Facebook System Design: • Facebook System Design...
    Netflix System Design: • Netflix System Design ...
    TinyURL System Design: • TinyURL System Design ...
    Summary of this video: www.codekarle.com/system-desi...
    Architecture diagram: github.com/codekarle/system-d...
    Author: / sandeep1904
    If you like this video, please help us grow by sharing this video with your friends on Facebook, connections on LinkedIn and anyone who can benefit from this.
    PS: This is not the real architecture of any such platform. This is my take on how I would answer that problem.
    #codekarle #systemdesign #twittersystemdesign #system #design #interview #tips #faang #twitter #twittersystemdesign

Comments • 166

  • @spacemonkey9080 3 years ago +99

    You are soon to become everyone's go-to system design mentor...keep up the good work

    • @codeKarle 3 years ago +12

      It's great to hear this!! Thanks for the nice words!
      Do share the channel with your colleagues, it helps :)

    • @zombiecrab0 3 years ago +5

      @codeKarle This is easily the best system design channel on YouTube. I'd pay to get such quality content. Haven't seen new videos from you in a while. I understand the effort that goes behind making a video, but please keep them coming.

    • @abhirb12 3 years ago +2

      He has disappeared from the scene. codeKarle is easily the best ever with crystal clear explanation and with very strong fundamentals. Undoubtedly is the best!

    • @VinayKumar-ze2ww 1 year ago

      @zombiecrab0 I saw his course on Udemy

  • @alpsagarwal 1 year ago +6

    Thanks, Sandeep, for the great content. I cleared a FAANG interview by watching your videos for the role of TPM. Cheers!!

  • @lalitvij1516 3 years ago +20

    great explanation, even better than paid system design resources.

  • @poornimamuthukumar9397 3 years ago +18

    Thanks for the great content! Love the way you explain concepts in great detail and have a consistent approach across all designs. Some ideas for future videos that we are looking forward to -
    1. Designing google drive
    2. Designing logging and alerting system
    3. Designing deployment service
    4. Designing shared docs like google docs.

  • @piyushasutkar5423 2 years ago +10

    This is the best end-to-end flow I have seen for Twitter system design.
    It would be great if you could have separate videos for scaling and load balancing Redis, Kafka, and Cassandra, and point to those like you did for the URL shortener, asset service, etc.!!
    Overall, thanks a lot :)

  • @manveersingh5822 2 years ago +4

    Sir, where are you nowadays .. you are a gem sir. Please keep posting for people like me to excel in life! Thanks for your content and my contentment :)

  • @RicardoBuquet 8 months ago +2

    Great material man. I have worked for 11 years at Groupon where we basically used every single one of the technologies you are describing. All the information you share is GOLD for new and experienced engineers. Keep it up. I have seen 2 of your videos so far and I'm already convinced that this is one of the best channels out there.

  • @bhavulgauri7832 3 years ago +4

    I hope you get to a million subs soon man. This is amazing, just amazing!

  • @yagyanshbhatia5045 2 years ago

    This is such a great channel. So glad to have found this!

  • @minionburns 3 years ago

    bhai! just wanted to say, you are making me watch all your videos! it's just amazing! Keep posting more videos please

  • @supunwijerathne4259 1 year ago

    Top Quality Content. Covers both breadth and depth keeping the simplicity.

  • @godolsss2139 3 years ago +6

    awesome explanation from Initial requirement gathering to deep design. I love your videos man. you deserve more subs/views.

    • @codeKarle 3 years ago +1

      Thanks!! Just getting started, we'll get there soon :)
      Do share the videos with your friends/colleagues to help to get there sooner :)

    • @abhirb12 3 years ago

      He has disappeared from the scene. He is easily the best ever with crystal clear explanation and with very strong fundamentals. Undoubtedly is the best!

  • @namansharma8701 3 years ago +1

    Great Content, Everything explained with such simplicity. It's going to help everyone. Keep posting

  • @trendingbuzz9974 2 years ago

    Such simple and perfect explanation - Loved it

  • @pb9308 2 years ago +6

    @codeKarle This content is easy to understand but it goes into great depth and talks about shortcomings and ways to address them to maintain scale/distributed nature. I am amazed at how easily the content is presented. It's very high quality indeed. Please put out more content! One suggestion, if I may: please spend a couple of minutes on scale (network requests per min/hr/day/month, DB storage needed per week/month/year, Redis size considerations) as well as a very high-level DB schema design, at least for the important tables. This will help us a ton. Kudos to your efforts!

  • @karanarora8088 3 years ago

    I am so happy that I found this channel. Simply the best.
    This is my first comment ever on youtube.

  • @ersinerdem7285 3 years ago

    At first I was about to dismiss it because of the sound, then the diagrams caught my eye and I gave it a chance. Now I consider myself lucky to find this channel 😃

  • @rupambasu9722 3 years ago

    Thanks. Awesome explanation. Loved your way of teaching

  • @abhirb12 3 years ago

    He has disappeared from the scene. codeKarle is easily the best ever with crystal clear explanation and with very strong fundamentals. Undoubtedly is the best!

  • @vchandra6315 3 years ago +1

    Hi - Your System Design Videos are awesome!! Good Job! You have covered all the core points(scaling issues, mitigation, etc). By the way, Could you Please post one or two videos on How to distinguish between System Design vs Object oriented design questions? for e.g, Can we apply the same technique for questions like Design an Elevator or Design a parking lot system, etc. Appreciate your help! Thanks again!

  • @sanjayts 2 years ago +3

    This is awesome. Something worth pointing out is that Redis becomes a SPOF in this design -- we are relying too heavily on our Redis instance for the timeline cache, and when it goes down we will end up thrashing the DB since no specific persistent mapping is available in Cassandra to handle the timeline feature. One suggestion would be to have a separate timeline table in Cassandra partitioned by the user_id and sorted by tweet timestamp (desc), with tweet_id as the data field. So something like (user_id, tweet_ts, tweet_id). This table will be populated in an async manner whenever a new tweet happens (quite possibly by a new service which listens to the Kafka tweet stream).
    In this case, a timeline request becomes a "top K" read of this table for a given user_id, followed by a bulk tweet GET API call for the list of tweet ids retrieved.
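
The suggestion above can be sketched roughly as follows. This is a minimal in-memory stand-in, not Cassandra itself: `TimelineTable`, `fan_out`, and `read_timeline` are hypothetical names, the sorted list only imitates rows clustered by tweet_ts (desc) inside a user_id partition, and `tweet_store` stands in for the bulk tweet GET API.

```python
import bisect
from collections import defaultdict


class TimelineTable:
    """In-memory imitation of a table keyed by user_id and ordered by tweet_ts desc."""

    def __init__(self):
        # user_id -> list of (-tweet_ts, tweet_id), kept sorted so newest comes first
        self._rows = defaultdict(list)

    def insert(self, user_id, tweet_ts, tweet_id):
        # Negating the timestamp makes ascending sort order equal newest-first order.
        bisect.insort(self._rows[user_id], (-tweet_ts, tweet_id))

    def top_k(self, user_id, k):
        # "Top K" read: the first k rows of a single user's partition.
        return [tweet_id for _, tweet_id in self._rows.get(user_id, [])[:k]]


def fan_out(table, follower_ids, tweet_ts, tweet_id):
    """Async consumer step (assumed): add the new tweet to every follower's timeline."""
    for follower_id in follower_ids:
        table.insert(follower_id, tweet_ts, tweet_id)


def read_timeline(table, tweet_store, user_id, k):
    """Top-K read of tweet ids, then a bulk lookup of tweet contents."""
    tweet_ids = table.top_k(user_id, k)
    return [tweet_store[t] for t in tweet_ids if t in tweet_store]
```

Because only tweet ids are stored per follower, the table stays small per row, at the cost of a second lookup for the tweet contents.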

    • @aiml84 1 year ago +1

      Cassandra is really bad at aggregation

  • @muralij6078 2 years ago

    Sandeep - Kudos to you for very clear explanation. I guess for creating timeline for user who follow famous/Hot users, you were describing the 'pull' approach as opposed to the 'push' which can be used for regular users.

  • @SwapnilSuhane 2 years ago

    Ek Number !! you are making a big impact to all learners... keep up the great work 👍

  • @bhaskarsharan4280 3 months ago

    So basically timeline creation is
    Active users : using Redis
    Live users : immediately send via WS
    Passive users : create when they come online
    When celebs tweet, normal users : using a pull-based approach + updating in Redis
    When celebs tweet, other celebs : if they are live, send over WS, else update timeline in Redis
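
The summary above could be expressed as a small decision function. This is only a sketch of that list: the action names, `FollowerState`, and `delivery_actions` are made up for illustration, and the order in which the rules are checked (passive first, then celebrity author) is an assumption, since the summary does not say how overlapping cases are resolved.

```python
from enum import Enum


class FollowerState(Enum):
    LIVE = "live"        # currently connected, e.g. over a WebSocket
    ACTIVE = "active"    # uses the product regularly; timeline kept in Redis
    PASSIVE = "passive"  # rarely logs in; timeline built on demand


def delivery_actions(author_is_celebrity, follower_state, follower_is_celebrity=False):
    """Return the (hypothetical) actions taken for one follower of a new tweet."""
    if follower_state is FollowerState.PASSIVE:
        # Nothing is precomputed; the timeline is created when they come online.
        return ["build_on_login"]
    if author_is_celebrity and not follower_is_celebrity:
        # Too many followers to fan out to: normal users pull celebrity tweets on read.
        return ["pull_on_read", "update_redis_on_read"]
    if follower_state is FollowerState.LIVE:
        return ["push_websocket"]
    return ["update_redis"]
```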

  • @free2rythm100 2 years ago

    you make it really easy for me , and your explanation is really great.

  • @03pampa03 3 years ago +1

    Amazing video. Very well explained. Thanks for posting a ton of great content Sandeep!

    • @codeKarle 3 years ago

      Thanks Nishant!

  • @user-kl3be6on2d 8 months ago

    great video. one of the few good ones on YouTube

  • @PrashantAgrawal-ox3yx 1 month ago

    Thanks for the awesome content and explanation. Your contents are always awesome!! It would be great if you could guide a little bit about high level schema design and api design as well. It will help get the visualisation of how to store and access the data.

  • @RishinderRana 2 months ago

    Everytime before a tech interview I look for your videos and since you're not uploading a lot recently, youtube makes it really hard to dig those out. This tells me something about the work life balance you have right now, lol.

    • @CaffeinatedVagabonds 1 month ago

      His current company's policy doesn't allow him to make videos on YouTube. It is against the policies of many companies

  • @jivanmainali1742 3 years ago +2

    I really enjoy watching your videos a lot. It's meditation to me.

    • @codeKarle 3 years ago +1

      That's great to hear!! Thanks for the kind words :)

  • @sammyshm 2 years ago

    Thanks for the great content. Please keep on posting newer topics.

  • @sandipbhaumik 3 years ago

    While describing, I felt the sound quality can be improved. When you are asking about feedback and comments at the last of the video, it is very clear.

  • @vivekbk7561 3 years ago +1

    Great Video. I have a question - "Tweet processor puts back tweets for live users into Kafka to be picked up by the notification service" - here we have to maintain a separate topic or partition for each user, right? Won't it be too many topics/partitions?

  • @JB-lh4db 3 years ago

    Thanks for posting these informational videos. It seems like the subtitles for the twitter design video are out of sync roughly between the 30 and 40 min mark. Good luck with your future videos.

  • @mohammedsardar3779 1 year ago

    Thanks for your time sharing the knowledge.

  • @mrabbas9 1 year ago +1

    No one can draw and explain these in 45 minutes, while also constantly being interrupted by the interviewer's questions. Through these videos you are putting wrong expectations in interviewers head :)

  • @sushmitagoswami7320 2 years ago

    The best explanation of twitter system design.

  • @kmurali08 2 years ago

    Very nicely articulated system design video. 👍🏻👍🏻

  • @sumitsach1 2 years ago +1

    Informative video. It would be helpful if you had covered how user followers/graph db would be sharded.

  • @codediva007 20 days ago

    nice content. I am preparing for sde2, will update the results here. ty.

  • @dashmeetkaur5914 2 years ago

    Excellent! Thank you Sandeep!!

  • @ManasKumarPanigrahi 2 years ago

    Great Lesson sir. Can we have a learning session on how to start implementing these, what all needs to be considered etc. etc.

  • @jovzfondevilla895 3 years ago +3

    You deserve million views!

  • @kumarmanish9046 2 years ago +1

    13:00 why is graph service not using neo4j etc i.e. noSQL graph db instead is simply using relational mysql ?

  • @rohit-ld6fc 1 year ago +1

    overall great video!! but we need more information about sharding. other system designs have used a same db for user + tweets to make things easy..

  • @animeshsharma902 2 years ago

    Thanks for the great content! Love the way you explain concepts in great detail. But I have a query: the user service and the graph service have different databases, so how are users linked with their followers? This is a many-to-many kind of relation. Are we storing the user details in the Graph service DB?

  • @cbest3678 3 years ago +1

    Thanks for the great content. Wondering how we will mark a user as active, inactive, or passive, and which service will do that. I believe some worker. Is it something based on the last timestamp at which the user logged in, so that the Twitter service can check whether that user is active or passive? Also, how will the tweet processor get the data that a user is live?

  • @talivanov93 3 years ago +1

    Great video, great channel. Thank you again!

  • @77loutube 2 years ago

    Really like the architecture you make with Kafka at the heart of them. Have you used ksqldb? Do you bring it in your architecture?

  • @NoWarForever 2 years ago +1

    Why not use Kafka as the source of truth and add a Kafka connector that reads from Kafka and publishes to Cassandra? Otherwise, we would need some reconciliation service between Cassandra and Kafka to make sure that both data sources are aligned, or we need to make sure that both Cassandra and Kafka are available and that the write was ACKed in the ingestion service by both Cassandra and Kafka.
    I see that there is a lot of REDIS but it would make more sense to have always KAFKA as a broker and add connectors that can write to REDIS? Maybe not in all use cases but most of the ones are not latency sensitive.

  • @anchalsrivastava3733 1 year ago

    I would like if you mention what an interviewer can ask or how he can cross question. Some mock interviews will also help. It looks easy when you are not cross-questioned and going in your own way.

  • @meenalgoyal8933 3 months ago

    Thanks for the content! One question, you mentioned that you only cache the feed for active users. So as part of tweet processor service, where you are creating the cache entry for user's feed, how do you identify if the followers of a given user is an active user?

  • @YashRaithatha1989 3 years ago

    Thanks for creating such awesome content.
    I have a query. How do we handle a case when the Tweet Ingestion Service is able to persist Tweet info in Cassandra but unable to persist in Kafka ( due to system crash or some network failure) or vice versa ?

  • @narendrapatel4955 3 years ago

    Excellent video, keep it up!

  • @rajeshg3570 1 year ago

    This is the most detailed and best videos for twitter system design. I've couple of questions/comments here .. How about integration with S3 for storing any images or videos and using something like CDN to speed up the content delivery like videos.. ? Do you think they can be part of this system ? Please clarify.

  • @elachichai 3 years ago +1

    11:35 Fundamental question: How do you know if your cache is reliable and up to date? We have eviction policy alright, will cache entry be cleared out for a particular user with any write/update to User DB? Thats the only way cache can go out of sync?

  • @zuowang5185 4 months ago

    do you actually need the graph db, or would just keeping a Uid:follower db and a Uid:following db suffice?

  • @varoon5413 3 years ago

    Hi sir, loved your explanation. Keep up the great work, will check out your other videos too..
    Could you suggest me a good book that gives me an intuition into designing such data systems? Thank you :)

  • @harmishonline 3 years ago

    Details kicked ass! Awesome...

  • @vinodnagar-nj9sf 1 year ago

    its very lucid to watch the videos however i suggest you stand at one of the corner of the board so that the whole board could be visible.

  • @jialieyan5263 1 year ago

    Hey Sandeep, thanks for the video, it is really helpful. I have two questions
    1.Can I use DynamoDB instead of Cassandra as database of tweets?
    2. For streaming service, it use Hadoop, what data is stored in Hadoop, is that same data in Cassandra?
    Thanks

  • @sandeepkarambelkar3918 3 years ago

    Good explanation!!

  • @tejashcpatel 3 years ago

    Great explanation. I love your videos. But one suggestion. Please use a proper microphone hanging on your neck area. The sound is not very crisp.

  • @niesh20us 2 years ago

    Very nice content Sandeep. Any plans to make system design on document storage system like google doc, dropbox?

  • @successsaint5760 1 year ago

    Awesome content sir.
    Would you please create new videos for 'System Design for Proximity Server'?
    thanks.

  • @kunal4350 3 years ago

    Thanks for video.
    Can you please make video for Designing distributed counter where it tells how many users have opened the website at any point of time ?

  • @chaquirbemat6263 2 years ago

    thank u for the amazing content

  • @RK-xl2uq 3 years ago

    very detailed design, loved it.

    • @codeKarle 3 years ago +1

      Thanks!! Glad that you liked it!

  • @abcd12272 3 years ago

    What will be the key in redis when search service is using it? there can be many phrases that refer to the same search term

  • @rk2119 3 years ago +2

    V good
    But DB shud we not use NoSQL like cassandra

  • @anushree3744 2 years ago

    Why do we need a separate service to GET the tweets? is it because there will be lots of GET requests for timeline generation?
    Thanks again Sandeep for your effort.

  • @pratyushsingh3967 5 months ago

    which graph service do you recommend using... Neo4j or something??

  • @lazypenguin3156 2 years ago

    For the search service, is there a way to filter based off of just the content from people the user follows?

  • @ayushagarwal4247 3 years ago

    Great Stuff!!

  • @Asha-se4wv 3 years ago

    Great content videos. any plans to make videos on 1) BookMyShow 2) DropBox ?

  • @evangeloskostopoulos8173 1 month ago

    why do we use a MySQL DB for the followers etc? isn't a KV type of storage better? or even some Graph DB?

  • @amitmandliya6577 3 years ago

    thank you!

  • @sirlener.r.magalhaes2630 3 years ago +1

    Hi! Your designs are very simple and clear. Congratulations! Just wondering... why don't you use KSQL's tables to keep the users and graph updated instead of Redis, so that when another microservice needs the information it only has to look it up in the KSQL view? For example, the tweet processor could access this information directly without having to ask the Graph service.

    • @matthayden1979 2 years ago +1

      System design is very subjective. Actual implementation is very different from the way explained and much more complex technologies being used. Every day, tons of new tweets and tons of new users added globally. This is very high level design specification. You can have your own design which suits the purpose.

    • @hrishidypim 2 years ago +1

      Redis is cache, will be faster than limited memory mapped index file data in Kafka.

  • @andriidanylov9453 1 year ago

    Thanks
    😁👍

  • @Max-zf5ot 1 year ago

    As per this design, wouldn't we violate our NFR of quick rendering of timeline for passive users?

  • @nishat2ahmad 3 years ago

    Hi @codeKarle,
    thanks for great content.
    I have few questions
    1) in timeline redis cache, we are storing only tweetID or content of tweet as well?
    2) delete/edit of tweet will also be going though same flow as create tweet?
    3) There will be upvote count for each tweet also, are we storing upvote count also in same timeline redis list? because this will change very frequently.

    • @codeKarle 3 years ago +3

      1. Just the tweet Id. Tweet content can be fetched at runtime from a different cache which stores the tweet Id-content mapping.
      2. Deletes and edits can be handled using the same flow, but I would rather just update or delete the tweet content and update the corresponding cache that stores the tweet Id-content mapping. When the timeline needs to be shown, the updated content would be shown to the users, and if the tweet is deleted, it'll be ignored.
      3. Upvotes would be stored separately like you mentioned, and the data would be merged later while the timeline is rendered.
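
The three answers above fit together at render time roughly like this. It is a sketch only: plain dictionaries stand in for the timeline cache, the tweet Id-content cache, and the upvote store, and `render_timeline` is a hypothetical name.

```python
def render_timeline(timeline_ids, tweet_cache, upvote_counts):
    """Join cached tweet ids with content and upvotes when the timeline is shown.

    timeline_ids  -- tweet ids from the per-user timeline cache (ids only)
    tweet_cache   -- tweet id -> current content; edits overwrite, deletes remove
    upvote_counts -- tweet id -> upvote count, stored separately
    """
    rendered = []
    for tweet_id in timeline_ids:
        content = tweet_cache.get(tweet_id)
        if content is None:
            # The tweet was deleted after the timeline was built, so skip it.
            continue
        rendered.append({
            "tweet_id": tweet_id,
            "content": content,  # always the latest content, so edits show up
            "upvotes": upvote_counts.get(tweet_id, 0),
        })
    return rendered
```

With this split, a delete or an edit touches one entry in the content cache instead of every timeline that contains the tweet.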

  • @rohan8arora 2 years ago

    very hard to watch other system design videos now, these are pretty comprehensive videos.

  • @mehulparmar9976 2 years ago

    What should be the partition key in database for twitter?

  • @RajeshYadav-ly4gq 3 years ago +1

    Great Job

    • @codeKarle 3 years ago

      Thanks!! Glad that it was helpful :)

  • @shrikanthnarayanan2 2 years ago

    How come no one talks about delete-tweet flow?
    What happens to pre-computed timeline feeds? Would you delay timeline creation? or would you track which tweet is part of which pre-computed timeline?

  • @anonymousgod2006 2 years ago +1

    @CodeKarle Why are all your designs anti microservice architecture like User Service being single point of truth for every user and can be queried by other services whereas in Microservices, Each service does not call other service directly. This question is something I have been asked in my interviews, can you please let me know how to respond to such questions as I follow mostly your approach in interviews.

  • @raghugrinus4779 11 months ago

    When a user posts a tweet, you are saving it in Cassandra. Can you please explain the strategy you are using to store the tweet for all the followers?

  • @pallavibansal84 3 years ago +1

    Thanks for the video. Great detail and depth.
    I have a couple of questions ..
    What's asset service?
    Why do you need both the Tweet ingestion service and the tweet service? Can the tweet service not write (and read) to Cassandra "and" send the tweet to Kafka? I'm wondering why we need a separate service only for ingestion ... The more services, the higher the cost in terms of maintenance, deployment, etc.

    • @codeKarle 3 years ago +6

      For Asset Service, refer to: th-cam.com/video/lYoSd2WCJTo/w-d-xo.html . It is basically a video serving platform for all the video/image content, and you can think of it as similar to YouTube/Netflix.
      The main reasons for having two services were:
      1. Tweet ingestion is something that runs at a much lower scale than Tweet Service. So, a small spike in read traffic on Tweet Service can potentially impact the ingestion flow big time.
      2. The number of DCs in which Tweet Service would be present could be more than the number of DCs in which the Ingestion service is present, because Tweet Service is being called by tons of other components as well.
      3. There is a good probability that both these flows are maintained by different teams.
      You are right about the maintenance cost; that would increase, but that's probably worth it. Deployment cost would roughly be similar, since at the very core, the number of CPUs and cores would still be approximately the same whether we have one service doing it all or two services, if we have good utilization of the hardware.
      Hope that answers :)

    • @pallavibansal84 3 years ago

      @codeKarle Yes, thanks a lot for answering

  • @drdr3496 2 years ago

    Great content. If I may add some feedback: The sound is pretty bad, you might want to use a microphone on your T-shirt.

  • @bhattago 3 years ago

    Thanks bro for such a detailed design..appreciate all the hard work.
    A naive question: doesn't storing tweets in the Elasticsearch cluster result in duplication of data (they are already present in Cassandra)?
    In other words, can't Search Service just work out of Cassandra? Am I missing anything (pros/cons)?

    • @shamsularefinsajib7778 10 months ago

      if you see some articles regarding elastic search, they are set in a way that will be synced with mongo/cassandra to handle the search. I think it is for scalability, using elastic search instead of searching in the cassandra.

  • @guitarist_covers 3 years ago

    At 35:25 you say we could update timelines of other famous users immediately. Why? What benefit does this provide?

  • @zuowang5185 4 months ago

    is it worth to mention hadoop any more in this day and age?

  • @kumarvipin874 3 years ago +3

    Thanks for the video , awesome content and very detailed explanation. But I have a query on what would be the partition key for cassandra cluster , if tweetId is the partition key then How do we generate the user home timeline ,Wouldn't it be slow read if we query all cluster nodes and gather the tweets

    • @befeffe 3 years ago

      I have the same doubt. on one hand the recommendation is to have it pre-computed but then if data is shared based on tweet so will the home timeline is pre-computed on a single server. Really appreciate the clarification on this

    • @codeKarle 3 years ago +4

      So there are multiple things stored here, for different kinds of query patterns.
      First is the tweetId (Partition Key), tweet details, text, time, etc... This would be used by use cases where we need to process a particular tweet, let's say to show a tweet on the UI or to fetch the details/content of a list of tweet ids, which a lot of services will call.
      The other use case is where we query tweets of a user.
      Here we'll store userId (PK), tweet Id, time, some other metadata... This is where the query happens on the User Id.
      Now if we want to fetch tweets of a user, we'll first query the second data store, get the tweet ids, and then create the response by querying the first data store and getting the tweet contents. Similarly, other use cases can also be handled.
      Hopefully, that answers :)
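
The two query patterns described above can be sketched with two in-memory maps. This only illustrates the access pattern and is not a Cassandra schema; `TweetStore` and its method names are invented for the example.

```python
from collections import defaultdict


class TweetStore:
    """Two stores for two query patterns: by tweet id and by user id."""

    def __init__(self):
        self.by_tweet_id = {}                # tweetId (partition key) -> tweet details
        self.by_user_id = defaultdict(list)  # userId (partition key) -> [(time, tweetId)]

    def save(self, tweet_id, user_id, text, ts):
        # Every new tweet is written to both stores.
        self.by_tweet_id[tweet_id] = {"user_id": user_id, "text": text, "ts": ts}
        self.by_user_id[user_id].append((ts, tweet_id))

    def tweets_of_user(self, user_id, limit=20):
        # Step 1: query the second store on user id to get the newest tweet ids.
        index_rows = sorted(self.by_user_id.get(user_id, []), reverse=True)[:limit]
        # Step 2: fetch the contents from the first store by tweet id.
        return [
            self.by_tweet_id[tweet_id]["text"]
            for _, tweet_id in index_rows
            if tweet_id in self.by_tweet_id
        ]
```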

  • @sumankalyanghosh4838 1 year ago

    Can you also describe how do we built trending tweet flow?

  • @protyaybanerjee5051 2 years ago

    There should obviously be a post-cache in front of the Cassandra cluster. I'm not too sure of read efficiency of the Cassandra

  • @girirajgupta9489 3 years ago

    Great video..One specific question though how many records do we keep in the cache for each user and what happens when a user has seen all the cached tweets, DB query?

    • @codeKarle 3 years ago +1

      If capacity permits, we can store forever. This however will involve a lot of cost, and that might not be acceptable. In that case we can keep some time window, let's say 4 weeks of content, in the cache for each user. There is no right or wrong thing here, it's just a cost vs latency tradeoff.
      Post the 4 weeks, we can query the DB to get other tweets that should be shown to the user using the same flow as that of Passive users.
      Once a user has seen all the tweets, we will not have any more tweets to show and we can maybe then show some users that this person can follow, to make the UX look decent enough.
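
One way to picture the cache window with a DB fallback described above is sketched below. The 4-week figure comes from the reply; everything else is an assumption for illustration: `timeline_page`, the `load_from_db` callback (standing in for the passive-user flow), and the idea that the cache holds every entry inside the window.

```python
CACHE_WINDOW_SECONDS = 4 * 7 * 24 * 3600  # "let's say 4 weeks"


def timeline_page(cache, load_from_db, user_id, before_ts, now,
                  page_size=20, window=CACHE_WINDOW_SECONDS):
    """Return tweet ids older than before_ts, from cache when possible.

    cache        -- user_id -> list of (tweet_ts, tweet_id) kept for the window
    load_from_db -- hypothetical callback: (user_id, before_ts, page_size) -> ids
    """
    cutoff = now - window
    if before_ts > cutoff:
        # The requested range overlaps the cached window: try the cache first.
        entries = [(ts, tid) for ts, tid in cache.get(user_id, [])
                   if cutoff <= ts < before_ts]
        if entries:
            entries.sort(reverse=True)
            return [tid for _, tid in entries[:page_size]]
    # Nothing left in the window (or the request is older than it): hit the DB.
    return load_from_db(user_id, min(before_ts, cutoff), page_size)
```

A longer window lowers latency for deep scrolling but raises cache cost, which is the tradeoff the reply points out.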

  • @juanjoselopezmartinez4332 1 year ago

    Very good video and channel. Finally I find a nugget of gold in this sea of garbage.

  • @YashRaithatha1989 3 years ago +2

    Awesome content.
    I think it's better that the "Tweet Ingestion Service" pushes the tweet info directly into Kafka only, and then Tweet Service reads from Kafka and stores it in Cassandra. Ingesting into both Cassandra and Kafka at the same time can make the Tweet Ingestion Service a bit slower, and we also cannot ingest into Kafka and Cassandra atomically from the Tweet Ingestion Service.
    Thoughts ?

    • @hrishidypim 2 years ago

      Good point. No chance of losing data in case of back pressure on the service, since the data is handed to Kafka immediately.

    • @sanjayts 2 years ago +5

      It need not be atomic; those are two different flows and can be done in parallel. Sometimes pushing to Kafka might fail, sometimes inserting into Cassandra might fail, and these failures need to be handled independently. I personally disagree with only pushing to Kafka because the tweet action would no longer be synchronous and you won't have an ACK for the client who is waiting for their tweet to be successfully posted.
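
The "two parallel flows, handled independently" idea in the reply above might look like the following sketch. No real Kafka or Cassandra client is used: `write_to_store`, `publish_to_stream`, and the two failure handlers are hypothetical callables supplied by the caller, and what a handler does (retry, queue for repair, alert) is left open.

```python
from concurrent.futures import ThreadPoolExecutor


def ingest_tweet(tweet, write_to_store, publish_to_stream,
                 on_store_failure, on_stream_failure):
    """Run both writes in parallel and handle each failure on its own."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        store_future = pool.submit(write_to_store, tweet)
        stream_future = pool.submit(publish_to_stream, tweet)
        # Future.exception() waits for completion and returns the error or None.
        store_error = store_future.exception()
        stream_error = stream_future.exception()

    if store_error is not None:
        on_store_failure(tweet, store_error)
    if stream_error is not None:
        on_stream_failure(tweet, stream_error)

    # The caller still gets a synchronous answer it can turn into an ACK.
    return {"stored": store_error is None, "published": stream_error is None}
```

The request stays synchronous for the client, while a failure on one path does not block or roll back the other.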

  • @fengquansong6475 3 years ago

    Very helpful content. Only one suggestion: change a better mic. The audio quality is very poor. Thanks.

  • @mayankb03 2 years ago

    any downside for using NoSQL for user DB?