when indexes are useless | The Backend Engineering Show

แชร์
ฝัง
  • เผยแพร่เมื่อ 31 พ.ค. 2024
  • head to database.husseinnasser.com to get a discount coupon for my Introduction to Database Engineering course
    In this episode of the backend engineering show, I’ll discuss three instances where indexes can be useless and might add overhead to your operations. Let us discuss.
    0:00 Intro
    1:34 What is an Index?
    4:00 The Cost of Indexes
    6:40 Most values are the similar
    13:00 WHERE UPPER(NAME)=‘RICK’
    17:10 Composite Index
    23:00 How do I know if I’m using an Index
    Become a Member on TH-cam
    / @hnasr
    🔥 Members Only Content
    • Members-only videos
    Support my work on PayPal
    bit.ly/33ENps4
    🧑‍🏫 Courses I Teach
    husseinnasser.com/courses
    🏭 Backend Engineering Videos in Order
    backend.husseinnasser.com
    💾 Database Engineering Videos
    • Database Engineering
    🎙️Listen to the Backend Engineering Podcast
    husseinnasser.com/podcast
    Gears and tools used on the Channel (affiliates)
    🖼️ Slides and Thumbnail Design
    Canva
    partner.canva.com/c/2766475/6...
    🎙️ Mic Gear
    Shure SM7B Cardioid Dynamic Microphone
    amzn.to/3o1NiBi
    Cloudlifter
    amzn.to/2RAeyLo
    XLR cables
    amzn.to/3tvMJRu
    Focusrite Audio Interface
    amzn.to/3f2vjGY
    📷 Camera Gear
    Canon M50 Mark II
    amzn.to/3o2ed0c
    Micro HDMI to HDMI
    amzn.to/3uwCxK3
    Video capture card
    amzn.to/3f34pyD
    AC Wall for constant power
    amzn.to/3eueoxP
    Stay Awesome,
    Hussein
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 45

  • @hnasr
    @hnasr  2 ปีที่แล้ว +5

    head to database.husseinnasser.com to get a discount coupon for my Introduction to Database Engineering course

  • @carlosdelgado5632
    @carlosdelgado5632 2 ปีที่แล้ว +27

    I happen to realize about this about 2 weeks ago when I created an index on a column with "status" and saw no increase of performance at all, as the matter fact it actually decreased so I guess I learned it the hard way 😂

  • @squirrel1620
    @squirrel1620 2 ปีที่แล้ว +17

    I was just troubleshooting a long running query and found that one of the tables being joined was missing an index. The SQL EXPLAIN showed exactly what index should be added. Added it and bam, 40s query down to 1s

  • @talhaabdurrahman9407
    @talhaabdurrahman9407 2 ปีที่แล้ว +11

    the way you said bahrain made me so happy

  • @emjaytripleo
    @emjaytripleo 2 ปีที่แล้ว +2

    Just bought your course Yesterday, looking forward to it

  • @noorulamin2652
    @noorulamin2652 2 ปีที่แล้ว +1

    explained very well thank you Hussein Nasir for such awesome content we really appreciate your content & hard work keep it up 👌

  • @RustinRoberts
    @RustinRoberts 7 หลายเดือนก่อน

    Great video. This helped my understand indexing a bit deeper. Thank you

  • @marsilinouzaky2748
    @marsilinouzaky2748 2 ปีที่แล้ว +7

    Great video Hussein, hope u have a great vacation.
    I think two more ways indexes can be useless if it's on a column that changes value very very often where maintaining a rebalanced tree will become challenging and introduce a lot of overhead. Also if the rows are not that many then index might not provide a lot of benefits.

    • @mohamedmohamedy3085
      @mohamedmohamedy3085 2 ปีที่แล้ว

      exactly.
      one of the reasons we generate random test data, is to force the DB engine to use the index.

  • @sirajul-anik
    @sirajul-anik 2 ปีที่แล้ว

    on point 2, i had figured out this one a few months ago. I was totally shocked.

  • @HSBTechYT
    @HSBTechYT 2 ปีที่แล้ว

    I had a question about composite indexes in nosql/mongo
    How does mongo know what's left and what's right ?
    And let's say I have a index for userid property but _id (inbuilt) is also indexed right ?
    So will this affect performance?

  • @bashardlaleh2110
    @bashardlaleh2110 2 ปีที่แล้ว +1

    The composite index thing is really interesting

  • @Gabriel-sc3mg
    @Gabriel-sc3mg 4 หลายเดือนก่อน

    and how does indexes works when we make a join by a PK it will search the results by Index and then search in the other table? great video btw

  • @VivekYadav-ds8oz
    @VivekYadav-ds8oz 2 ปีที่แล้ว +3

    I had questions regarding the propagation of query from backend to database - why do we introduce so much overhead in this part?
    We use textual commands and pass it to the database for lexing *for every single query* (queries get cached but this only helps if query is exactly the same), then parsing, then actually doing its actual job. But since most of the query is programmatically generated, why don't we have a binary format for this, where we just send a blob C-style structure over the network. It would have all the information database needs without needing to filter through the human junk.
    Also, I am not an engineer (yet!) so no real world experience, but do databases run on different machines than the server? For all examples and usecases encountered by me, they've always been on the same machine, so using the network (especially TCP for localhost networking!) seems unnecessarily expensive. There should be an alternative method communication that doesn't require network but uses some Unix-y message passing of some sorts.

    • @rishabhanand4270
      @rishabhanand4270 2 ปีที่แล้ว +8

      yes, databases are best kept away from your server. This way it's decoupled and you can scale each other independently. A server should do exactly what it's name suggests: serve your clients. Whatever requests your client has, it should note it down and send it to the "kitchen" where "food" is prepped, cooked and sent to the clients. Sometimes "food" is also cached :).
      Coming to the network problem: it's not always a full TCP connection establishment, there is a connection pool from your server to the database which is almost always active and your server can just use these existing connections to quickly send over the raw query. Yes, there is still a little overhead but it's still better than keeping your database on the server itself when we talk about scale.
      Finally, we don't send "parsed" queries over the network because then you are parsing it on your Server. As explained by Hussein in this video, the parsing is dynamic, i.e, the database looks at a query and comes up with the most efficient plan based on statistics which it stores in the database itself. So you're loosing all the good database optimizations if you just send the parsed query directly to the database. Also, if tomorrow the database changes it's parser (maybe an upgrade), you will have to upgrade all your server instances to have the same version of the parser. It's a headache nobody wants. Let the database do it's job of caching, looking at your query to plan and executing your query.

    • @bepamungkas
      @bepamungkas 2 ปีที่แล้ว

      To beat the server query planner, you'll need actual knowledge of the data. AFAIK, only SQLite allows you to do that by embedding the entire logic into its driver; making it a classic serverless RDBMS.
      note: Of course there is cloud serverless db like cosmos or aurora, but those are a bit more obscure in implementation due to several abstraction layer. SQLite use simplified model where db interface is treated as part of the application itself.
      Regarding your last question: Yes. It is advisable to separate db with backend due to predictability factor. Roundtrip latency is generally an acceptable cost compared to resource exhaustion during peak load (since on single-machine scenario, every additional load will be multiplied by at least two: backend processing and db access). And latency is usually much more predictable compared to load.
      Having said that, for localhost you could use UNIX socket- or named pipes on Windows- to avoid TCP "overhead". However, on obscure edge cases where you need do bug hunting, TCP-based connection generally easier to debug. And using TCP from the get go means you're not locked to single scenario, or worse, made wrong assumptions due to difference in system behavior.

  • @shiskeyoffles
    @shiskeyoffles 2 ปีที่แล้ว +1

    Similar situation!
    Saving in watch later

  • @mihaidorobantu8574
    @mihaidorobantu8574 2 ปีที่แล้ว

    Regarding the first name case - can we not just create an index on a normalized version of the first name table?

  • @wilsonovasea
    @wilsonovasea 2 ปีที่แล้ว

    good video.. even better to display with 1.5 speed

  • @rishavagarwal6531
    @rishavagarwal6531 2 ปีที่แล้ว

    Thank you sir for the video :)

  • @buddy.abc123
    @buddy.abc123 2 ปีที่แล้ว +1

    Enjoy your vacation bro, we will all be here when you return.

  • @lakhveerchahal
    @lakhveerchahal 2 ปีที่แล้ว +3

    I'm not sure of this but this might be the case (it can be wrong) -
    When u only do
    select * from employees where condition;
    This * can cause the db to do heap scan instead of index scan because it anyways need to fetch every column from heap, so to skip two hops it goes to table scan in the first place.
    Please correct me if it's wrong.

    • @ili101
      @ili101 2 ปีที่แล้ว +2

      Doesn't sound logical to me. If you have million rows and you do "SELECT * WHERE id = 42069" I don't see way it will scan million rows instead of using the index then getting this 1 row from the heap. From what I know select * is only bad for network traffic and CPU/RAM/IO for fetching/formatting the output. Except probably if you select *only* from the indexes then you don't need the heap at all and skip this extra step?

    • @marsilinouzaky2748
      @marsilinouzaky2748 2 ปีที่แล้ว +1

      The index doesn't store all columns values so it doesn't really matter what columns you're looking for, instead what matters in this case is the where clause column, the DB will scan this index based on the where column, get the primary key, then it has access to all columns related to this primary key, so specifying one column or * won't make much of a difference if you have an index that will be used to filter your rows.
      Someone can correct me if I'm wrong :)

    • @lakhveerchahal
      @lakhveerchahal 2 ปีที่แล้ว

      Yes I guess, it'll still use the index scan 99% of the time, it's just bad for the I/O.

    • @mohamedmohamedy3085
      @mohamedmohamedy3085 2 ปีที่แล้ว

      selecting only happens after finding the rows.

  • @kooshasangari2555
    @kooshasangari2555 2 ปีที่แล้ว

    ⭐ Thanks Man, You are #1 ⭐

  • @marawanlotfy1477
    @marawanlotfy1477 2 ปีที่แล้ว +1

    hi hussein
    love from egypt man ❤️

  • @rishiagrawal4260
    @rishiagrawal4260 2 ปีที่แล้ว +1

    can you please make a video on HLS ? and more importantly how you explore/read untouched technology ?

    • @hnasr
      @hnasr  2 ปีที่แล้ว +1

      I think this will help th-cam.com/video/1-KmLc0c2sk/w-d-xo.html

  • @jasdeepsinghgrover2470
    @jasdeepsinghgrover2470 2 ปีที่แล้ว

    When the index is implied but not mentioned.. Like object type(having index) is car and we are searching for BMW but we never mentioned object type to be car in the query.

  • @henrydesousa8264
    @henrydesousa8264 2 ปีที่แล้ว

    Regards from Costa Rica.

  • @dixztube
    @dixztube 2 ปีที่แล้ว

    Trying to learn kotlin by just reading the documentation. Ugggggh. I like JavaScript but need to leave my comfort zone.
    Hope your family is doing well. Thanks for a great channel !

  • @johnnychang3456
    @johnnychang3456 2 ปีที่แล้ว +1

    Another one is when you do a LIKE query with percent sign on the left of your where clause. E.g, select sth from xx where xxx LIKE %q%
    Since % means all possibilities, the database has no way to apply any index scan.

    • @mohamedmohamedy3085
      @mohamedmohamedy3085 2 ปีที่แล้ว

      also, iLike with/without the leading % will cause a full table scan.

  • @emptymeta5849
    @emptymeta5849 2 ปีที่แล้ว

    Yo, big fan of your outage content.
    Could we get some content on the recent roblox haloween outage?
    Have you heard anything about that?
    Edit: Didn't see the part about vacation, hopefully we get some good outage content when hes back

  • @erlangparasu6339
    @erlangparasu6339 2 ปีที่แล้ว

    please give sql code or benchmark result when it useless

  • @ahmedtawil7705
    @ahmedtawil7705 2 ปีที่แล้ว

    How you make it ?!!
    really when i face a problem , i just find that you talking about in newly videos!! 😂😂
    Thank you Hussein

    • @hnasr
      @hnasr  2 ปีที่แล้ว

      I read minds 😍

  • @hitmusicworldwide
    @hitmusicworldwide 2 ปีที่แล้ว

    Hey! Let us know how life is back home ! That sounds fantastic!

  • @rahulbera454
    @rahulbera454 2 ปีที่แล้ว

    500th like ❤️

  • @mughees52
    @mughees52 2 ปีที่แล้ว

    Welcome back to bahrain in Advance

  • @Vijay-Yarramsetty
    @Vijay-Yarramsetty 2 ปีที่แล้ว

    commenting for the sake of TH-cam algorithm, to help him monetize from the video.

  • @electronlibre4163
    @electronlibre4163 2 ปีที่แล้ว

    Don't we say indices instead of indexes? 😁