Creating a 100 gigabyte index on 1 billion+ rows

แชร์
ฝัง
  • เผยแพร่เมื่อ 29 พ.ย. 2024

ความคิดเห็น • 33

  • @efeegbevwie9467
    @efeegbevwie9467 8 หลายเดือนก่อน +4

    While these indexes are being created, does the database lock the tables? Will this make your application unresponsive if the tables cannot be accessed?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +2

      You can still do DML (for example, inserting something into the table) for the btree indexes. However, this is not permitted when creating the fulltext! More info in the docs:
      dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html#online-ddl-index-syntax-notes

    • @ArthurHenriqueta
      @ArthurHenriqueta 8 หลายเดือนก่อน

      @@PlanetScale what is the solution on this case?

  • @SebastianMares
    @SebastianMares 8 หลายเดือนก่อน

    Oh, and apropos full-text index. Wondering why after updating MySQL 5.7 to 8.0 queries that took < 1s to execute are now taking 6 seconds. Similar setup to yours where the FTI is over one column and the query only checks that column. 700k rows. Not sure if I should simply rebuild the index.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Hard to say without more info about the query and DB. Have you confirmed that after migrating/upgrading the index still exists? What does "SHOW INDEXES FROM table_name;" say?

  • @Kaldrax
    @Kaldrax 8 หลายเดือนก่อน

    Sounds like a usecase outside of bioinformatics for interleaved bloom filters. The end result looks still pretty slow to me.

  • @TalhaMajeed-e6j
    @TalhaMajeed-e6j 8 หลายเดือนก่อน

    I already have PK, which is string, can I make another column as FTS_DOC_ID field ?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      You could add it! However, if you try to build a FULLTEXT index on a table without it, it will automatically rebuild the table to incorporate that index so you don't have to handle it manually. Just makes the index creation take longer.

  • @DavidYoo-m7z
    @DavidYoo-m7z 8 หลายเดือนก่อน +1

    Yeah this is awesome

  • @rickdg
    @rickdg 8 หลายเดือนก่อน

    The real test is doing queries while the table is getting written into.

  • @SebastianMares
    @SebastianMares 8 หลายเดือนก่อน

    How many rooms are there in the test data? Asking because I always thought an index makes sense if the cardinality is high (= there is a lot of variance in the data). I would be surprised to see such a big impact on speed if there were only a handful of rooms.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      This this setup, the population script for the DB put 1000 messages in per room, meaning there were over a million different room names.

  • @userasd360
    @userasd360 8 หลายเดือนก่อน +1

    how did you manage the uptime for the server in case of the index whichtook 7 hours?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      This would definitely be a consideration if this was in production! For the purpose of the video, this was for testing the app pre-release. You specify the fulltext index at the time of table creation, and have it build incrementally as new messages arrive. However, this will add some overhead for table insertions.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      This would also be a great use case for PlanetScale branching + deploy requests.

  • @mrmehdigh9622
    @mrmehdigh9622 8 หลายเดือนก่อน +2

    Does the full text index work for all languages? for example a language like Arabic?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +2

      Great question! In the video we're using a utf8mb4 character set, but MySQL supports a large number of other character sets as well:
      dev.mysql.com/doc/mysql-g11n-excerpt/8.3/en/charset-charsets.html
      Give it a try with a different one! FULLTEXT indexes work on char, varchar, and text column types.

    • @mrmehdigh9622
      @mrmehdigh9622 8 หลายเดือนก่อน

      @@PlanetScale Thanks, So with select a good charset we can use full text index on the text column? Does the tokenization process work for a RTL (ex: Arabic or Persian) Language?

  • @SanderCokart
    @SanderCokart 8 หลายเดือนก่อน

    So what would be the next step to make it even faster?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      We have a part 2 coming out soon where we cover just that

  • @saurabh9446
    @saurabh9446 8 หลายเดือนก่อน

    So, next step would be to store that in elastic search?

  • @devperatetechno8151
    @devperatetechno8151 8 หลายเดือนก่อน

    wow, amazing

  • @otmanm4095
    @otmanm4095 8 หลายเดือนก่อน +1

    Greta video! Thx sharing

  • @amrishshah8982
    @amrishshah8982 8 หลายเดือนก่อน

    What storage engine is required for full text? Does it work on innodb?

    • @AndersGustafsson87
      @AndersGustafsson87 8 หลายเดือนก่อน +1

      yes.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Yep, this table was using innodb in the video!

  • @ericowens3023
    @ericowens3023 8 หลายเดือนก่อน

    Great Vid

  • @ayder
    @ayder 8 หลายเดือนก่อน +2

    Btree indexes never worked in any database if you query field like ‘%search_keyword% instead ‘search_keyword%’ only works. Creating a fulltext index on populated table is painful, instead create an empty table with full text index and populate data. ( insert into select … ) . Last with that many records you have to shard the db or use some partitioning to manage data

    • @finlanderxx
      @finlanderxx 8 หลายเดือนก่อน

      Populating the rows and indexes at the same time sounds as painful as doing them separately.