I loaded 100,000,000 rows into MySQL (fast)

  • Published Nov 25, 2024
  • Watch me wrangle over 100,000,000 rows of data and load it into MySQL.
    To learn more about PlanetScale, head to planetscale.co...!
    💬 Follow PlanetScale on social media
    • Twitter: / planetscale
    • Discord: / discord
    • TikTok: / planetscale
    • Twitch: / planetscale
    • LinkedIn: / planetscale

Comments • 165

  • @tcarreira
    @tcarreira 10 หลายเดือนก่อน +168

    You are doing it right, PlanetScale. Promoting your brand in the background while you give real value to the people watching these videos. Congrats.
    Even letting different people present is well done (I got used to Aaron though :D)

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +18

      We could all use some more Aaron.

    • @TrondHus
      @TrondHus 9 หลายเดือนก่อน

      Every company should have an Aaron,@@PlanetScale

  • @dudaseifert
    @dudaseifert 10 หลายเดือนก่อน +170

    PlanetScale is LITERALLY the only company i'm subbed to on youtube. your content is THAT good

    • @darshanrander1950
      @darshanrander1950 10 หลายเดือนก่อน

      So true!!

    • @electrolyteorb
      @electrolyteorb 9 หลายเดือนก่อน +1

      But not the platform itself

    • @dudaseifert
      @dudaseifert 9 หลายเดือนก่อน

      @@electrolyteorb what do you mean?

    • @8starsAND
      @8starsAND 9 หลายเดือนก่อน

      Yeah, especially loading ready to go python packages…. Yeah, rocket science…

    • @Frank2212
      @Frank2212 8 หลายเดือนก่อน

      Time to unsubscribe. They laid him off. He's the only reason the videos are good

  • @bozgi
    @bozgi 10 หลายเดือนก่อน +91

    Didn't even realise that this is a company account. Loved that video

  • @69k_gold
    @69k_gold 9 หลายเดือนก่อน +21

    Using the "teach a man how to fish" rule in product marketing is one hell of a genius idea. What a win-win!

  • @ahmettek315
    @ahmettek315 9 หลายเดือนก่อน +4

    Update: I also tried this on a Mac Mini M1. This time, instead of using all 17 columns, I did as you did and only imported 6 of them. For PostgreSQL, it took 4 m 39 s 810 ms this time :) Almost half of your MySQL timing (and again, this is in a Docker container).
    I knew that PostgreSQL should be better. I fell in love with PostgreSQL once more. Thanks.
    Ah, BTW, maybe this should be considered cheating: I use the open source PostgreSQL extension Citus (but a single node in a Docker container - it would be insane with multiple nodes).
    (I guess PlanetScale is also using the Go project Vitess and should be fast on multiple nodes)

    • @PlanetScale
      @PlanetScale  9 หลายเดือนก่อน +3

      Good experimentation! This was loaded into vanilla MySQL, and you're right that the performance is very dependent on hardware, which DBMS you're using, etc.

  • @robbyface3707
    @robbyface3707 10 หลายเดือนก่อน +21

    Ben! My former lecturer at the UofA. He is an amazing teacher, so I'm glad I can still learn from him 🎉

  • @DoubleM55
    @DoubleM55 10 หลายเดือนก่อน +15

    09:23 - Am I the only one who noticed that you got an error at row one, without actually importing any data? :)
    Anyway, awesome video :) I always get excited when I see new one uploaded :)

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +1

      Good catch! @dloura07 pointed that out also. Using -f to continue after errors causes it to take even longer! Definitely need a better solution.

  • @codewithantonio
    @codewithantonio 10 หลายเดือนก่อน +27

    Aaron, you look different!
    jk, always love to see new PlanetScale content! ❤

    • @your_skyfall
      @your_skyfall 10 หลายเดือนก่อน +2

      Always love your courses, planning to do your Miro clone this month

    • @ahmad-murery
      @ahmad-murery 10 หลายเดือนก่อน

      Yes, I think he got the flu 😁

  • @miguel-espinoza
    @miguel-espinoza 9 หลายเดือนก่อน +7

    As a frequent user of that same Lichess database, I would really appreciate that C program you wrote :)

  • @MrPudgyChicken
    @MrPudgyChicken 10 หลายเดือนก่อน +15

    I would be curious to see the performance difference between multiple independent INSERT statements and one INSERT statement with many values. In my experience, the latter is much, much faster, though I have never tried importing a dataset quite this large.

    • @reed6514
      @reed6514 10 หลายเดือนก่อน

      Same experience here.
      On another note, server-side, I've had to chunk my inserts into a thousand or so rows at a time because my production DB is a different server from my web server, and it only lets me send so much data at once.
      I also wonder how a transaction would affect the speed of many separate inserts.

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +2

      Great idea! That's exactly what mysqldump does when exporting a db. I didn't show a performance comparison, but let me know what you find :)

    • @ToX82
      @ToX82 10 หลายเดือนก่อน +1

      I did that for an import tool that inserts ~50,000 rows each time. I wrote that tool a few years ago so the numbers could be slightly different, but according to my benchmarks (and my memory), using one insert per 800-1000 records is the fastest option, and it takes about 5-6 seconds. Using one insert per line takes ~30 seconds. Going beyond 800-1000 records per insert again turned out to be slower.
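
A minimal sketch of the comparison discussed in this thread, using hypothetical table and column names (the video's actual schema may differ). Batching many rows into one INSERT amortizes per-statement parsing and commit overhead, which is why ~800-1,000 rows per statement tends to be the sweet spot:

```sql
-- One statement per row: with autocommit on, every statement is parsed and committed separately.
INSERT INTO games (white, black, result) VALUES ('Alice', 'Bob', '1-0');
INSERT INTO games (white, black, result) VALUES ('Carol', 'Dave', '0-1');

-- Multi-row INSERT: one parse and one commit for the whole batch.
INSERT INTO games (white, black, result) VALUES
  ('Alice', 'Bob',   '1-0'),
  ('Carol', 'Dave',  '0-1'),
  ('Erin',  'Frank', '1/2-1/2');
```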

  • @Richard_GIS
    @Richard_GIS 9 หลายเดือนก่อน +1

    8:42 That's totally understandable why it takes forever: each ';' is a commit (under autocommit), and that takes processing power. A bulk load of the data would be best, but the direct loading is great too
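
If you do stick with one INSERT per row, wrapping the batch in an explicit transaction, as the comment above suggests, avoids paying the commit cost on every statement. A minimal sketch with hypothetical table and column names:

```sql
-- With autocommit (the default), each INSERT commits and flushes on its own.
-- One explicit transaction around the batch pays that cost once.
START TRANSACTION;
INSERT INTO games (white, black, result) VALUES ('Alice', 'Bob', '1-0');
INSERT INTO games (white, black, result) VALUES ('Carol', 'Dave', '0-1');
-- ... thousands more single-row inserts ...
COMMIT;
```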

  • @HoverCatzMeow
    @HoverCatzMeow 10 หลายเดือนก่อน +5

    Love that you kept errors/issues in the video! Shows that even professionals make simple mistakes :)

  • @athar-q
    @athar-q 9 หลายเดือนก่อน +1

    I used SQL Loader in one of the pipelines I developed for the company I work for. It reduced that process's run time from 30 minutes to 2 minutes.

  • @Arkounay
    @Arkounay 9 หลายเดือนก่อน +3

    Great video, I didn't know about the "LOAD DATA LOCAL INFILE" SQL command 👍
    I would have loved a comparison with an INSERT that inserts multiple rows at once instead of one insert per game, to see whether it's faster or slower than LOAD DATA LOCAL INFILE
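
For reference, a minimal LOAD DATA LOCAL INFILE sketch in the spirit of what the video demonstrates; the file path, table name, and column list here are assumptions. local_infile must be enabled on the server, and the client must allow it (e.g. mysql --local-infile=1):

```sql
-- Server side (requires the appropriate privilege):
SET GLOBAL local_infile = 1;

-- Bulk-load the CSV in a single statement:
LOAD DATA LOCAL INFILE '/path/to/games.csv'
INTO TABLE games
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES                                   -- skip the CSV header row
(event, white, black, result, white_elo, black_elo);
```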

  • @UTUBMRBUZZ
    @UTUBMRBUZZ 9 หลายเดือนก่อน +1

    I had the same problem but with 300 million rows. The real pain was hidden characters in the data and commas in the wrong places, so importing that data into the database took over 6 hours and I had to run the script overnight. To optimize it, we compared the changes against the first import, so the second and later imports took only 1-5 minutes.

  • @scottstroz
    @scottstroz 10 หลายเดือนก่อน +1

    I am curious to know if there would be any performance improvements in the dump and load process if you used multi-threaded dump and load methods of MySQL Shell instead of mysqldump.

  • @matrizaev
    @matrizaev 9 หลายเดือนก่อน +1

    I would rather make that first C converter load the data into MySQL. This way you could use your PGN file as a SoT and load it into both your local MySQL and PlanetScale.

  • @thejonte
    @thejonte 10 หลายเดือนก่อน +10

    Please use the torrent next time. That'll save them bandwidth and probably also be faster.

    • @I_SEE_RED
      @I_SEE_RED 10 หลายเดือนก่อน +1

      🤓

    • @thejonte
      @thejonte 9 หลายเดือนก่อน

      Thanks for calling me smart. Y'know the nerds are the ones successful in this field@@I_SEE_RED

  • @Tjkrusinski
    @Tjkrusinski 10 หลายเดือนก่อน +5

    Very good. My brutal feedback is that Aaron sometimes has too many opinions sprinkled into his videos, whereas this fella has almost none. A middle ground would be, like, wow, chef's kiss.

  • @Klinoklaz
    @Klinoklaz 10 หลายเดือนก่อน +1

    Thought you were going to use awk to process that huge file. I sometimes use it to parse gigantic XML files; it's much easier to write than C. Feels like it shouldn't be too slow in your use case either

  • @NuncNuncNuncNunc
    @NuncNuncNuncNunc 10 หลายเดือนก่อน

    In general, should you run multiple tests with different numbers of rows in order to measure any startup time? Should the first test be using MySQL's built-in LOAD DATA INFILE, i.e. shouldn't the baseline be the zero-effort case?

  • @alexrusin
    @alexrusin 10 หลายเดือนก่อน +2

    Great video, as always. Thank you. I've got a question: why did you use an EC2 instance to load data into PlanetScale? Couldn't you do it directly from your local Mac?

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +4

      Good question. Mostly for the sake of improved network performance (which can be a bottleneck in instances like this). Home internet often has very slow upload speeds :)

  • @mirzashakirbaig1950
    @mirzashakirbaig1950 10 หลายเดือนก่อน

    It's really awesome.
    Recently I tried importing a CSV into a MySQL database (using Laravel and Excel libraries) which only had a few million records. It was taking 5-7 minutes even after a lot of improvements on my part.
    Using multiple tools and technologies to get the data processed is the main ingredient here. Will give your ideas a try with my import.

  • @danielgangl6560
    @danielgangl6560 10 หลายเดือนก่อน +2

    How long should it take to load data into PlanetScale? My application currently takes ~1 second for 1,000 rows in a single Prisma createMany statement.
    (1,000 entries into 1 table)
    Adding more or fewer rows seems to scale linearly.
    That seems really slow to me. Wondering if there are benchmarks to compare against, to see if I am doing something wrong or if it's normal.

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +1

      Depends on a bunch of factors including the PlanetScale plan you are on, network connectivity / proximity to your DB, how Prisma's createMany function is implemented, etc. What's your setup?

    • @danielgangl6560
      @danielgangl6560 9 หลายเดือนก่อน

      @@PlanetScale I'm only interested in the query latency, so connectivity and all that should be outside of it. My createMany simply gets an array of 1,000 objects, each with 10-15 column values. The plan is Scaler. (But does it make a difference in this case versus e.g. the free plan? From the pricing page it only looks like the upper quota would become an issue later; otherwise I don't see what would affect it.)

  • @IvanRandomDude
    @IvanRandomDude 9 หลายเดือนก่อน +18

    100k inserts in 1.7s. Yep, your app with 5 active users and 1000 inserts per day definitely needs Cassandra.

    • @llothar68
      @llothar68 9 หลายเดือนก่อน +7

      But what happens when we go viral and become as rich and huge as Google? We have to be prepared for this. More microservices, more distributed systems please, and we need a registration server for the 3 users that logged in yesterday

    • @IvanRandomDude
      @IvanRandomDude 9 หลายเดือนก่อน +6

      @@llothar68 We need a separate instance for each user, to make sure it scales.

    • @indramal
      @indramal 9 หลายเดือนก่อน

      What are you trying to tell us? "your app with 5 active users and 1000 inserts per day definitely needs Cassandra."
      Are concurrent connections slow in this MySQL database?
      5 x 1000 = 5,000 rows per day, so why would we need Cassandra for such small inserts?

    • @llothar68
      @llothar68 9 หลายเดือนก่อน +5

      @@indramal You did not get the joke. Or the terrible situation the web development world is in at the moment.

    • @AlexeyKlimko
      @AlexeyKlimko 9 หลายเดือนก่อน

      The thing is that it failed after 21000 rows…

  • @thegrumpydeveloper
    @thegrumpydeveloper 9 หลายเดือนก่อน +1

    Curious why the upload went to EC2 rather than directly from local. Is that just so it can run while you do other things, and to take advantage of the network? Otherwise it seems similar to uploading directly rather than uploading twice.

    • @musashigarami
      @musashigarami 9 หลายเดือนก่อน +1

      Came to the comments for this and was also wondering, since surely you'd spend time uploading to EC2 first and then uploading again from there. Doesn't the first upload to EC2 negate the benefit? Would really like to know the reasoning behind it

  • @gemmaatroxxibox5322
    @gemmaatroxxibox5322 9 หลายเดือนก่อน

    I was thinking it must be LOAD DATA INFILE, because that has saved me many a time. But the preprocessing in C has made me think I really need to learn that, or Rust or Zig. Very impressive.

  • @ignauy
    @ignauy 9 หลายเดือนก่อน

    Loved the video. I just have a question: couldn't you have established a connection to PlanetScale even before the dump?
    When you run `mysql -u root chess < inserts.sql`, couldn't you have imported that directly into PlanetScale, like you imported the dump from the EC2 instance?

    • @PlanetScale
      @PlanetScale  9 หลายเดือนก่อน

      Yes! A connection to the PlanetScale DB could have been established directly from my computer instead of from the EC2 instance. The advantage of the EC2 instance is that, if it is in the same region as the database, it can have better network bandwidth to the server.

  • @abimbolaobadare6691
    @abimbolaobadare6691 9 หลายเดือนก่อน +1

    Bro this is wild, the C script got me questioning my life choices. Can we train an AI model with this chess data?

  • @adityanandula
    @adityanandula 9 หลายเดือนก่อน

    This is interesting! When I tried moving the same number of records from one SQL instance to another in AWS over the network, it took me 6 hours. One doubt I have: what about the binlog data? Can we apply the same method to the backup SQL data chunks?

  • @sounakmondal3342
    @sounakmondal3342 9 หลายเดือนก่อน +1

    Would really love a playlist dedicated to performance tips

  • @evko9264
    @evko9264 9 หลายเดือนก่อน

    What should we do if we have to use Python, S3 buckets and AWS Glue just to do pretty much the same thing: convert a txt file to CSV? The problem is that we need it FAST)

  • @ropoxdev
    @ropoxdev 10 หลายเดือนก่อน +4

    Aaron where!?

  • @felixisaac
    @felixisaac 9 หลายเดือนก่อน +1

    Just curious, how did you send the dump file to your AWS instance? Is it through FTP?

    • @PlanetScale
      @PlanetScale  9 หลายเดือนก่อน

      Transferred via scp!

    • @felixisaac
      @felixisaac 9 หลายเดือนก่อน

      Thank you! @@PlanetScale

  • @OscarFromIndia
    @OscarFromIndia 9 หลายเดือนก่อน

    How can someone learn all the stuff that you're doing here?? Can you share some concepts to get started?

  • @Mcs1v
    @Mcs1v 9 หลายเดือนก่อน +1

    "Load data" isn't faster if you write your loader properly, so you just won a running competition against a disabled opponent

  • @rombael8152
    @rombael8152 9 หลายเดือนก่อน

    Hardware? Intel Pentium III 500Mhz?

  • @permad5189
    @permad5189 9 หลายเดือนก่อน

    Coolio neato! The only thing that's a problem is the MySQL pronunciation; even the official MySQL docs say the correct way is "My S-Q-L"!

  • @tigranrostomyan9231
    @tigranrostomyan9231 10 หลายเดือนก่อน +2

    Did you just load a 200+ GB file in 1 second with vim..? (2:16)

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +1

      In vim, you can control-c to cut loading short. The more you know!

  • @ahmettek315
    @ahmettek315 9 หลายเดือนก่อน +2

    Well, that was fast for 100 million rows, but you didn't tell us about the hardware you are using. From the video I can only tell that you are using a Mac; I guess it has an M-series CPU and a decent drive. Likely faster than my AMD Ryzen 9 3900X, 32 GB RAM, Corsair MP600 1 TB NVMe.
    I did what you did, converted that PGN to CSV (using Go, all 17 columns), and imported all 17 columns into PostgreSQL. 100,023,791 rows affected in 8 m 27 s 347 ms. This was done in a Docker container, which I think made it a little bit slower.
    Interestingly, you waited over 23 seconds just for a count(*)???? That was really slow. It took 3.7 seconds for me on PostgreSQL. I wonder what would happen if I did this on my Mac mini M1. I named my table "speedo", and this query:
    select count(*) from Speedo where Termination = 'Time forfeit';
    returns 31,471,426 in 4.7 seconds. Can you also time this on that MySQL? Just curious.
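
On the filtered count above: InnoDB keeps no cached row count, so COUNT(*) walks an entire index, and a count with a WHERE clause scans whatever index (or full rows) is available. A hedged sketch, reusing the commenter's column name on a hypothetical MySQL table; a secondary index on termination lets the filtered count be answered from that index alone:

```sql
-- Hypothetical table name; the column name follows the comment above.
ALTER TABLE games ADD INDEX idx_termination (termination);

-- This count can now be satisfied by scanning only the secondary index.
SELECT COUNT(*) FROM games WHERE termination = 'Time forfeit';
```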

  • @regmiller536
    @regmiller536 9 หลายเดือนก่อน

    I love this kind of video; they're so educational.

  • @Wald246
    @Wald246 9 หลายเดือนก่อน

    15:59 I'm always baffled that MySQL does not come with an option to import and export data using a binary file format and instead forces the use of a plaintext SQL file. Seems so inefficient

  • @tjnangosha
    @tjnangosha 9 หลายเดือนก่อน

    This video is so satisfying to watch.

  • @vishwanathbondugula4593
    @vishwanathbondugula4593 9 หลายเดือนก่อน

    Can you share the C code you have written to parse pgn to CSV ?

  • @G5STU
    @G5STU 10 หลายเดือนก่อน +1

    Great vid, loving the C, always use the right tool for the job (not what's cool) 👍

  • @dimitar.bogdanov
    @dimitar.bogdanov 10 หลายเดือนก่อน +2

    Question - why not torrent when downloading the set?

  • @equus6460
    @equus6460 10 หลายเดือนก่อน

    I was able to successfully load 100 million records into an SQLite3 database; it's about 20 GB. I'd be interested to know how big the 100-million-row MySQL database was.

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +1

      The chess DB in MySQL was around 10.7 gigs after running the tests and importing the big data set. The dump was ~13.

    • @equus6460
      @equus6460 9 หลายเดือนก่อน

      ​@@PlanetScale Was planning to scale to 1 billion rows but had issues with my SSD space. Guess MySQL would scale upto 1 billion rows..
      Was impressed with SQLite3's ability to scale up with ease to 100 million records.
      Administration overhead for SQLite3 is much lower compared to other databases such as Postgres, MySQL and Oracle; that's my opinion.
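
For anyone who wants to check a table's on-disk size themselves, a rough sketch against information_schema (the schema and table names are assumptions, and the reported figures are approximate):

```sql
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.TABLES
WHERE table_schema = 'chess'
  AND table_name = 'games';
```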

  • @Mesinko6
    @Mesinko6 9 หลายเดือนก่อน +1

    Maybe using batch inserts would have taken less time

  • @igboanugwocollins4452
    @igboanugwocollins4452 10 หลายเดือนก่อน

    This is phenomenal, thank you very much for this masterpiece

  • @abdelkadermh
    @abdelkadermh 9 หลายเดือนก่อน

    Can you please make a video on using awk or sed to transform the data and show how fast it can be?

  • @jeetchheda8916
    @jeetchheda8916 9 หลายเดือนก่อน

    please tell me this load script can be developed in PostgreSQL

  • @avramcs
    @avramcs 10 หลายเดือนก่อน +1

    What a surprise lol, great vid

  • @amranimohamad4741
    @amranimohamad4741 9 หลายเดือนก่อน

    What about Mojo? Isn't it supposed to make Python more efficient?

  • @bilalthepunjabi
    @bilalthepunjabi 9 หลายเดือนก่อน

    I automated this whole process of downloading, extracting, schema building and migrating into the database in a Laravel project.
    I initially chose Python for this task, but it was slower, which amazed me too; so PHP with short-lived processes and chunking with recursive calls makes all this happen, with real-time updates over a socket...!

    • @PlanetScale
      @PlanetScale  9 หลายเดือนก่อน

      Cool!

    • @datamini01
      @datamini01 9 หลายเดือนก่อน

      Hello bilalthepunjabi,
      Could you please share the source code you implemented with PHP (Laravel)? I would greatly appreciate it. Currently, I am also exploring some solutions to accomplish this task with PHP, but I think it may not be as fast as Python.

  • @AlexeyKlimko
    @AlexeyKlimko 9 หลายเดือนก่อน

    9:23 query failed, 100000 rows were not inserted. Read all of the output.

  • @bowenwang
    @bowenwang 9 หลายเดือนก่อน

    Can you provide a link to your C code that converts PGN to CSV?

  • @renjithvk492
    @renjithvk492 3 หลายเดือนก่อน

    How did you get the 'white_elo' value inserted into the 'while_elo' column? 😊

    • @PlanetScale
      @PlanetScale  3 หลายเดือนก่อน

      Someone is paying attention! It got fixed off screen.

  • @wes337
    @wes337 10 หลายเดือนก่อน +1

    Great video! Would actually love to see the code for that C program you wrote to convert to CSV

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน

      The pause button is your friend here ⏸

    • @phpnotasp
      @phpnotasp 9 หลายเดือนก่อน

      @@PlanetScale Sarcastic comments aside, even going frame-by-frame in your video, you cannot see all the code.

  • @vardmans
    @vardmans 10 หลายเดือนก่อน

    Would you mind sharing that C script?

  • @januzi2
    @januzi2 9 หลายเดือนก่อน +1

    You don't have to use select count(*). You could check the schema for the next auto-increment value or just ask for the id of the last record: select id from chessgame order by id desc limit 1; It will use the index and will be a lot faster than a whole table scan.
    It will return the correct number under these conditions: 1. the table has an auto-increment column, 2. nothing was deleted, and 3. the query doesn't contain any "where".
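
A sketch of the options the comment above describes, with hypothetical schema and table names. Note that in MySQL 8 the AUTO_INCREMENT value reported by information_schema can be cached and slightly stale unless information_schema_stats_expiry is set to 0:

```sql
-- Full count: walks an entire index of the table (tens of seconds at 100M rows).
SELECT COUNT(*) FROM games;

-- Last id: a single lookup at the tail of the PRIMARY key, effectively instant.
SELECT id FROM games ORDER BY id DESC LIMIT 1;

-- Next auto-increment value from table metadata (approximate, with the caveats above).
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE table_schema = 'chess' AND table_name = 'games';
```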

  • @gokukakarot6323
    @gokukakarot6323 3 หลายเดือนก่อน

    where is the c code?

  • @RandomGeometryDashStuff
    @RandomGeometryDashStuff 10 หลายเดือนก่อน

    07:38 Where are the mystery numbers (64, 256, 32) coming from?

    • @lian1238
      @lian1238 10 หลายเดือนก่อน

      I think the numbers could be higher in a real application

    • @tcarreira
      @tcarreira 10 หลายเดือนก่อน

      When creating a MySQL table and using VARCHAR data types with specific sizes like 32, 64, or 256, you are essentially defining the maximum length of the character data that can be stored in that column.
      The specific numbers like 32, 64, and 256 are often used in VARCHAR column definitions for several reasons, but they are not rigidly fixed; you can choose other sizes based on your specific data requirements.
      Here are some reasons behind the common use of these numbers:
      - Powers of 2: 32, 64, and 256 are powers of 2 (2^5, 2^6, and 2^8), which aligns well with the binary storage and memory management systems used in computer hardware. This alignment can make data manipulation and storage more efficient in some cases.
      - Common Data Sizes: These sizes often align with common data sizes in various contexts. For example, 32 characters might be chosen for things like usernames, which are often limited in length. 64 and 256 characters cover a broad range of data sizes, accommodating longer strings or descriptions while still keeping data within reasonable limits.
      - Historical Precedence: These sizes have been used in database design for a long time, and they have become somewhat standard. Database administrators and developers are accustomed to seeing these sizes in schemas.
      - Practicality: They are practical sizes that work well for many types of data. Using numbers like 32, 64, and 256 simplifies decisions about column sizes because they are easy to work with and understand.
      That said, there's no requirement to use these specific sizes. Your choice should always be based on your specific data needs. If your data requirements don't align with these sizes, it's perfectly fine to choose different sizes that better suit your use case.

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +1

      Just "reasonable" values chosen for this example. For a production application, would want to validate that those are good values more thoroughly.

  • @dloura07
    @dloura07 10 หลายเดือนก่อน +2

    Did the multiple insert statements actually work at 9:22, or did it take 1.7s in total because it failed, so it would actually take longer if it worked?

    • @lian1238
      @lian1238 10 หลายเดือนก่อน

      Yes. Seems like the error cut the operation short

    • @PlanetScale
      @PlanetScale  10 หลายเดือนก่อน +1

      Nice catch. You're right that the error cut it short. Using -f to continue after errors causes it to take even longer! All the more reason to use LOAD DATA LOCAL INFILE 😄

  • @CaboLabsHealthInformatics
    @CaboLabsHealthInformatics 9 หลายเดือนก่อน

    Did you really decompress and stdout the whole 200GB to the terminal?

    • @sudeshryan8707
      @sudeshryan8707 9 หลายเดือนก่อน

      I was more impressed that vim was able to open a 200 GB file in no time.. while I'm struggling with 300 MB text files in Notepad++ 🤣🤣

    • @CaboLabsHealthInformatics
      @CaboLabsHealthInformatics 9 หลายเดือนก่อน

      @@sudeshryan8707 Please avoid dumping the whole thing to stdout :)

  • @bitwise2832
    @bitwise2832 9 หลายเดือนก่อน +1

    FIELDS, not FILEDS.
    To run meaningful queries you need some more indices.
    id alone is limited.
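
In the spirit of the comment above, a hedged example of adding secondary indexes to support typical queries on this kind of dataset. The column names are assumptions; the right set of indexes depends entirely on the queries you actually run:

```sql
ALTER TABLE games
  ADD INDEX idx_result       (result),
  ADD INDEX idx_white_elo    (white_elo),
  ADD INDEX idx_event_result (event, result);

-- Check that a query actually uses one of the new indexes:
EXPLAIN SELECT COUNT(*) FROM games WHERE result = '1-0';
```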

  • @Pici3k
    @Pici3k 9 หลายเดือนก่อน

    Learned a ton from this. Thanks!

  • @0xtz_
    @0xtz_ 9 หลายเดือนก่อน

    Man, the best channel ever 😎, cool videos as always and a ton of information

  • @AndrzejPauli
    @AndrzejPauli 10 หลายเดือนก่อน +1

    Where's "Seee yyaa"??? ... ?? 😀
    Naa, just kidding. Great content!

  • @reed6514
    @reed6514 10 หลายเดือนก่อน

    I like that you do everything in the terminal

  • @NaourassDerouichi
    @NaourassDerouichi 10 หลายเดือนก่อน +1

    Wait, have you just loaded a 200 gig text file into the editor without any fire ?!

    • @Shuroii
      @Shuroii 10 หลายเดือนก่อน +2

      The art of vim chunked loading

  • @benjaminshtark5977
    @benjaminshtark5977 10 หลายเดือนก่อน

    It's all good; the only part not usable by most people is writing that C code to convert to CSV :)
    Most people are not that good at classical C programming...

  • @towhidulislam6192
    @towhidulislam6192 10 หลายเดือนก่อน +1

    where's AARON?

  • @NedumEze
    @NedumEze 7 หลายเดือนก่อน

    Surely, Sir, you're not running the same MySQL version as the rest of us who have phpMyAdmin's integrated version? Or are you?
    If you are, how is it that you can have 100M rows in one table and I can't have even half a million in my version in phpMyAdmin?

  • @syntaxerorr
    @syntaxerorr 9 หลายเดือนก่อน

    Did vim just open a 214 gigabyte file in seconds? I think I've opened a 12 gig file before, not 214!

    • @PlanetScale
      @PlanetScale  9 หลายเดือนก่อน +1

      Tip: You can CTRL-C as vim is in the process of loading a file to cut it short, allowing you to view part of the file!

  • @gefyaqiilah6678
    @gefyaqiilah6678 9 หลายเดือนก่อน

    wow, thank you for sharing

  • @0xmassive526
    @0xmassive526 10 หลายเดือนก่อน

    Video so good I didn't even need to watch it but still did. 😅

  • @vahanmkrtchyan2504
    @vahanmkrtchyan2504 9 หลายเดือนก่อน

    Bruh, we have terabytes in MySQL. And all good; just don't do aggregations, use a warehouse for that.

  • @LewisCowles
    @LewisCowles 9 หลายเดือนก่อน +1

    Where is Aaron?

  • @gfeher123
    @gfeher123 9 หลายเดือนก่อน

    Finished watching when you inserted records one by one

  • @카라멜땅콩-p2p
    @카라멜땅콩-p2p 9 หลายเดือนก่อน

    good topic

  • @MrGilbert1995
    @MrGilbert1995 9 หลายเดือนก่อน

    How to make the query run faster? 22 sec for a count is long :P

  • @teenigma_
    @teenigma_ 10 หลายเดือนก่อน +4

    I see that while_elo 😉

  • @DavisTibbz
    @DavisTibbz 9 หลายเดือนก่อน

    Wow impressive. Came here from Google News, this video was featured 😅

  • @TheMainManNorthOfEquator
    @TheMainManNorthOfEquator 9 หลายเดือนก่อน

    Will try with Postgres 16

  • @patlecat
    @patlecat 9 หลายเดือนก่อน

    Dude, you had all the data loaded into that C program. It would have been super easy and fast to write it into MySQL directly from there! Why go through all those complicated detours and insist on using Python? LOL

  • @samislam2746
    @samislam2746 9 หลายเดือนก่อน

    Next video: Do this in mongoDB

  • @spacetabs
    @spacetabs 10 หลายเดือนก่อน +3

    True definition of BigData

    • @DoubleM55
      @DoubleM55 10 หลายเดือนก่อน +8

      Not even close :)

    • @IvanRandomDude
      @IvanRandomDude 9 หลายเดือนก่อน

      30 gigs is nothing in big data terms.

  • @alquinn8576
    @alquinn8576 9 หลายเดือนก่อน

    Why use MySQL when you could have loaded it into 100 LINKED ACCESS DATABASES?

  • @sudeshryan8707
    @sudeshryan8707 9 หลายเดือนก่อน

    I was more impressed that vim was able to open a 200 GB file in no time 😅😅

    • @PlanetScale
      @PlanetScale  9 หลายเดือนก่อน

      CTRL-C is your friend :)

  • @chiebidoluchinaemerem5860
    @chiebidoluchinaemerem5860 9 หลายเดือนก่อน

    Man, Python is really slo............w

  • @KiamatChange-ke8nd
    @KiamatChange-ke8nd 9 หลายเดือนก่อน

    200 GB text data? WTH is that? 😳

  • @mbe2916
    @mbe2916 9 หลายเดือนก่อน

    Python can be fast if you stay in the C world.

  • @hqcart1
    @hqcart1 9 หลายเดือนก่อน

    Dude, you could have asked ChatGPT and it's done

  • @HeenDee
    @HeenDee 9 หลายเดือนก่อน

    Please.. PGN to CSV converter C source....... Python very slow....ㅜ_ㅜ

  • @peacekeeper7968
    @peacekeeper7968 9 หลายเดือนก่อน +1

    You discovered that C is faster than python hahahahahahhahahahhahah

  • @IamAbhiMishra
    @IamAbhiMishra 9 หลายเดือนก่อน

    Everything is fine, but why have you stopped your free tier for certain countries including India?

  • @privacyvalued4134
    @privacyvalued4134 10 หลายเดือนก่อน +2

    Doing 10,000 separate insert prepare/execute SQL statements without surrounding them with transaction calls = extremely bad performance. It's not even close to a fair test. Not that Python is helping matters. MySQL can also insert multiple rows in a single statement, which you also weren't taking advantage of. So, yeah, of course MySQL's command-line client built-ins will outperform even the fastest languages when you're doing the worst possible thing performance-wise.

    • @Comeyd
      @Comeyd 10 หลายเดือนก่อน

      He never made any queries to MySQL from Python.
      Python was only used to generate the queries.
      Yes, he should have shown using multiple values instead of separate insert statements. I would argue though… that's effectively what he did with the last mysqldump.
      Loaded it directly from the CSV and then dumped it out.
      You also don't need to use C to turn the original data into a CSV either. There are plenty of languages that are faster than Python that can do that.
      I was almost hoping to see something that used Awk!
      With enough shell-fu you could build an actual “pipeline” (literally, just chain the commands together! That’s what they’re there for!) that goes straight from the PGN file to MySQL with no intermediate files in between…

  • @robertgarrison1738
    @robertgarrison1738 9 หลายเดือนก่อน

    while_elo instead of white_elo lmao

  • @asdfaf343r4
    @asdfaf343r4 9 หลายเดือนก่อน

    skill issues

  • @kodosensei
    @kodosensei 9 หลายเดือนก่อน

    👀