Solving one of PostgreSQL's biggest weaknesses.

  • Published on Jun 11, 2024
  • Storing large amounts of data, such as time series data, in a single table is often a challenge for PostgreSQL. There are other databases built for time series, but using them means giving up many of the features that make Postgres so desirable.
    Fortunately, there is another option. TimescaleDB is a Postgres extension that is optimized for time series data. In this video, we provide a brief introduction to TimescaleDB and put it to the test.
    This video was sponsored by TimescaleDB. If you're looking to use a time series database for your own needs, then I highly recommend Timescale. You can get set up with a free trial using the link below.
    console.cloud.timescale.com/s...
    Join this channel to get access to perks:
    / @dreamsofcode
    My socials:
    Discord: / discord
    Twitter: / dreamsofcode_io
    My Equipment:
    Voice over: kit.co/dreamsofcode/voiceover
    Coding: kit.co/dreamsofcode/coding
    Video Links:
    Github Repo: github.com/dreamsofcode-io/ti...
    TimeScaleDB: www.timescale.com/?...
    00:00 - Intro
    00:56 - Timeseries Data
    02:40 - Getting Started
    07:33 - HyperTables
    12:38 - Continuous Aggregates
    16:39 - Results
  • Science & Technology

Comments • 279

  • @dreamsofcode
    @dreamsofcode  9 months ago +7

    To get started with a free trial of TimescaleDB use the following link!
    console.cloud.timescale.com/signup/?

    • @VinhLeQuang-nl8zm
      @VinhLeQuang-nl8zm 9 months ago

      What operating system and terminal are you using?

    • @NostraDavid2
      @NostraDavid2 9 months ago +1

      Note: I hope you've pulled down those instances, because I see a few passwords in the video. Seeing the complexity, I presume it's a single-use password, generated by a password manager?

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      @@NostraDavid2 They're down :) thank you for checking though!

    • @ftamas88
      @ftamas88 9 months ago

      would love to see a more in-depth one about compression and the others :o

    • @tomasruzicka9835
      @tomasruzicka9835 9 months ago

      LOGLINES: loglines are a great example of time series data. It's also usually some dynamic object with a timestamp.

  • @marcoscarvalho660
    @marcoscarvalho660 9 months ago +119

    In Postgre we trust

    • @realscience7274
      @realscience7274 9 months ago

      It's bloated and slow. F SQL.

    • @isaacfink123
      @isaacfink123 9 months ago +2

      @realscience7274 what's the alternative?

    • @spicynoodle7419
      @spicynoodle7419 9 months ago

      @@isaacfink123 MySQL/MariaDB and better architecture

    • @FunctionGermany
      @FunctionGermany 9 months ago +28

      @@isaacfink123 spend 20 man years engineering your own database system with rust, obviously.

    • @anarcus
      @anarcus 9 months ago +1

      @@FunctionGermany exactly, SurrealDB

  • @stephenreaves3205
    @stephenreaves3205 9 months ago +70

    Also worth pointing out that while the p95 time was smaller moving to a hypertable, the p95 memory went up by a lot

    • @dreamsofcode
      @dreamsofcode  9 months ago +7

      Great call out!

    • @projekt95
      @projekt95 6 months ago +1

      Does the memory size correlate with the chunk size (chunk_time_interval) of the hypertable? So is a smaller chunk size with lots of input better to save memory usage?

    • @mmkumars
      @mmkumars 3 months ago +1

      3MB to 3GB

  • @JimCarnicelli
    @JimCarnicelli 7 months ago +2

    Excellent introduction. Thank you.

  • @FunctionGermany
    @FunctionGermany 9 months ago +24

    i gotta respect your dedication to reading and replying to criticism and questions in the comments.
    when this channel grows more you're not gonna be able to keep up with all comments but if you keep doing this amount of work i think you'll maintain high integrity and trust.
    i've seen a bunch of promising tech channels dip their toes too far into the sponsor or hype train pond, releasing content without enough reflection or community consultation (e.g. joshtriedcoding).
    i'm looking forward to your future content :)

    • @dreamsofcode
      @dreamsofcode  9 months ago +3

      Thank you! I'm definitely taking the feedback on board and working out solutions for future content. After any sponsorship commitments I'm currently dedicated to, I'll be taking a pause on them in order to work out how best to proceed. I have a couple of solutions in mind so will share them with the community when they become more fleshed out.

    • @savire.ergheiz
      @savire.ergheiz a month ago

      Poor Josh only tried, hence the channel name. So nothing wrong, he only delivers what he promised 😂

    • @FunctionGermany
      @FunctionGermany a month ago +2

      @@savire.ergheiz sometimes josh REALLY doesn't try hard at all unfortunately. he's dropped some pretty massive L's in the past.

  • @charlesm.2604
    @charlesm.2604 9 months ago

    This channel is slowly becoming my favorite place on the internet

  • @Stublet
    @Stublet 9 months ago +2

    Thank you for this video. As it happens I was thinking about looking into Timescale as well as Influx and a few other time series dbs out there. Postgres is not my favorite db of choice, but I'm a big proponent of using the right tool for the job and Timescale checks a lot of my boxes here. Also props for featuring Neovim. :)

  • @wembleyleach
    @wembleyleach a month ago

    Thanks for the awesome video!

  • @coderentity2079
    @coderentity2079 7 months ago

    Great! Just what my fellow programmer needed.

  • @user-ye5jt7ro6c
    @user-ye5jt7ro6c 7 months ago +7

    This is great! One point of feedback: instead of writing each parquet file out to disk as a csv, and then COPYing from the csv file, try converting the parquet file to csv and streaming it directly into the STDIN of the COPY command. Forgo the reading and writing to disk. I expect this to speed up your load script considerably.
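      The streaming approach suggested above can be sketched from psql; `\copy ... from program` runs a client-side command and pipes its output straight into COPY, so no intermediate CSV file touches disk. The table name and conversion script here are assumptions for illustration, not from the video:

      ```sql
      -- Hypothetical sketch: convert parquet to CSV on the fly and stream it
      -- into the table; nothing is written to disk in between.
      -- "trips" and parquet_to_csv.py are assumed names.
      \copy trips FROM PROGRAM 'python parquet_to_csv.py trips.parquet' WITH (FORMAT csv)
      ```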

  • @rdatta
    @rdatta 9 months ago +1

    Excellent introduction to TimescaleDB. Would very much like to see more features and capabilities.

  • @themarksmith
    @themarksmith 9 months ago +3

    Excellent video, would love to see more on TimescaleDB!

    • @dreamsofcode
      @dreamsofcode  9 months ago +2

      It's a really cool extension! I shall probably do a video on self hosting it next

    • @themarksmith
      @themarksmith 9 months ago

      @@dreamsofcode v interested in self hosting!!!

  • @CristianHeredia0
    @CristianHeredia0 9 months ago

    This was great. Thanks for sharing. What do you use to make coding slides/video/transitions?

  • @LeeStoneman
    @LeeStoneman 9 months ago

    Fascinating insight into large dayda issues.

  • @gnydick
    @gnydick 9 months ago +5

    These kinds of add-ons are great, but they also come with maintenance and repair nightmares. Imagine there is an outage of data feeding in, or the db itself suffers an outage. You need a way to be able to reliably and accurately rebuild possibly large amounts of materialized views from arbitrary offsets.

  • @lhxperimental
    @lhxperimental 9 months ago +3

    In older versions of Postgres you had to use table inheritance to achieve partitioning. But for the last several years they have had declarative partitioning, although you still have to create the partition tables yourself.

    • @MuratcanBerber
      @MuratcanBerber 9 months ago

      There is an extension called pg_partman that can help create partitions automatically. We also use timescaledb in our production databases to deal with time series data.
      Another thing I want to mention: creating dummy timestamp data isn't that hard. You don't need to download anything. You can use the PostgreSQL generate_series() function, which accepts a start and end time, and use random numeric values for any time range.
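      A minimal sketch of that approach, generating one row per minute with a random value (the column names and date range are made up for illustration):

      ```sql
      -- One reading per minute for January 2024, with a random value;
      -- generate_series() does all the work, no downloads needed.
      SELECT ts AS started_at, random() * 100 AS value
      FROM generate_series('2024-01-01'::timestamptz,
                           '2024-01-31'::timestamptz,
                           INTERVAL '1 minute') AS ts;
      ```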

  • @ilyapunko1127
    @ilyapunko1127 9 months ago +21

    We are using timescale db in prod. But, actually, this solution still has a lot of problems. You can check the issues. But anyway it's a pretty simple and affordable option for time series.

    • @mikopiko
      @mikopiko 9 months ago

      Have you tried ClickHouse yet?

    • @dreamsofcode
      @dreamsofcode  9 months ago +3

      What issues are you currently facing? We're using it in production as well and have had a good experience so far.

  • @daves.software
    @daves.software 9 months ago +3

    Did you have an index on the started_at column for the regular table?

  • @hansiboy5348
    @hansiboy5348 9 months ago +2

    Lovely video! I would love to see this used in a project using sensor data or another type of real time data

    • @dreamsofcode
      @dreamsofcode  9 months ago +2

      That's a great use case for it. At my current place of work, we use it for sampling IP attributes over time.

  • @DiogoBaeder
    @DiogoBaeder 9 months ago +1

    Pretty impressive indeed! Makes me want to compare that with how Apache Pinot performs. Thanks for the video!

  • @Kanikkl
    @Kanikkl 9 months ago +5

    Awesome demo! Thank you! This really opens up another technological layer to me. In the past I had no connection to db optimization other than using indexes in mysql. But this is so much more and so useful! This is actually applicable to a use case that has been bugging me for years: different types of timeseries data that we are currently caching through a php scheduler in redis storage. Will share this with the team 🎉

    • @dreamsofcode
      @dreamsofcode  9 months ago +2

      That's great to hear! I'm glad the video was helpful! ❤️

  • @Terandium
    @Terandium 9 months ago +15

    So, self hosting for FREE is too much work.. using a docker image is also too much work? So lets use this way more expensive cloud instance that totally isn't sponsoring me...

    • @SXsoft99
      @SXsoft99 9 months ago +1

      My ex "IT guy" that was managing servers was so lazy, or I don't know what he was, but for him copy-pasting 7-10 console commands was sooo hard.
      It was literally his job. After 2 days of him not doing anything I was like "give me the god damn connection credentials and root access". 1 day later, after just a bit of research, I finished putting up the entire scaffolding, and documented it since some people find it hard to "google it".
      I don't know if it's a trend or people just want everything on a plate, but learning is hard, thinking is hard, breathing is hard.

    • @dreamsofcode
      @dreamsofcode  9 months ago +10

      It's good to be skeptical. However in this case you've missed the point.
      Yes, Timescale sponsored the video, but they had 0 decision making on how I deployed timescale. I even left links for other deployment models in the project readme.
      I went with what would be the quickest for the video. The hypervisor on macOS would skew the results, and setting up an instance of timescale would detract from the content. If that's something people are interested in, I'm happy to do another video on it.

    • @Terandium
      @Terandium 9 months ago

      @@dreamsofcode Thanks for the reply, in the case of the video I totally get it. I think it's very cool you pointed out the free possibilities as well. Just the wording about it sounds quite off to me haha, and to my friend as well. It's how I got to this video in the first place.
      Either way, thanks for the reply as it provided a lot of context.

    • @dreamsofcode
      @dreamsofcode  9 months ago

      @@Terandium Thank you for the feedback, I really do appreciate hearing your thoughts! I should have done a better job explaining my reasoning for choosing the managed service.

  • @vslabs-za
    @vslabs-za 9 months ago +4

    Interesting, thanks. My big data table is more than a few TB and over 90 billion (no typo) rows. But it's in that "dreaded vendor database", Oracle. It's a 2 node Oracle database cluster on 2017 h/w, processing and analysing over a billion WAN IP traffic flows per day (SQL & PL/SQL code). Would love to try out PostgreSQL though. But then there's my other pursuits (like machine learning on IP traffic for anomaly detection)... and not enough hours in a day. This is a great time for s/w engineers. So many fascinating technologies, products, languages, and concepts to explore.
    The Golden Age of Software Technologies. 🙂

    • @jonatasdp
      @jonatasdp 7 months ago +1

      Yayyy! Definitely Golden Age
      Timescale was dogfooding and reached 10 billion records per day in a single node, single hypertable :)

  • @spartan_j117
    @spartan_j117 9 months ago +3

    Thanks for the video. But don't you know about "\timing" in psql?

  • @davidbanhos7308
    @davidbanhos7308 9 months ago +2

    It is an amazing video! Congratulations! I have a question as a developer - sorry, it could be a silly question. What should the workflow be on the hyper and non-hyper tables? Will code be required to call the hypertable version programmatically on a case-by-case basis? Insert/delete/update only on the non-hyper table? Or is there a way to do it transparently?

    • @jonatasdp
      @jonatasdp 7 months ago

      It's totally transparent. Just insert into the hypertable and the partitioning works behind the scenes.

  • @herozero777
    @herozero777 4 months ago

    Hi, really loved the video you've made. It is quite informative. I had one question: what do the database migrations do (at 10:00) if not load the data into the new database?
    Is it in this case just setting up tables in the database instance?

    • @dreamsofcode
      @dreamsofcode  4 months ago +1

      Yes! The migrations just set up the table schemas rather than loading the data in. You can do this in the migration step, however migrations sometimes run in a txn which would probably be quite large.

  • @mrCrashCube
    @mrCrashCube 6 months ago

    Very nice video! Thank you!
    At 9:30 you declared a reference in a hypertable, but that is not working for me. After consulting google I found out that it is not possible to create references in hypertables... how did you solve that?

  • @samuelserot4158
    @samuelserot4158 9 months ago +2

    Hello and thank you for your videos :)
    I've been interested in TimescaleDB for a while now, but my data is not completely timeseries-oriented. Each record I want to analyze can evolve over time in my current DB, so we would have mutable events if I keep my current implementation. I suppose this is not acceptable with TimescaleDB? Wouldn't it be better to transform every event that modifies one of my records in my current model into a unique event in the tsdb, with a ban on modifying the data afterwards?
    I hope I've made myself clear :)

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      You can use a unique index with a hypertable which will allow you to prevent duplicates, and then you can use an UPDATE to upsert any data in. It should be doable with tsdb!
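      A sketch of that pattern, with hypothetical table and column names; note that on a hypertable any unique index has to include the time partitioning column:

      ```sql
      -- "events" is an assumed hypertable partitioned on "time".
      -- The unique index must include the partitioning column.
      CREATE UNIQUE INDEX ON events (device_id, time);

      -- Upsert: insert a new reading, or overwrite the value if the
      -- (device_id, time) pair already exists.
      INSERT INTO events (device_id, time, value)
      VALUES (42, '2024-06-01 12:00:00+00', 1.5)
      ON CONFLICT (device_id, time)
          DO UPDATE SET value = EXCLUDED.value;
      ```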

    • @samuelserot4158
      @samuelserot4158 9 months ago

      Thanks for your answer!
      I thought that too many updates could be a problem with tsdb, but it probably depends on the technology. I'll have to test it :)

  • @peterndungu41
    @peterndungu41 9 months ago +1

    I have a question: is it alright to compare partitioning of tables to how binary search works, in that you get more performance since you are not looking through the entire db?

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      That is exactly how it works.

  • @richardbennett4365
    @richardbennett4365 9 months ago

    How does the window get those baby blue, peach, pink, and green colors in the information bar?

  • @moondevonyt
    @moondevonyt 9 months ago +1

    mad props for the deep dive on postgres and timescale DB
    it's clear you've got mad skills and dedication when it comes to this stuff
    i have to admit though, i don't fully buy into the idea that postgres can't handle big data efficiently without the need for something like timescale DB
    sure, for time series data there might be an edge, but postgres has shown some mad chops with large datasets in the right setups
    but, timescale DB sounds like a game changer for specific use cases
    keep grinding and dropping that knowledge

  • @TLAngus1337
    @TLAngus1337 9 months ago +17

    Looks interesting, but in your performance comparison you could have added an index on started_at to the normal postgres table. I think that would have increased its query speed substantially (albeit increasing storage)

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      A well designed index can definitely help, although it does impact write performance as well, which in time series data sets can cause issues.
      You can also add them to hypertables, if needed. Generally, partitioning is always going to be better for time series, and then adding indexes for other features is the best approach in my experience.

    • @hheheks
      @hheheks 9 months ago +11

      @@dreamsofcode Yes, but your comparison here is very unfair. Calling the create_hypertable() function actually creates an index automatically on the column that you use for time partitioning, which in your case is the started_at column.

    • @dreamsofcode
      @dreamsofcode  9 months ago +4

      @@hheheks I hear you, but I think there is a big difference still.
      Hypertables don't create indexes, only partitions. The issue with just using an index on the start time is that there's still a row lookup. It won't have all the necessary data for aggregations and would require a row lookup still (unless you use the INCLUDE keyword on the columns you want in the index). You also have to rebalance the index for every insert, which will cause much slower insertion times.
      I think it's incorrect to compare the two because of the nuances with indexing, which honestly could be a video by itself. I'm happy to do a video to explain the drawbacks of indexing vs partitioning.

    • @hheheks
      @hheheks 9 months ago +6

      @@dreamsofcode You are incorrect. create_hypertable() has an argument 'create_default_indexes' which defaults to true, and you are not overriding it. Go check your partitioned table. The parent table does not have the index, but each partitioned chunk has it.

    • @dreamsofcode
      @dreamsofcode  9 months ago +3

      @@hheheks You're correct. That's my mistake. Let me run this without the create_default_indexes flag and share the findings!
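      For reference, the flag being discussed is an optional named argument to create_hypertable(); the table and column names below follow the video's benchmark:

      ```sql
      -- By default create_hypertable() builds a descending index on the time
      -- column of every chunk; disable it to benchmark pure partitioning.
      SELECT create_hypertable('trips', 'started_at',
                               create_default_indexes => false);
      ```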

  • @trueberryless
    @trueberryless 9 months ago +2

    I'm not calling you out for a dirty cheat because the video is very interesting 😉. Thank you and keep it going!

  •  9 months ago

    can I get the keywords for setting up the desktop environment as in the video? Thanks

  • @oscarlacueva
    @oscarlacueva 9 months ago

    The question is, why would you want to do read queries in an operational database? Usually you load the data useful for analytics/ML/reporting into an analytical DB which is designed to perform read operations

  • @ragectl
    @ragectl 9 months ago

    Smooth inclusion of the number of YouTube subscribers. I am convinced 👍😉

  • @Fanaro
    @Fanaro 9 months ago

    The world needs more of this channel.

  • @chudchadanstud
    @chudchadanstud 2 months ago

    If I plan on running a db locally I usually just stick to SQLite. It's 3x faster than redis fully optimised and you can ignore the n+1 problem.

  • @jshet
    @jshet 9 months ago +1

    What software do you use for the graphics and editing?

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      I mainly use DaVinci Resolve with some After Effects for certain things, as there's much more documentation on AE. I have a new channel coming soon where I'll talk about motion graphics using DR and AE from a coding POV.

  • @DalpakaNBA
    @DalpakaNBA 9 months ago

    i love postgres the most too

  • @fahimferdous1641
    @fahimferdous1641 9 months ago +7

    My good sir, you have definitely piqued my interest for it. I remember playing with postgres and a mouse genome toy dataset on a dual core machine with a hard disk. That's when I had discovered influxdb, around 8 years ago. But it was never interesting to me to play with time series data or databases. I am definitely going to try out this one.
    And yes, I would 10/10 love to watch a more detailed video. Also, if possible, perhaps you can make a video on Prometheus and OpenTelemetry?

    • @dreamsofcode
      @dreamsofcode  9 months ago

      I'm glad you enjoyed it! We've been using it at my place of work for huge time series data and it's been a joy to work with.
      Those are some great ideas. I shall add both to my video backlog.

  • @rakeshprasad4969
    @rakeshprasad4969 9 months ago

    first of all, big fan of your content. regarding opensource, your comment on terraform? they were opensource too, till they weren't. could you post some content on that?

    • @dreamsofcode
      @dreamsofcode  9 months ago

      That's a great point! I know some other creators have done a great job covering terraform as well. I believe Timescale would remain open source due to postgres however!

  • @FlorianWendelborn
    @FlorianWendelborn 9 months ago +3

    Using Timescale for Skyblock Finance and loving it so far. It's a bit complicated to set up the continuous aggregates properly but once it's running it's a lot faster

    • @dreamsofcode
      @dreamsofcode  9 months ago +2

      Just checked it out, very cool!

  • @christopherjackson2157
    @christopherjackson2157 9 months ago

    What about dedup tables in file servers? They can get huge and really slow and buggy. I've not yet come across a satisfactory way of dealing with them. Tho I'm by no means a db expert

  • @oscarljimenez5717
    @oscarljimenez5717 9 months ago +2

    Will Timescale be the Planetscale for PostgreSQL? Or the Vitess?

    • @dreamsofcode
      @dreamsofcode  9 months ago +2

      Planetscale is pretty cool! You certainly can use Timescale as just managed PostgreSQL, although it's more optimized for time series data.

    • @jonatasdp
      @jonatasdp 7 months ago

      Timescale is now also offering Dynamic PostgreSQL instances. Announced this week, you just pay for the storage you use ;)

  • @Povanky
    @Povanky a month ago

    Hi, do you have any suggestions on how to improve the speed of full text search combined with filtering on other columns? The table size is more than 600GB, and the filtering columns are provided by the user and are different each time

    • @Povanky
      @Povanky a month ago

      Stored tsvectors with indexing are fast, but when combined with other column filters performance becomes really bad

  • @dinoscheidt
    @dinoscheidt 9 months ago +1

    Yeah, at our start-up (solar analytics) timescaleDB broke down completely, stopped auto sizing, so we needed to migrate very quickly to HBase like Tesla uses. It became evident that MS did not test their product at more than almost-big-data size.

    • @jonatasdp
      @jonatasdp 7 months ago

      That's a sad experience Dino. Just curious, what's the size of your database? We (I work for Timescale) recently reached 10B rows per day on our dogfooding project. The storage is over 350TB at this point. We have a very proactive community, join our Slack and let's talk about the tech-design issue you have :)

  • @zeroorcero
    @zeroorcero 9 months ago

    What terminal do you use?

  • @and_rotate69
    @and_rotate69 9 months ago +20

    I think you forgot to point out that splitting a table into partitions requires more storage. I believe you did point this out in the last video, but the storage in question isn't small enough to be ignored

    • @dreamsofcode
      @dreamsofcode  9 months ago +8

      That's a fair point, but compared to other solutions such as indexing the storage impact is pretty minimal. It's nowhere near 2xing or anything like that, and given storage is pretty cheap, it's a fair trade off IMHO.

    • @and_rotate69
      @and_rotate69 9 months ago +3

      @@dreamsofcode i agree, i'd rather pay like $1 for 30 to 35GB of storage than a pile of money for some compute time

    • @dreamsofcode
      @dreamsofcode  9 months ago +2

      @@and_rotate69 I do think it's worthwhile me doing a video on indexing as there's a few other comments about it as an option. Appreciate the feedback as well!

    • @keyboard_g
      @keyboard_g 9 months ago

      If you previously indexed by the partitioning field, that index can then be dropped, which can buy back space and compute time on inserts.

  • @guerra_dos_bichos
    @guerra_dos_bichos 9 months ago

    I have like 200GB of data but it's not related to time in any sense (well, maybe dates), what would be the best solution there?

    • @dreamsofcode
      @dreamsofcode  9 months ago

      Is it possibly related to space, or is there anything else that it can be partitioned on? If not then your best bet is decent indexing based on common queries. I'm going to do a video going into deeper detail on indexing at some point!

  • @myt-mat-mil-mit-met-com-trol
    @myt-mat-mil-mit-met-com-trol 7 months ago

    I wonder how timescaledb would work with the PostGIS extension

  • @Zeioth
    @Zeioth 7 months ago

    I had a project where we were writing 30GB per day. A mongo cluster was the only option there.

  • @gunjitmittal3456
    @gunjitmittal3456 9 months ago

    i really like your terminal. Can you/anyone tell/guide me how I can achieve a similar look?

  • @helixx23
    @helixx23 a month ago

    I also want to learn more about Timescale

  • @kampanartsaardarewut422
    @kampanartsaardarewut422 9 months ago +2

    can a traditional postgres BRIN index achieve the same goal?

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      Indexing would help, but it won't help for every query. Partitioning tends to be the better option for time series, however. This is due to not needing to rebalance any indexes when inserting or updating more data.
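      For comparison, a BRIN index on the time column is tiny and cheap to maintain on append-only data, though it only helps range scans; a sketch against the video's trips table:

      ```sql
      -- BRIN stores per-block min/max values, so it stays small even on
      -- very large tables, but only prunes whole blocks on range queries.
      CREATE INDEX trips_started_at_brin
          ON trips USING brin (started_at);
      ```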

  • @spacenomad5484
    @spacenomad5484 9 months ago +3

    12:00 Aah yes, the cloud. 5 days running cost = $170
    That's the budget for an entire month of an 8-core 64GB RAM dedicated server with 1TB NVMe SSD and like 10TB traffic included.
    bUt YoU dOn't EvEn NeEd To SeT uP tHe ClOuD
    Hire a pro to set up your dedicated server, schedule backups, deploy your apps, and you'll break even with the cloud offering by the end of week 2.

    • @ruslan_yefimov
      @ruslan_yefimov 7 months ago

      I completely agree :) Cloud is good for the development stage though

  • @talideon
    @talideon 9 months ago +1

    Assuming they host in AWS, there's an old saying: friends don't let friends deploy to us-east-1.

  • @miguelacuna7148
    @miguelacuna7148 9 months ago

    when a video about your os setup? :D

  • @smithrockford-dv1nb
    @smithrockford-dv1nb 9 months ago

    Am I reading the data wrong or has the hypertable used waaaaay more ram at 12:09? The difference between 0.0032GB and 3.1GB (x968) is not to be just glanced over in my opinion...

    • @dreamsofcode
      @dreamsofcode  9 months ago +2

      You're correct, it does use a lot more RAM, but that's by design due to the way hypertables cache their data in memory. That could be seen as a negative, but when comparing timings, using RAM is a good thing imho as it allows for optimization. However, I should have called this out in the video.
      Thank you for raising it!

  • @Fanaro
    @Fanaro 9 months ago

    Is there an equivalent but for graph (nodes and edges) data?

    • @spicynoodle7419
      @spicynoodle7419 9 months ago

      Yes there are many, they are called graph databases. Neo4j is the most popular iirc

    • @jonatasdp
      @jonatasdp 7 months ago

      You can also use recursive queries combined with hypertables, which allows you to build a kind of tree system but not a graph. Probably Apache AGE (the graph extension for pg) will work well too. Never tried it myself :)

  • @lugalzagissi
    @lugalzagissi 2 months ago

    Why do we need the refresh policy for the continuous aggregate if it's updated continuously?

    • @dreamsofcode
      @dreamsofcode  2 months ago

      Mainly because it's like a "live view", so it tracks changes in memory but needs to periodically refresh in order to commit them to the underlying materialized view.
      Think of it as similar to git, with changes vs commits.

    • @lugalzagissi
      @lugalzagissi 2 months ago

      @@dreamsofcode but will my results of querying the aggregate depend on whether or not it was refreshed recently?
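      The policy being discussed is created with add_continuous_aggregate_policy(); the view name and intervals below are illustrative assumptions, not taken from the video:

      ```sql
      -- Materialize buckets older than 1 hour every hour, leaving the most
      -- recent data to the real-time part of the continuous aggregate.
      SELECT add_continuous_aggregate_policy('trips_daily',
          start_offset      => INTERVAL '3 days',
          end_offset        => INTERVAL '1 hour',
          schedule_interval => INTERVAL '1 hour');
      ```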

  • @bryce.ferenczi
    @bryce.ferenczi 9 months ago +1

    12:35 bro really clocked $168 of usage with a "hello-world"

    • @sandworm9528
      @sandworm9528 9 months ago

      Jeeeesus

  • @CielMC
    @CielMC 9 months ago

    Did you make your Linux look like macOS or do you just have a great macOS tiling WM? I'm using yabai rn but it's quite disappointing...

    • @CielMC
      @CielMC 9 months ago

      Also, is that tmux at the bottom?

    • @dreamsofcode
      @dreamsofcode  9 months ago

      It is tmux at the bottom! I have a video on my tmux configuration on my channel which goes into it!
      So, a lot of my windows are video editing unfortunately. I do have some utilities for screen recording and centering with xdotool, but mostly it's editing magic that makes it look like it does.

  • @GymEnthusiastNZ
    @GymEnthusiastNZ a month ago

    isn't there an incremental materialised view extension for PG? surprised it's not mentioned... or tested..

    • @dreamsofcode
      @dreamsofcode  a month ago

      IVM is a great extension! I don't believe it's supported by any cloud providers as of yet (at least it wasn't last I checked).
      I'll happily do a video on it though!

  • @richardbennett4365
    @richardbennett4365 9 months ago

    Why is the syntax of these languages written like COBOL or something equally scores of years old?

  • @xpusostomos
    @xpusostomos 7 months ago

    Actually, technically Postgres natively supports time scale data. PostgreSQL does not. Because the feature was in the Postgres Berkeley code but removed by the PostgreSQL team.

  • @monirz001
    @monirz001 9 months ago

    What is the theme you are using for your terminal?

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      I use tmux and the Catppuccin theme!

    • @monirz001
      @monirz001 9 months ago

      @@dreamsofcode Got it, thanks!

  • @nickcalibey6178
    @nickcalibey6178 9 months ago

    Alright, where did you get that killer wallpaper?

    • @dreamsofcode
      @dreamsofcode  9 months ago

      I got it from freepik! I'd share the url but YouTube doesn't like it

  • @wayneeseguin
    @wayneeseguin 9 months ago

    Same ❤🎉

  • @dragonwave2652
    @dragonwave2652 9 months ago

    Just wanted to write to use timescale and you said it 😂

  • @ongayijohnian8787
    @ongayijohnian8787 a month ago

    Mahn, please do the dadbod plugins for NvChad

  • @vanvothe4817
    @vanvothe4817 9 months ago

    Do you use a client GUI or the command line for sql?

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      I use the command line for sql! psql works for me. I believe there are some decent GUIs that colleagues of mine use.

    • @vanvothe4817
      @vanvothe4817 9 months ago

      @@dreamsofcode Sometimes I use a GUI but I usually use pgcli because it has autocompletion.

  • @jajwarehouse1
    @jajwarehouse1 7 months ago

    The averages you are getting from the materialized view are different from the ones in trips_hyper. This is happening because you are taking monthly averages of daily averages.

    • @jajwarehouse1
      @jajwarehouse1 7 months ago

      What is more interesting now that I look at it, the averages between the standard table and the hyper table are very slightly different as well, which does not make much sense if the data in both are identical.
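
The point raised in this thread, that a monthly average of daily averages is not the true monthly average, is plain weighting arithmetic. A minimal Python sketch with invented fare values:

```python
# Why averaging daily averages differs from a true monthly average:
# days with more trips should weigh more in the monthly figure.
# The fare values below are made up for illustration.

day1 = [10.0, 10.0, 10.0, 10.0]  # busy day: 4 trips, daily avg 10.0
day2 = [40.0]                    # quiet day: 1 trip, daily avg 40.0

daily_avgs = [sum(d) / len(d) for d in (day1, day2)]
avg_of_avgs = sum(daily_avgs) / len(daily_avgs)  # (10 + 40) / 2 = 25.0

all_trips = day1 + day2
true_avg = sum(all_trips) / len(all_trips)       # 80 / 5 = 16.0

print(avg_of_avgs, true_avg)  # 25.0 16.0
```

Averaging over the raw rows (or taking a trip-count-weighted average of the daily aggregates) gives the correct monthly figure.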

  • @spartan_j117
    @spartan_j117 9 months ago

    12:03 What does the "P95 memory" metric mean?

    • @jdmichal
      @jdmichal 9 months ago +1

      P95 means "95th percentile". So 95% of observed values are <= that value.
      So P95 memory consumption would mean that 95% of the runs consumed the same or less memory. 5% consumed the same or more memory.
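
The nearest-rank definition of a percentile described above can be computed in a few lines. This is a generic Python sketch with invented sample numbers, not the benchmarking tool used in the video:

```python
# Hand-rolled 95th percentile (nearest-rank method): the smallest
# observed value such that at least 95% of samples are <= it.
def p95(values):
    ranked = sorted(values)
    n = len(ranked)
    rank = -(-95 * n // 100)  # ceil(0.95 * n) without floats
    return ranked[rank - 1]

# 20 hypothetical benchmark runs: memory use in MB, with one outlier
runs_mb = [3] * 10 + [4] * 9 + [120]
print(p95(runs_mb))  # 4: 95% of runs used 4 MB or less
```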

  • @isnimshchikov
    @isnimshchikov 9 months ago

    I feel like the pricing is a bit overboard. A similar droplet on DigitalOcean would cost almost 10 times less per hour. Sure, you will have to spend 30 minutes to set up a db, but then you won't be paying $900 per month, only around $100.

    • @dreamsofcode
      @dreamsofcode  9 months ago

      I hear ya. I think that is the case with most managed services. I know for SOC2 compliance you'd likely need a little more than the droplet, but given that this is an open source extension, if you've got the bandwidth to self host, then that option is available. That's one thing I really like about TimescaleDB.

  • @user-qr4jf4tv2x
    @user-qr4jf4tv2x 9 months ago

    Would like a video on a Postgres streaming database

    • @dreamsofcode
      @dreamsofcode  9 months ago

      This is a great idea!

  • @Petoj87
    @Petoj87 7 months ago

    Wouldn't it be more fair if you at least had an index on the normal table?

  • @jojohehe3251
    @jojohehe3251 6 months ago

    Is this Timescale setup useful for a transactional database, such as for a booking system like Uber? Could we structure the data such that every minute of the day is regarded as a bookable resource for every operator, and then search through that time series data?

  • @keyboard_g
    @keyboard_g 9 months ago

    I'm storing many TB in Postgres across partitioned tables. Creating a new partition for a day ("my slice of choice") is a couple-line procedure.
    For massive inserts, just write directly to the partition as if it were a free table. When I'm bulk loading I do this into a table with no indexes or relationships, then when done I attach the partition and everything gets built once at the end.

    • @dreamsofcode
      @dreamsofcode  9 months ago

      You should also check out pg_partman and pg_cron if you're managing the partitions yourself! Those are pretty awesome extensions.

  • @weberlin5821
    @weberlin5821 9 months ago

    Compare the P95 memory for both queries. The TimescaleDB query used ~1000x more memory.

    • @dreamsofcode
      @dreamsofcode  9 months ago +6

      RAM is meant to be eaten :)

    • @weberlin5821
      @weberlin5821 9 months ago

      @@dreamsofcode Indeed! 😂

  • @dagtveitgmail
    @dagtveitgmail 9 months ago +2

    Clickhouse runs circles around it if you need some more advanced and heavy queries

  • @theIbraDev
    @theIbraDev 6 months ago

    My biggest Postgres database is 5 million company records at 8 GB; reaching 50 GB seems impossible

  • @christian15213
    @christian15213 7 months ago

    But isn't this a workload DB for analytics, not a transactional DB? It would be helpful if that was your use case.

  • @cheebadigga4092
    @cheebadigga4092 9 months ago

    So it's basically just smart caching?

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      Yep! There's a lot of smart caching under the hood, as well as managing partitions, which enables the smart caching.
      It's open source (albeit written in C) so one is able to check what they're doing as well!
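
For illustration, here is a toy sketch of the chunk-pruning idea behind time partitioning: rows are grouped into fixed time chunks, and a range query only scans the chunks that overlap the requested window. This is a simplification for intuition, not TimescaleDB's actual implementation:

```python
from collections import defaultdict
from datetime import datetime, timedelta

CHUNK = timedelta(days=7)       # chunk width, like a hypertable interval
EPOCH = datetime(2024, 1, 1)

def chunk_key(ts):
    # Integer id of the 7-day chunk a timestamp falls into
    return (ts - EPOCH) // CHUNK

chunks = defaultdict(list)

def insert(ts, row):
    chunks[chunk_key(ts)].append((ts, row))

def query(start, end):
    # Chunk pruning: only scan chunks overlapping [start, end)
    hits = []
    for key in range(chunk_key(start), chunk_key(end) + 1):
        for ts, row in chunks.get(key, []):
            if start <= ts < end:
                hits.append(row)
    return hits

# One reading per day for 60 days
for day in range(60):
    insert(EPOCH + timedelta(days=day), f"reading-{day}")

print(query(datetime(2024, 1, 10), datetime(2024, 1, 13)))
# ['reading-9', 'reading-10', 'reading-11']
```

The three-day query above touches a single chunk and ignores the rest, which is why narrow time-range queries stay fast as the table grows.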

  • @cmilkau
    @cmilkau 9 months ago +1

    A bandaid. An RDBMS should really be able to handle this case well without additional work, at most some index tweaking.

  • @asdasdaa7063
    @asdasdaa7063 a month ago

    GIB MORE PostgreSQL's VIDEOS >:(

  • @ariseyhun2085
    @ariseyhun2085 9 months ago +1

    Lovely ad 👏

  • @rudiservo
    @rudiservo 9 months ago +2

    I find this interesting and strange, because MySQL has had partitioning out of the box since 2008. That's 15 years ago, and now I take it for granted.

    • @dreamsofcode
      @dreamsofcode  9 months ago

      PostgreSQL has it as well! But it is much more manual than using something like TimescaleDB.

  • @magfal
    @magfal 7 months ago

    For a lot of these use cases, columnar is an even better solution.
    Hydra's fork of Citus Columnar is my goto for billion-row problems.

    • @magfal
      @magfal 7 months ago

      I'd love for Timescale to include first-party support for columnar hypertables, to handle append-only data a lot faster and more compactly.

  • @Fanaro
    @Fanaro 9 months ago

    Please do a video on your Linux setup! I wish your stuff were available on Archcraft as an out-of-the-box setup.

  • @kokizzu
    @kokizzu 9 months ago +2

    I'd rather use ClickHouse for time series '__')

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      Clickhouse is pretty great. There's definitely some caveats when using it (as with all columnar stores) but if it's the right choice for you then I'm vibing.

  • @gjermundification
    @gjermundification 9 months ago

    How is `create_hypertable` defined? What would it be named if I were to roll my own?

  • @professortrog7742
    @professortrog7742 9 months ago +1

    No index on the standard table. Oh look, now it is slow.

    • @dreamsofcode
      @dreamsofcode  9 months ago

      We once had a database table of about 400GB where they had attempted to put an index on it. The problem with indexing is that you need to use the correct index for your query; otherwise it can actually hinder performance.
      I'll be doing another video on this phenomenon 😁

  • @mikopiko
    @mikopiko 9 months ago +1

    In short, you can solve Postgres's biggest weakness by extending it with TimescaleDB and using a so-called "hypertable", which cuts down query time by a lot.

  • @victortesla2935
    @victortesla2935 9 months ago

    Nvchad for java, pls 🙂

  • @dalicodes
    @dalicodes 9 months ago

    The github link is broken

    • @dreamsofcode
      @dreamsofcode  9 months ago

      Thank you for letting me know! Should be fixed now.

  • @minhgiangho1410
    @minhgiangho1410 9 months ago

    I created a hypertable with 32 columns and inserted 30 million rows to test. With every symbol (a column of that table), the query is slow on the first call (more than 10 seconds), and really fast (less than 50 ms) on subsequent calls. In my opinion, the first call is too slow and unacceptable.

    • @jonatasdp
      @jonatasdp 7 months ago

      Have you checked whether Postgres TOAST is slowing you down? If all the columns don't fit in a single row, you'll need extended storage, which is done in an external table.

  • @matthwz
    @matthwz 9 months ago

    A suggestion for an upcoming video could be about your Arch Linux setup. (wm/tools, etc)

    • @dreamsofcode
      @dreamsofcode  9 months ago

      Great suggestion! I shall add it to my video backlog.

  • @gokukakarot6323
    @gokukakarot6323 2 months ago

    3.1 GB vs 3.2 MB. I am shocked by how much faster it is.

  • @victortesla2935
    @victortesla2935 9 months ago

    Which linux distribution you use?

    • @victortesla2935
      @victortesla2935 9 months ago +1

      Arch?

    • @mikopiko
      @mikopiko 9 months ago

      @@victortesla2935 Looks like macOS

    • @cheebadigga4092
      @cheebadigga4092 9 months ago +1

      he's on macOS bro

    • @victortesla2935
      @victortesla2935 9 months ago

      @@cheebadigga4092 No, as far as I remember, in the NvChad video he said he was installing it on Arch

    • @wondays654
      @wondays654 9 months ago +2

      @@cheebadigga4092 It's Arch; the Mac-type thing is just animation. I asked in a previous video.