Game Engine Programming: A Simple Package Format

  • Published on 17 Jan 2022
  • We design and implement a package format suitable for large programs with lots of data.
  • Games

Comments • 108

  • @thetastefultoastie6077
    @thetastefultoastie6077 2 years ago +190

    Wonderfully put:
    "This is actually a very true thing across all software development: When you're solving a specific problem you can often be very efficient, and when you're solving a general problem you often can't be very efficient, because efficiency requires assumptions, or invariants you can guarantee, and the more general you are trying to be, the fewer invariants you can guarantee." - Jon Blow 2022

    • @RobertHildebrandt
      @RobertHildebrandt 2 years ago +14

      Timestamp for my future self: 2:50

    • @nano8640
      @nano8640 2 years ago +2

      idk, this is *generally* true, but in programming languages that emphasize polymorphism, there are facilities (usually a strong type system) for guaranteeing invariants

    • @GonziHere
      @GonziHere 2 years ago +10

      @@nano8640 You expect a valid pointer. You can either "assume" that it's valid (and crash when it isn't), or you can "check" that it actually is valid (and pay the price for it). No facility will help you with that for free. Or, a more "high level" example: you can provide the user with an input to fill out the email. You can again trust the user, or you'll need to write validation for it (or use browser email fields, which just means that the validation is written elsewhere, by someone else, but still runs on the CPU).

    • @astroid-ws4py
      @astroid-ws4py 2 years ago +2

      This is also true in chip design: custom-made FPGAs or ASICs can be faster than a general CPU or GPU when doing just one specific task.

    • @nano8640
      @nano8640 2 years ago +1

      @@GonziHere what part of this invalidates what i said? i did say this is *generally* true :)
      but i think your examples are flawed, email validation happens somewhere, if you don't validate the form you are just kicking the can down the road, and parsing an email executes in microseconds
      in the pointer case, there are literally several fields of pl theory dedicated to this very problem! linear types, borrow checkers, smart pointers, etc
      but yes, there are plenty of examples where generalizing makes things slower. a good programmer will generalize *up to* these cases unless performance doesn't matter

  • @malusmundus-9605
    @malusmundus-9605 2 years ago +91

    I've been coding a long time (15 years) and have been learning butt-loads of great stuff from this channel. If you are a younger programmer and are unsure of this man's coding ability I assure you he excels where it counts- pay attention to what he says! This is especially true if you make games.

  • @KnThSelf2ThSelfBTrue
    @KnThSelf2ThSelfBTrue 2 years ago +6

    I feel like it's so rare that I find a video of Jonathan Blow not deeply in the middle of something. Kinda refreshing to watch someone cook something up relatively from scratch.

  • @flyingscarf5863
    @flyingscarf5863 2 years ago +28

    Hi Jonathan, I hope you are doing well.
    I’d like to thank you for your work on The Witness and for sharing these videos on YouTube.
    Sometimes technical, about game design and programming, and sometimes more on the human side, talking about mental health, perception and development, they are relevant and quite welcome!
    Thanks!

  • @educate9946
    @educate9946 2 years ago +25

    Oh yes, the "simp" format.

  • @Realspace2
    @Realspace2 2 years ago +25

    Nit-pick: I think you missed a reserved u32 in the TOC header struct (bytes 4-7 are non-explicit padding)

    • @jblow888
      @jblow888 2 years ago +26

      Indeed, this happened. Thanks!
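
The padding issue in this thread can be illustrated with a small sketch. It is written in Python rather than Jai, and the field names (`magic`, `reserved`, `entry_count`) are hypothetical; only the shape "a u32 followed by an 8-byte-aligned field" comes from the comment, and the `toc!` magic is borrowed from a later comment on this page. On common 64-bit platforms the first struct gets 4 hidden bytes at offsets 4-7, while the second names them.

```python
import ctypes
import struct

# Hypothetical header: a u32 followed directly by a u64.
# On platforms where u64 is 8-byte aligned, bytes 4-7 become hidden padding.
class HeaderImplicit(ctypes.Structure):
    _fields_ = [
        ("magic", ctypes.c_uint32),
        ("entry_count", ctypes.c_uint64),
    ]

# Same layout with the gap spelled out as a named reserved field.
class HeaderExplicit(ctypes.Structure):
    _fields_ = [
        ("magic", ctypes.c_uint32),
        ("reserved", ctypes.c_uint32),
        ("entry_count", ctypes.c_uint64),
    ]

def pack_header(magic: bytes, entry_count: int) -> bytes:
    """Serialize the explicit layout: little-endian, no hidden padding."""
    return struct.pack("<4sIQ", magic, 0, entry_count)

def describe():
    """Return (offset of entry_count, total size) for both variants."""
    return (
        (HeaderImplicit.entry_count.offset, ctypes.sizeof(HeaderImplicit)),
        (HeaderExplicit.entry_count.offset, ctypes.sizeof(HeaderExplicit)),
    )
```

Writing the reserved field explicitly means the on-disk layout no longer depends on what a particular compiler decides to insert.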

  • @MichiGombocz
    @MichiGombocz 2 years ago +5

    i remember sitting in a lecture at my university (Graz, Austria). first semester, was convinced lectures are important.
    i've watched your channel before starting to study and looked up to Jonathan as a mentor!
    i skipped game-dev-days at my uni because i had a lecture - 2 days later i learned that you gave a talk at my small university while i was in a lecture without mandatory attendance, missing your talk. It still hurts a few years later!
    Thanks for being such a great person, you care to share, it's just lovely.

  • @microcolonel
    @microcolonel 2 years ago +7

    As for being able to see the package as files during development, it might be interesting to write a FUSE filesystem or whatever the equivalent is on 'doze; then you could just mount it. As for reading the entire file vs mmapping it, they're basically equivalent, you might as well just mmap every time. Implementing your packfile system with an mmap-style API also resolves the "free" problem, since you don't really need to free mmap regions.
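
A rough sketch of the two loading styles compared above, using Python's standard `mmap` module. The helper names (`map_file`, `read_whole_file`, `entry_view`) and the sample file contents are made up for illustration; nothing here is taken from the video's implementation.

```python
import mmap
import os
import tempfile

def map_file(path):
    """Map a whole file read-only; the OS loads pages lazily on access."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mapping remains valid
        # after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def read_whole_file(path):
    """The alternative: copy the entire file into process memory up front."""
    with open(path, "rb") as f:
        return f.read()

def entry_view(mapping, offset, size):
    """Return a zero-copy view of one entry inside the mapping."""
    return memoryview(mapping)[offset:offset + size]

def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"headerPAYLOADtail")
        os.close(fd)
        mapping = map_file(path)
        view = entry_view(mapping, 6, 7)
        result = bytes(view)      # copy out only when actually needed
        view.release()            # views must be released before closing
        mapping.close()
        assert read_whole_file(path)[6:13] == result
        return result
    finally:
        os.remove(path)
```

Whether the two approaches really perform the same depends on the platform and access pattern, which is debated further down this page.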

  • @Albileon
    @Albileon 2 years ago +3

    Nice to see you back on YouTube!

  • @samurainak5828
    @samurainak5828 2 years ago +6

    Glad to see you back on YouTube. Love your videos

  • @timtreichel3161
    @timtreichel3161 2 years ago

    This is great. The problem was simple enough that I could understand everything, and I learned a few things. For example, at first I did not understand why the ToC would be at the end of the file. Intuitively I would definitely think that a ToC should be at the start of the file. However, after your simple explanation it was obvious why having it at the end of the file is more efficient (when adding content later on). Furthermore, you seemed to be in a good mood, which made this really enjoyable.

  • @dimarichmain
    @dimarichmain 2 years ago +1

    So good these blogs are back. You gave me some ideas and inspired me a lot.

  • @timallanwheeler
    @timallanwheeler 2 years ago +5

    Thanks for the video! It was nice and self-contained, with a lot of good best practices. Loved how it all came together in the end on the real game. The way Jai efficiently casts between types (and

    • @10e999
      @10e999 2 years ago

      I agree. Especially for the "self-contained" part.

  • @Dorumin
    @Dorumin 2 years ago +2

    12:10 I figured that we did streaming I/O (whether lines or chunks) because of disk latency and support for streams/pipes which are super prevalent in Unix command line applications. Of course, streaming is often unnecessary complexity, but the difference from loading the file and executing the processing on the CPU is still orders of magnitude, so it's hard to make a definitive statement for all workloads

  • @32gigs96
    @32gigs96 2 years ago +53

    Welcome back on YouTube, king

  • @koosa6289
    @koosa6289 2 years ago

    You're like the Da Vinci of modern gaming and i can just watch one of your sort of "behind the scenes" videos and just leave the comment. That's pretty sweet.

  • @double051
    @double051 2 years ago +1

    Great topic

  • @MenkoDany
    @MenkoDany 2 years ago +16

    For those interested, Twitch VOD link until YouTube finishes processing something other than 360p: www.twitch.tv/videos/1267465449

    • @jan2679
      @jan2679 2 years ago +1

      VOD starts at 14:10
      www.twitch.tv/videos/1267465449?t=0h14m12s

  • @Sp1derFingers
    @Sp1derFingers 2 years ago

    Welcome back! :)

  • @bruterasta
    @bruterasta 2 years ago +2

    Holy shit official upload.

  • @celeb_17
    @celeb_17 2 years ago +2

    OMG! I'm so happy!

  • @letheward6
    @letheward6 2 years ago

    To avoid casting everywhere, maybe use u64 for language built-in array/string count? Personally I don't find using s64 for this more natural, and we lose 1 bit. Probably can search for casting in the entire game code base and code from other users, and find out which choice is better.

  • @NKCSS
    @NKCSS 2 years ago +3

    1:24:00 why not add a version number at the start? That way you don't have to just pad out 'reserved' stuff now; just implement what you need and version the load/create. You write only what you need, and when you need to update it later you can: a migration/upgrade would just be read_specific_version, write_new_version (specify any additional data it would need).

    • @colinkennedy
      @colinkennedy 1 year ago +1

      That question comes up at th-cam.com/video/bMMOesLMWXs/w-d-xo.html (2:06:30). He doesn't really answer it. Apparently it's easier for users of the format to not have to change their code
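
The versioning idea suggested in this thread could look roughly like the sketch below. Everything in it is assumed for illustration: the `PKG0` magic, the two layouts, and the function names are hypothetical and do not describe the format from the video, which uses reserved fields instead.

```python
import struct

MAGIC = b"PKG0"  # hypothetical 4-byte file identifier

def write_v1(entries):
    """v1 body: u32 count, then (u64 offset, u64 size) per entry."""
    body = struct.pack("<I", len(entries))
    for offset, size in entries:
        body += struct.pack("<QQ", offset, size)
    return MAGIC + struct.pack("<I", 1) + body

def read_v1(body):
    (count,) = struct.unpack_from("<I", body, 0)
    # Upgrade on load: v1 entries get a default flags value of 0.
    return [struct.unpack_from("<QQ", body, 4 + 16 * i) + (0,)
            for i in range(count)]

def read_v2(body):
    """v2 adds a u32 flags field to every entry."""
    (count,) = struct.unpack_from("<I", body, 0)
    return [struct.unpack_from("<QQI", body, 4 + 20 * i)
            for i in range(count)]

READERS = {1: read_v1, 2: read_v2}

def read_package_index(blob):
    """Check the magic, read the version, and dispatch to the right reader."""
    if blob[:4] != MAGIC:
        raise ValueError("not a package")
    (version,) = struct.unpack_from("<I", blob, 4)
    reader = READERS.get(version)
    if reader is None:
        raise ValueError(f"unsupported version {version}")
    return reader(blob[8:])
```

The trade-off is that every reader of the format has to carry one code path per version, which is the cost the reply above alludes to.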

  • @notuxnobux
    @notuxnobux 1 year ago

    You want that memcpy in the put because otherwise you might get an unaligned memory write, which can crash the program depending on the CPU architecture.

  • @StardidiMarcelis
    @StardidiMarcelis 2 years ago +3

    Out of curiosity: have you considered using hashes of the filename / IDs instead of a string path? That would allow for fixed-size Entry_Info and save some package/memory size.
    Complexity not worth the performance gains?

    • @wiipronhi
      @wiipronhi 2 years ago

      That is also an optimisation you can do if you're willing to deal with the added complexity.
      A simple implementation is simple to manage, but at a certain point you might want to use a non-standard hashing algorithm because it doesn't allocate, runs faster, takes up less space in memory, or is better optimised for few/no hash collisions (I've seen dictionaries which simply assume all hashes are unique) given your style of paths. These can lead to very strange and hard-to-track-down bugs once these sorts of structures are deeply embedded in code (especially if someone has just changed the hashing algorithm)...
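
To make the hashed-path idea in this thread concrete, here is a minimal sketch using 64-bit FNV-1a, chosen only because it is short and needs no allocation; the thread does not name a hash function. The record layout (`hash, offset, size`) and the function names are assumptions. Collisions are checked when the index is built, which addresses the "assume all hashes are unique" hazard mentioned in the reply.

```python
import struct

FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & MASK64
    return h

# Fixed-size record: path hash, data offset, data size (24 bytes).
ENTRY = struct.Struct("<QQQ")

def build_index(entries):
    """entries: iterable of (path, offset, size).
    Returns sorted packed records; raises if two paths share a hash."""
    seen = {}
    records = []
    for path, offset, size in entries:
        h = fnv1a_64(path.encode("utf-8"))
        if h in seen and seen[h] != path:
            raise ValueError(f"hash collision: {seen[h]!r} vs {path!r}")
        seen[h] = path
        records.append((h, offset, size))
    records.sort()
    return b"".join(ENTRY.pack(*r) for r in records)

def lookup(index: bytes, path: str):
    """Binary search over the sorted fixed-size records."""
    target = fnv1a_64(path.encode("utf-8"))
    lo, hi = 0, len(index) // ENTRY.size
    while lo < hi:
        mid = (lo + hi) // 2
        h, offset, size = ENTRY.unpack_from(index, mid * ENTRY.size)
        if h == target:
            return offset, size
        if h < target:
            lo = mid + 1
        else:
            hi = mid
    return None
```

The cost is that the original path strings are gone from the package, so listing or debugging its contents needs a separate name table.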

  • @NKCSS
    @NKCSS 2 years ago

    I thought you would build it in a way, that, while developing, if a file exists on disk, you'd load that, if not, read it from the package; that way, you can have your packaged up contents, but still work on stuff without having to constantly repackage, but from the first 14 minutes, it seems like you went for an on/off toggle approach. I wonder why that is; will watch the other 3h15m to see if I find out 😊
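
The fallback scheme described in this comment (prefer a loose file on disk, otherwise read from the package) might be sketched as follows. This is not what the video does, per the comment itself; `load_asset`, the dict standing in for a loaded package, and the `loose_root` parameter are all hypothetical.

```python
import os
import tempfile

def load_asset(name, package, loose_root=None):
    """Return the bytes for `name`.

    package: dict mapping asset names to bytes (stand-in for a loaded package).
    loose_root: directory checked first during development; None in release.
    """
    if loose_root is not None:
        candidate = os.path.join(loose_root, name)
        if os.path.isfile(candidate):
            with open(candidate, "rb") as f:
                return f.read()
    return package[name]

def demo():
    package = {"level.txt": b"packaged"}
    with tempfile.TemporaryDirectory() as root:
        before = load_asset("level.txt", package, root)   # no loose file yet
        with open(os.path.join(root, "level.txt"), "wb") as f:
            f.write(b"edited")
        after = load_asset("level.txt", package, root)    # loose file wins
    release = load_asset("level.txt", package)            # override disabled
    return before, after, release
```

One drawback of this design is an extra file-existence check per load, and the risk of a stale loose file silently shadowing packaged data.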

  • @sergesolkatt
    @sergesolkatt 2 years ago

    Yesssssss

  • @totheknee
    @totheknee 2 years ago

    Middle endian is a thing. For example, 32-bit integers on a PDP-11. 0xcafebabe would be stored as 0xfe 0xca 0xbe 0xba.
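
The PDP-11 byte order from this comment can be reproduced in a few lines: the 32-bit value is split into two 16-bit words, the high word is stored first, and each word is itself little-endian. The function names below are made up for the example.

```python
import struct

def to_pdp_endian(value: int) -> bytes:
    """Encode a 32-bit value as two little-endian 16-bit words, high word first."""
    high, low = (value >> 16) & 0xFFFF, value & 0xFFFF
    return struct.pack("<HH", high, low)

def from_pdp_endian(data: bytes) -> int:
    """Inverse of to_pdp_endian."""
    high, low = struct.unpack("<HH", data)
    return (high << 16) | low
```

For `0xcafebabe` this yields the bytes `fe ca be ba`, matching the comment.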

  • @microcolonel
    @microcolonel 2 years ago +1

    Pakfiles win for co-compression and such, but if you don't have compression then the filesystem doesn't have to be slower than the average pakfile; but NTFS is definitely slower than pakfiles lol.
    Heck, SQLite is faster than NTFS.

  • @Larandar
    @Larandar 2 years ago +2

    I know I come late, but I find myself asking: "Why not 'just' use a good TAR library both for creating and reading the package?". I feel like it fits most, if not all, of the prerequisites?

  • @Spongman
    @Spongman 2 years ago +2

    by reserving memory for a file (instead of just mmapping it) you're forcing the VM system to page that file out to swap instead of just being able to discard it. if you have a read-only file on disk, and you're fine with it remaining so, you're pretty much _always_ better off mmapping the file than loading it into memory. appropriate use of madvise and FILE_FLAG_SEQUENTIAL_SCAN is advised.

    • @jblow888
      @jblow888 2 years ago +8

      Sorry, but this just isn't true if you have reasonable performance expectations. Neither is it true if you want to target a lot of platforms -- most video game platforms today do not swap to disk.

    • @GiantRobot17
      @GiantRobot17 2 years ago

      Bodied

    • @Spongman
      @Spongman 2 years ago +2

      @@jblow888 right, i was assuming desktop OS, for which that is correct. cold read performance is going to be approximate (especially if page-faults are mitigated via async prefetching), warm read performance is going to favor mmap (no copy required), and under memory pressure the mmap is always going to win because it doesn't need to page out. of course, if you're assuming that your users never hit any memory pressure because they all have the funds to pimp out their machines like silicon valley game developers do, then i guess it's not an issue. let me know indeed...

    • @Spongman
      @Spongman 2 years ago +1

      @@GiantRobot17 > Bodied
      Go Team!!!

  • @limitholdem3621
    @limitholdem3621 2 years ago

    Are there links to the very first episodes of working on a package format?

  • @colinkennedy
    @colinkennedy 1 year ago

    I've watched through the whole thing and am trying to write a C++ equivalent of both Create_Package and Load_Package. What I don't understand is this: the binary package format writes arbitrary bytes to disk, okay, but then how does Load_Package know how to reconstruct the types that those bytes represent? E.g. how does Load_Package know that XYZ bytes refer to a string vs an array of floats vs an array of ints? Jonathan tested that the package format is working by using it directly in his game, so I'm wondering if the answer is just a detail of how the game loads and handles data. Right now it's hard to tell.

  • @khoavo5758
    @khoavo5758 7 months ago

    58:58 "toc!" magic number is tight!

  • @NKCSS
    @NKCSS 2 years ago +1

    2:42:22 I notice you use a lot of strings to define paths/locations and a lot of repeating; why not store them in a const somewhere, part by part? That way, if you ever want to change any part of the structure, it's changing 1 single const. If your editor supports it, you can also use stuff like 'find all references' and see all the places where it's used easily, etc.
    Helped me a lot when refactoring stuff in the past, to just set a rule for myself: Never define something twice, no magic numbers/literals; always move them to a central place, give them meaningful names and save yourself a lot of headaches.

  • @tauraamui
    @tauraamui 2 years ago

    Dunno where else to ask this, but is j_blow's Twitch channel unavailable?

  • @bruterasta
    @bruterasta 2 years ago +7

    0:50 all_paint_data only one per level, total count 705 - excuse me what?

    • @Jack-hd3ov
      @Jack-hd3ov 2 years ago +6

      I think 1000+ levels is the goal, can't remember when he said that, maybe here? th-cam.com/video/_tMb7OS2TOU/w-d-xo.html

    • @Turellio
      @Turellio 2 years ago +7

      this sokoban game's level count / amount of gameplay mechanics interacting with each other seems to be a very important feature. judging by the occasional gameplay stream there's enough content for a few separate games

  • @NKCSS
    @NKCSS 2 years ago

    3:01:20 Why even keep /data/levels/ ? This is the levels package, right? You could specify a base path in the package header or something if you wanted to preserve that and just have arms_lengthv2/all.entities as the file name (or, since everything is actually all.entities right now, just arms_lengthv2).

    • @NKCSS
      @NKCSS 2 years ago +2

      😅I should probably wait a few minutes before commenting as you literally talked about it a few minutes after 😅

  • @totheknee
    @totheknee 2 years ago

    NTFS file paths can't even be over 32 kiB long, and each component is limited to 255 bytes. Just sayin'... (Unless this is the year of the Mac/Linux Gaming Desktop? Or, you might as well simply disallow problem-programmers from having absurd directory structures over 100 deep?)

  • @AldricBocquet
    @AldricBocquet 2 years ago

    What's with the "\" in the function name (1:56:02) ? Is it ignored by the compiler to let the dev align the function names properly ?

    • @vaualbus
      @vaualbus 2 years ago

      I guess it's a typo... and probably a compiler bug results in no syntax error being reported?

    • @chyza2012
      @chyza2012 2 years ago +2

      yes

    • @TheSandvichTrials
      @TheSandvichTrials 2 years ago +7

      As you thought, it is to allow for alignment mid-identifier.

  • @fl0yd13
    @fl0yd13 2 years ago

    Is there a strong reason as to why the TOC goes at the end of the file rather than straight after the header?

    • @jblow888
      @jblow888 2 years ago +6

      It's so that, if you want to append stuff to an archive that's on disk, you can just put it at the end of the current data section, and write a new TOC after that. Good luck doing this if the TOC is at the beginning (that said, since all useful storage is block-based you should in principle just be able to insert into the middle of a file, but no OS seems to support this as far as I know?)

    • @fl0yd13
      @fl0yd13 2 years ago

      @@jblow888 makes sense, thank you!!

    • @antonofka9018
      @antonofka9018 2 years ago

      He mentions it in the video too

    • @fl0yd13
      @fl0yd13 2 years ago +1

      @@antonofka9018 I must have missed it
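
The append argument made in this thread can be sketched as below: new data is written where the old TOC began, a fresh TOC is written after it, and only the small fixed header is patched. The layout is invented for the example (a `PKG0` magic plus a u64 TOC offset in the header, and a TOC of name/offset/size records) and should not be read as the format from the video.

```python
import io
import struct

HEADER = struct.Struct("<4sQ")  # hypothetical: magic, offset of the TOC
MAGIC = b"PKG0"

def _write_toc(f, toc):
    f.write(struct.pack("<I", len(toc)))
    for name, (offset, size) in toc.items():
        raw = name.encode("utf-8")
        f.write(struct.pack("<H", len(raw)) + raw)
        f.write(struct.pack("<QQ", offset, size))

def read_toc(f):
    f.seek(0)
    magic, toc_offset = HEADER.unpack(f.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("bad magic")
    f.seek(toc_offset)
    (count,) = struct.unpack("<I", f.read(4))
    toc = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<H", f.read(2))
        name = f.read(name_len).decode("utf-8")
        toc[name] = struct.unpack("<QQ", f.read(16))
    return toc, toc_offset

def create(f):
    """Write an empty package: header followed by an empty TOC."""
    f.write(HEADER.pack(MAGIC, HEADER.size))
    _write_toc(f, {})

def append_entry(f, name, data):
    toc, toc_offset = read_toc(f)
    f.seek(toc_offset)            # existing entries are never moved
    f.write(data)                 # new data overwrites the old TOC
    toc[name] = (toc_offset, len(data))
    new_toc_offset = f.tell()
    _write_toc(f, toc)            # the new TOC goes after the new data
    f.seek(0)
    f.write(HEADER.pack(MAGIC, new_toc_offset))  # patch only the header

def read_entry(f, name):
    toc, _ = read_toc(f)
    offset, size = toc[name]
    f.seek(offset)
    return f.read(size)
```

With the TOC at the front instead, every append that grows the TOC would have to shift all the data that follows it.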

  • @ntkidding
    @ntkidding 2 years ago

    Hold on, what's this language? I can't recognize it. Function definition looks like haskell, some of the code reminds me of c++ or golang. Thanks!

    • @APaleDot
      @APaleDot 2 years ago

      It's his language, jai.

  • @olleicua
    @olleicua 2 years ago +3

    I'm less familiar with this language; what does the [..] array syntax do?

    • @fzy81
      @fzy81 2 years ago +4

      basically equivalent to std::vector

    • @chyza2012
      @chyza2012 2 years ago +8

      @WS WS [..] is a dynamic array, it has a data pointer, a size, and a reserved size, [] is just a slice, it has a data pointer and a size, [..] automatically casts down to [].
      You're not supposed to pass [..] around by value because if someone appends to it the original won't be changed, but you can freely pass [] because its just a view of the data, someone could theoretically append to it still but they'd need to manually convert it to a [..] and that's an obvious red flag.
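
The by-value pitfall described above can be modeled outside Jai. The sketch below is a Python simulation, not Jai semantics verbatim: `DynamicArray` mirrors the three fields named in the comment (data pointer, size, reserved size), `ArrayView` mirrors the two-field slice, and `pass_by_value` copies the fields while sharing the storage, as a C-style struct copy would. All names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass
class DynamicArray:          # models [..]: data pointer, count, allocated
    data: list
    count: int
    allocated: int

@dataclass(frozen=True)
class ArrayView:             # models []: data pointer and count only
    data: list
    count: int

def array_add(arr: DynamicArray, item) -> None:
    if arr.count == arr.allocated:
        new_cap = max(8, arr.allocated * 2)
        # "Reallocate": fresh storage with the old contents copied over.
        arr.data = arr.data[:arr.count] + [None] * (new_cap - arr.count)
        arr.allocated = new_cap
    arr.data[arr.count] = item
    arr.count += 1

def to_view(arr: DynamicArray) -> ArrayView:
    """The implicit [..] -> [] conversion: drop the capacity field."""
    return ArrayView(arr.data, arr.count)

def pass_by_value(arr: DynamicArray) -> DynamicArray:
    """Copy the three fields; the storage itself is shared."""
    return replace(arr)
```

Appending through the copy updates only the copy's count (and possibly its data pointer), so the original never sees the new element.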

  • @2nafish117
    @2nafish117 2 years ago

    You can probably use this system to create save files too right?

  • @solii01
    @solii01 2 years ago +4

    I'm just at minute 10, but something I really dislike about some games is when the packages are too large. For example I used to play PUBG where basically every update they made I had to download 10GB+ with my shitty internet connection.

    • @GodOfMacro
      @GodOfMacro 2 years ago +3

      that's more a delivery issue than a packaging issue no ? if the game requires heavy assets, the package is bound to be big, why big files are more an issue than small files if the end size is the same ?
      You should be able to do partial resumable downloads, streaming assets is something I've seen achieved in some games too

    • @solii01
      @solii01 2 years ago

      @@GodOfMacro True! I was not criticizing the practice, just observing potential problems! They usually only did small updates but shipped the full package every time anyway.

    • @desertfish74
      @desertfish74 2 years ago

      This!

  • @antonofka9018
    @antonofka9018 2 years ago

    At 2:26:10, you're doing that ugly array cast. Consider adding an array_cast(type, array) as a standard helper function, because array casts are quite common when working with memory at a low level.
    D has their normal casting mechanism handle that case. In D, you can just do cast(byte[]) on a float array and it would do the same thing you did there. This is arguably worse than having a function, because it adds more meaning to the already overloaded cast keyword.
    Anyway, my point is, I think the language should provide cleaner means of doing that sort of thing.

    • @antonofka9018
      @antonofka9018 2 years ago

      2:30:00 if you just return a value, no. There's this thing called RVO - return value optimization, for anyone wondering. I'm pretty sure the standard guarantees that returning that simply like that does not do any copy constructor / move semantics garbage. Moreover, move semantics are discouraged in such a scenario, because they would prevent the compiler from applying RVO.
      Now I'm not that proficient at that, I mean, I don't know what you meant by "the C stack thing", so please correct me if I'm wrong.
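
As a loose analogue of the `array_cast(type, array)` helper proposed earlier in this thread, Python's `memoryview.cast` reinterprets a buffer's element type without copying. The helper names below are hypothetical, and this illustrates the general idea rather than Jai's or D's actual mechanism.

```python
from array import array

def as_bytes_view(values: array) -> memoryview:
    """Reinterpret the array's storage as unsigned bytes, without copying."""
    return memoryview(values).cast("B")

def as_float_view(raw) -> memoryview:
    """Reinterpret a byte buffer as native 32-bit floats.

    The buffer length must be a multiple of the float size.
    """
    return memoryview(raw).cast("f")
```

Because both views alias the same storage, a write through the original array is visible through the byte view, which is the property a packaging "put" routine relies on when it writes typed arrays out as raw bytes.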

  • @junosoft
    @junosoft 2 years ago

    Is the source available? Would it be available?

    • @CianMcsweeney
      @CianMcsweeney 2 years ago +8

      Jon has said that the base engine of this game will ship with the compiler as an example

    • @junosoft
      @junosoft 2 years ago +3

      @@CianMcsweeney seems fine to me

  • @Jkauppa
    @Jkauppa 2 years ago +2

    you could just zip it

    • @dandymcgee
      @dandymcgee 2 years ago +8

      Have you ever looked up how .zip files work? They're quite over-complicated and it's technically impossible to find their central directory, since the files can end with arbitrary length data up to 32K (which could contain the central directory header byte sequence), as well as contain an arbitrary number of central directory records. They were designed to be an append-friendly storage format in a world full of floppy disks, and have loads of deprecated nonsense. Sure, it's a very common file format, but it's not really one logical or solid file format, and is super general purpose compared to a simple, home-grown pack file. I really like RIFF containers, which are used by various tools (especially audio file formats) and are quite straightforward.

    • @Jkauppa
      @Jkauppa 2 years ago

      @@dandymcgee if you have the code ready, then its simple to use, and you got the zip support, this is the java method, so if its impossible, then why its being used, zip it

    • @Jkauppa
      @Jkauppa 2 years ago

      @@dandymcgee you have the compression too, in a standard form, you can access your files with OSes directly, etc benefits without fuzz

    • @Jkauppa
      @Jkauppa 2 years ago

      @@dandymcgee sounds like an excuse to overcomplicate things, or maybe you are doing all things from ground up, its like you would use png but reject zip, not logical, do your own image formats too, or what Im to say, but it could be k.i.s.s. too, overall simple

    • @Jkauppa
      @Jkauppa 2 years ago

      @@dandymcgee you can but dont have to solve problems that have easy simple solutions already, some ip issues that you dont want to zip it, patents to the trash
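
For readers unfamiliar with the RIFF containers mentioned in this thread, the basic structure is small enough to sketch: a `RIFF` tag, a little-endian u32 size, a four-character form type, then a sequence of chunks that each carry a four-character ID, a u32 size, the payload, and one pad byte when the payload length is odd. The helper names and the `TEST` form type are made up for the example.

```python
import struct

def make_chunk(fourcc: bytes, payload: bytes) -> bytes:
    """One chunk: 4-byte ID, u32 size, payload, pad byte if size is odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad

def make_riff(form_type: bytes, chunks) -> bytes:
    body = form_type + b"".join(make_chunk(c, p) for c, p in chunks)
    # The size field counts everything after the first 8 bytes.
    return b"RIFF" + struct.pack("<I", len(body)) + body

def iter_chunks(blob: bytes):
    """Yield (fourcc, payload) for each top-level chunk."""
    tag, size, _form_type = struct.unpack_from("<4sI4s", blob, 0)
    if tag != b"RIFF":
        raise ValueError("not a RIFF file")
    pos, end = 12, 8 + size
    while pos + 8 <= end:
        fourcc, n = struct.unpack_from("<4sI", blob, pos)
        yield fourcc, blob[pos + 8 : pos + 8 + n]
        pos += 8 + n + (n & 1)   # skip the pad byte after odd-sized payloads
```

A reader can skip any chunk it does not recognize by its size alone, which is much of what makes the container straightforward to parse.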

  • @Bat-Georgi
    @Bat-Georgi 2 years ago

    My man uploaded in 360p.