43:37 "cause you're assuming that the person running your program is not switching the CPU under you". The audience might be laughing, but this actually is sometimes a case, even in consumer hardware. Non-US Samsung Galaxy S9 has a heterogenous CPU, with some cores supporting the atomic increment instruction LDADDAL and others not, and with Linux kernel modified by Samsung to report that all cores support that instruction. Your program would crash after being rescheduled to another core.
Wow, you brought up a good point, I mean his assumption is more or less valid in the sense that someone isn't going to hot-swap a CPU on you like a hard drive in a RAID array. However, that changes when you start looking at current changes in CPU architecture. where you can have cores that might be architecturally different. Examples of this are Big-little architecture in Arm CPUs or Intel's p-cores and e-cores in the 12th gen x86-64 CPUs. If intel for example decided to have built that CPU with the microarchitecture of the P cores and E cores unchanged then the P cores would have supported AVX-512 and the E cores wouldn't have. To avoid that issue, both the P cores and E cores support the same instruction set features according to intel.
@@Kitejrpaladin Same instruction set is not enough, other CPU characteristics are already getting switched under us. For example, Mono had a bug in which the JIT failed to flush half the instruction caches on little-cores because they first queried the size on a big-core. And IIRC, some applications/runtimes have shown false sharing because they first queried the CPUID on an e-core, which reports a model that formerly required half the stride. At this point, they might as well drop the instruction set requirement.
You sir, confirmed a theory I've had for the past year but was unable to work out: you absolutely can parse (at least partially) using SIMD. Which opens up the door to parsing with a GPU. Absolutely fantastic!
The real trick is "shotgun parsing", if you can break the input up into chunks and partially parse each chunk and recombine the results into a final parse: it looks to me like the quote escape and category bits could be separately processed and combined, for example. It would be interesting to see if that speeds things up when it's already so optimised: for example x264 is a CPU SIMD library to encode H.264 video that beat the contemporary GPU encoding libraries, many of them written by the GPU vendors!
The real problem is that not everything can be parallelized. Certain algorithms are just impossible to parallelize beyond a certain point if ever. For instance, algorithms like SHA or AES looks like they'd benefit from parallelizing at first glance but then you realize that they just can't be parallelized because each byte depends on the previous byte. And, let's not forget that maintainable code is better than efficient code most of the time. The mathematical shenanigan pulled by this library may be fast but it'd need a 3 day class to explain to anyone looking to maintain the code.
@@trinopoty yes, absolutely, but there are parsing algorithms which, to extremely oversimplify, grab chunks of text and then determine what it is, rather than reading byte by byte, char by char, etc. SHA and AES are encryption not text parsing. ;)
@@PatrickKellyLoneCoder Yes those are not text parsing, just random examples that came to mind. You can do text parsing in chunks if you can break it down properly. But, like I said, at what cost?
Has anyone managed to figure out the XOR multiplication Instruction Set and CPU Instruction he mentioned @35:54 ? The entire lecture is too darn too good ... very relevant in this day and age. All the kids think that NodeJS or Serverless is the new hot topic. Most people forget that we are still limited by the hardware and wiring that exists.
Looks to me like arithmetic shift left (zeros are rotated in on the right, rather than taking the bit from the left), or SARL in x86, combined with simple XOR. As for using AVX/2 to do it on multiple characters at once: template __m256i _mm256_shift_left(__m256i a) { __m256i mask = _mm256_permute2x128_si256(a, a, _MM_SHUFFLE(0,0,3,0) ); return _mm256_alignr_epi8(a,mask,16-N); }
Did some googling and I’m pretty sure he says “carry-less multiplication”, which has a lot of results and involves xor. The math isn’t super clear to me though.
It is the carryless multiplication, also called vector multiplication or polynomial multiplication. You can use it via intrinsics like vmull_p64 and _mm_clmulepi64_si128. It is basically "like" a multiplication but where the addition is an xor. So 3*a would be a XOR (a
@DanielLemirePhD Hey Daniel- amazing work. Can you point out any good resources for all the bit shifting magic that people have come up with over the years? Or even general guidelines / glossaries of the techniques
Impressive bitfields wizardry 😮 Thank you. This talk is giving me inspiration to rewrite a few things without branches, and learn more about "bitwise" intrinsic operators !
Patrick Proctor Of course ! Maybe I should have written « without branches when they are not really necessary ». I used to think that branches were always « harmless » but now I understand potential mispredictions.
@@yohan9577 There's a lot of information about this in the gaming space. I don't remember what it's called but there's a presentation from Sony about optimizing games, and they found quite a few cases where outright calculating large things and not needing them was better than a branch misprediction. It's weird. But yes, always always always benchmark and profile. Branching can speed up your code when correctly predicted. Inlining can actually hurt performance in certain cases. And so on.
@Patrick Kelly - It's not weird, it's a matter of cache, values dependencies and many very concrete things. I love theory, but some "expert" freshly out of their PhD should just keep a low profile instead of reducing everything to some abstract or "standard" stuff, over-abusing the "don't reinvent the wheel" to the point we'd still be with stone wheels if there weren't some "pragmatic" guy who actually look how things work and what it imply. Yes benchmark, but if you're satisfied with your benchmark believing it's the fastest thing in the world, you might be wrong.... So benchmark and also ask the old fellas who did impressive things when there were less than ONE MEGAbyte of RAM, less than 66 MEGAHz of clock, 32 bits (both latter when you were lucky or not so old). Don't reject them as "obsolete old C programmers".
If anyone else is confused by slide 40, 41; then there is an error where the order should be brackets -> 1, comma ->2, colon ->4, most whitespace -> 8, whitespace -> 16, other -> 0. Then the math checks out. Had me scratching my head for a long time...
If you have learned to program assembly language, all of these optimizations are second nature. Everyone should *learn* an assembly language and spend time hand optimizing their code. Learning assembly will make you a better programmer and influence your coding strategies in time-critical areas. That being said, optimizations often cause readability to suffer, so be wise about how you use coding strategies.
I was actually about to write the same thing. If you're in a situation where your critical path is when parsing JSON sent over HTTP requests, you're much better off just using a different protocol. Although for parsing large JSON files, the performance increase of this library can certainly be a big deal. I reckon the speaker also knows this, which is why they benchmarked it using large json files, rather than many small snippet.
JSON uses a textual representation of numbers (and the standard allows for numbers that are unrepresentable by double and float - though most parsers don’t parse all valid JSON, Haskell’s most common parser is a notable exception).
"Our backend developers spend half of their time serializing and deserializing json." From what i've been working around, JSON serializing and deserializing happens and only happens in the entrance or the exit of a request process. During the request process the "JSON" will be something like map/dict or Object. And in a typical backend server, the time saved by JSON serializing compares to DB calls will be almost negligible. But the magical branch-less parsing is so amazing and beautiful.
The typo was reported to me. I suspect it is a copy-paste error (the last character got cut off). This being said, it does remind us that not all JSON is well-formed.
You might be faster by not using the jvm which needs todo emulations of the real platform e.g use posix write/read instead of java‘s sh… and move to a language that was actually designed to be easier analyzed and optimized.
@@ccreutzig I think it is fine to run it with Java. Just use native methods. That's how java works fine in many different platforms. Every time very high performance is needed, native methods can be a good solution. This one here is easily one of them.
@@brunoais Sure, if you have a reason to use Java, go ahead. Many of us don't, and that is fine, too. That wasn't my point - all I was saying is that the video doesn't show Java.
I've used protocol buffers. They are not better in virtually every way: they are better in several ways, and worse in others: just like Avro is better in some ways and worse in others than both of those, and so on for most other formats. There's no simple "best format", and it's foolish to simplify like that, when the available options are so rich.
@Peter Mortensen something that is optimised for the machine to work with, anything but ASCII for starters, probably something that was invented a couple of decades ago
Why not? JSON has many perfectly valid use cases to pass data on between two parties. It's well standardised and has libraries in all kinds of environments. If you need to invent something on your own, you'll certainly introduce bugs every time, and the performance will be worse than JSON too. It will only be better if you can invest a massive amount of time in it. It is an argument against microservices though. In a monolithic program, you can easily communicate over memory structures, but the microservices often use JSON communication which cause slowdowns.
@@sanderd17 Umm, what? You're right about JSON being very standard with a lot of users, but it is not hard to write your own format. Maybe it's harder to write your own format in JavaScript, or other scripting languages, I haven't done that. But I have done it in C++, and it's super easy. Binary formats are even easier to write than text formats. Serialization/deserialization is really easy to test. This presentation has some cool stuff, but it's all totally unnecessary work if you just make a format that's something like a version number, checksum, and dumping a struct to disk. If it's an internally used format, you don't need to do any parsing, and the fastest hash functions can validate the data faster than your memory, much less your disk. If it's an externally used format, it needs a tad more thought given to flexibility, ease of validation, and versioning. But it's just not a hard problem. So as a game developer, JSON just seems like a mediocre data exchange format that involves a bunch of pointless CPU work, and a bunch of pointless optimization effort if you need it. But if performance never becomes an issue, then you're right that using JSON in any managed language takes basically zero development effort, and with so many people using it, it starts to make sense to use it externally from a pure business perspective. Sigh.
@@ChrisHinton42ah, there you say it, you're a game developer. Games typically have very little communication with third party applications. If all code is controlled in-house, it's indeed easy to make a good binary format. And in games, cpu time is a very limited resource, so you are allowed to spend a lot of time on optimizations. I work on business applications: communication between all different parties that require all different data. If the other party supports JSON, I know it's going to be a lot more stable and easier to develop. If they require their own text or binary format, there are always bugs in the protocol. Performance is also a nice-to-have, but not an absolute must. It's nice if a user has a snappy interface, but there's more benefit for the company in automating additional tasks.
@@ChrisHinton42 The original use case for JSON, and probably still the main one, is for browser-server communication on the internet. In those cases you are most of the time limited by network bandwidth and latency. It's very useful there, because you can parse responses from random servers without having to know the format of the message exactly beforehand. It's perfect for quick and relatively small communications over a network, and being human-readable makes it very easy to debug if something goes wrong. If you're going to pass larger amounts of data, especially if the structure is known in advance, something like Protobuf is already going to be many times faster while probably even being easier to implement (as you get all the type checking for free). It's not as useful for web APIs, but for regular applications it's just great. It still isn't the fastest, but odds are you will not be bottlenecked by disk IO... Different formats for different use cases. A Ferrari is faster than a truck, but sometimes you need to move a container. A truck is your best friend in that case.
@@Niosus Parsing JSON and passing it along without doing anything with what's inside sounds like a waste of your limiting resource: bandwidth and latency. Not to mention there's nothing about JSON that makes that better than any other format. And if you're just validating it, isn't it the case that a valid JSON message doesn't necessarily make a valid API message? I still think that the only thing JSON has going for it is that a lot of people use it. And therefore there are many libraries. Text formats are good for debugging, but its super easy to write a binary to text viewer for debug builds and viewing logs. JSON is general purpose. General purpose tools are by definition not tuned to a specific purpose. Expediency or expense is the only reason you would use one. And I guess my point is that making the specialized version is actually pretty easy. And if backends are really spending half their time serializing and deserializing JSON, then it would be way cheaper to use a custom solution. (of course, depending on traffic, any time spent with JSON could be really expensive in absolute terms) But maybe I am being naive. Why is Protobuf not useful for web APIs? I don't know much about it, but it looks approximately like what I would propose as a middle ground between general purpose and specialized.
43:37 "cause you're assuming that the person running your program is not switching the CPU under you". The audience might be laughing, but this actually is sometimes a case, even in consumer hardware. Non-US Samsung Galaxy S9 has a heterogenous CPU, with some cores supporting the atomic increment instruction LDADDAL and others not, and with Linux kernel modified by Samsung to report that all cores support that instruction. Your program would crash after being rescheduled to another core.
What about the non-US S7 and S10? Got any more info maybe?
Wow, you brought up a good point. I mean, his assumption is more or less valid in the sense that someone isn't going to hot-swap a CPU on you like a hard drive in a RAID array. However, that changes when you start looking at recent trends in CPU architecture, where you can have cores that are architecturally different. Examples of this are the big.LITTLE architecture in Arm CPUs, or Intel's P-cores and E-cores in 12th-gen x86-64 CPUs. If Intel, for example, had built that CPU with the microarchitecture of the P-cores and E-cores unchanged, then the P-cores would have supported AVX-512 and the E-cores wouldn't have. To avoid that issue, both the P-cores and E-cores support the same instruction set features, according to Intel.
@@Kitejrpaladin The same instruction set is not enough; other CPU characteristics are already getting switched under us. For example, Mono had a bug in which the JIT failed to flush half the instruction cache on little cores because it first queried the cache-line size on a big core. And IIRC, some applications/runtimes have shown false sharing because they first queried the CPUID on an E-core, which reports a model that formerly required half the stride. At this point, they might as well drop the instruction set requirement.
Good lecture for programmers interested in thinking pragmatically rather than just abstractly.
I love this! I never knew about carry-less multiplication. I also didn’t realize that disk speeds and CPU throughput were so similar nowadays
You, sir, confirmed a theory I've had for the past year but was unable to work out: you absolutely can parse (at least partially) using SIMD. Which opens up the door to parsing with a GPU. Absolutely fantastic!
The real trick is "shotgun parsing": if you can break the input up into chunks, partially parse each chunk, and recombine the results into a final parse. It looks to me like the quote-escape and category bits could be processed separately and combined, for example. It would be interesting to see if that speeds things up when it's already so optimised: for example, x264 is a CPU SIMD library for encoding H.264 video that beat the contemporary GPU encoding libraries, many of them written by the GPU vendors!
The real problem is that not everything can be parallelized. Certain algorithms are just impossible to parallelize beyond a certain point, if at all. For instance, algorithms like SHA or AES look like they'd benefit from parallelizing at first glance, but then you realize that they just can't be parallelized because each byte depends on the previous byte. And let's not forget that maintainable code is better than efficient code most of the time. The mathematical shenanigans pulled by this library may be fast, but they'd need a 3-day class to explain to anyone looking to maintain the code.
@@trinopoty yes, absolutely, but there are parsing algorithms which, to extremely oversimplify, grab chunks of text and then determine what it is, rather than reading byte by byte, char by char, etc.
SHA and AES are encryption not text parsing. ;)
@@PatrickKellyLoneCoder Yes those are not text parsing, just random examples that came to mind. You can do text parsing in chunks if you can break it down properly. But, like I said, at what cost?
@@trinopoty If I'm not mistaken, both AES and SHA are block oriented precisely such that they can use SIMD to encode multiple bytes at once.
Has anyone managed to figure out the XOR multiplication instruction set and CPU instruction he mentioned @35:54? The entire lecture is just too darn good ... very relevant in this day and age. All the kids think that NodeJS or Serverless is the new hot topic. Most people forget that we are still limited by the hardware and wiring that exists.
Looks to me like an arithmetic shift left (zeros are shifted in on the right, rather than bits rotating around from the left), i.e. SAL/SHL in x86, combined with a simple XOR.
As for using AVX2 to do it on multiple characters at once:
template <int N> __m256i _mm256_shift_left(__m256i a)
{
    // move a's low 128-bit lane into the high lane, zeroing the low lane
    __m256i mask = _mm256_permute2x128_si256(a, a, _MM_SHUFFLE(0, 0, 3, 0));
    // per-lane byte alignment against that mask yields a whole-register left shift by N bytes
    return _mm256_alignr_epi8(a, mask, 16 - N);
}
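If I read it right, _mm256_shift_left<1>(v) shifts the whole 256-bit register left by one byte with zeros coming in at the bottom (the usual permute2x128 + alignr idiom, valid for N from 1 to 16). Untested on my end, so benchmark before trusting it.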
Did some googling and I’m pretty sure he says “carry-less multiplication”, which has a lot of results and involves xor. The math isn’t super clear to me though.
It is carry-less multiplication, also called vector multiplication or polynomial multiplication. You can use it via intrinsics like vmull_p64 and _mm_clmulepi64_si128. It is basically "like" a multiplication, but where the addition is an XOR. So 3*a would be a XOR (a << 1).
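To make that concrete, here is a small illustrative sketch (my own, not taken from the talk's code), assuming an x86-64 CPU with the PCLMULQDQ extension and compiling with -mpclmul: carry-less multiplying a 64-bit mask of quote positions by an all-ones operand yields a running prefix XOR, i.e. a mask that flips on and off at every quote, marking the bytes inside strings.

#include <cstdint>
#include <wmmintrin.h>   // _mm_clmulepi64_si128 (PCLMULQDQ)
#include <emmintrin.h>   // SSE2: _mm_set_epi64x, _mm_set1_epi8, _mm_cvtsi128_si64

// Output bit i is the XOR of input bits 0..i.
inline std::uint64_t prefix_xor(std::uint64_t quote_bits) {
    __m128i input = _mm_set_epi64x(0, static_cast<long long>(quote_bits));
    __m128i ones  = _mm_set1_epi8(static_cast<char>(0xFF));
    __m128i prod  = _mm_clmulepi64_si128(input, ones, 0);  // carry-less multiply of the low halves
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(prod));
}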
@@DanielLemirePhD Oh that is some wacky stuff. Thank you for the information.
@DanielLemirePhD Hey Daniel- amazing work.
Can you point out any good resources for all the bit shifting magic that people have come up with over the years? Or even general guidelines / glossaries of the techniques
Good old bitwise tricks. Reminds me how I coded 30 years ago.
Impressive bitfields wizardry 😮 Thank you. This talk is giving me inspiration to rewrite a few things without branches, and learn more about "bitwise" intrinsic operators !
Depending on the context (and wizardry of the hardware) branching might still be faster. Make sure you benchmark before & after :)
Patrick Proctor Of course ! Maybe I should have written « without branches when they are not really necessary ». I used to think that branches were always « harmless » but now I understand potential mispredictions.
@@yohan9577 It depends on the program. If it's "small" and doesn't have many, you will probably never miss.
@@yohan9577 There's a lot of information about this in the gaming space. I don't remember what it's called but there's a presentation from Sony about optimizing games, and they found quite a few cases where outright calculating large things and not needing them was better than a branch misprediction. It's weird.
But yes, always always always benchmark and profile. Branching can speed up your code when correctly predicted. Inlining can actually hurt performance in certain cases. And so on.
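To make the "just compute it and skip the jump" idea concrete, here is a tiny branchless select sketch in C++ (illustrative only, not from the talk; a compiler may well emit a cmov for the plain ternary anyway, so measure both):

#include <cstdint>

// Returns a when cond is true, b otherwise, without a conditional branch:
// mask is all ones or all zeros, so the result is picked by bitwise arithmetic.
inline std::uint64_t select_u64(bool cond, std::uint64_t a, std::uint64_t b) {
    std::uint64_t mask = 0 - static_cast<std::uint64_t>(cond);
    return (a & mask) | (b & ~mask);
}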
@Patrick Kelly - It's not weird, it's a matter of cache, value dependencies and many very concrete things.
I love theory, but some "experts" freshly out of their PhD should just keep a low profile instead of reducing everything to some abstract or "standard" stuff, over-abusing "don't reinvent the wheel" to the point where we'd still be on stone wheels if there weren't some "pragmatic" guy who actually looked at how things work and what they imply.
Yes, benchmark, but if you're satisfied with your benchmark, believing it's the fastest thing in the world, you might be wrong... So benchmark, and also ask the old fellas who did impressive things with less than ONE MEGAbyte of RAM, less than 66 MEGAHertz of clock, and 32 bits (both of the latter if you were lucky, or not so old). Don't reject them as "obsolete old C programmers".
Great example, thank you. DeltaJSON is really useful if you are working with JSON, it does compare, merge and graft.
If anyone else is confused by slides 40 and 41: there is an error, and the order should be brackets -> 1, comma -> 2, colon -> 4, most whitespace -> 8, whitespace -> 16, other -> 0. Then the math checks out. Had me scratching my head for a long time...
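For anyone who wants the idea spelled out, here is a plain scalar sketch using the corrected values above (my own illustration: which whitespace character gets 8 versus 16 is a guess on my part, and the talk does this 32 or 64 bytes at a time with a vpshufb nibble lookup rather than a per-byte switch):

#include <cstdint>
#include <cstddef>

// Map each byte to its category value from slides 40/41.
inline std::uint8_t category(std::uint8_t c) {
    switch (c) {
        case '[': case ']': case '{': case '}': return 1;  // brackets
        case ',':  return 2;                                // comma
        case ':':  return 4;                                // colon
        case '\t': case '\n': case '\r': return 8;          // most whitespace
        case ' ':  return 16;                               // whitespace
        default:   return 0;                                // other
    }
}

// Turn a 64-byte block into two bitmasks, one bit per byte.
inline void classify_block(const std::uint8_t block[64],
                           std::uint64_t& structural, std::uint64_t& whitespace) {
    structural = 0;
    whitespace = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        std::uint8_t cat = category(block[i]);
        if (cat & (1 | 2 | 4)) structural |= std::uint64_t(1) << i;
        if (cat & (8 | 16))    whitespace |= std::uint64_t(1) << i;
    }
}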
Interesting lecture with good mix of high-level strategy and low-level (pseudo) implementation details.
This is an amazing presentation. Thank you!
If you have learned to program assembly language, all of these optimizations are second nature. Everyone should *learn* an assembly language and spend time hand optimizing their code. Learning assembly will make you a better programmer and influence your coding strategies in time-critical areas. That being said, optimizations often cause readability to suffer, so be wise about how you use coding strategies.
GTA5 uses the inverse of this for its json parsing, it takes 6 minutes to parse a 10 megabyte json file for some ungodly reason
Aren't you supposed to use classes like LeftSquareBracketMatcherStrategyFactory?
and LeftSquareBracketMatcherStrategyStringReturnerFactory?
Long live Kevlin Henney xD
Very impressive, and nice speech. Thanks.
Very nice talk.
It's a really excellent approach. But I have no idea how to check the cycles... Is there any tool?
perf
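For a quick start, something like: perf stat -e cycles,instructions,branch-misses ./your_parser file.json (program and file names being whatever you want to measure) prints the hardware counters for the whole run, and perf record followed by perf report breaks the time down per function.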
Now I wish I had a sticker!
I think if you need that level of performance, just ditch JSON in favor of something binary based like Protobuf.
I was actually about to write the same thing. If you're in a situation where your critical path is parsing JSON sent over HTTP requests, you're much better off just using a different protocol. Although for parsing large JSON files, the performance increase of this library can certainly be a big deal. I reckon the speaker also knows this, which is why they benchmarked it using large JSON files rather than many small snippets.
If you want fast float serialization you should probably dump the in-memory representation, no?
JSON uses a textual representation of numbers (and the standard allows for numbers that are unrepresentable by double and float - though most parsers don’t parse all valid JSON, Haskell’s most common parser is a notable exception).
"Our backend developers spend half of their time serializing and deserializing json."
From what I've been working on, JSON serializing and deserializing happens only at the entrance or exit of a request process. During the request process, the "JSON" will be something like a map/dict or Object. And in a typical backend server, the time saved on JSON serialization compared to DB calls will be almost negligible. But the magical branch-less parsing is so amazing and beautiful.
5:15 I'm sure Douglas Crockford would at least write valid JSON, i.e. not miss a closing curly brace...
The typo was reported to me. I suspect it is a copy-paste error (the last character got cut off). This being said, it does remind us that not all JSON is well-formed.
Now I wonder what magic a binary format could do with SIMD bit tricks. I know Protocol Buffers and some others like BJSON are not doing this.
👌🏼
I have started learning JSON this year, can somebody tell me what's going on in the video?
So so sorry It was spoken or written
@@pictureus What Systems Planet said was not "elitist" in the slightest; he gave him the exact answer in the simplest terms possible.
You might be faster by not using the JVM, which needs to do emulation of the real platform, e.g. use posix write/read instead of Java's sh…, and move to a language that was actually designed to be easier to analyze and optimize.
I believe that is part of what he is doing. None of his optimised code is going to work in Java.
@@ccreutzig I think it is fine to run it with Java. Just use native methods. That's how Java works fine on many different platforms. Whenever very high performance is needed, native methods can be a good solution. This one here is easily one of them.
I think it is fine having it running in the JVM but as a native method.
@@brunoais Sure, if you have a reason to use Java, go ahead. Many of us don't, and that is fine, too. That wasn't my point - all I was saying is that the video doesn't show Java.
@@ccreutzig Oh. I misunderstood your comment.
I've coded in Java, C and many others... Which is fine too :)
Zen2 also has AVX512
It does not.
@@DanielLemirePhD yes my bad, thanks
Zen 5 does
Nice content, but somewhat painful to listen to.
yeah i love it but i can't stop myself from hearing him say "jism" every time he says "json"
Or... stop using JSON and start using Protocol Buffers. Better in virtually every way.
I've used protocol buffers. They are not better in virtually every way: they are better in several ways and worse in others, just as Avro is better in some ways and worse in others than both of them, and so on for most other formats. There's no simple "best format", and it's foolish to simplify like that when the available options are so rich.
Getting rid of JSON might be a good idea then.
@Peter Mortensen something that is optimised for the machine to work with, anything but ASCII for starters, probably something that was invented a couple of decades ago
And the moral of the story is: don't use JSON
Why not? JSON has many perfectly valid use cases for passing data between two parties. It's well standardised and has libraries in all kinds of environments. If you need to invent something on your own, you'll certainly introduce bugs every time, and the performance will be worse than JSON too. It will only be better if you can invest a massive amount of time in it. It is an argument against microservices though. In a monolithic program, you can easily communicate over memory structures, but microservices often use JSON communication, which causes slowdowns.
@@sanderd17 Umm, what? You're right about JSON being very standard with a lot of users, but it is not hard to write your own format. Maybe it's harder to write your own format in JavaScript, or other scripting languages, I haven't done that. But I have done it in C++, and it's super easy.
Binary formats are even easier to write than text formats. Serialization/deserialization is really easy to test. This presentation has some cool stuff, but it's all totally unnecessary work if you just make a format that's something like a version number, a checksum, and dumping a struct to disk (a rough sketch of what I mean follows below). If it's an internally used format, you don't need to do any parsing, and the fastest hash functions can validate the data faster than your memory, much less your disk. If it's an externally used format, it needs a tad more thought given to flexibility, ease of validation, and versioning. But it's just not a hard problem.
So as a game developer, JSON just seems like a mediocre data exchange format that involves a bunch of pointless CPU work, and a bunch of pointless optimization effort if you need it. But if performance never becomes an issue, then you're right that using JSON in any managed language takes basically zero development effort, and with so many people using it, it starts to make sense to use it externally from a pure business perspective. Sigh.
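As promised above, a very rough sketch of the version-plus-checksum-plus-struct-dump idea (the names and payload are made up, the checksum is a simple placeholder, and a real format would also worry about endianness and struct padding):

#include <cstdint>
#include <cstddef>
#include <cstdio>

struct SaveData {                 // plain-old-data payload, just for illustration
    std::uint32_t level;
    float position[3];
};

struct FileHeader {
    std::uint32_t version;
    std::uint32_t checksum;
};

// Placeholder checksum (FNV-1a); a real format would use CRC32 or xxHash.
static std::uint32_t checksum32(const void* p, std::size_t n) {
    const unsigned char* b = static_cast<const unsigned char*>(p);
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < n; ++i) { h ^= b[i]; h *= 16777619u; }
    return h;
}

static bool save(const char* path, const SaveData& d) {
    FileHeader h{1u, checksum32(&d, sizeof d)};
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(&h, sizeof h, 1, f) == 1 && std::fwrite(&d, sizeof d, 1, f) == 1;
    std::fclose(f);
    return ok;
}

static bool load(const char* path, SaveData& d) {
    FileHeader h{};
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(&h, sizeof h, 1, f) == 1 && h.version == 1u &&
              std::fread(&d, sizeof d, 1, f) == 1 && h.checksum == checksum32(&d, sizeof d);
    std::fclose(f);
    return ok;
}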
@@ChrisHinton42 Ah, there you say it: you're a game developer. Games typically have very little communication with third-party applications. If all code is controlled in-house, it's indeed easy to make a good binary format. And in games, CPU time is a very limited resource, so you are allowed to spend a lot of time on optimizations.
I work on business applications: communication between all different parties that require all different data. If the other party supports JSON, I know it's going to be a lot more stable and easier to develop. If they require their own text or binary format, there are always bugs in the protocol.
Performance is also a nice-to-have, but not an absolute must. It's nice if a user has a snappy interface, but there's more benefit for the company in automating additional tasks.
@@ChrisHinton42 The original use case for JSON, and probably still the main one, is for browser-server communication on the internet. In those cases you are most of the time limited by network bandwidth and latency. It's very useful there, because you can parse responses from random servers without having to know the format of the message exactly beforehand. It's perfect for quick and relatively small communications over a network, and being human-readable makes it very easy to debug if something goes wrong.
If you're going to pass larger amounts of data, especially if the structure is known in advance, something like Protobuf is already going to be many times faster while probably even being easier to implement (as you get all the type checking for free). It's not as useful for web APIs, but for regular applications it's just great. It still isn't the fastest, but odds are you will not be bottlenecked by disk IO...
Different formats for different use cases. A Ferrari is faster than a truck, but sometimes you need to move a container. A truck is your best friend in that case.
@@Niosus Parsing JSON and passing it along without doing anything with what's inside sounds like a waste of your limiting resource: bandwidth and latency. Not to mention there's nothing about JSON that makes that better than any other format. And if you're just validating it, isn't it the case that a valid JSON message doesn't necessarily make a valid API message?
I still think that the only thing JSON has going for it is that a lot of people use it, and therefore there are many libraries. Text formats are good for debugging, but it's super easy to write a binary-to-text viewer for debug builds and viewing logs.
JSON is general purpose. General purpose tools are by definition not tuned to a specific purpose. Expediency or expense is the only reason you would use one. And I guess my point is that making the specialized version is actually pretty easy. And if backends are really spending half their time serializing and deserializing JSON, then it would be way cheaper to use a custom solution. (of course, depending on traffic, any time spent with JSON could be really expensive in absolute terms) But maybe I am being naive.
Why is Protobuf not useful for web APIs? I don't know much about it, but it looks approximately like what I would propose as a middle ground between general purpose and specialized.