THIS IS GOLD.
I've been curious about this topic. I really appreciate how you approached the evaluation. I would have liked to see an n of 5 for each example to limit errors related to model entropy.
Just love your whole approach to AI and coding in general
Great comparison.
Something to consider is to break down the scores by model. Why?
To see if there are preferences of format by model.
E.g. we know that Anthropic likes XML and that format might be the best for their models. That does not mean that this holds true for other models.
True
I started with Markdown, but after looking over the Anthropic workbench I switched to XML. Haven't looked back.
Shouldn't it be possible to layer a deterministic MD-to-XML converter into your prompting process? Then you, as a human, could still work in MD while your LLMs get the XML they crave.
Absolutely possible, but not as easy as you'd think at first blush. For example, the XML tags you choose have information in them, telling the model "what the thing is" that you're wrapping in the tag, whereas in markdown all you really have is "sections" and various types of divisions. I can say this as an experienced programmer who tried to create a Markdown-based parser for exactly this purpose. It's *way* harder to cleanly interpret semantic divisions when all you have to work with is stuff like blank lines.
@BTFranklin I don't think XML *has* to have more information, and for this particular test I assume it doesn't. If the XML prompts he's using do indeed provide more information than the markdown ones do, doesn't that invalidate these results as a measure of format (and only format) effectiveness?
Use ai to convert it😊
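For what it's worth, here is a minimal sketch of the deterministic converter being discussed, assuming the simple case where `#` headings mark the sections. As noted above, plain Markdown only gives you generic divisions, so the best a pass like this can do is a generic `<section name="...">` wrapper rather than a semantically meaningful tag; all names below are illustrative.

```python
import re

def md_to_xml(md: str) -> str:
    """Turn '# heading' sections into <section name="..."> blocks.
    Anything before the first heading is dropped in this sketch."""
    sections, name, body = [], None, []
    for line in md.splitlines() + ["# _end_"]:  # sentinel flushes the last section
        m = re.match(r"#+\s+(.*)", line)
        if m:
            if name is not None:
                sections.append(
                    f'<section name="{name}">\n' + "\n".join(body).strip() + "\n</section>"
                )
            name, body = m.group(1).strip(), []
        else:
            body.append(line)
    return "\n\n".join(sections)

print(md_to_xml("# Task\nSummarize the report.\n\n# Constraints\nUnder 100 words."))
```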
Amazing video! One could argue that there's no real difference between XML and RAW formats, but the power of XML is having a bunch of well pre-structured prompts where you only have to fill in certain areas. Writing a good pre-formatted raw prompt can be more annoying, while with XML you can just add a few more tags here and there and refine the desired output as much as needed in a rather simpler way. Even Perplexity works well with XML, and it's easier to restrict the kinds of outputs or searches with it.
Thanks for all your hard work! You do such a great job brother. Appreciate you very much.
This is an excellent, detailed analysis. Highly appreciated, sir. Subbed.
Incredible value, please more of this type of content
One of the best videos I have seen regarding all things LLMs. Do you think the results from 4o-mini replicate with 4o, 4-turbo and gpt4?
Always great insights, need to give promptfoo a shot!
There are a couple of things that you missed. To make this video actually useful, you need to experiment more.
- 1. You missed YAML; it's a dark horse and I've had stellar results with it.
- 2. Use something harder, like tool calling.
- 3. Try instructions that are system-prompt heavy.
- 4. Try prompts that put the instructions as the very last thing the model sees.
- Use the seed param.
- Use an automation that changes the temperature by 0.1 for each call (see the sketch after this comment).
I have to say I'm a bit disappointed with the video. I mean, I kind of get it, but I want to see these models tested on the bleeding edge of what they can do; I want to see you dialling in that last couple of percent of performance. They're so much more powerful than the examples in the video.
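As a rough illustration of the last two suggestions above (the model name and prompt are placeholders, not from the video), a loop that pins the seed and steps the temperature by 0.1 per call with the OpenAI Python client might look like this:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

for step in range(11):                 # temperature 0.0 .. 1.0 in 0.1 increments
    temperature = round(step * 0.1, 1)
    response = client.chat.completions.create(
        model="gpt-4o-mini",           # placeholder model
        seed=42,                       # best-effort reproducibility across calls
        temperature=temperature,
        messages=[{"role": "user", "content": "Answer with one word, yes or no: is 7 prime?"}],
    )
    print(temperature, response.choices[0].message.content)
```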
Your videos are always a real help, great work.
What a great video and an unexpected outcome. I've been using MD but am swapping to XML for complex persona instructions. Great video!
Fascinating. I've been using raw with small JSON elements where structure was needed in AutoGen-based flows. Works really well. JSON does get brittle when there's too much of it, though. I'm not shocked that the whole prompt in JSON wasn't great.
That being said, definitely going to try some xml.
Great setup. Please evaluate Gemini Flash. The capabilities of these low-cost workhorse models are the most important edge cases to understand.
Great tests. Which open 8B or 9B model is the best with long context? In my tests Gemma 2 q4_k_m performs quite well.
Great content! Would have been nice to also compare YAML.
YAML is nice for toying around but is an awful format once you start using it seriously. Do a Google search for "yaml sucks" and you'll see. I regret having adopted it in some projects.
I'm with you! I started with YAML and then moved to some mix of that and TOML/XML. It would be fun to have a central leaderboard for prompt format performance, tracked on different metrics like the ones here!
This is what I’ve been looking to test myself. I suspected Markdown wasn’t performing well. I asked llama3.1 what it prefers, and it gave me XML.
Good use case for Markdown-to-XML converters, so we can conveniently write the prompt in Markdown and then send it as XML to the LLM.
Great content!! I was breaking my head over how to structure instructions, especially for meta prompting. First I was thinking about JSON because of its unlimited nesting nature, then I realized that XML might be better because of the problem with closing brackets... and then I realized the reason why XML is the best format is because LLMs are trained on websites - tudum tudum tudum tada - XML-formatted content :D
I kind of realized all those things on my own and was thinking, why is nobody talking about this? And then 2 days later - boom - this video :D
Thanks for the references - I'll study what others came up with, since I kinda reinvented the wheel on my own :D
Thx
Do you have this code on GitHub? Would love to play around with it myself.
XML is what I've been using since day 1. 😊
Impressive!
Is this also true for RAG documents?
I read in at least one place that fine-tuning is best, or even requires JSONL.
My use cases are RAG, maybe eventually JSONL, but it seems formatting RAG docs right is even more important than fine-tuning.
Dan, is there a way to get access to the files you used in this video? I don't have coding knowledge and am learning about prompt scripting. From the video and the files you ran, it comes across as if you have a methodology for writing your scripts that could help me develop my own scripts following your examples.
Have you thought about mixing XML tags into your Markdown prompt, like Claude Sonnet does in the prompt generator?
On top of that, it could be interesting to provide an XSD (XML Schema Definition) so that the response format is fully predictable.
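A small sketch of that idea, assuming lxml is available; the schema and tag names here are made up for illustration, not taken from the video:

```python
from lxml import etree

XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="answer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="summary" type="xs:string"/>
        <xs:element name="confidence" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

def response_is_valid(xml_text: str) -> bool:
    """True if the model's XML reply parses and conforms to the schema."""
    try:
        doc = etree.fromstring(xml_text.encode())
    except etree.XMLSyntaxError:
        return False
    return schema.validate(doc)

print(response_is_valid("<answer><summary>ok</summary><confidence>0.9</confidence></answer>"))
```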
Please, a basic video on LLMs: how to deploy them, expected uses of local LLMs... I think it would be interesting for creating a small company run by them.
Do tab indentations and newlines really matter when using XML tags? 🤔
Would it make sense to put your RAG files in XML format as well?
In my testing of Llama 3.1 8B for instruction following, I find it severely lacking compared with Codestral. Llama 3.1 8B was unable to return a simple yes or no response. It always included a fluffy explanatory response (which was correct but not requested). YMMV.
Amazing content as always thank you.
Really useful video!
Do you share the results in any other format?
Markdown expresses only a subset of the structure XML can, which is why it performs worse than XML. Think about it: Markdown will give the LLM a clue as to how the information is structured, but it doesn't include as much metadata as XML. A shopping list of ingredients in Markdown would look like an unordered list of list items, but in XML it could be represented as a shopping-list of ingredient-items. I didn't know XML would perform this well, but after having watched your video, I'll be switching. Great stuff.
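To make that shopping-list example concrete (the tag names are just illustrative), the same content in the two formats might look like this:

```python
md_prompt = """\
## Shopping list
- eggs
- flour
"""

xml_prompt = """\
<shopping-list>
  <ingredient-item>eggs</ingredient-item>
  <ingredient-item>flour</ingredient-item>
</shopping-list>
"""
```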
When JSON is the worst-performing format. Feels bad, man. I will keep this in mind... I never would have guessed that it handles XML so well, but then again most of the data is raw text and HTML, which looks like XML because of the tags, so I see why LLMs would be good at understanding and generating with it.
HOW HARD IS IT TO COPY PASTE A PROMPT INTO THE VIDEO DESCRIPTION :(
My approach is XML for the title tags, and inside I write in Markdown.
It works and it's still really human-readable.
Full XML is not the best to read.
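A tiny sketch of that hybrid layout (the contents are placeholders): XML tags as the outer delimiters, Markdown inside them for readability.

```python
prompt = """\
<instructions>
Summarize the report below.

- Keep it under 100 words.
- Use bullet points.
</instructions>

<report>
...paste the report text here...
</report>
"""
```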
best prompt format is l337 sp33ch
Markdown and XML, hands down, for reports. Markdown converted to vectors.