You can increase the context size in LM Studio; it should be there in the Model Inspector, if I remember correctly.
Yes, tried that, but same issue. Thanks anyway.
I am not sure, but I think there is some kind of upper limit the LM Studio renderer can accept: a maximum number of characters that can be parsed from the textarea input element in the front end and passed to the back end of LM Studio. This limit is not related to the context size passed to the LLM engine. Unfortunately, LM Studio is a closed-source app, so it is very hard to verify such a maximum. That's why it is better to test increased-context models in a terminal-based app like Ollama.
Thanks for this video. Could you please feed it gradually longer texts using Ollama and evaluate the results? You could do it without any GUI, in the terminal, where there is no such limit (before testing, I believe it is better to create a clone of the model using the right template and parameters in the Modelfile; the template and parameters are provided on the Ollama website). I think this increase in context window size should cause problems (gibberish, hallucination, or looping) beyond a specific context length. The needle-injection test they ran on this model does not give a real evaluation of its performance. The only real benefit of such an increase is making it easier to deal with large inputs, with no need for RAG...
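For reference, a cloned Modelfile along the lines described above might look something like this (the base model name and num_ctx value here are placeholders; the real TEMPLATE and PARAMETER lines should be copied verbatim from the Ollama website or from the output of ollama show <model> --modelfile):

```
# Hypothetical Modelfile for a long-context clone.
# Base model and num_ctx are placeholder values.
FROM llama3.1:8b
PARAMETER num_ctx 131072
# Copy the original model's TEMPLATE and remaining PARAMETER
# lines here unchanged, so the chat template is preserved.
```

You would then build it with ollama create my-longctx -f Modelfile and test it with ollama run my-longctx.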
Would have to check.
Is there more than 8k input context window??
Not for this one.
The infinite output, I think, is the chat template.
yes
Can you share your PC config?
already in video
Doesn't do much of anything for use in agentic code generation. Looping behavior.
Would have to check.
I tried it out in Ollama with /set parameter num_ctx 1024000, and my example produced 70,000 words in the response 😮
But did it also remember things from the beginning and the middle?
@testales at some stage, it started repeating the same paragraphs
@@basilbrush7878 Yeah, I know this behavior. For benchmarking and testing I asked some models to recite some lengthy, boring text; for some reason the Communist Manifesto came to my mind. :D Anyway, sometimes the model hesitates, telling me it can't do that, but usually it actually can do it very precisely. But then it either stops reciting at a random point or gets stuck in an infinite loop reciting the same passages. If I give the instruction to recite only up to, say, chapter 2, it usually gets ignored, which to me is a strong indicator that the instruction is lost after just a few thousand tokens. I've yet to see a model that runs locally and has an actually working context-length "hack".
Agreed. I think the repetition is more due to the GPU card's limitation.
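A needle test like the one mentioned earlier in the thread can be sketched with a small helper that buries a known fact at a chosen depth of a long filler text; asking the model for that fact then checks recall at the start, middle, or end of the context. This is a toy example of mine, not the actual benchmark run on the model:

```python
# Build a needle-in-a-haystack prompt: insert a "needle" sentence at
# a chosen relative depth of repeated filler text, then ask a question
# that can only be answered from the needle.

def build_needle_prompt(filler_sentence, needle, total_sentences, depth):
    """depth is a fraction in [0, 1]: 0 = start, 0.5 = middle, 1 = end."""
    position = int(depth * total_sentences)
    sentences = [filler_sentence] * total_sentences
    sentences.insert(position, needle)  # bury the needle at the chosen depth
    context = " ".join(sentences)
    question = "Based only on the text above, what is the secret number?"
    return f"{context}\n\n{question}"

prompt = build_needle_prompt(
    filler_sentence="The quick brown fox jumps over the lazy dog.",
    needle="The secret number is 7421.",
    total_sentences=1000,
    depth=0.5,  # middle of the context, where recall often degrades
)
```

Feeding prompts like this at increasing total_sentences and depths to the model (e.g. pasted into ollama run) would show at which context length the needle starts getting lost.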