Testing 1 Million Context Length of Llama 3 8B Locally

  • Published Oct 26, 2024

Comments • 21

  • @antonvinny · 5 months ago · +1

    You can increase the context size in LM Studio; it should be in the Model Inspector, if I remember correctly.

    • @fahdmirza · 5 months ago

      Yes, tried that, but same issue. Thanks anyway.

    • @HassanAllaham · 5 months ago

      I am not sure, but I think there is some kind of upper limit on what the LM Studio renderer can accept: a maximum number of characters that can be parsed from the textarea input element in the front end and passed to the back end. That limit is separate from the context size set for the LLM engine. Unfortunately, LM Studio is a closed-source app, so it is very hard to verify such a maximum. That's why it is better to test increased-context models in a terminal-based app like Ollama (see the sketch after this thread).
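
A minimal sketch of such a terminal-based test in Python, assuming Ollama is running on its default port (11434) and that the 1M-context Llama 3 8B build has been pulled under the tag llama3-gradient (both assumptions; substitute your own model tag and input file):

```python
import json
import urllib.request

# Any long input works; big_input.txt is a placeholder file name.
prompt = "Summarize the following text:\n" + open("big_input.txt").read()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3-gradient",      # assumed tag for the 1M-context build
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": 256000},  # context window, in tokens
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Because the prompt goes straight to Ollama's local HTTP API, no front-end input field is involved at all.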

  • @HassanAllaham · 5 months ago

    Thanks for this video. Could you please feed it gradually increasing lengths of text using Ollama and evaluate the results? You could do it without any GUI, straight from the terminal, where there is no input limit. Before testing, I believe it is better to create a clone of the model using the right template and parameters in the Modelfile; the template and parameters are provided on the Ollama website (see the Modelfile sketch after this thread). I suspect this increase in context window size will cause real problems (gibberish, hallucination, or looping) that only show up beyond a certain context length. The needle-injection test they ran on this model does not give a real evaluation of its performance. The only real benefit of such an increase is making large inputs much easier to work with, since there should be no need for RAG.

    • @fahdmirza · 5 months ago · +1

      Would have to check.
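
For reference, a minimal sketch of the Modelfile clone described above, under the same assumptions (llama3-gradient as the base tag). The TEMPLATE below is a simplified rendering of Llama 3's chat format; copy the exact template and parameters from the model's page on the Ollama website, as the comment suggests. A wrong or missing template or stop token is also a common cause of the never-ending output mentioned further down this thread.

```
# Modelfile (sketch) -- adjust the base tag and copy the exact
# template/parameters from the model's page on the Ollama website.
FROM llama3-gradient:8b

# Extended context window, in tokens; memory use grows with this value.
PARAMETER num_ctx 256000

# Llama 3's end-of-turn token; without it, generation may never stop.
PARAMETER stop "<|eot_id|>"

TEMPLATE """<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
```

Build and run the clone with ollama create llama3-1m -f Modelfile and then ollama run llama3-1m (llama3-1m is just an illustrative name).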

  • @omarawad117 · 5 months ago

    Is there more than an 8k input context window?

    • @fahdmirza · 5 months ago

      Not for this one.

  • @lorenzo9196 · 5 months ago · +1

    I think the infinite output is due to the chat template.

    • @fahdmirza · 5 months ago

      Yes.

  • @myideaspotxyz5618 · 5 months ago

    Can you share your PC config?

    • @fahdmirza · 5 months ago

      It's already in the video.

  • @pensiveintrovert4318 · 5 months ago

    Doesn't do much of anything for agentic code generation. Looping behavior.

    • @fahdmirza · 5 months ago

      Would have to check.

  • @basilbrush7878 · 5 months ago

    I tried it out in Ollama with /set parameter num_ctx 1024000, and my example produced 70,000 words in the response 😮

    • @testales · 5 months ago

      But did it also remember things from the beginning and the middle?

    • @basilbrush7878 · 5 months ago

      @testales At some stage, it started repeating the same paragraphs.

    • @testales · 5 months ago

      @basilbrush7878 Yeah, I know this behavior. For benchmarking and testing I ask some models to recite lengthy, boring text; for some reason the Communist Manifesto came to mind. :D Anyway, sometimes a model hesitates, telling me it can't do that, but usually it actually can do it very precisely. Then it either stops reciting at a random point or gets stuck in an infinite loop, reciting only the same passages. If I give the instruction to recite only up to, say, chapter 2, that usually gets ignored, which to me is a strong indicator that the instruction is lost after just a few thousand tokens. I've yet to see a model that runs locally and has an actually working context-length "hack" (see the needle-recall sketch after this thread).

    • @fahdmirza · 5 months ago · +1

      Agreed. I think the repetition is more due to the GPU card's limitation.
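
The gradually-increased-length evaluation requested earlier is straightforward to script against the same local API. A rough sketch of a needle-recall loop, under the same assumptions as the sketches above (model tag llama3-gradient, Ollama on its default port; the passphrase and filler file are made up for illustration):

```python
import json
import urllib.request

NEEDLE = "The secret passphrase is BLUE-HERON-42."
QUESTION = "\n\nWhat is the secret passphrase? Reply with the passphrase only."
FILLER = open("long_text.txt").read()  # placeholder: any long plain-text file

def ask(prompt: str, num_ctx: int) -> str:
    """Send a single non-streaming generate request to local Ollama."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "llama3-gradient",  # assumed tag for the 1M-context build
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": num_ctx, "temperature": 0},
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Double the haystack each round. Sizes are in characters; ~4 characters
# per token is a crude rule of thumb, so num_ctx gets generous headroom.
for chars in (8_000, 16_000, 32_000, 64_000, 128_000, 256_000):
    haystack = NEEDLE + "\n" + FILLER[:chars]
    answer = ask(haystack + QUESTION, num_ctx=max(chars // 2, 8192))
    print(f"{chars:>7} chars -> needle recalled: {'BLUE-HERON-42' in answer}")
```

If the model degrades the way this thread describes (looping, lost instructions), the recall flag should start flipping to False somewhere past the model's effective context length.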