How to Build, Evaluate, and Iterate on LLM Agents

  • Published on 21 Jan 2025

Comments • 9

  • @FarooqKaiser-ca (1 year ago, +14)

    Excellent talk ❤ so much to learn. What a time to be alive ❤❤❤

  • @xspydazx (3 months ago)

    Interesting grounding! This can also be used for case-based reasoning, since we find models do not perform well with fictitious examples. With a grounding tool we can use the dataset to retarget the response according to our base ground data. That way we do not need so much filtering, since guardrailing can also block the creation of an output, even a harmless one, and negative prompting also strongly affects models.
    When designing a process such as ReAct, we also need to specify the type of methodology to use:
    CHAT TEMPLATE:
    1. **Question**: {PROMPT}
    2. **Thought**: Think step by step about how to approach this question.
    3. **Action**: Determine what action to take next:
    - [Search]: Look for relevant information online.
    - [Analyze]: Break down the problem into smaller parts.
    - [Summarize]: Provide a summary of known facts related to the question.
    4. **Action Input**: Specify any details needed for the action.
    5. **Observation**: Describe what was found or learned from the action taken.
    Repeat steps 2-5 as necessary to refine your answer.
    6. **Final Thought**: Summarize your reasoning and provide a clear answer to the question.
    Hence here we can specify the Search, Analyze, Summarize methodology to follow. This is not the tools used or the tools available; it is the step-by-step process to follow.
    I have also used:
    - [Plan]: Create a plan or methodology for the task; select from known methods first if available.
    - [Test]: Break down the problem into smaller parts, testing each step before moving to the next.
    - [Act]: Provide a summary of known facts related to the question. Generate the full answer from the successful steps.
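
A minimal sketch of the idea above: the ReAct chat template stays fixed and only the bracketed methodology steps are swapped. The names `Step`, `SEARCH_ANALYZE_SUMMARIZE`, `PLAN_TEST_ACT`, and `build_react_prompt` are hypothetical, chosen for illustration; the step descriptions are condensed from the comment, and nothing here is the commenter's actual code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    """One bracketed step of a methodology (hypothetical helper type)."""
    name: str
    description: str


# The two methodologies described in the comment, with condensed wording.
SEARCH_ANALYZE_SUMMARIZE = (
    Step("Search", "Look for relevant information online."),
    Step("Analyze", "Break down the problem into smaller parts."),
    Step("Summarize", "Provide a summary of known facts related to the question."),
)

PLAN_TEST_ACT = (
    Step("Plan", "Create a plan or methodology for the task."),
    Step("Test", "Break the problem into parts, testing each step before the next."),
    Step("Act", "Generate the full answer from the successful steps."),
)


def build_react_prompt(question: str, methodology: Sequence[Step]) -> str:
    """Fill the fixed ReAct template with a question and a chosen methodology."""
    actions = "\n".join(f"- [{s.name}]: {s.description}" for s in methodology)
    return "\n".join([
        f"1. **Question**: {question}",
        "2. **Thought**: Think step by step about how to approach this question.",
        "3. **Action**: Determine what action to take next:",
        actions,
        "4. **Action Input**: Specify any details needed for the action.",
        "5. **Observation**: Describe what was found or learned from the action taken.",
        "Repeat steps 2-5 as necessary to refine your answer.",
        "6. **Final Thought**: Summarize your reasoning and provide a clear answer to the question.",
    ])


if __name__ == "__main__":
    print(build_react_prompt("Why is the sky blue?", PLAN_TEST_ACT))
```

Keeping the methodology as data rather than hard-coding it in the prompt string means a new one can be added without touching the template.
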
    We could consider these sub-plans as methodologies, potential graphs or structures, which enable the model to generate the correct type of response. Eventually the model I have will be trained to design these prompt templates, so given an input it can first discern the correct template to use for the task.
    We can even generalize this template further. Again, the point is to frame your template within your existing ReAct template. These templates could also have associated tool lists, or nodes of a graph as entry points, but the aim is to eliminate the graph by using a prompt that invokes the same methodology in a simpler form.
    I found the ReAct template also needed adjustment, but I handled it in the invoke method! This was to enable human in the loop.
    I'm not sure people understand this idea correctly! It is easy to understand using trees and graphs etc., but in practice:
    The human in the loop should be part of your stopping criteria, or response reason! If the model is not making a function call, it may require more input from the user, within the same response, before being able to output a final response to the task. In an automated invoke method the model runs automatically, calling functions and tools in the background (we capture these with the response reason) until the final response is reached. We all handle this inner loop by keeping the input for the user, or simply outputting the data and logging the transactions. So by creating a response reason for human in the loop, we can have a separate input where the model may ask for clarification or more information in order to continue to the next step! Hence this should either be handled internally in the invoke loop or as an embedded tool. There would always be some tools which can be added to the interface; after training the model on these tools, the description for the tool can even be removed in future pretrained models, as it will have been fine-tuned!
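
One way to read the paragraph above, as a hedged sketch: the invoke loop branches on a response reason, and "needs more input from the user" is one of those reasons alongside "tool call" and "final". Everything here is an assumption for illustration: the reply dictionary shape, the `stop_reason` values, and the `invoke` and `scripted_model` names do not come from any real library, and the model is a scripted stand-in rather than an LLM.

```python
from typing import Callable, Dict, List

Reply = Dict[str, str]


def invoke(
    model: Callable[[List[str]], Reply],
    tools: Dict[str, Callable[[str], str]],
    ask_user: Callable[[str], str],
    question: str,
    max_steps: int = 8,
) -> str:
    """Run the inner loop until the model reports a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(transcript)
        reason = reply["stop_reason"]
        if reason == "tool_call":
            # Background step: run the tool and log the observation.
            result = tools[reply["tool"]](reply["input"])
            transcript.append(f"Observation: {result}")
        elif reason == "needs_input":
            # Human in the loop: pause and ask for clarification.
            transcript.append(f"User: {ask_user(reply['text'])}")
        elif reason == "final":
            return reply["text"]
        else:
            raise ValueError(f"unknown stop_reason: {reason!r}")
    raise RuntimeError("no final answer within max_steps")


def scripted_model(transcript: List[str]) -> Reply:
    """Stand-in for an LLM: asks the user once, calls a tool once, then answers."""
    if not any(line.startswith("User:") for line in transcript):
        return {"stop_reason": "needs_input", "text": "Which city do you mean?"}
    if not any(line.startswith("Observation:") for line in transcript):
        return {"stop_reason": "tool_call", "tool": "lookup", "input": "Paris"}
    return {"stop_reason": "final", "text": "Answer based on: " + transcript[-1]}


if __name__ == "__main__":
    answer = invoke(
        scripted_model,
        tools={"lookup": lambda query: f"facts about {query}"},
        ask_user=lambda prompt: "Paris",  # replace with input(prompt) for a real user
        question="Tell me about the city",
    )
    print(answer)
```

The same behaviour could instead be exposed as an embedded "ask the user" tool, which is the alternative the comment mentions; the loop version just keeps the pause inside the invoke method.
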
    Hence I created a specific model that runs on the ReAct method, and you can use it with your methodologies like this:
    and your system prompt can be your normal ReAct prompt!

  • @hotrodoanzz (2 months ago)

    That's great! But if I have a CSV dataset, can you give me some advice on evaluating the LLM system with that data?

  • @BenRitchie (11 months ago)

    Does the agent approach actually work, though? Are there any accuracy metrics out there?

    • @gowthamkrishna6283 (3 months ago)

      AFAIK, people still go for human evaluation, which is costly. But we might see some standard tests soon.

  • @wole61 (1 year ago, +1)

    Nice intro.

  • @palashjyotiborah9888 (1 year ago, +8)

    Microphone investment is necessary.