Langroid: Chat to a CSV file using Mixtral (via Ollama)

  • Published on 20 Jan 2024
  • In this video, we'll learn about Langroid, an interesting LLM library that, amongst other things, lets us query tabular data, including CSV files! It delegates part of the work to an LLM of your choice under the hood, and I decided to take it for a spin using Kaggle's world population dataset. We give it three different questions to answer and then I write Pandas code to check the results. The results are sometimes good, sometimes downright hallucinations!
    #pandas #llm #mixtral #litellm #ollama
    Resources
    Langroid - github.com/langroid/langroid/...
    litellm - litellm.ai/
    Mixtral (via Ollama) - ollama.ai/
    World population data - www.kaggle.com/datasets/tanis...
    Code - github.com/mneedham/LearnData...
  • Science & Technology

Comments • 20

  • @geoffreygordonashbrook1683 5 months ago +3

    This kind of pandas test looks like a great way to compare various models, for example by running the same automated test against each one and collecting the results.

    • @learndatawithmark 4 months ago

      That's a great idea. I guess it'd be interesting to compare this type of tool vs the fine-tuned coding models
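The comparison suggested in this thread could be sketched roughly as follows: compute the expected answer once with pandas, then mark each model's answer as right or wrong against it. Everything here is an assumption for illustration: the toy data, the column names, the model names (`model-x`, `model-y`) and their answers are placeholders, not results from the video.

```python
import pandas as pd

# Toy data with made-up values; column names are assumptions.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "population": [400, 100, 300, 200],
})

# Ground truth computed directly with pandas: the two most populous countries.
expected = (
    df.sort_values("population", ascending=False)["country"].head(2).tolist()
)

# Hypothetical answers collected from two models for the same question.
model_answers = {
    "model-x": ["A", "C"],
    "model-y": ["A", "B"],
}


def score(answers: dict[str, list[str]], truth: list[str]) -> dict[str, bool]:
    """Mark an answer as correct only if it matches the ground truth exactly."""
    return {name: answer == truth for name, answer in answers.items()}


results = score(model_answers, expected)
print(results)
```

Exact matching is the strictest possible check; a real comparison would probably need to parse free-text answers first, which this sketch does not attempt.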

  • @nmstoker 5 months ago +1

    Nice video (as always!)
    Shame it was hallucinating results - would be interesting to get to the bottom of that, perhaps with other CSV datasets and/or debugging what Langroid is actually doing.

    • @learndatawithmark 4 months ago

      Yeh, I need to look into that more. I found that with the example CSV file provided in the Langroid docs it was working pretty well, but I wanted to try it with something a bit more random to see if its performance generalised!

  • @AshishBangwal 4 months ago

    Another issue, I think, is code readability. The code the LLM generated for calculating population change is unnecessarily complex, maybe because it's trying to fit the whole operation into a single line. Still, it's impressive how it's actually able to understand what the columns/features mean 👏

    • @learndatawithmark 4 months ago

      Yeh I liked that too as usually if you give an LLM a table it comes up with gibberish in response!
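On the readability point raised in this thread, a population-change calculation can be split into named steps instead of one long expression. This is only a sketch on toy data: the values are made up and the column names (`pop_1970`, `pop_2022`) are assumptions, not the dataset's real headers or the code the LLM produced.

```python
import pandas as pd

# Toy data with made-up values; column names are assumptions.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "pop_1970": [100, 250, 80],
    "pop_2022": [180, 300, 200],
})

# Step 1: absolute change between the two years.
change = df["pop_2022"] - df["pop_1970"]

# Step 2: attach it as a named column so the intent is visible.
with_change = df.assign(pop_change=change)

# Step 3: pick the row with the largest increase.
biggest = with_change.loc[with_change["pop_change"].idxmax()]
print(biggest["country"], biggest["pop_change"])
```

Each intermediate value can be printed and inspected on its own, which makes it easier to spot where a generated one-liner goes wrong.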

  • @TruthFearless 1 month ago

    Works well, I just can't get the doc chat agent working with a local LLM. Any ideas on how I could fix that?

    • @learndatawithmark 1 month ago

      Yeh, same here! I tried the doc chat agent first and it just sat there spinning for ages, which is why I moved onto the CSV file based one instead.

    • @TruthFearless 1 month ago

      @learndatawithmark hey, I actually got the doc chat agent to work with a local LLM as well. If you want, I can share my code here.

  • @nickmills8476 14 days ago

    Is langroid only good for querying data? Surely, you can define new tools, so that it can do anything any of the other frameworks can do? Generating code etc.

    • @learndatawithmark 10 days ago

      Yep you can define your own tools. I haven't tried that out to see how well it works yet.

    • @nickmills8476 10 days ago

      @learndatawithmark trying to figure it out now, with a two-agent solution for following a list of instructions. Will let you know how it goes.

  • @onhazrat 4 months ago

    🎯 Key Takeaways for quick navigation:
    00:00 📚 *Introduction to Langroid*
    - Langroid is an LLM library for querying tabular data like CSV files.
    - Setting up Langroid involves configuring the LLM, initializing the agent, and creating a task.
    - Quick check with a question about pandas to ensure Langroid is working.
    01:11 🗃️ *Loading and Exploring CSV Data*
    - Importing a world population dataset from Kaggle using pandas.
    - Displaying and exploring the dataset columns and content.
    - Configuring Langroid to query the CSV file for specific information.
    02:22 ⚙️ *Running Langroid Tasks and Analyzing Results*
    - Creating a task to find the top five countries by population.
    - Discussing the iteration limit and the time it takes for Mixtral to load.
    - Analyzing Langroid's generated code and results for the top countries.
    03:30 📈 *Comparing Langroid's Answers with Pandas*
    - Comparing Langroid's answer for the biggest population increase with a manual pandas approach.
    - Highlighting discrepancies in the results and questioning Langroid's accuracy.
    - Discussing potential reasons for incorrect Langroid answers.
    04:53 🌐 *Grouping and Aggregating Data with Langroid*
    - Formulating a complex question about the average, minimum, and maximum area for each continent.
    - Running Langroid to answer the question and examining the results.
    - Contrasting Langroid's answers with manually obtained accurate results using pandas.
    Made with HARPA AI

    • @learndatawithmark 4 months ago

      Neat - that sounds like a clever tool!
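The pandas checks mentioned in the takeaways above (top countries by population, and average/minimum/maximum area per continent) might look something like this. It is a sketch on a toy frame with made-up values, and the column names are assumptions rather than the Kaggle dataset's actual headers.

```python
import pandas as pd

# Toy data with made-up values; column names are assumptions.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["X", "X", "Y", "Y"],
    "population": [400, 100, 300, 200],
    "area_km2": [10.0, 30.0, 50.0, 70.0],
})

# Top three countries by population.
top = df.nlargest(3, "population")[["country", "population"]]

# Average, minimum and maximum area for each continent.
area_stats = df.groupby("continent")["area_km2"].agg(["mean", "min", "max"])

print(top)
print(area_stats)
```

Running the same questions through pandas gives a reference answer to hold the LLM's output against, which is how discrepancies like the ones in the video show up.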

  • @philippechassany7279 4 months ago

    import langroid.language_models as lm → ModuleNotFoundError: No module named 'langroid.language_models'; 'langroid' is not a package
    :(

    • @learndatawithmark 4 months ago

      You definitely installed the langroid package with pip install langroid? Try doing import langroid to check?

    • @philippechassany7279 4 months ago

      @learndatawithmark yes, langroid was installed via pip

    • @learndatawithmark 4 months ago

      So in the video I installed everything in a Python virtual env:
      python -m venv .venv
      source .venv/bin/activate
      I did also try with Poetry, but I was having problems when I did that. I dunno if you're already using a virtual env, but might be worth trying if not?

    • @TruthFearless 1 month ago

      Hey bro, you should make sure you're using the right version of Python. I would use 3.11.3.
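For anyone hitting the same error as in this thread: Python reports "'langroid' is not a package" when the name `langroid` resolves to a single module rather than a package. One possible cause (an assumption here, not confirmed by the commenter) is a local file named `langroid.py` shadowing the installed library. The sketch below uses only the standard library to show where the name resolves from, whether a virtual env is active, and which Python version is running; the helper names are made up for illustration.

```python
import importlib.util
import sys


def describe_module(name: str) -> dict:
    """Report where `name` would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # A real package has submodule search locations; a lone .py file does not.
        "is_package": spec.submodule_search_locations is not None,
    }


def in_virtualenv() -> bool:
    """True when running inside an environment created with `python -m venv`."""
    return sys.prefix != sys.base_prefix


print(describe_module("langroid"))
print("virtual env active:", in_virtualenv())
print("python version:", sys.version.split()[0])
```

If `origin` points at a file in the working directory rather than somewhere under `site-packages`, renaming that file would be the first thing to try.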