Jordan Nanos
Canada
Joined Jun 26, 2017
Machine Learning Architecture
TWRTW Ep #8 ft. Anthony Placeres - ROI of AI? GPT-5 and Llama 4, TSMC yields, $4.5B chatbots, Russia
With backgrounds in the design and implementation of compute infrastructure from edge to cloud, sensor to tensor, Jordan Nanos and Hunter Almgren give their take on what’s new in enterprise technology - specifically, what they read this week.
Show notes:
x.com/tsarnick/status/1847746829490016578
x.com/deedydas/status/1848751769939284344?s=46
x.com/benhylak/status/1848765957008986416
x.com/character_ai/status/1849055407492497564?s=46
x.com/andrewcurran_/status/1849627640745058683?s=46
x.com/dylan522p/status/1849944315570864588
www.cnbc.com/2024/10/28/bret-taylors-ai-startup-sierra-valued-at-4point5-billion-in-funding.html
arstechnica.com/gadgets/2024/10/fake-restaurant-tips-on-reddit-a-reminder-of-google-ai-overviews-inherent-flaws/
Views: 82
Videos
TWRTW Ep #7 - Nobel Prizes, Blackwell ramp, Benioff on Copilot, NYT suing Perplexity, x86 Consortium
73 views · 14 days ago
How to Pick a Large Language Model for Private AI -- A Brief Overview
796 views · 28 days ago
TWRTW Ep #6 - Updates on AI Datacenters, Intel/Qualcomm, Llama3.2, NotebookLM, whats next for OpenAI
80 views · 1 month ago
Exploring the Long Context Window of Llama-3.1-405B on NVIDIA Grace Hopper GH200 Superchip
471 views · 1 month ago
NVIDIA isn't a bubble? Armchair analysis of Big Tech's GPU spend (TWRTW Ep #5)
53 views · 1 month ago
TWRTW Ep #5 - Who Can Afford to Build Frontier LLM's?
83 views · 1 month ago
TWRTW Ep #4 - Coding Assistants, SMCI shorts, NVIDIA DOJ Subpoena, OpenAI Lawsuits, Intel Spinoff
178 views · 2 months ago
Demo and Code Review for Text-To-SQL with Open-WebUI
2.8K views · 2 months ago
TWRTW Ep #3 - GenAI's Impact on Work/Privacy, Immersion Cooling, OpenStack is Back, Perplexity Ads
77 views · 2 months ago
TWRTW Ep #2 - Nuclear Power, GPU Buildouts, Semi-Stateful Workloads, LLM security, GPT-5 Speculation
77 views · 2 months ago
Using Llama-3.1-405B as a Coding Assistant with Continue.Dev, Ollama, and NVIDIA GH200 Superchip
1.5K views · 2 months ago
Running Llama-3.1-405B with Ollama and Open-WebUI: Introduction to the DL384 Gen12 Server
331 views · 2 months ago
TWRTW Ep. #1 - Intel issues, NVIDIA delays, JPY carry trades, Google antitrust
151 views · 2 months ago
Building Customized Text-To-SQL Pipelines in Open WebUI
4.3K views · 2 months ago
Simple Overview of Text to SQL Using Open-WebUI Pipelines
5K views · 3 months ago
Overview of an Example LLM Inference Setup
3.1K views · 3 months ago
I am really curious how to integrate this particular text_to_sql_pipeline in Open WebUI and enable it. I successfully verified the pipeline connection, but in the 'Pipeline' section, when I upload text_to_sql_pipeline, it shows me No Valves. Could you please explain how you enabled it in your Open WebUI? I am using the docker compose file containing the Ollama, Open WebUI, and Searxng services, and I run the pipeline container separately. Please guide me.
I have used the openwebui standard pipeline, and it looks like I can't put more than one table in the DB_table field. That's too much of a downside! Did you come across a solution?
It's turtles (nested dolls, a totem pole, a mortal coil, the collective unconscious etc.) all the way down... and it's a very long way down. Humans mistake their unconscious appetites for divine inspiration. You probably believe YOU thought that question. You are just the latest iteration, the most sophisticated variation that has passed The test. Love ya!
So I'm a little bit new to this area. You've added the xml file to the database yourself, and are just querying it via the pipeline? I was wondering whether this would be a viable solution for an app that lets people easily find information stored in a database. But I guess that would be a huge security risk, since you basically give them direct database access.
@@KeesFluitman no xml file, it’s a csv but yes it’s direct access to the database. In a real environment this would run against a data warehouse, lake, or some backup/export of the database
@@KeesFluitman there has to be a way in which people can find information stored in a database today. And generally they write SQL to find that information, or have to ask an analyst to write the SQL for them and send them the results. This pipeline just simplifies that process a little bit.
Cool video.... Very useful info, thanks. Worth a sub.
It's a great video, thanks, it helps a lot. Can I also connect a Microsoft SQL Server database with Open WebUI through the pipeline?
@@kosarajushreya6578 yes, but you’ll need to modify the pipeline code to do this. You’ll need to know how to connect to your DB in python.
This is wild
Thanks so much, I will try all of it! But first I'm curious how you set up vLLM for Open WebUI instead of Ollama... do you have any good installation doc source for that?
So maybe the whole setup is relevant... like, is Ollama on a Linux server? Is Open WebUI on Windows, or on the same Linux box in a container, etc.?
@@Earthvssuna you can run vLLM as a docker container or k8s deployment. For docker use this doc: docs.vllm.ai/en/latest/serving/deploying_with_docker.html Once the model (typically one from huggingface, like mistral or llama) is running, it’s an OpenAI-compliant endpoint. You can use the OpenAI python client for custom apps, or just add it as an endpoint in open-webui in the admin settings page. If you’re interested I was considering making some more videos describing how to install docker, kubernetes etc on a GPU server?
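Since the vLLM server exposes an OpenAI-compliant endpoint, as the reply above says, here is a minimal stdlib-only sketch of calling it directly. The URL and port are assumptions based on vLLM's default serving config, and `ask` is a hypothetical helper name:

```python
import json
import urllib.request

# Assumed default: vLLM serving on localhost:8000 with the OpenAI-compatible API.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload that vLLM accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask(model: str, prompt: str) -> str:
    """POST the payload and pull the assistant's reply out of the response."""
    data = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        VLLM_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same endpoint can instead be registered under Open WebUI's admin settings as an OpenAI connection, with no custom code.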
insightful! thanks brotha
This was a great intro. I think the information about Open WebUI pipes on the site is a bit vague. I would love to see more about how to use pipes for things like filtering user inputs or outputs, if pipes are the appropriate tool for that. I work for a school district and would like to do that to allow students access to local models.
@@jim02377 I haven’t played with filters, but that concept does exist as a type of pipeline: github.com/open-webui/pipelines/blob/main/examples/filters/detoxify_filter_pipeline.py
Hi Jordan, actually the document is in the Russian language but describes Kazakhstan cities/towns and credit requirements in local currency. Still an impressive demo of local LLM capability.
Thanks, good to know. When I opened it up in Word, the bottom left says “Kazakh”. Too late now to correct the video
Great review, Jordan! Quick question: I have a pipeline that calls Replicate to generate an image based on the user_message (prompt) fed in from open-webui. However, when I get the response from Replicate, I'm having some issues displaying it back in open-webui. Do you know if the return type of the pipe function has to be a string in order for open-webui to render text? What is open-webui's interface expectation on the return from the pipe function?
@@azmat8250 I’ve only seen a string work when returning from a pipeline. Even a list throws an error for me. However it’s all open source… if you look at the component during a web search, or with an audio input, it seems like you could create something custom.
@@azmat8250 looked into this and v0.3.30 of open-webui has experimental support for image generation via OpenAI’s api and a few others. It’s not via a pipeline, but still may be worth upgrading and checking out if you haven’t seen it yet.
thanks, @jordannanos . I'm on .30 but it seems like it's not working...at least for me. I'm still toying around with it. If I find something, I'll share here.
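To illustrate the point in this thread about return types, here is a minimal sketch of a pipeline whose `pipe()` returns a plain string. The class scaffold follows the examples in the open-webui/pipelines repo; the name and echo behavior are made up for illustration:

```python
from typing import Generator, Iterator, List, Union

class Pipeline:
    """Minimal Open WebUI pipeline sketch (hypothetical echo example)."""

    def __init__(self):
        self.name = "Echo Pipeline"  # shown in the model dropdown

    async def on_startup(self):
        pass  # runs when the pipelines server starts

    async def on_shutdown(self):
        pass  # runs when the pipelines server stops

    def pipe(
        self, user_message: str, model_id: str, messages: List[dict], body: dict
    ) -> Union[str, Generator, Iterator]:
        # Returning a str (or yielding strings) is what the UI renders as the
        # assistant message; returning a list raises an error, per the thread.
        return f"You said: {user_message}"
```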
I installed Open WebUI locally in a docker container, but when I access it I don't see the option to upload the pipeline files. Is there a special config you're running?
@@geovannywing1648 if you’re using docker you’ll also need to run a separate “pipelines” container from the open-webui project, make sure networking is set up correctly between the containers, and then establish a connection between the two.
@@geovannywing1648 docs here: docs.openwebui.com/pipelines/
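A sketch of wiring the two containers together with docker compose (image tags and ports follow the project's published images, but treat this as a starting point, not a verified config):

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
  pipelines:
    image: ghcr.io/open-webui/pipelines:main
    ports:
      - "9099:9099"
```

Compose puts both services on the same network, so inside the UI's admin settings the pipelines endpoint can be added as http://pipelines:9099 (the service name resolves between containers).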
and the conclusion is…?
Do you think we can use a custom model with an API for RAG?
@@firstland_fr yes I don’t see why not. Ollama will work with any model in GGUF format (llama.cpp). And vLLM works with just about any transformers model from huggingface: docs.vllm.ai/en/latest/models/adding_model.html Both ollama and vLLM are tested with this pipeline
Hey Jordan! Can I adapt your pipelines to work with SQL Server?
@@renatopaschoalim1209 yes, it’s tested with Postgres and MySQL. If you know how to connect to SQL Server with python, you’ll be able to use the pipeline
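For reference, the SQL Server connection code might look like this with pyodbc. This is a sketch under assumptions, not part of the pipeline repo: the driver name depends on what's installed, and both helper names are hypothetical:

```python
def sqlserver_conn_str(host: str, db: str, user: str, password: str) -> str:
    """Build an ODBC connection string for SQL Server.
    The driver name is an assumption; match it to your installed ODBC driver."""
    return (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={host};DATABASE={db};UID={user};PWD={password};"
        "TrustServerCertificate=yes;"
    )

def run_query(conn_str: str, sql: str):
    """Execute a query; requires pyodbc and the Microsoft ODBC driver installed
    inside the pipelines container."""
    import pyodbc  # imported lazily so the helper above stays dependency-free
    with pyodbc.connect(conn_str) as conn:
        return conn.cursor().execute(sql).fetchall()
```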
Great video! Can you please tell me how to create/generate an API key for llama_index?
@@Mohsin.Siddique llama-index is a python package that is installed via pip, you don’t need an API key. No API keys required for this pipeline
Mind sharing your code please?
@@RickySupriyadi hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: th-cam.com/video/iLVyEgxGbg4/w-d-xo.html
@@jordannanos wow cool, thank you.
What an awesome introduction to pipelines - thank you so much!
Hi, how do I execute the python lib installation on the pipeline server?
@@gilkovary2753 you’ll need to docker exec or kubectl exec into the container called “pipelines”, then run:
pip install llama-cloud==0.0.13 llama-index==0.10.65 llama-index-agent-openai==0.2.9 \
  llama-index-cli==0.1.13 llama-index-core==0.10.66 llama-index-embeddings-openai==0.1.11 \
  llama-index-indices-managed-llama-cloud==0.2.7 llama-index-legacy==0.9.48.post2 \
  llama-index-llms-ollama==0.2.2 llama-index-llms-openai==0.1.29 \
  llama-index-llms-openai-like==0.1.3 llama-index-multi-modal-llms-openai==0.1.9 \
  llama-index-program-openai==0.1.7 llama-index-question-gen-openai==0.1.3 \
  llama-index-readers-file==0.1.33 llama-index-readers-llama-parse==0.1.6 \
  llama-parse==0.4.9 nltk==3.8.1
Thanks for sharing this awesome project! I tried running the 01_text_to_sql_pipeline_vLLM_llama.py file from your GitHub repo, but I'm having trouble uploading it on Open WebUI even though I've installed all the requirements. Do you have any idea what might be causing this issue? Thanks again!
Did you configure the pipeline correctly?
@@dj_hexa_official what do you mean by that?
@@netixc9733 what error are you seeing? docker logs -f or kubectl logs -f your pipelines container and it may report an error
Lovely demo of the synergy between language models and databases.
First of all great job Jordan. It would be really helpful if you could share the code on git.
hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: th-cam.com/video/iLVyEgxGbg4/w-d-xo.html
Great video, hoping to see more soon. Congrats.
hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: th-cam.com/video/iLVyEgxGbg4/w-d-xo.html
Jordan , Super good job. I'm trying to integrate openwebui into my CRM system. That I would like to query the database for any of our product price or everything through the chat for my employees. This rag pipeline can make it in this way for example ? Thanks you for your answer
hi, I think if you've got a db you should be able to query it. especially if you already know how using python. I posted another video. code is here: github.com/JordanNanos/example-pipelines video reviewing the code: th-cam.com/video/iLVyEgxGbg4/w-d-xo.html
@@jordannanos Thanks a lot Jordan . Super cool
Hi. Could you link us to the source code of the pipeline?
code is here: github.com/JordanNanos/example-pipelines video reviewing the code: th-cam.com/video/iLVyEgxGbg4/w-d-xo.html
Jordan, thanks. I have a single-GPU runpod setup. Would you recommend just adding a dockerized PostgreSQL to the existing pod? And is the python code using langchain stored in the pod pipeline settings? This sort of reminds me of AWS serverless Lambda, but simpler.
@@RedCloudServices if you’d like to save money I would run Postgres in docker on the same VM you’ve already got. That will also simplify networking. Over time you might want to start/stop those services independently in the event of an upgrade to docker or your VM. Or you might want to scale independently. In that case you might want a separate VM for your DB and a separate one for your UI. Or you might consider running kubernetes. Yes the python code is all contained within the pipelines container and uses llama-index not langchain (though you could use langchain too). Just a choice I made.
@@RedCloudServices in other words, you’ll need to pip install the packages that the pipeline depends on, inside the pipelines container. Watch the other video I linked for more detail on how to do this.
@@jordannanos yep! Just watched it. I just learned open-webui does not allow vision-only models or multimodal LLMs like Gemini. I was hoping to set up a pipeline using a vision model 🤷♂️. Also it’s not clear how to edit or set up whatever vector DB it’s using.
Hi Jordan, thanks. I am missing the steps where you created the custom "Database Rag Pipeline with Display". From the Pipelines page you completed the database details and set the Text-to-sql Model to Llama3, but where do you configure the connection between the pipeline valves and the "Database Rag Pipeline with Display" to be an option to be selected?
@@martinsmuts2557 it’s a single .py file that is uploaded to the pipelines container. I’ll cover that in more detail in a future video
@@jordannanos Do create this video soon!
@@KunaalNaik @martinsmuts2557 just posted a video reviewing the code: th-cam.com/video/iLVyEgxGbg4/w-d-xo.html repo is here: github.com/JordanNanos/example-pipelines
hi
30k GC? 8 of them?
Thx for sharing and it's really interesting to learn more about the pipeline projects related to open webui.
Sweet rig. Is that your daily driver? 😀😀
The cost of such a setup is circa $500,000........ amma get me 2 :)
Thank you Jordan! Great work, interesting to see how these new servers can really deliver performance. ARM / x86.. just works. Yours, Greg
Thank you Jordan.
Thank you for this. Can you share more info on the RAG pipeline along with code examples.
Working on getting it to run on both vLLM and Ollama endpoints with llama3.1 and mistral. The prompt uses llama-index for text-to-SQL.
similar to this guide: docs.llamaindex.ai/en/stable/examples/index_structs/struct_indices/SQLIndexDemo/
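Following that llama-index guide, the core of such a text-to-SQL setup might look like the sketch below. The DSN, table name, and model are placeholders, and the heavy imports are deferred into the function so they only run where the pinned llama-index packages (and a running Ollama server) are available:

```python
def pg_dsn(host: str, db: str, user: str, password: str, port: int = 5432) -> str:
    """Build a SQLAlchemy Postgres DSN (credentials here are placeholders)."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

def build_query_engine(dsn: str, tables: list, model: str = "llama3.1"):
    """Wire a natural-language-to-SQL query engine, per the llama-index guide.
    Requires llama-index, llama-index-llms-ollama, and sqlalchemy."""
    from sqlalchemy import create_engine
    from llama_index.core import SQLDatabase
    from llama_index.core.query_engine import NLSQLTableQueryEngine
    from llama_index.llms.ollama import Ollama

    sql_db = SQLDatabase(create_engine(dsn), include_tables=tables)
    return NLSQLTableQueryEngine(sql_database=sql_db, llm=Ollama(model=model))

# Usage (needs a live DB and Ollama):
#   engine = build_query_engine(pg_dsn("localhost", "demo", "user", "pass"), ["orders"])
#   print(engine.query("How many orders were placed last week?"))
```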
Great job can't wait to see more
@@jvannoyx4 hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: th-cam.com/video/iLVyEgxGbg4/w-d-xo.html
Amazing video! I have a 4xA4000 GPU setup with 128GB, and I can only run the 405B Q2_K model, and it’s really slow. Amazing how the GH200 chips offer great tokens/sec performance!
- what are you using it for? - .... stuff
AI, apparently. (LLM = Large Language Model)
I feel jealous of that 8xH100 server. I'm currently using a 4x3090 at home. I actually use a pretty similar setup, with vLLM for the full-precision models and exllama or llama.cpp for quantized models, plus Open WebUI as a frontend.
Why would you need more than that? Be glad for what you already have or you won't find happiness :)
I have a P40 and I'm over the moon. Being poor in ML is hard.
Word
Why not podman tho
Good discussion! Keep it up.
Great work, Jordan! Gonna start scraping the parts together...
I've been automating deployments with SkyPilot. It uses the cheapest spot instances and heals itself.
Nice video, saw the link from Twitter. My question is: is there a way to speed up the results after you ask it a question?
Yes, working to improve the LLM response and SQL query time