John Tan Chong Min
Singapore
Joined 7 Feb 2012
AI and ML enthusiast. Likes to think about the essences behind breakthroughs in AI and explain them in a simple and relatable way. Also, I am an avid game creator.
Discord: discord.gg/bzp87AHJy5
LinkedIn: www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: delvingintotech.wordpress.com/
Twitter: johntanchongmin
Try out my games here: simmer.io/@chongmin
o3 (Part 2): Tradeoffs of Heuristics, Tree Search, External Memory, In-built Bias
o3 is indeed groundbreaking, and shows that we might be close to finding a general training procedure that can self-improve with fine-tuning.
Here's part 2 of my discussion session on how o3 works based on my own understanding of it (or more generally, architectures that bootstrap learning via fine-tuning on correct trajectories)!
While o3 is powerful, I do not think the o3-type architecture is the only way ahead for learning.
I believe that fine-tuning on one's own trajectories is slow to learn from, and having a procedure to learn with external memory is very important (and missing) right now!
I also believe that learning from arbitrary start and end states in a trajectory is important - for instance, in math, we do not want to learn just the goal / model answer, but perhaps also how to reach every intermediate step of the solution.
Moreover, we should consider imbuing some inductive biases so that we can reduce the number of samples needed for training - like filters in Convolutional Neural Networks that bias towards neighbouring pixels, so we do not need many translated copies of the original image to learn translational invariance/equivariance.
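To make the last point concrete, here is a minimal sketch (assuming PyTorch; an illustration, not part of the original talk): a convolution shares its weights across positions, so a shifted input produces a correspondingly shifted output by construction - no extra shifted training samples required.
```python
# Minimal sketch (assumes PyTorch): convolution is translation-equivariant by
# construction, so shift handling comes for free rather than from extra data.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1, bias=False)

x = torch.randn(1, 1, 8, 8)                   # a random 8x8 "image"
x_shifted = torch.roll(x, shifts=2, dims=-1)  # the same image shifted 2 pixels right

y = conv(x)
y_shifted = conv(x_shifted)

# Away from the borders (where zero padding and the roll wraparound interfere),
# the feature map of the shifted image equals the shifted feature map.
print(torch.allclose(torch.roll(y, shifts=2, dims=-1)[..., 3:7], y_shifted[..., 3:7]))
```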
~~~
Slides: github.com/tanchongmin/agentjo/blob/main/paper_reviews/o3_discussion.pdf
References:
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters - arxiv.org/abs/2408.03314
Learning, Fast and Slow: arxiv.org/abs/2301.13758
LLMs as a System of Multiple Expert Agents: arxiv.org/pdf/2310.05146
~~~
0:00 Introduction and Recap
2:21 Impressive Benchmark Scores
8:50 Is it AGI?
9:50 Correct Trajectories
35:45 Tree Search vs Parallel Search
57:14 Path Ahead - Adaptive Benchmarks
1:07:33 Learning, Fast and Slow
1:24:33 Multiple Abstraction Spaces / Neurosymbolic Integration
1:31:25 Discussion
1:54:30 Conclusion
~~~
Views: 944
Videos
o3 (Part 1): Generating data from multiple sampling for self-improvement + Path Ahead
2.4K views · 21 days ago
75% on the ARC-AGI semi-private dataset is insanely good! o3 is indeed groundbreaking, and shows that we might be close to finding a general training procedure that can self-improve with fine-tuning. Here are some slides I made to explain how o3 works based on my own understanding of it (or more generally, architectures that bootstrap learning via fine-tuning on correct trajectories)! That said, I do...
AgentJo CV Generator: Generate your CV by searching for your profile on the web!
119 views · 1 month ago
Demonstrates how to use an AgentJo agent, equipped with a selenium web browsing function, to generate a CV! Notebook: github.com/tanchongmin/agentjo/blob/main/contrib/Demo/AgentJo_CV_Generator.ipynb 0:00 LLM Setup 1:50 Custom Search API 3:33 Agent Definition 5:22 Customised Replies
AgentJo FinTech Demo: Extract Data from Web Pages, Answer with Citation, Multi-step Agentic RAG
141 views · 1 month ago
Check out my latest FinTech demo notebook: It has: - strict_json to extract information from webpage - citation from pdf with sources cited - multi-step agentic retrieval augmented generation (RAG) Of course, more work needs to be done to make sure that LLMs are robust with different queries, but this is a baseline template you can freely use for your own use cases as well. Check out the notebo...
Can LLMs be used in self-driving? CoMAL: Collaborative Multi-Agent LLM for Mixed Autonomy Traffic
286 views · 2 months ago
CoMAL is a very interesting paper which uses multi-agent collaboration to define leader / follower roles between autonomous vehicles, and then each agent will plan its velocity, acceleration and spacing from the car in front individually. It has a memory to draw information from and to update dynamically according to environment experience. Overall, there is a lot of merit to the architecture cre...
From TaskGen to AgentJo: Creating My Life Dream of Fast Learning and Adaptable Agents
643 views · 2 months ago
TaskGen is a framework that is a culmination of 5 years of thoughts during my PhD to build fast learning and adaptable agents. It uses a task-directed, memory-based mechanism to focus on tasks and learn from the environment, with knowledge sharing on a need-to-know basis. AgentJo is the continuation of TaskGen, as we scale it to multiple memory abstraction spaces, multiple agent augmentations, ...
Tian Yu X John: Discussing Practical Gen AI Tips for Image Prompting
133 views · 2 months ago
Speaker Profile: Tianyu is a generative AI practitioner, speaker, and author. He founded a company dedicated to upskilling professionals in generative AI. He also chairs the GenAI Risk Chapter at RIMAS. With over 1,000 hours dedicated to generative AI tools, Tianyu has generated over 30,000 images and 10,000 videos. He holds a best-selling DALL-E course and recently published the book "Will ChatGPT...
Jiafei Duan: Uncovering the 'Right' Representations for Multimodal LLMs for Robotics
213 views · 3 months ago
Speaker Profile: Jiafei Duan is a third-year PhD student in robotics at the University of Washington’s Paul G. Allen School of Computer Science & Engineering, where he is part of the Robotics and State Estimation Lab, co-advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on robot learning, embodied AI, foundation models, and computer vision. He is currently funded by the ...
TaskGen Tutorial 6: Conversation Wrapper
87 views · 3 months ago
I talk about how to wrap an agent in a conversational interface, for chatbots and interaction with environment. This conversation wrapper also has persistent memory that can keep track of states throughout the conversation. TaskGen Repo: github.com/simbianai/taskgen AgentJo Repo (building on TaskGen for multi-agent systems): github.com/tanchongmin/agentjo
TaskGen Tutorial 5: External Functions & CodeGen
88 views · 3 months ago
Here, I go through how to integrate functions from other agentic frameworks easily, and how to get TaskGen to generate and run code! AgentJo Repo (building on TaskGen for multi-agent systems): github.com/tanchongmin/agentjo TaskGen Repo: github.com/simbianai/taskgen
TaskGen Tutorial 4: Hierarchical Agents
174 views · 3 months ago
Here, we cover how to use hierarchical agents in TaskGen! AgentJo Repo (building on TaskGen for multi-agent systems): github.com/tanchongmin/agentjo TaskGen Repo: github.com/simbianai/taskgen
TaskGen Tutorial 3: Memory
175 views · 3 months ago
Memory is very important for learning. We need different abstraction spaces of memory, each of them consolidating experiences in a different form. Then, when we need to retrieve the memories, we should take those memories that are similar to what we are experiencing right now. AgentJo Repo (building on TaskGen for multi-agent systems): github.com/tanchongmin/agentjo TaskGen Repo: github.com/sim...
TaskGen Tutorial 2: Shared Variables and Global Context
134 views · 3 months ago
Here, we go through Shared Variables, a way to store and retrieve important information in a dictionary. We also go through Global Context, a way to put these Shared Variables into the agent's prompt. AgentJo Repo (building on TaskGen for multi-agent systems): github.com/tanchongmin/agentjo TaskGen Repo: github.com/simbianai/taskgen
Beyond Strawberry: gpt-o1 - Is LLM alone sufficient for reasoning?
1.1K views · 3 months ago
gpt-o1 has Chain of Thought (CoT) likely already built into the dataset, perhaps by using methods such as Self-Taught Reasoner (STaR) to augment the dataset with rationales, or getting PhD students to provide the rationale. The key takeaway is that inference at runtime helps significantly on traditional problem solving domains like math and code, and gpt-o1's way of doing this has significant p...
TaskGen Tutorial 1: Agents and Equipped Functions
230 views · 4 months ago
In this TaskGen Tutorial Series, I will be going through how to use TaskGen. In this tutorial, we will cover the basics of how to use Agents and Equipped Functions for agentic pipelines. AgentJo Repo (building on TaskGen for multi-agent systems): github.com/tanchongmin/agentjo TaskGen Repo: github.com/simbianai/taskgen
LLM-Modulo: Using Critics and Verifiers to Improve Grounding of a Plan - Explanation + Improvements
531 views · 4 months ago
Agentic Systems for Production: Tips and Tricks
704 views · 4 months ago
alphaXiv - Share Ideas, Build Collective Understanding, Interact with ANY open sourced paper authors
246 views · 5 months ago
TaskGen Overview: Open-Sourced LLM Agentic Framework - Task-Based, Memory-Infused, StrictJSON
2.8K views · 5 months ago
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
2K views · 5 months ago
NeoPlanner - Continually Learning Planning Agent for Large Environments guided by LLMs
512 views · 5 months ago
Michael Hodel: Reverse Engineering the Abstraction and Reasoning Corpus
1.6K views · 6 months ago
TaskGen Conversational Class v2: JARVIS, Psychology Counsellor, Sherlock Holmes Shop Assistant
146 views · 6 months ago
CodeAct: Code As Action Space of LLM Agents - Pros and Cons
744 views · 7 months ago
TaskGen Conversation with Dynamic Memory - Math Quizbot, Escape Room Solver, Psychology Counsellor
177 views · 7 months ago
Integrate ANY Python Function, CodeGen, CrewAI tool, LangChain tool with TaskGen! - v2.3.0
396 views · 7 months ago
Empirical - Open Source LLM Evaluation UI
345 views · 8 months ago
Impressed with your bit on how self-reinforcing systems find it difficult to break out of their viewpoint - nice talk John
@MachineLearningStreetTalk Thanks Tim, glad you liked it. It is fascinating how much we are stuck in a local optimum for LLMs when there is so much to learn from cognitive science. Haha, more to come on that after I read more.
I've only just started this video but... all from my memory, so accuracy may be variable: 1990s ALNs (Adaptive Logic Networks), University of Alberta, Canada I think. Nodes, IIRC, are trained to be AND, LEFT, RIGHT of 2 inputs. There was some success in work with tar sands and some sort of imminent mechanical failure prediction. I started looking into this in the adaptive optics field with the idea of FPGA implementation. I didn't get very far, as it wasn't what I should have actually been working on.
Thanks so much for delving deeper into this - wishing you a Happy New Year from Italy.
Great talk, thank you!
All the AI chat apps are full of glitches, o1 included. The problem is you don't get a vivid presentation of the mistakes and glitches like you do with generative images. You need to be well versed in a subject and use chat apps a fair bit to tell how far off the mark they still are. Very synthetic algorithmic patterns in their writing, and you can't really get away from that.
53:08 Those two phrases don't appear in the article?
Hi there, it is extracted from this sentence: "On the 2024 AIME exams, GPT-4o only solved on average 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function."
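(Quick arithmetic check on those figures: 1.8/15 = 12%, 11.1/15 = 74%, 12.5/15 ≈ 83%, and 13.9/15 ≈ 93%, consistent with the quoted percentages.)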
@johntanchongmin I stand corrected :) Thanks
Maybe some tree search is done during training in the same style as Stream of Search (arXiv:2404.03683), by training on serialized suboptimal search trajectories where errors are potentially recovered through backtracking, which might overcome error accumulation. During inference, it seems implausible to me for o1 at least; for o1-pro it seems different (see the 17 Dec AMA session from OpenAI's API team), probably using multiple samples. And I don't think a reward model is used during inference either, as QwQ 32B-Preview is an open-weight reasoning model with similar performance to o1-preview/o1-mini on AIME, MATH and LiveCodeBench, and it has the same architecture and inference code (with different default temperature, top_p, ...) as the standard LLM Qwen2.5-32B - you can try it on Hugging Face. All this points to reasoning models likely just being LLMs with custom post-training.
I agree. If tree search is used, it is only in training. Although I really suspect adding tree search might make the outcomes worse if the wrong heuristic is used. Hence, it may be like what I said about expanding nodes exhaustively, at least for maybe the first X layers, then taking all leaf nodes and doing majority voting / checking with ground truth.
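To illustrate, here is a minimal sketch in Python (an illustration only, not o3's actual algorithm; `sample_step` and `final_answer` are hypothetical stand-ins for an LLM call and answer extraction):
```python
# Sketch of exhaustive expansion for the first `depth` layers + majority voting.
from collections import Counter

def sample_step(partial_solution: str, k: int = 3) -> list[str]:
    # Hypothetical stand-in: one LLM reasoning step returning k candidate continuations.
    return [partial_solution + f" -> step{i}" for i in range(k)]

def final_answer(leaf: str) -> str:
    # Hypothetical stand-in: extract the final answer from a completed trajectory.
    return leaf.split("->")[-1].strip()

def exhaustive_then_vote(question: str, depth: int = 2, k: int = 3) -> str:
    frontier = [question]
    for _ in range(depth):  # expand ALL nodes exhaustively for the first `depth` layers
        frontier = [child for node in frontier for child in sample_step(node, k)]
    # Majority vote over all leaf answers (or check against ground truth if available).
    votes = Counter(final_answer(leaf) for leaf in frontier)
    return votes.most_common(1)[0][0]

print(exhaustive_then_vote("What is 2+2?"))
```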
Part 1 here: th-cam.com/video/-6J0S1q03Ds/w-d-xo.html
Part 2 here: th-cam.com/video/f5obaHiOog4/w-d-xo.html
Where did OpenAI say they didn't use tree search? I think they do use tree search, specifically MCTS, for generating the synthetic data for o1; then at inference time they don't use tree search. The magic is in creating the synthetic data - they take a variety of paths, including some wrong paths of the tree search, and chain those with keywords like "but wait, the above is getting me stuck. Let's try this instead", then jump to another branch of the tree (the branch frequently does lead to the correct answer). The key is MCTS + "let's verify step by step" in my opinion, so they linearize the MCTS thought chains and train on that. Somewhere in there they're also using RL as another key ingredient. Looking forward to hearing your thoughts
Add one more thing: take a look at Sasha Rush's video "Speculations on o1", where he describes 4 possible approaches and explains the stream of search approach. There are a number of problems with this approach, such as collapse and loss of generality (as you noted experiencing). But their "secret sauce" could really just be a lot of hard work to overcome these issues and scale the techniques
Thanks for the insightful comments. I think tree search may be possible but it is extremely hard to get the heuristic for the nodes right. For example, in AlphaZero the value network is very hard to train and often leads to system collapse if initialised wrongly (I've trained AlphaZero before). OpenAI members have repeatedly said the underlying algo is very simple. I think tree search is good but may be too complex for self-learning.
I tried o1 with moderately complex questions regarding solar astronomy and o1 completely fell apart. It was useless. I would point out the contradictions in the answers and it kept apologizing.
Indeed, out-of-distribution prompts may cause failure. This is why o1 is good for coding and maths - it has been trained really hard on them.
Prompt that makes 4o behave like o1:
```
[Problem]
Do it out by the following format, taking care to reflect, verify, clarify all assumptions:
###Thoughts###
<Subtask 1 Name>
<Subtask 1 Result>
<Subtask 2 Name>
<Subtask 2 Result>
...
###Final Answer###
<Final Result>
```
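A quick usage sketch of the prompt above (assumes the standard OpenAI Python client; the problem text is just an example):
```python
# Sketch: sending the structured-thinking prompt above to gpt-4o
# (assumes the `openai` package is installed and OPENAI_API_KEY is set).
from openai import OpenAI

client = OpenAI()

problem = "If 3x + 5 = 20, what is x?"  # example text for the [Problem] placeholder
prompt = f"""{problem}
Do it out by the following format, taking care to reflect, verify, clarify all assumptions:
###Thoughts###
<Subtask 1 Name>
<Subtask 1 Result>
...
###Final Answer###
<Final Result>"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```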
Great content, thanks so much for sharing all of your videos!
What is the purpose of generating synthetic data from the model which would be used to improve itself? Wouldn't the synthetic data it produced contain the exact same biases as the model? How do you remove the inherent bias? More importantly, if it can produce expert data, why would it be used to fine-tune itself over it again considering the model was already able to produce the very same data? Does this feel like CoT or ReAct with extra steps?
@_PranavDesai You can actually do chain-of-thought prompting to get the model to output more detailed steps, which it natively may not do due to web data not being of that format. Such understanding of reasoning steps can be transferred across domains by fine-tuning on it, resulting in a model that can do reasoning/chain of thought natively without the prompt. In most cases, you have a ground-truth dataset to check if the answer obtained by reasoning is correct, and so you are more assured (though not 100%) that the model is generating the right reasoning traces. Btw, I myself do not believe models can actually reason like humans, but these reasoning traces serve as chain of thought to help guide better generation, so they play an important role.
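As a minimal sketch of that bootstrapping loop (an illustration of the STaR-style idea, not OpenAI's actual pipeline; `generate_trace` and `fine_tune` are hypothetical stand-ins for an LLM call and a fine-tuning job):
```python
# Sketch of STaR-style bootstrapping: keep only reasoning traces whose final
# answer matches ground truth, then fine-tune on the kept traces.
import random

def generate_trace(question: str) -> tuple[str, str]:
    # Hypothetical LLM call: returns (reasoning_trace, final_answer).
    answer = random.choice(["4", "5"])
    return f"Step 1: ... therefore the answer is {answer}.", answer

def fine_tune(examples: list[dict]) -> None:
    # Hypothetical fine-tuning job on (question, trace) pairs.
    print(f"Fine-tuning on {len(examples)} verified traces")

dataset = [{"question": "2+2?", "ground_truth": "4"}]

verified = []
for item in dataset:
    for _ in range(8):  # sample multiple traces per question
        trace, answer = generate_trace(item["question"])
        if answer == item["ground_truth"]:  # keep only correct trajectories
            verified.append({"question": item["question"], "trace": trace})

fine_tune(verified)
```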
One important reason for producing synthetic data from the model itself is that it helps the model represent its own knowledge; otherwise you would be feeding in knowledge from another source which it doesn't know anything about. Since we want the models to be honest, which means they should learn what they know and don't know, this self-generated data is the best way to make them hallucinate less.
AgentJo GitHub repo here: github.com/tanchongmin/agentjo
Great work and very practical too, kudos 👍 How much more time for going from agentjo to agentjohn? LOL
haha it's agentjo as it sounds friendlier
Check out AgentJo GitHub here! github.com/tanchongmin/agentjo
I like this more. But how does the tech differ from hypothes.is?
Thank you John for this explanation! Will try it out!
I am quite interested to know the distribution of XOR/XNOR gates here. They are very good at non-linear classification. After the training is complete, does the proportion of XOR/XNOR seem higher than that of other gates?
Can't wait to see how this will revolutionize math
Could you provide code please?
Since the connections are randomly initialized and then fixed, is it impossible to guarantee 100% accuracy even if the data is guaranteed to have no noise?
I hear the size being of concern, but I feel the comparison is a bit off: the number of neurons in the logic gate net scales as n^2 with the input size, whereas in a classic neural net it scales linearly - but the number of weights in a classic FC-NN also scales quadratically with the number of neurons, so the learnable parameter count scales quadratically in both. The logic gate net when drawn on paper may look larger, but it won't require more memory, I think. In the paper they mention that they just save the random connections in the form of a seed number, so that does not require any space.
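(Worked numbers for this point, my own illustration: a fully-connected layer from n = 1,000 inputs to 1,000 hidden units stores 1,000 × 1,000 = 10^6 weights, while a logic gate layer with 10^6 gates stores only its per-gate gate-type parameters plus two input indices per gate - and if the wiring is regenerated from a fixed RNG seed, the indices themselves need essentially no storage.)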
There’s no choice, but it feels like too many interfaces have changed. 😭
It will not work well...
I'm happy to hear alternate views. How to improve it?
No, if you have a pulse signal, it will have an infinite representation in the latent (frequency) space. No advantage! Also, this JEPA is nothing new. In the past it was called PLS (Partial Least Squares and the like - Kernel PLS).
The idea of a suitable abstraction space is still a good one. It is normally good to do representations in a space where the problem is simple. As for the infinite representation in frequency space, it just means that frequency may not be the right space for processing that signal. Maybe we should work in the time domain instead for that.
Related video (TaskGen Paper): th-cam.com/video/F3usuxs2p1Y/w-d-xo.html
Here's my discord group for the AgentJo discussion + Logo Design competition: discord.gg/bzp87AHJy5
Looking forward to following the development of AgentJo (very cool name), keep up the excellent work John 💪
AgentJo Repo (building on TaskGen for multi-agent systems): github.com/tanchongmin/agentjo TaskGen Repo: github.com/simbianai/taskgen
Book Link: www.amazon.com/Will-ChatGPT-Take-Job-Strategies-ebook/dp/B0D6Y8ZX5Y DALL-E course: www.udemy.com/course/the-dall-e-master-course/ Upcoming Video Generation Course in Singapore (Thursday 24 Oct 2024 4pm): lu.ma/p0g5ksm9
Thank you for this tutorial. Would love to see LanceDB (a multi-modal vector DB) integrated into TaskGen.
cool wrappers
The design philosophy and whole architecture for TaskGen was a pleasure to watch. Got to learn so much from it. I would love to contribute to the project. Do we have any upcoming ideas/features that we want to develop for the project? I have joined your Discord channel as well.
Thanks for the affirmation! I am intending to build a lot of different memory structures, you can help with that!
Pretty powerful. Thanks!!
Hi John. Was chugging along well and fine with the Ollama mistral-nemo model. All works well right up till Tutorial 2 - Shared Variables; I improvised with another function using an SQLite database, which works fine as well. Then came this 3rd tutorial on Memory, giving OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable. Does your memory method strictly require an OpenAI API key to continue? Look forward to your reply. Cheers, and appreciate your work very much.
The default memory uses OpenAI embeddings; let me do another version with sentence transformers.
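A minimal sketch of the sentence transformers alternative (assumes the sentence-transformers package; an illustration of the embedding swap, not the actual TaskGen integration):
```python
# Sketch: local sentence-transformers embeddings in place of OpenAI embeddings
# (assumes `pip install sentence-transformers`; not the actual TaskGen internals).
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

memories = ["Paris is the capital of France", "2 + 2 = 4"]
query = "What is the capital of France?"

mem_emb = model.encode(memories, normalize_embeddings=True)
q_emb = model.encode([query], normalize_embeddings=True)

# Cosine similarity (dot product of normalized vectors); retrieve the top memory.
scores = mem_emb @ q_emb.T
print(memories[int(np.argmax(scores))])
```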
Update: Proudly rejected at EMNLP Industry Track 2024 for not having enough experiments!