Cohere
Joined 16 Dec 2021
Welcome to NLP, now.
We’re kickstarting a new chapter in machine learning by giving developers and businesses access to NLP powered by the latest generation of large language models, now.
Our platform can be used to generate or analyze text to do things like write copy, moderate content, classify data and extract information, all at a massive scale.
Cohere For AI - Community Talks: Nikhil Vyas
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil is a postdoc at Harvard hosted by Prof. Sham Kakade. Prior to this, he was a graduate student in the TOC group at MIT, where he was advised by Prof. Ryan Williams. His current research focuses on deep learning, most recently on optimization.
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient approximation of Adam -- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: ShampoO with Adam in the Preconditioner's eigenbasis (SOAP). With regard to improving Shampoo's computational efficiency, the most straightforward approach would be to simply compute Shampoo's eigendecomposition less frequently. Unfortunately, as our empirical results show, this leads to performance degradation that worsens as the decomposition is computed less often. SOAP mitigates this degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared to Adam. We empirically evaluate SOAP on language model pre-training with 360M and 660M parameter models. In the large-batch regime, SOAP reduces the number of iterations by over 40% and wall-clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo. An implementation of SOAP is available at github.com/nikhilvyas/SOAP.
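To make the idea concrete, here is a minimal single-matrix sketch of the update the abstract describes, in plain NumPy: maintain Shampoo's two preconditioner factors, refresh their eigenbasis only every few steps, and run an ordinary Adam update on the gradient rotated into that basis. This is a reconstruction from the abstract, not the reference code at github.com/nikhilvyas/SOAP; details such as re-projecting Adam's moments when the basis is refreshed are omitted.

```python
import numpy as np

class SOAPSketch:
    """Illustrative SOAP-style optimizer for a single weight matrix."""

    def __init__(self, shape, lr=3e-4, betas=(0.9, 0.999),
                 shampoo_beta=0.95, eps=1e-8, precond_freq=10):
        m, n = shape
        self.lr, self.eps, self.freq = lr, eps, precond_freq
        self.b1, self.b2, self.bs = betas[0], betas[1], shampoo_beta
        self.t = 0
        self.L = np.zeros((m, m))   # running average of G @ G.T
        self.R = np.zeros((n, n))   # running average of G.T @ G
        self.QL = np.eye(m)         # eigenbasis of L (left rotation)
        self.QR = np.eye(n)         # eigenbasis of R (right rotation)
        self.M = np.zeros(shape)    # Adam first moment, in rotated space
        self.V = np.zeros(shape)    # Adam second moment, in rotated space

    def step(self, W, G):
        self.t += 1
        # Maintain Shampoo's two preconditioner factors.
        self.L = self.bs * self.L + (1 - self.bs) * (G @ G.T)
        self.R = self.bs * self.R + (1 - self.bs) * (G.T @ G)
        # Refresh the (slowly changing) eigenbasis only every `freq`
        # steps -- the one hyperparameter SOAP adds over Adam.
        if (self.t - 1) % self.freq == 0:
            _, self.QL = np.linalg.eigh(self.L)
            _, self.QR = np.linalg.eigh(self.R)
        # Rotate the gradient into that eigenbasis and run plain Adam
        # there, so the second moment keeps updating between refreshes.
        Gr = self.QL.T @ G @ self.QR
        self.M = self.b1 * self.M + (1 - self.b1) * Gr
        self.V = self.b2 * self.V + (1 - self.b2) * Gr ** 2
        Mh = self.M / (1 - self.b1 ** self.t)
        Vh = self.V / (1 - self.b2 ** self.t)
        # Rotate the Adam update back to the original coordinates.
        update = self.QL @ (Mh / (np.sqrt(Vh) + self.eps)) @ self.QR.T
        return W - self.lr * update
```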
Views: 135
Videos
Roads to Research: Mentorship in ML Research
166 views · days ago
Join us for a panel discussion exploring the role of mentorship in the dynamic field of machine learning research. Our panelists will explore the impact of mentorship on shaping careers, from fostering innovation to navigating pivotal moments, and consider when one is ready to become a mentor themselves. Following the discussion, there will be an allocated Q&A session, offering attendees ...
Cohere For AI - Community Talks: Jessica Ojo
93 views · days ago
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models. Jessica Ojo is an MSc research student at McGill, affiliated with Mila under the supervision of Prof. David Adelani. She is also a researcher at Masakhane, a community-led research organization. She is currently working on multilingual natural language processing focused on under-resourced languages.
Cohere For AI - Community Talks: Gwanghyun (Bradley) Kim
71 views · days ago
BeyondScene: Higher-Resolution Human-Scene Generation With Pretrained Diffusion. Speaker Bio: Gwanghyun (Bradley) Kim is a Ph.D. candidate in Electrical and Computer Engineering (ECE) at Seoul National University (SNU), under the supervision of Prof. Se Young Chun. He completed his M.S. degree at KAIST, where he was advised by Prof. Jong Chul Ye. This year, he is interning at NVIDIA Research, wor...
Cohere For AI - Community Talks: Zhijing Jin
185 views · days ago
Zhijing Jin (she/her) is an incoming assistant professor at the University of Toronto. She completed her PhD at the Max Planck Institute & ETH. She works on socially responsible NLP through causal inference. Specifically, her research focuses on causal reasoning with LLMs; causal methods to improve the robustness, interpretability, and fairness of LLMs; and causal analysis of social problems. She has received 3...
Cohere For AI - Community Talks: Randall Balestriero
149 views · days ago
The Fair Language Paradox. Large Language Models (LLMs) are widely deployed in real-world applications, yet little is known about their training dynamics at the token level. Evaluation typically relies on aggregated training loss, measured at the batch or dataset level, which overlooks subtle per-token biases arising from (i) varying token-level dynamics and (ii) structural biases introduced by ...
Cohere For AI - Community Talks: Panayiotis Panayiotou
127 views · months ago
Curricula for Learning Robust Policies with Factored State Representations in Changing Environments. Robust policies enable reinforcement learning agents to effectively adapt to and operate in unpredictable, dynamic, and ever-changing real-world environments. Factored representations, which break down complex state and action spaces into distinct components, can improve generalization and sample...
Cohere For AI - Patrick (Tsung-Han) Wu
97 views · months ago
Visual Haystacks: Answering Harder Questions About Sets of Images. This talk will explore new methods in Multi-Image Visual Question Answering, focusing on retrieval and reasoning across diverse image collections using the Visual Haystacks benchmark. Patrick (Tsung-Han) Wu (he/him) is a second-year CS PhD student at UC Berkeley, advised by Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. His recent work ...
Cohere For AI - Roads to Research: Paths to Graduate Studies
147 views · months ago
Join us for a panel discussion with Bryan Li as part of our 'Roads to Research' series on the different pathways to graduate studies. You'll hear from speakers who have all had different journeys through grad programs and research.
Cohere For AI - Community Talks: Cong Lu
258 views · months ago
The AI Scientist. One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g., for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents...
Cohere For AI - Community Talks: Dr. Shaina Raza
103 views · months ago
In addressing the critical need for safety in Large Language Models (LLMs), it is crucial to ensure that the outputs are not only safe but also retain their contextual accuracy. Many existing LLMs are safety fine-tuned with safety demonstrations or rely only on adversarial testing. While these methods can produce safe outputs, they often risk losing contextual meaning as they mitigate bias and toxicity...
Cohere For AI - Community Talks: Ziqi Wang
157 views · months ago
Enabling Language Models to Implicitly Learn Self-Improvement. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks. However, the inherent open-ended nature of these tasks implies that there is always room for improvement in the quality of model responses. To address this challenge, various approaches have been proposed to enhance t...
Cohere For AI - Community Talks: Joao Guilherme Madeira Araújo
152 views · months ago
Deep reinforcement learning (deep RL) has achieved tremendous success in various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed ...
Cohere For AI - Community Talks: Arthur Conmy
194 views · months ago
Join us for an exclusive opportunity to hear from Arthur Conmy, a leading interpretability researcher at Google DeepMind, as he shares his research journey in mechanistic interpretability. Arthur will discuss his key insights, notable contributions, and his current work with the MATS program. He’ll also offer valuable advice for beginners in the field, making this a must-attend event for anyone...
Cohere For AI - Community Talks: Fabio Pizzati
147 views · months ago
The discussion will explore training CLIP models with synthetic data, focusing on the paper's methodologies, their effectiveness versus real data, and future research and application impacts in AI and computer vision. The associated paper for the session is at arxiv.org/abs/2402.01832. Fabio's Bio: Fabio is a postdoctoral researcher in the Torr Vision Group (torrvision.com/index.html) at the Univ...
Cohere For AI: Community Talks: Dhruv Jain
175 views · months ago
Cohere For AI - Community Talks: Maya: Multimodal Multilingual Model
331 views · months ago
Cohere For AI: Community Talks - Amanuel Mersha
114 views · months ago
Cohere For AI: Community Talks - Xiang Yue
188 views · months ago
C4AI Expedition Aya: Identifying mislabelled samples in the Aya dataset
32 views · months ago
C4AI Expedition Aya: Doclingual - Multilingual Document Understanding
20 views · months ago
C4AI Expedition Aya: Enhancing Sinhala NLP
72 views · months ago
C4AI Expedition Aya: MCM: Multilingual LLM Capability For Crisis Management
25 views · months ago
C4AI Expedition Aya: Mechanistically Understanding Aya for Safety Steering
28 views · months ago
C4AI Expedition Aya: Multi Agent Aya for Workforce
31 views · months ago
C4AI Expedition Aya: Multilingual Machine Generated Text Portion Detection
32 views · months ago
C4AI Expedition Aya: SafeAya: Multilingual Adversarial Robustness
30 views · months ago
Is there a way to find out which paper she chose to present for this program? Trying to get an idea of how recent the publication should be :)
This guy is amazing and he knows how to catch the viewers' attention. Respect to Andrew.
@Edward Robinson First of all, thank you for sharing such great knowledge. Second, I have a question: I have a dataset of Pakistan's laws, with thousands of laws in different categories. How would I build a RAG architecture to provide the best answers to user queries about those laws?
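On the RAG question above, a minimal retrieve-then-generate sketch of one reasonable shape for that pipeline: chunk the laws, embed them once, retrieve the top-k chunks by cosine similarity, and ground the generation on them. The toy hashing `embed_texts` and the pass-through `generate_answer` below are stand-ins (assumptions) for whatever embedding model and LLM you actually use; filtering by law category before retrieval would be a natural addition for a corpus organized the way you describe.

```python
import hashlib
import numpy as np

DIM = 512

def embed_texts(texts):
    """Toy embedding: hash each token into a fixed-size unit vector.
    Replace this with a real embedding model in practice."""
    vecs = np.zeros((len(texts), DIM))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
            vecs[i, h % DIM] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)

def generate_answer(prompt):
    # Placeholder: call your LLM of choice here.
    return prompt

def answer(query, law_chunks, index, k=3):
    q = embed_texts([query])[0]
    scores = index @ q                      # cosine similarity
    top = np.argsort(scores)[::-1][:k]      # k most relevant chunks
    context = "\n\n".join(law_chunks[i] for i in top)
    return generate_answer(
        "Answer using only the laws below, and cite which law you used.\n\n"
        f"{context}\n\nQuestion: {query}")

laws = ["The Contract Act governs agreements...",
        "The Companies Act regulates incorporation...",
        "The Penal Code defines criminal offences..."]
index = embed_texts(laws)                   # embed the corpus once, up front
print(answer("How do I register a company?", laws, index))
```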
Thank you to the entire team for this insightful presentation on the Multi-Agent Aya for Workforce project. The innovative approach to addressing communication barriers in distributed enterprises through a multilingual, context-aware chat application is truly commendable. The integration of AI to enhance efficiency and streamline workplace communication is a significant step forward. We look forward to seeing the future developments and integrations you have planned. Excellent work!
How can I join these live in the future?
@ Stephen Colbert ❤
this is going viral
💯🚴🏃🏃🤸🙏
39:50
k-means is a poor man's analysis. It has little to no statistical reasoning behind its clustering; it works off heuristics 😓
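For readers wondering what the jab above refers to: k-means is Lloyd's algorithm, an alternating heuristic that greedily reduces within-cluster squared distance without an explicit statistical model. A minimal sketch:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and
    cluster-mean updates until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: label each point with its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its cluster
        # (keeping the old center if a cluster goes empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Example: three Gaussian blobs.
X = np.random.default_rng(1).normal(size=(300, 2))
X[100:200] += 5.0
X[200:] += np.array([5.0, -5.0])
centers, labels = kmeans(X, k=3)
```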
Great video. Very insightful. Appreciate you guys sharing these insights on a solution that's in production.
He doesn't look like a Salamanca. Lalo maybe?
Ahmad ❤ So proud of you ❤
Thanks for this, Cohere. Looking forward to it.
Who can apply for this programme? Are newbies in ML like me eligible? I am an undergrad student pursuing a BTech in CSE and have a great interest in AI/ML.
Very well put discussion and very on point. I'd like to recommend a follow-up: deployment strategy from the POV of enterprises and foundation model developers. Thanks.
Inspiring Journey. I'm going to apply for this year. I hope it works out :)
Instead of holding everyone's hand, why not simply teach people to think critically? Do you trust everyone you meet? Then why should it be any different with LLMs or any new shiny technology? By imposing constraints, you create costs for everyone, like California's new legislation on AI, which even Pelosi opposes. Wouldn't it be better to teach people to think than not to think?
Congratulations on the achievement sir Ahmad!
That was great.
I missed the info session on 16th Aug. Could you share the recorded session with us? It would be helpful even if it were shared only for a limited time.
Here is the recording of the Scholars Program Info Session: th-cam.com/video/NSfCtm05e38/w-d-xo.html
amazing!
Bro, they landed so squarely on Eurasia that they prayed to all the Scandinavian gods just to avoid pointing at Russia. I'm offended and demand a personal apology.
Such an insightful conversation
Nice! Any new model?
Vijay, I marvel at the amount of AI domain knowledge you must hold to give such a wonderful talk about AI and its integration into Workflow/Service Automation. Really appreciate it.
Quite clear!
Absolutely fantastic explanation! Thank you very much for this video! Particularly neat is the explanation of sampling as a 2-step process: going to the predicted x0, then noising the prediction with the forward diffusion process one step closer to the original x0. All training objectives and sampling steps made total sense and felt like the natural thing to do.
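For anyone who wants the two-step sampling view from this comment written out, here is a sketch of one DDPM-style reverse step in the standard notation of Ho et al. (2020). The `predict_noise` model and the `betas`/`alphas_cumprod` schedule are assumed inputs from a trained diffusion model; this illustrates the idea, not the video's exact derivation.

```python
import numpy as np

def ddpm_reverse_step(x_t, t, predict_noise, betas, alphas_cumprod, rng):
    """One reverse-diffusion step, written as the 2-step process above."""
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
    eps = predict_noise(x_t, t)
    # Step 1: jump to the model's current estimate of the clean sample,
    # inverting the forward relation x_t = sqrt(a_t) x0 + sqrt(1-a_t) eps.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # Step 2: re-noise that estimate one step closer to the data by
    # sampling the forward-process posterior q(x_{t-1} | x_t, x0_pred).
    coef_x0 = np.sqrt(a_prev) * betas[t] / (1.0 - a_t)
    coef_xt = np.sqrt(1.0 - betas[t]) * (1.0 - a_prev) / (1.0 - a_t)
    mean = coef_x0 * x0_pred + coef_xt * x_t
    var = betas[t] * (1.0 - a_prev) / (1.0 - a_t)
    noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
    return mean + np.sqrt(var) * noise
```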
Nice.
Great talk! Thank you soooo much
Best of luck, Ahmet hocam; you are truly part of a very nice project!!!
I was going to say: stop with the AI and employ real people instead. Groundbreaking idea, I know! Unless you want a world where people don't feel motivated to learn to do anything; after all, an AI could do in one minute what would take you weeks, even with years of experience behind you, so why even bother? But the recording of the sound is so bad that maybe only AI could salvage it. If this was intentional, then well done.
The sound is terrible!
Command R+GraphRAG finetune coming soon
8:31 I think there is an abstract layer people are applying reasoning to and taking for granted. Even though we think that moving only one spot per move in the direction of the waypoint is common sense, reasoning is the opposite of the reason the models are able to excel at higher-dimensional vector search and semantic meaning. The system never learns that the correct answer is X; it only understands the relation between positive affirmation and the incorrect answers. Nor am I saying the intellect score is a positive attribute, but as for the end-user, the "right" answer might be wrong. General AI is narrow AIs. 30:11 See, I don't think they have noticed non-determinism; they just found a glitch in the reward mechanism through general habit, by exploiting human characteristics so that the error margin is lost among all the noise. It probably still needs A* data sets, but with deterministic entropy in the weights, is what I am saying. Just because the models excel at whatever benchmarks doesn't demonstrate actual comprehension; that is the point.
perfect!
Luis's explanations are always great. Thanks!
It was a great session. Do you mind sharing the slides?
Does this mean we can start the process in, say, Swahili and then convert to English, so as not to lose the particular characteristics of Swahili, which happens when one starts with English?
1st
I was considering training a model on deliberate problem-solving with multiple Fermi problems; sadly, I could not find good data.
good
This was a great session. Thanks.
15:00
No Fear.
Would be awesome if you could link papers!
Consider me convinced
awesome
Thank you so much, sir. Greetings from Türkiye.