Meta Llama came up in “Dataflow Computing for AI Inference with Kunle Olukotun - #751” from The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence).
Quote
be necessary over that highly constrained, critical resource, that HBM bandwidth, right? And so reconfigurable dataflow does this by, you know, going back to your streaming idea, right? So if you take an LLM, let's take a simple one like Llama 3.1 8B, right? An 8-billion-parameter model. It's got these decoders, thirty-two of them, right? Which are essentially the same. What you can do is take that whole decoder and map it onto a set of sixteen RDUs. We call it a reconfigurable dataflow architecture, and the chip we call a reconfigurable dataflow unit, or RDU. So take sixteen of these RDUs. And you take the one D…
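The point that HBM bandwidth is the critical resource can be made concrete with some back-of-the-envelope arithmetic. When a model generates one token at a time, essentially every weight has to be read from memory once per token, so memory bandwidth caps decode speed. The sketch below assumes 16-bit weights (2 bytes per parameter) and rounds Llama 3.1 8B to 8e9 parameters. It ignores KV-cache traffic and uses an illustrative bandwidth of 3 TB/s, which does not come from the podcast.

```python
def max_decode_tokens_per_second(num_params: float,
                                 bytes_per_param: float,
                                 hbm_bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate.

    Assumes every weight is streamed from HBM once per generated token and
    ignores KV-cache reads, activations, and batching (which amortizes
    weight reads across requests).
    """
    bytes_per_token = num_params * bytes_per_param
    return hbm_bandwidth_bytes_per_s / bytes_per_token


# Llama 3.1 8B, approximated as 8e9 parameters stored in 16-bit precision.
PARAMS = 8e9
BYTES_PER_PARAM = 2
# Illustrative bandwidth assumption (3 TB/s), not a figure from the podcast.
HBM_BANDWIDTH = 3e12

bound = max_decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, HBM_BANDWIDTH)
print(f"{PARAMS * BYTES_PER_PARAM / 1e9:.0f} GB read per token")
print(f"<= {bound:.1f} tokens/s per stream")
```

Batching several requests spreads each weight read across multiple tokens. That is how serving systems push aggregate throughput past this single-stream ceiling.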
Meta Llama came up in “Meta's New Model, Gemini 4, OpenAI Proposes AI Policy” from Artificial Intelligence: Educational AI News.
Quote
So you're not juggling tabs and juggling subscriptions. Alright, let's get into the first story, which is Google Gemini. I want to talk about their new open-source situation with Gemini 4. This week they released it under the Apache 2.0 license, and basically this is their latest family of open models, built specifically for reasoning and agentic workflows. What I think is really interesting about Gemini 4 is what Google is calling the best intelligence-per-parameter ratio of any open model right now. Basically you're getting frontier-level capabilities of the kind that, you know, something like Llama 4 Maverick needs a huge hardware setup to deliver. The model already has over 400 million downloads, and the community has spun up over a hundred thousand variants, which I think tells you how quickly developers are adopting this. I think the significance is less about the benchmarks and more about the trend, right? The gap between open-source and closed-source models is definitely shrinking, and Gemini 4 is just another data point in that direction. If we want to get into the licensing, the Apache 2.0 license is also really important because it means that companies can actually use… I remember when Llama first came out from Meta and they were…
The same episode, with the identical quote, also appeared on AI Chat: AI News & Artificial Intelligence.
Meta Llama came up in “Running AI MCP Tools on Kubernetes with kagent” from Agentic DevOps : AI Engineering for Infrastructure.
Quote
operate in. And before we jump into these agents, let's jump back to what an agent is: the system prompt, the LLM. I want to highlight something we didn't mention, and that is that the LLM is in the middle of all this. So everything to do with LLMs, regardless of agents, applies: dealing with tokens, dealing with APIs, and, if you're running a local model, dealing with the inference, the compute, the GPU, and everything else you need to run that LLM. Whether it's a managed service like Bedrock, or Claude, or a Llama 3.2 running on a pod somewhere in your cluster, you still have to deal with that on top of all the agent stuff. As you were zooming out from this picture of an agent and showing multiple agents, there's another view where you zoom into the LLM part. And so I just want to make sure our audience realizes that agents in these frameworks aren't solving that for you yet. Yeah. You're so right. And it's funny, because I exist at this level, so this is how I think. But that black box is huge, right? Yeah, absolutely. Just scaling inference is a whole other… Yeah. Put it on the cut. Yeah, and by the way, I saw, don't ask me what video, I watch a lot of YouTube, but I saw some statistics.
Meta Llama came up in “Artificial Analysis: Independent LLM Evals as a Service — with George Cameron and Micah-Hill Smith” from Latent Space: The AI Engineer Podcast.
Quote
I agree that it can be helpful, but… We wanted to generalize this to a very large number of models. That's one of the reasons presenting it as Elo is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky to compare these exact tasks with human performance, because the way you would go about it as a human is quite different from how the models would go about it. Yeah. I also like that you included Llama 4 Maverick in there. Is that just one last… No, no, no. It is the best model released by Meta, and so it makes it onto the home page. Yeah. Oh, that's right. Sorry, I completely missed that. Okay. No, not at all. … So that is their harness, not yours, is what you're saying. Exactly. What's really interesting is if you compare, for instance, Claude 4 using the Claude web chatbot, it performs worse than the model. And in every case it's the same thing: the model performs better in our agentic harness than in its web…
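Elo ratings make it easy to add new models over time. Each pairwise comparison adjusts two ratings, and nothing already computed has to be recomputed. The quote does not describe Artificial Analysis's actual rating procedure, so what follows is only a sketch of the classic Elo update. The starting rating of 1000, K = 32, and the model names are assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the classic Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical models: a newcomer starts at a default rating and is placed
# by comparisons against existing entries; earlier results stay untouched.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"], score_a=1.0)
print(ratings)  # {'model_a': 1016.0, 'model_b': 984.0}
```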
Meta Llama came up in a 2025 year-in-review and 2026 predictions discussion (episode title and show not given).
Quote
predictions. Maybe before getting into the 2026 predictions, what are your big thoughts about what happened in 2025? Of course, we talked about all four quarters, starting with Q1, where AI is no longer saying everywhere but is getting more accountable, when we talked about Gemini 2.0 and other things. In Q2, we said that innovation outright control, and then we went into all this coding assistance: Llama 4, Claude 4. Then in Q3, we went into the inflection point where GPT-5 started and inference took over the demand. And in Q4, we talked more about the assimilation phase and how Google took over again. So that is what we've been talking about. So before getting into the predictions, Manji, what is your take on 2025? Yeah, no, I think you guys picked up on some very good observations, right? Because things are moving very fast, and it's always fun to reflect back: hey, where were we in January of 2025 versus December? One thing I was thinking about is, what are some of the products that really reached a point where I'd call them sticky, where you hear a lot of validation in terms of their usage and growth? And I think one of my top couple is NotebookLM.
Meta Llama came up in “What is vLLM? | Agentic AI Podcast by lowtouch.ai” from Agentic AI Podcast.
Quote
But historically, hosting a massive model on premises was a nightmare. It was slow. Exactly. But vLLM changes the economics because it makes serving so much more efficient. It suddenly becomes viable to deploy what's called a private AI appliance: an open-source model like Llama 3 or Mixtral inside your own virtual private cloud. Right. You keep all the data within your firewall, but because of vLLM, you don't need a massive, sprawling cluster to do it. You can achieve low-touch automation; you can have a high-performance agent running on a much more modest hardware footprint. So efficiency leads to privacy. Efficiency enables privacy. If self-hosting costs, you know, a hundred thousand dollars a month, most companies won't do it. If vLLM brings that down to ten thousand a month, suddenly every bank and hospital can afford their own private brain. That's a massive shift. It moves us away from the centralized, one-giant-model-to-rule-them-all approach toward a world of millions of special… And that aligns perfectly with where the hardware is going, too. I mean, vLLM isn't just for NVIDIA anymore; it supports AMD, it supports Intel Gaudi.
Meta Llama came up in “#177 - Instagram AI Bots, Noam Shazeer -> Google, FLUX.1, SAM2” from Last Week in AI.
Quote
But that also just shows how tricky it can be to create frontier models, and how even a huge, well-capitalized company like Apple with so much great talent can't automatically jump to the AI frontier. I think eventually they could get close enough to the frontier, and that can be in part because of things like Llama 3.1, this open-source model. They would probably have to get a special license, 'cause I think companies with a very large number of users can't use the model weights without one, and Apple would be one of the few companies in that category. But nevertheless, you could imagine they would probably prefer striking a deal with Meta to get access to those weights and run them on their own infrastructure, rather than sending queries off to a third party like OpenAI. Right. And actually this reminds me, we went into this a little bit in the interview on your podcast, where we talked about AI as a feature as opposed to AI as a product. And I do think one of the highlights of Apple's approach is not…
Meta Llama came up in “Does Learning Require Feeling? Cameron Berg on the latest AI Consciousness & Welfare Research” from "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis.
Quote
Hello and welcome back to the Cognitive Revolution. Today I'm thrilled to welcome Cameron Berg back for his second appearance on the podcast. When Cameron was first here last November, we went deep on his fascinating mechanistic AI consciousness research, which showed that suppressing role-playing and deception features in Llama 3.3 made the model more likely to report having subjective experiences. We also explored his philosophy of mutualism, which posits that alignment needs to flow both ways, and which he memorably summed up by saying, "I don't want to create something more powerful than us that has reason to see us as a threat." As always in AI, a lot has happened in the last six months. Cameron has founded a new nonprofit called Reciprocal Research. He's become the subject of a documentary called Am I?, which is currently premiering in theaters in select cities ahead of a public release on May 4th. And most importantly, the field of AI consciousness…
Meta Llama came up in “Can an AI Agent Legally Own a Company? Christian van der Henst's Wild Experiment| E2283” from This Week in Startups.
Quote
You know, go work on an open-source project when they have this once-in-a-lifetime chance to take down ten or twenty or fifty or a hundred million dollars in equity? So the answer is it's very hard to compete with that. There is one American company, Reflection AI, that is doing research to build open models here in the United States. The problem, Jason, is that every time I go to their website, because I'm like, what's going on with Reflection?, there's nothing new. They haven't released a model that I can play with. They haven't done the thing I want them to do, which is to put American open-source or open-weight AI on the map again, like in the Llama 3 era, and it's not happening. And, you know, points to xAI: they have open-sourced Grok 1 and 2. I checked; it's on Hugging Face. But that's not cutting edge. So I just think American companies could do more, and then we wouldn't have this problem and we wouldn't be ceding so much of the global inference market to upstart Chinese AI labs, which are doing quite well, as you can see in their stock prices. So yeah. Before I let you go, Jason, do you have a certain background? … Yeah. So I hear your beloved Knicks have… well, I hear you're quite rude to the Atlanta team. So how did it end up 40 to 15 in the first quarter? Because that's just… I have, you know, I…
Meta Llama came up in “Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez” from Gradient Dissent: Conversations on AI.
Quote
And so yeah, this idea that correctness is only a small piece of the story is pretty neat. And it gets back to human preference: human preference isn't just correctness, it's about a whole lot more, and vibes give us a tool to understand that a lot more. And any practical results on… can you describe the others? I think the best I've gotten so far is that the latest-generation Llama 3 models are much more friendly; they're entertaining, meaning they're less formal. And we've compared them with ChatGPT, GPT-4o, which is more formal; it likes longer explanations. We've seen that formatting is pretty critical: people care a lot about formatting in the answer, so using Markdown, using LaTeX when needed. So formatting is predictive of human preference. Formality is predictive of human preference too, but it's very situation-dependent. Some people want a friendly answer, and that's preferred. In some situations I want, again, a concise, formal answer; I don't want to joke around. What about concise? I feel like, in my experience, OpenAI is a little less concise than the other models, and I can kind of feel it when people write to me and they've obviously used it. Is that… yep. Is my vibe…
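The quote calls formatting and formality "predictive of human preference." One standard way to test a claim like that is a logistic model, the same form as the Bradley-Terry model used for pairwise rankings. It predicts which of two responses wins from the difference in their features. The sketch below is not the Chatbot Arena team's actual pipeline. The feature names, the four votes, the learning rate, and the epoch count are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_probability(w, features_a, features_b) -> float:
    """P(A preferred over B) under a logistic model on feature differences."""
    z = sum(wi * (a - b) for wi, a, b in zip(w, features_a, features_b))
    return sigmoid(z)


def fit_preference_model(pairs, lr: float = 0.5, epochs: int = 500):
    """Fit weights by batch gradient descent on the logistic (log) loss.

    pairs: list of (features_a, features_b, a_won), where a_won is 1 if
    response A was preferred and 0 otherwise.
    """
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for fa, fb, a_won in pairs:
            p = preference_probability(w, fa, fb)
            for i in range(dim):
                grad[i] += (p - a_won) * (fa[i] - fb[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Hypothetical votes. Features per response: [markdown_elements, formality_score].
PAIRS = [
    ([3.0, 0.2], [0.0, 0.8], 1),
    ([0.0, 0.9], [2.0, 0.1], 0),
    ([4.0, 0.5], [1.0, 0.5], 1),
    ([1.0, 0.3], [3.0, 0.7], 0),
]

w = fit_preference_model(PAIRS)
print(f"markdown weight={w[0]:.2f}, formality weight={w[1]:.2f}")
```

A positive markdown weight means that in this toy data the more heavily formatted response tends to win. Because the quote stresses that the effect of formality depends on the situation, a real analysis would also need context features or separate fits per category.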
Meta Llama came up in “Ben Goertzel on "Superintelligence"” from Machine Learning Street Talk (MLST).
Quote
kind of brings in the alignment problem a little bit as well, because people use that as a way of talking about how dangerous it is. Do you think we have an alignment problem? I think the alignment problem… Mm-hmm. What do you mean by that? Well, if you look at even Llama 3.1 or ChatGPT or the latest Mistral models or something, you can give them ethical puzzles and ask them what the average good-hearted, intelligent person would say, and they're as good as or above the average human at doing that. And my son, he's also an AI researcher; we published a paper on that. Is that your son? Is he your son? Yeah, yeah. He's giving a poster presentation here also. No way, can you send him our love? Because he reposts… Wow. He's here. Yeah. Thank you for posting our videos. He's got a PhD in application of machine learning.