This was for OpenWrt. So far my forays into AI have led me to believe that it is always helpful, and always wrong in some way. I was a little dismayed yesterday when Gemini asked me which of two versions of a response I liked better. Both were wrong, so I got to choose the lesser of two weevils, and Gemini presumably now thinks one of them was right. Post-training tripe.
Yeah, it's only about 75% right. What I mean by that is that for a simple question the answer might be right almost 100% of the time. When you start asking more complex questions (building code, etc.), you'll notice small mistakes if you have prior subject knowledge or fact-check, though it will still be mostly right. What I often find is that even when I tell it something, it will misquote what I said. I'll go back, make sure I was right, point out what I actually said, and it will apologize for the mistake and correct it. Six months ago I may not have been asking questions that were as hard, and the models I was working with weren't as good at acknowledging mistakes. That's something the cloud models have started fixing over the last few months, along with a number of other issues, which is making AI more viable.
If you are using Gemini, my suggestion is to run AnythingLLM Desktop (or the Docker version if you're familiar with it and want access from multiple systems). You can plug in a Gemini API key and, at least currently, use the paid online models for free. Within AnythingLLM you can adjust model parameters to make it more or less creative (temperature), make the output more deterministic, give the model search capabilities that proxy through your PC, and enable RAG (documents that you feed the system). AnythingLLM simplifies all of these things for me. Some of this is already available using the regular Google tools, like pointing it at your Drive, email, etc. AnythingLLM gives you a little more granular management on a per-chat basis, and you don't have to run the models locally, although you have the option to do so.
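If you're curious what the temperature knob actually maps to, here's a rough sketch of calling the Gemini API directly from Python with the google-generativeai package. AnythingLLM is doing the equivalent for you behind the scenes; the model name and prompt here are just placeholders, and the key comes from Google AI Studio:

```python
# Rough sketch: calling the Gemini API directly with a temperature setting.
# Model name and prompt are placeholders; adjust to whatever the free API
# tier currently offers.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # key from Google AI Studio

model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Summarize what OpenWrt's SQM package does.",
    generation_config={
        "temperature": 0.2,  # lower = less "creative", more deterministic
        "top_p": 0.9,
    },
)
print(response.text)
```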
For local AI, AnythingLLM is sort of a front end for me; then you need what I refer to as a runner, like Ollama, llama.cpp, or LM Studio. I was using Ollama as the runner. It's easy to set up, but they haven't developed AMD graphics support the way I'd hoped, so I've switched to LM Studio. I find it quite convenient, although it probably has more options than I need right now.
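As a sketch of how the front end talks to a runner: LM Studio (and Ollama) expose an OpenAI-compatible local server, so anything that speaks that API, AnythingLLM included, can point at it. This assumes LM Studio's default port and uses a placeholder model name:

```python
# Rough sketch: talking to a local runner (LM Studio here) through its
# OpenAI-compatible server. Port 1234 is LM Studio's default; Ollama exposes
# a similar endpoint on 11434.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio local server
    api_key="not-needed",                 # local runners ignore the key
)

reply = client.chat.completions.create(
    model="llama-3.1-8b-instruct",        # placeholder: use whatever model you've loaded
    messages=[{"role": "user", "content": "Explain what a VLAN is in two sentences."}],
    temperature=0.3,
)
print(reply.choices[0].message.content)
```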
Even with the 8B models you get pretty good accuracy compared to most cloud models; you don't lose a lot of accuracy despite having far fewer parameters, and those cloud models are typically over 100B, some 1,000B or higher. What I've found is that more training data doesn't necessarily mean a better model, especially with distilled models, sort of like how more GHz on a processor doesn't always translate to better performance. If you have a smaller GPU or just an APU, you can use the smaller 4B models (these will run locally on CPU), and there are models for phones that are under 1B... The rule of thumb is:
4B ≈ high school level
8B ≈ college level
14B ≈ graduate level
Anything above that I would say is capable of working at something like a professional level. What you'll find as you go up is fewer mistakes, slightly better capabilities, and maybe fewer hallucinations (not always).
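To put those sizes in hardware terms, here's a back-of-the-envelope sketch. The 4-bit quantization and the overhead figure are assumptions on my part, and real usage varies with the quant and context length:

```python
# Back-of-the-envelope sketch for whether a model fits in your GPU/RAM.
# Quantized weights take roughly (params * bits / 8) bytes, plus some
# overhead for context / KV cache. These are rough assumptions, not exact
# figures for any specific model.
def approx_memory_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for size in (4, 8, 14, 70):
    print(f"{size}B @ 4-bit: ~{approx_memory_gb(size):.1f} GB")

# Roughly: 4B ~ 3.5 GB (fine on an APU or CPU), 8B ~ 5.5 GB, 14B ~ 8.5 GB,
# 70B ~ 36.5 GB (big GPU or heavy offload territory).
```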

I'm more interested in running them locally; we don't need all these datacenters, the tracking, etc. This is part of how we get RAM prices back to normal and get rid of the tracking and all the other issues we're seeing. Leave some folks holding the bag....