I have a lot of thoughts on “AI”, but here are a couple quick observations from giving it a fair shake as a tool in the belt of a software developer:
- For well-documented tech (but think machine-friendly well documented, not necessarily the really non-technical, human-approachable stuff… looking at you, TypeScript), GPT can really shine. A few examples where I have personally had good results:
- help building a regex (look, could I go relearn regex syntax for the 60th time? yes… but maybe I never will again); a sketch of the sort of thing I mean follows this list
- Apache ECharts configuration
- TypeScript assistance
- MUI theme settings / customization
- For tech that is a little more… I dunno, “move fast and break things”? It can really lead you astray by recommending things that are out of date or just wrong. I was looking for a way to get Next.js to run a local env with HTTPS, and went down a wild goose chase of installing proxy servers etc., but after a while I was like “this feels ridiculous…” and just used plain search and immediately found a new, but not all that new, setting in the Next.js docs that got me just what I was going for with a single startup parameter (sketched below).
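To make the regex point concrete, here’s a minimal sketch of the sort of thing I mean (the pattern and test strings are illustrative, mine, not an actual GPT transcript):

```ts
// The kind of regex I'd rather describe in plain English than relearn:
// match a semantic version like "5.0.0" with an optional pre-release tag.
const semver = /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$/;

console.log(semver.test("5.0.0-next.1")); // true
console.log("3.2.1".match(semver)?.slice(1, 4)); // ["3", "2", "1"]
```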
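And on the Next.js HTTPS point, the fix looked roughly like this. I believe the setting in question is the `--experimental-https` flag, but treat the exact name as a reconstruction and check the Next.js docs for your version:

```bash
# Serve the local dev environment over HTTPS with an auto-generated
# self-signed certificate (flag name assumed; verify against the docs)
next dev --experimental-https
```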
Maybe these two observations boil down to one: you can trust LLMs a lot more for information about projects with robust, stable documentation. Specifically, if the tool/library has had a major revision, the model seems to just guess at the (usually older?) version. I just tried this with Svelte. All the suggestions were based on v3, whereas v5 is new but also very well documented, and has some fundamental differences. It didn’t tell me its suggestions assumed v3 as context, and, presumably because the v5 material has a lot less training data behind it, even when I gave it that context its suggestions were not as strong and it mixed in a few v3-type things that weren’t correct.
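For a sense of how fundamental those differences are, here’s a minimal counter sketch (my example, not one of the model’s actual suggestions): Svelte 5’s runes replace the `$:` label-based reactivity that dominates v3-era training data:

```svelte
<script>
  // Svelte 5 runes, what the current docs teach
  let count = $state(0);
  let doubled = $derived(count * 2);

  // The Svelte 3 equivalent, which is what the LLM kept producing:
  // let count = 0;
  // $: doubled = count * 2;
</script>

<!-- v5 also uses plain onclick where v3 used on:click -->
<button onclick={() => count++}>{count} doubled is {doubled}</button>
```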
And of course, yes, providing more context, either as explicit context input or via the prompt, helps, for sure. But 1) you have to know, and remember, to do that if you’re asking about something that might have that sort of variability or instability, and 2) (guessing here) because LLMs don’t actually have any sort of solid data model of the world, just a statistical model built around language use, things like software versions are much more permeable boundaries than you might expect.
Leaving aside all questions of ethics, which include questionably sourced training data and hidden energy costs, LLMs can be a useful tool. The caveat is that understanding their strengths and weaknesses is crucial to using them effectively - same as any tool, I suppose.
If you want more similarly ambivalent commentary, this was a good post.
Add-on / updates
If you want a solid take on why the hype around AI is what it is in spite of its obvious limitations, this was a very insightful thread.
If you’d like a similar (but more in-depth and more negatively concluding) take after giving Copilot a fair shake for a while, check this one out - the followup discussion is worth digging into as well. I’d specifically highlight this quote: “The benefits of being able to generate correct code faster 80% of the time are small but the costs of generating incorrect code even 1% of the time are high. The entire shift-left movement is about finding and preventing bugs earlier.” This is a really, really important point, I think. This sort of cost-benefit analysis is crucial and often exactly what is missing from discourse on this topic! On the other hand, without intending to diminish it at all, it sort of characterizes all programming as equally serious, which obviously isn’t true. There is plenty of programming where “the costs of generating incorrect code even 1% of the time are” pretty inconsequential, actually.
If not enough of the training data gets an obsolete flag…
Recently I encountered a situation where a co-worker used ChatGPT to get started on a color breakdown problem for a data viz chart. The tricky bit was that the number of color steps varied, so a fixed set of colors didn’t work. ChatGPT suggested color libs from npm and other JS-centric solutions, but modern CSS has color-mix, and that was a pretty perfect fit in this case (see the sketch below). But thinking about training data sets, it is predictable that ChatGPT would suggest what it did, and in that is a lesson. Of course LLMs are going to err on the side of outdated approaches, since more of that content exists in their training data, and very little of that old content was ever updated with something like “you shouldn’t do it this way anymore, new and better ways exist now”.
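For illustration (my own minimal sketch, not my co-worker’s actual code), here’s how you might generate a variable number of steps with color-mix from TypeScript instead of pulling in a color library:

```ts
// Build `n` CSS color strings stepping from a base color toward a target,
// leaning on the browser's color-mix() instead of an npm color library.
// (Names and colors here are illustrative, not from the real chart.)
function colorSteps(base: string, target: string, n: number): string[] {
  return Array.from({ length: n }, (_, i) => {
    const pct = n === 1 ? 0 : Math.round((i / (n - 1)) * 100);
    return `color-mix(in oklab, ${base}, ${target} ${pct}%)`;
  });
}

// However many steps the chart needs, the scale adapts:
const fiveSteps = colorSteps("#1f77b4", "white", 5);
// e.g. "color-mix(in oklab, #1f77b4, white 25%)", ... up to 100%
```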