simon,
@simon@simonwillison.net avatar

I put together some detailed notes showing how I use Claude and ChatGPT as part of my daily workflow - in this case describing how I used them for a six-minute side quest: creating a GeoJSON map of the boundary of the Adirondack Park in upstate New York.
https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/
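For readers unfamiliar with the format Simon mentions, here is a minimal sketch of what a GeoJSON boundary file looks like. The coordinates below are placeholder values, not the actual Adirondack Park outline from his write-up.

```python
import json

# A minimal GeoJSON Feature with a polygon boundary. The coordinates
# are placeholders, not the real Adirondack Park outline.
boundary = {
    "type": "Feature",
    "properties": {"name": "Adirondack Park (placeholder)"},
    "geometry": {
        "type": "Polygon",
        # One linear ring: [lon, lat] pairs, first and last points equal.
        "coordinates": [[
            [-74.5, 44.0],
            [-73.8, 44.0],
            [-73.8, 44.5],
            [-74.5, 44.5],
            [-74.5, 44.0],
        ]],
    },
}

# Basic sanity check before loading the file into a map viewer:
# a GeoJSON polygon ring must be closed.
ring = boundary["geometry"]["coordinates"][0]
assert ring[0] == ring[-1], "polygon ring must be closed"

print(json.dumps(boundary)[:40])
```

A file in this shape can be dropped straight into tools like geojson.io to eyeball the result on a map.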

simon,
@simon@simonwillison.net avatar

I wrote this up in part because I'm tired of hearing people complain that LLMs aren't useful. There are many valid criticisms of them as a technology, but "not being useful" should not be one of them https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/#llms-are-useful

davidtoddmccarty,
@davidtoddmccarty@me.dm avatar

@simon @rberger Or you could hire a human.

simon,
@simon@simonwillison.net avatar

@davidtoddmccarty @rberger I don't think that works here. I'm not going to hire someone to help me make a GeoJSON boundary of the Adirondack Park on a whim - this project is one of many examples of tiny spikes of curiosity that only made sense to satisfy if there was a very quick way to try things out.

simon,
@simon@simonwillison.net avatar

@davidtoddmccarty @rberger I have been releasing open software for 20 years now, and every single line of code I release is designed to help avoid duplicate work - or "avoid hiring a human" if you look at it from a certain perspective

It's interesting how outsourcing to AI tools genuinely does feel different from other forms of productivity-enhancing software.

scottjenson,
@scottjenson@social.coop avatar

@simon I agree and thank you for writing this up. I'll be sure to check it out. I'd like to explore more reasonable uses with proper use cases but there is just so much messianic chest thumping it's hard to take it seriously. My rant a few days ago got yet another "just wait, AGI is just around the corner" silly reply.

I'm NOT saying you have any part of this. It's just when the bullshit is so thick, it's a bit hard to concentrate.

garyfleming,
@garyfleming@mastodon.social avatar

@simon I started out thinking they're useful but untrustworthy, due to the verification effort.

Then I realised that the usefulness is largely in relation to the effort taken to create external oracles to verify truth.

Generating a quick image? Useful. Generating precise data around something I need to be right? Not useful.

simon,
@simon@simonwillison.net avatar

@garyfleming Right - the trick with these things is figuring out how to use them productively despite their enormous reliability problems

I love using them for code because it's very easy to check whether it works or not - it's much easier to check that code at least runs and produces what looks to be the right output than it is to fact-check prose.

garyfleming,
@garyfleming@mastodon.social avatar

@simon I agree with your first paragraph and strongly disagree with your second.

My experience of watching people do reviews of PRs tells me almost no-one can tell if non-trivial code is correct if there aren’t good tests in place.

Plenty of other domains where LLMs work well, though - principally where output is subjective.

simon,
@simon@simonwillison.net avatar

@garyfleming Right: part of using these for real work is being incredibly effective at reviewing code and writing tests, both of which are uncommon skills

But if you're knocking out a GeoJSON boundary of a park for fun (a very low-stakes activity) the risks are pretty minimal

buherator,
@buherator@infosec.place avatar

@simon

"and it was clearly wrong" - Here's my theory: LLMs are useful if results are easy to verify.

In your example, eyeballing can easily tell whether the resulting shape is similar to the input area. As I understand it, your use case doesn't require much precision, which is totally fine, but it's important to ask how much harder your problem would get if you wanted to make sure the input and output shapes are precise matches. Would you use an LLM to write some verification code? How do you decide if that code is correct? (I think actual verification could be pretty easy in this particular case, but I wanted to stick with the example.)
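One cheap verification step along the lines buherator describes: compare a simple geometric invariant of the source polygon and the generated one, such as area via the shoelace formula. Matching areas are a necessary (though not sufficient) condition for the shapes to agree. This is an illustrative sketch with made-up rectangles, not code from Simon's write-up.

```python
def shoelace_area(ring):
    """Area of a closed [x, y] ring, computed with the shoelace formula."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Placeholder rings: the hand-traced source shape and the
# LLM-generated one we want to sanity-check against it.
source = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
generated = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]

diff = abs(shoelace_area(source) - shoelace_area(generated))
print(f"area difference: {diff}")  # 0.0 means the areas agree
```

For real geospatial data you would more likely reach for a library like Shapely and compare the symmetric difference of the two polygons, but the principle is the same: build a small, independently checkable oracle.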

simon,
@simon@simonwillison.net avatar

@buherator absolutely: the reason LLMs are so useful for code stuff is that code accuracy is easier to verify than prose, because you can run the code and see if it works

And it's still not easy! Using LLMs has encouraged me to really invest in improving my QA, code review and testing skills

buherator,
@buherator@infosec.place avatar

@simon Now let me put on my Grumpy Security Guy Hat:

Verifying code is incredibly hard. One of the main dangers of LLMs I see is that it's really easy to conclude that the code is correct because it works in the general case, but it will wreak havoc in edge cases. Worse, you won't be able to reason about those edge cases because you don't know how the code works (you can figure it out, of course, but then there goes your claimed efficiency).

Now for toy problems this is all well and good. On the other hand, we've all seen toy scripts end up in production...

simon,
@simon@simonwillison.net avatar

@buherator that's true, but honestly it's not that different from code reviewing PRs from human authors

If anything, LLM code is easier to review: it's more likely to use the simplest approach, it comes with comments that actually match what the code is doing, and there's no ego - you don't have to think for a second about whether your feedback or requests for changes will offend the author!

But you do have to be very good at code review to use these things responsibly

achilleas,
@achilleas@mastodon.social avatar

@simon But the interface is the point, isn't it? I agree that tools can be useful and difficult to learn and use but the promise of LLMs is the natural language interface. It's a tradeoff. You get a natural, intuitive interface at the expense of predictability, tractability, and precision. If the easy interface isn't there, then what's the point?

simon, (edited)
@simon@simonwillison.net avatar

@achilleas I think chat is actually a really bad default UI for this stuff, because it doesn't provide any affordances that help you understand what the tool can do and how to use it

Great piece of writing about that here: https://wattenberger.com/thoughts/boo-chatbots

achilleas,
@achilleas@mastodon.social avatar

@simon Ok, we agree there. So the question is what are we spending all this effort and energy on?

simon,
@simon@simonwillison.net avatar

@achilleas lots of hype and bluster, which is layered on top of some incredibly useful if unintuitive tools for the people who invest the time and effort to learn how to best harness them

sszuecs,
@sszuecs@hachyderm.io avatar

@simon do you have experience with creating your own model?
I feel that "useless" really depends on what you do. I work on a special HTTP proxy, Skipper, and if I let Copilot/ChatGPT generate some boilerplate code or Kubernetes YAML, it generates all kinds of things, but basically all of it is so wrong that I'm faster writing it myself. So for me it's pretty useless right now. I guess if I trained my own tiny, specific model it could help a lot, but I was never bored enough to do it.

simon,
@simon@simonwillison.net avatar

@sszuecs I tried fine-tuning GPT-3 once and didn't get great initial results - I get the impression that doing fine-tuning well requires a huge amount of patience, plus time and money that I don't want to spend.

With LLMs I feel like iterating on prompts and RAG tricks to populate the context window with relevant information has a much higher chance of working than fine-tuning a new model, at least for the kinds of things I use them for.
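The "RAG trick" Simon mentions can be sketched in a few lines: retrieve the documents most relevant to a question and paste them into the prompt, instead of fine-tuning a model. The retrieval here is naive keyword overlap on made-up documents purely for illustration; real systems typically use embeddings and a vector index.

```python
def score(question: str, doc: str) -> int:
    """Naive relevance score: count of shared lowercase words."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

# Hypothetical document store -- in practice this would be your own
# notes, docs, or code, chunked into retrievable snippets.
docs = [
    "Skipper is an HTTP router and reverse proxy for service composition.",
    "GeoJSON encodes geographic data structures using JSON.",
    "Kubernetes YAML manifests describe desired cluster state.",
]

question = "How do I write Kubernetes YAML for a proxy?"

# Pick the best-matching snippet and stuff it into the context window.
best = max(docs, key=lambda doc: score(question, doc))
prompt = f"Context:\n{best}\n\nQuestion: {question}"
print(prompt)
```

The point of the pattern is that the model answers from the supplied context rather than from whatever it memorized in training, which sidesteps the cost and fragility of fine-tuning.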

simon,
@simon@simonwillison.net avatar

@sszuecs I did build an LLM from scratch on my blog content though! Entertaining but definitely not useful https://til.simonwillison.net/llms/training-nanogpt-on-my-blog

jni,
@jni@fosstodon.org avatar

@simon “I would miss them terribly if they were no longer available to me.” This seems like a good reason to focus on open source, local models, despite their limitations.

(But, as a side note, congrats on llm — the ergonomics are wonderful.)

frank,
@frank@frankwiles.social avatar

@simon thank you for this! I struggle to get in the habit of using LLMs for these sorts of “side quests” and just seeing your flow/process has given me a few ideas where it would have been helpful in the last couple of weeks.
