simon,
@simon@simonwillison.net avatar

I put together some detailed notes showing how I use Claude and ChatGPT as part of my daily workflow - in this case describing how I used them for a six-minute side quest: creating a GeoJSON map of the boundary of the Adirondack Park in upstate New York.
https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/
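For readers unfamiliar with the format Simon mentions, here is a minimal sketch of what a GeoJSON boundary file looks like. The coordinates below are placeholder values, not the actual Adirondack Park outline from his write-up.

```python
import json

# A minimal GeoJSON Feature with a polygon boundary. The coordinates
# are placeholders, not the real Adirondack Park outline.
boundary = {
    "type": "Feature",
    "properties": {"name": "Adirondack Park (placeholder)"},
    "geometry": {
        "type": "Polygon",
        # One linear ring: [lon, lat] pairs, first and last points equal.
        "coordinates": [[
            [-74.5, 44.0],
            [-73.8, 44.0],
            [-73.8, 44.5],
            [-74.5, 44.5],
            [-74.5, 44.0],
        ]],
    },
}

# Basic sanity check before loading the file into a map viewer:
# a GeoJSON polygon ring must be closed.
ring = boundary["geometry"]["coordinates"][0]
assert ring[0] == ring[-1], "polygon ring must be closed"

print(json.dumps(boundary)[:40])
```

A file in this shape can be dropped straight into tools like geojson.io to eyeball the result on a map.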

simon,
@simon@simonwillison.net avatar

I wrote this up in part because I'm tired of hearing people complain that LLMs aren't useful. There are many valid criticisms of them as a technology, but "not being useful" should not be one of them https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/#llms-are-useful

davidtoddmccarty,
@davidtoddmccarty@me.dm avatar

@simon @rberger Or you could hire a human.

simon,
@simon@simonwillison.net avatar

@davidtoddmccarty @rberger I don't think that works here. I'm not going to hire someone to help me make a GeoJSON boundary of the Adirondack Park on a whim - this project is one of many examples of tiny spikes of curiosity that only made sense to satisfy if there was a very quick way to try things out.

simon,
@simon@simonwillison.net avatar

@davidtoddmccarty @rberger I have been releasing open software for 20 years now, and every single line of code I release is designed to help avoid duplicate work - or "avoid hiring a human" if you look at it from a certain perspective

It's interesting how outsourcing to AI tools genuinely does feel different from other forms of productivity-enhancing software.

scottjenson,
@scottjenson@social.coop avatar

@simon I agree and thank you for writing this up. I'll be sure to check it out. I'd like to explore more reasonable uses with proper use cases but there is just so much messianic chest thumping it's hard to take it seriously. My rant a few days ago got yet another "just wait, AGI is just around the corner" silly reply.

I'm NOT saying you have any part of this. It's just when the bullshit is so thick, it's a bit hard to concentrate.

garyfleming,
@garyfleming@mastodon.social avatar

@simon I started out thinking they're useful but untrustworthy, due to the verification effort.

Then I realised that the usefulness is largely in relation to the effort taken to create external oracles to verify truth.

Generating a quick image? Useful. Generating precise data around something I need to be right? Not useful.

simon,
@simon@simonwillison.net avatar

@garyfleming Right - the trick with these things is figuring out how to use them productively despite their enormous reliability problems

I love using them for code because it's very easy to check whether it works or not - it's much easier to check that code at least runs and produces what looks to be the right output than it is to fact-check prose.

garyfleming,
@garyfleming@mastodon.social avatar

@simon I agree with your first paragraph and strongly disagree with your second.

My experience of watching people do reviews of PRs tells me almost no-one can tell if non-trivial code is correct if there aren’t good tests in place.

Plenty of other domains where LLMs work well, though - principally where output is subjective.

simon,
@simon@simonwillison.net avatar

@garyfleming Right: part of using these for real work is being incredibly effective at reviewing code and writing tests, both of which are uncommon skills

But if you're knocking out a GeoJSON boundary of a park for fun (a very low-stakes activity) the risks are pretty minimal

buherator,
@buherator@infosec.place avatar

@simon

"and it was clearly wrong" - Here's my theory: LLMs are useful if results are easy to verify.

In your example, eyeballing can easily tell whether the resulting shape is similar to the input area. As I understand it, your use case doesn't require much precision, which is totally fine, but it's important to ask how much harder your problem would get if you wanted to make sure the input and output shapes are precise matches. Would you use an LLM to write some verification code? How do you decide if that code is correct? (I think actual verification could be pretty easy in this particular case, but I wanted to stick with the example.)
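One cheap verification step along the lines buherator describes: compare a simple geometric invariant of the source polygon and the generated one, such as area via the shoelace formula. Matching areas are a necessary (though not sufficient) condition for the shapes to agree. This is an illustrative sketch with made-up rectangles, not code from Simon's write-up.

```python
def shoelace_area(ring):
    """Area of a closed [x, y] ring, computed with the shoelace formula."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Placeholder rings: the hand-traced source shape and the
# LLM-generated one we want to sanity-check against it.
source = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
generated = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]

diff = abs(shoelace_area(source) - shoelace_area(generated))
print(f"area difference: {diff}")  # 0.0 means the areas agree
```

For real geospatial data you would more likely reach for a library like Shapely and compare the symmetric difference of the two polygons, but the principle is the same: build a small, independently checkable oracle.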

simon,
@simon@simonwillison.net avatar

@buherator absolutely: the reason LLMs are so useful for code stuff is that code accuracy is easier to verify than prose, because you can run the code and see if it works

And it's still not easy! Using LLMs has encouraged me to really invest in improving my QA, code review and testing skills

buherator,
@buherator@infosec.place avatar

@simon Now let me put on my Grumpy Security Guy Hat:

Verifying code is incredibly hard. One of the main dangers of LLMs I see is that it's really easy to conclude that the code is correct because it works in the general case, but it will wreak havoc in edge cases. Worse, you won't be able to reason about those edge cases because you don't know how the code works (you can figure it out, of course, but then there goes your claimed efficiency).

Now for toy problems this is all well and good. On the other hand, we've all seen toy scripts end up in production...

simon,
@simon@simonwillison.net avatar

@buherator that's true, but honestly it's not that different from code reviewing PRs from human authors

If anything, LLM code is easier to review: it's more likely to use the simplest approach, it comes with comments that actually match what the code is doing, and there's no ego - you don't have to think for a second about whether your feedback or requests for changes will offend the author!

But you do have to be very good at code review to use these things responsibly

achilleas,
@achilleas@mastodon.social avatar

@simon But the interface is the point, isn't it? I agree that tools can be useful and difficult to learn and use but the promise of LLMs is the natural language interface. It's a tradeoff. You get a natural, intuitive interface at the expense of predictability, tractability, and precision. If the easy interface isn't there, then what's the point?

simon, (edited)
@simon@simonwillison.net avatar

@achilleas I think chat is actually a really bad default UI for this stuff, because it doesn't provide any affordances that help you understand what the tool can do and how to use it

Great piece of writing about that here: https://wattenberger.com/thoughts/boo-chatbots

achilleas,
@achilleas@mastodon.social avatar

@simon Ok, we agree there. So the question is what are we spending all this effort and energy on?

simon,
@simon@simonwillison.net avatar

@achilleas lots of hype and bluster, which is layered on top of some incredibly useful if unintuitive tools for the people who invest the time and effort to learn how to best harness them

sszuecs,
@sszuecs@hachyderm.io avatar

@simon do you have experience with creating your own model?
I feel that "useless" really depends on what you do. I work on a special HTTP proxy, Skipper, and if I let Copilot/ChatGPT generate some boilerplate code or Kubernetes YAML, it generates all kinds of things, but basically all of it is so wrong that I'm faster writing it myself. So for me it's pretty useless right now. I guess if I trained my own tiny, specific model it could help a lot, but I was never bored enough to do it.

simon,
@simon@simonwillison.net avatar

@sszuecs I tried fine-tuning GPT-3 once and didn't get great initial results - I get the impression that doing fine-tuning well requires a huge amount of patience, plus time and money that I don't want to spend.

With LLMs I feel like iterating on prompts and RAG tricks to populate the context window with relevant information has a much higher chance of working than fine-tuning a new model, at least for the kinds of things I use them for.
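The "RAG trick" Simon mentions can be sketched in a few lines: retrieve the documents most relevant to a question and paste them into the prompt, instead of fine-tuning a model. The retrieval here is naive keyword overlap on made-up documents purely for illustration; real systems typically use embeddings and a vector index.

```python
def score(question: str, doc: str) -> int:
    """Naive relevance score: count of shared lowercase words."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

# Hypothetical document store -- in practice this would be your own
# notes, docs, or code, chunked into retrievable snippets.
docs = [
    "Skipper is an HTTP router and reverse proxy for service composition.",
    "GeoJSON encodes geographic data structures using JSON.",
    "Kubernetes YAML manifests describe desired cluster state.",
]

question = "How do I write Kubernetes YAML for a proxy?"

# Pick the best-matching snippet and stuff it into the context window.
best = max(docs, key=lambda doc: score(question, doc))
prompt = f"Context:\n{best}\n\nQuestion: {question}"
print(prompt)
```

The point of the pattern is that the model answers from the supplied context rather than from whatever it memorized in training, which sidesteps the cost and fragility of fine-tuning.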

simon,
@simon@simonwillison.net avatar

@sszuecs I did build an LLM from scratch on my blog content though! Entertaining but definitely not useful https://til.simonwillison.net/llms/training-nanogpt-on-my-blog

jni,
@jni@fosstodon.org avatar

@simon “I would miss them terribly if they were no longer available to me.” This seems like a good reason to focus on open source, local models, despite their limitations.

(But, as a side note, congrats on llm — the ergonomics are wonderful.)

frank,
@frank@frankwiles.social avatar

@simon thank you for this! I struggle to get in the habit of using LLMs for these sorts of “side quests” and just seeing your flow/process has given me a few ideas where it would have been helpful in the last couple of weeks.
