@jnfingerle I don't know. I'm already at the "it's sad" stage. It is already a pain to use web search engines these days, because they dig up garbage that was generated by LLMs that have been fed garbage. We've lost so much depth already, and people still think it's funny? @dfeldman
@remenca@dfeldman this is actually a very important criticism — systems should not work on "garbage in, garbage out". As far as possible, they should work on "garbage in, error message out". That the system is not only incapable of spotting the mistake but actually in the first sentence affirms that the input was correct means you can't trust it. If its only failure mode is to confidently make an incorrect diagnosis, then how can you ever trust anything it says isn't just that?
@andrewt@dfeldman There are plenty of systems that do not report "error message out" and are still useful. It kind of reminds me of that story of the old woman who put her cat into a microwave oven, and now all microwave ovens come with explicit instructions not to put cats into them. GPT-4 has a big warning on its front page about not trusting it, too. Yet the author tries anyway and feigns surprise when the expected outcome occurs. This is not about the system but the user.
@remenca@dfeldman this is the issue though, right? A microwave is marketed as a cooker, AI is marketed as a general purpose trustworthy answer machine. It has the "do not trust the answer" disclaimer only for the same reason that psychic hotlines have a "for entertainment purposes only" disclaimer — so they can blame the user when something goes wrong. If you aren't meant to pay any attention to its answers, then what is the point of it?
@andrewt@remenca@dfeldman the real answer, if you read the model cards, is more complex, too much for your average user. To put it simply though: depending on what model is in use, what you'll receive is a statistically likely response based on whatever you put in as input, and only drawing on tags matching what that particular model has been trained on. The less data that exists on a specific thing, the less likely it gets it right, so off-the-wall inputs will often result in crazy outputs.
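The "statistically likely response" point above can be sketched in a toy form. This is purely an illustration, not any real model's code: the token list and logit scores are made up, but the mechanism (softmax over scores, then weighted sampling) is how likely-but-not-guaranteed outputs arise.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(tokens, logits, rng):
    """Pick a token in proportion to its probability, not its truth."""
    probs = softmax(logits)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to candidate next tokens.
tokens = ["plausible", "less-likely", "garbage"]
logits = [4.0, 2.0, 0.1]
rng = random.Random(0)
print(sample_next(tokens, logits, rng))  # usually "plausible", but not always
```

The key property is that the most probable answer comes out most often, yet low-probability (and possibly nonsensical) answers still appear occasionally, which is exactly the "crazy outputs on off-the-wall inputs" behavior described.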
@raptor85@remenca@dfeldman yeah, but like, that's exactly the problem, isn't it? I was once in a clinical trial looking at gingivitis and we said in the protocol we'd look at the central gingival margin — the bit of gum between the two front teeth. Then a subject turned up who had three front teeth. We had to make a call on what to do. An AI would have spat out some utter nonsense. You can't simply insist that the real world conform to the assumptions made when training the system, because it will always find a way not to, and any system you use has to be able to handle that.
@andrewt@remenca@dfeldman kind of, I'd say it's a failure on google/meta's part where the public is assuming that a generic model is good at everything, plus models are VERY sensitive to prompt formatting (natural language works, but it's not ideal). You could quite easily train a model for your case specifically, on data and images of human teeth; with enough well-tagged inputs and exclusion of outside noise it would do an exceptionally good job of describing new images thrown at it.
@raptor85@remenca@dfeldman sure, but only if those images were like the ones it was trained on — if something unexpected turned up, which it definitely would, the model would not only fail, but fail in a completely random direction with no indication that it had done so. That's clearly much worse than if you had a human dentist look at the image and make an assessment, because the human would (a) know something was wrong and (b) respond sensibly
@andrewt@remenca@dfeldman not really true at all, that's again more of a problem specific to google/meta's generic settings in their web front end. Most models, when used in combination with good settings/prompting, will tell you they can't make sense of an input; it's quite easy to determine that an input simply doesn't have enough data matching the tags to be reliable. You can't really take ChatGPT's generic "google it and write a reasonable answer" settings as how everything works.
@raptor85@remenca@dfeldman I mean yes and no, like I'm sure a properly trained model designed exclusively to detect things in MRI scans could reliably reject scans that are actually fish, my worry is images of patients with unrelated benign tumours, or who've had strokes and had to rewire things around the damage — things that closely resemble the training data, but differ from it in important ways that a human would understand and an AI model cannot. And there's no real way to know if you've accounted for all of them because as I say, the real world will throw some pretty unlikely things at your system if you use it long enough.
@andrewt@raptor85@dfeldman you can account for that by having a good test set on which you can evaluate the model. Obviously, the unseen cannot be accounted for, but that also applies to humans. If a human doctor who is an expert on, let's say, stomachs is presented with a picture of a lung, he will fail or succeed depending on whether the disease he is trying to find also appears in stomachs or not. I don't think this is too different for machines.
@remenca@raptor85@dfeldman no, if a human doctor who is an expert in stomachs is shown a picture of a lung, he will say "that's a lung, go ask Jane, she knows about lungs" because human doctors are intelligent beings with life experience beyond a million pictures of stomachs
@remenca But they will still have pointed out that it's a lung and not their expertise, instead of confidently blabbing BS like the stochastic parrots they call "AI"/"AGI" do.
@wonka@andrewt@raptor85@dfeldman
I love how you guys use "statistical parrots" as an insult without realizing that we humans are exactly the same, hahaha.
Trial-and-error, which is the basis of all learning, is just stochastic gradient descent in an abstract form.
We can model a human as a function $h: \mathcal{X} \to \mathcal{Y}$, $h \in \mathcal{H}$, such that for an input $x$ it produces an output $\hat{y}$. There exists a functional $\mathcal{L}$ that takes an $h$ and produces a positive measure of error. The human then updates their behavior to minimize that error, doing the opposite of whatever caused it.
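The abstract gradient-descent picture above can be sketched in a few lines. This is a toy illustration of the analogy (a one-parameter "behavior" and a hand-coded quadratic loss), not a claim about how learning actually works in brains:

```python
def loss(w, target=3.0):
    """Error of the current behavior w: zero when w matches the target."""
    return (w - target) ** 2

def grad(w, target=3.0):
    """Gradient of the loss with respect to the behavior parameter."""
    return 2 * (w - target)

w = 0.0    # initial behavior
lr = 0.1   # learning rate: how strongly we react to each error
for _ in range(100):
    w -= lr * grad(w)  # nudge behavior opposite the error gradient
print(round(w, 3))  # prints 3.0: behavior has converged toward the target
```

Each pass is one "trial": act, measure the error, adjust against it — which is the trial-and-error/SGD correspondence the post is gesturing at.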
@andrewt@remenca@dfeldman and that's really the job of the software involved, realistically if you were implementing a system for something critical like this the real workflow is to sort, flag, add descriptions, then put anything questionable at high priority on top for a doctor to verify. Remember the models themselves are basically just tags and math; you still need to WRITE the software that uses them
@raptor85@remenca@dfeldman but that's exactly the point that was being made, isn't it? Nobody's saying there's no place for basic image tagging algorithms, the argument is against slapping a language model into something important and acting like it's a replacement for human judgement, which is the main drive of the "AI" "industry" at the moment. And we figured out image tagging years ago, it's not some newfangled pipedream, I've got it on my phone and it works pretty well.
I am saying that the ai industry has invented a bullshitting machine and is trying to foist it into everything because that's how it makes money and I think this is a bad idea
I do not think language models should be allowed to make diagnoses. I do not think random untrained humans should be allowed to make diagnoses. I think doctors should make diagnoses.
@andrewt@raptor85@dfeldman I don't want to delve into what it means to be intelligent or not. I'm just saying that there is a machine that gets a diagnosis right more often than a human doctor does.
@remenca@raptor85@dfeldman can the machine talk to a patient, ask questions, examine them, work out what scan needs to be done, suggest medicine, discreetly enquire if those bruises are really from falling down the stairs and offer a comforting bedside manner? There is more to diagnosis than analysing images and you do need fully general intelligence and a knowledge of the world outside the human body to do it properly
@andrewt@remenca@dfeldman personally I think this is asking the wrong question; while to a degree the answer to most of this is "yes", it assumes the process fully replaces having doctors, when in practice it would be more efficiently used as a way to help doctors prioritize and get the right information into their hands faster, without them having to manually sort through all the information themselves.
@raptor85@remenca@dfeldman ok but the question was specifically about making a diagnosis, and that's what "making a diagnosis" is. And we're so far away from building a machine that can do it it's laughable. Current technologies and future iterations on them can doubtless help a human do it, but they can't do it
@andrewt@remenca@dfeldman While I wouldn't trust a current-gen system just yet to be accurate enough on its own, I'd be cautious about assuming future iterations couldn't be. Machines are exceptionally good at finding patterns, which is in the end what a diagnosis is based on; if I had to wager, I'd give it 5 years before there's some level of automated urgent care centers. Don't forget we're doing things now that were widely considered almost impossible 5 years ago
@andrewt@raptor85@dfeldman Of course, if we are entitled to select only the cases in which our claims work perfectly, there would not be much of a discussion, don't you think? My point here is: if we train a network to process MRIs or any other medical problem, and it turns out that it works better than the human doctors, what do we do?
@remenca@andrewt@dfeldman or, better, train a model to sort thousands of MRIs and find ones with a high likelihood of critical issues for immediate review, which is a more likely and immediately useful use case.
@andrewt@raptor85@dfeldman I think this is relevant because gpt-4 failing to process an MRI and the entire field of AI failing at the same task are different things. You are using the first failure to generalize to every AI. This is an inappropriate generalization fallacy.
@remenca@raptor85@dfeldman I don't think anyone anywhere is saying that all machine learning algorithms are uniformly terrible at everything. The current pushback against ai is fairly specifically against the "use as much energy as a small town to create a plagiarism machine and push it as the solution to everything" model of ai being pushed at the moment by the grifters who got out of crypto and still have a warehouse full of graphics cards to think of a use for
@andrewt@raptor85@dfeldman I do agree that capitalism is using AI in the worst possible way, yes. I am against many of its uses. But that does not mean that AI is useless, only that we should get it out of the hands of our capitalist overlords.
@remenca@raptor85@dfeldman I think it means it's reasonable and fair to highlight the shortcomings of the grift kind of ai using funny pictures of fish, though
@andrewt@remenca@dfeldman while it is funny you can see how the conversation quickly turns to "ALL AI IS EVIL AND BAD, people using it should be treated as criminals!" though with many people making claims that frankly have little basis in reality, even with an example of someone simply using a tool in an unexpected way in an attempt to force it to output a bad result. Basically drumming up a lot of drama over nothing
@andrewt@remenca@dfeldman this is about as accurate as saying "all doctors are just witches using leeches and snake oil". I can't really blame you though, misinformation travels around like WILD about this industry. The truth is most models can run on boards that use less than 10 watts, licenses for input data are HIGHLY policed in most models, and the only people pushing it as a solution to everything are idiots unrelated to development.
@andrewt@raptor85@dfeldman you don't know what an AI will do. If the AI has seen pictures of other people with three front teeth, it will probably get it right. If not, it will likely fail, or maybe it will be able to generalize from cases with two teeth, like you did.
@remenca@raptor85@dfeldman right so EITHER you need to specifically train it on the edge cases that almost never come up OR it's going to guess in a haphazard way. What we did was to flag up the issue, discuss it, and work out if we needed to exclude the data, and note in the publication what we did about it. (I forget what the final decision was.) If the AI takes a guess then it won't tell you it's done it or justify its approach, it'll just spit out a number and nobody will ever know the input was weird
@andrewt@remenca@dfeldman You're basing a (wrong) assumption about how all the software works on a single example of a generic implementation that's designed for entertainment and getting people to watch ads. The software using the model does literally whatever you program it to do. To put it into the context of doctors, it's like basing your understanding of how hospitals work on having seen an episode of "House".
@andrewt@raptor85@dfeldman I don't think it is so different. After all, you tried to come up with an answer from your previous knowledge. The network will do the same. You could train it to output a confidence level if you want to. But still, it would be very much the same in my eyes.
@remenca@raptor85@dfeldman I'm not convinced by "confidence level" stuff — it's not really a confidence level, is it, it's just another output. When ChatGPT says it doesn't know someone it doesn't mean it doesn't know, because it never knows, it means "this is the sort of question that I've been trained to think 'i don't know' is an acceptable answer to"
@andrewt@raptor85@dfeldman I'm not talking about chatgpt. I'm talking about things like bayesian networks or VAEs that allow you to output a confidence score with each prediction. You could augment an LLM with some of that.
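One cheap version of the kind of confidence signal meant here is ensemble disagreement: ask several models, and if they disagree, flag the input as unreliable instead of answering. The sketch below is a toy illustration (the three "models" are hypothetical stand-in rules, not real networks, and this is disagreement-based uncertainty rather than an actual Bayesian network or VAE):

```python
import math

# Hypothetical stand-ins for three independently trained binary classifiers.
models = [
    lambda x: 1 if x > 0.5 else 0,
    lambda x: 1 if x > 0.4 else 0,
    lambda x: 1 if x > 0.9 else 0,
]

def predict_with_confidence(x):
    """Return (label, disagreement) where disagreement is binary entropy in bits:
    0.0 when the ensemble is unanimous, up to 1.0 when it is evenly split."""
    votes = [m(x) for m in models]
    p = sum(votes) / len(votes)  # fraction of models voting "1"
    if p in (0.0, 1.0):
        entropy = 0.0
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return round(p), entropy

print(predict_with_confidence(0.95))  # unanimous: (1, 0.0)
print(predict_with_confidence(0.45))  # models disagree: nonzero entropy
```

A system built this way can route high-entropy inputs to a human instead of emitting a confident guess, which is the failure mode being argued about upthread.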
@andrewt@remenca@dfeldman I think I see the disconnect here: when I'm talking about AI and models, LLMs are one small subset. You wouldn't identify pictures with a language model; you'd have an image recognition model for that. Sure, if you want to format your output in plain English, you can also use a language model for that, but that's again up to whoever designs the system, it's by no means required. (For instance, I have a model I use that produces tag clouds for input images.)
@caiocgo@singe@blogdiva@dfeldman I got one of these so-called AIs to write a timeline of mobile telephones. It didn't have the full history, so when I told it about Carterphone it gave me the "My apologies" line, and I kept correcting it with progressively more ludicrous statements and it never once called me out. At the end, I had it telling me about Alcibiades inventing the first mobile phone to help win the Peloponnesian War.
@mike_k@dfeldman
Actual salmon expert here. This salmon is not experiencing any pain. It's perfectly normal and dead. All it needs is a bit of lemon vinaigrette.