HN comments for: Anthropic Claude 3.5 can create icalendar files, so I did this

wenc

69 replies

12h44m

2024-08-25 05:44:25 UTC

You just have to double check the results whenever you tell Claude to extract lists and data.

99.9% of it will be correct, but sometimes 1 or 2 records are off. This kind of error is especially hard to notice because you're so impressed that Claude managed to do the extract task at all -- plus the results look wholly plausible upon eyeballing -- that you wouldn't expect anything to be wrong at all.

But LLMs can get things ever slightly wrong when it comes to long lists/tables. I've been bitten by this before.

Trust but verify.

(edit: if the answer is "machine verifiable", one approach to ask an LLM to write a Python validator which it can execute internally. ChatGPT can execute code. I believe Sonnet 3.5 can too, but I haven't tried.)

disillusioned

44 replies

10h16m

2024-08-25 08:12:28 UTC

The "off by one" predilection of LLMs is going to lead to this massive erosion of trust in whatever "Truth" is supposed to be, and it's terrifying and going to make for a bumpy couple of years. (Or the complete collapse of objective knowledge, on a long enough time horizon.)

It's one thing to ask an LLM when George Washington was born, and have it return "May 20, 2020." It's another thing to ask it, and have it matter-of-factly hallucinate "February 20, 1733." At first glance, that... sounds right, right? President's Day is in February, and has something to do with his birthday? And that year seems to check out? Good enough!

But it's not right. And it's the confidence and bravado with which LLMs report these "facts" that's terrifying. It just misstates information, calculations, and detail work, because the stochastic model compelled it to, and there wasn't sufficient checks in place to confirm or validate the information.

Trust but verify is one of those things that's so paradoxical and cyclical: if I have to confirm every fact ChatGPT gives me with... what I hope is a higher source of truth like Wikipedia, before it's overrun with LLM outputs... then why don't I just start there? If I have to build a validator in Python to verify the output then... why not just start there?

We're going to see some major issues crop up from this sort of insidious error, but the hard part about off-by-ones is that they're remarkably difficult to detect, and so what will happen is data will slowly corrupt and take us further and further off course, and we won't notice until it's too late. We should be so lucky that all of LLMs' garbage outputs look like glue on pizza recommendations, but the reality is, it'll be a slow, seeping poisoning of the well, and when this inaccurate output starts sneaking into parts of our lives that really matter... we're probably well and truly fucked.

stavros

31 replies

8h48m

2024-08-25 09:40:38 UTC

This is semi-offtopic, but "trust but verify" is an oxymoron. Trusting something means I don't have to verify whether it's correct (I trust that it is), so the saying, in the end, is "don't verify but verify".

cromka

11 replies

8h40m

2024-08-25 09:49:02 UTC

A better phrase would be “Use it but verify”, simply.

stavros

10 replies

8h38m

2024-08-25 09:51:10 UTC

Yes, which boils down to "verify".

8 replies

7h47m

2024-08-25 10:42:21 UTC

It's possible to trust (or have faith) in my car being able to drive another 50k miles without breaking down. But if I bring it to a mechanic to have the car inspected just in case, does that mean I never had trust/faith in the car to begin with?

"I trust my coworkers write good code, but I verify with code reviews" -- doing code reviews doesn't mean you don't trust your coworker.

Yet another way to look at it: people can say things they believe to be true but are actually false (which isn't lying). When that happens, you can successfully trust someone in the sense that they're not lying to you, but the absence of a lie doesn't guarantee a truth, so verifying what you trust to be true doesn't invalidate your trust.

stavros

6 replies

7h41m

2024-08-25 10:47:42 UTC

We're getting into the definition of trust, but to me trust means exactly "I don't need to verify".

If I say I trust you to write correct code, I don't mean "I'm sure your mistakes won't be intentional", I mean "I'm sure you won't have mistakes". If I need to check your code for mistakes, I don't trust you to write correct code.

I don't know anyone who will hear "I trust you to write correct code, now let me make sure it's correct" and think "yes, this sentence makes sense".

rpdillon

2 replies

5h51m

2024-08-25 12:38:09 UTC

to me trust means exactly "I don't need to verify".

If you use the slightly weaker definition that trust means you have confidence in someone, then the adage makes sense.

stavros

1 replies

5h2m

2024-08-25 13:26:39 UTC

The issue here is that the only value of the adage is in the sleight of hand it lets you perform. If someone asks "don't you trust me?" (ie "do you have to verify what I do/say?"), you can say "trust, but verify!", and kind of make it sound like you do trust them, but also you don't really.

The adage doesn't work under any definition of trust other than the one it's conflicting with itself about.

rpdillon

0 replies

1h1m

2024-08-25 17:27:42 UTC

I think I just provided an example where it makes sense.

Specifically: I have confidence in your ability to execute on this task, but I want to check to make sure that everything is correct before we finalize.

2 replies

7h29m

2024-08-25 11:00:16 UTC

“I trust that you believe your code is correct, now let’s double check”.

Or maybe the proverb needs to be rewritten as “feign trust and verify”

Lerc

1 replies

6h32m

2024-08-25 11:56:46 UTC

Or assume good faith but, since anyone can make mistakes, check the work anyway.

That's a bit wordy but I'm sure someone can come up with a pithy phrase to encapsulate the idea.

stavros

0 replies

5h2m

2024-08-25 13:26:57 UTC

"Trust, but verify"?

touisteur

0 replies

7h3m

2024-08-25 11:26:15 UTC

It's a matter of degrees. Absolute trust is a rare thing, but people have given examples of relative trust. Your car won't break down and you can trust it with your kids' life, almost never challenging its trustworthiness, but still you can do checkups or inspections, because some of the bult-in redundancies might be strained. Trusting aircraft but still doing inspections. Trusting your colleagues to do their best but still doing reviews because every fucks up once in a while.

The idea of trusting a next-token-predictor (jesting here) is akin to trusting your System 1 - there's a degree to find where you force yourself to enable System 2 and correct biases.

EGreg

0 replies

8h31m

2024-08-25 09:57:55 UTC

I always wondered about that

Diti

4 replies

8h10m

2024-08-25 10:18:35 UTC

French armed forces have a better version of this saying. “Trust does not exclude control.” They’re still going to check for explosives under cars that want to park in French embassies.

RandomThoughts3

2 replies

7h34m

2024-08-25 10:54:31 UTC

It’s interesting to notice that etymologically speaking the French and English words have completely different roots and therefore evokes slightly different ideas which are lost in translation.

Trust shares its root with truth. It’s directly related to believing in the veracity of something.

Confiance comes from the Latin confidere which means depositing something to someone while having faith they are going to take good care of it. The accent is on the faith in the relationship, not the truthfulness. The tension between trust and control doesn’t really exist in French. You can have faith but still check.

mathgeek

1 replies

6h27m

2024-08-25 12:02:10 UTC

Trust shares its root with truth. It’s directly related to believing in the veracity of something.

Would you mind sharing your reference on that? All the etymology sites I rely on seem to place the root in words that end up at "solid" or "comfort".

RandomThoughts3

0 replies

6h6m

2024-08-25 12:23:13 UTC

Definitely and that’s not incompatible with what I’m saying.

You are indeed looking far back to the Proto-Indo-European where words are very different and sometimes a bit of guesses.

If you look at the whole tree, you will see that both trust, truth and true share common Germanic roots (that’s pretty obvious by looking at them) which is indeed linked with words meaning “solid” and then “promise, contract”.

What’s interesting is that the root is shared between “truth” and “trust” while in French it’s not (vérité from veritas vs confiance from confere).

hoistbypetard

0 replies

6h56m

2024-08-25 11:32:56 UTC

I think a better translation of "control" in that saying is "checking" or "testing". "Control" in present-day English is a false cognate there.

IWeldMelons

3 replies

7h3m

2024-08-25 11:25:41 UTC

This is not quite true. "Trust" is to give a permission for someone to act on achieving some result. "Verify" means assess the achieved result, and correct aposteriori the probability with which said person is able to achieve the abocementioned result. This is the way Bayesian reasoning works.

Trust has degrees. What you have brought is "unconditional trust". Very rarely works.

stavros

2 replies

5h11m

2024-08-25 13:18:13 UTC

"Trust" is to give a permission for someone to act on achieving some result.

This would make the sentence "I asked him to wash the dishes properly, but I don't trust him", as your definition expands this to "I asked him to wash the dishes properly, but I didn't give him permission to achieve this result".

If you say "I asked someone to do X but I don't trust them", it means you aren't confident they'll do it properly, thus you have to verify. If you say "I asked him to do X and I trust him, so I don't need to check up on him", it's unlikely to leave people puzzled.

It's surprising to me to see this many comments arguing against the common usage of trust, just because of a self-conflicting phrase.

neom

1 replies

4h58m

2024-08-25 13:30:51 UTC

Why could I not say "I trusted him to do the dishes properly, after he was done, I verified, it's a good thing I trusted him to do the dishes properly, my supervision would have been unwarranted and my trust was warranted?"

I trusted someone to do their task correctly, after the task was done, I verified my trust was warranted.

stavros

0 replies

1h38m

2024-08-25 16:50:45 UTC

What would be different if you didn't trust them to do it correctly?

survirtual

2 replies

7h20m

2024-08-25 11:09:12 UTC

Is it an oxymoron to generate an asymmetrical cryptographic signature, send it to someone, and that someone verify the signature with the public key?

Why not just "trust" them instead? You have a contact and you know them, can't you trust them?

This is what "trust but verify" means. It means audit everything you can. Do not really on trust alone.

An entire civilization can be built with this methodology. It would be a much better one than the one we have now.

stavros

0 replies

7h16m

2024-08-25 11:13:06 UTC

Is it an oxymoron to generate an asymmetrical cryptographic signature, send it to someone, and that someone verify the signature with the public key?

Of course not. I verify because I don't trust them.

Why not just "trust" them instead? You have a contact and you know them, can't you trust them?

No, the risk of trust is too high against the cost of spending a second verifying.

This is what "trust but verify" means. It means audit everything you can. Do not really on trust alone.

Your comment just showed an example of something I don't trust and asked "why not trust instead"? The question even undermines your very point, because "why not trust them instead?" assumes (correctly) that I don't trust them, so I need to verify.

TeMPOraL

0 replies

6h45m

2024-08-25 11:43:31 UTC

An entire civilization can be built with this methodology. It would be a much better one than the one we have now.

No, it wouldn't. Trust is an optimization that enables civilization. The extreme end of "verify" is the philosophy behind cryptocurrencies: never trust, always verify. It's interesting because it provides an exchange rate between trust and kilowatt hours you have to burn to not rely on it.

msabalau

2 replies

5h4m

2024-08-25 13:24:29 UTC

Pragmatically, the statement was made famous in English by a conservative US president, addressing the nation, including his supporters, who trusted him, but not the Soviets with whom he was negotiating.

Saying, in effect: "you trust in me, I'm choosing to trust that it makes sense to make agreement with the USSR, and we are going to verify it, just as we would with any serious business, as is proverbially commonsensical" is a perfectly intelligible.

There is nothing cunning about clinging to a single, superficial, context free reading of language.

Human speech and writting is not code, ambiguity and containing a range of possible meanings is part of its power and value.

stavros

0 replies

4h56m

2024-08-25 13:32:43 UTC

So "trust me, but verify others"? Where have you seen this adage used in this sense? It's not even used like that in the original Russian, where Reagan lifted it from.

AmericanChopper

0 replies

3h33m

2024-08-25 14:55:37 UTC

I think that’s a rather peculiar interpretation. I always thought it was pretty obvious that Reagan was just saying that he didn’t trust the soviets, and found a polite excuse not to in the form of the Russian proverb.

jrflowers

0 replies

8h26m

2024-08-25 10:02:25 UTC

Yep.

https://en.m.wikipedia.org/wiki/Trust,_but_verify

jgalt212

0 replies

3h30m

2024-08-25 14:58:35 UTC

Trust, but verify (Russian: доверяй, но проверяй, romanized: doveryay, no proveryay, IPA: [dəvʲɪˈrʲæj no prəvʲɪˈrʲæj]) is a Russian proverb, which rhymes in Russian. The phrase became internationally known in English after Suzanne Massie, a scholar of Russian history, taught it to Ronald Reagan, then president of the United States, who used it on several occasions in the context of nuclear disarmament discussions with the Soviet Union.

enoch_r

0 replies

1h42m

2024-08-25 16:47:11 UTC

It basically means "trust, but not too much."

SoftTalker

0 replies

2h24m

2024-08-25 16:04:49 UTC

That's only one possible meaning of the word "trust," i.e. a firm belief.

Trust can also mean leaving something in the care of another, and it can also mean relying on something in the future, neither of these precludes a need to verify.

Edit: jgalt212 says in another reply that it's also the English translation of a Russian idiom. Assuming that's true, that would make a lot of sense in this context, since the phrase was popularized by Reagan talking about nuclear arms agreements with the USSR. It would be just like him to turn a Russian phrase around on them. It's somewhat humorous, but also conveys "I know how you think, don't try to fool me."

ljm

2 replies

6h30m

2024-08-25 11:59:19 UTC

I believe it’s going to become counter productive sooner than anyone might think, and in fairly frustrating ways. I can see a class of programmers trading their affinity with the skill for a structurally unstable crutch.

I was using Perplexity with Claude 3.5 and asked it how I would achieve some task with langchain and it gleefully spat out some code examples and explanations. It turns out they were all completely fabricated (easy to tell because I had the docs open and none of the functions it referred to existed), and when asked to clarify it just replied “yeah this is just how I imagine it would work.”

FooBarWidget

1 replies

5h43m

2024-08-25 12:46:15 UTC

One technique to reduce hallucinations is to tell the LLM "don't make things up, if you don't know then say so". Make a habit of saying this for important questions or questions for which you suspect the LLM may not know.

ljm

0 replies

1h55m

2024-08-25 16:34:20 UTC

It's hit and miss, for the same reason Google is (and increasingly more so). If you try and search for 'langchaingo' then you might get lucky if you add enough into the query to say you're working with go, but otherwise it'd just see 'langchain'.

Google is pretty much useless for the same reason.

They're not actually more intelligent, they're more stupid, so you have to provide more and more context to get desired results compared to them just doing more exact searching.

Ultimately they just want you to boost their metrics with more searches and by loading more ads with tracking, so intelligently widening results to do that is in their favour.

SoftTalker

1 replies

2h34m

2024-08-25 15:54:27 UTC

Maybe it will be analogous to steel. For most of the post-nuclear age, steel has been contaminated with radionuclides from atmospheric nuclear weapon use and testing. To get "low background" steel you had to recycle steel that was made before 1945. Maybe to fact-check information we'll eventually have to go to textbooks or online archives that were produced before 2023.

(Steel contamination has slowly become less of an issue as most of the fallout elements have decayed by now. Maybe LLMs will get better and eventually the hallucinated "facts" will get weeded out. Or maybe we'll have an occasional AI "Chernobyl" that will screw everything up again for a while.)

ravetcofx

0 replies

2h10m

2024-08-25 16:19:09 UTC

Because of llm Internet we have today, I already do go out of my way to find books and information that I can audit were written by a human before GPT.

wiltonn

0 replies

5h39m

2024-08-25 12:49:56 UTC

Agree with this sentiment, erosion of trust and potential issues. The illusion of facts and knowledge is a great moral hazard that AI companies are willing to step around while the market share battles play out. More responsible AI companies, stronger government policy, better engineering and less dumb users are all part of the solution here.

This is more solvable from an engineering perspective if we don't take the approach that LLMs are a hammer and everything is a nail. The solution I think is along the lines of breaking the issue down into 2-3 problems: 1) Understand the intent of question, 2) Validating the data in resultset and 3) provide a signal to the user of the measure to which the result matches the intent of the intention.

LLMs work great to understand the intent of the request; To me this is the magic of LLM - when I ask, it understands what I'm looking for as opposed to google has no idea, here's a bunch of blue links - you go figure it out.

However, more validation of results is required. Before answers are returned, I want the result validated with a trusted source. Trust is a hard problem..and probably not in the purview of the LLM to solve. Trust means different things in different contexts. You trust a friend because they understand your worldview and they have your best interest in mind. Does an LLM do this? You trust a business because they have consistently delivered valuable services to their customers, leveraging proprietary, up-to-date knowledge acquired through their operations, which rely on having the latest and most accurate information as a competitive advantage. Descartes stores this mornings garbage truck routes for Boise IA in its route planning software - thats the only source I trust for Boise IA garbage truck routes. This, I believe is the purpose for tools, agents and function calling in LLMs, and APIs from Descartes.

But this trust needs to be signaled to the user in the LLM response. Some measure of the original intent against the quality of the response needs to be given back to the user so that its not just an illusion of the facts and knowledge, but a verified response that the user can critically evaluate as to whether it matches their intent.

victorbjorklund

0 replies

7h47m

2024-08-25 10:41:33 UTC

yea, I seen it alot on social media where people use ChatGPT as a source for things it possible cant know. Often with leading questions.

tmikaeld

0 replies

8h10m

2024-08-25 10:19:09 UTC

This is why I don't use code LLM code generators for citing or outputting a solution that includes the current code, because it's inclined to remove parts that it thinks don't matter, but rather matter a lot further down the line. And if it's not caught in code reviews, that can cause severe and difficult to debug issues. I'm sure there will be an epidemic of these issues in a few years, because developers are definitely lazy enough to rely on it.

morsch

0 replies

8h35m

2024-08-25 09:53:47 UTC

why not just start there?

Because there are many categories of problems where it's much easier to verify a solution than it is to come up with it. This is true in computer science, but also more generally. Having an LLM restructure a document as a table means you have to proofread it, but it may be less tedious than doing it yourself.

I agree that asking straightforward factual questions isn't one of those cases much like I agree with most of your post.

ks2048

0 replies

8h30m

2024-08-25 09:59:04 UTC

Off topic, but a funny thing about asking about George Washington's birthday is there are two possible answers because of British calendar reform in 1750 (although we've settled on recognizing the new-style date as his birthday).

footnote [a] on wikipedia: https://en.wikipedia.org/wiki/George_Washington#cite_note-3

kovezd

0 replies

1h42m

2024-08-25 16:47:20 UTC

The "off by one" predilection of LLMs is going to lead to this massive erosion of trust in whatever "Truth" is supposed to be, and it's terrifying and going to make for a bumpy couple of years.

This sounds like searching for truth is a bad thing, but instead is what has triggered every philosophical enquiry in history.

I'm quiet bullish, and think that LLMs will lead to a Renaissance in the concept of truth. Similar to what Wittgenstein did, Plato's cavern or late middle age empiricists.

ardaoweo

0 replies

6h41m

2024-08-25 11:48:07 UTC

Already quite a while ago I was entertained by a particular British tabloid article, which had been "AI edited". Basically the article was partially correct, but then it went badly wrong because the subject of the article was about recent political events that had happened some years after the point where LLM's training data ended. Because of this, the article contained several AI-generated contextual statements about state of the world that had been true two years ago, but not anymore.

They quietly fixed the article only after I pointed its flaws out to them. I hope more serious journalists don't trust AI so blindly.

sorokod

3 replies

9h26m

2024-08-25 09:02:43 UTC

Isn't asking an LLM "to write a Python validator" suffers from the 99.9% (or whatever the error rate for validators written by Claude) problem?

bonzini

1 replies

9h3m

2024-08-25 09:25:28 UTC

The difference is that you're asking it to perform one intellectual task (write a program) instead of 100 menial tasks (parse a file). To the LLM the two are the same level of complexity, so performing less work means less possibility of error.

Also, the LLM is more likely to fail spectacularly by hallucinating APIs when writing a script, and more likely to fail subtly on parsing tasks.

dbaupp

0 replies

8h32m

2024-08-25 09:57:13 UTC

In addition to what you say, it can also be easier for a (appropriately-skilled) human to verify a small program than to verify voluminous parsing output, plus, as you say, there's the semi-automated "verification" of a very-wrong program failing to execute.

jessekv

0 replies

8h20m

2024-08-25 10:08:31 UTC

All tests have this problem. We still write them for the same reasons we do double-entry bookkeeping.

wanderingmind

2 replies

6h36m

2024-08-25 11:53:12 UTC

Kind of a noob question, is it possible to design a GAN type network with LLM, where one (or many) LLMs generate outputs, while a few other LLMs validate or discriminate them and thus improving generator LLMs accuracy.

oneshtein

1 replies

1h53m

2024-08-25 16:36:16 UTC

Yes, you can use AI to spot errors in AI output. Done before with good results, but it requires to run 2 different, but equally good, AI models in parallel, which is way more expensive than 1 model.

ada1981

0 replies

1h47m

2024-08-25 16:42:17 UTC

"which is way more expensive than 1 model."

In this case "way more" means exactly 2x the cost.

onlyrealcuzzo

1 replies

6h33m

2024-08-25 11:55:53 UTC

99.9% of it will be correct, but sometimes 1 or 2 records are off. This kind of error is especially hard to notice because you're so impressed that Claude managed to do the extract task at all -- plus the results look wholly plausible upon eyeballing -- that you wouldn't expect anything to be wrong at all.

That's far better than I would do on my own.

I doubt I'd even be 99% accurate.

If it's really 99.9% accurate for something like this - I'd gladly take it.

croes

0 replies

1h33m

2024-08-25 16:56:10 UTC

The problem is that people could be impressed and use it for things where 0.01% could lead to people getting hurt or even get killed.

nabla9

1 replies

9h32m

2024-08-25 08:56:30 UTC

They are just human.

IshKebab

0 replies

9h19m

2024-08-25 09:09:35 UTC

They aren't the same as humans. They definitely work differently.

Also they've been trained to say something as plausible as possible. If it happens to be true, then that's great because it's extra plausible. If it's not true, no big deal.

While I have worked with one awful human in the past who was like that, most thankfully aren't!

MattGaiser

1 replies

8h18m

2024-08-25 10:11:14 UTC

Have two LLMs do the task and compare.

campers

0 replies

7h21m

2024-08-25 11:07:59 UTC

Or three or four or five! https://openreview.net/pdf?id=zj7YuTE4t8

websap

0 replies

4h22m

2024-08-25 14:07:14 UTC

It would be interesting to see how well does a human do it? Are they correct more than 99.9% of the time?

troupo

0 replies

6h9m

2024-08-25 12:19:30 UTC

You just have to double check the results whenever you tell Claude to extract lists and data.

There's also the problem that they are tuned to be overly helpful. I tried a similar thing described in the OG article with some non-English data. I could not stop Claude from "helpfully" translating chunks of data into English.

"include description from the image" would cause it to translate it, and "include description from the image, do not translate or summarize it" would cause it to just skip it.

thelittleone

0 replies

12h1m

2024-08-25 06:28:22 UTC

Can solve that to have a similar or lower error probability as a human by running the results through a verification agent.

surfingdino

0 replies

8h17m

2024-08-25 10:11:48 UTC

Google's Gemini dev docs contain a few of such warnings. For a good reason, these models make stuff up and the domain where ROI is positive is small.

mmahemoff

0 replies

9h28m

2024-08-25 09:00:34 UTC

Yes, probably better to get the LLM to write the script.

Example, I was trying out two podcast apps and wanted to get a diff of the feeds I had subscribed to. I initially asked the LLM to compare the two OPML files but it got the results wrong. I could have spent the next 30 minutes prompt engineering and manually verifying results, but instead I asked it to write a script to compare two LLMs, which turned out fine. It's fairly easy to inspect a script and be confident it's _probably_ accurate compared to the tedious process of checking a complex output.

jph00

0 replies

10h23m

2024-08-25 08:05:32 UTC

Sonnet on Claude.ai can not execute code, although it often pretends otherwise.

huijzer

0 replies

7h7m

2024-08-25 11:21:47 UTC

I completely agree. If correctness matters, then it’s probably better to use LLMs to write the code than to let LLMs be the code.

dan-robertson

0 replies

8h18m

2024-08-25 10:10:46 UTC

If it’s correct 99.9% of the time, and the piano lessons are every two weeks, that’s one error in piano lesson scheduling over 40 years. That sounds good enough to me to not verify.

bhl

0 replies

1h54m

2024-08-25 16:34:50 UTC

I wonder if tool-calling to output schema'd json would have a low error rate here. For each field, you could have a description of what is approximately right, and that should anchor the output better than a one-off prompt.

agumonkey

0 replies

6h56m

2024-08-25 11:33:16 UTC

Can Claude tdd itself ? Lean-Claude

Foobar8568

0 replies

8h59m

2024-08-25 09:29:47 UTC

Well for my case, I have 0 trust in chatgpt ( or local LLM) to extract properly data from a PDF file, especially if it's over a few pages.

sen

11 replies

11h0m

2024-08-25 07:29:17 UTC

I did a similar thing with ChatGPT-4o giving it a txt file list of movie names, and it returned the release date of each one, then gave me an iCal file of all those movies with their release dates as recurring yearly anniversary events so I can watch my favorite movies on their anniversary.

I’ve done it for a few friends as well now and it’s got a 100% success rate so far, across over 100 total movie names.

dotancohen

6 replies

9h57m

2024-08-25 08:31:27 UTC

To bad that wouldn't work for good music albums. For many years (still, maybe?) albums would come out on Tuesdays. So one day of the week you'd have them all bunched up together.

That's how I remember that September 11, 2001 was a Tuesday. Album day, and there was a good one that day, too.

Tempest1981

2 replies

9h35m

2024-08-25 08:53:27 UTC

Jay-Z? Or Bob Dylan?

slfnflctd

1 replies

6h40m

2024-08-25 11:49:13 UTC

I always think of God Hates Us All by Slayer. Can't quite pinpoint exactly why.

dotancohen

0 replies

5h24m

2024-08-25 13:04:40 UTC

That's the one!

sahmeepee

1 replies

9h5m

2024-08-25 09:23:58 UTC

You wouldn't have them bunched up if you were celebrating the anniversaries, unless all the music you were interested in was from the same year (or only years where 11 September was a Tuesday, say).

dotancohen

0 replies

5h24m

2024-08-25 13:05:20 UTC

Oh, right, good point )) I should take the rest of the day off.

glasshug

0 replies

5h33m

2024-08-25 12:56:22 UTC

Still happens, but it’s Friday now! https://en.wikipedia.org/wiki/Global_Release_Day

stavros

2 replies

8h35m

2024-08-25 09:54:04 UTC

Did you go back and check all the dates?

sen

1 replies

6h31m

2024-08-25 11:58:01 UTC

Yup, we tried scraping IMDB for dates too to compare which method was easiest/best and it was spot on (and easier!). I guess the release dates are mentioned enough around the web that it has enough reference. It does use the US release dates for everything though which was one issue, but that’s almost always the first release for major movies anyway.

stavros

0 replies

5h8m

2024-08-25 13:20:27 UTC

Oh nice! Though I'd probably ask it to write some parsing code for the page instead, that way I'm more sure that the results are OK...

orhmeh09

0 replies

10h31m

2024-08-25 07:57:50 UTC

That is a really fun idea, I love it.

buro9

8 replies

11h55m

2024-08-25 06:33:31 UTC

I'm finding Claude to hallucinate less than ChatGPT, and to be far more accurate at coding than CoPilot. Pleasantly surprised on both counts.

Example hallucinations from ChatGPT include researching the dates of historical events for the company I work at, trivially verifiable by me but I was being lazy... ChatGPT told me about blog posts that never existed and I could prove never existed, Claude was spot on with dates and source links (but only appeared to have data through to the end of last year).

On coding, CoPilot came up with reasonable suggestions but Claude was able to take a file as an input and match the style of the code within the repo.

Claude, for me, is starting to hint at what a highly productive assistant can achieve.

romeros

5 replies

11h46m

2024-08-25 06:42:28 UTC

For some reason I felt Claude was Yahoo to ChatGPT's Google in the initial days. It just felt like ChatGPT had an insurmountable moat.

I played with Claude and it is just insanely superior compared to ChatGpt as of now. Someone on HN commented the same thing a few months back but I did not take it very seriously. But, Claude is just superior to Chatgpt.

wenc

2 replies

11h14m

2024-08-25 07:14:57 UTC

I pay for both Claude and ChatGPT.

Claude 3.5 is now my daily driver but it still refuses to answer certain questions especially about geopolitics so I still have to go back to ChatGPT.

I don’t agree that Claude is insanely superior to ChatGPT though. It still has trouble with LaTeX and optimization formulations. Its coding abilities are good but sometimes ChatGPT does better.

This is why I keep both subscriptions. At $40/math I get the best of both worlds.

BoorishBears

1 replies

9h39m

2024-08-25 08:49:52 UTC

I've never had 3.5 refuse a blunt callout, unlike 3

For example if I get a refusal to answer a question, a short blunt reply of "Are you seriously taking my question in such bad faith?" or "Why are you browbeating me for asking that" gets it unstuck

ascorbic

0 replies

8h32m

2024-08-25 09:56:32 UTC

I like to use "Please interpret my request charitably", because charitability is one of the character traits that it is meant to be trained for. https://www.youtube.com/watch?v=iyJj9RxSsBY

zoover2020

0 replies

11h25m

2024-08-25 07:04:03 UTC

I've felt this for a long time, been using Claude for over a year now internally - super impressed w/ it from the start and the 3.5 model is blazingly fast. Which version of C-GPT did you compare it to?

dzhiurgis

0 replies

10h54m

2024-08-25 07:35:06 UTC

Claude’s recent code preview rocks, especially for basic web design (well, not so basic - it was able to design 3d animated svg of winamp).

ChatGPT’s code interpreter and data analysis is killer too.

Seems I’ll need to keep paying for both.

yas_hmaheshwari

0 replies

9h16m

2024-08-25 09:13:13 UTC

This is what Andrej Karpathy also observed here: https://x.com/karpathy/status/1827143768459637073

( He did not mention anything about ChatGPT, but him using Claude instead says a lot )

My takeaway: Time for me to move from ChatGPT to Claude

stavros

0 replies

8h29m

2024-08-25 10:00:00 UTC

I made a mobile app using an LLM (I'd never written React Native before) and GPT-4 was constantly making mistakes after the code was more than a hundred lines or so. GPT-4o was even worse.

Claude, in comparison, dealt with everything much more competently, and also followed my instructions much better. I use Claude nowadays, especially for coding.

albert_e

3 replies

8h14m

2024-08-25 10:14:34 UTC

Many folks like schools, employers, etc still publish calendars in PDF format.

They spend more time on branding and visual formatting, than trying to create the same in formats that we can import into our calendar apps and actually use it practically.

I wonder if there is a two step process we can follow that will more robustly generalize ...

1. Read any document and convert into into a simple table that tries to tabulate: date, time (+timezone), venue, url, notes, recurrence

2. Read a table that roughly has the above structure and use that to create google/ical/ics files or links

We may be able to fine tune a model or agent separately to do each step very well.

bonaldi

2 replies

6h0m

2024-08-25 12:28:50 UTC

You badly overestimate the technical chops of the parent-guardian cohort and underestimate the practicality of a printable PDF with key dates.

albert_e

1 replies

3h30m

2024-08-25 14:59:07 UTC

This is not either/or.

We can have both.

Any organization in 2024 creating a calendar that impacts 500-1000 people should have the awareness that people use digital calendars / apps too.

They should make an effort to publish both PDF and ical.

Since that has yet to catch on in many places -- including the HR departments of large enterprises that publish holiday calendars and the like -- these alternatives ideas are explored.

Once we have the right tools ...

It just takes one tech-savvy parent to do this 10 minute exercise and publish an online calendar that other busy parents can subscribe to.

bronco21016

0 replies

3h13m

2024-08-25 15:15:31 UTC

Agreed. If my child’s elementary school can manage, then most larger organized activities should be able to manage as well. I’m thinking along the lines of like organized rec athletic leagues, organized after school activities. Etc.

I do find that most of these do provide some method of getting a calendar link but we’re not quite to 100% yet.

wannabag

2 replies

11h37m

2024-08-25 06:51:34 UTC

I had the exact same use case two weeks ago but I had received a pdf file from school and was sitting at a cafe with only my phone.

I use ChatGPT, and while the article is correct to say that it will claim that it cannot generate .ics files directly in the code interpreter it is however very much capable of solving this particular problem. I did the following (all on my android phone):

1. Had it extract all the useful dates, times and comments from the pdf 2. Prompted it to generate the ics file formatted content as code output 3. Prompted it to use the code interpreter to put this content into a file and save it as a .ics extension

It complied through and through and I could download and open the file with the gcal app on my phone to import all appointments..

For completeness, the claim that the code interpreter cannot "generate" ics files is because the python environment in which it runs doesn't have a specific library for doing so. ICS files are just text files with a specific format, so definitely not out of reach.

weissi

0 replies

11h27m

2024-08-25 07:01:36 UTC

Interesting, maybe ChatGPT got more reluctant to spit out ics? Didn't have to try very hard back in May: https://chatgpt.com/share/0848349d-4b0b-40f8-9d24-e9c4ffc065...

gw5815

0 replies

11h29m

2024-08-25 06:59:44 UTC

Yeah, Claude.ai just gave me the text that I simply saved as a my.ics file. I wonder if I could reword the prompt to get chatgpt to do it.

1oooqooq

2 replies

7h45m

2024-08-25 10:43:50 UTC

HN devolved into a cheap product review. this forum now is like reddit being taken by nothing but fake accounts hyping one mobile phone or game console or movie. so tiring.

unfunco

1 replies

7h43m

2024-08-25 10:45:48 UTC

The account that posted the link is 12 years old, your account is 9 months old.

1oooqooq

0 replies

2h19m

2024-08-25 16:09:28 UTC

that was still a lame advertisement.

Also, you too should be rotating accounts every few months to prevent doxing. that's just common knowledge and online hygiene.

pplante

1 replies

12h19m

2024-08-25 06:09:53 UTC

I just did this with my kids various school calendars all of which are locked up in gross PDF files. Claude only made one tiny mistake which it was able to correct when I asked it why it made the mistake. It actually pointed out the confusing formatting used by the author.

I tried to do the same with ChatGPT a few weeks ago and was unable to get very far.

zacharyozer

0 replies

10h14m

2024-08-25 08:14:42 UTC

I did this last year with my kids various activities using ChatGPT and the results were just OK. I'm looking forward to trying this out with Claude this year!

jillesvangurp

1 replies

8h51m

2024-08-25 09:37:51 UTC

I've been doing similar things with gpt 4o to simplify data entry. Right now, it's kind of useful to use but people are not systematically doing much yet with LLMs.

I think that's going to change because it's so obviously useful to do. Any work involving entering data into some form is something that can and will be automated now. Especially if you have the information in some printed or printable way. Just point your camera at the thing and quickly review the information.

Sucking up information that is out there on signs, posters, etc. can be amazingly useful. People put a lot of effort and money into communicating a lot of information visually that often does not exist in easily accessible digital form.

visarga

0 replies

7h5m

2024-08-25 11:24:00 UTC

In my tests, if you operate over diverse forms or documents the information extraction rate is around 90%. It's especially hard for complex forms.

imarkphillips

1 replies

9h18m

2024-08-25 09:11:19 UTC

This is quite useful. Apartment renters can now create their own ical feeds for booking platforms.

cromka

0 replies

8h39m

2024-08-25 09:50:15 UTC

That’s been implemented for ages, hasn’t it? I had my imported Airbnb calendar always up to date.

franze

1 replies

7h58m

2024-08-25 10:30:38 UTC

I tried the exact same thing in ChatGPT -- it identified the dates in the images, but it was unable to create an .ics file. However, it did give me some Python code to create the ics file for me.

well i constantly let ChatGPT create my calendar entries, either via Google Calendar links or ics files or QR codes.

Don't know what went wrong there.

TeMPOraL

0 replies

6h22m

2024-08-25 12:06:49 UTC

Back when I last did it with GPT-4-Turbo, asking for an ICS file would get it to refuse on the grounds that it cannot make a file. Asking for encoding in iCal fomat, however, would make it output into a Markdown code block, which I'd then copy-paste into Notepad and save as ICS.

These days, you may also try asking it to "use Python" for the purpose, in which case it will make you the file (by writing code that writes the file and executing it), but it's likely to make more mistakes because it has to encode the result into a Python string or array.

durdn

1 replies

10h35m

2024-08-25 07:53:44 UTC

I've been using ChatGPT for similar visual recognition things. Recently I took a video of a car because I really liked its color. I upladed the video to ChatGPT and asked to extract which paint color I'd need to specify to a modding garage to put a foil on my car. ChatGPT really impressed me, extracted a screenshot, did a color analysis of the paint, found the palettes from paint vendors and found for me the exact paint code to tell the garage. I was speechless.

yas_hmaheshwari

0 replies

9h13m

2024-08-25 09:16:04 UTC

Wow! Simply wow!

I had a similar requirement a few days back, but I stopped at Google lens. TIL

aussieguy1234

1 replies

9h14m

2024-08-25 09:14:51 UTC

I used a screenshot of my internet banking transaction list with GPT4-o to get a list of transactions for my company tax return. Visually verified the results and they were correct.

Earlier, a product manager sent me a list of company id's to turn on a feature flag for as a screenshot. Rather than enter them all in manually, I used that to get GPT4-o to generate a comma separated list from the screenshot. Again, worked perfectly.

layer8

0 replies

4h58m

2024-08-25 13:30:54 UTC

OpenAI must be happy to get all that confidential information. ;)

yakorevivan

0 replies

10h50m

2024-08-25 07:38:26 UTC

I have myself been using chatgpt/sonnet-3.5 to clean data, extract data, heck, even generate sample sql insert statements for a given table schema... small things, but, when done repeatedly, saves soooo damn amount of time and frustrations. I have been using these tools to generate sooo many small scripts to automate or do things that otherwise I either wouldn't have done it, or, it would have taken significant amount of time. These tools are tooo good now to not use them.

Also, as someone else have already pointed out, these things work correctly 99.99% of the time. But that remaining 0.01%... that's what becomes major issue since it is so small of an error, that, unless you verify, you'll end up missing.

When using LLM's..."Trust, but verify".

wodenokoto

0 replies

6h49m

2024-08-25 11:39:28 UTC

And this is where Siri failed in my opinion. Even things like "Create an event that starts on Monday and continues for 90 days" is and always has been a crapshoot.

All the things you didn't want to do manually yourself, Siri couldn't do either. Not much of an assistant then.

solaarphunk

0 replies

11h31m

2024-08-25 06:57:31 UTC

Can also be used to generate Google calendar links!

shade

0 replies

3h35m

2024-08-25 14:53:57 UTC

I did something similar recently with my daughter's school calendar and ChatGPT with gpt-4o; her school has a ton of closures/teacher work days, I fed the PDF of the calendar in and asked it for all the dates that impacted 2nd grade, then asked it to create an icalendar file for them.

Oddly, it didn't want to create the actual file, but gave me a Python script for doing so; you just need to be sure to tell it what time zone you are working in and that you want it to be an all-day event, or you'll get suboptimal results.

pvsnp

0 replies

12h13m

2024-08-25 06:15:29 UTC

Funny, I just learned this this week too and had it generate an ics file from an excel file. Then I realized there might be bugs, so I asked it to write program instead.

pietz

0 replies

11h33m

2024-08-25 06:55:42 UTC

Really cool! It's also really good at creating all kinds of mermaid diagrams. I have a prompt to create learning resources based on YouTube transcripts including visualizations in mermaid. It's awesome.

ndr_

0 replies

6h24m

2024-08-25 12:05:20 UTC

I use this trick for book announcements on Amazon: some ambitious book releases never get released, so I am not a fan of buying before the release. With LLM support, I'll add the release date given by Amazon to my calendar - quickly. The file download feature of my Workbenches helps with that: https://ndurner.github.io/chatbots-update

millzlane

0 replies

4h21m

2024-08-25 14:07:59 UTC

This was my first use of ChatGPT maybe 1.0 I don't really remember but it was early last year sometime. I just started college for the first time and had a list of due dates for assignments.

I asked chatGPT "Can you make me an ics file from this list of dates"

It did it instantly and told me to save the file in a text editor as an .ics file. But it messed up and put the assignments and discuissions on the same week.

I ask "Do the same thing but move the discussion questions to the following tuesday, and put the assignment details in the details of the meeting.

It spit out a perfect .ics file.

low_tech_love

0 replies

11h46m

2024-08-25 06:43:15 UTC

Does that mean we’re getting the functional PDAs that were promised to us for decades by the sci-fi? That’s be amazing.

losvedir

0 replies

6h7m

2024-08-25 12:22:18 UTC

Hey, I did a similar thing last week with my preschooler's paper schedule they handed out, using GPT-4o.

But I asked it to output in a certain CSV format, which Google Calendar can import. Easier to review, fewer tokens.

jlarks32

0 replies

9h36m

2024-08-25 08:53:06 UTC

Yeah for what it's worth, ChatGPT with custom instructions can definitely do this! This was something that I did pretty early on: https://chatgpt.com/g/g-6mfNb9Hys-calendar-creator

hganesan

0 replies

8h15m

2024-08-25 10:14:20 UTC

AI for calendar invites is a real winner, I did a related thing here, but ical is even simpler!

https://hareeshganesan.com/2024/07/14/baby-calendar

hereme888

0 replies

3h40m

2024-08-25 14:49:19 UTC

Months ago I did this with ChatGPT.

It was unable to create the .ics directly at first, but at least it wrote the python script that created the .ics.

I first ask the LLM to write out the calendar events in a markdown table to double-check it read everything correctly, and then create the file.

A different prompting strategy later on, ai think with example .ics formatting, resulted in a correct .ics. I just had to *emphasize* the time zone.

greenthrow

0 replies

6h4m

2024-08-25 12:24:44 UTC

So let's see all the power and copyright violations that are required to build an LLM and it can be used to replace a ~20 line bash script. Awesome guys.

fmbb

0 replies

7h7m

2024-08-25 11:21:44 UTC

Maybe useful in some cases, but this calendar has a heading saying the lessons are "every other week" so you don't have to make 13 calendar entries. It takes a few seconds but you can make a calendar entry on September 13, tell your calendar app that it repeats every two weeks, and then you delete the six "vacation" slots. This is much faster.

flmaki

0 replies

8h38m

2024-08-25 09:50:35 UTC

hmm

dukeofdoom

0 replies

11h51m

2024-08-25 06:38:22 UTC

You can also ask AI to make a markdown check list. Useful for ToDo's you can paste into markdown editors.

dkga

0 replies

7h58m

2024-08-25 10:30:31 UTC

That’s a great step forward.

I use a combination of the Reminders and Calendar apps to mark entries of academic conferences I participate. One interesting thing for me personally would be to find a way to read the appointments in the Reminders app (presumably a sqlite3 db somewhere in iCloud) and create corresponding dates in the calendar, such that whenever I add a reminder to submit papers to a conference, this would automagically block that calendar slot for the dates of the conference such that I know not to schedule in person meetings with anyone in those dates.

cdrini

0 replies

11h36m

2024-08-25 06:52:31 UTC

This is phenomenal! And also highlights a design pattern which I'm hoping becomes even more popular because of AI: interoperable human readable file formats! Things like .ICS files. I'd love to see more websites/apps prioritise supporting these sorts of things, since it make it really easy to plug/play and use AI, but also just empowers more people to get creative/build extensions.

There's been a bit of a trend to more silo'd data, and a deemphasis on files entirely in the last ~decade in the name of "simplicity". I'd love to see that backtrack a bit.

bmarswalker

0 replies

1h28m

2024-08-25 17:00:23 UTC

I tried this in ChatGPT 3.5 times and it repeatedly said "I am a language model and can't do that". Good to know it can do it now.

blsummer

0 replies

7h59m

2024-08-25 10:29:25 UTC

I tried the prompts in Gemini 1.5 Pro, it works too.

Use `List all the dates` or LLM will be lazy by listing only 3 results

asimovfan

0 replies

8h30m

2024-08-25 09:58:48 UTC

Ive been doing this with chatgpt.. sometimes makes mistakes, i have to try a few times but better than by hand..

LarsDu88

0 replies

12h54m

2024-08-25 05:34:57 UTC

This is why I love HackerNews. Life hacks without the endless meme spam found in reddit.

CraigJPerry

0 replies

2h25m

2024-08-25 16:03:55 UTC

When I long press on any date in the grainy image listing the dates, iOS OCRs the text, interprets it as a date and because I long-pressed, it offers a popup context menu to create an appointment or a reminder, while showing a day view of my calendar for that date.