Can anyone comment on its coding ability?
Considering cancelling my subscription with OpenAI as I was previously using GPT-4 quite heavily as a multiplier for myself, guiding it and editing outputs as required, but GPT-4o feels significantly worse for this use case. It is certainly better in many other areas, but its coding ability is not great.
I tried to revert back to standard GPT-4 but it is now so slow to respond (higher load?) that it breaks my mental flow, so I'm exploring other options.
(I'm biased as I work for Anthropic)
I broke my hand 2 months ago and Claude 3.5 Sonnet has been writing all my code for me. It's been awesome.
Since you work there, can you ask that they are more transparent with exactly what you are paying for with a "pro" subscription?
Currently you are apparently paying for:
- Everything in Free
- Use Claude 3 Opus and Haiku
- Higher usage limits versus Free
- Create Projects to work with Claude around a set of docs, code, or files
- Priority bandwidth and availability
- Early access to new features
But what are the usage limits? Higher than free by how much?
Having an invisible limit on a paid product really rubs me the wrong way. Maybe some rate-limiting after a certain amount would be better than a hard cutoff, but even then I'd like to know what the limit is before I pay, not when I accidentally hit it in the middle of something important.
I ran into this rate limit recently when generating a logo in SVG. It's 45 messages per 5 hours (with some caveats). It's documented here [0].
[0] https://support.anthropic.com/en/articles/8324991-about-clau...
Seems to also be limited by tokens. It’s still quite obscure.
Even more so:
Fine for the free tier of course, but not great for the paid version.
For what it's worth, I haven't run into any limitations while using the API side, although I also haven't really had to use it so extensively that it would be a problem.
ChatGPT says the same about capacity for Pro I think.
With a fixed capacity, someone’s traffic is going to be rejected. I think it makes sense to have a variable rate limit depending on available capacity and demand.
I was literally about to type in my credit card number after seeing the new code editor window UX, but for the life of me I had no idea what I was paying for.
I’ve been paying for GPT since 3.5 debuted and I know what I’m getting - full, unlimited use of the best model. Period.
Anthropic needs to figure out what the hell they are selling.
ChatGPT Plus does NOT give you full unlimited use of the best model. There are still limits.
FWIW I regularly hit my ChatGPT Plus limits, and I think the “dynamic” limiting is regularly in place. I’ve only once hit my Claude Pro limit. I now use Claude more than ChatGPT.
From this page:
https://help.openai.com/en/articles/6950777-what-is-chatgpt-...
As of May 13th 2024, Plus users will be able to send 80 messages every 3 hours on GPT-4o and 40 messages every 3 hours on GPT-4. The message cap for a user in a ChatGPT Team workspace is approximately twice that of ChatGPT Plus.
Please note that unused messages do not accumulate (i.e. if you wait 6 hours, you will not have 80 messages available to use for the next 3 hours on GPT-4).
In certain cases for Plus users, we may dynamically adjust the message limit based on available capacity in order to prioritize making GPT-4 accessible to the widest number of people.
It says 5x the free tier usage, but I agree we could be more explicit about what that is. Just made a ticket :)
+1. More transparency around usage limits would be very helpful. Message-based limits seem arbitrary when messages can be very different lengths. Maybe a token-based limit with a bar that tracks how much you've used: it would go down with use and back up with time and capacity, and you could track and pace your usage better than with a warning that you're on your last ten. Or just let us pay more for more usage! Switch to the API! The workbench doesn't have as nice affordances as the chat.
Mind explaining your workflow? I actually just broke my wrist today :(
Oh no! Mostly I use Apple Voice Control to dictate into Claude, and then copy and paste code back and forth between my IDE and Claude.
For longer writing, I really like going for a walk for 45 minutes, brain dumping on a topic, and transcribing it. Then I write a brief outline and have Claude fill it out into a document, explicitly only using language that I used in the transcript. Then I edit via voice.
Thank you!
Are you using the claude.ai interface or do you have something bespoke?
I don't know about the usability for you but if you use vscode already cursor is basically vscode but has access to various llms, including Claude. I started using it recently and really like it.
https://cursor.com
Why do you work? Can't you take sick leave?
The first night it was released, it one-shotted merging two Python files together (one was an incomplete mess of speech-to-text, the other was an also-incomplete Claude API demo with a fake loop with forced conversation paths). It rewrote the fake loop and turned it into a voice interface that worked on the first try (then I asked it to add text-to-speech and it did). The second night it improved the code by adding threads (which I do know about but wouldn't have thought to use there) and it worked better. Then Linux decided not to work properly (PulseAudio and ALSA issues); I asked it to help me migrate to PipeWire and it gave me step-by-step directions, and when I had issues all I had to do was paste the error and it fixed things enough that it's all working again. It's been a pretty amazing week from my perspective. I finally had to take a break earlier this week, but I still use Claude 3.5 for gardening (taking pictures and asking for help) and general advice.
Would you please describe your setup for coding? I'm interested in trying this out, but I need an ELI5. Thanks.
PS Anybody is welcome to jump in with a description of their setup.
It's probably not gonna work well: he fixed the Claude API demo with Claude. You didn't think of asking the AI, so you need to work more on your own inference abilities first.
There's not really anything to it. I mostly use the console.anthropic.com workbench (the API, pay-as-you-go), so I haven't played around much with Artifacts (I am probably getting a Claude Pro subscription later today). But yeah, I just copy/pasted my 2 code files into the window, told it I wanted a conversational voice interface, and in less time than it took to describe it, it wrote it and even saved a bunch of output tokens by saying where to copy large parts of the first file's boilerplate code. That had been a project I started around Easter and had kept putting off because it felt too hard for my somewhat limited time. But there's no moat or special setup to this stuff; anybody who can communicate in a text box can productively use this tool.
Sourcegraph Cody uses Claude
I recently tested the Cursor IDE [1]. It's VS Code with nice AI integrations, made with great UX in mind. It integrates with ChatGPT and Claude Sonnet/Opus. In my experience (Rust/TypeScript), the Sonnet 3.5 model gives better results.
You can pass the current file, or multiple files or your whole project as context to the models.
[1] https://cursor.com
I just copy and paste snippets and ask GPT to fix or translate them.
Gardening? The vision stuff for plants has always been terrible guesswork. Is it actually good/useful now?
It's a start. We just had a sudden storm that severely damaged some tomato plant stalks, and it suggested splinting them, or, if they're still struggling in a couple of days, cutting off the top and planting it in another pot, which is advice I've received from others too. The first picture I sent was some early blight on the leaves; it didn't get it on the first try (it suggested the stem had edema, which I suppose is possible), but when I mentioned I thought it had a fungal issue it correctly noted the brown leaves and suggested a copper fungicide, among other correct remedies. It also correctly discriminated between two types of cucumber (well, one was an Armenian melon) held by my friend, who had just picked them from her garden. Claude said that because there were hands present, it suggested the vegetables were home-grown and/or freshly picked, and before that it noted that they seemed to be good quality, which was correct. I went into this particular use of Claude thinking it would be useless, but that's not what I have found with Claude 3.5 Sonnet. I'll note the fungal blight was with Claude 3 Opus as well. Hope that helps.
Coding ability is off the charts, to the degree that I think in one year software engineering will be considered a dying field.
What will the people who use LLMs to write code be called? What will the field be called?
Where will all the power and computers come from that will replace the tens of millions of software developers?
I feel reasonably confident that most of the people who are making these hyperbolic claims either don't understand programming at a deep enough level to make the claim or are trying to sell you something.
No, I've got 12+ years of experience in FAANG software engineering. I really believe the end is near
FWIW, very few people will actually care about this statement unless you state your name, work you've done, etc. A lot of users here don't pay heed to anonymous appeals to authority.
I'll throw my hat behind this horse because, honestly, if I was just learning to code, I would have probably quit by now due to the frequency of these types of comments. LLMs have certainly improved at an impressive rate, and they're fairly decent at producing 'Lego' blocks. However, when it comes to block building—the real meat and potatoes of programming—they're, at best, bad butchers. Building functional systems is hard, and it's simply something LLMs can't do now or perhaps ever. Or I just don't know how to prompt. 50/50.
Yeah, good idea to put everyone out of their job and hand it to 2 companies. Then what happens when those companies decide to raise prices by 1000x (Twitter API?)? Or what happens when your system has a bug and doesn't work and the AI says it's all good? Who will introspect your code to find what's wrong?
Here we go again. If you seriously think that then you should probably consider changing careers.
And who would actually verify that the LLM's output works as expected? Unit tests? Who would verify the unit tests are testing what needs to be tested instead of just branch/output coverage? I've used these tools personally, and I've used GitHub Copilot at the company I work for, and at no point was I under the impression that LLMs can do all of my work. Each step needs verification that it actually gives you what you need, and some steps need a good description from someone who knows what the system is doing before the changes can be made where they are needed. Continuous extension of features without good design patterns in place, usually checked by devs, can also create a lot of problems. So what makes you think SWE will be considered a dying field in one year? What I see is a need for better devs who can see beyond boilerplate and be empowered by LLMs. That doesn't mean a dying field; it means raising the standards to break into the field.
You can use OpenAI's developer platform https://platform.openai.com/playground/chat
I haven't used the normal website in a year; it's a much better experience:
- It allows tweaking of the model / temperature.
- You pay only per token used
- It does not use your conversations to train their models
- You can edit or delete both your messages and the assistant's messages in the conversation
- Allows for easy message replay if the output is not great
Be mindful that you are paying per token for every message in the thread, so if the thread gets long it gets expensive, but it's still far better value than subscribing.
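For anyone who'd rather script it, the playground is just a front end over the same pay-per-token Chat Completions API, so a minimal call from Python looks roughly like this (a sketch using the official openai package; the model name, temperature, and prompts are just example values, and it assumes OPENAI_API_KEY is set in the environment):

    # Minimal sketch of the pay-per-token Chat Completions API that the playground wraps.
    # Assumes OPENAI_API_KEY is set; model, temperature, and prompts are example values only.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ]

    response = client.chat.completions.create(
        model="gpt-4o",       # swap in whichever model you want to compare
        temperature=0.2,      # tweakable per request, unlike the normal website
        messages=messages,
    )

    print(response.choices[0].message.content)
    # Every message in `messages` is billed again on each call, which is why
    # long threads get expensive.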
The world is changing too fast for knowledge that is nine months old.
------
USER
What is the maximum number of tokens supported by gpt-4o?
ASSISTANT
As of my most recent update in October 2023, GPT-4 does not have an officially designated variant called "gpt-4o". However, OpenAI's GPT-4 model supports different maximum token limits depending on its variant. There are two primary context lengths for GPT-4 models:
1. *GPT-4 with 8K context length*: This variant supports up to 8,000 tokens.
2. *GPT-4 with 32K context length*: This variant supports up to 32,000 tokens.
These limits encompass both input and output tokens in a given interaction. If you intended to refer to a specific or newly introduced variant, such as "gpt-4o," it is essential to look up the most current resources or official announcements from OpenAI, as models and their capabilities can evolve.
I don't know what you're trying to point out here. The models aren't trained on themselves.
ChatGPT and Gemini incorporate web search into their results, so they're not limited to stale information.
If you think you can save money by paying per token in the playground, you should know that you're not getting the same results.
For example, if I ask Gemini the identical question, it gives a reasonable answer, even though Gemini was released before GPT-4o was. It can look up the current OpenAI documentation and summarize it:
USER
What is the maximum number of tokens supported by gpt-4o?
ASSISTANT
GPT-4o has a context window of 128,000 tokens. This means it can process up to 128,000 tokens between your input and its output combined. However, the maximum number of output tokens (the AI's response) is limited to 4,096 tokens.
I've been playing around with it this week and its coding ability is insane (for an LLM). I've given it some pretty sloppy descriptions of things I want to do and it's managed to figure out exactly how to do it on the first or second try; I'm talking things like building animations in React that can't be described very well in text. The big pain point is copying and pasting things back and forth to have it edit them. If it were integrated and could see my local files, that would be killer. I know there are various companies working on that, but the JetBrains AI integration, for example, is garbage compared to the results I get by manually asking Claude.
I wasn't worried about how this would affect our industry a few months ago, but this has me reconsidering. It's like a junior engineer that can do most tasks in seconds for a couple of cents.
What worries me is you need that time in the dirt to get a feel for coding as a craft. And at least for me that aspect of knowing the craft helps get my thinking in tune with problem solving in a very productive way.
Coding can be similar to playing an instrument, if you have mastery, it can help you be more expressive with the ideas you already have and lead you to new ones.
Whereas if we take away the craft of coding, I think you end up with the type of code academic labs produce: something that starts purely on a "drawing board," is given to the grad student/intern/LLM to make work, and while it will prove the concept, it won't scale in the long term, because the intern doesn't know when to spend an extra 30 minutes on a function so that it will be more flexible down the road.
I see this sentiment a lot regarding gen AI. And I get it, we need to learn our tools. But this seems like it's saying the only way to learn problem solving is the way you learned it. That's just not true. Everyone learns problem solving differently, and the emerging field of gen AI will figure out its own way. It's a different way of thinking. I see my niece using ChatGPT to make projects I wouldn't even have imagined taking on at her age. Her games work. Who am I to say she isn't learning problem solving? In Hindi we say "pratyaksh ko praman ki kya avashyakta" (what's right in front of you doesn't require proof).
I’d say it’s far more useful than a junior engineer. A junior might know one or two areas well, but Claude 3.5 knows literally every steep learning curve across all engineering professions. Claude may not yet be able to orchestrate a large project, but it can advise on structuring using knowledge drawn from best practices across every conceivable discipline. We may not get “AGI”; we might get something far more useful instead.
I switched to Claude when Opus came out and have never looked back.
Anthropic tech is vastly better at code.
Also, Anthropic Projects is HUGE.
You get to create a set of default instructions that all prompts in the project adhere to, like: Use LINQ whenever possible. Consider Fluent capabilities when they will improve developer experience.
Then you can add all of your existing code (though as has been pointed out, connecting to a repo would be ideal).
And then you can have multiple conversations within that project, so you can separate your lines of thinking.
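There's no Projects equivalent in the API as far as I know, but if you're on the pay-as-you-go side you can approximate the default-instructions part by pinning them into a system prompt. A minimal sketch with the anthropic Python package (the model name, instruction text, and prompt are just example values; assumes ANTHROPIC_API_KEY is set):

    # Rough API-side stand-in for a Project's default instructions: pin them in the
    # system prompt so every request follows them. Assumes ANTHROPIC_API_KEY is set;
    # model name, instructions, and prompt are example values only.
    import anthropic

    client = anthropic.Anthropic()

    PROJECT_INSTRUCTIONS = (
        "Use LINQ whenever possible. "
        "Consider fluent APIs when they will improve developer experience."
    )

    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        system=PROJECT_INSTRUCTIONS,
        messages=[
            {"role": "user", "content": "Refactor this C# loop into a LINQ query: ..."},
        ],
    )

    print(message.content[0].text)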
Alas, the same problems exist. Sonnet 3.5 is faster and a bit smarter, but it still cannot follow a longer conversation.
This is the problem that LLMs have not yet solved.
I've been using it in https://double.bot (a VS Code coding copilot I'm hacking on) and it's been pretty great so far. We have support for all the top models (GPT-4o, DeepSeek, Opus, etc.), so I regularly switch between them, especially if I see a challenging problem. So far, Sonnet 3.5 has the best quality and is the one I use by default.
How does double.bot compare against cursor.sh? I've been using the latter for a while after switching from copilot++ and am pleased.
I was trying to get it to create an LLM from scratch using tensorflow.js, with training and prediction. It got about 95% of the way there, which is impressive.
But GPT-4o actually got a working solution in a couple of prompts -> https://gist.github.com/thomasdavis/fadbca01605fb3cb64911077...
Though the new Artifacts feature is really damn handy: you can describe the most detailed projects and it does a really great job of what you asked for; I found it delivered beyond what I wanted.
I am now paying for both -.-
- it's hard to rank which codes better, but I think Claude has better abstractions
- sometimes I paste the output of one into the other and continue solving there
Yo! Long time no see haha <3
Would love to see your workflow sometime, my experimentations have been small tbh but I was quite impressed with Claude.
Funnily, I think (modulo costs) feeding outputs of one model into another once you hit a brick wall is quite powerful, but feels clunky to do at the moment
Interesting. I felt GPT-4 was virtually useless and GPT-3.5 was the best; then GPT-4o came along and it instantly became the only version worth using.
I find GPT-4o to be extremely powerful and useful.
What don't you like about it?
To clarify, I think it's extremely powerful and useful too. It's just that I get more misses than hits when using it as a rubber duck and coding assistant.
I saw the initial 3.5 Sonnet release a few days ago and decided to pull the trigger on canceling OpenAI and moving over, and that was a great decision. I've been having to do more and more intricate nudging and leading towards the correct answer or approach when using ChatGPT of late.
My experience with Claude is very positive when it comes to programming and planning out infrastructure. My only gripe so far has been some ethical constraints that didn't exist for ChatGPT, and those are a big one for me since I don't need Anthropic employees to act as my moral compass. For a specific example, asking about securing code through making decompiling or reading harder is a no-no for Claude, but a-ok for GPT.
I use it mostly for coding help, not Python or JS but a mix of C and C++ for my own framework.
It is surprisingly good and helpful. I am still exploring the limits.
Writing doc comments or test cases is much faster and more fun with this kind of tool, but you still have to double-check everything as it inevitably makes mistakes, often small and not obvious…
I switched from a GitHub Copilot subscription to Cody for VS Code, to give the competition a chance.
Cody's chat UI seems pretty good for making comparisons like this. You can set it to whichever LLM you want, including GPT-4o and Claude 3.5 Sonnet.
I haven't used Claude enough to do a detailed comparison, but GPT-4o and Claude 3.5 seem roughly similar for my coding questions.
I find its coding ability superior to GPT-4 and 4o.
Hit or miss, with significantly more hits than misses. I was able to build a useful cybersecurity service in about 6 hours leveraging Claude to write most of the code. It has certainly made me more productive.
I haven't had a great experience using GPT-4o for coding; it will sometimes make a mistake, fix the mistake by introducing another mistake, then fix that second mistake by repeating the first. Or other weird things like that. I love the text to speech, and image generation sometimes comes in handy, but ultimately I prefer Sonnet 3.5 for coding and have recently switched to that.
If OpenAI comes out with a stronger model I’ll prob switch back, I don’t have much brand loyalty in this regard. I could see some features making usage more sticky (memory, projects, etc).
It's slightly higher ranked than 4o on the lmsys leaderboard, Coding section. But the difference is not big.
What languages / frameworks do you focus on?
I’ve also had worse results with 4o than 4. It’s faster but it vomits output.
I can’t tell if I need to go tweak memory or if it’s just like this.