A couple of big/strong open models that have just been released in the past few days:
* Qwen 72B (and 1.8B) - 32K context, trained on 3T tokens, <100M MAU commercial license, strong benchmark performance: https://twitter.com/huybery/status/1730127387109781932
* DeepSeek LLM 67B - 4K context, 2T tokens, Apache 2.0 license, strong on code (although DeepSeek Coder 33B benches even better there): https://twitter.com/deepseek_ai/status/1729881611234431456
Also recently released: Yi 34B (with a 100B rumored soon), XVERSE-65B, Aquila2-70B, and Yuan 2.0-102B. Interestingly, all of these come out of China.
Personally, I'm also looking forward to the larger Mistral model releasing soon, as mistral-7b-v0.1 was already incredibly strong for its size.
Since ChatGPT can't be used in China, there is a huge opportunity to build a local LLM.
Does anyone know why both OpenAI and Anthropic proactively banned users from China from using their products?
Source for this? I know China's government firewall blocks ChatGPT (for obvious reasons) but I wasn't aware that OpenAI was blocking them in return.
Chinese IPs are not allowed to use ChatGPT, and Chinese credit cards are not accepted for the OpenAI API.
Source: my own experience.
What puzzles me most is the second restriction. My credit card is accepted by AWS, Google, and many other services. It is also accepted by many services which use Stripe to process payments.
But OpenAI refuses to take my money.
I don't understand: if ChatGPT is blocked by the firewall, how do you know that ChatGPT is blocking IPs in return? Are there Chinese IP ranges unaffected by censorship that a citizen can use?
When a website is blocked by the firewall, it doesn’t load.
When a website blocks Chinese users, the website loads but you cannot create an account.
Yes, the firewall does not block everything, otherwise it would be the same as turning off the internet! There are websites that work.
Okay but the point is that ChatGPT is blocked by the firewall.
EDIT: I read the comment below about Hong Kong, but I can't reply because I'm typing too fast by HN standards, so I'm writing it here and yolo: "I'm from Italy, and I remember when ChatGPT was blocked here after the Garante della Privacy complaint. Of course the site wasn't blocked by Italy, but OpenAI complies with local obligations, so maybe that could be the reason for the block. The API was also not blocked in Italy."
EDIT 2: if the website is not actually blocked (the sites that check whether a website is reachable from mainland China lied to me), then I guess they are just complying with local regulations so that the entire website does not get blocked.
Insofar as Hong Kong IPs are "Chinese IPs": we can access OpenAI's website, but their signup and login pages block Hong Kong phone numbers, credit cards, and IP addresses.
Curiously, the OpenAI API endpoints work flawlessly with Hong Kong IP addresses as long as you have a working API key.
The OpenAI API is not blocked. You can set up your own front-end, like Chatbot UI.
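A front-end like that ultimately just POSTs JSON to OpenAI's chat completions endpoint with your API key. Here is a minimal sketch of the request such a front-end would build (the endpoint URL and model name are OpenAI's documented values as of this thread; the key shown is a placeholder):

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(api_key, prompt, model="gpt-3.5-turbo"):
    """Build an (unsent) POST request for OpenAI's chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request("sk-...", "Hello!")
print(req.get_full_url())  # the endpoint your self-hosted UI talks to
# To actually send it: urllib.request.urlopen(req), then read
# choices[0].message.content from the returned JSON.
```

Since the block is the front-end's only moving part, any UI (Chatbot UI or a homemade one) works as long as it can reach this endpoint and hold a valid key.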
It's not blocked by the firewall. I'm in China and I can load OpenAI's website and ChatGPT just fine; OpenAI just blocks me from accessing ChatGPT or signing up for an account unless I use a VPN and a US-based phone number for signup.
As in: if I open chat.openai.com in my browser without a VPN, from behind the firewall, I get an OpenAI error message that says "Unable to load site" with the OpenAI logo on screen.
If the firewall blocked something, the page just wouldn't load at all and the connection would time out.
ChatGPT was not blocked by the GFW when it first released, for a few weeks (if not months, I don't remember), but at that time OpenAI had already blocked China.
The geo check only happened once, during login, with a very clear message that it was "not available in your region". Once you were logged in through a proxy, you could turn off your proxy/VPN/whatever and use ChatGPT just fine.
Have you tried with a prepaid card? Some even allow you to fund it with crypto.
Yeah that’s how most users in China access OpenAI. But it’s inconvenient for the majority of people nevertheless.
> Chinese credit card is not allowed for OpenAI API.
A lot of online services don't accept Chinese credit cards (hosting providers, for instance), so I don't think that is specific to OpenAI. The reason usually given is excessive chargebacks, or (in the case of hosting) TOS violations like sending junk mail, followed by a chargeback when this is blocked. It sounds a little like collective punishment: while I don't doubt that there are a lot of problem users coming from China, with such a large population that doesn't mean that any majority of users from the region are a problem. I can see the commercial PoV, though: if the majority of chargeback issues and related problems come from a particular region and you get very few genuine customers from there¹, then blocking the area is a net gain despite potentially losing customers.
----
[1] due to preferring local variants (for reasons of just wanting to support local, due to local resources having lower latency, due to your service being blocked by something like the GFW, local services being in their language, any/all the above and more)
It's definitely not a commercial thing but a political one.
I'm located in Hong Kong, and using Hong Kong credit cards has never been a problem with online merchants. I don't think Hong Kong credit cards are particularly bad for chargebacks or whatever. OpenAI has explicitly blocked Hong Kong (and China). Hong Kong and China, together with other "US adversaries" like Iran, N. Korea, etc., are not on OpenAI's supported countries list.
If you have been paying attention, you'll know that US policymakers are worried that Chinese access to AI technology will pose a security risk to the US. This is just one instance of these AI technology restrictions. Ineffectual, of course, given the many ways to work around them, but it is what it is.
Perhaps they are unwilling to operate in a territory where they would be required to disclose every user's chat history to the government, which has potentially severe implications for certain groups of users and also for OpenAI's competitive interests.
I live in China. You can't use it here easily. Even if you use a VPN you still need a non-Chinese phone number.
I would love to know how, technically, they can stop you once you run a VPN. Do you have any idea on that?
OpenAI requires a working phone number to sign up, and a credit card to use various features.
So they just block the phone numbers (which have a country code) and credit cards (whose issuer's country info is available).
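For illustration only, a phone-number gate like the one described above can be a simple prefix match on the E.164 country calling code. The blocked list and logic here are my guesses for a sketch, not OpenAI's actual implementation:

```python
# Hypothetical signup gate by phone country code. E.164 numbers start with
# "+" and a 1-3 digit calling code, so a prefix check reveals the country.
BLOCKED_CALLING_CODES = ("+86", "+852", "+853")  # mainland China, Hong Kong, Macau

def signup_allowed(e164_number: str) -> bool:
    # Reject any number whose calling code is on the blocked list.
    return not e164_number.startswith(BLOCKED_CALLING_CODES)

print(signup_allowed("+8613800138000"))  # False: +86 is mainland China
print(signup_allowed("+14155550123"))    # True: +1 is US/Canada
```

The same idea applies to cards: the issuing bank's country is visible from the card's BIN range, so a payment processor can filter on it just as easily.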
Not sure why this seems to be such a surprise to everyone here...
I know how: you need a verified phone number to open an account, and OpenAI does not accept Chinese phone numbers or known VoIP numbers like Google Voice.
They also block a lot of datacenter IP addresses. So if you're trying to access ChatGPT from a VPN running on a blacklisted datacenter IP range (a lot of VPN services, and the common cloud providers people use to set up their own private VPNs, are blacklisted), it tells you it can't access the site and "If you are using a VPN, try turning it off."
OpenAI does not allow users from China, including Hong Kong.
Hong Kong generally does not have a Great Firewall, so the only thing preventing Hong Kong users from using ChatGPT is OpenAI's policy. They don't allow registration with Hong Kong phone numbers, Hong Kong credit cards, etc.
I'd say it's been pretty deliberate.
Reason? Presumably alignment with US government policy of trying to slow down China's AI development, alongside the chip bans, etc.
Sounds plausible - this is in line with the modern trend of posturing by sanctioning innocent people.
Of course, the only demographic these restrictions can affect is casual users. Even I know how to circumvent this; thinking that this could hinder a government agent, who surely has access to all the necessary infrastructure by default, is simply mental.
A now-former board member was a policy hawk. One of their big beliefs is that China is at no risk of keeping up with US companies, due to not having the data.
I wouldn't be surprised if OpenAI blocking China is a result of trying to prevent Chinese labs from generating synthetic training sets.
My theory was that they operate at a loss and don't want to increase that loss by offering the service to adversaries.
Probably the realization that this is an arms race of sorts.
Probably because of the cost of legal compliance. Various AI providers also blocked Europe until they were ready for GDPR compliance. China has even stricter rules w.r.t. privacy and data control: a lot of data must stay inside China while allowing authorities access. Implementing this properly typically requires either a local physical presence or a local partner, which is why many apps/services have a completely segregated China offering. AWS's China region is completely sealed off from the rest of AWS and is offered through a local partner. Similar story with Azure's China region.
Baidu has a ChatGPT clone that I use regularly.
https://yiyan.baidu.com
I imagine it is good enough for most people.
Given the subdomain name, I presume it uses the Yi-34B model?
I have no idea, but Yiyan is short for Wenxin Yiyan (文心一言), which roughly translates, character by character, to writing-heart-one-(speech/word). Maybe someone who is Chinese could translate it better. So I don't think the name has anything to do with the Yi model.
I do wonder what their backend is. They have the same 3.5/4 version numbering scheme that ChatGPT uses, which could be just marketing (and probably is), but I wonder.
EDIT: fixed my translation
Their backend originates from Baidu ERNIE: http://research.baidu.com/Blog/index-view?id=160
“A single word from the heart”
AFAIK, the model behind Yiyan is Baidu's ERNIE. Yi-34B (and the Yi model family) comes from another startup, 01.ai, created by Kai-Fu Lee earlier this year.
Assuming I could get through registration, can I talk with it in English?
Yes.
I'm curious why you've opted for this model over ChatGPT-3.5. Is it because it performs better in Chinese?
ChatGPT is blocked in China, including Hong Kong, so my school computer doesn't have access to it. I'm also a very, very casual AI user.
Which is why there are 100+ LLMs in China ... the so-called 百模大战, battle of the 100 models.
Let 100+ LLMs bloom
I was thinking more of a Thunderdome setup.
When is the new Mistral coming out, and at what size?
I'm hoping that they make it 13B, which is the size I can run locally in 4-bit and still get reasonable performance
What kind of system do you need for that?
If your GPU has ~16GB of vram, you can run a 13B model in "Q4_K_M.gguf" format and it'll be fast. Maybe even ~12GB.
It's also possible to run on CPU from system RAM, to split the workload across GPU and CPU, or even from a memory-mapped file on disk. Some people have posted benchmarks online [1] and naturally, the faster your RAM and CPU the better.
In my personal experience, running from CPU/system RAM is painfully slow. But that's partly because I only experimented with models that were too big to fit on my GPU, so part of the slowness is due to their large size.
[1] https://www.reddit.com/r/LocalLLaMA/comments/14ilo0t/extensi...
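The VRAM figures above follow from simple arithmetic: weights take (parameter count × bits per weight) / 8 bytes, plus headroom for the KV cache and buffers. A back-of-envelope sketch, where the 1.5 GB overhead is a rough assumption of mine and Q4_K_M is taken to average a bit over 4 bits/weight:

```python
# Rough VRAM estimate for a quantized model: weight bytes plus a fixed
# allowance for KV cache and buffers (the 1.5 GB overhead is a guess;
# real usage also depends on context length and backend).
def model_vram_gb(n_params_billions, bits_per_weight, overhead_gb=1.5):
    return n_params_billions * bits_per_weight / 8 + overhead_gb

print(round(model_vram_gb(13, 4.5), 1))  # 8.8  -> a 13B Q4 model fits in ~12GB
print(round(model_vram_gb(13, 8.0), 1))  # 14.5 -> 8-bit needs a ~16GB card
```

This is why 13B at 4-bit is comfortable on 12-16GB cards while the 8-bit version is already pushing a 16GB card's limit.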
I get 10 tokens/second on a 4-bit 13B model with 8GB VRAM offloading as much as possible to the GPU. At this speed, I cannot read the LLM output as fast as it generates, so I consider it to be sufficient.
Which video card?
RTX 3070 Max-Q (laptop)
I can fit 13B Q4 K M models on a 12GB RTX 3060. It OOMs when the context window goes above 3k. I get 25 tok/s.
Mine is a laptop with an i7-11800H CPU + RTX 3070 Max-Q 8GB VRAM + 64GB RAM (though you can probably get away with 16GB RAM). I bought this system for work and casual gaming, and was happy to find out that the GPU also lets me run LLMs locally at good performance. The laptop cost me ~$1600, which was a bargain considering how much value I get out of it. If you are not on a budget, I highly recommend getting one of the high-end laptops with an RTX 4090 and 16GB VRAM.
With my system, Llama.cpp can run Mistral 7B 8-bit quantized by offloading 32 layers to the GPU (35 total) at about 25-30 tokens/second, or 6-bit quantized by offloading all layers to the GPU at ~ 35 tokens/second.
I've tested a few 13B 4-bit models such as CodeLlama, offloading 37 layers to the GPU, which got me about 10-15 tokens/second.
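The tokens/second figures in this thread fit a common rule of thumb: single-stream generation reads every weight once per token, so memory bandwidth sets an upper bound of roughly bandwidth ÷ model size. A sketch with illustrative numbers (the bandwidth figures are my rough guesses, not measurements):

```python
# Upper bound on single-stream generation speed: every token requires
# streaming the whole model through memory once, so
# tokens/sec <= memory bandwidth / model size.
def max_tokens_per_sec(model_size_gb, mem_bandwidth_gb_s):
    return mem_bandwidth_gb_s / model_size_gb

# ~7 GB (13B at 4-bit) fully on a GPU with ~400 GB/s of bandwidth:
print(round(max_tokens_per_sec(7, 400)))  # 57 tok/s upper bound
# Same model streamed from dual-channel system RAM at ~50 GB/s:
print(round(max_tokens_per_sec(7, 50)))   # 7 tok/s upper bound
```

This also explains why partial offload is slow: the layers left in system RAM run at the RAM's bound, and that slowest link dominates the overall rate.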
I have a Lenovo Legion with a 3070 8GB and was wondering whether I should use that instead of my MacBook M1 Pro.
The main focus of llama.cpp has been Apple silicon, so I suspect M1 would be more efficient. The author recently published some benchmarks: https://github.com/ggerganov/llama.cpp/discussions/4167
On my Mac M1 Max with 32GB of RAM, Vicuna 13B (GGUF model) at 4-bit consumes around 8GB of RAM in Oobabooga.
I tried turning on mlock and upping the thread count to 6, but it's still rather slow at around 3 tokens/sec.
A CPU would work fine for the 7B model, and if you have 32GB RAM and a CPU with a lot of cores you can run a 13B model as well, though it will be quite slow. If you don't care about speed, it's definitely one of the cheapest ways to run LLMs.
Q5_K_M on Mistral 7B has good accuracy and performs decently on a CPU too.
I've tried out DeepSeek on deepseek.com, and it refuses conversations about several topics censored in China (Tiananmen, Xi Jinping as Winnie-the-Pooh).
Has anyone tried if this also happens when self-hosting the weights?
I just tried the GGUF 7b model of Deepseek and it let me ask some questions about some pretty sensitive topics - Uighur Muslims, Tank man, etc.
https://huggingface.co/TheBloke/deepseek-llm-7B-chat-GGUF
When I try out the topics you suggest at the huggingface endpoint you link, the answer is either my question translated into Chinese, or no answer when I prompt the model in Chinese:
<User>: 历史上的“天安门广场的坦克人”有什么故事? (roughly: "What is the story of the historical 'Tank Man of Tiananmen Square'?") <Assistant>:
Interesting - I can't speak to the Huggingface endpoint. I downloaded the 4-bit GGUF model locally and ran it through Oobabooga with instruct-chat template - I expressed my questions in English.
I haven't tried that base model yet, but I have tried the coder model before and experienced similar things: a lot of refusals to write code if the model thought it was unethical or could be used unethically. Asking it to write code to download images from an image gallery website would work or not depending on what site it thought it was going to retrieve from.
There is also Goliath 120B.
Most AI papers are from Chinese people (either from mainland China or of Chinese ancestry living in other countries). They have a huge pool of brains working on this.
What is a good place to keep up with new LLM model releases?