The article does a decent job of explaining how Chinese characters work, but it falls short of explaining why.
Chinese continues to use a logographic writing system for reasons of both tradition and practicality. English has grossly grouped together Chinese as one unified language, when in actuality it is not. In fact, many "dialects" are mutually unintelligible--one speaker cannot understand another speaker. If all of China switched to using a phonetic writing system, everyone would write everything differently. It'd be very difficult--impossible at some points--to read and write materials from other "dialects". However, with a logographic approach, everyone can understand that the character 工 means "work" even if I pronounce it like [wirk] and someone else pronounces it like [wak], for example. It's one of the reasons why subtitles are so prevalent in Chinese media. Obviously, this problem can be eliminated by eliminating individual "dialects", which is sort of promoted through the adoption of Mandarin Chinese. Much Chinese media is also dubbed in the standard dialect so that actors with regional dialects can be understood.
As for Chinese characters in other languages, Japanese becomes a lot easier to read with the addition of Chinese characters. Kanji allow sentences to be shorter, less ambiguous, and easier to parse. Unlike in Chinese, each character is not just a single syllable, and there are many homonyms in Japanese because there's a smaller set of sounds.
https://history.stackexchange.com/questions/46658/did-china-...
As far as I know it's not English or any Western entity that has grouped the Chinese languages together as one, but the Chinese government, for political reasons. Western linguists recognize the variants of "Chinese" as different languages.
There's a saying in linguistics that "a language is a dialect with an army and a navy" [0]. Linguists recognize that where you draw boundaries between languages is essentially arbitrary, even more so than boundaries between biological species. It tends to be that a language is a language if and only if some sovereign state declares it to be so, otherwise it's a dialect.
This is also how you get Portuguese as a distinct language from Spanish even though the two are more mutually intelligible than Scots (a "dialect") and American English are. Portugal has the sovereign government to back up its claim to having its own language where Scotland does not.
[0] https://en.wikipedia.org/wiki/A_language_is_a_dialect_with_a...
I like the expression, but the example of Portuguese/Spanish is absurd IMO. As a Portuguese speaker, the amount of effort required to communicate with Spanish speakers is very, very high, to the point where I avoid trying at all costs. Here in Texas, it is almost always more effective for my family to communicate with Spanish speakers using very broken English and hand gestures on both sides than trying to get any Portuguese-Spanish mutual intelligibility to work.
I speak fluent second-language Spanish and have had next to no difficulty communicating with native Portuguese speakers from Mozambique, Cabo Verde, and Portugal. What variety of Portuguese do you speak?
I'll concede that it's possible that I actually have an advantage as a second-language speaker, since my Spanish is probably slower than a native's and when I'm listening I'm already doing more work than a native is accustomed to.
Brazilian Portuguese has some phonological differences that I think confuse people in both directions more than other varieties of Portuguese, like the /tʃ/ and /dʒ/ for <t> and <d> in various contexts. For example a Spanish speaker would probably have a hard time recognizing that Brazilian Portuguese /dʒi'abu/ is cognate with Spanish <diablo>. A Brazilian Portuguese speaker who was less familiar with Spanish might similarly have a hard time recognizing /ˈdjablo/ as cognate with Portuguese <diabo> 'devil'.
Or Brazilian /'sedʒi/ is cognate with Spanish <sed> 'thirst'. A Spanish speaker will have to know to effectively ignore the /ʒi/ in order to recognize the word easily!
Maybe more extreme, Brazilian /'hedʒi/ (written <rede>) is cognate with Spanish <red> 'net, network'.
You might also be familiar with a greater variety of Spanish pronunciations as a non-native speaker... if you know Argentine /'ʃubja/ and /'ʃabe/, then you have a better chance to recognize Brazilian /'ʃuvɐ/ and /'ʃavi/ ('rain' and 'key', respectively).
Yeah, I suspect that OP speaks Brazilian Portuguese but I didn't want to assume.
I should have specified in my original post, but I only meant that Portugal Portuguese (and at least a few of the African varieties that are still very close to Portugal's) are mutually intelligible with Spanish. Which actually just further illustrates the complexity of categorizing speech into discrete languages...
Interestingly, it makes Brazilian much easier to understand for (many) Italians and Romanians.
I am Italian and whenever I go to Spain I usually don't really need to speak English because the languages are close enough that you can get by just knowing a handful of basic words (and the Spaniards I meet usually prefer it that way). This is both a blessing and a curse; all Italians I met living in Spain (and vice versa, all Spanish speakers I met in Italy) tend to have a hard time learning the other language "properly" because the threshold for being understood is extremely low. If, perchance, you speak Italian with Spanish grammar, people will still understand you perfectly.
Given that Castilian and Portuguese are even closer (both Western Romance, part of a linguistic continuum, ...) I honestly find that very hard to believe. I am only familiar with the European variants, though. Maybe the issues you faced are due to how the Latin American variants have diverged significantly over the years?
The big difference between ES and PT is the accent/pronunciation of letters and matching words. Secondarily, there is the differing vocabulary. But a lot of these words are still understood as archaic/uncommon alternatives.
(see shoen's post below.)
So, if you learn the accent of the other language, all of a sudden a large portion of the language is unlocked. This happened to me, almost like a light switch.
I don't have a lot of experience with Italian but it seems like the pronunciation is closer to Spanish.
Yeah the phonetics are very close. Castilian has more fricative sounds like [ð], [θ], [x] and [β] and no open vowels, but that's it.
But the comparison was to Scots, which is (sometimes, not universally) considered a dialect of English rather than a separate language, but is hard for standard English speakers to understand. It's not just English with a Scottish accent. I have no idea how Portuguese feels to Spanish speakers or vice versa, but here's an example of modern Scots from Wikipedia. I'm curious how it compares.
(Edit: And here's a spoken example - https://youtu.be/am1MCJsEGYA)
As a native speaker of English and having conversational ability in Spanish I would describe both Scots and Portuguese as separate languages. Portuguese feels like it has as much in common with Spanish as Italian or French to me, and I can't remotely carry on a conversation in Portuguese. (Or Scots really, though with the somewhat mutual intelligibility I can speak English or Spanish and maybe that's workable, but I'm definitely not going to understand the Portuguese.)
Which dialect of Portuguese are you referring to?
I'm a fluent second-language Spanish speaker and have had a lot of success communicating with native Portuguese speakers, but only Portugal Portuguese and various African Portugueses. I can't understand Brazilian Portuguese at all.
a friend of mine who grew up in argentina went for an interview at a university in brasil.
they reported that on the first day, portuguese was just gibberish. on the second day they realized they could read a solid chunk of a portuguese newspaper. on the third day they felt they were beginning to understand what people were saying to them.
obviously, YMMV (and does).
This is really what "mutually intelligible" means, I'd say: that the languages are so close you can sort of work it out without explicit instruction. You still need some experience with the other language - and quite often there'll be a geographical and cultural proximity that means almost all speakers have that.
I grew up in a Scandinavian country and visited the others a lot when I was young, and I find I understand most of what I hear in the other languages, but it's quite common for my peers who don't have that experience to understand nothing.
I may be way off here, and happy to be corrected. My experience is that Texas-Spanish is difficult to use in Spain, and I would guess the inverse is true. From that I would deduce that Portuguese-Spanish is a non-starter in the state.
I know a very limited amount from having grown up and played soccer in the "Mexican" rec leagues in Tx. While traveling in Spain, English is perfectly fine in cities. But on day trips to smaller towns/villages they had more trouble understanding my attempts to communicate with the basic Texas-Spanish I had picked up than they did the hand gestures and a single English word here and there. I understood next to nothing in Portugal (it might as well have been Dutch to my ears; I had no idea until now that they are kinda similar in the way Spanish/Italian are). Of course, this could be because I'm simply horrible at Spanish. But I have heard Texas-Spanish is even weird for Baja-California-Spanish speakers.
Spain Spanish and <pick-latam-country> Spanish are the same language with very different vocabulary.
(Well, not quite, because Spain Spanish has loismo and what not that <pick-latam-country> Spanish almost certainly does not, and there are other variations as well, like Argentine Spanish having very different imperatives, Argentine and Colombian voseo vs. tuteo everywhere else, etc.)
Probably not as absurd as you think. I reckon if you dropped an American in a random town in Scotland (or even a northern English town, for that matter), they would also need to use very broken English and hand gestures to communicate. Glaswegian or Geordie is near incomprehensible to RP-speaking Brits, let alone to an American whose only exposure to Scottish is Mel Gibson as William Wallace.
Given the context, you'd probably have an easier time talking Portuguese with someone from Vigo than, say, Juarez, but even then, that might depend on you not having a Brazilian dialect...
After all, Spanish & Brazilian speakers in the new world have their own dialects (not languages).
I don't think the intention was to paint Spanish and Portuguese as incredibly similar, only to say that they're more similar than Scots and English, which are still considered the same language.
No, China literally has multiple dialect continuums[0]. It is a case of the "a language is..." saying, but the "other" way around.
0: https://en.wikipedia.org/wiki/Dialect_continuum#Chinese
I don't think we actually disagree.
I'm saying that because China has one single army and navy and at the same time a huge narrative wrapped up in the idea that it's all one China, those "dialects" don't get to be languages because the army and the navy say otherwise.
"A language is a dialect with an army and a navy" implies its corollary, which is that "a dialect is a language without an army or a navy".
(In fact, that's likely what was originally meant by the person who coined the phrase—he was a specialist in Yiddish linguistics writing during WW2.)
I think our disagreement is over whether there can be fault lines of mutual intelligibility between dialects. If linguists and speakers of Chinese languages are to be believed (no particular reason not to), there are such fault lines in China.
I don't disagree that there can be fault lines of mutual intelligibility between dialects. I'm not even commenting on how we define dialects at all—all I'm saying is that the distinction between a dialect and a language is an arbitrary one that is made for political reasons more than linguistic ones, and that's something that even the sources for that Wikipedia page agree with me on. For example (emphasis added) [0]:
In the absence of nation states I suspect that we'd mostly talk about dialects and dialect continuums. Discrete languages are only really relevant as a concept because of the non-linguistic ties that bind a nation together.
[0] https://books.google.com/books?id=lCgnrA7Ke3QC&pg=PA1&source...
It's perhaps how it is seen and used in English. But in China, Chinese languages tend to be referred to as such, with Mandarin referred to as the "common language", etc., though the character used has an oral connotation.
I could believe this is true if you’re only comparing languages that have the same root or parent language such as Latin languages, etc.
But I don’t see how anyone could describe the difference between Chinese and English as arbitrary or as two dialects even if the apocalyptic collapse of all major nations which spoke such languages occurred tomorrow.
My understanding is that there’s something called lexical similarity and if it’s over a certain percentage it’s a dialect.
How does that work with e.g. French Creole, which has French, Caribbean, and English in it? What if this feels like a dialect but the percentage of any given parent is less than your cut-off percentage? You make the rule sound very easy to interpret, but I think the general principle is that language classification is nuanced, and the irony of the "navy and army" language requirement is that it kind of has nothing to do with the actual language spoken.
The "navy and army" argument is usually employed when the question arises whether something is a dialect or a separate language. IMHO such Creoles should also be classified as languages, with the caveat of dialect continuums.
Creole is a weird case IMHO because English itself is pretty much a creole between Old English, Norman French, Norse, and some Gaelic and Pictish languages.
What's arbitrary isn't that languages are different from each other, what's arbitrary is where you draw the line. When you take two languages on opposite sides of the world they're unquestionably different languages. But as you transition slowly from one language to another, how many languages you spin off and which dialects fall under which languages is arbitrary.
Even if you tried to use a method like this to draw lines, it requires you to pick a "center" dialect that you compare all other prospective dialects/languages to. Which dialect you pick as your "center" dialect will determine which dialects end up under your umbrella language, and picking a different center would yield very different results. Which language you pick as your center is inherently a political question, one which would be settled by a sovereign state.
And aside from that problem, lexical similarity is not used to define languages. All it measures is how similar word sets are, and language variations are way more complicated than just vocabulary. No serious linguist would ever try to use a single metric like that to draw lines between languages (and again, most serious linguists aren't actually interested in drawing general-purpose lines because they understand that the lines are not real).
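To make the "center" problem concrete, here is a toy sketch. Everything in it is made up for illustration: the five varieties A to E, the 0.9 similarity between neighbours, and the 0.8 cut-off are assumptions, not real linguistic data, and no linguist classifies languages this way. It only shows that a single-threshold rule gives different "languages" depending on which variety you measure from.

```python
# Hypothetical dialect chain: each variety is very similar to its neighbours.
CHAIN = ["A", "B", "C", "D", "E"]
STEP_SIMILARITY = 0.9  # assumed similarity between adjacent varieties


def similarity(x: str, y: str) -> float:
    """Toy model: similarity decays with distance along the chain."""
    distance = abs(CHAIN.index(x) - CHAIN.index(y))
    return STEP_SIMILARITY ** distance


def language_of(center: str, threshold: float = 0.8) -> list[str]:
    """Varieties that count as 'dialects' of the language centred on `center`."""
    return [v for v in CHAIN if similarity(center, v) >= threshold]


if __name__ == "__main__":
    for center in ("A", "C", "E"):
        print(center, language_of(center))
```

With these assumed numbers, measuring from C puts all five varieties in one language, while measuring from A or from E gives two overlapping three-variety languages in which A and E are not "the same language" as each other. The metric is identical in every case; only the choice of center changed.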
Reminds me of when I went on a business trip of several weeks to Sweden from California. The Swedes spoke English reasonably well. I then went to Scotland for vacation and had a much harder time understanding their dialect than I did with the Swedes.
For me as a non-native speaker of English and German, this is quite normal - I mostly have an easier time understanding other non-native speakers since they usually use "international" dialect/pidgin, speak slower and usually articulate more distinctly.
A classic example of this is Hindi and Urdu -- the two languages are largely mutually intelligible when spoken, which is the main criterion for being the same language, but are written with different scripts and of course used in separate and adversarial states.
They're written the same, and have basically the same grammar. The characters have hugely different readings, but people can communicate easily in writing. You can call that different languages, but that's certainly a different kind of different than people would expect when you say different.
If there were a version of English where all of the letters designated completely different sounds, but was written exactly the same way, would it be a different language? Would people who said that they were dialects of the same language have to be saying this for political reasons?
edit: I mean, Chinese is how you would expect it to be. How would two people living extremely far apart in China even know how each other would pronounce a particular character? How would they have communicated those sounds 500 years ago? The wide variance in the pronunciation of words even in English is also due to our dogshit orthography (largely imposed by the French), which often fails to give a decent hint for how to say something. Chinese characters are symbols of concepts that usually have a hint of what it's meant to sound like in the northern dialect, 1500 years ago, by referring to another character that there's no reason one would know what it sounded like.
The phonology differs. Vocabulary differs. Grammar differs. Speaking Cantonese and Mandarin natively, I have no idea what Hokkien or Sichuan people say, whether or not you write it down.
This is especially apparent when speaking to less educated people with less exposure to the standardised, official Chinese language, which is what people do actually write down when intending for a broader audience, of course. Diglossia is real.
Yep. Anybody who’s ever read written Cantonese or Shanghainese would realise they are often unintelligible unless you speak those languages and understand how they’re written. eg 「佢冇做乜嘢」
And yet the incorrect parent comment has been voted to the top of the thread by those who think it’s helped them.
Others have already corrected your other misunderstandings, but this is also false. Spanish has at least as much variance in pronunciation as English and has an orthography that is extremely regular. Brazilian Portuguese and Portugal Portuguese likewise have the same, highly regular orthography and are barely mutually intelligible.
To the best of my knowledge you actually have the causality mostly reversed: English's orthography is useless largely because the pronunciation changed but the spelling didn't, and English has a variety of pronunciations because the pronunciations changed differently in different regions. English has a messier orthography than other languages because of our complicated history of borrowing words, but the evidence shows that even people who start with a highly consistent orthography don't use it to keep their pronunciation static and shared.
China had an imperial bureaucracy for over 2000 years, which sent officials from one end of the country to the next. In fact, a predecessor of Standard Chinese (a.k.a. "Mandarin") was called "the language of officials" (官話).
Here in China, linguists consider the different Western "languages" to be dialects, and believe that the Western governments, for political reasons, make people think they speak different languages than their neighbors, so that they cannot unite.
I'm just joking, but what you say is as absurd as my joke. Western linguists don't consider the dialects different languages. If they do, they do it for political reasons. Accept that there are different ways of thinking and the real world never has to submit to how you define concepts like "a language", and not everything China surprises you with has something to do with politics.
Edit: I realized that my joke was closer to reality than parent's comment: https://en.wikipedia.org/wiki/Serbo-Croatian
Western linguists generally view the concept of a "language" as being a political one more than a linguistic one, and so rather than quibble about definitions they just use whichever word the people who speak the language/dialect would use. For example, from a book about Chinese dialects [0]:
The Chinese language varieties are dialects because it is politically expedient for them to be so. The Western Romance languages are languages because it is politically expedient for them to be. Linguists shrug and move on to more interesting (to them) questions.
[0] https://books.google.com/books?id=lCgnrA7Ke3QC&pg=PA1&source...
Open WALS or Glottolog or any other language catalog and you will see they categorize Chinese as a language family, consisting of multiple languages like Mandarin, Wu/Hui, Min, etc. You are free to disagree of course, but "Western linguists don't consider the dialects different languages" is simply not true to my knowledge.
Serbo-Croatian is a special case. We all know that we speak the same language, but ‘the others’ are naming it wrongly.
Also, Slavic languages are very difficult for non-Slavs to learn, but very easy for other Slavs. And many are mutually intelligible.
On a recent holiday to Hong Kong, I noted the Metro announcements were in Cantonese, Mandarin and then English. I also noted that apologies and other minor social skirmishes and interactions, e.g. after bumping into someone, were mostly delivered in English.
When in Rome, speak English: it's the French language.
Hong Kong is a special case where English still has higher prestige to begin with. Locals sometimes code-switch to English even if everyone present would understand Cantonese.
Franco-Germanic, por favor.
They are different languages in the sense that they are not mutually intelligible, but the prestige of Mandarin is not a CCP one. Mandarin as it evolved has always been the language of government and of the plains. Other Sinitic languages routinely borrow readings and terms from Mandarin. There's more that I want to say about the topic, but it's less relevant.
I don't think Kanji makes Japanese easier to read. Korean ditched it, and it's comparatively easier to read.
Dropping Kanji severely degrades readability. One has to reconstruct the word from syllables, which introduces another layer of cognitive load.
It works similarly in Korean, though: over multiple decades most people have become so used to not incorporating Hanja in sentences that it would now be impractical to mingle Hanja into Korean.
Which is what every language with alphabetic writing does, and it works just fine. It is not "another layer of cognitive load", it is just a different layer, one that can be said to be much lighter, or other languages would not have switched centuries/millennia ago.
The real problem of Japanese is the massive amount of homophones coming from Chinese. It is already a problem in Chinese, but even worse in Japanese due to the smaller phonetic repertoire.
Disagree. Once you are accustomed to reading kanji, having learned (except very briefly as a young grade school child) to visually parse nearly all of the words that you see regularly as logograms first and groups of sounds second, reading without them would be akin to reading English while afflicted with a strangely selective amnesia hole for entire words. A word like 'shoe' would not instantly evoke an association with a piece of footwear but would have to be (admittedly very rapidly) sounded out letter for letter each time, instead of being scanned as a whole unit.
That's what reading a word normally represented by a familiar kanji character but "expanded" into hiragana feels like, and slightly more pronounced if it's, for some hipster reason, written in katakana.
And how many years does that take? What about words that you have never seen before? What about ambiguous or uncommon readings that require furigana even for fully educated adults?
As I said in the sibling comment, I live in Japan and every Japanese person I have met complains about the massive effort it took them to learn how to read and write.
Sounds like an education problem. Traditional Chinese, which is the de facto written standard in Taiwan, doesn't even have any comparable phonetic script (beyond a phonetic pronunciation alphabet in the form of bopomofo) such as hiragana and katakana.
It's effectively "all kanji" as it were, and yet Taiwan has one of the highest literacy rates in the world, and I never met any Taiwanese when I lived there (for years) that complained that Chinese was too difficult.
In Chinese tho with extremely rare exceptions all the characters have only one reading.
Japanese has onyomi and kunyomi. The onyomi also come from different periods in Chinese so there's multiple onyomi for most Kanji.
Then you get two-kanji words that come in all varieties. Most are onyomi + onyomi, but you get some that are onyomi + kunyomi or kunyomi + kunyomi or kunyomi + onyomi.
There's also not really any solid rules to it, and when there are, there are plenty of exceptions.
It's a real nightmare of a system. A fun one though.
And then you have nanori, the non-standard readings used for personal and place names that are impossible to read without furigana or already knowing the name. One that really surprised me was a village called 愛子 (a common female name read as "Aiko") near Sendai but in this case read as "Ayashi".
This stupid phenomenon is due to the fact that the Japanese government arbitrarily decided to teach only 1000 kanji to school kids, and this number decreases every 10 years.
https://ja.wikipedia.org/wiki/%E5%B8%B8%E7%94%A8%E6%BC%A2%E5...
You can see how stupid it is.
While at the same time Chinese people are learning 3X more. No one ever complains about the difficulty of Chinese characters, after all.
So you're saying that verbal communication doesn't work in Japan and everyone just texts each other? "I'm sorry Mr Honda Kawasaki but due to how Japanese language works it's impossible for me to tell whether you want to buy three oranges or cook prostate cancer, please send me a letter" "Okay I will fight the sky colander"
Most countries at some point had to simplify their languages in order to promote literacy. Korea didn't ditch hanja just for shits and giggles, it did so in order to make it easier for schools. Japan never really had to face this problem at a scale that required complete removal of kanji because by the time people got such ideas Japan was already quite literate, so kanji stuck around. Plus, Japan is an extremely conservative society, they only ever change anything once all other options have been exhausted.
Same reason why English spelling is so ridiculous. It's not that English is such a unique language that it absolutely requires a spelling system that doesn't make sense and effectively forces everyone to memorize each word's spelling aside from its pronunciation (wow just like kanji), it's just that English spelling has never been a problem to a degree that required a systematic solution, so now we're stuck with what we have. If we suddenly decided to make a giant reform of English spelling to have it reflect actual pronunciation, the resistance would be equally giant.
Don't get me wrong, I fully agree with you.
I actually live in Japan, I am trying to learn the language, and it is a royal PITA. Heck, every single Japanese person I have asked has complained about their language being so ridiculously difficult. They wish their language was easier, but as you say it is also such a conservative society that it will never change.
In comparison, my mother tongue is Spanish, a language with a highly phonemic spelling. My girlfriend is trying to learn it, and she always remarks on how once you learn a few basic rules, you can read anything.
Yes, so basically the arguments that a lack of Kanji leads to worse readability are actually hitting upon the fact that readability suffers in the short term not because Kanji enhance readability, but because readers simply aren't used to processing the language only through kana. Were they to acclimatize to that, it would become readable again, and in fact easier to read than before.
I do wonder whether kana might be slightly easier to read if ん were written as part of the preceding syllable like it is in hangul.
Kana would be slightly easier to read if we spent as much time reading in Kana as we have in Kanji.
Hangul has some funny rules around patchim that need to be memorized. Kana does a great job avoiding this, so on balance kana is probably just fine compared to Hangul.
I don't think so. Kana just don't have enough entropy compared to Kanji. A kanji can be composed of up to twenty strokes with a high variety of stroke patterns. That excess complexity makes it identifiable even in extreme situations (blurry, stained, or whatever). In some cases, a kanji with half of its area masked can still be decoded without any ambiguity. But this will never work with a sound-based script.
Not for beginners. But for fluent speakers, kanji are very necessary to distinguish all the homophones.
E.g. 効果, 硬化, 降下.. are hard to learn, but they're clear. こうか is much easier to learn, but it could mean any of 10-15 unrelated things.
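A minimal sketch of what that ambiguity looks like as data, using only words already mentioned in this thread plus my own English glosses. The word list is a tiny hand-made sample, not a real dictionary, and the function name is just something I picked for illustration.

```python
from collections import defaultdict

# (written form, kana reading, rough English gloss): a hand-made sample only.
WORDS = [
    ("効果", "こうか", "effect"),
    ("硬化", "こうか", "hardening"),
    ("降下", "こうか", "descent"),
    ("時間", "じかん", "time"),
]


def index_by_reading(words):
    """Group written forms by their kana reading."""
    by_reading = defaultdict(list)
    for written, reading, gloss in words:
        by_reading[reading].append((written, gloss))
    return dict(by_reading)


READINGS = index_by_reading(WORDS)

if __name__ == "__main__":
    for reading, candidates in READINGS.items():
        print(reading, len(candidates), candidates)
```

Going from the kanji spelling to a meaning is a one-to-one lookup in this sample, while going from こうか back to a word leaves three candidates that only context can separate. じかん, by contrast, maps to a single entry here.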
All of which native speakers have no trouble distinguishing between in conversation when there are no Kanji anywhere. It's the same with homophones in any language, usually the context makes it clear because the alternatives don't fit.
The homophones in Japanese and Korean pretty much all come from the vocabulary they share with Chinese which makes up the bulk of the vocabulary for both those languages.
One doesn't use Kanji anymore, and no one seems to struggle to read it?
Japanese on the other hand I have seen even natives struggle to read. Heck even the existence of furigana in novels is an admission of this.
Chinese/Japanese has a level of written mutual intelligibility. Korean lost it.
It's like a native English speaker encountering new vocabulary. Happens quite often.
I'd agree that manga use of furigana helps (perhaps school-aged readers) with reading, but furigana in novels is a standard tool in the language that authors can use to achieve effects that are hard to describe to non-speakers.
Sometimes furigana can be used artistically, sure, but that's the exception to the rule. It's by and large a reading aid in the vast majority of cases, and its inclusion in novels aimed at adults indicates that without it the author expects a certain percentage of readers may struggle with how to read the Kanji.
Why does this tool in the language need to exist? The answer cannot be because Kanji make things easier to read, else you wouldn't need tools to help you read Kanji you at times otherwise wouldn't be able to.
If you come across a word you don't know as either a native speaker of English or Korean, you can at least sound it out, which ups the probability you can connect it with a word you've heard before, otherwise since you know how to type it out it's trivial to look it up in a dictionary. If you come across a word you don't know in Japanese as a native speaker and there's no furigana it's a guessing game. The meaning is slightly more obvious to you, so you might be able to guess, but if you can't guess and you care to know and the word is in print then looking it up becomes a bit more of a pain.
Korean didn't completely lose the mutual intelligibility aspect, since the underlying pronunciation of the words still remains and can be used to correctly guess the word in a lot of cases. Take 시간 and 時間 as an example, but there are many, many words I've been able to guess in Korean based on knowing Japanese. I was able to score 50% on the TOPIK II reading exam after only having studied Korean for 4 months, in large part because of this.
This just isn't true. Even most native JP speakers agree that kanji are oppressively hard to learn and remember, so if it were feasible to get by with kana alone, then at least some native speakers would do it in some contexts. But outside of language learning it's virtually never done, and there's a reason for that.
Also, I think you're overlooking that Chinese and Korean have a lot more vowels/tones to work with than Japanese. There are a lot of Chinese-derived compound words that are homophones in Japanese but not elsewhere.
So how are homophones in Japanese understood when expressed verbally?
How much trouble do you actually have with them?
It really isn't. In conversation fluent JP speakers tend to avoid compounds that would be ambiguous, or add distinguishers like "学校の校歌". Honestly, try converting a paragraph of text from a newspaper to all kana, and having a native speaker read it.
Try having them be educated in a kana-only system and then have them try to read it. They'd probably do just fine. You'd expect anything you've spent a decade doing to be easier than the thing you've spent much less time doing.
かんちょうが かんちょうで かんちょうに かんちょう された。 is probably a sentence that does require Kanji to understand the precise meaning, given how many homophones かんちょう has, but it's a toy example.
I can't, because there are no native speakers who learned that way, as I'm sure you know :D
But there are many learners like that, and my experience in Japan is that anyone who doesn't learn kanji has a very low ceiling on their vocabulary, even if they use the language daily for decades.
Because the reality is that it's hard to memorize 1K kanji, but if you do it then it's relatively easy to learn 10K+ compound words. Without kanji, to reach fluency somebody would have to memorize 5-10 completely unrelated meanings for "kouka", then 5 more for kakou, and so on for every combination of common single-kanji readings.
I mean - if you're in Japan, you surely know people who try to get by without kanji. Do you know any who've reached fluency? Like who could use and understand 5-6 different "kouka"s without any idea of the kanji they use? If your premise here is true then people like that should be the norm, since learning would be so much easier for them compared to those wasting their time on kanji?
Written kana drops intonation information that's present in speech. Writing with kanji makes up for this, and also allows for more complex sentences that aren't as common in spoken Japanese.
I personally find the most difficult part of reading kana-only text to be detecting word boundaries. It's much easier when kanji is used, and I'm not even a native speaker.
An English analogy isthatyoucouldwritewithoutspacesandbeunderstood but it's more difficult to read and unnatural.
Young gen-z types on Japanese Twitter abbreviate everything, but even they don't drop kanji.
That is what whitespaces are for, which Korean also uses.
Adding whitespace is a pretty simple solution. Heck, if you really, really absolutely needed to resolve tonal ambiguity in kana, you could add something to kana to do that. That'd enhance readability even further, since it's basically impossible for foreigners to learn correct intonation in Japanese unless they explicitly study it, on top of memorizing all that Kanji, whereas with such marks it would become explicit. I can recall exactly once in the last 10 years having a conversation where there was ambiguity between two homonyms and someone asked a clarifying question to resolve it. The vast majority of the time it's just clear from context.
So I would say even that ambiguity isn't something people would actually have much of a problem with.
I don’t speak or read Korean but I am studying Japanese.
I think GP was trying to say that kanji helps:
たまねぎ 玉ねぎ
いつつ 五つ
In both of these examples the words are the same. I’m still early enough in my studies that I don’t know the rules of when someone might choose to write one way or the other, but I’ve seen examples of ads that “spell it out” with hiragana. (Which is harder for me to read, which is what GP was trying to convey imo)
I've been fluent in Japanese for over a decade and am about 6 months into studying Korean.
I understand the issues of Kanji vs no Kanji well. Korean successfully ditched it, isn't painful to read, is far more accessible to read for beginners, and doesn't suffer from an extreme long tail of ambiguous, difficult readings like Japanese does.
With Japanese, no matter how much vocab you learn, you hit new words like 仲人, think you know how to read it correctly, can never quite be sure, look it up every time as a consequence, and are surprised often enough at the reading that you never really settle into a sense of confidently being able to read new words correctly. It sucks.
In contrast I was able to score 50% on the reading section of TOPIK II after only 4 months of study.
So, on balance I'd say reading Korean is way easier because they ditched Kanji.
Thanks for clarifying. I don't disagree that Korean does seem easier, but I still think you are slightly confusing what the GP is trying to say.
I've never seen "仲人" before but I'm pretty sure it's some kind of person.
https://jisho.org/search/%E4%BB%B2%E4%BA%BA
仲 go-between (which I had never seen before; I knew the radicals as "person" and "middle" but didn't know what they meant combined, though this one made great logical sense)
人 person
So while I can't "read" it (in Japanese) I can know what it means pretty confidently as kanji very regularly mean the same thing in compound words.
If I saw "なこうど" I'd have no idea because those hiragana don't mean anything to me until I learn the meaning.
Am I making sense? Like the first time I saw 花火 I knew "flower fire" and was able to guess firework.
same with 大人 being adult.
I'm not saying you are wrong that Korean is easier -- I'm saying, learning kanji can make it easier to understand a lot of meaning with never actually being able to "read" the words. and the reading is absolutely hard because of kunyomi and onyomi etc etc.
Being able to guess the meaning of new words was neat earlier on in the Japanese journey, but in the end the problem of "gah, but how the hell do you actually read this?" was a greater detriment than that was a benefit.
In contrast if I saw なこうど I could at least be perfectly confident I was reading it correctly even if I didn't know what it meant. Sometimes I may be able to guess from the context at least partially what it means, but if not, then I could simply opt to move on having collected an instance of seeing the word. I might then later hear it elsewhere, or perhaps see it again and if I encounter it enough times I can get curious and look it up.
I could do the same thing with Kanji, except I'd have to look it up anyway to be confident I was reading it correctly. Else I just don't know what the word is, so it's harder to mentally file it anywhere in my brain. I found this led to a very long tail of pain when reading Japanese that didn't abate even when I got up to around 17k vocab in Anki, after which I just said bugger it.
So, on balance I prefer the set of problems that no Kanji poses over the set of problems that Kanji poses.
I vastly prefer the ability to potentially infer the rough meaning of an unrecognized word over the ability to pronounce it.
I was a CELTA-certified ESL teacher for years, and their rubric also seems to back this up in order of relative importance: it's meaning, then form, then finally pronunciation.
I don't know if a rubric for English is as applicable to Japanese.
You're just trading one set of problems for another. Those aren't even problems English has.
Me too! Nice to talk with someone with a similar background :)
Native speakers don't magically gain the ability to read correctly every new word. So it is fine to hit the dictionary every now and then!
In the case of 仲人, I can guesstimate 仲 (naka) and 人 (hito), but recall that 素人 and 玄人 are pronounced <long vowel>~uto. I would try to pronounce it as nakouto (the correct spelling is nakoudo). So people do gain a heuristic for reading.
Also Kanji provides a mnemonic device after learning the meaning of the word. (One who makes/improves 仲).
Yes, unless you're Chinese or Japanese speaker ;)
As someone who isn't even fluent in Japanese, I find it easier to read text with Kanji (in contexts where I am relatively fluent) than without.
Japanese without Kanji is like English (or any Latin alphabet language) without punctuation, spaces, or capitalization, and as if English also had a ton more homophones. You basically need to word-split and disambiguate as part of the reading process; it's painful.
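To make the word-splitting step concrete, here is a minimal sketch in Python using the English no-spaces analogy from earlier in the thread. The vocabulary `VOCAB` and the function `segment` are hypothetical names invented for this illustration; a real reader (or a real Japanese tokenizer) works from a far larger lexicon plus context, so treat this as a toy under those assumptions.

```python
from functools import lru_cache

# Hypothetical toy vocabulary, chosen only for this illustration.
# It deliberately contains overlapping entries ("with"/"without",
# "under"/"understood") so the splitter has real choices to make.
VOCAB = frozenset({
    "you", "could", "write", "with", "without", "out",
    "space", "spaces", "and", "be", "under", "stood", "understood",
})


def segment(text):
    """Return one split of `text` into VOCAB words, or None if none exists."""

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(text):
            return ()
        # Try longer candidates first, backtracking if the rest cannot be split.
        for end in range(len(text), start, -1):
            word = text[start:end]
            if word in VOCAB:
                rest = solve(end)
                if rest is not None:
                    return (word,) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)


print(segment("youcouldwritewithoutspacesandbeunderstood"))
```

The point of the sketch is only that the reader has to do this search when boundaries are not marked; spaces, or a kanji/kana alternation, hand over the boundaries for free.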
This is because this is how you've trained your brain to process Japanese.
Korean includes white space for this reason, and since everyone gets used to reading it with no Hanja, they don't find it awkward.
So, it's perfectly possible to ditch Kanji and add spaces and have a way more accessible language.
That doesn't solve the homophone issue though.
Usually a parenthesis with the corresponding character is used to be explicit or to avoid confusion, but it's strictly for Chinese loan words like:
ex) 시간 (時間)
Compare that to furigana (note that it's not even possible to display the phonetic hiragana here): 第二巻
The younger generation is no longer learning Chinese, so more English/European loan words are used directly, which ironically fixes the issue.
It is impossible to converse in Korean without using English/European loan words.
ex) 아이러니 (irony)
This allows new ideas/concepts to quickly disseminate in the collective Korean psyche. New words are constantly being invented, and the slang used by primary/middle schoolers is unknowable.
Abbreviation/concat combos to create totally new words:
ex) 씹상타치 (way **ing above average) also written as ㅆㅅㅌㅊ (wfaa) literally means
If you look at old Japanese video games before the hardware could do Kanji, they used spaces to separate words. But when the game was capable of Kanji, the spaces went away.
Korean is easier to pronounce, but Japanese is easier to read. Try reading a Japanese children's book that doesn't use kanji. It's excruciating.
Most HN users who are coders should be able to read and write Korean within a few hours...
Good luck with any other language. Japanese is tougher since you have to memorize thousands of what are basically hieroglyphics and learn two separate alphabets (one for native Japanese, another for foreign).
With Korean you can express many sounds phonetically (with some abstractions for non-Korean sounds).
That hasn't been my experience of both languages.
I think kanji can make the meaning clearer if the character is understood, but it doesn't help with pronunciation. Kana is the other way around. (Where a reading is difficult, it is also possible to add furigana.)
I think it is beneficial to have both in Japanese writing.
English? Do you mean the UK? US?
My perception was that China said Chinese was one language and that most westerners agreed. Is this not the case?
The difference between a language and a dialect is political, not linguistic.
"A language is a dialect with an army and navy"
https://en.wikipedia.org/wiki/A_language_is_a_dialect_with_a...
It's cute, and gets at something real, but it's overstated. The first question is "are these things mutually intelligible", with an answer on a spectrum between "yes, perfectly, obviously" and "no, not at all, obviously". There is a huge gray area that stretches pretty far from the middle in which contingency, identity, community, and nation building projects (both military and literary) move the lines around quite a bit... but as we near either pole that factual question dominates. I think. I'd be interested to learn of counterexamples.
There are German "dialects" that are completely mutually unintelligible. Take someone who speaks the standard German dialect and drop them into the Swiss Alps, and they will understand next to nothing. It will be much easier for them to learn to understand Swiss German (linguists call it "Alemannic") than it would be for someone who doesn't speak German, but it will still take time and effort to adapt. Why are they both called German?
There is Luxembourgish, which is basically the local dialect codified as one of the official languages of Luxembourg. It's otherwise perfectly understandable for people from adjacent parts of Belgium and Germany. But I guess the locals would see that it really is almost the same.
Similarly to Chinese, Germans see themselves as mostly the same culture. Standarddeutsch is pretty much a fusion between the different varieties and has evolved along with them for a long time; differently from 普通話, which is much younger and the standardized form of a northern variety of Chinese. Germans also really cling to their dialects, and Switzerland and Austria both use slightly different versions of Standarddeutsch.
The opposite example is the varieties of Serbian, Croatian, Bosnian, and Montenegrin, which are largely mutually intelligible but are considered different languages by their speakers.
Standard German is not that old. It was largely developed in the 19th Century, and it was not until the 20th Century that most people in Germany were able to speak it. Standard German is also heavily based on a regional dialect of German (in particular, central German dialects).
Standard Chinese is a product of the early-to-mid 20th Century, so about 50-100 years younger than Standard German. This just reflects the fact that German unification was in the mid-1800s, while China's modernization occurred in the early 20th Century.
People do typically consider German and Swiss German to be separate languages, I think.
It's pithy but not really true. Breton is its own language. So is Basque. These are not political questions, but based on linguistics entirely.
newbie take: there is one Chinese language
intermediate take: there are many Chinese languages
expert take: there is one Chinese language
There are also plenty of languages in China that are not Chinese or a dialect of Chinese. Tibetan and Mongolian (and their many dialects) are obviously not Chinese. Chinese written language is used as a phonetic script for some minority languages (although many are based on Uighur script, which is also used a lot; Uighur itself uses Arabic script).
It seems pretty well known that Mandarin and Cantonese are different languages. It turns out there are a whole bunch.
My analogy is that 1,2,3,4,5 is a unified script that allows anybody to understand writing.
However, saying the words 1,2,3,4,5 will depend on your local language.
The analogy to numerals is great to get a western-language speaker to grok the basic mechanism by which a symbol can be unrelated to a sound, reused across completely different languages, and even have different ‘readings’ in different contexts (2, 2nd, 12, 1/2…)
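A rough way to picture the numeral analogy is as two lookup tables: one symbol with a different spoken word per language, and one symbol family whose English reading changes with context. The sketch below is only an illustration; the table names and the helper `contexts_hiding_base_reading` are made up for this example.

```python
# The same written symbol "2", spoken differently per language.
SPOKEN_TWO = {"English": "two", "French": "deux", "German": "zwei"}

# Within English alone, the reading of the symbol shifts with context.
ENGLISH_CONTEXT_READINGS = {
    "2": "two",
    "2nd": "second",
    "12": "twelve",
    "1/2": "one half",
}


def contexts_hiding_base_reading(readings, base):
    """List the written forms whose spoken form does not contain `base`."""
    return [written for written, spoken in readings.items() if base not in spoken]


# Every written form here contains the symbol "2", yet most readings
# never contain the sound "two".
print(contexts_hiding_base_reading(ENGLISH_CONTEXT_READINGS, "two"))
```

Nothing in the written symbol tells you which table entry to use; the reader supplies that from their own language and the surrounding context, which is the mechanism the analogy is meant to convey.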
The use of 2nd is a better example of the limitations of this system. The reality of written Standard Chinese is that you'd use "2nd" to write not only "second" but also "deuxieme", instead of writing "2eme".
Right, that case is a little more useful to analogize the way Japanese uses kanji (Han characters) with local Japanese inflections (okurigana) to adapt the Chinese writing system to their local inflected language. We write the word ‘second’ using the Arabic symbol that connotes the concept of ‘two’, followed by an irregular English inflection to make it ordinal. Is ‘seco’ a ‘reading’ of the 2 symbol? Kinda sorta?
Also helps you appreciate that Japan is not completely insane for having seemingly completely unrelated number words for different contexts, even though they write them with the same numeral. Turns out, so do many western languages (although generally only for ordinal/cardinal, not for the endless range of counters Japanese has)
The fact that as a native English speaker this seems like not how you actually read or write numerals at all - no, of course 3 doesn’t ’read’ as ‘thi(r)’ - also suggests that there is a less mechanistic way to understand the relationships between hanzi and words than learners often try to apply (we want to find the ‘rules’ that must underpin these things), and that the way native speakers of Mandarin, Cantonese or Japanese think of the relationship between these symbols and the words they are writing is much more organic - and that’s okay.
I don't know much Japanese but I don't see it as that weird. I see it as something like "murder of crows" or "pile of sand". Something cultural that was there for a reason (or a monk somewhere) and now we have to memorize it.
I think the GP is referring to a different part of the Japanese numbering system: the two different numerals used with different counting words for 1-10; e.g. the fact that you say "muttsu no koto" [6つのこと] for "six things" but "rokko no retasu" [6個のレタス] for "six heads of lettuce".
This is what they are comparing to the difference between "first" and "one" in English, which are obviously two different origin words for the number 1 (unlike sixth and six, where sixth is clearly just derived from six).
Those are language-specific, the common way to write an ordinal (outside of English) is with a dot - 2. = 2nd.
I don't think any "common way" of this kind really exists. In my country, the common way to write "second" ("a doua") is either II or II-a (so using roman numerals). In French, either 2eme or 2e are the common ways. I've never even seen this "2." spelling for ordinals.
Regardless, my point was that Chinese spelling is not as universal as it is made out to be, that different Chinese dialects/languages just use the Standard Chinese spelling, even when it doesn't match their own spoken language, just like a French person using "2nd" to spell "deuxieme".
But it also shows that you don't need logograms.
We don't have a distinct symbol for 35 just because English and French pronounce it differently.
Unlike with numbers, day to day language needs to convey a larger variety of concepts hence why logograms are still needed. Much like how more advanced math requires different symbols and operators that are akin to mathematical logograms to convey additional concepts beyond the fundamental quantities of 0 to 9.
Alphabets are often linked to sound, and it would be a tremendous challenge to create an "alphabet" analogue that links to a set of fundamental concepts that you can somehow rearrange to form higher level concepts and can still be universally understood without it linking to sound.
If it is nearly impossible to switch the US from imperial to metric, I cannot imagine what it would take to unify a massive population under a single dialect. I think the answer is measured in generations.
VTubers and ~15 years. Media, in generalized form. The velocity of weebs internalizing fringe Japanese concepts is astonishing.
Not as fast as the reverse. It's surprising how much Japanese is just borrowed English written in katakana.
Eh, it’s “hard” for the U.S. to switch from customary to metric units because nobody really cares enough to do so. The forces causing language standardization (in many countries, not just China) are much more powerful.
The US will be slower than most places due to the federation and ideological individualism.
Americans use metric where it's required by law (e.g., food and drug packaging), it just takes government force. Government force can also change a population's language. See, e.g.,
https://en.wikipedia.org/wiki/Anglicisation
https://en.wikipedia.org/wiki/Russification
https://en.wikipedia.org/wiki/Sinicization
It's worse than that: the US doesn't even use Imperial units, which were only introduced after US independence.
They use their own unique set of units that are different to both Imperial and Metric.
Ask the French how they've managed to (almost) eradicate Occitan and Breton and spread the Parisian dialect as the official variant of the French language throughout the country.
Pronouncing two identical sequences of characters differently depending on dialect doesn't appear to be a problem in English.
The dialects of English (such as Scots) most certainly do not spell words the same as a general rule. The fact that there are slight pronunciation differences between different accents of standard English is not at all the same as writing entirely different words with the same sequence of characters.
For example, the word for "enemy" in Mandarin is "dírén", in Cantonese it is "dik6 jan4", in Shanghainese it is "dih nyin", and in Hakka it is "tit5 ngin11" (including different tone markers, I'm taking these translations from different sites). All of them would write it as "敵人". These words are far more different though than the difference in how an American and Englishman would pronounce "four".
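As a small illustration of that point, the four romanized readings quoted above can be put in a table keyed by variety and reading, all pointing at the same written form. This is only a sketch: the `LEXICON` structure and helper names are invented for the example, and the romanizations are copied from the comment as given (they come from different romanization schemes).

```python
# (variety, romanized reading) -> written form, using the readings quoted above.
LEXICON = {
    ("Mandarin", "dírén"): "敵人",
    ("Cantonese", "dik6 jan4"): "敵人",
    ("Shanghainese", "dih nyin"): "敵人",
    ("Hakka", "tit5 ngin11"): "敵人",
}


def distinct_readings(lexicon):
    """Collect the different spoken (romanized) forms in the lexicon."""
    return {reading for _variety, reading in lexicon}


def distinct_written_forms(lexicon):
    """Collect the different written forms in the lexicon."""
    return set(lexicon.values())


print(len(distinct_readings(LEXICON)), "spoken forms")
print(len(distinct_written_forms(LEXICON)), "written form")
```

A phonetic script would need one spelling per row of this table, whereas the character script needs only one per column.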
The influence of loan words on English give vastly different pronunciations of words even regionally. For extreme examples see "shibboleth", especially in the UK.
What I've found with some cursory googling is mostly place names, which I agree often have major differences between spelling and pronunciation. Even there, many of the examples on the Wikipedia page are still plausible spellings for the pronunciation, especially given how ambiguous English spelling is in the first place. Others seem more like nicknames that have essentially replaced the original full name of the place, while the full name is conserved in the spelling.
I'm not entirely certain that's a good explanation.
For most of history, literacy wasn't exactly common. I'm not finding any easily accessible estimates of literacy rates for early (say, Qin dynasty) China, but numbers for medieval Europe suggest something like 10-30% for relatively broad definitions of literacy, which seem to be commensurate with estimates for Qing dynasty China. Especially if you look at the period at which Chinese characters essentially ossify into their modern form, it's not clear to me that there's a wide diversity of topolects that it has to approximate, almost certainly nothing to the degree of modern Chinese.
For another thing, mediating among linguistic diversity is something that all of our other scripts have had to do. Cuneiform was used to write the administrative languages of different language families (Semitic languages like Akkadian, Indo-European like Persian, and who-knows-what-language-family-these-are like Elamite), and yet it was a syllabary. Even Chinese script itself starts devolving into a syllabary when Japanese adapts it.
The reason I think Chinese resisted becoming a syllabary was because Chinese was poorly suited for such a transition: my understanding is that words in Chinese are largely monosyllabic and involve a decently high degree of homophones. Furthermore, reconstructions of Old Chinese also suggest a relatively complex phonotactic structure, which means a syllabary that largely covers a CV-syllable scheme is hard to adapt. In other words, Chinese may have been a rare language in that conversion from a logography to a syllabary would not have dramatically reduced the amount of characters one would have had to have learned. (Note also that the reduction of a syllabary to an alphabet, abjad, or abugida happened effectively twice, with Phoenician (or some ancestor) and Korean Hangul).
I don't see how your explanation and GP's explanations are mutually exclusive.
To add some detail to this point:
A syllabary would have had to represent phonetics as well as tones, which would have multiplied the required syllabary by n number of tones. For instance, Mandarin has 4 (or arguably 5) tones. The "ah" sound has four pronunciations: ā, á, ǎ, and à. Hong Kong Cantonese has at least 6 tones, having purportedly lost a few. Different dialects of Chinese have different numbers of tones, and some have been lost or gained within the same dialect throughout history.
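The multiplication can be sketched in a few lines of Python. The base count of 400 toneless syllables below is a round number assumed purely for illustration, not a measured figure for any dialect, and `tone_variants` and `syllabary_size` are hypothetical helper names. The tone marks are built with the standard-library `unicodedata.normalize`, which composes a base letter and a combining accent into a single character.

```python
import unicodedata

# Combining marks for the four Mandarin tones as written in pinyin:
# macron, acute, caron, grave.
TONE_MARKS = ["\u0304", "\u0301", "\u030c", "\u0300"]


def tone_variants(vowel):
    """Return the four tone-marked forms of a single vowel letter."""
    return [unicodedata.normalize("NFC", vowel + mark) for mark in TONE_MARKS]


def syllabary_size(base_syllables, tones):
    """Signs needed if every toneless syllable gets one sign per tone."""
    return base_syllables * tones


ASSUMED_BASE_SYLLABLES = 400  # illustrative assumption only

print(tone_variants("a"))
print(syllabary_size(ASSUMED_BASE_SYLLABLES, 4))  # four-tone system
print(syllabary_size(ASSUMED_BASE_SYLLABLES, 6))  # six-tone system
```

An alternative design would mark tone with a separate diacritic, as pinyin does, which keeps the sign inventory at the base size plus a handful of marks rather than multiplying it.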
Via Phoenician (between Egyptian hieroglyphs and Etruscan), the English alphabet is also a bunch of pictograms with reduced pictorial content.
Here is a picture of a fish under water:
M
D
I'd love to hear other people's take on this. I heard this many times when I lived in China, however living in Taiwan - people still always use subtitles. In Taiwan there are vanishingly few people that don't speak Mandarin, so it's not inserted for people that are bad at Mandarin. You will see that both in China and Taiwan people that are fluent in Mandarin watching a Mandarin movie will never turn off the subtitles.
Talking to native-speaking friends I've pieced together that it seems Chinese is actively hard to make out (compared to English). Without the subtitles they will miss sections of dialogue in movies/tv-shows. Maybe because it's so tonal and contextual? I've asked people "Okay, but when you talk to people day-to-day, you don't have subtitles - so how are you dealing with it?" and the responses seem to boil down to "often we have to guess what the other person is saying"
I'd love to hear some thoughts from someone who is 100% bilingual and able to make the comparison.
This jibes with my experience as well. Chinese has a ton of near homophones that are distinguished by tone. One interesting result of this is that Chinese speakers seem to hate hate hate accents, even just from other parts of China. I hear it's because they're just mentally taxing to listen to.
Though I'm certainly not 100% bilingual. I like to think it isn't just that I'm annoying to listen to. I have heard other speakers get put down as sounding like 'birds chirping' which seems to be a popular way to describe accents.
I’m not 100% bilingual, but my take is that there’s a lot more to be gained from the subtitles due to homophones, wordplay, and literary allusions. It’s like getting genius.com rap annotations in real time for any metaphors or references.
It’s worth noting that the first emperor of China was the one that unified the language. The country was at war for about 250 years during the Warring States period. Among the main pushes to maintain unification were a standardized writing system throughout the country, increased commerce, and a unified monetary system.
Also somewhat disputed, but the first emperor of China killed all the scholars from every other nation they conquered to facilitate the language unification.
"unified the language". I think you mean script, rather than language, given the plurality of those extant even today.
When I was studying Mandarin Chinese at a school in Shanghai, I borrowed a book on Shanghainese. The reason why everyone in Shanghai isn't writing “everything differently” is because they are not writing in Shanghainese. They are writing in Mandarin.
I disagree on your assessment of Japanese. I would argue that Japanese is the most difficult written language in common usage / not artificial.
Moreover, one of the greatest literary achievements of Japan, “The Pillow Book”, is written entirely in hiragana. Today you have so much text that leans on the disambiguation kanji provides that you’d lose a lot of writings were everyone to unlearn kanji, but I disagree that it’s an aid, and had Japan developed its own writing system, it would have felt a lot more like hiragana than kanji.
This explanation doesn't make any sense. Your example is two words which are barely different at all in their pronunciation, certainly not sufficiently to cause unintelligibility (i.e. [wirk] vs [wak]). Differences in pronunciation of this kind are everywhere in English.
As far as I understand this, this is quite an oversimplification. The differences between different dialects of Chinese are huge, especially in terms of vocabulary. The writing system isn't as purely logographic as it is often touted. There are only ~4000 characters in common use (university level literacy), but many more common words. So, lots of words are written with multiple characters. In Standard Chinese (corresponding mostly to the dialect of Beijing), each of the characters in a word represents a syllable in that word. This correspondence doesn't hold for other dialects.
Overall, people speaking dialects of Chinese other than the standard essentially write in a different language than they speak, unless they also adopt a different variety of written Chinese and lose any mutual intelligibility (a lot of such varieties exist, though few are standardized). It is in some ways like writing English words with the Latin spelling of their etymology, say writing the English phrase "Jules appreciates art" and the French phrase "Jules apprecie l'art" both as "Iulius appretio ars".
You don't need logograms for that. You don't need a phonetic alphabet if all you want is a visual alphabet.
You could write a word / morpheme as a string of standard strokes / radical characters, instead of as overlapping standard strokes.
This is a great explanation of important points many people fail to recognize. Thank you!
I'd like to point out this isn't unique to CJK (Chinese/Japanese/Korean). Languages descending from or based on Latin can be understood, at least very generally, by each other because the equivalent words in each language usually have similar spellings or appearances.