freefont-bugs

[Freefont-bugs] Discussion and questions on Unicode Han Unification


From: Ange Gapes
Subject: [Freefont-bugs] Discussion and questions on Unicode Han Unification
Date: Wed, 26 Jan 2011 15:52:00 +0900

Hello,

Sorry, this is not directly about bugs in FreeFont, nor about development matters, but I could not find a more general mailing list for your project. I think this kind of discussion is still of interest; hopefully you will agree.

I recently became interested in the Han unification project and the problems it implies for texts that mix languages. As you are a font project, I guess you know the issues, but for those who don't, I would summarize them this way: for the three main languages that use characters of Chinese origin (Han characters), namely Chinese, Japanese, and Korean (hence CJK, though Korean does not use them much in modern writing), the Unicode project decided to unify characters that share the same origin (Han unification, or Unihan). This causes problems when the actual written form differs from country to country, sometimes slightly (style), sometimes more obviously. The Wikipedia page has good examples of the issue: http://en.wikipedia.org/wiki/Unihan#Examples_of_language_dependent_characters (the differences are visible only if the right fonts are installed on the computer displaying the page).

This is dealt with in one of two ways:
- If you use only one of these languages, you don't care: you simply pick fonts that draw the characters in your chosen language's style.
- If you read texts in several languages, or even languages mixed within the same text, the text can carry some kind of markup so that different fonts are selected. This is how it is done in HTML, which is why you can see different fonts for the very same Unicode character on the Wikipedia page I linked above.

But what happens when you read a raw text file without markup, for instance? The editor has no sure way to tell the language, so characters from mixed languages won't each be shown in their own language's form.
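To illustrate the raw-text case, here is a minimal Python sketch. The helper name `describe` is made up for this example, and U+76F4 is picked only because it is often cited as a unified character drawn differently in Chinese and Japanese typography. The point is that the stored text holds just one code point and its bytes, with nothing that records the language.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} {ch.encode('utf-8').hex()}"


# One unified Han character, written as an escape to keep this file ASCII-only.
han = "\u76f4"

# The same line is printed whether the surrounding text is Chinese or Japanese:
# a plain-text file stores only these three bytes, with no language attached.
print(describe(han))  # U+76F4 CJK UNIFIED IDEOGRAPH-76F4 e79bb4
```

Any language-specific rendering therefore has to come from outside the character data: markup, editor settings, or the user's font choice.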

So why am I telling you all this? I would like to know your opinion, if not your position, on this Unicode decision. Do you have any remarks on it?
What does it mean for a project like yours? Is it possible, within a single font family, to provide several different designs for the same character together with "context" information (this design is preferred for Chinese display only, unless there is no other choice, this one for Japanese, and so on), plus perhaps a default one (if no context is available, use this "generic" design)? That way, software that uses only your font could still display different designs depending on the language being displayed (if it knows it), or a default version otherwise.
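To make the question concrete, here is a rough Python sketch of the selection behaviour I have in mind. Everything in it is hypothetical: the `DESIGNS` table, the glyph names such as `uni76F4.ja`, and the function `select_design` are invented for illustration and do not describe how FreeFont or any font format actually stores this information.

```python
from typing import Optional

# Hypothetical table: code point -> {language tag -> glyph design name}.
# The "default" entry is the generic design used when no context is known.
DESIGNS = {
    0x76F4: {
        "ja": "uni76F4.ja",
        "zh": "uni76F4.zh",
        "ko": "uni76F4.ko",
        "default": "uni76F4",
    },
}


def select_design(ch: str, lang: Optional[str] = None) -> str:
    """Pick a design for `ch`, preferring the one tagged for `lang`."""
    variants = DESIGNS.get(ord(ch), {})
    # Characters without an entry fall back to a plain name built from the code point.
    fallback = variants.get("default", f"uni{ord(ch):04X}")
    if lang is None:
        return fallback
    # Only the primary subtag is used, so "zh-TW" is looked up as "zh".
    primary = lang.split("-")[0].lower()
    return variants.get(primary, fallback)


print(select_design("\u76f4", "ja"))  # uni76F4.ja
print(select_design("\u76f4"))        # uni76F4
```

The software would pass the language when it knows it (from markup or a user setting) and omit it otherwise, which gives exactly the "default version" fallback described above.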

On a side note, I read somewhere that similar problems may arise for other kinds of characters. In particular, I read on a website about Arabic characters that are used in several countries and languages but displayed slightly differently. After some searching, however, I could not find concrete information on this specific issue, so I don't know whether it is true, or whether the Unicode project has since fixed it by assigning specific characters, or control characters that change the display. (Arabic does not have nearly as many characters as the East Asian languages, so space is less of an issue when duplicating characters.)
Do you know about such an issue with Arabic characters? Or about similar issues with glyphs in other alphabets?
Do you take part in Unicode standardization? Do you have details on what led to this decision internally? Is it really ONLY a problem of space? Even though these countries certainly have a lot of characters, it looks to me as if plenty of slots are still unassigned, far more than enough (as far as I know, that is how Unicode was designed after all: with enough slots for all of history). So I don't see the point of holding them back for no reason (it's not as if new alphabets with hundreds of thousands of brand-new characters will suddenly be created in the next century).
And in the worst case, Unicode could still be extended.
So if you have any particularly interesting links to discussions within the Unicode project (mailing lists, maybe?) about how we came to this, that would be of interest as well.
I will also ask the Unicode people directly later, but I first wanted to know the opinion of a font project whose goal is to span all of Unicode. What does that imply for you?
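As a rough check on the "free slots" impression above, the following Python sketch counts the code points that are still unassigned. It assumes the Unicode database bundled with the Python interpreter, so the result depends on the Python version, and the function names are chosen just for this example. The only fixed figure is the size of the code space: 17 planes of 65,536 code points, or 1,114,112 in total.

```python
import sys
import unicodedata

PLANES = 17          # planes 0 through 16
PLANE_SIZE = 0x10000  # 65,536 code points per plane


def code_space_size() -> int:
    """Total number of code points in the Unicode code space."""
    return PLANES * PLANE_SIZE


def count_unassigned() -> int:
    """Count code points with general category Cn in this Python's database.

    Cn covers unassigned code points and also the 66 permanent noncharacters.
    """
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    )


total = code_space_size()
unassigned = count_unassigned()
print(f"code space: {total} code points")
print(f"unassigned (Unicode {unicodedata.unidata_version}): {unassigned}")
```

The count only shows how much room is numerically free; it says nothing about how that room is reserved or planned by the standard.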

And, on a second level, why am I asking all this? First of all, simply because I am interested in Unicode and in such questions, for personal use but also out of pure intellectual interest (among other reasons, I am myself involved in standardization processes, though not directly in Unicode, for now at least). Also because I find this rather sad: when I read about it, I did not much agree with such a move. The prime goal of Unicode was to support every existing character, so this looks like a step backwards. And we know that some countries, Japan at least as far as I know, are not very keen on standardization, so they do not use Unicode encodings such as UTF-8 much but rather localized encodings, and this kind of move will not make them want to change that.
Finally, I am beginning to write what may become a book at some point in the future; it is not about this in particular, but this kind of topic may be part of it.
So thanks to all. Any opinion or information on the topic would be greatly appreciated.

Ange

P.S.: For my personal use, one last question: do you plan to support these East Asian characters in the foreseeable future? In particular Japanese hiragana, katakana, and kanji, and the basic Korean alphabet?
