The Absolute Minimum Every Software Developer Must Know About Unicode

Krivitsky has been trying [unsuccessfully] for many years to enlighten all of
us on the subject of encodings.
But he isn't getting anywhere.

I, meanwhile, stumbled upon a short article that sheds light on this murky subject:
The Absolute Minimum Every Software Developer Absolutely, Positively Must
Know About Unicode and Character Sets:
http://www.joelonsoftware.com/... 

You should read the whole thing, of course, but here are a few quotes:
===
...
I discovered that the popular web development tool PHP has almost complete
ignorance of character encoding issues, blithely using 8 bits for
characters, making it darn near impossible to develop good international web
applications
...
In many cases, such as Russian, there were lots of different ideas of what
to do with the upper-128 characters, so you couldn't even reliably
interchange Russian documents.
...
Some people are under the misconception that Unicode is simply a 16-bit code
where each character takes 16 bits and therefore there are 65,536 possible
characters. This is not, actually, correct.
...
It does not make sense to have a string without knowing what encoding it
uses.
...
If there's no equivalent for the Unicode code point you're trying to
represent in the encoding you're trying to represent it in, you usually get
a little question mark: ?
===
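
To make a few of these quotes concrete, here is a small Python sketch of my
own (it is not from the article). The sample word, the variable names and the
choice of the three legacy encodings are just assumptions for illustration.
It tries to show three things: the same Russian word becomes different bytes
in different 8-bit encodings, Unicode has code points that do not fit in 16
bits, and a character with no equivalent in the target encoding turns into a
question mark.

```python
# Illustrative sketch only: the sample word and encodings are assumptions.
word = "Привет"

# 1. "Lots of different ideas of what to do with the upper-128 characters":
#    three legacy 8-bit Russian encodings give three different byte strings.
legacy = {name: word.encode(name) for name in ("cp1251", "koi8_r", "cp866")}

#    Bytes read with the wrong encoding decode without any error,
#    but the result is garbage. This is why a string is meaningless
#    unless you know which encoding it uses.
garbled = legacy["cp1251"].decode("koi8_r")

# 2. Unicode is not a 16-bit code: this code point lies above U+FFFF.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
clef_code_point = ord(clef)
utf8_len = len(clef.encode("utf-8"))       # number of bytes in UTF-8
utf16_len = len(clef.encode("utf-16-le"))  # number of bytes in UTF-16

# 3. No equivalent in the target encoding: with errors="replace",
#    each unencodable character becomes a question mark.
replaced = word.encode("ascii", errors="replace")

if __name__ == "__main__":
    for name, data in legacy.items():
        print(name, data.hex(" "))
    print("cp1251 bytes read as koi8_r:", garbled)
    print("code point:", hex(clef_code_point),
          "utf-8 bytes:", utf8_len, "utf-16 bytes:", utf16_len)
    print(replaced)
```

Note that errors="replace" has to be requested explicitly here. By default
Python raises UnicodeEncodeError instead of silently writing question marks,
which is probably the safer behaviour.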