
Back to the future [2]

As I wrote in the previous installment, not everyone is happy with the road that ISO/Unicode has taken. The Unicode discussion lists carry a lengthy debate about whether working 'multilingually' is necessary at all.

It started with a perfectly reasonable question from Joe Ross:

We're looking for a multilingual search engine to use with our internal and external web sites. The criteria I have in mind are those in Thierry Sourbier's paper on "Keys to Building a Multilingual Search Engine":

  • all text processing as normalized Unicode or UTF-8
  • language-specific text parsing, in particular word breaking
  • storage of indices as Unicode or UTF-8
  • ....

Does anyone know of an available search engine with these capabilities?

In short, when I use a search engine I should also be able to type 'Wałęsa'.
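To make the first and third criteria concrete, here is a minimal sketch in Python of what "all text processing as normalized Unicode" can mean for a search index. It is an illustration only, not the design from Thierry Sourbier's paper: the function names `normalize_term`, `build_index` and `search` are my own, the word breaking is a naive split on whitespace, and the choice of NFC plus case folding is an assumption.

```python
import unicodedata


def normalize_term(text: str) -> str:
    """Return a comparison key: NFC-normalized, then case-folded."""
    return unicodedata.normalize("NFC", text).casefold()


def build_index(documents: dict) -> dict:
    """Map each normalized word to the ids of the documents containing it."""
    index = {}
    for doc_id, body in documents.items():
        # Naive word breaking: real engines need language-specific rules.
        for word in body.split():
            index.setdefault(normalize_term(word), set()).add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Look up a query after applying the same normalization as the index."""
    return index.get(normalize_term(query), set())


# Document "a" stores the name precomposed; document "b" stores the
# e-ogonek as a plain "e" followed by U+0328 COMBINING OGONEK.
documents = {
    "a": "Lech Wa\u0142\u0119sa",
    "b": "lech wa\u0142e\u0328sa",
}
index = build_index(documents)
print(sorted(search(index, "Wa\u0142\u0119sa")))
```

Because both the index and the query pass through the same normalization, a name such as Wałęsa is found no matter which of the two spellings a document happens to use.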

The second problem (once you have a font with that many characters, how do you use them smoothly?) was raised by a Swede, Frank Luin:

I have just succeeded in entering my first Unicode characters on a Mac. It was done in inDesign by Adobe and I used the "Insert character" function. It can be done, but it isn't done easily.

I know, of course, that every character I write just now is a Unicode character, too. My problem is to be able to input, e.g., a g+breve from the Turkish alphabet, the special Maltese characters, a c+circumflex used in Esperanto, and so on, without the need to select a different keyboard layout.

Some characters with diacritic can be input pressing the diacritic+character, at least in a Macintosh. I believe that a generic layout allowing that any such combination be possible would be of great help and ought to be included automatically with every operating system. Maybe it is done already. One of the layouts installed with Mac OS 9 is labelled Latin Extended (U). I cannot see what it means since it is constantly disabled (=grey) and I have not seen any information about it.

Another solution would be the possibility to customize keyboard layouts. An example: I am used to working with a Swedish keyboard, but need to write an article about Czech films. Since I am a perfectionist I want to use the exact spelling of the Czech names. If I could position the Czech characters on the keyboard layout where I can easily reach them (more or less in the way keyboard layouts can be customized with ResEdit), my work would be much easier. I have seen some time ago a similar program for Windows (called "Compose", if I remember correctly), so I know it can be done. With the richness that Unicode offers, the possibility for such ad hoc layouts ought to be a natural addition to any Unicode-savvy operating system.

The keyboard layout problem is no stranger to the Atari either, and there too it has been tackled with programs such as 'Compose', though always on the basis of an extended ASCII character set [0-255].
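The generic "diacritic + character" layout that Frank Luin asks for can be sketched in a few lines of Python. This is only an illustration of the idea, not how Mac OS or any 'Compose' program actually works: the table `DEAD_KEYS` and the function `compose` are hypothetical names of my own. The sketch appends a Unicode combining mark to the base letter and lets NFC normalization pick the precomposed character where one exists.

```python
import unicodedata

# Hypothetical dead-key table: key name -> Unicode combining mark.
DEAD_KEYS = {
    "breve": "\u0306",       # COMBINING BREVE
    "circumflex": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "caron": "\u030C",       # COMBINING CARON
    "acute": "\u0301",       # COMBINING ACUTE ACCENT
}


def compose(dead_key: str, base: str) -> str:
    """Combine a dead key with a base letter.

    NFC yields a single precomposed code point where Unicode defines one;
    otherwise the result stays a base letter plus combining mark.
    """
    return unicodedata.normalize("NFC", base + DEAD_KEYS[dead_key])


print(compose("breve", "g"))       # Turkish g with breve
print(compose("circumflex", "c"))  # Esperanto c with circumflex
print(compose("caron", "r"))       # Czech r with caron
```

Unlike a table limited to [0-255], this works for any base letter and mark. It does not cover every case, though: the Maltese h with stroke (U+0127) has no canonical decomposition, so a layout would still need a dedicated key or table entry for it.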

A few replies later, Frank Luin continues with a lament about the Apple Macintosh:

> what kind of Unicode characters did you succeed with - I guess only
> characters covered by an installed Language Kit, didn't you? Which
> Unicode characters is it possible to enter with Adobe inDesign?
>
> I am searching an ATSUI compatible application - to enter/edit ALL
> Unicode characters, even those not covered by Apple Language Kits!
> Since Mac OS 8.5 I was told this should work even with PC-fonts!!!

Capitals, Charcoal, Gadget, Palatino, Sand, Techno, Textile - as you see fonts installed by the ordinary Mac OS installation. There may be more. There is an odd possibility to insert Unicode characters in inDesign - in this case I am talking only of the Latin alphabet - by the "Insert character" menu item. It works, but it isn't user-friendly at all.

As for your search for an ATSUI compatible application, I feel the same frustration as you at not being able to get any adequate information. There are lots of technical documents on the Apple site, but not a single one telling how an ordinary user can make use of Unicode, e.g. what applications to use or what routines.

I am afraid that a contribution found on another list, telling that Unicode will be usable on the Mac only with Mac OS X, may be true. InDesign seems to be an anomaly.


You have probably seen John Hudson's explanation about inDesign. Yes, in fact, I have been able to input characters in the additional slots and print them as well, after installing AdobePS 8.6, the printer driver which can handle those characters (it comes with inDesign).

> With "Mac OS 9" you can optional install all "Apple Language Kits".
> But to enter characters outside these Kits you need - beside fonts to
> display - an ATSUI compatible application. But not one of which exists,
> I was told at Apple! They keep working on it - a simple text editor!
> To me this sounds as dead-born as QD-GX was years before! Shit!

That's as far as I have come in my explorations, too. A technician at Apple Sweden told me that Unicode in Mac OS is, for now, only developers' stuff. I hope somebody on this list has more relevant information to share.

> Beginning with "Mac OS 8.5" ATSUI compatible applications could use
> PC-TrueType fonts - as are, not to be converted anymore - displaying
> Unicode ranges not covered by Language Kits, too!
>
> Did you try PC-Cyberbit or other Unicode font with Adobe inDesign.
> I would be very interested in knowing about this.

No, I haven't tried that possibility. I downloaded long ago the PC-Cyberbit font and have it somewhere. But even Verdana for PC is a Unicode font, if I am right. Being much smaller it is easier to work with it, I suppose. I'll try it and let you know.

ATSUI is Apple's miracle cure for multilingual support. But which program uses it, and who can tell me how it works? As I said before, I dread the day my Ataris give out. Which program am I supposed to use for multilingual work? Not one on a Macintosh for the time being! Or perhaps after all, if I am to believe Deborah??

Our Unicode text editor is actually a system service built into Mac OS 9. The editing program we are working on is just a demonstration of that editing engine. Any application can use it and it's quite straightforward to use. It is our replacement for TextEdit, and it dispenses with most of TextEdit's limitations. Apple's Java implementation, MRJ, uses a private version of this MLTE (Multi-Lingual Text Editor), and will move towards using the built-in version in the future.

The only publicly shipping pieces of software that I am currently aware of that use ATSUI are MLTE and Kotoeri (Apple's Japanese input method). Kotoeri uses ATSUI to draw the characters in its palettes. It supports full Unicode, although non-Unicode applications cannot receive characters outside the range of Mac OS character sets. Kotoeri will not function without ATSUI.

I feel it's important to emphasize that unlike GX, ATSUI is built into the system, cannot be removed, is currently being used by Apple system software (like Kotoeri), and is slated to be used by more system components as time goes on. It is not going away, and it is not an optional install. HFS+, our file system, returns file names as canonically decomposed Unicode. When the Finder gets around to supporting these Unicode file names, it will be using ATSUI and MLTE to do so.

I know there are developers working on third party applications that use ATSUI and MLTE. I am hoping that as Apple demonstrates our commitment to the technology, the specter of GX will no longer haunt us, and more developers will feel confident in using it.

Deborah Goldsmith
Manager, International Toolbox Group
Apple Computer, Inc.
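
Deborah Goldsmith's remark that HFS+ returns file names as canonically decomposed Unicode has a practical consequence: a name typed with precomposed characters no longer matches the stored name code point for code point. The Python sketch below illustrates this under a stated assumption: it uses the standard NFD form from `unicodedata` as a stand-in, whereas HFS+ applies its own decomposition tables, which differ from standard NFD in some details. The helper names `decomposed` and `same_file_name` are hypothetical.

```python
import unicodedata


def decomposed(name: str) -> str:
    """Approximate a canonically decomposed file name with standard NFD.

    Assumption: standard NFD stands in for the file system's own tables.
    """
    return unicodedata.normalize("NFD", name)


def same_file_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the decomposed form."""
    return decomposed(a) == decomposed(b)


typed = "caf\u00e9.txt"     # e-acute typed as one precomposed code point
stored = decomposed(typed)  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(len(typed), len(stored))
print(typed == stored)
print(same_file_name(typed, stored))
```

An application that compares file names should therefore normalize both sides first, rather than compare the raw strings.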

On the question of whether there is any interest in multilingual documents and in programs that can handle them, the 'market leader' fortunately responds very positively:

That's not to say that I (or rather, Microsoft) don't want to create multilingual-capable software. Obviously we do, as evidenced by the work we did in Office 2000. We do some opportunistic work to make multilingual authoring easy but the multilingual aspect is mainly a side effect of globalizing the product, and not a primary goal or driving force, as I discussed in my last presentation at the Unicode conference.


And, just to reiterate, anyone who knows me knows that I (and Microsoft) are all in favour of supporting multilingual usage in Office (and Windows, and other applications), and have a track record to prove it. I'm just trying to point out some realities.

Chris Pratley
Lead Program Manager
Microsoft Office

Xerox then comments on that as follows:

What Chris says matches the results of market studies we (Xerox) did on multilingual systems. It is globalization and connectivity that create the value of one-world architecture; multilingual documents are a pleasant bonus for those of us who need or enjoy them.

Joe Becker

In short, because we work all over the world and are connected to one another, we simply need it!

Copyright © Rein Bakhuizen van den Brink
Last updated on 21 November 2001
