Re: Sinhala GNU/Linux

From: harshula <email-not-shown>
Date: Sun Dec 05 2004 - 02:21:51 LKT
To: Donald Gaminitillake
Cc: Anuradha Ratnaweera <email-not-shown>, Delan Silva <>

On Sat, 2004-12-04 at 09:23 +0600, Donald Gaminitillake wrote:

> I know Mr Harshula understand the problem.

Hi Donald,

You seem to be quite certain that you have not misunderstood the issue. On the other hand, I feel that you have. I'm more than happy to have a discussion if, and only if, we restrict it to the Unicode code chart.

I assert that the Unicode code ranges:

0D80-0DFF (hereafter referred to as Sets A1, A2 and A3)

200C-200D (hereafter referred to as Set B)

allow us to encode the Sinhala letters. If you disagree, please attach an image of the word that you claim cannot be encoded, and I will attempt to give you the Unicode code sequence. We'll discuss only one word at a time.
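As a rough sketch of what a "Unicode code sequence" looks like in practice, the Python below lists the code points of one word and checks that each falls inside the two ranges named above. The function names and the sample word (a conjunct spelling of "sri") are assumptions chosen for illustration; they are not taken from this thread.

```python
# Ranges named in the message: the Sinhala block and the two joiner characters.
SINHALA_BLOCK = range(0x0D80, 0x0E00)  # U+0D80..U+0DFF
JOINERS = range(0x200C, 0x200E)        # U+200C (ZWNJ) and U+200D (ZWJ)


def code_sequence(word):
    """Return the word as a list of 'U+XXXX' strings, one per code point."""
    return [f"U+{ord(ch):04X}" for ch in word]


def within_claimed_ranges(word):
    """True if every code point is in the Sinhala block or is ZWNJ/ZWJ."""
    return all(ord(ch) in SINHALA_BLOCK or ord(ch) in JOINERS for ch in word)


# Illustrative word only (an assumption, not a word raised in the discussion):
# consonant + al-lakuna + ZWJ + consonant + vowel sign.
SRI = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"

if __name__ == "__main__":
    print(" ".join(code_sequence(SRI)))
    print(within_claimed_ranges(SRI))
```

Any word put forward as a counter-example could be run through the same check once its code sequence has been written down.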

First and foremost, you are getting bogged down by conceptualising letter encoding as a simple matrix. Perhaps this was the case a few decades ago, but it is no longer the case.

For extensibility purposes, the Unicode encoding ends up involving more than one simple matrix. In our case it is roughly the union of:

This union produces a set of encodings containing all the basic elements (letters). Ironically, this union not only contains your 1660 letters, it also includes baendi akuru. You should also note that the basic elements are not encoded by a fixed number of bits.
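To illustrate the point that basic elements do not occupy a fixed number of bits, here is a minimal Python sketch that counts code points and UTF-8 bits for three written elements. The three samples and the helper name `describe` are assumptions picked for illustration, and the third sample reflects my understanding of how a joined form is written with al-lakuna and ZWJ.

```python
def describe(element):
    """Return (code point count, UTF-8 size in bits) for one written element."""
    return len(element), 8 * len(element.encode("utf-8"))


# Hypothetical samples, chosen only to show that the sizes differ.
EXAMPLES = {
    "bare consonant": "\u0D9A",
    "consonant + vowel sign": "\u0D9A\u0DD2",
    "joined consonants (al-lakuna + ZWJ)": "\u0D9A\u0DCA\u200D\u0DC2",
}

if __name__ == "__main__":
    for label, element in EXAMPLES.items():
        points, bits = describe(element)
        print(f"{label}: {points} code point(s), {bits} bits in UTF-8")
```

Each of these is a single unit to the reader, yet the encoded length differs from one to the next, which is why a fixed-width matrix is the wrong mental model.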

For the reasons stated above, it is incorrect to claim that the Unicode Sinhala encoding is incomplete.

You may also be interested in the UTF-8 encoding format, which allocates a set of characters to be encoded in 8 bits, a larger set of characters to be encoded in 16 bits and an even larger set of characters to be encoded in 24 bits. Please note the use of dead high bits; I suspect it will become relevant later in the discussion:
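A small Python sketch can make the three sizes visible by printing the UTF-8 bytes of one character from each group in binary. The helper name `utf8_bits` and the three sample characters are assumptions for illustration. The fixed leading bits of each byte (0, 110, 1110 and 10) are what I take "dead high bits" to mean here: they mark the length and position of a byte rather than carrying character data.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of one character as a list of 8-bit strings."""
    return [f"{byte:08b}" for byte in ch.encode("utf-8")]


# One sample per size: Latin 'A', Latin 'e' with acute, Sinhala U+0D9A.
SAMPLES = ["A", "\u00E9", "\u0D9A"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", " ".join(utf8_bits(ch)))
    # U+0041 01000001
    # U+00E9 11000011 10101001
    # U+0D9A 11100000 10110110 10011010
```

Every code point in the Sinhala block 0D80-0DFF falls in the three-byte group, so each takes 24 bits in UTF-8.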

Harshula

Received on Sun Dec 05 07:21:51 2004
