Discussion with Donald Gaminitillake: Re: Sinhala GNU/Linux

From: harshula <email-not-shown>
Date: Sun Dec 05 2004 - 14:48:57 LKT
To: Donald Gaminitillake
Cc: Anuradha Ratnaweera <email-not-shown>, Delan Silva <lakfoil@slt.lk>

Hi Donald,

You seem to be confused. You need to read my email again.

I explicitly stated that:

I assert that the Unicode code range:

0d80-0dff (hereby referred to as Set A1, A2 and A3)

and

200C-200D (hereby referred to as Set B)

allows us to encode the Sinhala letters.

...

In our case it is roughly the union of:

A1
A2
C (= Cartesian product of A2 and A3)
D (= a subset of the Cartesian product of B, A3, A2 and C2 (C2 = Union of C and A2))

This union produces a set of encodings containing all the basic elements (letters).

So once you understand and acknowledge the existence of Sets C and D, we can continue our discussion. If you do not understand the mathematical terms, please tell me, and I'll find an explanation for you.

What you wrote in your email does not even remotely correspond to what I wrote. I have taken the liberty of attaching my email at the bottom of this one, as you seem to have replied to Anuradha's email and not mine.

Regards,
Harshula

On Sun, 2004-12-05 at 09:38 +0600, Donald Gaminitillake wrote:
> Dear Harshula
>
> Thanks for the reply.
>
> 0D82..0D83 ; Other_Alphabetic # Mc [2] SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARGAYA
> 0DCF..0DD1 ; Other_Alphabetic # Mc [3] SINHALA VOWEL SIGN AELA-PILLA..SINHALA VOWEL SIGN DIGA AEDA-PILLA
> 0DD2..0DD4 ; Other_Alphabetic # Mn [3] SINHALA VOWEL SIGN KETTI IS-PILLA..SINHALA VOWEL SIGN KETTI PAA-PILLA
> 0DD6 ; Other_Alphabetic # Mn SINHALA VOWEL SIGN DIGA PAA-PILLA
> 0DD8..0DDF ; Other_Alphabetic # Mc [8] SINHALA VOWEL SIGN GAETTA-PILLA..SINHALA VOWEL SIGN GAYANUKITTA
> 0DF2..0DF3 ; Other_Alphabetic # Mc [2] SINHALA VOWEL SIGN DIGA GAETTA-PILLA..SINHALA VOWEL SIGN DIGA GAYANUKITTA
>
> 0d80-0dff (hereby referred to as Set A1, A2 and A3)
>
>
> This is given in SLS 1134.
>
> Sets A1, A2 and A3 do not contain the complete Sinhala characters.
>
> According to you, the rest of the characters are listed in Set B, which is 200C-200D.
>
>
> I am sending you the image of Set A, which is SLS 1134. If it is correct, please send me a similar image of Set B.
>
>
> Best
>
> Donald
>

Message --------
From: harshula <email-not-shown>
To: Donald Gaminitillake
Cc: Anuradha Ratnaweera <email-not-shown>, Delan Silva <lakfoil@slt.lk>
Subject: Re: Sinhala GNU/Linux
Date: Sun, 05 Dec 2004 07:21:51 +1100

On Sat, 2004-12-04 at 09:23 +0600, Donald Gaminitillake wrote:

> I know Mr Harshula understands the problem.

Hi Donald,

You seem to be quite certain that you have not misunderstood the issue. On the other hand, I feel that you have. I'm more than happy to have a discussion if, and only if, we restrict it to the Unicode code chart.

I assert that the Unicode code range:

0d80-0dff (hereby referred to as Set A1, A2 and A3)

and

200C-200D (hereby referred to as Set B)

allows us to encode the Sinhala letters. If you disagree, please attach an image of the word that you claim cannot be encoded, and I will attempt to give you the Unicode code sequence. We'll discuss only one word at a time.
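As a rough illustration of what giving "the Unicode code sequence" for a word could look like, here is a minimal Python sketch. The sample word (the "Sri" of "Sri Lanka") and the helper names `code_sequence` and `within_claimed_ranges` are assumptions chosen for this example; they do not come from the thread.

```python
# Illustrative sketch only: the helper names and the sample word are assumptions.

def code_sequence(word):
    """List the code points of a word in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in word]

def within_claimed_ranges(word):
    """True if every code point lies in 0D80-0DFF (Set A) or 200C-200D (Set B)."""
    return all(
        0x0D80 <= ord(ch) <= 0x0DFF or 0x200C <= ord(ch) <= 0x200D
        for ch in word
    )

# "Sri": SHA + AL-LAKUNA + ZERO WIDTH JOINER + RA + DIGA IS-PILLA
sri = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"

print(code_sequence(sri))
print(within_claimed_ranges(sri))
```

The word is rendered as a single conjunct, yet its encoding uses only code points from the two ranges named above.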

First and foremost, you are getting bogged down by conceptualising letter encoding as a simple matrix. Perhaps this was the case a few decades ago, but it is no longer the case.

For extensibility purposes, the Unicode encoding ends up involving more than one simple matrix. In our case it is roughly the union of:

A1
A2
C (= Cartesian product of A2 and A3)
D (= a subset of the Cartesian product of B, A3, A2 and C2 (C2 = Union of C and A2))

This union produces a set of encodings containing all the basic elements (letters). Ironically, this union not only contains your 1660 letters, it also includes baendi akuru. You should also note that the basic elements are not encoded by a fixed number of bits.
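The union described above can be sketched in Python. The email does not list the members of A1, A2 and A3, so the reading used here (A1 = independent vowels, A2 = consonants, A3 = dependent signs including al-lakuna) is an assumption, as are the helper names `cps`, `in_D` and `is_letter`. The membership test for D is only one plausible subset (consonant, al-lakuna, joiner, then an element of C2), not a definitive rule.

```python
# Hypothetical reading of the sets named in the email; the code point ranges
# assigned to A1, A2 and A3 below are an assumption, not taken from the thread.

def cps(*ranges):
    """Return the set of characters covered by inclusive (first, last) ranges."""
    return {chr(cp) for first, last in ranges for cp in range(first, last + 1)}

A1 = cps((0x0D85, 0x0D96))                                  # independent vowels
A2 = cps((0x0D9A, 0x0DB1), (0x0DB3, 0x0DBB),
         (0x0DBD, 0x0DBD), (0x0DC0, 0x0DC6))                # consonants
A3 = cps((0x0D82, 0x0D83), (0x0DCA, 0x0DCA), (0x0DCF, 0x0DD4),
         (0x0DD6, 0x0DD6), (0x0DD8, 0x0DDF), (0x0DF2, 0x0DF3))  # dependent signs
B = cps((0x200C, 0x200D))                                   # ZWNJ and ZWJ

C = {c + s for c in A2 for s in A3}   # Cartesian product of A2 and A3
C2 = C | A2                           # union of C and A2
AL_LAKUNA = "\u0DCA"

def in_D(seq):
    """One assumed subset of D: consonant + al-lakuna + joiner + element of C2."""
    return (len(seq) >= 4 and seq[0] in A2 and seq[1] == AL_LAKUNA
            and seq[2] in B and seq[3:] in C2)

def is_letter(seq):
    """Membership in the union A1, A2, C, D."""
    return seq in A1 or seq in A2 or seq in C or in_D(seq)

# Elements of the union have different lengths in code points.
for seq in ["\u0D85", "\u0D9A", "\u0D9A\u0DD2", "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"]:
    print(len(seq), is_letter(seq), " ".join(f"U+{ord(ch):04X}" for ch in seq))
```

Under this reading, a basic element may be one, two, or more code points long, which is why the elements are not encoded by a fixed number of bits.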

For the reasons stated above, it is incorrect to claim that the Unicode Sinhala encoding is incomplete.

You may also be interested in the UTF-8 encoding format, which encodes one set of characters in 8 bits, a larger set in 16 bits, an even larger set in 24 bits, and the remainder in 32 bits. Please note the use of dead high bits; I suspect it will become relevant later in the discussion: http://en.wikipedia.org/wiki/UTF-8
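To make the variable width concrete, the following short Python sketch prints the UTF-8 bytes of a few characters in binary. The sample characters and the helper name `utf8_bits` are choices made for this example. The fixed leading bits of each byte (0, 110, 1110 for lead bytes and 10 for continuation bytes) carry no character data, which appears to be what "dead high bits" refers to.

```python
# Illustrative sketch: show how many bytes UTF-8 uses per character.

def utf8_bits(ch):
    """Return the UTF-8 encoding of one character as space-separated binary bytes."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))

samples = [
    "A",        # U+0041 LATIN CAPITAL LETTER A
    "\u00E9",   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    "\u0D9A",   # U+0D9A SINHALA LETTER ALPAPRAANA KAYANNA
    "\u200D",   # U+200D ZERO WIDTH JOINER
]

for ch in samples:
    size = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {size} byte(s): {utf8_bits(ch)}")
```

Every code point in the Sinhala block 0D80-0DFF, and both joiners in 200C-200D, fall in the range that UTF-8 encodes in three bytes.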

Regards,
Harshula

Received on Sun Dec 05 19:48:57 2004

This archive was generated by hypermail 2.1.8 : Wed Dec 08 2004 - 17:56:45 LKT