Message-ID: <47F2D590.2040300@zytor.com>
Date: Tue, 01 Apr 2008 17:38:40 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: David Newall <davidn@...idnewall.com>
CC: Jan Engelhardt <jengelh@...putergmbh.de>,
"John T." <j.thomast@...oo.com>, linux-kernel@...r.kernel.org
Subject: Re: UTF-8 and Alt key in the console
David Newall wrote:
> Jan Engelhardt wrote:
>> Hence the proposal of using definite start and end markers:
>>
>> echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat
>
> I see no merit in the idea. Most seriously, there isn't any real-world
> problem being solved. In addition, it proposes creating yet another
> type of terminal emulation. If there's something you don't like about
> VT escape codes, use a different emulation. For example, Televideo
> terminals used almost exclusively single-character control codes,
> reducing the scope of being mid-sequence to, well much closer to zero.
>
> You need to make quite clear that your proposal is to discontinue use of
> VT terminal emulation.
Okay, let's put this to rest once and for all:
*** ISO 6429 sequences are self-terminating. ***
No, you can't tell you're inside one if you miss the leading CSI, but as
has been pointed out, there really isn't a huge case for it.
The standard is available for free under the name ECMA-48:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf
It references ISO 2022, a.k.a. ECMA-35:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-035.pdf
These standards use a decimalized hexadecimal notation, so if you see
"05/10" it means 0x5a. A "column" refers to a 16-character set, so
"column 4" refers to bytes 0x40 to 0x4f.
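To make the notation concrete, here is a minimal Python sketch of the conversion in both directions. The function names from_col_row and to_col_row are made up for this illustration and do not come from either standard.

```python
def from_col_row(notation: str) -> int:
    """Convert 'column/row' notation such as '05/10' to a byte value."""
    column_text, row_text = notation.split("/")
    column, row = int(column_text), int(row_text)
    # Both parts are decimal numbers in the range 0..15.
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"column and row must be 0..15: {notation!r}")
    # The column is the high nibble, the row is the low nibble.
    return column * 16 + row


def to_col_row(byte: int) -> str:
    """Convert a byte value to 'column/row' notation."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"not a byte value: {byte!r}")
    return f"{byte >> 4:02d}/{byte & 0x0F:02d}"


if __name__ == "__main__":
    print(hex(from_col_row("05/10")))
    print(to_col_row(0x1B))
```

With this, "05/10" maps to 0x5a as stated above, and ESC (0x1b) is written "01/11".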
The structure is defined in section 5.4 of ISO 6429/ECMA-48:
-----------
5.4 Control sequences
A control sequence is a string of bit combinations starting with the
control function CONTROL SEQUENCE INTRODUCER (CSI) followed by one or
more bit combinations representing parameters, if any, and by one or
more bit combinations identifying the control function. The control
function CSI itself is an element of the C1 set.
The format of a control sequence is
CSI P ... P I ... I F
where
a) CSI is represented by bit combinations 01/11 (representing ESC) and
05/11 in a 7-bit code or by bit combination 09/11 in an 8-bit code, see 5.3;
b) P ... P are Parameter Bytes, which, if present, consist of bit
combinations from 03/00 to 03/15;
c) I ... I are Intermediate Bytes, which, if present, consist of bit
combinations from 02/00 to 02/15. Together with the Final Byte F, they
identify the control function;
NOTE The number of Intermediate Bytes is not limited by this Standard;
in practice, one Intermediate Byte will be sufficient since with sixteen
different bit combinations available for the Intermediate Byte over one
thousand control functions may be identified.
d) F is the Final Byte; it consists of a bit combination from 04/00 to
07/14; it terminates the control sequence and together with the
Intermediate Bytes, if present, identifies the control function. Bit
combinations 07/00 to 07/14 are available as Final Bytes of control
sequences for private (or experimental) use.
-----------
Note: DEC added nonstandard control sequences initiated with SS3 (ESC O)
as well as CSI (ESC [); otherwise they use the same format.
The Final Byte is easy enough to spot, as is writing a generic parser which
can pick this apart, including parameter handling.
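As a rough illustration of such a generic parser, here is a minimal Python sketch that follows the byte ranges quoted from section 5.4. The names ControlSequence, parse_csi and split_params are hypothetical, and the sketch makes some simplifying assumptions: it works on raw bytes (no UTF-8 decoding), it handles only CSI and not the DEC SS3 form, and it treats any parameter substring that is not purely decimal digits as "unspecified".

```python
from typing import List, NamedTuple, Optional


class ControlSequence(NamedTuple):
    params: bytes         # Parameter Bytes, 03/00..03/15 (0x30..0x3f)
    intermediates: bytes  # Intermediate Bytes, 02/00..02/15 (0x20..0x2f)
    final: int            # Final Byte, 04/00..07/14 (0x40..0x7e)
    length: int           # bytes consumed, including the CSI itself


def parse_csi(data: bytes, start: int = 0) -> Optional[ControlSequence]:
    """Parse one control sequence beginning at data[start], or return None."""
    i = start
    if data[i:i + 2] == b"\x1b[":    # 7-bit CSI: 01/11 05/11
        i += 2
    elif data[i:i + 1] == b"\x9b":   # 8-bit CSI: 09/11
        i += 1
    else:
        return None
    p_start = i
    while i < len(data) and 0x30 <= data[i] <= 0x3F:
        i += 1
    i_start = i
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1
    if i < len(data) and 0x40 <= data[i] <= 0x7E:
        # The Final Byte terminates the sequence.
        return ControlSequence(data[p_start:i_start], data[i_start:i],
                               data[i], i + 1 - start)
    return None  # truncated or malformed input


def split_params(params: bytes) -> List[Optional[int]]:
    """Split on 03/11 (';'); anything that is not plain digits becomes None."""
    return [int(p) if p.isdigit() else None for p in params.split(b";")]


if __name__ == "__main__":
    seq = parse_csi(b"\x1b[1;43m wonderful")
    print(seq, split_params(seq.params))
```

The parser never looks past the Final Byte, so whatever follows the sequence is left for the caller to treat as ordinary data.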
-hpa