Message-ID: <alpine.LNX.1.10.0804012200340.6604@fbirervta.pbzchgretzou.qr>
Date: Tue, 1 Apr 2008 22:13:50 +0200 (CEST)
From: Jan Engelhardt <jengelh@...putergmbh.de>
To: "H. Peter Anvin" <hpa@...or.com>
cc: David Newall <davidn@...idnewall.com>,
"John T." <j.thomast@...oo.com>, linux-kernel@...r.kernel.org
Subject: Re: UTF-8 and Alt key in the console
On Saturday 2008-03-29 18:05, H. Peter Anvin wrote:
> David Newall wrote:
>> Jan Engelhardt wrote:
>> > What do you mean by self-terminating? There is no easy
>> > synchronization like in UTF-8, given you are anywhere inside
>> > a text stream, how do you know (a) you are already in an
>> > escape sequence and (b) how to figure out the rebegin of
>> > normal text.
>>
>> It's not very useful being able to tell you are inside an escape sequence
>> unless you see that sequence from the start. You do need the complete
>> sequence to make sense of it.
>
> I think what Jan is alluding to is the property of UTF-8 text that you can
> start in the middle of a string and either skip an incomplete character or
> find the beginning of it. If you can search backwards, you can find the
> beginning of an escape sequence, too; the "skip incomplete" functionality is
> missing, though, but as you say, isn't actually all that useful in real life
> *for the applications which use these kinds of escape sequences.*
No backwards searching, just forwards.
In UTF-8 this is simple. You know you are inside a character when the
highest two bits of the current byte are 10, and you can skip bytes until
the start of the next character, whose first byte begins with either 0
(ASCII) or 11 (lead byte of a multi-byte sequence).
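
To make the forward resynchronization concrete, here is a minimal sketch in C. The name utf8_resync is made up for illustration, and it assumes a NUL-terminated buffer; it only skips continuation bytes and does no validation.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from any existing code): skip forward over
 * UTF-8 continuation bytes (bit pattern 10xxxxxx) until the first byte
 * that can start a character. Assumes buf is NUL-terminated; the NUL
 * byte is not a continuation byte, so the loop stops there at the latest.
 */
static const char *utf8_resync(const char *buf)
{
	while (((unsigned char)*buf & 0xC0) == 0x80)
		++buf;
	return buf;
}
```

Only the byte under the cursor is inspected at each step, which is what makes this case so cheap.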
With the VTxxx escape codes, this is hardly possible. Given a broken
code of ^[[43m,

	echo -e '\x1B[43m wonderful \x1B[0m' | cosmicrays | cat
	3m wonderful ^[[0m

There is no way to check whether you are in the escape code. And there
is no way to find its end. If a heuristic were to be used (which is
certainly a possibility), you would end up killing text up until the
next ^[.
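
One possible form of such a heuristic, as a rough sketch only (skip_to_next_esc is a made-up name, and the buffer is assumed to be NUL-terminated):

```c
#include <string.h>

/*
 * Hypothetical heuristic: since we cannot tell whether buf starts
 * inside a damaged escape code, discard everything up to the next
 * ESC (0x1B). If there is no ESC at all, the whole buffer is dropped.
 */
static const char *skip_to_next_esc(const char *buf)
{
	const char *esc = strchr(buf, 0x1B);

	return esc != NULL ? esc : buf + strlen(buf);
}
```

Fed the damaged stream from above, "3m wonderful ^[[0m", this returns a pointer to the trailing ^[[0m, so " wonderful " is thrown away together with the "3m" debris.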
Hence the proposal of using definite start and end markers:

	echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat
	3m^] wonderful ^[0m^]

Ok, finding out whether we are in an escape code is not as easy as with
UTF-8 (the latter of which looks at the current byte only), but
still very viable.
Prerequisite to this simple model is that the user does not use an
overly long dumb escape sequence like ^[[43;43;43;43;43;43m, i.e.
that the end marker is in the buffer if we really are in an escape
sequence:
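
	#include <stdbool.h>
	#include <string.h>
	static bool in_an_escape_seq(const char *buf)
	{
		const char *end = strchr(buf, 0x1D);
		const char *start = strchr(buf, 0x1B);
		/* In a sequence if an end marker precedes any start marker. */
		return end != NULL && (start == NULL || end < start);
	}
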
If so, skipping parts of a faulty write() is easy:

	static const char *get_out_of_esc(const char *buf)
	{
		if (in_an_escape_seq(buf))
			return strchr(buf, 0x1D) + 1;
		else
			return buf;
	}

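
For reference, the two steps can be combined into one self-contained sketch. The names ESC_START, ESC_END, esc_in_seq and esc_resync are illustrative only, and 0x1B/0x1D are the start and end markers proposed above, not an existing terminal convention. This variant also treats "end marker found, no later start marker" as being inside a sequence, which is an assumption about the intended behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ESC_START 0x1B	/* ^[ : proposed start marker */
#define ESC_END   0x1D	/* ^] : proposed end marker */

/* We are inside a sequence if an end marker comes before any start marker. */
static bool esc_in_seq(const char *buf)
{
	const char *end = strchr(buf, ESC_END);
	const char *start = strchr(buf, ESC_START);

	return end != NULL && (start == NULL || end < start);
}

/* Return the first byte after a truncated sequence, or buf unchanged. */
static const char *esc_resync(const char *buf)
{
	return esc_in_seq(buf) ? strchr(buf, ESC_END) + 1 : buf;
}
```

With the damaged stream from the example, "3m^] wonderful ^[0m^]", esc_resync() returns a pointer to " wonderful ^[0m^]", so only the "3m^]" debris is lost and the text survives.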
--
make boldconfig -- to boldly select what no one has selected before