linux-kernel - Re: [PATCH v3 00/14] vt: implement proper Unicode handling

Message-ID: <2025042517-defacing-lushly-10d5@gregkh>
Date: Fri, 25 Apr 2025 16:29:52 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Nicolas Pitre <nico@...xnic.net>
Cc: Jiri Slaby <jirislaby@...nel.org>, Nicolas Pitre <npitre@...libre.com>,
	linux-serial@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 00/14] vt: implement proper Unicode handling

On Thu, Apr 17, 2025 at 02:45:02PM -0400, Nicolas Pitre wrote:
> The Linux VT console has many problems with regard to proper Unicode
> handling:
> 
> - All new double-width Unicode code points which have been introduced since
>   Unicode 5.0 are not recognized as such (we're at Unicode 16.0 now).
> 
> - Zero-width code points are not recognized at all. If you try to edit files
>   containing a lot of emojis, you will see the rendering issues. When there
>   are a lot of zero-width characters (like "variation selectors"), long
>   lines get wrapped, but any Unicode-aware editor thinks that the content
>   was rendered properly and its rendering logic starts to work in very bad
>   ways. Combine this with tmux or screen, and there is a huge mess going on
>   in the terminal.
> 
> - Also, text which uses combining diacritics has the same effect as text
>   with zero-width characters as programs expect the characters to take fewer
>   columns than what they actually do.
> 
> Some may argue that the Linux VT console is unmaintained and/or not used
> much any longer and that one should consider a user space terminal
> alternative instead. But every such alternative that is not less maintained
> than the Linux VT console does require a full heavy graphical environment
> and that is the exact antithesis of what the Linux console is meant to be.
> 
> Furthermore, there is a significant Linux console user base represented by
> blind users (which I'm a member of) for whom the alternatives are way more
>   cumbersome to use, reducing our productivity. So it has to stay and
> be maintained to the best of our abilities.
> 
> That being said...
> 
> This patch series is about fixing all the above issues. This is accomplished
> with some Python scripts leveraging Python's unicodedata module to generate
> C code with lookup tables that is suitable for the kernel. In summary:
> 
> - The double-width code point table is updated to the latest Unicode version
>   and the table itself is optimized to reduce its size.
> 
> - A zero-width code point table is created and the console code is modified
>   to properly use it.
> 
> - A table with base character + combining mark pairs is created to convert
>   them into their precomposed equivalents when they're encountered.
>   By default the generated table contains most commonly used Latin, Greek,
>   and Cyrillic recomposition pairs only, but one can execute the provided
>   script with the --full argument to create a table that covers all
>   possibilities. Combining marks that are not listed in the table are simply
>   treated like zero-width code points and properly ignored.
> 
> - All those tables plus related lookup code require about 3500 additional
>   bytes of text which is not very significant these days. Yet, one
>   can still set CONFIG_CONSOLE_TRANSLATIONS=n to configure this all out
>   if need be.
> 
> Note: The generated C code makes scripts/checkpatch.pl complain about
>       "... exceeds 100 columns" because the inserted comments with code
>       point names, well, make some inlines exceed 100 columns. Please make
>       an exception for those files and disregard those warnings. When
>       checkpatch.pl is used on those files directly with -f then it doesn't
>       complain.
> 
> This series was tested on top of v6.15-rc2.

I've already taken the first version of this. Should I revert all of
those patches and then apply these, or do you want to send a diff between
this series and what is in the tty-next tree?

thanks,

greg k-h
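
For readers following the thread, the table generation described in the
cover letter can be sketched roughly as below. This is an illustration
only and not the code from the series: the function names, the choice of
general categories treated as zero-width (Mn, Me, Cf) and the use of an
NFC round-trip to filter recomposition pairs are all assumptions made for
this example. Only Python's standard unicodedata and bisect modules are
used, and the results depend on the Unicode version bundled with the
Python interpreter.

```python
import bisect
import unicodedata

LIMIT = 0x110000  # one past the last Unicode code point


def collect_ranges(predicate):
    """Collapse matching code points into sorted, inclusive (first, last) ranges."""
    ranges, start = [], None
    for cp in range(LIMIT):
        if predicate(chr(cp)):
            if start is None:
                start = cp
        elif start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, LIMIT - 1))
    return ranges


def is_double_width(ch):
    # East Asian Width "W" (wide) or "F" (fullwidth) occupies two columns.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def is_zero_width(ch):
    # Assumption for this sketch: nonspacing marks, enclosing marks and
    # format characters take no column of their own.
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")


def recomposition_pairs():
    """Map (base, combining mark) to the precomposed code point."""
    pairs = {}
    for cp in range(LIMIT):
        fields = unicodedata.decomposition(chr(cp)).split()
        # Keep canonical two-code-point decompositions only; compatibility
        # decompositions start with a "<tag>" field.
        if len(fields) != 2 or fields[0].startswith("<"):
            continue
        base, mark = (int(f, 16) for f in fields)
        # Keep the pair only if NFC really recomposes it to this code point.
        if unicodedata.normalize("NFC", chr(base) + chr(mark)) == chr(cp):
            pairs[(base, mark)] = cp
    return pairs


def in_ranges(cp, ranges):
    """Binary search: is cp inside one of the sorted inclusive ranges?"""
    i = bisect.bisect_right(ranges, (cp, LIMIT)) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]


double_width = collect_ranges(is_double_width)
zero_width = collect_ranges(is_zero_width)
recompose = recomposition_pairs()

if __name__ == "__main__":
    print(len(double_width), "double-width ranges")
    print(len(zero_width), "zero-width ranges")
    print(len(recompose), "recomposition pairs")
```

Storing sorted ranges rather than individual code points keeps the tables
small and lets a lookup be a binary search, which is presumably the kind
of structure a generated C table would also use. A real generator would
go on to print these ranges as C array initializers.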