lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250415192212.33949-1-nico@fluxnic.net>
Date: Tue, 15 Apr 2025 15:17:49 -0400
From: Nicolas Pitre <nico@...xnic.net>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>
Cc: Nicolas Pitre <npitre@...libre.com>,
	linux-serial@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH v2 00/13] vt: implement proper Unicode handling

The Linux VT console has many problems with regards to proper Unicode
handling:

- All new double-width Unicode code points which have been introduced since
  Unicode 5.0 are not recognized as such (we're at Unicode 16.0 now).

- Zero-width code points are not recognized at all. If you try to edit files
  containing a lot of emojis, you will see the rendering issues. When there
  are a lot of zero-width characters (like "variation selectors"), long
  lines get wrapped, but any Unicode-aware editor thinks that the content
  was rendered properly and its rendering logic starts to work in very bad
  ways. Combine this with tmux or screen, and there is a huge mess going on
  in the terminal.

- Also, text which uses combining diacritics has the same effect as text
  with zero-width characters as programs expect the characters to take fewer
  columns than what they actually do.

Some may argue that the Linux VT console is unmaintained and/or not used
much any longer and that one should consider a user space terminal
alternative instead. But every such alternative that is not less maintained
than the Linux VT console does require a full heavy graphical environment
and that is the exact antithesis of what the Linux console is meant to be.

Furthermore, there is a significant Linux console user base represented by
blind users (which I'm a member of) for whom the alternatives are way more
cumbersome to use reducing our productivity. So it has to stay and
be maintained to the best of our abilities.

That being said...

This patch series is about fixing all the above issues. This is accomplished
with some Python scripts leveraging Python's unicodedata module to generate
C code with lookup tables that is suitable for the kernel. In summary:

- The double-width code point table is updated to the latest Unicode version
  and the table itself is optimized to reduce its size.

- A zero-width code point table is created and the console code is modified
  to properly use it.

- A table with base character + combining mark pairs is created to convert
  them into their precomposed equivalents when they're encountered.
  By default the generated table contains most commonly used Latin, Greek,
  and Cyrillic recomposition pairs only, but one can execute the provided
  script with the --full argument to create a table that covers all
  possibilities. Combining marks that are not listed in the table are simply
  treated like zero-width code points and properly ignored.

- All those tables plus related lookup code require about 3500 additional
  bytes of text which is not very significant these days. Yet, one
  can still set CONFIG_CONSOLE_TRANSLATIONS=n to configure this all out
  if need be.

Note: The generated C code makes scripts/checkpatch.pl complain about
      "... exceeds 100 columns" because the inserted comments with code
      point names, well, make some inlines exceed 100 columns. Please make
      an exception for those files and disregard those warnings. When
      checkpatch.pl is used on those files directly with -f then it doesn't
      complain.

This series was tested on top of v6.15-rc2.

Changes from v1 (https://lkml.org/lkml/2025/4/9/1952):

- Moved much of the C functions out of the Python generator, leaving only
  lookup tables to C code generation

- Cleaned up the Python code

- Unicode processing in vt.c moved to a function of its own

- Folded bug fixes into the series, fixed style, typos, etc.

Thanks to Jiri Slaby for the review.

diffstat:
 drivers/tty/vt/Makefile                   |   3 +-
 drivers/tty/vt/consolemap.c               |   2 -
 drivers/tty/vt/gen_ucs_recompose_table.py | 255 +++++++++++++
 drivers/tty/vt/gen_ucs_width_table.py     | 299 ++++++++++++++++
 drivers/tty/vt/ucs.c                      | 156 ++++++++
 drivers/tty/vt/ucs_recompose_table.h      | 102 ++++++
 drivers/tty/vt/ucs_width_table.h          | 453 ++++++++++++++++++++++++
 drivers/tty/vt/vt.c                       | 138 +++++---
 include/linux/consolemap.h                |  18 +
 9 files changed, 1376 insertions(+), 50 deletions(-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ