Message-ID: <1a91b1b3-a8b8-4040-add6-857c8207b97c@kernel.org>
Date: Tue, 6 May 2025 08:33:16 +0200
From: Jiri Slaby <jirislaby@...nel.org>
To: Nicolas Pitre <nico@...xnic.net>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Nicolas Pitre <npitre@...libre.com>, linux-serial@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create
 ucs_fallback_table.h

On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@...libre.com>
> 
> The generated table maps complex characters to their simpler fallback
> forms for a terminal display when corresponding glyphs are unavailable.
> This includes diacritics, symbols as well as many drawing characters.
> Fallback characters aren't perfect replacements, obviously. But they are
> still far more useful than a bunch of squared question marks.
> 
> Signed-off-by: Nicolas Pitre <npitre@...libre.com>
> ---
>   drivers/tty/vt/gen_ucs_fallback_table.py | 882 +++++++++++++++++++++++
>   1 file changed, 882 insertions(+)
>   create mode 100755 drivers/tty/vt/gen_ucs_fallback_table.py
> 
> diff --git a/drivers/tty/vt/gen_ucs_fallback_table.py b/drivers/tty/vt/gen_ucs_fallback_table.py
> new file mode 100755
> index 000000000000..cb4e75b454fe
> --- /dev/null
> +++ b/drivers/tty/vt/gen_ucs_fallback_table.py
> @@ -0,0 +1,882 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Leverage Python's unicodedata module to generate ucs_fallback_table.h
> +#
> +# The generated table maps complex characters to their simpler fallback forms
> +# for a terminal display when corresponding glyphs are unavailable.
> +#
> +# Usage:
> +#   python3 gen_ucs_fallback_table.py         # Generate fallback tables
> +#   python3 gen_ucs_fallback_table.py -o FILE # Specify output file
> +
> +import unicodedata
> +import sys
> +import argparse
> +from collections import defaultdict
> +
> +# This script's file name
> +from pathlib import Path
> +this_file = Path(__file__).name
> +
> +# Default output file name
> +DEFAULT_OUT_FILE = "ucs_fallback_table.h"
> +
> +def collect_accented_latin_letters():
> +    """Collect already composed Latin letters with diacritics."""
> +    fallback_map = {}
> +
> +    # Latin-1 Supplement (0x00C0-0x00FF)
> +    # Capital letters with accents to their base forms
> +    fallback_map[0x00C0] = ord('A')  # À LATIN CAPITAL LETTER A WITH GRAVE
> +    fallback_map[0x00C1] = ord('A')  # Á LATIN CAPITAL LETTER A WITH ACUTE
> +    fallback_map[0x00C2] = ord('A')  # Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
> +    fallback_map[0x00C3] = ord('A')  # Ã LATIN CAPITAL LETTER A WITH TILDE
> +    fallback_map[0x00C4] = ord('A')  # Ä LATIN CAPITAL LETTER A WITH DIAERESIS
> +    fallback_map[0x00C5] = ord('A')  # Å LATIN CAPITAL LETTER A WITH RING ABOVE
> +    fallback_map[0x00C7] = ord('C')  # Ç LATIN CAPITAL LETTER C WITH CEDILLA
> +    fallback_map[0x00C8] = ord('E')  # È LATIN CAPITAL LETTER E WITH GRAVE
> +    fallback_map[0x00C9] = ord('E')  # É LATIN CAPITAL LETTER E WITH ACUTE
> +    fallback_map[0x00CA] = ord('E')  # Ê LATIN CAPITAL LETTER E WITH CIRCUMFLEX
> +    fallback_map[0x00CB] = ord('E')  # Ë LATIN CAPITAL LETTER E WITH DIAERESIS
> +    fallback_map[0x00CC] = ord('I')  # Ì LATIN CAPITAL LETTER I WITH GRAVE
> +    fallback_map[0x00CD] = ord('I')  # Í LATIN CAPITAL LETTER I WITH ACUTE
> +    fallback_map[0x00CE] = ord('I')  # Î LATIN CAPITAL LETTER I WITH CIRCUMFLEX
> +    fallback_map[0x00CF] = ord('I')  # Ï LATIN CAPITAL LETTER I WITH DIAERESIS
> +    fallback_map[0x00D1] = ord('N')  # Ñ LATIN CAPITAL LETTER N WITH TILDE
> +    fallback_map[0x00D2] = ord('O')  # Ò LATIN CAPITAL LETTER O WITH GRAVE
> +    fallback_map[0x00D3] = ord('O')  # Ó LATIN CAPITAL LETTER O WITH ACUTE
> +    fallback_map[0x00D4] = ord('O')  # Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
> +    fallback_map[0x00D5] = ord('O')  # Õ LATIN CAPITAL LETTER O WITH TILDE
> +    fallback_map[0x00D6] = ord('O')  # Ö LATIN CAPITAL LETTER O WITH DIAERESIS
> +    fallback_map[0x00D9] = ord('U')  # Ù LATIN CAPITAL LETTER U WITH GRAVE
> +    fallback_map[0x00DA] = ord('U')  # Ú LATIN CAPITAL LETTER U WITH ACUTE
> +    fallback_map[0x00DB] = ord('U')  # Û LATIN CAPITAL LETTER U WITH CIRCUMFLEX
> +    fallback_map[0x00DC] = ord('U')  # Ü LATIN CAPITAL LETTER U WITH DIAERESIS
> +    fallback_map[0x00DD] = ord('Y')  # Ý LATIN CAPITAL LETTER Y WITH ACUTE


So you are in fact doing iconv's utf-8 -> ascii//translit conversion. 
Does python not have an iconv lib?

 > perl -e 'use Text::Iconv; print Text::Iconv->new("UTF8", "ASCII//TRANSLIT")->convert("áąà"), "\n";'
aaa

/me digging

Ah, unidecode:
 > python3 -c 'from unidecode import unidecode; print(unidecode("áąà"))'
aaa

Perhaps use that instead of a manual table?
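
For illustration, here is a rough sketch of how the accented Latin entries could be derived rather than typed by hand. It uses only the standard unicodedata module (canonical decomposition, then dropping combining marks), not unidecode and not iconv, so it is only an approximation of what those do. The helper names ascii_fallback and build_latin1_fallbacks are made up for this example and do not appear in the patch.

```python
import unicodedata


def ascii_fallback(cp):
    """Return the ASCII base code point for cp, or None if none is derivable.

    Hypothetical helper: decomposes the character (NFD) and drops all
    combining marks, keeping the result only if a single ASCII character
    different from the input remains.
    """
    decomposed = unicodedata.normalize("NFD", chr(cp))
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    if len(base) == 1 and base.isascii() and base != chr(cp):
        return ord(base)
    return None


def build_latin1_fallbacks(first=0x00C0, last=0x00FF):
    """Build {code point: ASCII fallback} for a range (Latin-1 letters by default)."""
    table = {}
    for cp in range(first, last + 1):
        fallback = ascii_fallback(cp)
        if fallback is not None:
            table[cp] = fallback
    return table


if __name__ == "__main__":
    for cp, fallback in sorted(build_latin1_fallbacks().items()):
        print(f"0x{cp:04X} -> {chr(fallback)}  # {unicodedata.name(chr(cp))}")
```

Code points without a canonical decomposition (0x00D8, LATIN CAPITAL LETTER O WITH STROKE, for example) yield no entry this way, so a small manual list, or something like unidecode, would still be needed for those.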

-- 
js
suse labs
