Message-ID: <1a91b1b3-a8b8-4040-add6-857c8207b97c@kernel.org>
Date: Tue, 6 May 2025 08:33:16 +0200
From: Jiri Slaby <jirislaby@...nel.org>
To: Nicolas Pitre <nico@...xnic.net>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Nicolas Pitre <npitre@...libre.com>, linux-serial@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create
ucs_fallback_table.h
On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@...libre.com>
>
> The generated table maps complex characters to their simpler fallback
> forms for a terminal display when corresponding glyphs are unavailable.
> This includes diacritics, symbols as well as many drawing characters.
> Fallback characters aren't perfect replacements, obviously. But they are
> still far more useful than a bunch of squared question marks.
>
> Signed-off-by: Nicolas Pitre <npitre@...libre.com>
> ---
> drivers/tty/vt/gen_ucs_fallback_table.py | 882 +++++++++++++++++++++++
> 1 file changed, 882 insertions(+)
> create mode 100755 drivers/tty/vt/gen_ucs_fallback_table.py
>
> diff --git a/drivers/tty/vt/gen_ucs_fallback_table.py b/drivers/tty/vt/gen_ucs_fallback_table.py
> new file mode 100755
> index 000000000000..cb4e75b454fe
> --- /dev/null
> +++ b/drivers/tty/vt/gen_ucs_fallback_table.py
> @@ -0,0 +1,882 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Leverage Python's unicodedata module to generate ucs_fallback_table.h
> +#
> +# The generated table maps complex characters to their simpler fallback forms
> +# for a terminal display when corresponding glyphs are unavailable.
> +#
> +# Usage:
> +# python3 gen_ucs_fallback_table.py # Generate fallback tables
> +# python3 gen_ucs_fallback_table.py -o FILE # Specify output file
> +
> +import unicodedata
> +import sys
> +import argparse
> +from collections import defaultdict
> +
> +# This script's file name
> +from pathlib import Path
> +this_file = Path(__file__).name
> +
> +# Default output file name
> +DEFAULT_OUT_FILE = "ucs_fallback_table.h"
> +
> +def collect_accented_latin_letters():
> + """Collect already composed Latin letters with diacritics."""
> + fallback_map = {}
> +
> + # Latin-1 Supplement (0x00C0-0x00FF)
> + # Capital letters with accents to their base forms
> + fallback_map[0x00C0] = ord('A') # À LATIN CAPITAL LETTER A WITH GRAVE
> + fallback_map[0x00C1] = ord('A') # Á LATIN CAPITAL LETTER A WITH ACUTE
> + fallback_map[0x00C2] = ord('A') # Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
> + fallback_map[0x00C3] = ord('A') # Ã LATIN CAPITAL LETTER A WITH TILDE
> + fallback_map[0x00C4] = ord('A') # Ä LATIN CAPITAL LETTER A WITH DIAERESIS
> + fallback_map[0x00C5] = ord('A') # Å LATIN CAPITAL LETTER A WITH RING ABOVE
> + fallback_map[0x00C7] = ord('C') # Ç LATIN CAPITAL LETTER C WITH CEDILLA
> + fallback_map[0x00C8] = ord('E') # È LATIN CAPITAL LETTER E WITH GRAVE
> + fallback_map[0x00C9] = ord('E') # É LATIN CAPITAL LETTER E WITH ACUTE
> + fallback_map[0x00CA] = ord('E') # Ê LATIN CAPITAL LETTER E WITH CIRCUMFLEX
> + fallback_map[0x00CB] = ord('E') # Ë LATIN CAPITAL LETTER E WITH DIAERESIS
> + fallback_map[0x00CC] = ord('I') # Ì LATIN CAPITAL LETTER I WITH GRAVE
> + fallback_map[0x00CD] = ord('I') # Í LATIN CAPITAL LETTER I WITH ACUTE
> + fallback_map[0x00CE] = ord('I') # Î LATIN CAPITAL LETTER I WITH CIRCUMFLEX
> + fallback_map[0x00CF] = ord('I') # Ï LATIN CAPITAL LETTER I WITH DIAERESIS
> + fallback_map[0x00D1] = ord('N') # Ñ LATIN CAPITAL LETTER N WITH TILDE
> + fallback_map[0x00D2] = ord('O') # Ò LATIN CAPITAL LETTER O WITH GRAVE
> + fallback_map[0x00D3] = ord('O') # Ó LATIN CAPITAL LETTER O WITH ACUTE
> + fallback_map[0x00D4] = ord('O') # Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
> + fallback_map[0x00D5] = ord('O') # Õ LATIN CAPITAL LETTER O WITH TILDE
> + fallback_map[0x00D6] = ord('O') # Ö LATIN CAPITAL LETTER O WITH DIAERESIS
> + fallback_map[0x00D9] = ord('U') # Ù LATIN CAPITAL LETTER U WITH GRAVE
> + fallback_map[0x00DA] = ord('U') # Ú LATIN CAPITAL LETTER U WITH ACUTE
> + fallback_map[0x00DB] = ord('U') # Û LATIN CAPITAL LETTER U WITH CIRCUMFLEX
> + fallback_map[0x00DC] = ord('U') # Ü LATIN CAPITAL LETTER U WITH DIAERESIS
> + fallback_map[0x00DD] = ord('Y') # Ý LATIN CAPITAL LETTER Y WITH ACUTE
So you are in fact doing iconv's utf-8 -> ascii//translit conversion.
Does python not have an iconv lib?
> perl -e 'use Text::Iconv; print Text::Iconv->new("UTF8", "ASCII//TRANSLIT")->convert("áąà"), "\n";'
aaa
/me digging
Ah, unidecode:
> python3 -c 'from unidecode import unidecode; print(unidecode("áąà"))'
aaa
Perhaps use that instead of a manual table?
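Note that unidecode is a third-party package, not part of the standard library. For the precomposed Latin letters quoted above, the stdlib unicodedata module the script already imports may get most of the way on its own: canonical decomposition (NFD) splits each such letter into a base letter followed by combining marks. Below is a minimal sketch, assuming we keep only entries whose base is ASCII and whose remaining characters are all combining marks. The helper names derive_fallback() and derive_latin1_fallbacks() are hypothetical, not part of the patch:

```python
import unicodedata
from typing import Dict, Optional


def derive_fallback(cp: int) -> Optional[int]:
    """Return the ASCII base letter of a precomposed character, or None.

    Hypothetical helper: decomposes with NFD and accepts the result only
    if it is an ASCII base followed exclusively by combining marks.
    """
    decomposed = unicodedata.normalize("NFD", chr(cp))
    base, marks = decomposed[0], decomposed[1:]
    if not marks:
        # No canonical decomposition (e.g. Æ, Ø, ß): nothing to derive.
        return None
    if not all(unicodedata.combining(m) for m in marks):
        return None
    if not base.isascii():
        return None
    return ord(base)


def derive_latin1_fallbacks() -> Dict[int, int]:
    """Build the Latin-1 Supplement part of the map (U+00C0..U+00FF)."""
    table = {}
    for cp in range(0x00C0, 0x0100):
        fallback = derive_fallback(cp)
        if fallback is not None:
            table[cp] = fallback
    return table


if __name__ == "__main__":
    for cp, fb in sorted(derive_latin1_fallbacks().items()):
        print(f"U+{cp:04X} {chr(cp)} -> {chr(fb)}")
```

Letters without a canonical decomposition (Æ, Ð, Ø, Þ, ß) come out as None here, so they would still need hand-written entries or a transliteration library such as unidecode.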
--
js
suse labs