linux-kernel - Re: [PATCH] docs: kfigure.py: don't crash during read/write

Message-ID: <vedoha3v6rf3zccoyvyh67bvqf7sjlezc6jm7kncvmcpoqdkzj@jp722nkrfei2>
Date: Wed, 20 Aug 2025 17:48:25 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Corbet <corbet@....net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>, 
	Linux Doc Mailing List <linux-doc@...r.kernel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] docs: kfigure.py: don't crash during read/write

On Wed, Aug 20, 2025 at 06:42:29AM -0600, Jonathan Corbet wrote:
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:
> 
> > By default, Python does a very bad job when reading/writing
> > from files, as it tries to enforce that the character is < 128.
> > Nothing prevents a SVG file to contain, for instance, a comment
> > with an utf-8 accented copyright notice - or even an utf-8
> > invalid char.
> 
> Do you have a locale that expects everything to be ASCII?  This seems a
> bit weird.  I would expect utf8 to work by default these days.
> 
> > While testing PDF and html builds, I recently faced one build
> > that got an error at kfigure.py saying that a char was > 128,
> > crashing PDF output.
> >
> > To avoid such issues, let's use PEP 383 surrogate escape encoding
> > to prevent read/write errors in such cases.
> 
> Being explicit about utf8 is good...but where are the errors coming
> from?  Is this really a utf8 file?

Unfortunately, I forgot to take a note when I got the error...
heh, I almost forgot to write/submit this patch as well ;-)

Still, consider: kfigure.py reads a .dot or .svg file. Both may contain
UTF-8 characters in strings. For instance, they may have an accent inside
a copyright comment, a Greek letter, a math symbol, ...

So, IMO we should change the read to use an explicit encoding and have
a fallback like PEP 383 (the surrogateescape error handler).
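
As a rough sketch of that idea (this is not the actual kfigure.py code;
read_text() and write_text() are made-up helper names used only for
illustration), an explicit UTF-8 encoding combined with the
"surrogateescape" error handler could look like this:

```python
import os
import tempfile


def read_text(path):
    # Decode as UTF-8. Bytes that are not valid UTF-8 become lone
    # surrogates (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
    # newline="" disables newline translation, keeping the data intact.
    with open(path, "r", encoding="utf-8", errors="surrogateescape",
              newline="") as f:
        return f.read()


def write_text(path, data):
    # Encode as UTF-8. Escaped surrogates are turned back into the
    # original bytes, so a read/write round trip is lossless.
    with open(path, "w", encoding="utf-8", errors="surrogateescape",
              newline="") as f:
        f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.svg")
        dst = os.path.join(tmp, "out.svg")
        # Valid UTF-8 ("é") followed by one invalid byte (0xff).
        raw = b"<!-- caf\xc3\xa9 \xff -->\n"
        with open(src, "wb") as f:
            f.write(raw)
        write_text(dst, read_text(src))
        with open(dst, "rb") as f:
            print("round trip preserved bytes:", f.read() == raw)
```

With this, neither a valid accented character nor a stray invalid byte
aborts the read, and writing the string back reproduces the input bytes.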

Now, I did a treewide git grep on the svg and dot files. Currently,
they're all ASCII-only.
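
For reference, an equivalent check can be sketched in Python. This is
not the git grep command that was actually run; find_non_ascii() is a
hypothetical helper, and the extension list is an assumption:

```python
import os


def find_non_ascii(root, exts=(".svg", ".dot")):
    # Return the sorted list of files under root, with one of the given
    # extensions, that contain at least one byte >= 0x80.
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    # bytes.isascii() needs Python 3.7 or newer.
                    if not f.read().isascii():
                        hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # An empty list means every matching file is pure ASCII.
    print(find_non_ascii("Documentation"))
```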

-

That said, I guess the error I got was during write. The script
writes in "w" mode instead of "wb" (it dates from Python 2.7 times,
when Python followed the usual Linux conventions for write).
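
If that guess is right, the failure mode can be illustrated roughly as
below. This is only a sketch: encoding="ascii" is passed explicitly to
stand in for an environment whose default text encoding is ASCII, and
both function names are invented for the example.

```python
import os
import tempfile

# A string holding one non-ASCII character (the copyright sign).
TEXT = "<!-- \u00a9 2025 -->\n"


def write_text_ascii(path, text):
    # Text mode: Python must encode the str before writing. With an
    # ASCII codec, any character above 127 raises UnicodeEncodeError.
    with open(path, "w", encoding="ascii") as f:
        f.write(text)


def write_binary(path, text):
    # Binary mode: the caller picks the encoding, so the result does
    # not depend on the locale of the build machine.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8", errors="surrogateescape"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.svg")
        try:
            write_text_ascii(path, TEXT)
        except UnicodeEncodeError as err:
            print("text-mode write failed:", err.reason)
        write_binary(path, TEXT)
        print("binary write produced", os.path.getsize(path), "bytes")
```

Keeping "w" mode but passing encoding="utf-8" explicitly would be
another way to remove the dependency on the locale.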

Anyway, let's not apply this one for now. It will require extra
changes.

I'll return to this when I have some time.

-- 
Thanks,
Mauro