Message-ID: <vedoha3v6rf3zccoyvyh67bvqf7sjlezc6jm7kncvmcpoqdkzj@jp722nkrfei2>
Date: Wed, 20 Aug 2025 17:48:25 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Corbet <corbet@....net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>, 
	Linux Doc Mailing List <linux-doc@...r.kernel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] docs: kfigure.py: don't crash during read/write

On Wed, Aug 20, 2025 at 06:42:29AM -0600, Jonathan Corbet wrote:
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:
> 
> > By default, Python does a very bad job when reading/writing
> > from files, as it tries to enforce that the character is < 128.
> > Nothing prevents an SVG file from containing, for instance, a comment
> > with a UTF-8 accented copyright notice - or even an invalid UTF-8
> > char.
> 
> Do you have a locale that expects everything to be ASCII?  This seems a
> bit weird.  I would expect utf8 to work by default these days.
> 
> > While testing PDF and html builds, I recently faced one build
> > that got an error at kfigure.py saying that a char was > 128,
> > crashing PDF output.
> >
> > To avoid such issues, let's use the PEP 383 surrogateescape error
> > handler to prevent read/write errors in such cases.
> 
> Being explicit about utf8 is good...but where are the errors coming
> from?  Is this really a utf8 file?

Unfortunately, I forgot to take a note when I got the error...
heh, I almost forgot to write/submit this one too ;-)

Yet, see: kfigure.py reads a .dot or .svg file. Both may contain UTF-8
characters in strings. For instance, they may have an accent inside a
copyright comment, a Greek letter, a math symbol, ...

So, IMO we should change the read to use an explicit encoding and have
a fallback like PEP 383.
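
A minimal sketch of that idea, assuming plain open() calls; the helper
names read_text(), write_text() and roundtrip() are hypothetical and
are not taken from kfigure.py. With errors="surrogateescape", bytes
that are not valid UTF-8 are mapped to lone surrogates on read and
mapped back to the same bytes on write, so nothing is lost:

```python
import os
import tempfile


def read_text(path):
    """Read as UTF-8; invalid bytes become lone surrogates (PEP 383)."""
    with open(path, "r", encoding="utf-8", errors="surrogateescape",
              newline="") as fp:
        return fp.read()


def write_text(path, text):
    """Write as UTF-8; escaped surrogates turn back into the original bytes."""
    with open(path, "w", encoding="utf-8", errors="surrogateescape",
              newline="") as fp:
        fp.write(text)


def roundtrip(raw):
    """Pass raw bytes through read_text()/write_text() and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.svg")
        dst = os.path.join(tmp, "out.svg")
        with open(src, "wb") as fp:
            fp.write(raw)
        write_text(dst, read_text(src))
        with open(dst, "rb") as fp:
            return fp.read()
```

Here roundtrip() should return its input unchanged even when the input
mixes valid UTF-8 with stray bytes. newline="" is passed only to keep
line endings untouched in this sketch.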

Now, I did a treewide git grep on svg and dot files. Currently,
they're all ASCII-only.
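
For reference, a comparable check can be sketched in Python. This is
not the git grep invocation that was used; find_non_ascii() is a
hypothetical helper that walks the filesystem rather than the git
index, so it would also see untracked files:

```python
from pathlib import Path


def find_non_ascii(root, suffixes=(".svg", ".dot")):
    """Return the files under root, with the given suffixes, that
    contain at least one byte >= 0x80."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # bytes.isascii() needs Python 3.7 or later
            if not path.read_bytes().isascii():
                hits.append(path)
    return hits
```

An empty result for the tree would match the "all ASCII" observation.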

-

That said, I guess the error I got was during write. This script
tries to write in "w" mode, instead of "wb" (it came from Python 2.7
times, when Python followed the typical standards for write
on Linux).
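
To illustrate how a text-mode write can fail while a binary one does
not, here is a small sketch. It is only an assumption about the failure
mode, not a confirmed diagnosis: it emulates a text stream whose
encoding is ASCII, which is what "w" mode without an explicit encoding
would use if the locale's preferred encoding were ASCII. The function
names are hypothetical:

```python
import io


def try_text_write(text, encoding):
    """Return True if text can be written through a text-mode stream
    that uses the given encoding, False on UnicodeEncodeError."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


def binary_write(text):
    """Encode explicitly and write bytes, as "wb" mode would require."""
    raw = io.BytesIO()
    raw.write(text.encode("utf-8"))
    return raw.getvalue()
```

With a string containing "\u00a9", try_text_write(..., "ascii") fails,
while try_text_write(..., "utf-8") and binary_write() both succeed.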

Anyway, let's not apply this one for now. It will require extra
changes.

I'll return to this when I have some time.

-- 
Thanks,
Mauro
