Date:   Wed, 12 May 2021 10:44:16 +0200
From:   Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To:     Gabriel Krisman Bertazi <krisman@...labora.com>
Cc:     Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        "Daniel W. S. Almeida" <dwlsalmeida@...il.com>,
        "Jonathan Corbet" <corbet@....net>, Arnd Bergmann <arnd@...db.de>,
        Borislav Petkov <bp@...en8.de>,
        David Howells <dhowells@...hat.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        James Morse <james.morse@....com>,
        Kees Cook <keescook@...omium.org>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Robert Richter <rric@...nel.org>,
        Thorsten Leemhuis <linux@...mhuis.info>,
        Tony Luck <tony.luck@...el.com>, keyrings@...r.kernel.org,
        linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 06/53] docs: admin-guide: avoid using UTF-8 chars

On Mon, 10 May 2021 14:40:09 -0400
Gabriel Krisman Bertazi <krisman@...labora.com> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:
> 
> > While UTF-8 characters can be used in the Linux documentation,
> > it is best to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> > 	- U+00a0 (' '): NO-BREAK SPACE
> > 	- U+2013 ('–'): EN DASH
> > 	- U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
> > ---
> >  Documentation/admin-guide/index.rst           |  2 +-
> >  Documentation/admin-guide/module-signing.rst  |  4 +-
> >  Documentation/admin-guide/ras.rst             | 94 +++++++++----------
> >  .../admin-guide/reporting-issues.rst          | 12 +--
> >  4 files changed, 56 insertions(+), 56 deletions(-)  
> 
> Hi Mauro,
> 
> This patch misses one occurrence of U+2014 in
> Documentation/admin-guide/sysctl/kernel.rst:1288.

It ended up in a separate patch.

> There are also countless occurrences in Documentation/, outside of
> Documentation/admin-guide.  I suppose another patch in the series, which
> I didn't receive, will fix them?

Yes. This series should fix all occurrences inside Documentation/, in
*.rst files and in ABI, except for Documentation/translations[1].

[1] It probably still makes sense to apply a subset of the changes
from this series there, but touching non-Latin translations is riskier.

> These characters will just reappear elsewhere, eventually. I'm not sure
> what the gain is here, other than minor consistency improvements.

The main point here is that a large number of those UTF-8 characters
appeared as a result of document conversion from DocBook/LaTeX/Markdown.

Since the conversion is finished, I don't expect we will need to redo a
series like this in the near future.

There are even some cases where the UTF-8 characters were doing the
wrong thing, like an EN DASH used instead of a hyphen to pass a command
line parameter, and the addition of non-printable BOM characters.

So, IMO, this is a necessary cleanup after the conversion.
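
To illustrate the kind of leftovers described above, here is a minimal
Python sketch that reports where such characters sit in a piece of text.
It is not part of this series; the names SUSPECTS and find_suspects are
hypothetical, and the character set is only an assumption based on the
characters mentioned in this thread.

```python
import unicodedata

# Assumed set of conversion leftovers, taken from this thread:
# NO-BREAK SPACE, EN DASH, EM DASH and the BOM (U+FEFF).
SUSPECTS = {"\u00a0", "\u2013", "\u2014", "\ufeff"}


def find_suspects(text):
    """Return a list of (line, column, character name) tuples, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECTS:
                # Fall back to the code point if the character has no name.
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                hits.append((lineno, col, name))
    return hits
```

For example, an EN DASH placed in front of a command line parameter,
which cannot be told apart from a hyphen at a glance, would be reported
together with its line and column.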

> But we
> should add a Warning during documentation generation (if there isn't one
> already), to prevent them from spreading again.

Not sure it is worth it... People can (and should) use UTF-8
characters when needed, for instance Latin accented characters in
names and translations, and Greek letters when pertinent, like
MICRO SIGN or GREEK SMALL LETTER MU to represent microseconds.

On the other hand, using curly quotes instead of ASCII ones, and dash
characters instead of -- and ---, only makes it harder for people to
type documents with normal editors, without any gain: Sphinx already
converts the ASCII forms into curly quotes and EN/EM DASH when it
generates the html/pdf docs.
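
As a rough sketch of that distinction, the Python snippet below maps
only the typographic characters back to their ASCII forms and leaves
accented and Greek letters alone. The table ASCII_EQUIVALENTS and the
function to_ascii_punctuation are hypothetical names, and the mapping is
an assumption drawn from the characters discussed here, not the script
used for this series.

```python
# Assumed mapping from typographic characters to ASCII markup.
ASCII_EQUIVALENTS = str.maketrans({
    "\u00a0": " ",    # NO-BREAK SPACE
    "\u2013": "--",   # EN DASH
    "\u2014": "---",  # EM DASH
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
})


def to_ascii_punctuation(text):
    """Replace typographic punctuation; every other character is kept."""
    return text.translate(ASCII_EQUIVALENTS)
```

A blind replacement like this still needs review: an EN DASH that
stands in for a single hyphen in a command line parameter should become
"-", not "--".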


Thanks,
Mauro
