linux-kernel - Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars


Message-ID: <20210512102901.3de0fdb7@coco.lan>
Date:   Wed, 12 May 2021 10:29:01 +0200
From:   Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To:     David Gow <davidgow@...gle.com>
Cc:     Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        Jonathan Corbet <corbet@....net>,
        Daniel Latypov <dlatypov@...gle.com>,
        Marco Elver <elver@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid
 using UTF-8 chars

On Tue, 11 May 2021 07:35:29 +0800
David Gow <davidgow@...gle.com> wrote:

> On Mon, May 10, 2021 at 6:27 PM Mauro Carvalho Chehab
> <mchehab+huawei@...nel.org> wrote:
> >
> > While UTF-8 characters can be used at the Linux documentation,
> > the best is to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> >         - U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
> > ---  
> 
> Oh dear, I do have a habit of overusing em-dashes. I've no problem in
> theory with exchanging them for an ASCII approximation.
> I suppose there's a reason it's the one dash to rule them all: :-)
> https://twitter.com/FakeUnicode/status/727888721312260096/photo/1
> 
> >  Documentation/dev-tools/testing-overview.rst | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> > index b5b46709969c..8adffc26a2ec 100644
> > --- a/Documentation/dev-tools/testing-overview.rst
> > +++ b/Documentation/dev-tools/testing-overview.rst
> > @@ -18,8 +18,8 @@ frameworks. These both provide infrastructure to help make running tests and
> >  groups of tests easier, as well as providing helpers to aid in writing new
> >  tests.
> >
> > -If you're looking to verify the behaviour of the Kernel — particularly specific
> > -parts of the kernel — then you'll want to use KUnit or kselftest.
> > +If you're looking to verify the behaviour of the Kernel - particularly specific
> > +parts of the kernel - then you'll want to use KUnit or kselftest.  
> 
> As Marco pointed out, having multiple HYPHEN-MINUS symbols in a row is
> probably a better replacement, as it does distinguish the em-dash from
> smaller dashes better.
> However, I need three for sphinx to output an em-dash here (2 hyphens
> only gives me an en-dash).
> 
> So, if we want to get rid of the UTF-8 em-dash, my preferences would
> be (in descending order):
> 1. Three hyphens: '---' (sphinx generates an em-dash)
> 2. Two hyphens: '--' (worst case, an en-dash surrounded by spaces --
> as sphinx generates for me -- is still readable, and it's still
> readable as an em-dash in plain text)
> 3. One hyphen as in this patch (which I don't like as much, but will
> no doubt learn to live with)
> 
> But it looks like you've got several similar comments on other patches
> in this series, so I'm happy for you to use whatever ends up being
> agreed upon generally.

Yeah, from the comments I have received so far, it seems that most
developers want to use '---' for EM DASH and '--' for EN DASH, typing
them as ASCII instead of entering the U+<number> characters, as this is
easier in most editors.

Yet, my understanding is that we don't have a consensus in that
regard, as some patches I sent using a single hyphen were
accepted/reviewed/acked.

So, I sent a small patch series (/5), which has already been applied,
fixing the cases where UTF-8 chars (including DASH) were added
by mistake (probably due to some conversion tool).

For the remaining issues, my plan is to split this series into two
parts: the first one with the uncontroversial UTF-8 changes, and a
second one with just the EM/EN DASH changes, using '---' to replace
EM DASH and '--' to replace EN DASH. This way, the produced
HTML/LaTeX/PDF docs won't change.

This should make it easier to discuss the EM/EN DASH changes on
each patch, and to see whether the above default is the best fit for a
particular use case.
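
As a rough illustration of the replacement rule proposed above (this is
only a sketch, not the script used to produce the series), the mapping
could be expressed as below. The helper name asciify_dashes() and the
table DASH_REPLACEMENTS are hypothetical, made up for this example. It
relies on the behaviour reported earlier in the thread, where sphinx
renders '---' as an em-dash and '--' as an en-dash.

```python
# Hypothetical helper, for illustration only: it is not part of the
# kernel tree nor of this patch series.
DASH_REPLACEMENTS = {
    "\u2014": "---",  # EM DASH -> three hyphens
    "\u2013": "--",   # EN DASH -> two hyphens
}


def asciify_dashes(text: str) -> str:
    """Replace UTF-8 em/en dashes with their ASCII spellings."""
    for char, replacement in DASH_REPLACEMENTS.items():
        text = text.replace(char, replacement)
    return text


line = "verify the behaviour of the Kernel \u2014 particularly specific"
print(asciify_dashes(line))
```

A plain single hyphen is left untouched, so text that already uses
ASCII is not modified by such a pass.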

Thanks,
Mauro