Message-ID: <20241102100837.anfonowxfx4ekn3d@illithid>
Date: Sat, 2 Nov 2024 05:08:37 -0500
From: "G. Branden Robinson" <g.branden.robinson@...il.com>
To: Alejandro Colomar <alx@...nel.org>
Cc: Ian Rogers <irogers@...gle.com>, David Airlie <airlied@...il.com>,
	Simona Vetter <simona@...ll.ch>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	Jonathan Corbet <corbet@....net>, dri-devel@...ts.freedesktop.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-man@...r.kernel.org, cjwatson@...ian.org, groff@....org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the
 page

[adding Colin Watson to CC; and the groff list because I started musing]

Hi Alex,

At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > >
> > > I wouldn't add formatting here for now.  That's something I prefer
> > > to be cautious about, and if we do it, we should do it in a
> > > separate commit.
> > 
> > I'll move it to a separate patch. Is the caution due to a lack of
> > test infrastructure? That could be something to get resolved,
> > perhaps through Google summer-of-code and the like.
> 
> That change might be controversial.

Then let those with objections step forward and make them!

(I may be one of them; see below.)

> We'd first need to check that all software that reads the NAME section
> would behave well for this.

Not _all_ software, surely.  Anybody can write a craptastic man(7)
scraper, and several have, mainly back when Web 1.0 was going to eat the
world.  Most of those have withered on the vine.

This is the _Linux_ man-pages project, so what matters are (1) man page
formatters and (2) man page indexers that GNU/Linux systems actually
use.  People get nervous about the "NAME" section because of the
indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
hasn't earned the name.

Here's a sample input.

$ cat /tmp/proc_pid_fdinfo_mini.5
.TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
.SH Name
.IR /proc/ pid /fdinfo " \- information about file descriptors"
.SH Description
Text text text text.

Starting with formatters, let's see how they do.

$ nroff -man /tmp/proc_pid_fdinfo_mini.5
proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)

Name
       /proc/pid/fdinfo - information about file descriptors

Description
       Text text text text.

example                           2024‐11‐02           proc_pid_fdinfo_mini(5)
$ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)

Name
       /proc/pid/fdinfo - information about file descriptors

Description
       Text text text text.

example                           2024-11-02           proc_pid_fdinfo_mini(5)
$ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)



Name
       /proc/pid/fdinfo - information about file descriptors

Description
       Text text text text.



example                           2024-11-02           proc_pid_fdinfo_mini(5)
$ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul

       proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)

       Name
            /proc/pid/fdinfo - information about file descriptors

       Description
            Text text text text.

       Page 1                                        (printed 11/2/2024)

I leave running these commands to observe the correct font style
changes as an exercise for the reader, but they all get the
"/proc/pid/fdinfo" line right.

On GNU/Linux systems, the only man page indexer I know of is Colin
Watson's man-db--specifically, its mandb(8) program.  But it's nicely
designed so that the "topic and summary description extraction" task is
delegated to a standalone tool, lexgrog(1), and we can use that.

$ lexgrog /tmp/proc_pid_fdinfo_mini.5
/tmp/proc_pid_fdinfo_mini.5: parse failed

Oh, damn.  I wasn't expecting that.  Maybe this is what defeats Michael
Kerrisk's scraper with respect to groff's man pages.[1]

Well, I can find a silver lining here, because it gives me an even
better reason than I had to pitch an idea I've been kicking around for a
while.  Why not enhance groff man(7) to support a mode where _it_ will
spit out the "Name"/"NAME" section, and only that, _for_ you?

This would be as easy as checking for an option, say '-d EXTRACT=Name',
and having the package's "TH" and "SH" macro definitions divert
(literally, with the `di` request) everything _except_ the section of
interest to a diversion that is then never called/output.  (This is
similar to an m4 feature known as the "black hole diversion".)
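
To make the control flow concrete, here is a rough Python sketch of the
"black hole" idea.  It is only an analogy: it toggles on `.SH` lines in
the page's _source_, whereas the proposal above would have the man(7)
macros do the toggling inside the formatter, so that what comes out is
already formatted.  The name `extract_section` and the two in-memory
buffers are illustrative assumptions; nothing like this exists in groff.

```python
import io


def extract_section(source: str, wanted: str) -> str:
    """Return the source lines of the section titled `wanted`.

    Hypothetical helper: matches section titles case-insensitively and
    only understands top-level `.SH` headings.
    """
    keep = io.StringIO()  # the one "diversion" we eventually output
    hole = io.StringIO()  # the "black hole": written to, never read
    current = hole        # everything before the first .SH is discarded
    for line in source.splitlines(keepends=True):
        if line.startswith(".SH"):
            title = line[3:].strip().strip('"')
            current = keep if title.lower() == wanted.lower() else hole
            continue      # the heading itself is not part of the body
        current.write(line)
    return keep.getvalue()


# The sample page from earlier in this message.
page = r""".TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
.SH Name
.IR /proc/ pid /fdinfo " \- information about file descriptors"
.SH Description
Text text text text.
"""

print(extract_section(page, "name"), end="")
```

Note that what this returns for "name" is still the raw `IR` call, not
the summary text.  That is the gap a source-level scraper has to close
by parsing *roff itself, and the reason to let the formatter do the
extraction instead.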

All of the features necessary to implement this[2] were part of troff as
far back as the birth of the man(7) package itself.  It's not clear to
me why it wasn't done back in the 1980s.

lexgrog(1) itself will of course have to stay around for years to come,
but this could take a significant distraction off of Colin's plate--I
believe I have seen him grumble about how much *roff syntax he has to
parse to have the feature be workable, and that's without upstart groff
maintainers exploring up to every boundary that existed even in 1979 and
cheerfully exercising their findings in man pages.

I also of course have ideas for generalizing the feature, so that you
can request any (sub)section by name, and, with a bit more ambition,[4]
paragraph tags (`TP`) too.

So you could do things like:

nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3

and:

nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8

...does this sound appetizing to anyone?

> Also, many other pages might need to be changed accordingly for
> consistency.

I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
has groff(1) do the lifting.  I'm sorry for prompting churn, Ian.

> No, this isn't outdated, since that reduces the quality of the diff.
> Also, I review a lot of patches in the mail client, without running
> git(1).  And it's not just for reviewing diffs, but also for writing
> them.  Semantic newlines reduce the amount of work for producing the
> diffs.

It's a real win for diffs.

Here's a very recent example from groff.

diff --git a/man/groff.7.man b/man/groff.7.man
index 1fb635f2b..1d248b237 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -1281,6 +1281,7 @@ .SH Identifiers
 typeface,
 color,
 special character or character class,
+hyphenation language code,
 environment,
 or stream.
 .


(So recent that in fact I haven't pushed that yet.)

Lists like the foregoing are common in man pages.
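
For anyone who wants to see the effect in isolation, the following
Python sketch compares the two layouts using the list from the diff
above.  It is an illustration only: `changed_lines` is a made-up helper,
and the 40-column fill width is an arbitrary assumption standing in for
whatever an editor's paragraph refill would do.

```python
import difflib
import textwrap

# One item per line, as in the groff.7 source above.
items = [
    "typeface,",
    "color,",
    "special character or character class,",
    "environment,",
    "or stream.",
]
new_items = items[:3] + ["hyphenation language code,"] + items[3:]


def changed_lines(old, new):
    """Count the added and removed lines in a unified diff of two line lists."""
    diff = difflib.unified_diff(old, new, lineterm="", n=0)
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# Semantic newlines: inserting an item adds exactly one line.
semantic = changed_lines(items, new_items)

# Filled paragraph: the same insertion makes the text rewrap.
filled_old = textwrap.wrap(" ".join(items), width=40)
filled_new = textwrap.wrap(" ".join(new_items), width=40)
filled = changed_lines(filled_old, filled_new)

print("semantic newlines:", semantic, "changed line(s)")
print("filled paragraph: ", filled, "changed line(s)")
```

With semantic newlines the diff is the single added line; with the
filled paragraph the insertion pushes words onto following lines, so the
diff touches lines whose content did not really change.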

Regards,
Branden

[1] https://man7.org/linux/man-pages/dir_by_project.html#groff
[2] String definitions, "string comparisons"[3], and diversions.
[3] strictly, "formatted output comparisons"

    https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html

    You can do stricter string comparisons in GNU troff.  And I've
    thought of some syntactic sugar for performing them that wouldn't
    break backward compatibility.

[4] To really land the feature, we need automatic tag generation from
    input text (we don't want to make the man page author construct
    their own tags).  Another reason we want the construction to be
    automatic is to make the tags unique when multiple man pages are
    formatted in one run, as one might do when making a book of man
    pages.  Automatic tagging will also enable the slaying of two other
    ancient dragons.

    1.  deep internal links for PDF bookmarks
    2.  pod2man's `IX`-happy output; the widespread use of this
        nonstandard macro confuses way too many novice page authors, and
        bloats document size.

    Another feature we'll really want to do this right is improved string
    processing facilities.  That, too, is something that will pay
    dividends in several areas.  With a proper string iterator in the
    formatter (and a couple more conditional operators),[5] it will be
    possible to write a string library as a macro file, slimming down the
    formatter itself a little and making macro writers' lives easier.
    We're only two days into the month and this has already come up on
    the groff list.

    https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html

[5] https://savannah.gnu.org/bugs/?62264
