Message-ID: <20241102103937.ose4y72a7yl3dcmz@devuan>
Date: Sat, 2 Nov 2024 11:39:37 +0100
From: Alejandro Colomar <alx@...nel.org>
To: "G. Branden Robinson" <g.branden.robinson@...il.com>
Cc: Ian Rogers <irogers@...gle.com>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
Jonathan Corbet <corbet@....net>, dri-devel@...ts.freedesktop.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-man@...r.kernel.org, cjwatson@...ian.org, groff@....org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the
page
Hi Branden,
On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> [adding Colin Watson to CC; and the groff list because I started musing]
>
> Hi Alex,
>
> At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > > >
> > > > I wouldn't add formatting here for now. That's something I prefer
> > > > to be cautious about, and if we do it, we should do it in a
> > > > separate commit.
> > >
> > > I'll move it to a separate patch. Is the caution due to a lack of
> > > test infrastructure? That could be something to get resolved,
> > > perhaps through Google summer-of-code and the like.
> >
> > That change might be controversial.
>
> Then let those with objections step forward and make them!
Sure! But that in itself (and the length of your mail) is a strong
reason to have this in a separate commit. :)
I'm not opposed to the change. Only cautious.
>
> (I may be one of them; see below.)
>
> > We'd first need to check that all software that reads the NAME section
> > would behave well for this.
>
> Not _all_ software, surely. Anybody can write a craptastic man(7)
> scraper, and several have, mainly back when Web 1.0 was going to eat the
> world. Most of those have withered on the vine.
Ahh, yeah, I committed the same mistake I criticise in others every now
and then. $all does not really mean "all". (-Wall, `make all`, ...)
I meant all [that I care about], which is basically groff(1) and
mandoc(1). :)
> This is the _Linux_ man-pages project, so what matters are (1) man page
> formatters and (2) man page indexers that GNU/Linux systems actually
> use. Where people get nervous with the "NAME" section is because of the
> indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
> hasn't earned the name.
Yup.
>
> Here's a sample input.
>
> $ cat /tmp/proc_pid_fdinfo_mini.5
> .TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
> .SH Name
> .IR /proc/ pid /fdinfo " \- information about file descriptors"
> .SH Description
> Text text text text.
>
> Starting with formatters, let's see how they do.
>
> $ nroff -man /tmp/proc_pid_fdinfo_mini.5
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024‐11‐02 proc_pid_fdinfo_mini(5)
> $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
>
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
>
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
>
> proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> Page 1 (printed 11/2/2024)
>
> Running these to see the correct font style changes is left as an
> exercise for the reader, but they all get the
> "/proc/pid/fdinfo" line right.
>
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program. But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
>
> $ lexgrog /tmp/proc_pid_fdinfo_mini.5
> /tmp/proc_pid_fdinfo_mini.5: parse failed
>
> Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
>
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while. Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
>
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output. (This is
> similar to an m4 feature known as the "black hole diversion".)
Sounds good. And then lexgrog(1) would be a one-liner that calls
groff(1) with the appropriate flag, right?
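The proposed `-d EXTRACT=Name` option does not exist in groff; it is only an idea in this thread. As a rough stand-in, keeping the section of interest and discarding everything else can be approximated on the page source. The following is a minimal sketch, assuming POSIX sh and awk; `extract_section` is a hypothetical helper, not part of groff, man-db, or man-pages. It is a plain source filter, not the diversion-based approach described above, so it misses anything produced by macros, `.so` inclusion, or conditionals, which is the main argument for letting the formatter do the job.

```sh
#!/bin/sh
# extract_section FILE SECTION  (hypothetical helper, source-level only)
# Print the .TH line of a man(7) source file, the ".SH SECTION" heading, and
# every line up to the next .SH.  Everything else is discarded.  Section names
# are compared case-insensitively, so "Name" and "NAME" both match, and
# quoted multi-word headings such as .SH "RETURN VALUE" are handled.
extract_section()
{
	if [ $# -ne 2 ]; then
		echo "Usage: extract_section <file> <section>" >&2
		return 64	# EX_USAGE in <sysexits.h>
	fi

	awk -v want="$2" '
	# Strip the macro name and any double quotes from a .SH line.
	function heading(line) {
		sub(/^\.SH[ \t]*/, "", line)
		gsub(/"/, "", line)
		return line
	}
	$1 == ".TH" { print; next }
	$1 == ".SH" {
		keep = (toupper(heading($0)) == toupper(want))
		if (keep)
			print
		next
	}
	keep { print }
	' "$1"
}
```

Run against the sample page above, `extract_section /tmp/proc_pid_fdinfo_mini.5 NAME` keeps only the `.TH` line and the Name section, and piping that to `nroff -man` formats just that part.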
> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself. It's not clear
> to me why it wasn't done back in the 1980s.
Not enough energy of activation, probably, as with most stuff.
> lexgrog(1) itself will of course have to stay around for years to come,
You can make it a wrapper around groff(1) with flags, no?
> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
>
> I also of course have ideas for generalizing the feature, so that you
> can request any (sub)section by name, and, with a bit more ambition,[4]
> paragraph tags (`TP`) too.
>
> So you could do things like:
>
> nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3
I certainly use this.
# man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
# ...) of all manual pages in a directory (or in a single manual page file).
# Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';
man_section()
{
	if [ $# -lt 2 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
		return $EX_USAGE;
	fi
	local page="$1";
	shift;
	local sects=("$@");	# keep multi-word names like 'SEE ALSO' intact
	# Skip one-line pages (typically .so links) and wc's "total" line.
	find "$page" -type f \
	|xargs wc -l \
	|grep -v -e '\b1 ' -e '\btotal\b' \
	|awk '{ print $2 }' \
	|sort \
	|while read -r manpage; do
		(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
		for s in "${sects[@]}"; do
			<"$manpage" \
			sed -n \
				-e "/^\.SH $s/p" \
				-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
		done;) \
		|mandoc -Tutf8 2>/dev/null \
		|col -pbx;
	done;
}
# man_lsfunc() prints the name of all C functions declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsfunc man2;
man_lsfunc()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi
	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
	|grep '^[0-9]' \
	|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
	|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
	|uniq;
}
# man_lsvar() prints the name of all C variables declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsvar man3;
man_lsvar()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi
	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
	|pcregrep -Mn \
		-e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
		-e '^ +extern [\w ]+ \**[\w ]+; *$' \
	|grep '^[0-9]' \
	|grep -v 'typedef' \
	|sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
	|sed 's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
	|uniq;
}
Even grepc(1) was derived from those scripts.
>
> and:
>
> nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8
While I haven't used this yet, it's probably because it's quite complex
to implement with regexes, not because it wouldn't be useful.
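The `OPTIONS/-b` form shows why regexes alone struggle here. Purely as an illustration, here is a minimal sketch of that kind of lookup, assuming the common layout in which each option is a `.TP` followed by a one-line tag; `extract_tag` is a hypothetical helper, and its idea of a tag's visible text only undoes `\-`, `\&`, argument quotes, and `\f` font escapes. Within the chosen section, an item ends at the next `.TP`, `.PP`, `.P`, `.LP`, `.SS`, or `.SH`, while `.IP` continuation paragraphs are kept.

```sh
#!/bin/sh
# extract_tag FILE SECTION TAG  (hypothetical helper, source-level only)
# Print the .TP item whose tag starts with TAG inside section SECTION.
# Assumes each item is ".TP" followed by a single tag line.
extract_tag()
{
	if [ $# -ne 3 ]; then
		echo "Usage: extract_tag <file> <section> <tag>" >&2
		return 64	# EX_USAGE in <sysexits.h>
	fi

	awk -v sect="$2" -v want="$3" '
	# Approximate the visible text of a tag line.
	function plain(s) {
		sub(/^\.[A-Za-z]+[ \t]*/, "", s)	# macro call such as .B or .BI
		gsub(/\\f\[[A-Za-z0-9]*]/, "", s)	# \f[B] style font escapes
		gsub(/\\f\(../, "", s)			# \f(CW style font escapes
		gsub(/\\f./, "", s)			# \fB style font escapes
		gsub(/\\&/, "", s)			# zero-width \&
		gsub(/\\-/, "-", s)			# \- is a minus sign
		gsub(/"/, "", s)			# argument quoting
		return s
	}
	# True if t starts with want as a whole option, so -b does not match -bar.
	function starts_tag(t,    rest) {
		if (index(t, want) != 1)
			return 0
		rest = substr(t, length(want) + 1)
		return rest == "" || rest !~ /^[A-Za-z0-9_-]/
	}
	$1 == ".SH" {
		h = $0
		sub(/^\.SH[ \t]*/, "", h)
		gsub(/"/, "", h)
		insect = (toupper(h) == toupper(sect))
		printing = wanttag = 0
		next
	}
	!insect { next }
	$1 == ".TP" || $1 == ".PP" || $1 == ".P" || $1 == ".LP" || $1 == ".SS" {
		printing = 0
		if ($1 == ".TP") {
			pendingtp = $0
			wanttag = 1
		}
		next
	}
	wanttag {
		wanttag = 0
		if (starts_tag(plain($0))) {
			print pendingtp
			print
			printing = 1
		}
		next
	}
	printing { print }
	' "$1"
}
```

Something like `extract_tag man8/zic.8 OPTIONS -b` mirrors the lookup quoted above, but whether it succeeds depends on how that page writes its tag lines.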
>
> ...does this sound appetizing to anyone?
Certainly.
> > Also, many other pages might need to be changed accordingly for
> > consistency.
>
> I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
> has groff(1) do the lifting. I'm sorry for prompting churn, Ian.
>
> > No, this isn't outdated, since that reduces the quality of the diff.
> > Also, I review a lot of patches in the mail client, without running
> > git(1). And it's not just for reviewing diffs, but also for writing
> > them. Semantic newlines reduce the amount of work for producing the
> > diffs.
>
> It's a real win for diffs.
And diffs are a real win for text. Thus, semantic newlines are a real
win for text. "Write poems, not prose." (Any chance we may get that
warning added to groff(1)? :D)
Cheers,
Alex
>
> Here's a very recent example from groff.
>
> diff --git a/man/groff.7.man b/man/groff.7.man
> index 1fb635f2b..1d248b237 100644
> --- a/man/groff.7.man
> +++ b/man/groff.7.man
> @@ -1281,6 +1281,7 @@ .SH Identifiers
> typeface,
> color,
> special character or character class,
> +hyphenation language code,
> environment,
> or stream.
> .
>
>
> (So recent that in fact I haven't pushed that yet.)
>
> Lists like the foregoing are common in man pages.
>
> Regards,
> Branden
>
> [1] https://man7.org/linux/man-pages/dir_by_project.html#groff
> [2] String definitions, "string comparisons"[3], and diversions.
> [3] strictly, "formatted output comparisons"
>
> https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
>
> You can do stricter string comparisons in GNU troff. And I've
> thought of some syntactic sugar for performing them that wouldn't
> break backward compatibility.
>
> [4] To really land the feature, we need automatic tag generation from
> input text (we don't want to make the man page author construct
> their own tags). Another reason we want the construction to be
> automatic is to make the tags unique when multiple man pages are
> formatted in one run, as one might do when making a book of man
> pages. Automatic tagging will also enable the slaying of two other
> ancient dragons.
>
> 1. deep internal links for PDF bookmarks
> 2. pod2man's `IX`-happy output; the widespread use of this
> nonstandard macro confuses way too many novice page authors, and
> bloats document size.
>
> Another feature we'll really want to do this right is improved string
> processing facilities. That, too, is something that will pay
> dividends in several areas. With a proper string iterator in the
> formatter (and a couple more conditional operators),[5] it will be
> possible to write a string library as a macro file, slimming down the
> formatter itself a little and making macro writers' lives easier.
> We're only two days into the month and this has already come up on
> the groff list.
>
> https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
>
> [5] https://savannah.gnu.org/bugs/?62264
--
<https://www.alejandro-colomar.es/>