Message-ID: <20241103005023.kdv5bkpqkpmsom5g@illithid>
Date: Sat, 2 Nov 2024 19:50:23 -0500
From: "G. Branden Robinson" <g.branden.robinson@...il.com>
To: Alejandro Colomar <alx@...nel.org>, Ian Rogers <irogers@...gle.com>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
Jonathan Corbet <corbet@....net>, dri-devel@...ts.freedesktop.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-man@...r.kernel.org, groff@....org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
Hi Colin,
At 2024-11-02T19:06:53+0000, Colin Watson wrote:
> How embarrassing. Could somebody please file a bug on
> https://gitlab.com/man-db/man-db/-/issues to remind me to fix that?
Done; <https://gitlab.com/man-db/man-db/-/issues/46>.
> lexgrog(1) is a useful (if oddly-named, sorry) debugging tool, but if
> you focus on that then you'll end up with a design that's not very
> useful. What really matters is indexing the whole system's manual
> pages, and mandb(8) does not do that by invoking lexgrog(1) one page
> at a time, but rather by running more or less the same code
> in-process.
Ah, I see it now--"lexgrog.l" is in both the Automake macros
"lexgrog_SOURCES" and "mandb_SOURCES". Nice and DRY!
> I already know that getting acceptable performance for
> this requires care, as illustrated by one of the NEWS entries for
> man-db 2.10.0:
>
> * Significantly improve `mandb(8)` and `man -K` performance in the
> common case where pages are of moderate size and compressed using
> `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test
> system.
>
> ... so I'm prepared to bet that forking nroff one page at a time will
> be unacceptably slow.
Probably, but there is little reason to run nroff that way (as of groff
1.23). Formatting many pages in a single run already works well, but I
have ideas for further hardening groff's man(7) and mdoc(7) packages
such that they return to a well-defined state when changing input
documents.
> (This also combines with the fact that man-db applies some sandboxing
> when it's calling nroff just in case it might happen that a
> moderately-sized C++ project has less than 100% perfect security when
> doing text processing, which I'm sure everyone agrees would never
> happen.)
Inconceivable, yes! But fortunately you can run nroff over N documents
and pay the startup overhead of both nroff itself and its sandbox only
once.
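A minimal Python sketch of the difference, assuming an `nroff` that accepts `-man` and several file operands (as groff's does). The page names are placeholders, and `format_batch` is a hypothetical helper, not part of man-db.

```python
import subprocess
from typing import List, Sequence


def per_page_commands(pages: Sequence[str]) -> List[List[str]]:
    """One nroff process (and one sandbox setup) for every page."""
    return [["nroff", "-man", page] for page in pages]


def batched_command(pages: Sequence[str]) -> List[str]:
    """A single nroff process for all pages: startup is paid once."""
    return ["nroff", "-man", *pages]


def format_batch(pages: Sequence[str]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run the batched command and capture its output."""
    return subprocess.run(
        batched_command(pages), capture_output=True, text=True, check=False
    )
```

For three pages, `per_page_commands` implies three process launches where `batched_command` implies one. The batched run, however, yields a single output stream, which is where associating output with source files becomes the open question.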
> If it were possible to run nroff over a whole batch of pages and get
> output for each of them in one go, then maaaaybe.
That's already true for formatting entire pages. It's how this file was
created:
https://www.gnu.org/software/groff/manual/groff-man-pages.utf8.txt
(...best viewed with "less -R")
With the `-d EXTRACT` feature I have in mind, in its
as-simple-as-possible first-cut form, the problem you anticipate...
> man-db would need a reliable way to associate each line (or sometimes
> multiple lines) of output with each source file,
...would remain. I'll have to think of a good way to write out
"metadata" (the input file name and the arguments to the `TH` request)
as each page is encountered, and of an interface to enable that. I
don't see it happening before groff 1.25.
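To make the association problem concrete, here is a Python sketch of how a consumer such as man-db might split one batched output stream, assuming the formatter emitted a metadata line before each page. The `##PAGE` sentinel, its tab-separated layout (source file name, then the `TH` arguments), and every name below are hypothetical; groff has no such interface today.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sentinel; no groff version emits anything like this.
MARKER = "##PAGE\t"


@dataclass
class Page:
    source: str                      # input file name from the marker
    th_args: List[str]               # arguments given to the TH request
    lines: List[str] = field(default_factory=list)


def split_batched_output(text: str) -> List[Page]:
    """Group formatted output lines under the most recent page marker."""
    pages: List[Page] = []
    for line in text.splitlines():
        if line.startswith(MARKER):
            source, *th_args = line[len(MARKER):].split("\t")
            pages.append(Page(source, th_args))
        elif pages:
            pages[-1].lines.append(line)
        else:
            raise ValueError("formatted output appeared before any page marker")
    return pages
```

A real design would need a marker that formatted page text cannot forge, which is part of the interface question raised above.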
> and of course care would be needed around error handling and so on.
I need to give this thought, too. What sorts of error scenarios do you
foresee? GNU troff itself, if it can't open a file to be formatted,
reports an error diagnostic and continues to the next `argv` string
until it reaches the end of input.
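The continue-past-failures behavior described above can be sketched in Python as follows. This only mimics the described control flow; it is not GNU troff's code, the diagnostic wording and `PROG` name are invented, and copying lines stands in for actual formatting. Returning a failure count is one possible way to let a caller decide afterwards which pages to treat as broken.

```python
import sys
from typing import Iterable, TextIO

PROG = "fmt-sketch"  # hypothetical program name used in diagnostics


def process_all(paths: Iterable[str], out: TextIO, err: TextIO = sys.stderr) -> int:
    """Process each operand in order; diagnose unreadable files and carry on.

    Returns the number of operands that could not be processed.
    """
    failures = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    out.write(line)  # stand-in for real formatting work
        except OSError as exc:
            err.write(f"{PROG}: error: '{path}': {exc.strerror}\n")
            failures += 1
    return failures
```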
> I can see the appeal, in terms of processing the actual language
> rather than a pile of hacks that try to guess what to do with it
...a major selling point, IMO...
> but on the other hand this starts to feel like a much less natural fit
> for the way nroff is run in every other situation, where you're
> processing one document at a time.
This I disagree with. Or perhaps more precisely, it's another example
of the exception (man(1)) swallowing the rule (nroff/troff). nroff and
troff were written as Unix filters; they read the standard input stream
(and/or argument list)[1], do some processing, and write to standard
output.[2]
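The filter convention for choosing inputs fits in a few lines of Python. This sketch assumes the common Unix rules: with no file operands the program reads standard input, an operand of `-` names standard input, and a Seventh Edition style `-i` flag (footnote [1]) appends standard input after the named files. The function name is hypothetical.

```python
from typing import List, Sequence

STDIN = "-"  # conventional operand meaning "standard input"


def input_sources(operands: Sequence[str], read_stdin_after: bool = False) -> List[str]:
    """Return the inputs a filter would read, in order."""
    if not operands:
        # A filter given no file operands reads standard input.
        return [STDIN]
    sources = list(operands)
    if read_stdin_after:
        # Like troff's -i: read standard input once the files are exhausted.
        sources.append(STDIN)
    return sources
```

Reading each source in turn and writing everything to one output stream is also all that catenating multiple input files amounts to.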
Historically, troff (or one of its preprocessors) was commonly used with
multiple input files to catenate them.
Here's an example of this practice from 1980.
https://minnie.tuhs.org/cgi-bin/utree.pl?file=3BSD/usr/doc/pascal/makefile
Regards,
Branden
[1] ...including this option from Seventh Edition Unix (1979) or
earlier, which survives in GNU troff to this day.
    -i  Read standard input after the input files are exhausted.
[2] Seventh Edition troff didn't write to stdout by default; instead it
tried to open the typesetter device. It did, however, have an option to
write to standard output.
    -t  Direct output to the standard output instead of the
        phototypesetter.
Running old-school Unix under emulation these days, you _have_ to use
this option to avoid the dreaded "Typesetter busy." diagnostic.
When Kernighan refactored troff for device-independence, he
reseated it more squarely in the Unix filter tradition by writing
its plain-text page description language to stdout. The output
driver, such as "dpost" for PostScript, also read its standard input,
and could thus become just one more stage in a pipeline. [CSTR #97]
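The pipeline arrangement is easy to reproduce from a program. Below is a Python sketch that chains stages the way a shell does for `cmd1 | cmd2`, each stage reading the previous one's standard output. With groff, for example, the stages could be `["troff", "-Tps"]` fed the document on standard input, followed by `["grops"]`. `run_pipeline` itself is a hypothetical helper, and its error handling is deliberately minimal.

```python
import subprocess
import threading
from typing import List, Sequence


def run_pipeline(stages: Sequence[Sequence[str]], data: bytes) -> bytes:
    """Run stages as `stage1 | stage2 | ...`, feeding `data` to stage 1.

    Returns the last stage's standard output; raises CalledProcessError
    if any stage exits unsuccessfully.
    """
    if not stages:
        return data
    procs: List[subprocess.Popen] = []
    for argv in stages:
        upstream = procs[-1].stdout if procs else subprocess.PIPE
        proc = subprocess.Popen(list(argv), stdin=upstream, stdout=subprocess.PIPE)
        if procs:
            # The child now holds the read end; drop the parent's copy so the
            # upstream stage gets SIGPIPE if this stage exits early.
            procs[-1].stdout.close()
        procs.append(proc)

    def feed() -> None:
        # Write from a separate thread so a full pipe cannot deadlock us
        # before we start reading the final stage's output.
        try:
            procs[0].stdin.write(data)
        except BrokenPipeError:
            pass  # first stage stopped reading; its exit status is checked below
        finally:
            try:
                procs[0].stdin.close()
            except BrokenPipeError:
                pass

    feeder = threading.Thread(target=feed)
    feeder.start()
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    feeder.join()
    for proc in procs:
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return output
```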