Message-ID: <ZybX9q_zReTgdMxU@riva.ucam.org>
Date: Sun, 3 Nov 2024 01:55:02 +0000
From: Colin Watson <cjwatson@...ian.org>
To: "G. Branden Robinson" <g.branden.robinson@...il.com>
Cc: Alejandro Colomar <alx@...nel.org>, Ian Rogers <irogers@...gle.com>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	Jonathan Corbet <corbet@....net>, dri-devel@...ts.freedesktop.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-man@...r.kernel.org, groff@....org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the
 page

On Sat, Nov 02, 2024 at 07:50:23PM -0500, G. Branden Robinson wrote:
> At 2024-11-02T19:06:53+0000, Colin Watson wrote:
> > How embarrassing.  Could somebody please file a bug on
> > https://gitlab.com/man-db/man-db/-/issues to remind me to fix that?
> 
> Done; <https://gitlab.com/man-db/man-db/-/issues/46>.

Thanks, working on it.

> > I already know that getting acceptable performance for
> > this requires care, as illustrated by one of the NEWS entries for
> > man-db 2.10.0:
> > 
> >  * Significantly improve `mandb(8)` and `man -K` performance in the
> >    common case where pages are of moderate size and compressed using
> >    `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test
> >    system.
> > 
> > ... so I'm prepared to bet that forking nroff one page at a time will
> > be unacceptably slow.
> 
> Probably, but there is little reason to run nroff that way (as of groff
> 1.23).  It already works well, but I have ideas for further hardening
> groff's man(7) and mdoc(7) packages such that they return to a
> well-defined state when changing input documents.

Being able to keep track of which output goes with which input pages is
critical to the indexer, though (as you acknowledge later in your
reply).  It can't just throw the whole lot at nroff and call it a day.

One other thing: mandb/lexgrog also looks for preprocessing filter hints
in pages (`'\" te` and the like).  This is obscure, to be sure, but
either a replacement would need to do the same thing or we'd need to be
certain that it's no longer required.
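
To illustrate the kind of hint detection meant here, a minimal sketch
follows.  It is not man-db's actual code: the function name
`preprocessor_hints` is hypothetical, and the letter-to-filter table is
an assumption based on the convention described in man(1) (a first line
of `'\" ` followed by letters such as `t` for tbl and `e` for eqn).

```python
# Hypothetical helper for illustration; not man-db's implementation.
# Assumed mapping of hint letters to preprocessors, per the man(1) convention.
FILTERS = {
    "e": "eqn",
    "g": "grap",
    "p": "pic",
    "r": "refer",
    "t": "tbl",
    "v": "vgrind",
}

# The hint prefix: apostrophe, backslash, double quote, space.
HINT_PREFIX = "'\\\" "


def preprocessor_hints(page_text):
    # Only the first line of the page source is examined.
    first_line = page_text.split("\n", 1)[0]
    if not first_line.startswith(HINT_PREFIX):
        return []
    letters = first_line[len(HINT_PREFIX):].strip()
    # Unknown letters are ignored rather than treated as errors.
    return [FILTERS[c] for c in letters if c in FILTERS]
```

Any replacement for the current indexing path would need something
equivalent to this step before formatting, unless the hints can be shown
to be unnecessary.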

> > and of course care would be needed around error handling and so on.
> 
> I need to give this thought, too.  What sorts of error scenarios do you
> foresee?  GNU troff itself, if it can't open a file to be formatted,
> reports an error diagnostic and continues to the next `argv` string
> until it reaches the end of input.

That might be sufficient, or man-db might need to be able to detect
which pages had errors.  I'm not currently sure.

> > but on the other hand this starts to feel like a much less natural fit
> > for the way nroff is run in every other situation, where you're
> > processing one document at a time.
> 
> This I disagree with.  Or perhaps more precisely, it's another example
> of the exception (man(1)) swallowing the rule (nroff/troff).  nroff and
> troff were written as Unix filters; they read the standard input stream
> (and/or argument list)[1], do some processing, and write to standard
> output.[2]
> 
> Historically, troff (or one of its preprocessors) was commonly used with
> multiple input files to catenate them.

But this application is not conceptually like catenation (even if it
might be possible to implement it that way).  The collection of all
manual pages on a system is not like one long document that happens to
be split over multiple files, certainly not from an indexer's point of
view.

-- 
Colin Watson (he/him)                              [cjwatson@...ian.org]
