linux-kernel - Re: [PATCH v3] docs: Use make invocation's -j argument for parallelism


Message-ID: <b95ded2e-474a-5f7b-af07-30732e8cdb41@rasmusvillemoes.dk>
Date:   Sun, 6 Oct 2019 21:33:05 +0200
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Kees Cook <keescook@...omium.org>
Cc:     Jonathan Corbet <corbet@....net>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] docs: Use make invocation's -j argument for
 parallelism

On 04/10/2019 18.08, Kees Cook wrote:
> On Fri, Oct 04, 2019 at 11:15:46AM +0200, Rasmus Villemoes wrote:
>> On 25/09/2019 01.29, Kees Cook wrote:
>>> +# Extract and prepare jobserver file descriptors from environment.
>>
>> Ah, reading more carefully you set O_NONBLOCK explicitly. Well, older
>> Makes are going to be very unhappy about that (remember that it's a
>> property of the file description and not file descriptor). They don't
>> expect EAGAIN when fetching a token, so fail hard. Probably not when
>> htmldocs is the only target, because in that case the toplevel Make just
>> reads back the exact number of tokens it put in as a sanity check, but
>> if it builds other targets/spawns other submakes, I think this breaks.
> 
> Do you mean the processes sharing the file will suddenly gain
> O_NONBLOCK? I didn't think that was true, 

It is. Quoting man fcntl

   File status flags
       Each open file description has certain associated status flags,
       initialized by open(2) and possibly modified by fcntl().
       Duplicated file descriptors (made with dup(2), fcntl(F_DUPFD),
       fork(2), etc.) refer to the same open file description, and thus
       share the same file status flags.

   ...
       On Linux, this command can change only the O_APPEND, O_ASYNC,
       O_DIRECT, O_NOATIME, and O_NONBLOCK flags.
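
The sharing described in the man page excerpt is easy to observe from
Python. This is only a small illustration: the function name
dup_shares_nonblock is made up here, and an anonymous pipe stands in for
the jobserver pipe.

```python
import os


def dup_shares_nonblock():
    """Show that O_NONBLOCK set through one descriptor is visible on its dup.

    Hypothetical helper for illustration only; the pipe stands in for the
    jobserver pipe that Make hands to its children.
    """
    r, w = os.pipe()
    r2 = os.dup(r)  # second descriptor, same open file description
    try:
        before = os.get_blocking(r2)
        os.set_blocking(r, False)  # sets O_NONBLOCK via the *other* descriptor
        after = os.get_blocking(r2)
        return before, after
    finally:
        for fd in (r, r2, w):
            os.close(fd)


if __name__ == "__main__":
    print(dup_shares_nonblock())
```

The same holds for descriptors inherited across fork(2), which is how the
submakes and the docs script end up sharing one description.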

> we could easily just restore the state before exit.

That doesn't really help - and I'm a bit surprised you'd even suggest
that. I don't know if open(/proc/self/fd/...) would give you a new
struct file.
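
One way to find out on a given system is to try it. The sketch below is
an experiment, not a statement about what every kernel does: it assumes
a Linux-style /proc/self/fd, uses an anonymous pipe in place of the
jobserver pipe, and the function name is invented for this example.

```python
import os


def reopen_keeps_original_blocking():
    """Reopen a pipe's read end through /proc and set O_NONBLOCK on the copy.

    Returns True if the original descriptor stays blocking while the
    reopened one is non-blocking, False otherwise, or None when
    /proc/self/fd is not available. Hypothetical helper for illustration.
    """
    r, w = os.pipe()
    r2 = None
    try:
        path = "/proc/self/fd/%d" % r
        if not os.path.exists(path):
            return None
        # A writer (w) is still open, so this open does not block.
        r2 = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        return os.get_blocking(r) and not os.get_blocking(r2)
    finally:
        for fd in (r, w, r2):
            if fd is not None:
                os.close(fd)


if __name__ == "__main__":
    print(reopen_keeps_original_blocking())
```

If this reports True, the reopened descriptor has its own status flags
and could be made non-blocking without affecting the other users of the
pipe.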

>>> +# Return all the reserved slots.
>>> +os.write(writer, jobs)
>>
>> Well, that probably works ok for the isolated case of a toplevel "make
>> -j12 htmldocs", but if you're building other targets ("make -j12
>> htmldocs vmlinux") this will effectively inject however many tokens the
>> above loop grabbed (which might not be all if the top-level make has
>> started things related to the vmlinux target), so for the duration of
>> the docs build, there will be more processes running than asked for.
> 
> That is true, yes, though I still think it's an improvement over the
> existing case of sphinx-build getting run with -jauto which lands us in
> the same (or worse) position.

Yes, I agree that that's not ideal either. And probably it's not a big
problem in practice (I don't think a lot of people build the docs, let
alone do it while also building the kernel), but it might be rather
surprising and somewhat hard to "debug" to suddenly have a load twice
what one expected.

> The best solution would be to teach sphinx-build about the Make
> jobserver, though I expect that would be weird. Another idea would be to
> hold the reservation until sphinx-build finishes and THEN return the
> slots? That would likely need to change from a utility to a sphinx-build
> wrapper...

Yes, a more general solution would be some kind of generic wrapper that
would hog however many tokens it could get hold of and run a given
command with a commandline slightly modified to hand over those tokens -
then wait for that process to exit and give back the tokens. That would
work for any command that knows about parallelism but doesn't support
the make jobserver model. (I'd probably implement that by creating a
pipe, calling fork(), and having the parent exec into the real command,
while the child simply blocks in a read on the pipe waiting for EOF and
then writes back the tokens. That avoids the "we have to report
exit/killed-by-signal status to the parent" problem.)
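
A rough sketch of such a wrapper, to make the idea concrete. Everything
named here (grab_tokens, hog_and_exec, the argv_for callback) is
hypothetical, and it assumes the jobserver read and write descriptors
have already been parsed out of MAKEFLAGS. It polls with select() rather
than setting O_NONBLOCK, so the shared file description is left alone;
the price is a small race where the read can block if another process
takes the token first.

```python
import os
import select
import sys


def grab_tokens(reader):
    """Take the tokens that are available right now, one byte each."""
    tokens = b""
    while True:
        # Zero-timeout poll; does not touch the pipe's status flags.
        ready, _, _ = select.select([reader], [], [], 0)
        if not ready:
            break
        try:
            chunk = os.read(reader, 1)
        except BlockingIOError:
            break  # someone else already made the pipe non-blocking
        if not chunk:
            break
        tokens += chunk
    return tokens


def hog_and_exec(reader, writer, argv_for):
    """Grab tokens, exec the real command, return the tokens when it ends.

    argv_for(n) must build the command line for n parallel jobs.
    This function does not return on success.
    """
    tokens = grab_tokens(reader)
    hold_r, hold_w = os.pipe()
    if os.fork() == 0:
        # Child: keep the tokens until every copy of hold_w is closed,
        # i.e. until the exec'ed command has exited or been killed.
        os.close(hold_w)
        while os.read(hold_r, 1):
            pass
        os.write(writer, tokens)
        os._exit(0)
    # Parent: becomes the real command, so Make sees its exit status
    # directly and nothing has to be relayed.
    os.close(hold_r)
    os.set_inheritable(hold_w, True)  # must survive the exec
    # +1 for the job slot this process already holds implicitly.
    argv = argv_for(len(tokens) + 1)
    os.execvp(argv[0], argv)
```

For the docs build, argv_for could be something like
lambda n: ["sphinx-build", "-j%d" % n] + sys.argv[1:], again only as an
illustration of the shape.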

Rasmus