lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 17 Jan 2022 10:09:28 -0600
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Olivier Langlois <olivier@...llion01.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "<linux-arch@...r.kernel.org>" <linux-arch@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        Alexey Gladkov <legion@...nel.org>,
        Kyle Huey <me@...ehuey.com>, Oleg Nesterov <oleg@...hat.com>,
        Kees Cook <keescook@...omium.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Jens Axboe <axboe@...nel.dk>,
        Pavel Begunkov <asml.silence@...il.com>
Subject: Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit
 special case

Olivier Langlois <olivier@...llion01.com> writes:

> On Fri, 2022-01-14 at 18:12 -0600, Eric W. Biederman wrote:
>> Linus Torvalds <torvalds@...ux-foundation.org> writes:
>> 
>> > On Tue, Jan 11, 2022 at 10:51 AM Eric W. Biederman
>> > <ebiederm@...ssion.com> wrote:
>> > > 
>> > > +       while ((n == -ERESTARTSYS) &&
>> > > test_thread_flag(TIF_NOTIFY_SIGNAL)) {
>> > > +               tracehook_notify_signal();
>> > > +               n = __kernel_write(file, addr, nr, &pos);
>> > > +       }
>> > 
>> > This reads horribly wrongly to me.
>> > 
>> > That "tracehook_notify_signal()" thing *has* to be renamed before
>> > we
>> > have anything like this that otherwise looks like "this will just
>> > loop
>> > forever".
>> > 
>> > I'm pretty sure we've discussed that "tracehook" thing before - the
>> > whole header file is misnamed, and most of the functions in theer
>> > are
>> > too.
>> > 
>> > As an ugly alternative, open-code it, so that it's clear that "yup,
>> > that clears the TIF_NOTIFY_SIGNAL flag".
>> 
>> A cleaner alternative looks like to modify the pipe code to use
>> wake_up_XXX instead of wake_up_interruptible_XXX and then have code
>> that does pipe_write_killable instead of pipe_write_interruptible.
>
> Do not forget that the problem might not be limited to the pipe FS as
> Oleg Nesterov pointed out here:
>
> https://lore.kernel.org/io-uring/20210614141032.GA13677@redhat.com/
>
> This is why I did like your patch fixing __dump_emit. If the only
> problem is the tracehook_notify_signal() function unclear name, that
> should be addressed instead of trying to fix the problem in a different
> way.

It might be that the fix is to run a portion of the exit_to_userspace
loop that does:

	if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
		handle_signal_work(regs, ti_work);

I am deep in brainstorm mode trying to find something that comes out
clean.

Oleg is right that while to be POSIX compliant and otherwise compatible
with traditional unix behavior sleeps in filesystems need to be
uninterruptible.  NFS has not always provided that compatibility.


>> There is also a question of how all of this should interact with the
>> freezer, as I think changing from interruptible to killable means
>> that
>> the coredumps became unfreezable.
>> 
>> I am busily simmering this on my back burner and I hope I can come up
>> with something sensible.
>
> IMHO, fixing the problem on the emit function side has the merit of
> being future proof if something else than io_uring in the future would
> raise the TIF_NOTIFY_SIGNAL flag
>
> but I am wondering why no one commented anything about my proposal of
> cancelling io_uring before generating the core dump therefore stopping
> it to flip TIF_NOTIFY_SIGNAL while the core dump is generated.
>
> Is there something wrong with my proposed approach?
> https://lore.kernel.org/lkml/cover.1629655338.git.olivier@trillion01.com/
>
> It did flawlessly created many dozens of io_uring app core dumps in the
> last months for me...

>From my perspective I am not at all convinced that io_uring is the only
culprit.

Beyond that the purpose of a coredump is to snapshot the process as it
is, before anything is shutdown so that someone can examine the coredump
and figure out what failed.  Running around changing the state of the
process has a very real chance of hiding what is going wrong.

Further your change requires that there be a place for io_uring to clean
things up.  Given that fundamentally that seems like the wrong thing to
me I am not interested in making it easy to what looks like the wrong
thing.

All of this may be perfection being the enemy of the good (especially as
your io_uring magic happens as a special case in do_coredump).  My work
in this area is to remove hacks so I can be convinced the code works
100% of the time so unfortunately I am not interested in pick up a
change that is only good enough.  Someone else like Andrew Morton might
be.


None of that changes the fact that tracehook_notify_signal needs to be
renamed.  That effects your approach and my proof of concept approach.
So renaming tracehook_notify_signal just needs to be done.

Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ