netdev - Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ad133b58-9bc1-4da9-73a2-957512e3e162@kernel.dk>
Date:   Mon, 6 Feb 2023 13:19:24 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     Dominique Martinet <asmadeus@...ewreck.org>,
        Christian Schoenebeck <linux_oss@...debyte.com>,
        Eric Van Hensbergen <ericvh@...nel.org>
Cc:     netdev <netdev@...r.kernel.org>, Jakub Kicinski <kuba@...nel.org>,
        Pengfei Xu <pengfei.xu@...el.com>,
        v9fs-developer@...ts.sourceforge.net
Subject: Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on
 recalc_sigpending()

On 2/5/23 3:02?AM, Dominique Martinet wrote:
> meta-comment: 9p is usually handled separately from netdev, I just saw
> this by chance when Simon replied to v1 -- please cc
> v9fs-developer@...ts.sourceforge.net for v3 if there is one
> (well, it's a bit of a weird tree because patches are sometimes taken
> through -net...)
> 
> Also added Christian (virtio 9p) and Eric (second maintainer) to Tos for
> attention.

Thanks! I can send out a v3, but let's get the discussion sorted first.
Only change I want to make is the comment format, which apparently is
different in net/ that most other spots in the kernel.

> Jens Axboe wrote on Fri, Feb 03, 2023 at 09:04:28AM -0700:
>> signal_pending() really means that an exit to userspace is required to
>> clear the condition, as it could be either an actual signal, or it could
>> be TWA_SIGNAL based task_work that needs processing. The 9p client
>> does a recalc_sigpending() to take care of the former, but that still
>> leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
>> task_work pending, then we'll sit in a tight loop spinning as
>> signal_pending() remains true even after recalc_sigpending().
>>
>> Move the signal_pending() logic into a helper that deals with both, and
>> return -ERESTARTSYS if the reason for signal_pendding() being true is
>> that we have task_work to process.
>> Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
>> Reported-and-tested-by: Pengfei Xu <pengfei.xu@...el.com>
>> Signed-off-by: Jens Axboe <axboe@...nel.dk>
>> ---
>> v2: don't rely on task_work_run(), rather just punt with -ERESTARTYS at
>>     that point. For one, we don't want to export task_work_run(), it's
>>     in-kernel only. And secondly, we need to ensure we have a sane state
>>     before running task_work. The latter did look fine before, but this
>>     should be saner. Tested this also fixes the report as well for me.
> 
> Hmm, just bailing out here is a can of worm -- when we get the reply
> from server depending on the transport hell might break loose (zc
> requests in particular on virtio will probably just access the memory
> anyway... fd will consider it got a bogus reply and close the connection
> which is a lesser evil but still not appropriatey)
> 
> We really need to get rid of that retry loop in the first place, and req
> refcounting I added a couple of years ago was a first step towards async
> flush which will help with that, but the async flush code had a bug I
> never found time to work out so it never made it and we need an
> immediate fix.
> 
> ... Just looking at code out loud, sorry for rambling: actually that
> signal handling in virtio is already out of p9_virtio_zc_request() so
> the pages are already unpinned by the time we do that flush, and I guess
> it's not much worse -- refcounting will make it "mostly work" exactly as
> it does now, as in the pages won't be freed until we actually get the
> reply, so the pages can get moved underneath virtio which is bad but is
> the same as right now, and I guess it's a net improvement?
> 
> 
> I agree with your assessment that we can't use task_work_run(), I assume
> it's also quite bad to just clear the flag?
> I'm not familiar with these task at all, in which case do they happen?
> Would you be able to share an easy reproducer so that I/someone can try
> on various transports?

You can't just clear the flag without also running the task_work. Hence
it either needs to be done right there, or leave it pending and let the
exit to userspace take care of it.

> If it's "rare enough" I'd say sacrificing the connection might make more
> sense than a busy loop, but if it's becoming common I think we'll need
> to spend some more time thinking about it...
> It might be less effort to dig out my async flush commits if this become
> too complicated, but I wish I could say I have time for it...

It can be a number of different things - eg fput() will do it. The
particular case that I came across was io_uring, which will use
TWA_SIGNAL based task_work for retry operations (and other things). If
you use io_uring, and depending on how you setup the ring, it can be
quite common or will never happen. Dropping the connection task_work
being pending is not a viable solution, I'm afraid.

-- 
Jens Axboe