Date:   Sat, 11 Feb 2023 16:50:18 +0900
From:   Dominique Martinet <asmadeus@...ewreck.org>
To:     v9fs-developer@...ts.sourceforge.net,
        Eric Van Hensbergen <ericvh@...il.com>,
        Christian Schoenebeck <linux_oss@...debyte.com>
Cc:     Latchesar Ionkov <lucho@...kov.net>, linux-kernel@...r.kernel.org,
        Jens Axboe <axboe@...nel.dk>,
        Pengfei Xu <pengfei.xu@...el.com>,
        Dominique Martinet <asmadeus@...ewreck.org>
Subject: [PATCH 0/5] Take 3 at async RPCs and no longer looping forever on signals

I've been working on async RPCs for a while and never had time to debug
the last remaining issues, but by limiting the async clunks to failures
the impact is drastically smaller and I've not been able to reproduce
any more bugs so far.
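
To make the idea concrete, here is a minimal sketch (not the actual
patch; p9_client_async_clunk() is a hypothetical helper standing in for
the async path this series adds, the rest are the real 9p client bits):

/*
 * Minimal sketch of "async clunk only on failure".  p9_client_rpc(),
 * p9_req_put(), P9_TCLUNK and the fid fields are the real 9p client
 * pieces; p9_client_async_clunk() is hypothetical.
 */
static int p9_fid_clunk_sketch(struct p9_fid *fid)
{
    struct p9_req_t *req;

    /* Common case: the synchronous clunk succeeds and we're done. */
    req = p9_client_rpc(fid->clnt, P9_TCLUNK, "d", fid->fid);
    if (!IS_ERR(req)) {
        p9_req_put(fid->clnt, req);
        return 0;
    }

    /*
     * Failure, typically -ERESTARTSYS after a signal: a synchronous
     * retry would fail the same way immediately, so hand the clunk
     * off to the async machinery instead of leaking the fid.
     */
    return p9_client_async_clunk(fid);    /* hypothetical */
}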

This will require some more testing, and I'm tempted to say it isn't
worth rushing into the merge window next week-ish; the problem Jens
reported with task_work isn't really new, and I'd rather get this right
than rush new bugs in, given the sour experience I've had with this
patch series... Hopefully it'll get in this time.
With that in mind I plan to take the patches into my -next branch after
the merge window, so they have time to get reviewed first.

I'd like to measure the impact on performance as well, but I've already
spent way more time on this than I can afford, so that'll have to wait
a bit.

The only good thing here is that this shouldn't conflict with Eric's
rework...


A few problems I ran into today:
 - not doing async clunks for retries led to massive fid leaks as soon
as I started doing async flush; I've described this in the clunk patch,
but basically all the servers we tested with always replied to the
clunk before the flush, so the first clunk was never an error and the
retry path was never exercised at all... If it had been, a synchronous
retry would just have failed with ERESTARTSYS immediately again.
This isn't perfect, but it should hopefully be good enough to avoid too
many problems.

 - for flush itself, waiting for all in-flight flushes before sending
a new RPC isn't great, but I don't have any better idea.
I think in the general case we could get away with not waiting at all
most of the time (check whether there are any pending flushes sent by
the current tid?), but the current approach of making the thread not
killable at all (!) is much more conservative, so I feel we should at
least try this much even if it costs a bit.
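
For illustration, a rough sketch of what the waiting could look like
(the counter and waitqueue are assumed fields for the sake of the
example, not the series' actual data structures):

#include <linux/atomic.h>
#include <linux/wait.h>

struct p9_flush_state {
    atomic_t          inflight;    /* async Tflush still unanswered */
    wait_queue_head_t wq;          /* woken on every Rflush */
};

static int p9_wait_flushes(struct p9_flush_state *fs)
{
    /*
     * Block new RPCs until every outstanding flush is answered, so a
     * recycled tag can't be confused with a flushed request; killable,
     * so this doesn't reintroduce the unkillable-loop behaviour.
     */
    return wait_event_killable(fs->wq,
                               atomic_read(&fs->inflight) == 0);
}

static void p9_flush_done(struct p9_flush_state *fs)
{
    /* Called when an Rflush arrives: drop the count, wake waiters. */
    if (atomic_dec_and_test(&fs->inflight))
        wake_up(&fs->wq);
}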


Anyway, here goes nothing. Comments please!


Dominique Martinet (5):
  9p/net: move code in preparation of async rpc
  9p/net: share pooled receive buffers size exception in p9_tag_alloc
  9p/net: implement asynchronous rpc skeleton
  9p/net: add async clunk for retries
  9p/net: make flush asynchronous

 include/net/9p/client.h |  15 +-
 net/9p/client.c         | 508 +++++++++++++++++++++++++---------------
 2 files changed, 339 insertions(+), 184 deletions(-)

-- 
2.39.1
