Date:   Sat, 11 Feb 2023 16:50:18 +0900
From:   Dominique Martinet <asmadeus@...ewreck.org>
To:     v9fs-developer@...ts.sourceforge.net,
        Eric Van Hensbergen <ericvh@...il.com>,
        Christian Schoenebeck <linux_oss@...debyte.com>
Cc:     Latchesar Ionkov <lucho@...kov.net>, linux-kernel@...r.kernel.org,
        Jens Axboe <axboe@...nel.dk>,
        Pengfei Xu <pengfei.xu@...el.com>,
        Dominique Martinet <asmadeus@...ewreck.org>
Subject: [PATCH 0/5] Take 3 at async RPCs and no longer looping forever on signals

I've been working on async RPCs for a while and never had time to debug
the last remaining issues, but by limiting the async clunks to failures
the impact is drastically smaller and I've not been able to reproduce
any more bugs so far.
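
To make the idea concrete, here is a minimal sketch (not the actual
patch; p9_client_async_clunk() is a hypothetical helper standing in for
the async path this series adds, the rest are the real 9p client bits):

/*
 * Minimal sketch of "async clunk only on failure".  p9_client_rpc(),
 * p9_req_put(), P9_TCLUNK and the fid fields are the real 9p client
 * pieces; p9_client_async_clunk() is hypothetical.
 */
static int p9_fid_clunk_sketch(struct p9_fid *fid)
{
    struct p9_req_t *req;

    /* Common case: the synchronous clunk succeeds and we're done. */
    req = p9_client_rpc(fid->clnt, P9_TCLUNK, "d", fid->fid);
    if (!IS_ERR(req)) {
        p9_req_put(fid->clnt, req);
        return 0;
    }

    /*
     * Failure, typically -ERESTARTSYS after a signal: a synchronous
     * retry would fail the same way immediately, so hand the clunk
     * off to the async machinery instead of leaking the fid.
     */
    return p9_client_async_clunk(fid);    /* hypothetical */
}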

This will require some more testing, and I'm tempted to say it isn't
worth rushing into the merge window next week-ish; the problem Jens
reported with task_work isn't really new, and I'd rather get this right
than rush new bugs in, given the sour experience I've had with this
patch series... Hopefully it'll get in this time.
With that in mind I plan to take the patches into my -next branch after
the merge window, so they have time to get reviewed first.

I'd like to measure the impact on performance as well, but I've already
spent way more time on this than I can afford, so that'll have to wait
a bit.

The only good thing here is that this shouldn't conflict with Eric's
rework...


A few problems I ran into today:
 - not doing async clunks for retries led to massive fid leaks as soon
as I started doing async flush; I've described this in the clunk patch,
but basically all the servers we tested with always replied to the
clunk before the flush, so the first clunk was never an error and the
retry path was never exercised at all... If it had been, a synchronous
retry would just have failed with ERESTARTSYS immediately again.
This isn't perfect, but it should hopefully be good enough to avoid too
many problems.

 - for flush itself, waiting for all in-flight flushes before sending
a new RPC isn't great, but I don't have any better idea.
I think in the general case we could get away with not waiting at all
most of the time (check whether there are any pending flushes sent by
the current tid?), but the current approach of making the thread not
killable at all (!) is much more conservative, so I feel we should at
least try this much even if it costs a bit.
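
For illustration, a rough sketch of what the waiting could look like
(the counter and waitqueue are assumed fields for the sake of the
example, not the series' actual data structures):

#include <linux/atomic.h>
#include <linux/wait.h>

struct p9_flush_state {
    atomic_t          inflight;    /* async Tflush still unanswered */
    wait_queue_head_t wq;          /* woken on every Rflush */
};

static int p9_wait_flushes(struct p9_flush_state *fs)
{
    /*
     * Block new RPCs until every outstanding flush is answered, so a
     * recycled tag can't be confused with a flushed request; killable,
     * so this doesn't reintroduce the unkillable-loop behaviour.
     */
    return wait_event_killable(fs->wq,
                               atomic_read(&fs->inflight) == 0);
}

static void p9_flush_done(struct p9_flush_state *fs)
{
    /* Called when an Rflush arrives: drop the count, wake waiters. */
    if (atomic_dec_and_test(&fs->inflight))
        wake_up(&fs->wq);
}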


Anyway, here goes nothing. Comments please!


Dominique Martinet (5):
  9p/net: move code in preparation of async rpc
  9p/net: share pooled receive buffers size exception in p9_tag_alloc
  9p/net: implement asynchronous rpc skeleton
  9p/net: add async clunk for retries
  9p/net: make flush asynchronous

 include/net/9p/client.h |  15 +-
 net/9p/client.c         | 508 +++++++++++++++++++++++++---------------
 2 files changed, 339 insertions(+), 184 deletions(-)

-- 
2.39.1
