Message-ID: <12950409.o0bIpVV1Ut@silver>
Date:   Sun, 10 Jul 2022 17:16:45 +0200
From:   Christian Schoenebeck <linux_oss@...debyte.com>
To:     Dominique Martinet <asmadeus@...ewreck.org>
Cc:     Kent Overstreet <kent.overstreet@...il.com>,
        linux-kernel@...r.kernel.org, v9fs-developer@...ts.sourceforge.net,
        Eric Van Hensbergen <ericvh@...il.com>,
        Latchesar Ionkov <lucho@...kov.net>, Greg Kurz <groug@...d.org>
Subject: Re: [PATCH 3/3] 9p: Add mempools for RPCs

On Sunday, 10 July 2022 15:19:56 CEST Dominique Martinet wrote:
> Christian Schoenebeck wrote on Sun, Jul 10, 2022 at 02:57:58PM +0200:
> > On Saturday, 9 July 2022 22:50:30 CEST Dominique Martinet wrote:
> > > Christian Schoenebeck wrote on Sat, Jul 09, 2022 at 08:08:41PM +0200:
[...]
> > > late replies to the oldtag are no longer allowed once rflush has been
> > > sent.
> > 
> > That's not quite correct; it also explicitly says this:
> > 
> > "The server may respond to the pending request before responding to the
> > Tflush."
> > 
> > And independent of what the 9p2000 spec says, consider this:
> > 
> > 1. client sends a huge Twrite request
> > 2. server starts to perform that write but it takes very long
> > 3.A impatient client sends a Tflush to abort it
> > 3.B server finally responds to Twrite with a normal Rwrite
> > 
> > These last two actions 3.A and 3.B may happen concurrently within the
> > same transport time frame, or "at the same time" if you will. There is
> > no way to prevent that from happening.
> 
> Yes, and that is precisely why we cannot free the buffers from the
> Twrite until we get the Rflush.
> Until the Rflush comes, an Rwrite can still come at any time, so we
> cannot just free these resources.

With the current client version, agreed, as the client might then incorrectly
look up a wrong (new) request under the already recycled tag number. With
monotonically increasing (non-recycled) tag numbers this would not happen: a
client lookup with the old tag number would simply fail -> ignore the reply.
However ...

> In theory it'd be possible to free the buffers for some protocol and
> throw the data out with the bathwater, but the man page says that in this
> case we should ignore the flush and behave as if the request completed
> properly, because of side effects: e.g. even if you try to interrupt an
> unlink() call, if the server says it removed the file, well, it's removed,
> so we should tell userspace that.

... good point! I was probably thinking too much in terms of Twrite/Tread
examples, so I indeed hadn't considered that case.

> > > > When the client sends a Tflush, it must wait to receive the
> > > > corresponding Rflush before reusing oldtag for subsequent messages
> > > 
> > > if we free the request at this point we'd reuse the tag immediately,
> > > which would definitely lead to trouble.
> > 
> > Yes, that's the point; I never understood why the Linux client does this.
> > I find it problematic to recycle IDs in a distributed system within a
> > short time window. Additionally, it also makes 9p protocol debugging more
> > difficult, as you often look at tag numbers in logs and wonder, "does this
> > reference the previous request, or is it about a new one now?"
> 
> I can definitely agree with that.
> We need to keep track of used tags, but we don't need to pick the lowest
> tag available -- maybe the IDR code that allocates tags can be configured
> to endlessly increment and loop around, only avoiding duplicates?
> 
> Ah, here it is, from Documentation/core-api/idr.rst:
> 
>   If you need to allocate IDs sequentially, you can use
>   idr_alloc_cyclic().  The IDR becomes less efficient when dealing
>   with larger IDs, so using this function comes at a slight cost.
> 
> 
> That would be another "easy change", if you'd like to check that cost at
> some point...

Nice! I'll definitely give this a whirl and will report back!
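
For the record, here is roughly what I would try first (only a rough sketch
with made-up function names, locking/preload details omitted, so not meant
as the actual net/9p/client.c code):

  #include <linux/idr.h>
  #include <net/9p/client.h>  /* struct p9_client, struct p9_req_t */
  #include <net/9p/9p.h>      /* P9_NOTAG */

  /*
   * idr_alloc_cyclic() hands out increasing IDs and only wraps around at
   * 'end', skipping IDs that are still in use, so a tag is not recycled
   * right after its request has been freed.
   */
  static int p9_tag_alloc_cyclic(struct p9_client *c, struct p9_req_t *req)
  {
          int tag;

          spin_lock_irq(&c->lock);
          /* valid 9p tags are u16; keep P9_NOTAG (0xffff) out of the range */
          tag = idr_alloc_cyclic(&c->reqs, req, 1, P9_NOTAG, GFP_NOWAIT);
          spin_unlock_irq(&c->lock);
          if (tag < 0)
                  return tag;
          req->tc.tag = tag;
          return 0;
  }

  /*
   * On an incoming reply: with non-recycled tags, a late Rwrite for an
   * already flushed and freed request simply fails the lookup and can be
   * ignored, instead of matching a new request that reused the tag.
   * (RCU/refcounting omitted for brevity.)
   */
  static struct p9_req_t *p9_find_req_by_tag(struct p9_client *c, u16 tag)
  {
          return idr_find(&c->reqs, tag);  /* NULL -> stale tag, drop reply */
  }

A nice side effect would be the debugging aspect mentioned above: each tag in
a log would then map to exactly one request for a long time before the counter
wraps around.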

> (until we notice that some server has a static array for tags and stops
> working once you use a tag > 64 or something...)

That would be an incorrect server implementation then, a.k.a. a bug. The spec
is clear that tag numbers are generated by the client, and it does not mandate
any sequential structure.

> Anyway, this is getting off-topic -- the point is that we need to keep
> resources around for the original reply when we send a Tflush, so we
> can't just free that buffer first unless you're really good with it.
> 
> It'd be tempting to just steal its buffers but these might still be
> useful, if e.g. both replies come in parallel.
> (speaking of which, why do we need two buffers? Do we ever re-use the
> sent buffer once the reply comes?... this all looks sequential to me...)

Yep, I was thinking exactly the same, but for now I would leave it this way.

> So instead of arguing here I'd say let's first finish your smaller reqs
> patches and redo the mempool on top of that, with a failsafe just for
> flush buffers so they never fall back on the mempool; I think that'll be
> easier to do in this order.

OK then, fine with me!
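
Just to double check that I read the flush failsafe part correctly, something
along these lines is what I have in mind (again only a sketch, p9_req_pool and
p9_alloc_msg_buf are made-up names):

  #include <linux/mempool.h>
  #include <linux/slab.h>
  #include <net/9p/9p.h>  /* P9_TFLUSH */

  /* shared fallback pool for regular request buffers */
  static mempool_t *p9_req_pool;

  /*
   * Tflush/Rflush carry only a few bytes of protocol data, so their
   * buffers can come from a plain kmalloc and never touch the shared
   * mempool. That way a flush cannot block behind the very request it is
   * supposed to cancel, while regular requests keep the kmalloc ->
   * mempool fallback.
   */
  static void *p9_alloc_msg_buf(size_t size, u8 type, gfp_t gfp)
  {
          void *buf;

          if (type == P9_TFLUSH)
                  return kmalloc(size, gfp);  /* tiny, never from mempool */

          buf = kmalloc(size, gfp | __GFP_NOWARN);
          if (!buf)
                  buf = mempool_alloc(p9_req_pool, gfp);
          return buf;
  }

If you meant something different with "never fallback on mempool", just shout.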

No time today, but I hope to post a new version next week.

Best regards,
Christian Schoenebeck

