Message-ID: <Y5hbXanne5IryJBV@codewreck.org>
Date: Tue, 13 Dec 2022 20:00:45 +0900
From: Dominique Martinet <asmadeus@...ewreck.org>
To: Hillf Danton <hdanton@...a.com>
Cc: Christian Schoenebeck <linux_oss@...debyte.com>,
v9fs-developer@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
Marco Elver <elver@...gle.com>
Subject: Re: [PATCH] 9p/virtio: add a read barrier in p9_virtio_zc_request
(Your mailer breaks threads; please have a look at how to make it send
In-Reply-To and/or References headers.)
Hillf Danton wrote on Tue, Dec 13, 2022 at 02:59:01PM +0800:
> On 10 Dec 2022 09:10:44 +0900 Dominique Martinet <asmadeus@...ewreck.org>
> > @@ -533,6 +533,12 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> > p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n");
> > err = wait_event_killable(req->wq,
> > READ_ONCE(req->status) >= REQ_STATUS_RCVD);
> > +
> > +	/* Make sure our req is coherent with regard to updates in other
> > +	 * threads - pairs with the wmb() in the callback, as in p9_client_rpc
> > +	 */
> > + smp_rmb();
> > +
> > // RERROR needs reply (== error string) in static data
> > if (READ_ONCE(req->status) == REQ_STATUS_RCVD &&
> > unlikely(req->rc.sdata[4] == P9_RERROR))
>
> No sense can be made without checking err before req->status,
> given the comment below. Worse after this change.
Hmm, I don't see how it's worse (well, I suppose it makes it more likely
for req->status to read as RCVD after the barrier without the rest of the
data being coherent), but it's definitely incorrect, yes...
Thanks for bringing it up.
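To spell out the pairing I had in mind (a condensed sketch of the callback
and waiter sides, not the actual code, and with err checked before the
status as you suggest):

    /* completion side (p9_client_cb, simplified) */
    /* ... reply data has already been written into req->rc ... */
    smp_wmb();                 /* publish the data before the status */
    WRITE_ONCE(req->status, REQ_STATUS_RCVD);
    wake_up(&req->wq);

    /* waiter side (p9_virtio_zc_request, simplified) */
    err = wait_event_killable(req->wq,
                              READ_ONCE(req->status) >= REQ_STATUS_RCVD);
    smp_rmb();                 /* order the status read before data reads */
    if (err == 0 && READ_ONCE(req->status) == REQ_STATUS_RCVD &&
        unlikely(req->rc.sdata[4] == P9_RERROR))
            /* only now is it safe to look at the reply */
            ...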
Having another look, I also don't see how this can possibly be safe at
all: if a process is killed while waiting here, p9_virtio_zc_request
will drop the pages it reserved for the response (in the need_drop case)
and the sg lists will be freed, but the response can still come for a
while -- these need to be dropped only after the flush has been handled.
If these buffers are reused while the response comes in, we'll be
overwriting some random data...
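Roughly the sequence I'm worried about, paraphrasing p9_virtio_zc_request
from memory (so take the exact names here with a grain of salt):

    err = wait_event_killable(req->wq,
                              READ_ONCE(req->status) >= REQ_STATUS_RCVD);
    /* killed here: err == -ERESTARTSYS, the reply may still be in
     * flight and we have not waited for the TFLUSH answer yet */
    ...
    err_out:
            if (need_drop) {
                    /* the pinned pages backing the zero-copy reply are
                     * released while the device can still write the reply
                     * into them; once they get reused we corrupt whatever
                     * now lives there */
                    p9_release_pages(in_pages, in_nr_pages);
                    p9_release_pages(out_pages, out_nr_pages);
            }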
This isn't an easy fix, so I'll just drop this patch for now; but I guess
we should try to address that next cycle.
Perhaps I can find time to dust off my async flush code; some other fix
might have resolved the race I used to see with it...
--
Dominique