Date:	Thu, 2 Jul 2015 21:44:15 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Jeff Layton <jlayton@...chiereds.net>,
	Andrey Ryabinin <a.ryabinin@...sung.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [git pull] vfs part 2

On Thu, Jul 02, 2015 at 12:16:14PM -0700, Linus Torvalds wrote:
> On Thu, Jul 2, 2015 at 11:40 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
> >
> > All they are used for is matching response to request.  Basically, you
> > can have up to 65535 pending requests.  Reusing it right after getting
> > the response is fine.
> 
> Reusing a tag right after getting the completion may be fine in
> theory, but it still sounds like a bad idea. Sure, it's used to match
> the command with the reply, but using those kinds of things for
> matching re-sends and to index into various "current data structures"
> is also very common (not having looked at p9 I don't know how much it
> does), and basically reusing tags "soon" tends to make those kinds of
> things fragile.

_All_ retransmits are done in the transport layer there.  It's not NFS - it
really expects a reliable ordered connection for transport.  No retransmits,
no duplicates, etc.  I'm not dead against circular allocation, but I would
really like to figure out what's going on first.
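(For context, "circular" allocation here would just mean advancing a cursor through the 16-bit tag space so that a freed tag is handed out again as late as possible, rather than immediately.  A minimal userspace sketch of that idea — the names and layout below are purely illustrative, not the actual net/9p allocator:)

```c
#include <stdint.h>

#define MAX_TAGS 65536

static uint8_t in_use[MAX_TAGS];   /* nonzero if the tag has a pending request */
static uint32_t next_tag;          /* monotonically advancing cursor */

/* Allocate a free tag, scanning forward from the cursor so a just-freed
 * tag is not reused until the rest of the space has been cycled through.
 * Returns -1 if every tag is currently pending. */
static int alloc_tag(void)
{
	for (uint32_t i = 0; i < MAX_TAGS; i++) {
		uint16_t tag = (uint16_t)((next_tag + i) % MAX_TAGS);
		if (!in_use[tag]) {
			in_use[tag] = 1;
			next_tag = tag + 1;
			return tag;
		}
	}
	return -1;
}

static void free_tag(uint16_t tag)
{
	in_use[tag] = 0;
}
```

The point being that with a scheme like this a response arriving for a stale tag would hit an unused slot instead of a freshly reissued request — it papers over the bug rather than fixing it, which is why I'd rather understand the ordering first.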

I still wonder if we are seeing wraparound (should've posted a diff instead
of verbal description - mea culpa).  If we are not, it smells like a response
to a request arriving while the tag was not in use from the client's POV,
_or_ buggered barriers of some kind.  Maybe buggered ordering of
replies somewhere, but that's only if Tflush had been involved (as in
-> Twrite tag = 3
-> Tflush tag = 42 old_tag = 3		<- Rwrite tag = 3
<- Rflush tag = 42
mark tag 3 free to be reused
reuse tag 3
... somehow get to seeing Rwrite only now

But I don't see where such ordering violation could've happened at the moment.
The way it's supposed to work is that the sequence
-> Twhatever tag = N
-> Tflush old_tag = N
must either end up with no response to the former arriving at all, or with it
arriving before the response to the latter.  Transport itself does preserve
ordering (TCP certainly would, but virtio queue also does, AFAICS) and
we really need to have p9_client_cb() called in order of arrival.

Hmm...  This is a stab in the dark, but... we have vring_interrupt() calling
req_done(), which does
        while (1) {
                spin_lock_irqsave(&chan->lock, flags);
                rc = virtqueue_get_buf(chan->vq, &len);
                if (rc == NULL) {
                        spin_unlock_irqrestore(&chan->lock, flags);
                        break;
                }
                chan->ring_bufs_avail = 1;
                spin_unlock_irqrestore(&chan->lock, flags);
                /* Wakeup if anyone waiting for VirtIO ring space. */
                wake_up(chan->vc_wq);
                p9_debug(P9_DEBUG_TRANS, ": rc %p\n", rc);
                p9_debug(P9_DEBUG_TRANS, ": lookup tag %d\n", rc->tag);
                req = p9_tag_lookup(chan->client, rc->tag);
                p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
        }

What's to prevent *another* vring_interrupt() (called from some kind of
IRQ handler) hitting on another CPU and competing with this one for the
queue?  While we are at it, both p9_tag_lookup() and p9_client_cb()
should be fine with being called under spin_lock_irqsave, so why not
hold it outside of the loop?
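FWIW, the restructuring I have in mind would look roughly like this — an
untested sketch, assuming p9_tag_lookup() and p9_client_cb() really are safe
under the spinlock; the lock is taken once around the whole drain loop, which
also serializes concurrent req_done() callers, and the wakeup moves out from
under it:

```c
static void req_done(struct virtqueue *vq)
{
	struct virtio_chan *chan = vq->vdev->priv;
	unsigned int len;
	struct p9_req_t *req;
	struct p9_fcall *rc;
	unsigned long flags;

	/* Hold the lock across the whole drain, so two CPUs can't
	 * interleave virtqueue_get_buf() and the client callbacks. */
	spin_lock_irqsave(&chan->lock, flags);
	while ((rc = virtqueue_get_buf(chan->vq, &len)) != NULL) {
		chan->ring_bufs_avail = 1;
		p9_debug(P9_DEBUG_TRANS, ": rc %p\n", rc);
		p9_debug(P9_DEBUG_TRANS, ": lookup tag %d\n", rc->tag);
		req = p9_tag_lookup(chan->client, rc->tag);
		p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
	}
	spin_unlock_irqrestore(&chan->lock, flags);
	/* Wakeup anyone waiting for VirtIO ring space. */
	wake_up(chan->vc_wq);
}
```

That would also guarantee p9_client_cb() gets called in arrival order, which
the current lock-per-iteration version does not.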