Message-ID: <87pqg1kiuu.fsf@rustcorp.com.au>
Date: Tue, 06 Dec 2011 22:33:21 +1030
From: Rusty Russell <rusty@...tcorp.com.au>
To: Avi Kivity <avi@...hat.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>,
Sasha Levin <levinsasha928@...il.com>,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
markmc@...hat.com
Subject: Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity <avi@...hat.com> wrote:
> On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > Yes, but the hypervisor/trusted party would simply have to do the copy;
> > the rings themselves would be shared. A would say "copy this to/from B's
> > ring entry N" and you know that A can't have changed B's entry.
>
> Sorry, I don't follow. How can the rings be shared? If A puts a gpa in
> A's address space into the ring, there's no way B can do anything with
> it, it's an opaque number. Xen solves this with an extra layer of
> indirection (grant table handles) that cost extra hypercalls to map or
> copy.
It's not symmetric. B can see the desc and avail pages R/O, and the
used page R/W. It needs to ask something to copy in/out of
descriptors, though, because the addresses in them are opaque numbers
to B, and it doesn't have access. I.e., the existence of the descriptor
in the ring *implies* a grant.
Perhaps this could be generalized further into a "connect these two
rings", but I'm not sure. Descriptors with both read and write parts
are tricky.
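
A minimal sketch of the asymmetric scheme above, assuming a trusted copier
that B calls with a descriptor number from A's ring. The names Desc, Guest
and TrustedCopier are hypothetical and only model the idea that a descriptor
published in the avail ring acts as the grant; this is not an existing
virtio or hypervisor interface.

```python
from dataclasses import dataclass, field


@dataclass
class Desc:
    addr: int       # address in A's memory; an opaque number to B
    length: int
    writable: bool  # True if the other side may write into the buffer


@dataclass
class Guest:
    memory: bytearray
    desc: list = field(default_factory=list)   # B would see this read-only
    avail: list = field(default_factory=list)  # B would see this read-only


class TrustedCopier:
    """Hypothetical stand-in for the hypervisor/trusted party."""

    def __init__(self, owner: Guest):
        self.owner = owner

    def _granted(self, n: int) -> Desc:
        # Only a descriptor that A has published in avail implies a grant.
        if n not in self.owner.avail or not 0 <= n < len(self.owner.desc):
            raise PermissionError("descriptor %r is not granted" % n)
        d = self.owner.desc[n]
        if d.addr < 0 or d.addr + d.length > len(self.owner.memory):
            raise ValueError("descriptor points outside A's memory")
        return d

    def copy_in(self, n: int, data: bytes) -> int:
        """Copy B's data into the buffer named by A's ring entry n."""
        d = self._granted(n)
        if not d.writable:
            raise PermissionError("descriptor is read-only for B")
        count = min(len(data), d.length)
        self.owner.memory[d.addr:d.addr + count] = data[:count]
        return count

    def copy_out(self, n: int) -> bytes:
        """Copy the buffer named by A's ring entry n out to B."""
        d = self._granted(n)
        if d.writable:
            raise PermissionError("descriptor is write-only for B")
        return bytes(self.owner.memory[d.addr:d.addr + d.length])
```

B never dereferences addr itself; it only names entry N, which is why A
cannot be tricked into touching B's memory and vice versa. A descriptor
chain with both read and write parts would need per-element handling,
which this sketch leaves out.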
> > Every driver really wants to put a pointer in there. We have an array
> > to map desc. numbers to cookies inside the virtio core.
> >
> > We really want 64 bits.
>
> With multiqueue, it may be cheaper to do the extra translation locally
> than to ship the cookie across cores (or, more likely, it will make no
> difference).
Indeed.
> However, moving pointers only works if you trust the other side. That
> doesn't work if we manage to share a ring.
Yes, that part needs to be trusted too.
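
To illustrate the "array to map desc. numbers to cookies" alternative to
putting a pointer in the ring, here is a rough sketch. CookieTable and its
methods are hypothetical names, not the virtio core's actual code; the point
is only that the ring carries a small index, and the owner validates it
locally before using the associated object.

```python
class CookieTable:
    """Maps descriptor head numbers to driver-private cookies."""

    def __init__(self, ring_size: int):
        self._cookies = [None] * ring_size

    def _check(self, head: int) -> None:
        # The index comes back from the other side, so never trust it.
        if not isinstance(head, int) or not 0 <= head < len(self._cookies):
            raise IndexError("descriptor head %r out of range" % (head,))

    def attach(self, head: int, cookie: object) -> None:
        """Remember the cookie for a request that is about to be queued."""
        self._check(head)
        if cookie is None:
            raise ValueError("cookie must not be None")
        if self._cookies[head] is not None:
            raise ValueError("descriptor %d is already in flight" % head)
        self._cookies[head] = cookie

    def detach(self, head: int) -> object:
        """Translate a completed descriptor head back into its cookie."""
        self._check(head)
        cookie = self._cookies[head]
        if cookie is None:
            raise ValueError("descriptor %d is not in flight" % head)
        self._cookies[head] = None
        return cookie
```

With a table like this, a bogus or repeated index from the other side is
rejected instead of being dereferenced, which is the property a raw
pointer in the ring cannot give you.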
> > I'm just not sure how the host would even know to hint.
>
> For JBOD storage, a good rule of thumb is (number of spindles) x 3.
> With less, you might leave an idle spindle; with more, you're just
> adding latency. This assumes you're using indirects so ring entry ==
> request. The picture is muddier with massive battery-backed RAID
> controllers or flash.
>
> For networking, you want (ring size) * min(expected packet size, page
> size) / (link bandwidth) to be something that doesn't get the
> bufferbloat people after your blood.
OK, so while neither side knows, the host knows slightly more.
Now that I think about it, from a spec POV, saying it's a "hint" is useless,
as it doesn't tell the driver what to do with it. I'll say it's a
maximum, which keeps it simple.
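
The two rules of thumb quoted above, and the "treat it as a maximum"
reading, can be written down as a small worked example. The function names,
the latency budget parameter and the choice of bits per second for link
bandwidth are assumptions made for illustration; none of them come from the
virtio spec.

```python
def storage_queue_depth(spindles: int) -> int:
    """JBOD rule of thumb: (number of spindles) x 3 requests in flight."""
    return 3 * spindles


def network_queue_delay_s(ring_size: int, expected_packet_bytes: int,
                          page_bytes: int, link_bits_per_s: float) -> float:
    """Time to drain a full ring:
    (ring size) * min(expected packet size, page size) / (link bandwidth).
    """
    queued_bytes = ring_size * min(expected_packet_bytes, page_bytes)
    return queued_bytes * 8 / link_bits_per_s


def max_ring_entries(latency_budget_s: float, expected_packet_bytes: int,
                     page_bytes: int, link_bits_per_s: float) -> int:
    """Largest ring whose drain time stays within the latency budget."""
    entry_bits = 8 * min(expected_packet_bytes, page_bytes)
    return max(1, int(latency_budget_s * link_bits_per_s // entry_bits))


def effective_ring_size(ring_size: int, host_max: int) -> int:
    """Driver side: the host's value is a maximum, not a hint."""
    return min(ring_size, host_max)
```

For instance, with 1500-byte packets on a 1 Gbit/s link, a 256-entry ring
holds roughly 3 ms of traffic, so a host aiming for about 1 ms would
advertise a much smaller maximum and the driver would simply clamp to it.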
Cheers,
Rusty.