Message-ID: <4EE38AE6.1060508@redhat.com>
Date: Sat, 10 Dec 2011 17:37:58 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: James Bottomley <James.Bottomley@...senPartnership.com>
CC: linux-kernel@...r.kernel.org,
"Michael S. Tsirkin" <mst@...hat.com>,
linux-scsi <linux-scsi@...r.kernel.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Stefan Hajnoczi <stefanha@...ux.vnet.ibm.com>
Subject: Re: [PATCH 1/2] virtio-scsi: first version
On 12/09/2011 09:06 PM, James Bottomley wrote:
> On Thu, 2011-12-08 at 14:09 +0100, Paolo Bonzini wrote:
>>> Well, no it's not, the transports are the fastest evolving piece of the
>>> SCSI spec.
>>
>> No, I mean when something is added to the generic definition of SCSI
>> transport (SAM, more or less), not the individual transports. When the
>> virtio-scsi transport has to change, you still have to update
>> spec+host+guest, but that's relatively rare.
>
> This doesn't make sense: You talk about wanting TMF access which *is*
> transport defined.
TMF access is transport defined. The definition of TMFs is part of SAM
and not fast moving. The virtio-scsi spec tells you how to access TMFs
on virtio-scsi; it doesn't tell you what the TMFs do, because it just
refers you to SAM.
Device commands can be treated opaquely when doing passthrough, so their
rate of change does not matter. And you can always leave them out in
the emulated target, too. If some new command turns out to be
interesting enough to implement it in the emulated target, you do it and
guests that can use the feature will start using it.
>> So, for virtio-blk, SG_IO is good for persistent reservations, burning
>> CDs, and basically nothing else. Neither of these can really be done in
>> the host by interpreting, so for virtio-blk it makes sense to simply
>> pass through.
>
> It is a pass through for user space ... I don't get what your point is.
> All of the internal commands for setup are handled in the host.
In the host or in the guest kernel? I'm not sure I understand your
point either. :)
> All the guest is doing is attaching to a formed block queue. I think,
> as I've said several times before, all of this indicates virtio-blk
> doesn't do discovery of the host block queue properly, but that's
> fixable.
Well, the only fix is to disable SG_IO. For example, suppose the host
disk is 4k-lbs and you present it to the guest as 512b-logical,
4096-byte physical. That's a sensible thing to do if you want the guest
to boot from that disk.
Now, SG_IO will see 4k-lbs, and you cannot change it. To avoid showing
mismatched geometry to the guest, _the only fix is to disable SG_IO_.
If you do so, you prevent the guest from doing possibly useful things
with it (e.g. PR). If you don't, you have to cross your fingers and
hope the guest won't do possibly harmful things with it.
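To make the mismatch concrete, here is a rough sketch of what a guest could observe. It decodes READ CAPACITY(16) parameter data (the kind of reply SG_IO would hand back from the host disk) and compares it with the block sizes the guest kernel was told about. The function names, the sample 1 GiB disk and the guest_queue values are made up for illustration; this is not code from any driver.

```python
import struct

def parse_read_capacity_16(data: bytes) -> dict:
    """Decode the READ CAPACITY(16) fields that describe block sizes."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes of parameter data")
    # Bytes 0-7: last logical block address; bytes 8-11: logical block length.
    last_lba, block_len = struct.unpack_from(">QI", data, 0)
    # Byte 13, low nibble: logical blocks per physical block exponent.
    exponent = data[13] & 0x0F
    return {
        "num_blocks": last_lba + 1,
        "logical_block_size": block_len,
        "physical_block_size": block_len << exponent,
    }

def geometry_mismatch(queue_limits: dict, sg_io_view: dict) -> list:
    """Return the block-size fields on which the two views disagree."""
    keys = ("logical_block_size", "physical_block_size")
    return [k for k in keys if queue_limits[k] != sg_io_view[k]]

# Hypothetical host disk: 1 GiB, 4096-byte logical blocks (exponent 0).
host_reply = struct.pack(">QI", (1 << 30) // 4096 - 1, 4096) + bytes(20)

# What the guest was told: 512-byte logical, 4096-byte physical.
guest_queue = {"logical_block_size": 512, "physical_block_size": 4096}

sg_view = parse_read_capacity_16(host_reply)
print(geometry_mismatch(guest_queue, sg_view))
```

The guest kernel works with 512-byte logical blocks, while anything that goes through SG_IO is answered in terms of the host's 4096-byte blocks.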
Of course, virtio-scsi is not a silver bullet. If you want to modify
the block limits you won't be able to pass the LUN through anymore, and
you will have to use an emulated target; that's obvious. However, in
_no_ case will there be a mismatch between the queue parameters seen by
the kernel and what you get in SG_IO.
>>> You worry me enormously talking about TMFs because they're transport
>>> specific.
>>
>> True, but virtio-blk for example cannot even retry a command at all.
>
> Why would it need to. You seem to understand that architecturally the
> queue is sliced, but what you don't seem to appreciate is that error
> handling is done below this ... i.e. in the host in your model, so
> virtio-blk properly implemented *shouldn't* be doing retries.
There may be no error handling in the host at all, for example if the
host is using a simple userspace iSCSI initiator that just sends
commands over TCP. It's also possible that non-Linux OSes cannot be
told "no error handling". Windows expects the driver to be able to
reset LUNs/buses/hosts, for example.
> You seem to be stating error handling in a way that necessarily
> violates the layering of block and then declaring this to be a
> problem. It isn't; in virtio-block, errors are handled in the host
> and passed up to the guest when resolved.
I agree, but in practice it doesn't always work like that, depending on
your storage backends. Again, the choice with virtio-blk is either
"keep it broken" or "don't do it".
> Why do you worry about WCE? That's a SCSI feature and it's handled in
> the host.
No, the guest must also be able to toggle it. But that's irrelevant.
The point is: you need discovery of geometry parameters, of topology
parameters, of cache parameters. You need reads, writes, flushes,
discards. Why reinvent the wheel every time, and not encapsulate those
within SPC/SBC commands? Consider that:
1) you also need to support generic SCSI commands for userspace, and
virtio-blk's solution for that sucks;
2) you would anyway need the SCSI encapsulation code for the sake of
Windows drivers (only it would run in the Windows guests rather than in
the host).
At some point you start wondering whether you're heading straight to a
local optimum, and why every other virtualization platform is doing
something else. virtio-blk's main feature is its simplicity; it's quite
possible that we're past the break-even point for virtio-blk's simplicity.
> The point here is that virtio-blk operates at the
> block level, so you should too [...] you don't ask to pierce the
> abstraction to try to see SCSI parameters.
Exactly! That's why I say SG_IO on virtio-blk is a very bad idea, and
if clients need SCSI (and they do) they should be presented a real SCSI
device, which virtio-scsi provides.
>> Regarding updates to the targets, you have much more control on the host
>> than the guest. Updating the host is trivial compared to updating the
>> guest.
>
> So is this a turf war? virtio-blk isn't evolving fast enough (and since
> you say lagging behind and DISCARD was a 2008 feature, that seems
> reasonable) so you want to invent an additional backend that can move
> faster?
No turf war at all, simply different choices favoring flexibility and
extensibility over simplicity. (And even that is not entirely true: the
actual virtio drivers are simpler for virtio-scsi, though of course the
whole stack is more complex).
virtio-blk lags behind by design, because it tries to follow the Linux
block layer's protocol. To add a new feature to the protocol, in
practice it has to be already in the block layer, even if there is a
useful addition that non-Linux guests could use. Then you have to come
to an agreement on spec updates, implement it in host, and get the guest
driver updated.
With virtio-scsi, you sidestep the problems completely, because all you
need to do in the host is provide a SCSI target with a decent command
set. The spec heavily relies on SAM and refers you to it and the other
SCSI specifications. New features can be added even before Linux adopts
a new feature, as soon as SPC or SBC includes them. You do not need
separate work on a separate spec, and you do not risk getting that part
wrong. And once Linux non-virt devices do gain support for the new
feature, virt devices also gain it. Sometimes for free, for example if
you had already done the host implementation for Windows guests.
>>> Incidentally, REQ_DISCARD was added in 2008. In that time close to
>>> 50 new commands have been added to SCSI, so the block protocol is
>>> pretty slow moving.
>>
>> That also means that virtio-blk cannot give guests access to the full
>> range of features that they might want to use. Not all OSes are Linux,
>> not all OSes limit themselves to the features of the Linux block protocol.
>
> So you're trying to solve the non-linux guest problem? My observation
> from Windows has been that windows device queues mostly behave
> reasonably similarly to Linux ... that's not exactly, but similarly
> enough that we can translate the requests.
That's not the case, actually. I don't know how Windows device queues
work, but Windows storage drivers can only hook themselves at the SCSI
layer. A Windows storage driver cannot distinguish a read that came
from the disk driver, from a READ that came from userspace via
passthrough (not unlike a Linux driver for a SCSI host).
For this reason, Windows virtio-blk devices do not have the equivalent
of SG_IO. If you send a READ command via SCSI passthrough, it becomes a
regular read. If you send an INQUIRY, you get artificial data that the
Windows virtio-blk device makes up.
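As an illustration only (this is not the Windows driver's code, and the vendor and product strings are invented), the behaviour amounts to something like the following sketch. A READ(10) CDB is flattened into a plain block read, and an INQUIRY is answered with standard inquiry data built on the spot instead of being forwarded to a real device.

```python
import struct

READ_10 = 0x28
INQUIRY = 0x12

def handle_passthrough_cdb(cdb: bytes):
    """Map a passthrough CDB onto what a block-only device can offer."""
    opcode = cdb[0]
    if opcode == READ_10:
        # READ(10): bytes 2-5 hold the LBA, bytes 7-8 the transfer length.
        lba, = struct.unpack_from(">I", cdb, 2)
        nblocks, = struct.unpack_from(">H", cdb, 7)
        return ("block_read", lba, nblocks)
    if opcode == INQUIRY:
        # INQUIRY: bytes 3-4 hold the allocation length.
        alloc_len, = struct.unpack_from(">H", cdb, 3)
        data = bytearray(36)               # standard inquiry data, 36 bytes
        data[0] = 0x00                     # direct-access block device
        data[4] = 36 - 5                   # additional length (n - 4)
        data[8:16] = b"EXAMPLE "           # hypothetical vendor id
        data[16:32] = b"made-up disk".ljust(16)  # hypothetical product id
        data[32:36] = b"0001"              # hypothetical revision
        return ("synthetic_data", bytes(data[:alloc_len]))
    # Everything else has no block-layer equivalent.
    return ("unsupported", opcode)

print(handle_passthrough_cdb(bytes([0x28, 0, 0, 0, 0x10, 0, 0, 0, 8, 0])))
print(handle_passthrough_cdb(bytes([0x12, 0, 0, 0, 36, 0])))
```

In neither case does the caller reach the real device's SCSI target, which is the difference from SG_IO on Linux.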
[snip]
> OK, so I think the problem boils down to two components:
>
> 1. virtio-blk isn't developing fast enough. This looks to be a
> fairly easily fixable problem
Agreed, the immediate shortcomings are fixable, though the slowness is
inherent in virtio-blk. It can even be considered a feature, because it
is a consequence of virtio-blk's simplicity.
> 2. Discovery in virtio-blk isn't done properly. Again, this looks
> to be easily fixable.
No, this is not a problem. virtio-blk does discovery very well. The
problems are all with the SG_IO interface:
1. When you create a virtio-blk device on say /dev/sdb, you have more
flexibility than just passing /dev/sdb through to the guest. But if you
use this flexibility, you have no choice but to disable SG_IO altogether
(or leave it enabled, and hope the guest doesn't corrupt its own data
inadvertently).
2. SG_IO is limited to Linux guests, so that non-Linux guests are
limited in practice to the feature set of the Linux block layer.
3. Even on Linux, SG_IO is not reliably a part of the userspace ABI for
virtio disks. That's because it may work or not depending on how
storage has been configured.
4. SG_IO on virtio-blk does not cover non-block SCSI devices.
> Once you fix the above, most of what you're asking for, which is mainly
> SCSI encapsulation for discovery and error handling in the guest for no
> reason I can discern, becomes irrelevant.
SCSI encapsulation is not an end in itself. It just lets you reuse work
on an existing spec rather than making up a new one.
Paolo