[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211005064817-mutt-send-email-mst@kernel.org>
Date: Tue, 5 Oct 2021 07:11:05 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, markver@...ibm.com,
Christian Borntraeger <borntraeger@...ibm.com>,
qemu-devel@...gnu.org, Cornelia Huck <cohuck@...hat.com>,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
Xie Yongji <xieyongji@...edance.com>, stefanha@...hat.com,
Raphael Norwitz <raphael.norwitz@...anix.com>
Subject: Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05, 2021 at 12:43:03PM +0200, Halil Pasic wrote:
> On Mon, 4 Oct 2021 05:07:13 -0400
> "Michael S. Tsirkin" <mst@...hat.com> wrote:
>
> > On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote:
> > > On Sat, 2 Oct 2021 14:13:37 -0400
> > > "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > >
> > > > > Anyone else have an idea? This is a nasty regression; we could revert the
> > > > > patch, which would remove the symptoms and give us some time, but that
> > > > > doesn't really feel right, I'd do that only as a last resort.
> > > >
> > > > Well we have Halil's hack (except I would limit it
> > > > to only apply to BE, only do devices with validate,
> > > > and only in modern mode), and we will fix QEMU to be spec compliant.
> > > > Between these why do we need any conditional compiles?
> > >
> > > We don't. As I stated before, this hack is flawed because it
> > > effectively breaks fencing features by the driver with QEMU. Some
> > > features can not be unset after once set, because we tend to try to
> > > enable the corresponding functionality whenever we see a write
> > > features operation with the feature bit set, and we don't disable, if a
> > > subsequent features write operation stores the feature bit as not set.
> >
> > Something to fix in QEMU too, I think.
>
> Possibly. But it is the same situation: it probably has a long
> history. And it may even make some sense. The obvious trigger for
> doing the conditional initialization for modern is the setting of
> FEATURES_OK. The problem is, legacy doesn't do FEATURES_OK. So we would
> need a different trigger.
>
> >
> > > But it looks like VIRTIO_1 is fine to get cleared afterwards.
> >
> > We'd never clear it though - why would we?
> >
>
> Right.
>
> > > So my hack
> > > should actually look like posted below, modulo conditions.
> >
> >
> > Looking at it some more, I see that vhost-user actually
> > does not send features to the backend until FEATURES_OK.
>
> I.e. the hack does not work for transitional vhost-user devices,
> but it doesn't break them either.
>
> Furthermore, I believe there is not much we can do to support
> transitional devices with vhost-user and similar, without extending
> the protocol. The transport specific detection idea would need a new
> vhost-user thingy to tell the device what has been figured
> out, right?
>
> In theory modern only could work, if the backends were paying extra
> attention to endianness, instead of just assuming that the code is
> running little-endian.
I think a reasonable thing is to send SET_FEATURES before each
GET_CONFIG, to tell backend which format is expected.
> > However, the code in contrib for vhost-user-blk at least seems
> > broken wrt endian-ness ATM.
>
> Agree. For example config is native endian ATM AFAICT.
>
> > What about other backends though?
>
> I think whenever the config is owned and managed by the vhost-backend
> we have a problem with transitional. And we don't have everything in
> the protocol to deal with this problem.
>
> I didn't check modern for the different vhost-user backends. I don't
> think we recommend our users on s390 to use those. My understanding
> of the use-cases is far form complete.
>
> > Hard to be sure right?
>
> I agree.
>
> > Cc Raphael and Stefan so they can take a look.
> > And I guess it's time we CC'd qemu-devel too.
> >
> > For now I am beginning to think we should either revert or just limit
> > validation to LE and think about all this some more. And I am inclining
> > to do a revert.
>
> I'm fine with either of these as a quick fix, but we will eventually have
> to find a solution. AFAICT this solution works for the s390 setups we
> care about the most, but so would a revert.
The reason I like this one is that it also fixes MTU for virtio net,
and that one we can't really revert.
>
>
> > These are all hypervisors that shipped for a long time.
> > Do we need a flag for early config space access then?
>
> You mean a feature bit? I think it is a good idea even if
> it weren't strictly necessary. We will have a behavior change
> for some devices, and I think the ability to detect those
> is valuable.
>
> Your spec change proposal, makes it IMHO pretty clear, that
> we are changing our understanding of how transitional should work.
> Strictly, transitional is not a normative part of the spec AFAIU,
> but still...
>
>
> >
> >
> >
> > >
> > > Regarding the conditions I guess checking that driver_features has
> > > F_VERSION_1 already satisfies "only modern mode", or?
> >
> > Right.
> >
> > > For now
> > > I've deliberately omitted the has verify and the is big endian
> > > conditions so we have a better chance to see if something breaks
> > > (i.e. the approach does not work). I can add in those extra conditions
> > > later.
> >
> > Or maybe if we will go down that road just the verify check (for
> > performance). I'm a bit unhappy we have the extra exit but consistency
> > seems more important.
> >
>
> I'm fine either way. The extra exit is only for the initialization and
> one per 1 device, I have no feeling if this has a measurable performance
> impact.
>
>
> > >
> > > --------------------------8<---------------------
> > >
> > > From: Halil Pasic <pasic@...ux.ibm.com>
> > > Date: Thu, 30 Sep 2021 02:38:47 +0200
> > > Subject: [PATCH] virtio: write back feature VERSION_1 before verify
> > >
> > > This patch fixes a regression introduced by commit 82e89ea077b9
> > > ("virtio-blk: Add validation for block size in config space") and
> > > enables similar checks in verify() on big endian platforms.
> > >
> > > The problem with checking multi-byte config fields in the verify
> > > callback, on big endian platforms, and with a possibly transitional
> > > device is the following. The verify() callback is called between
> > > config->get_features() and virtio_finalize_features(). That we have a
> > > device that offered F_VERSION_1 then we have the following options
> > > either the device is transitional, and then it has to present the legacy
> > > interface, i.e. a big endian config space until F_VERSION_1 is
> > > negotiated, or we have a non-transitional device, which makes
> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> > > thus presents a little endian config space. Because at this point we
> > > can't know if the device is transitional or non-transitional, we can't
> > > know do we need to byte swap or not.
> >
> > Well we established that we can know. Here's an alternative explanation:
>
>
> I thin we established how this should be in the future, where a transport
> specific mechanism is used to decide are we operating in legacy mode or
> in modern mode. But with the current QEMU reality, I don't think so.
> Namely currently the switch native-endian config -> little endian config
> happens when the VERSION_1 is negotiated, which may happen whenever
> the VERSION_1 bit is changed, or only when FEATURES_OK is set
> (vhost-user).
>
> This is consistent with device should detect a legacy driver by checking
> for VERSION_1, which is what the spec currently says.
>
> So for transitional we start out with native-endian config. For modern
> only the config is always LE.
>
> The guest can distinguish between a legacy only device and a modern
> capable device after the revision negotiation. A legacy device would
> reject the CCW.
>
> But both a transitional device and a modern only device would accept
> a revision > 0. So the guest does not know for ccw.
>
Sorry I was talking about the host not the guest.
when host sees revision > 0 it knows it's a modern guest
and so config should be LE.
>
> >
> > The virtio specification virtio-v1.1-cs01 states:
> >
> > Transitional devices MUST detect Legacy drivers by detecting that
> > VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
> > This is exactly what QEMU as of 6.1 has done relying solely
> > on VIRTIO_F_VERSION_1 for detecting that.
> >
> > However, the specification also says:
> > driver MAY read (but MUST NOT write) the device-specific
> > configuration fields to check that it can support the device before
> > accepting it.
>
> s/ accepting it/setting FEATURES_OK
> >
> > In that case, any device relying solely on VIRTIO_F_VERSION_1
>
> s/any device/any transitional device/
>
> > for detecting legacy drivers will return data in legacy format.
>
> E.g. virtio-crypto does not support legacy, and thus it is always
> providing an LE config space.
>
> > In particular, this implies that it is in big endian format
> > for big endian guests. This naturally confuses the driver
> > which expects little endian in the modern mode.
> >
> > It is probably a good idea to amend the spec to clarify that
> > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
> > is complete. However, we already have regression so let's
> > try to address it.
> >
> >
>
> I can take the new description without any changes if you like.
> I care
> more about getting a decent fix, than a perfect patch description. Should
> I send out a non-RFC with that implements the proposed changes?
Also add a shortened version to the code comment pls.
>
> > >
> > > The virtio spec explicitly states that the driver MAY read config
> > > between reading and writing the features so saying that first accessing
> > > the config before feature negotiation is done is not an option. The
> > > specification ain't clear about setting the features multiple times
> > > before FEATURES_OK, so I guess that should be fine to set F_VERSION_1
> > > since at this point we already know that we are about to negotiate
> > > F_VERSION_1.
> > >
> > > I don't consider this patch super clean, but frankly I don't think we
> > > have a ton of options. Another option that may or man not be cleaner,
> > > but is also IMHO much uglier is to figure out whether the device is
> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> > > according tho what we have figured out, hoping that the characteristics
> > > of the device didn't change.
> >
> > An empty line before tags.
> >
>
> Sure!
>
> > > Signed-off-by: Halil Pasic <pasic@...ux.ibm.com>
> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> > > Reported-by: markver@...ibm.com
> >
> > Let's add more commits that are affected. E.g. virtio-net with MTU
> > feature bit set is affected too.
> >
> > So let's add Fixes tag for:
> > commit 14de9d114a82a564b94388c95af79a701dc93134
> > Author: Aaron Conole <aconole@...hat.com>
> > Date: Fri Jun 3 16:57:12 2016 -0400
> >
> > virtio-net: Add initial MTU advice feature
> >
>
> I believe drv->probe(dev) is called after the real finalize, so
> that access should be fine or?
>
> Don't we just have to look out for verify?
you mean validate.
> Isn't the problematic commit fe36cbe0671e ("virtio_net: clear MTU when
> out of range")?
exactly.
> The problem with commit 14de9d114a82a is that the device won't know,
> the driver didn't take the advice (for the MTU because it deemed its
> value invalid). But that doesn't really hurt us.
> On the other hand with fe36cbe0671e we may deem a valid MTU in the
> config space invalid because of the endiannes mess-up. I that case
> we would discard a perfectly good MTU advice.
right.
>
> > I think that's all, but pls double check me.
>
>
> Looks good!
> $ git grep -e '\.validate' -- '*virtio*'
> drivers/block/virtio_blk.c: .validate = virtblk_validate,
> drivers/firmware/arm_scmi/virtio.c: .validate = scmi_vio_validate,
> drivers/net/virtio_net.c: .validate = virtnet_validate,
> drivers/virtio/virtio_balloon.c: .validate = virtballoon_validate,
> sound/virtio/virtio_card.c: .validate = virtsnd_validate,
>
> But only blk and net access config space from validate.
>
> >
> >
> > > ---
> > > drivers/virtio/virtio.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index 0a5b54034d4b..2b9358f2e22a 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d)
> > > driver_features_legacy = driver_features;
> > > }
> > >
> > > + /* Write F_VERSION_1 feature to pin down endianness */
> > > + if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) {
> > > + dev->features = (1ULL << VIRTIO_F_VERSION_1);
> > > + dev->config->finalize_features(dev);
> > > + }
> > > +
> > > if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> > > dev->features = driver_features & device_features;
> > > else
> > > --
> > > 2.31.1
> > >
> > >
> > >
> > >
> > >
> > >
> >
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@...ts.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Powered by blists - more mailing lists