Message-ID: <20161208181924-mutt-send-email-mst@kernel.org>
Date: Thu, 8 Dec 2016 18:30:30 +0200
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Cao jin <caoj.fnst@...fujitsu.com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
izumi.taku@...fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery
On Thu, Dec 08, 2016 at 10:46:59PM +0800, Cao jin wrote:
>
>
> On 12/06/2016 11:35 PM, Alex Williamson wrote:
> > On Tue, 6 Dec 2016 18:46:04 +0800
> > Cao jin <caoj.fnst@...fujitsu.com> wrote:
> >
> >> On 12/06/2016 12:59 PM, Alex Williamson wrote:
> >>> On Tue, 6 Dec 2016 05:55:28 +0200
> >>> "Michael S. Tsirkin" <mst@...hat.com> wrote:
> >>>
> >>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
> >>>>> If you're going to take the lead for these AER patches, I would
> >>>>> certainly suggest that understanding the reasoning behind the bus reset
> >>>>> behavior is a central aspect to this series. This effort has dragged
> >>>>> out for nearly two years and I apologize, but I don't really have a lot
> >>>>> of patience for rehashing some of these issues if you're not going to
> >>>>> read the previous discussions or consult with your colleagues to
> >>>>> understand how we got to this point. If you want to challenge some of
> >>>>> the design points, that's great, it could use some new eyes, but please
> >>>>> understand how we got here first.
> >>>>
> >>>> Well I'm guessing Cao jin here isn't the only one not
> >>>> willing to plough through all historical versions of the patchset
> >>>> just to figure out the motivation for some code.
> >>>>
> >>>> Including a summary of a high level architecture couldn't hurt.
> >>>>
> >>>> Any chance of writing such? Alternatively, we can try to build it as
> >>>> part of this thread. Shouldn't be hard as it seems somewhat
> >>>> straight-forward on the surface:
> >>>>
> >>>> - detect link error on the host, don't reset link as we would normally do
> >>>
> >>> This is actually a new approach that I'm not sure I agree with. By
> >>> skipping the host directed link reset, vfio is taking responsibility
> >>> for doing this, but then we just assume the user will do it. I have
> >>> issues with this.
> >>>
> >>> The previous approach was to use the error detected notifier to block
> >>> access to the device, allowing the host to perform the link reset. A
> >>> subsequent notification in the AER process released the user access
> >>> which allowed the user AER process to proceed. This did result in both
> >>> a host directed and a guest directed link reset, but other than
> >>> coordinating the blocking of the user process during host reset, that
> >>> hasn't been brought up as an issue previously.
> >>>
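(To make the description above concrete for readers new to the thread:
the earlier approach could be sketched roughly like this. This is not
the actual v9 code; the lookup helper, the "aer_in_progress" flag and
the "aer_wait" waitqueue are invented names, for illustration only.)

/* error_detected marks the device, blocking user config access;    */
/* the host AER core then performs the link reset; resume unblocks. */
static pci_ers_result_t sketch_err_detected(struct pci_dev *pdev,
					    pci_channel_state_t state)
{
	struct vfio_pci_device *vdev = sketch_get_vdev(pdev); /* hypothetical lookup */

	atomic_set(&vdev->aer_in_progress, 1);	/* start blocking user access */
	return PCI_ERS_RESULT_CAN_RECOVER;
}

static void sketch_resume(struct pci_dev *pdev)
{
	struct vfio_pci_device *vdev = sketch_get_vdev(pdev); /* hypothetical lookup */

	atomic_set(&vdev->aer_in_progress, 0);	/* host link reset is done */
	wake_up(&vdev->aer_wait);		/* release blocked user accesses */
}

/* and the user config access path would then wait:                      */
/*	wait_event(vdev->aer_wait, !atomic_read(&vdev->aer_in_progress)); */
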
> >>
> >> As far as I can tell, tests on previous versions didn't bring up
> >> issues because we didn't test them completely. As far as I know,
> >> before August of this year we didn't even have a cable connected to
> >> the NIC, let alone the NIC connected to a gateway.
> >
> > Lack of testing has been a significant issue throughout the development
> > of this series.
> >
> >> Even after I fixed the guest oops issue in the igb driver that Alex
> >> found in v9, v9 still does not work in my test. And in my test,
> >> disabling the link reset (in the host) in the AER core for vfio-pci
> >> is the most significant step in getting the test to pass.
> >
> > But is it the correct step? I'm not convinced. Why did blocking guest
> > access not work? How do you plan to manage vfio taking the
> > responsibility to perform a bus reset when you don't know whether QEMU
> > is the user of the device or whether the user supports AER recovery?
> >
>
> Maybe we don't have enough proof of correctness yet, but I think I did
> find some facts showing that the link reset in the host is a big
> problem, and they can answer part of the questions above.
>
> 1st, some thoughts:
> In pci-error-recovery.txt and do_recovery() in the kernel tree, we can
> see that a recovery consists of several steps (callbacks); link reset
> is one of them, and apart from the link reset the other steps seem
> rather device specific. In our case, both host & guest will do
> recovery. I think the host recovery is actually some kind of fake
> recovery: see the vfio-pci driver's error_detected & resume callbacks,
> they don't do anything special, they mainly signal the error to the
> user, yet the link reset inside this "fake recovery" does some serious
> work. In other words, the host does the recovery incompletely, so I
> was thinking: why not just drop the incomplete host recovery (drop the
> link reset) for vfio-pci, and let the guest take care of the whole
> serious recovery? This is part of the reason why my version looks the
> way it does. But yes, I admit the issue Alex mentioned: vfio can't
> guarantee that the user will do a bus reset; this is an issue I will
> keep looking for a solution to.
>
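(For reference, the vfio-pci callbacks mentioned above really are
minimal. Roughly -- this is a simplified sketch from memory, not a
verbatim copy of drivers/vfio/pci/vfio_pci.c -- they do no more than:)

#include <linux/pci.h>
#include <linux/vfio.h>
#include <linux/eventfd.h>

/* No device-specific recovery here: just wake up userspace (QEMU). */
static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
						  pci_channel_state_t state)
{
	struct vfio_device *device = vfio_device_get_from_dev(&pdev->dev);
	struct vfio_pci_device *vdev = vfio_device_data(device);

	mutex_lock(&vdev->igate);
	if (vdev->err_trigger)
		eventfd_signal(vdev->err_trigger, 1);	/* signal the error eventfd */
	mutex_unlock(&vdev->igate);
	vfio_device_put(device);

	return PCI_ERS_RESULT_CAN_RECOVER;
}

static const struct pci_error_handlers vfio_err_handlers = {
	.error_detected = vfio_pci_aer_err_detected,
	/* a .resume hook, if present, similarly does nothing device specific */
};
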
> 2nd, some facts and analysis from testing:
> On the host, the relationship between time and the behaviour of each
> component looks roughly as follows:
>
> (time flows downward; the HW resetting period *1* overlaps *2* and *3*)
>
>   HW           | host kernel      | qemu               | guest kernel
>  --------------+------------------+--------------------+---------------
>                | (error recovery) |                    |
>                | vfio-pci's       |                    |
>                | error_detected   |                    |
>                |   error notify --+--> vfio_err_       |
>                |   via eventfd    |    notifier_       |
>   resetting <-+-- link reset      |    handler         |
>   *1*         |                   |    pass aer        |
>                | vfio-pci's       |    to guest        |
>                | resume           |    *2* ------------+--> guest recovery
>                | (blocking end)   |                    |    process
>                |                  |                    |    (include
>                |                  |                    |    register access)
>                |                  |                    |    *3*
>
> Now let me try to answer: why did blocking guest access not work?
> Some important factors:
> 1. Host recovery doesn't do anything special except the error
> notification, so it may be executed very fast.
> 2. The hardware resetting time is uncertain; from PCIe spec 6.6.1 I
> guess it needs many ms, which is pretty long?
>
> Some facts found in the v9 test (v9 blocks config writes, not reads,
> during host recovery):
> 1. Reading the uncorrectable error register in vfio_err_notifier_handler
> sometimes returns the correct value and sometimes returns an invalid
> value (all F's).
>
> So I am thinking: if the blocking on the host end finishes early, and
> *2* and *3* run in parallel with *1*, then the way v9 blocks guest
> access may not work.
I don't think this answers Alex's question. He is simply asking
which kinds of errors we are recovering from, and why FLR is
sufficient. All this info is completely lacking in the
current Linux code, too. Why does it reset the link?
The Express spec certainly does not say we should.
Link reset is the strongest kind of reset we can do, though,
so maybe the thinking goes: let's just do it.
You would like to weaken this, but it's hard to
be sure a weaker reset is enough; we are dealing with misbehaving
hardware, after all.
So what would help is a list:
- error type
- recovery required
- how to do it in a VM
As for blocking guest access, I'm not sure why it's important
to do. Guests can put devices into bad states, that's a given.
Why isn't it enough to tell the guest that the device is in a bad state?
I think it will then try to avoid accesses; if it does not,
it is not supposed to do anything too bad.
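To make the "tell the guest" option concrete, the QEMU side could do
something along these lines. This is a sketch only, not the posted
series; it assumes the device exposes an AER capability to the guest,
and the status bit used below is just a placeholder.

/* sketch for hw/vfio/pci.c -- not the actual patch */
static void vfio_err_notifier_handler(void *opaque)
{
    VFIOPCIDevice *vdev = opaque;
    PCIDevice *dev = &vdev->pdev;
    PCIEAERErr err = {
        .status    = PCI_ERR_UNC_COMP_ABORT,  /* placeholder status bit */
        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
    };

    if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
        return;
    }

    /*
     * Tell the guest the device is in a bad state instead of
     * stopping the VM: inject an uncorrectable AER error and let
     * the guest's AER core drive the recovery.
     */
    pcie_aer_inject_error(dev, &err);
}
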
> --
> Sincerely,
> Cao jin
>