linux-kernel - Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58CFD01F.1060508@cn.fujitsu.com>
Date:   Mon, 20 Mar 2017 20:50:39 +0800
From:   Cao jin <caoj.fnst@...fujitsu.com>
To:     Alex Williamson <alex.williamson@...hat.com>
CC:     <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
        <qemu-devel@...gnu.org>, <mst@...hat.com>,
        <izumi.taku@...fujitsu.com>
Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
 fatal error

Sorry for late.

On 03/14/2017 06:06 AM, Alex Williamson wrote:
> On Mon, 27 Feb 2017 15:28:43 +0800
> Cao jin <caoj.fnst@...fujitsu.com> wrote:
> 
>> 0. What happens now (PCIE AER only)
>>    Fatal errors cause a link reset.
>>    Non fatal errors don't.
>>    All errors stop the VM eventually, but not immediately
>>    because it's detected and reported asynchronously.
>>    Interrupts are forwarded as usual.
>>    Correctable errors are not reported to guest at all.
>>    Note: PPC EEH is different. This focuses on AER.
> 
> Perhaps you're only focusing on AER, but don't the error handlers we're
> using support both AER and EEH generically?  I don't think we can
> completely disregard how this affects EEH behavior, if at all.
> 

After taking a rough look at the EEH,  find that EEH always feed
error_detected with pci_channel_io_frozen, from perspective of
error_detected, EEH is not affected.  

I am not sure about a question: when assign devices in spapr host,
should all functions/devices in a PE be bound to vfio? I am kind of
confused about the relationship between a PE & a tce iommu group

>>
>> 1. Correctable errors
>>    There is no need to report these to guest. So let's not.
> 
> What does this patch change to make this happen?  I don't see
> anything.  Was this always the case?  No change?
> 

yes, no change on correctable error.

>>
>> 2. Fatal errors
>>    It's not easy to handle them gracefully since link reset
>>    is needed. As a first step, let's use the existing mechanism
>>    in that case.
> 
> Ok, so no change here either.
> 
>> 2. Non-fatal errors
>>    Here we could make progress by reporting them to guest
>>    and have guest handle them.
> 
> In practice, what actual errors do we expect userspace to see as
> non-fatal errors? It would be useful for the commit log to describe
> the actual benefit we're going to see by splitting out non-fatal errors
> for the user (not always a guest) to see separately.  Justify that this
> is actually useful.
> 
>>
>>    Issues:
>>    a. this behaviour should only be enabled with new userspace,
>>       old userspace should work without changes.
>>
>>       Suggestion: One way to address this would be to add a new eventfd
>>       non_fatal_err_trigger. If not set, invoke err_trigger.
> 
> This outline format was really more useful for Michael to try to
> generate discussion, for a commit log, I'd much rather see a definitive
> statement such as:
> 
>  "To maintain backwards compatibility with userspace, non-fatal errors
>  will continue to trigger via the existing error interrupt index if a
>  non-fatal signaling mechanism has not been registered."
> 
>>    b. drivers are supposed to stop MMIO when error is reported,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>       Patch below implements Suggestion 1.
>>
>>       Note that although this is really against the documentation, which
>>       states error_detected() is the point at which the driver should quiesce
>>       the device and not touch it further (until diagnostic poking at
>>       mmio_enabled or full access at resume callback).
>>
>>       Fixing this won't be easy. However, this is not a regression.
>>
>>       Also note this does nothing about interrupts, documentation
>>       suggests returning IRQ_NONE until reset.
>>       Again, not a regression.
> 
> So again, no change here.  I'm not sure what this adds to the commit
> log, perhaps we can reference this as a link to Michael's original
> proposal.
>  
>>    c. PF driver might detect that function is completely broken,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>       Patch below implements Suggestion 2.
> 
> This needs more description and seems a bit misleading.  This patch
> adds a slot_reset handler, such that if the slot is reset, we notify
> the user, essentially promoting the non-fatal error to fatal.  But what
> condition gets us to this point?  AIUI, AER is a voting scheme and if
> any driver affected says they need a reset, everyone gets a reset.  So
> the PF driver we're talking about here is not vfio-pci and it's not the
> user, the user has no way to signal that the device is completely
> broken, this only handles the case of other collateral devices with
> native host drivers that might signal this, right?
> 

Yes, same understanding as you, if I don't miss something from michael.

> It seems like this is where this patch has the greatest exposure to
> regressions.  If we take the VM use case, previously we could have a
> non-AER aware guest and the hypervisor could stop the VM on all
> errors.  Now the hypervisor might support the distinction between fatal
> and non-fatal, but the guest may still not have AER support.  That
> doesn't imply a problem with this approach, the user (hypervisor) would
> be at fault for any difference in handling in that case.
> 

>>  
>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
>> +{
>> +	struct vfio_pci_device *vdev;
>> +	struct vfio_device *device;
>> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
>> +
>> +	device = vfio_device_get_from_dev(&pdev->dev);
>> +	if (!device)
>> +		goto err_dev;
>> +
>> +	vdev = vfio_device_data(device);
>> +	if (!vdev)
>> +		goto err_data;
>> +
>> +	mutex_lock(&vdev->igate);
>> +
>> +	if (vdev->err_trigger)
>> +		eventfd_signal(vdev->err_trigger, 1);
> 
> What about the case where the user has not registered for receiving
> non-fatal errors, now we send an error signal on both error_detected
> and slot_reset.  Is that useful/desirable?
> 

Not desirable, but seems not harmful, guest user will stop anyway. How
to avoid this case gracefully seems not easy.

-- 
Sincerely,
Cao jin