linux-kernel - Re: [PATCH] vfio/pci: Support error recovery

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58465F0B.5040401@cn.fujitsu.com>
Date:   Tue, 6 Dec 2016 14:47:39 +0800
From:   Cao jin <caoj.fnst@...fujitsu.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
CC:     Alex Williamson <alex.williamson@...hat.com>,
        <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
        <izumi.taku@...fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery



On 12/06/2016 11:46 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 01, 2016 at 09:40:23PM +0800, Cao jin wrote:
>>
>>
>> On 12/01/2016 12:51 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 30, 2016 at 09:04:13PM -0700, Alex Williamson wrote:
>>>> On Sun, 27 Nov 2016 19:34:17 +0800
>>>> Cao jin <caoj.fnst@...fujitsu.com> wrote:
>>>>
>>
>>>>> @@ -1187,10 +1200,30 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>>>>  		return PCI_ERS_RESULT_DISCONNECT;
>>>>>  	}
>>>>>  
>>>>> +	/* get device's uncorrectable error status as soon as possible,
>>>>> +	 * and signal it to user space. The later we read it, the possibility
>>>>> +	 * the register value is mangled grows. */
>>>>
>>>> Bad comment style (throughout).
>>>>
>>>>> +	aer_cap_offset = pci_find_ext_capability(vdev->pdev, PCI_EXT_CAP_ID_ERR);
>>>>> +	ret = pci_read_config_dword(vdev->pdev, aer_cap_offset + 
>>>>> +                                    PCI_ERR_UNCOR_STATUS, &uncor_status);
>>>>> +        if (ret)
>>>>> +                return PCI_ERS_RESULT_DISCONNECT;
>>>>> +
>>>>> +	pr_err("device %d got AER detect notification. uncorrectable error status = 0x%x\n", pdev->devfn, uncor_status);//to be removed
>>>>>  	mutex_lock(&vdev->igate);
>>>>> +    
>>>>> +	vdev->aer_recovering = true;
>>>>> +	reinit_completion(&vdev->aer_error_completion);
>>>>> +
>>>>> +	/* suspend config space access from user space,
>>>>> +	 * when vfio-pci's error recovery process is on */
>>>>> +	pci_cfg_access_trylock(vdev->pdev);
>>>>>  
>>>>> -	if (vdev->err_trigger)
>>>>> -		eventfd_signal(vdev->err_trigger, 1);
>>>>> +	if (vdev->err_trigger && uncor_status) {
>>>>> +		pr_err("device %d signal uncor status to user space", pdev->devfn);//may be removed
>>>>> +		/* signal uncorrectable error status to user space */
>>>>> +		eventfd_signal(vdev->err_trigger, uncor_status);
>>>
>>> What does this mean that we trigger with uncor_status as opposed to 1?
>>>
>>
>> I guess you missed the changelog in qemu patchset's cover letter, see if
>> it helps(copy from cover letter):
>>
>> 5. Change what eventfd signals(check vfio driver patch). Before,
>> vfio-pci driver add 1 to the counter, which doesn't have meaning, just
>> for notification. Now, vfio-pci's error detected handler read the
>> uncorrectable error status and signal it to qemu.
> 
> I don't think you can use an eventfd to send values like this.
> 
> eventfd does a sum of these values, so sending e.g.
> value 2 will look the same as sending value 1 twice.
> 

Yes, eventfd has a uint_64 counter, and eventfd_signal does a sum.

manpage of "eventfd" says:

The semantics of read(2) depend on whether the eventfd counter currently
has a nonzero value and whether the EFD_SEMAPHORE flag was specified
when creating the eventfd file descriptor:

  *  If EFD_SEMAPHORE was not specified and the eventfd counter has a
nonzero value, then a read(2) returns 8 bytes containing that value, and
the counter's value is reset to zero.

......

  *  If the eventfd counter is zero at the time of the call to  read(2),
 then the call either blocks until the counter becomes  nonzero (at
which time, the read(2) proceeds as described above) or fails with the
error EAGAIN if the file descriptor has been made nonblocking.

And in actual tests, the debug line I add in vfio_err_notifier_handler
shows[*]: qemu read this value right every time, the same as in vfio-pci
driver.

quick reference:
[*]https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg04831.html

> 
>> Why? When testing previous version(host aer driver still reset link),
>> found that there is quite possibility that register reading returns the
>> invalid value 0xFFFFFFFF, which results in all(2 in my environment)
>> qemu's vfio-pci function send AER to root port while I only inject error
>> into one function. This is unreasonable, and I think it is the result of
>> reading during device reset.
>>
>> Previous patch does considered to find the
>> real function who has error happened, but won't work under the situation
>> I found. So move the register reading as early as possible would be the
>> nature thoughts, and it is moved into vfio-pci driver. Although now
>> reset_link is disabled in aer driver, get the uncor error status as
>> early as possible still make sense, for example: if user space driver
>> does device reset once receiving the error notification, and then read
>> register.
> 
> I had trouble understanding the above. Let me ask you this:
> should we try to avoid triggering uncorrectable errors?
> Aren't any such errors likely to crash guests?
> 

Sorry I don't quite understand why your questions are raised, it seems
just describing the current situation without my AER patch: when assign
device via vfio, we don't handle uncorrectable error, just
vm_stop()(this is what you mean "crash guests", right?), this is the
contents of current vfio_err_notifier_handler().

Without our AER patches, the answer to your questions are YES. But the
purpose of our patch is to fix your questions. Not sure my answer helps
you, but feel free to ask if still any questions.


-- 
Sincerely,
Cao jin