linux-kernel - Re: [PATCH RFC] vfio error recovery: kernel support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5881E2C2.2060502@cn.fujitsu.com>
Date:   Fri, 20 Jan 2017 18:13:22 +0800
From:   Cao jin <caoj.fnst@...fujitsu.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
CC:     <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
        <qemu-devel@...gnu.org>, <izumi.taku@...fujitsu.com>,
        <alex.williamson@...hat.com>
Subject: Re: [PATCH RFC] vfio error recovery: kernel support



On 01/20/2017 04:16 AM, Michael S. Tsirkin wrote:
> This is a design and an initial patch for kernel side for AER
> support in VFIO.
> 
> 0. What happens now (PCIE AER only)
>    Fatal errors cause a link reset.
>    Non fatal errors don't.
>    All errors stop the VM eventually, but not immediately
>    because it's detected and reported asynchronously.
>    Interrupts are forwarded as usual.
>    Correctable errors are not reported to guest at all.
>    Note: PPC EEH is different. This focuses on AER.
> 
> 1. Correctable errors
>    I don't see a need to report these to guest. So let's not.
> 
> 2. Fatal errors
>    It's not easy to handle them gracefully since link reset
>    is needed. As a first step, let's use the existing mechanism
>    in that case.
>    
> 2. Non-fatal errors
>    Here we could make progress by reporting them to guest
>    and have guest handle them.
>    Issues:
>     a. this behaviour should only be enabled with new userspace
>        old userspace should work without changes
>     Suggestion: One way to address this would be to add a new eventfd
>     non_fatal_err_trigger. If not set, invoke err_trigger.
> 
>     b. drivers are supposed to stop MMIO when error is reported
>     if vm keeps going, we will keep doing MMIO/config
>     Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway,
>     so we are not making things much worse
>     Suggestion 2: try to stop MMIO/config, resume on resume call
> 
>     Patch below implements Suggestion 1.
> 
>     c. PF driver might detect that function is completely broken,
>     if vm keeps going, we will keep doing MMIO/config
>     Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway,
>     so we are not making things much worse
>     Suggestion 2: detect this and invoke err_trigger to stop VM
> 
>     Patch below implements Suggestion 2.
> 
> Aside: we currently return PCI_ERS_RESULT_DISCONNECT when device
> is not attached. This seems bogus, likely based on the confusing name.
> We probably should return PCI_ERS_RESULT_CAN_RECOVER.
> 
> The following patch does not change that.
> 
> Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> 
> ---
> 
> The patch is completely untested. Let's discuss the design first.
> Cao jin, if this is deemed acceptable please take it from here.
> 

Ok, thanks very much.

> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index dce511f..fdca683 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1292,7 +1292,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>  	mutex_lock(&vdev->igate);
>  
> -	if (vdev->err_trigger)
> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> +		eventfd_signal(vdev->err_trigger, 1);
> +	else if (vdev->err_trigger)
>  		eventfd_signal(vdev->err_trigger, 1);
>  
>  	mutex_unlock(&vdev->igate);
> @@ -1302,8 +1304,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev,
> +						pci_channel_state_t state)
> +{
> +	struct vfio_pci_device *vdev;
> +	struct vfio_device *device;
> +
> +	device = vfio_device_get_from_dev(&pdev->dev);
> +	if (!device)
> +		goto err_dev;
> +
> +	vdev = vfio_device_data(device);
> +	if (!vdev)
> +		goto err_dev;
> +
> +	mutex_lock(&vdev->igate);
> +
> +	if (vdev->err_trigger)
> +		eventfd_signal(vdev->err_trigger, 1);
> +
> +	mutex_unlock(&vdev->igate);
> +
> +	vfio_device_put(device);
> +
> +err_data:
> +	vfio_device_put(device);
> +err_dev:
> +	return PCI_ERS_RESULT_RECOVERED;
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>  	.error_detected = vfio_pci_aer_err_detected,
> +	.slot_reset = vfio_pci_aer_slot_reset,
>  };
>  

if .slot_reset wants to be called, .error_detected should return
PCI_ERS_RESULT_NEED_RESET, as pci-error-recovery.txt said, so does code.

Is .slot_reset now just a copy of .error_detected and we are going do
some tricks here? or else don't get why .slot_reset signal user again.

>  static struct pci_driver vfio_pci_driver = {
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045..e883db5 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>  					       count, flags, data);
>  }
>  
> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
> +				    unsigned index, unsigned start,
> +				    unsigned count, uint32_t flags, void *data)
> +{
> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
> +		return -EINVAL;
> +
> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
> +					       count, flags, data);
> +}
> +
>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>  				    unsigned index, unsigned start,
>  				    unsigned count, uint32_t flags, void *data)
> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>  			break;
>  		}
>  		break;
> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> +			if (pci_is_pcie(vdev->pdev))
> +				func = vfio_pci_set_err_trigger;

s/vfio_pci_set_err_trigger/vfio_pci_set_non_fatal_err_trigger

> +			break;
> +		}
> +		break;
>  	case VFIO_PCI_REQ_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index f37c73b..c27a507 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>  	struct pci_saved_state	*pci_saved_state;
>  	int			refcnt;
>  	struct eventfd_ctx	*err_trigger;
> +	struct eventfd_ctx	*non_fatal_err_trigger;
>  	struct eventfd_ctx	*req_trigger;
>  	struct list_head	dummy_resources_list;
>  };
> 

-- 
Sincerely,
Cao jin