Message-ID: <20180924143247.4f4d656e.cohuck@redhat.com>
Date: Mon, 24 Sep 2018 14:32:47 +0200
From: Cornelia Huck <cohuck@...hat.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: Tony Krowiak <akrowiak@...ux.vnet.ibm.com>,
linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, freude@...ibm.com, schwidefsky@...ibm.com,
heiko.carstens@...ibm.com, borntraeger@...ibm.com,
kwankhede@...dia.com, bjsdjshi@...ux.vnet.ibm.com,
pbonzini@...hat.com, alex.williamson@...hat.com,
pmorel@...ux.vnet.ibm.com, alifm@...ux.vnet.ibm.com,
mjrosato@...ux.vnet.ibm.com, jjherne@...ux.vnet.ibm.com,
thuth@...hat.com, pasic@...ux.vnet.ibm.com, berrange@...hat.com,
fiuczy@...ux.vnet.ibm.com, buendgen@...ibm.com,
frankja@...ux.ibm.com, Tony Krowiak <akrowiak@...ux.ibm.com>
Subject: Re: [PATCH v10 13/26] s390: vfio-ap: zeroize the AP queues

On Mon, 24 Sep 2018 14:16:42 +0200
Halil Pasic <pasic@...ux.ibm.com> wrote:
> On 09/24/2018 01:36 PM, Cornelia Huck wrote:
> > On Wed, 12 Sep 2018 15:43:03 -0400
> > Tony Krowiak <akrowiak@...ux.vnet.ibm.com> wrote:
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index f8b276a..48b1b78 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >> return NOTIFY_OK;
> >> }
> >>
> >> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> >> + unsigned int retry)
> >> +{
> >> + struct ap_queue_status status;
> >> +
> >> + do {
> >> + status = ap_zapq(AP_MKQID(apid, apqi));
> >> + switch (status.response_code) {
> >> + case AP_RESPONSE_NORMAL:
> >> + return 0;
> >> + case AP_RESPONSE_RESET_IN_PROGRESS:
> >> + case AP_RESPONSE_BUSY:
> >> + msleep(20);
> >> + break;
> >> + default:
> >> + /* things are really broken, give up */
> >> + return -EIO;
> >> + }
> >> + } while (retry--);
> >> +
> >> + return -EBUSY;
> >
> > So, this function may either return 0, -EIO (things are really broken),
> > or -EBUSY (still busy after multiple tries)...
> >
> >> +}
> >> +
> >> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >> +{
> >> + int ret;
> >> + int rc = 0;
> >> + unsigned long apid, apqi;
> >> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> >> + matrix_mdev->matrix.apm_max + 1) {
> >> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> >> + matrix_mdev->matrix.aqm_max + 1) {
> >> + ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >> + if (ret)
> >> + rc = ret;
> >
> > ...and here, we return the last error of any of the resets. Two
> > questions:
> >
> > - Does it make sense to continue if we get -EIO? IOW, does "really
> > broken" only refer to a certain tuple and other tuples still can/need
> > to be reset?
>
> I think it does make sense to continue, because IMHO "things are really
> broken" is an overstatement (I mean the APQN invalid case). One could
> argue about whether skipping the current card (adapter) would be justified.

A short comment ("even after -EIO, other devices still need to be
reset") may be helpful here (remember that I don't have any way to
verify this with the architecture).
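
Roughly what I have in mind, as a userspace sketch rather than kernel
code: reset_fn, reset_all_queues() and mock_reset() are made-up names
for illustration, and the plain arrays stand in for the apm/aqm bitmaps
from the patch. The point is only the comment and the fact that the
loop never bails out early.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-queue reset: 0, -EIO or -EBUSY. */
typedef int (*reset_fn)(unsigned int apid, unsigned int apqi);

/*
 * Reset every (apid, apqi) tuple.
 *
 * Assumption taken from the discussion: even after -EIO, the other
 * queues still need to be reset, so an error never ends the loop.
 * Returns 0 if every reset succeeded, otherwise the last error seen.
 */
static int reset_all_queues(const unsigned int *apids, size_t napids,
                            const unsigned int *apqis, size_t napqis,
                            reset_fn reset)
{
	int rc = 0;

	assert(reset != NULL);

	for (size_t i = 0; i < napids; i++) {
		for (size_t j = 0; j < napqis; j++) {
			int ret = reset(apids[i], apqis[j]);

			if (ret)
				rc = ret;	/* remember it, but keep going */
		}
	}

	return rc;
}

/* Mock for experimenting: counts calls and fails one specific tuple. */
static unsigned int mock_calls;

static int mock_reset(unsigned int apid, unsigned int apqi)
{
	mock_calls++;
	return (apid == 1 && apqi == 0) ? -EIO : 0;
}
```

With three adapters and two domains, the mock is still called six
times even though one tuple fails, and the caller sees -EIO.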
>
> IMHO the current code is good enough for the first shot, and we can think
> about fine-tuning it later.

Sure.
>
> > - Is the return code useful in any way, as we don't know which tuple it
> > refers to?
> >
>
> Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY
> is mostly fine given what the architecture says, if we are satisfied with just
> a reset. And the cases behind -EIO might actually be OK too in the same sense.
> My guess is that, based on the return value, client code can tell whether we
> have zeroized all queues or basically just reset them (like rapq). We could log
> that to some debug facility or whatever -- I guess, but at the moment we don't care.

Logging would probably be more useful than the return code, but that
can be added later.
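
To illustrate why, here is a small userspace sketch of per-tuple
logging. Everything in it is hypothetical (struct reset_log, the helper
names, the "card.queue" message format); a real driver would use
whatever debug facility fits instead of stderr. Unlike a single return
code, the log entry says which tuple failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of failed resets; not part of the patch. */
struct reset_log {
	unsigned int failures;	/* number of tuples that failed */
	char last[64];		/* most recent failure message */
};

static void reset_log_init(struct reset_log *log)
{
	assert(log != NULL);
	memset(log, 0, sizeof(*log));
}

/* Record which (apid, apqi) tuple failed and with which error. */
static void log_reset_failure(struct reset_log *log, unsigned int apid,
                              unsigned int apqi, int err)
{
	assert(log != NULL);

	log->failures++;
	snprintf(log->last, sizeof(log->last),
	         "reset of APQN %02x.%04x failed: %d", apid, apqi, err);
	fprintf(stderr, "%s\n", log->last);
}
```

The caller of the reset loop could invoke log_reset_failure() whenever
a single-queue reset returns nonzero, and still return the last error
as the patch does today.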
>
> In the end I think the code is good enough as is, and if we want we can
> improve on it later.

I don't object to that; but this is all a bit confusing to readers
without access to the architecture, so I think a comment or two would
really improve things.
>
> Regards,
> Halil
>
>
> >> + }
> >> + }
> >> +
> >> + return rc;
> >> +}
> >> +
> >> static int vfio_ap_mdev_open(struct mdev_device *mdev)
> >> {
> >> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> >> if (matrix_mdev->kvm)
> >> kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>
> >> + vfio_ap_mdev_reset_queues(mdev);
> >> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> >> &matrix_mdev->group_notifier);
> >> matrix_mdev->kvm = NULL;
> >
>