linux-kernel - Re: [PATCH] s390/vfio-ap: handle response code 01 on queue reset

Message-ID: <483f23b2-0c88-49e2-8b40-7b17cd2b46cc@linux.ibm.com>
Date:   Thu, 7 Dec 2023 10:31:06 -0500
From:   Anthony Krowiak <akrowiak@...ux.ibm.com>
To:     Halil Pasic <pasic@...ux.ibm.com>,
        Harald Freudenberger <freude@...ux.ibm.com>
Cc:     Christian Borntraeger <borntraeger@...ux.ibm.com>,
        linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, jjherne@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com,
        frankja@...ux.ibm.com, imbrenda@...ux.ibm.com, david@...hat.com,
        Reinhard Buendgen <BUENDGEN@...ibm.com>
Subject: Re: [PATCH] s390/vfio-ap: handle response code 01 on queue reset


On 12/6/23 12:17 PM, Halil Pasic wrote:
> On Tue, 05 Dec 2023 09:04:23 +0100
> Harald Freudenberger <freude@...ux.ibm.com> wrote:
>
>> On 2023-12-04 17:15, Halil Pasic wrote:
>>> On Mon, 4 Dec 2023 16:16:31 +0100
>>> Christian Borntraeger <borntraeger@...ux.ibm.com> wrote:
>>>    
>>>> Am 04.12.23 um 15:53 schrieb Tony Krowiak:
>>>>>
>>>>> On 11/29/23 12:12, Christian Borntraeger wrote:
>>>>>> Am 29.11.23 um 15:35 schrieb Tony Krowiak:
>>>>>>> In the current implementation, response code 01 (AP queue number not valid)
>>>>>>> is handled as a default case along with other response codes returned from
>>>>>>> a queue reset operation that are not handled specifically. Barring a bug,
>>>>>>> response code 01 will occur only when a queue has been externally removed
>>>>>>> from the host's AP configuration; in this case, the queue must
>>>>>>> be reset by the machine in order to avoid leaking crypto data if/when the
>>>>>>> queue is returned to the host's configuration. The response code 01 case
>>>>>>> will be handled specifically by logging a WARN message followed by cleaning
>>>>>>> up the IRQ resources.
>>>>>>>   
>>>>>> To me it looks like this can be triggered by the LPAR admin, correct? So it
>>>>>> is not desirable but possible.
>>>>>> In that case I prefer to not use WARN, maybe use dev_warn or dev_err instead.
>>>>>> WARN can be a disruptive event if panic_on_warn is set.
>>>>> Yes, it can be triggered by the LPAR admin. I can't use dev_warn here because we don't have a reference to any device, but I can use pr_warn if that suffices.
>>>> Ok, please use pr_warn then.
>>> Shouldn't we rather make this an 'info'? I mean we probably do not want
>>> people complaining about this condition. Yes, it should be a best
>>> practice to coordinate such things with the guest, and ideally remove
>>> the resource from the guest first. But AFAIU our stack is supposed to
>>> be able to handle something like this. IMHO issuing a warning is an
>>> excessive measure. I know Reinhard and Tony probably disagree with the
>>> last sentence though.
>> Halil, Tony, the thing about info versus warning versus error is our
>> own stuff. Keep in mind that these messages end up in the "debug
>> feature" as FFDC data. So it comes to the point which FFDC data do
>> you/Tony want to see there? It should be enough to explain to a
>> customer what happened without the need to "recreate with higher debug
>> level" if something serious happened. So my private decision table is:
>> 1) is it something serious, something exceptional, something which may
>>    not come up again if tried to recreate? Yes -> make it visible on
>>    the first occurrence as error msg.
>> 2) is it something you want to read when a customer hits it and you
>>    tell him to extract and examine the debug feature data? Yes -> make
>>    it a warning and make sure your debug feature by default records
>>    warnings.
>> 3) still serious, but may flood the debug feature. Good enough and high
>>    probability to reappear on a recreate? Yes -> make it an info
>>    message and live with the risk that you may not be able to explain
>>    to a customer what happened without a recreate and higher debug
>>    level.
>> 4) not 1-3 -> maybe a debug msg but still think about what happens when
>>    a customer enables "debug feature" with highest level. Does it
>>    squeeze out more important stuff? Maybe make it dynamic debug with
>>    pr_debug() (see kernel docu admin-guide/dynamic-debug-howto.rst).
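For reference, the decision table quoted above can be read as a first-match
rule chain. The following is only a rough user-space sketch of that reading;
the enum, the struct, and pick_level() are hypothetical names made up for
illustration and are not part of vfio_ap or of any kernel API.

```c
#include <stdbool.h>

/* Hypothetical severity levels; illustrative names, not kernel API. */
enum msg_level { MSG_ERR, MSG_WARN, MSG_INFO, MSG_DEBUG };

/* Hypothetical event description, one flag per question in the table. */
struct event_traits {
    bool serious_or_hard_to_recreate; /* rule 1 */
    bool wanted_in_ffdc_data;         /* rule 2 */
    bool may_flood_but_recurs;        /* rule 3 */
};

/* Apply the rules in order; the first rule that matches wins. */
enum msg_level pick_level(const struct event_traits *t)
{
    if (t->serious_or_hard_to_recreate)
        return MSG_ERR;   /* visible on the first occurrence */
    if (t->wanted_in_ffdc_data)
        return MSG_WARN;  /* needs warnings recorded by default */
    if (t->may_flood_but_recurs)
        return MSG_INFO;  /* may require a recreate to explain */
    return MSG_DEBUG;     /* rule 4: candidate for dynamic debug */
}
```

Under this reading, the open question in the thread is simply which flags
apply to a queue that was removed from the host configuration.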
> AFAIU the default log level of the S390 Debug Feature is 3 that is
> error. So warnings do not help us there by default. And if we are
> already asking the reporter to crank up the loglevel of the debug
> feature, we can ask the reporter to crank it up to 5, assuming there
> is not too much stuff at log level 5 in that area... How much
> info stuff do we have for the 'ap' debug facility (I hope
> that is the facility used by vfio_ap)?


No info logging is done via the S390 Debug Feature in vfio_ap. There are 
a few warning messages logged solely in the handle_pqap and 
vfio_ap_irq_enable functions. The question is, why are we talking about 
the S390 Debug Feature given the discussion is about using pr_warn 
versus pr_info. What am I missing here?
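
To keep the discussion concrete, here is a rough user-space sketch of the
shape being discussed: response code 01 gets its own case, a plain warning
message is logged instead of a WARN, and the IRQ resources are cleaned up.
This is not the patch itself. The struct, the helper names, the log_warn()
macro standing in for pr_warn(), the value used for the normal response, and
the return-value convention are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative values; only code 01 is taken from the patch description. */
#define RC_NORMAL          0x00u /* assumed "success" code */
#define RC_QUEUE_NOT_VALID 0x01u /* "AP queue number not valid" */

/* Hypothetical stand-in for the per-queue state the driver tracks. */
struct queue_state {
    bool irq_enabled;
    unsigned int last_rc;
};

/* Stand-in for pr_warn(): a plain message, never a WARN() backtrace. */
#define log_warn(...) fprintf(stderr, "warning: " __VA_ARGS__)

/* Hypothetical cleanup helper; here it only clears a flag. */
void free_irq_resources(struct queue_state *q)
{
    q->irq_enabled = false;
}

/* Returns 0 if the queue can be treated as reset, -1 otherwise. */
int handle_reset_response(struct queue_state *q, unsigned int rc)
{
    q->last_rc = rc;
    switch (rc) {
    case RC_NORMAL:
        free_irq_resources(q);
        return 0;
    case RC_QUEUE_NOT_VALID:
        /* Queue left the host config; the machine resets it for us. */
        log_warn("queue not in host AP configuration (rc=%02x)\n", rc);
        free_irq_resources(q);
        return 0;
    default:
        /* Assumed catch-all for codes not handled specifically. */
        log_warn("unexpected reset response code %02x\n", rc);
        return -1;
    }
}
```

Whether the message in the 01 case is emitted at warning or info level is
exactly the open point; in this sketch it would be a one-line change.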


>
> I think log levels are supposed to be primarily about severity, and
> I'm not sure that a queue becoming unavailable in G1 without
> first re-configuring the G2 so that it no longer has access to the
> given queue is really a warning severity thing. IMHO if we
> really do want people complaining about this should they ever see it,
> yes it should be a warning. If not then probably not.
>
> Regards,
> Halil