linux-kernel - Re: [PATCH] s390/vfio-ap: handle response code 01 on queue reset

Message-ID: <d780a15a7c073e7d437f8120a72e8d29@linux.ibm.com>
Date:   Tue, 05 Dec 2023 09:04:23 +0100
From:   Harald Freudenberger <freude@...ux.ibm.com>
To:     Halil Pasic <pasic@...ux.ibm.com>
Cc:     Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Tony Krowiak <akrowiak@...ux.ibm.com>,
        linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, jjherne@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com,
        frankja@...ux.ibm.com, imbrenda@...ux.ibm.com, david@...hat.com,
        Reinhard Buendgen <BUENDGEN@...ibm.com>
Subject: Re: [PATCH] s390/vfio-ap: handle response code 01 on queue reset

On 2023-12-04 17:15, Halil Pasic wrote:
> On Mon, 4 Dec 2023 16:16:31 +0100
> Christian Borntraeger <borntraeger@...ux.ibm.com> wrote:
> 
>> Am 04.12.23 um 15:53 schrieb Tony Krowiak:
>> >
>> >
>> > On 11/29/23 12:12, Christian Borntraeger wrote:
>> >> Am 29.11.23 um 15:35 schrieb Tony Krowiak:
>> >>> In the current implementation, response code 01 (AP queue number not valid)
>> >>> is handled as a default case along with other response codes returned from
>> >>> a queue reset operation that are not handled specifically. Barring a bug,
>> >>> response code 01 will occur only when a queue has been externally removed
>> >>> from the host's AP configuration; in this case, the queue must
>> >>> be reset by the machine in order to avoid leaking crypto data if/when the
>> >>> queue is returned to the host's configuration. The response code 01 case
>> >>> will be handled specifically by logging a WARN message followed by cleaning
>> >>> up the IRQ resources.
>> >>>
>> >>
>> >> To me it looks like this can be triggered by the LPAR admin, correct? So it
>> >> is not desirable but possible.
>> >> In that case I prefer to not use WARN, maybe use dev_warn or dev_err instead.
>> >> WARN can be a disruptive event if panic_on_warn is set.
>> >
>> > Yes, it can be triggered by the LPAR admin. I can't use dev_warn here because we don't have a reference to any device, but I can use pr_warn if that suffices.
>> 
>> Ok, please use pr_warn then.
> 
> Shouldn't we rather make this an 'info'? I mean we probably do not want
> people complaining about this condition. Yes, it should be a best practice
> to coordinate such things with the guest, and ideally remove the resource
> from the guest first. But AFAIU our stack is supposed to be able to
> handle something like this. IMHO issuing a warning is an excessive measure.
> I know Reinhard and Tony probably disagree with the last sentence
> though.

Halil, Tony, the thing about info versus warning versus error is our
own stuff. Keep in mind that these messages end up in the "debug feature"
as FFDC data. So it comes down to which FFDC data you and Tony want to
see there. It should be enough to explain to a customer what happened
without the need to "recreate with higher debug level" if something serious
happened. So my private decision table is:
1) Is it something serious, something exceptional, something which may not
    come up again if you try to recreate it? Yes -> make it visible on the
    first occurrence as an error msg.
2) Is it something you want to read when a customer hits it and you tell him
    to extract and examine the debug feature data? Yes -> make it a warning
    and make sure your debug feature records warnings by default.
3) Still serious, but may flood the debug feature? Good enough and high
    probability to reappear on a recreate? Yes -> make it an info message
    and live with the risk that you may not be able to explain to a customer
    what happened without a recreate and a higher debug level.
4) None of 1-3 -> maybe a debug msg, but still think about what happens when a
    customer enables the "debug feature" with the highest level. Does it squeeze
    out more important stuff? Maybe make it dynamic debug with pr_debug() (see
    the kernel documentation, admin-guide/dynamic-debug-howto.rst).
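
The decision table above could be sketched roughly as the small user-space
C helper below. This is only an illustration under assumptions: the names
msg_level, msg_traits and pick_level() are hypothetical and exist neither in
the kernel nor in the patch, and treating rule 3 (may flood) as an exception
to rule 2 is one reading of the table, not a definitive one.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not kernel or vfio-ap API. */
enum msg_level { MSG_ERR, MSG_WARN, MSG_INFO, MSG_DEBUG };

struct msg_traits {
	bool serious;         /* serious or exceptional event */
	bool likely_to_recur; /* high probability to reappear on a recreate */
	bool wanted_in_ffdc;  /* worth reading in the customer's debug feature data */
	bool may_flood;       /* could flood the debug feature */
};

/* Map the traits of a message to a level, checking the rules in order. */
static enum msg_level pick_level(const struct msg_traits *t)
{
	/* 1) serious and may not come up again on a recreate: error */
	if (t->serious && !t->likely_to_recur)
		return MSG_ERR;
	/* 2) wanted as FFDC data and (assumption) no flooding risk: warning */
	if (t->wanted_in_ffdc && !t->may_flood)
		return MSG_WARN;
	/* 3) still serious, may flood, but likely to reappear: info */
	if (t->serious && t->may_flood && t->likely_to_recur)
		return MSG_INFO;
	/* 4) none of 1-3: debug (in the kernel perhaps via pr_debug()) */
	return MSG_DEBUG;
}
```

The helper only maps already-judged traits to a level; deciding whether a
given condition is "serious" or "may flood" remains the judgment call
discussed in this thread.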

> 
> Regards,
> Halil