linux-kernel - Re: [PATCH V2] ACPI / APEI: restore interrupt before panic in sdei flow


Message-ID: <f8e73ed7-f45f-0f5d-9055-486fb83dcd82@linux.alibaba.com>
Date:   Thu, 14 Oct 2021 22:18:54 +0800
From:   乱石 <zhangliguang@...ux.alibaba.com>
To:     James Morse <james.morse@....com>
Cc:     linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
        Tony Luck <tony.luck@...el.com>,
        linux-arm-kernel@...ts.infradead.org,
        Borislav Petkov <bp@...en8.de>, Len Brown <lenb@...nel.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        huangming@...ux.alibaba.com
Subject: Re: [PATCH V2] ACPI / APEI: restore interrupt before panic in sdei
 flow

Hi,

On 2021/10/14 1:44, James Morse wrote:
> Hello!
>
> On 12/10/2021 15:29, Liguang Zhang wrote:
>> When the HEST ACPI table configures the Hardware Error Notification type as
>> Software Delegated Exception (0x0B) for RAS events, the OS RAS code interacts
>> with ATF through the SDEI mechanism. On a firmware-first system, the OS is
>> notified by an ATF SDEI call.
>>
>> The call flow when a fatal RAS error happens is as follows:
>>
>> Flow for ATF notifying the OS:
>>    sdei_dispatch_event()
>>      ehf_activate_priority()
>>        call sdei callback  // callback registered by OS
>>      ehf_deactivate_priority()
>>
>> OS sdei callback:
>>    sdei_asm_handler()
>>      __sdei_handler()
>>        _sdei_handler()
>>          sdei_event_handler()
>>            ghes_sdei_critical_callback()
>>              ghes_in_nmi_queue_one_entry()
>>                /* if RAS error is fatal */
>>                __ghes_panic()
>>                  panic()
>>
>> If a fatal RAS error occurs, panic() is called from within sdei_asm_handler()
>> before ehf_deactivate_priority() has run, which leaves interrupts masked.
> So far the story is:
> Firmware generated an SDEI event (a kind of software NMI) because of a firmware
> interrupt, but it hasn't completely handled the interrupt.
>
>
>> With interrupts masked, the system hangs in the kdump flow like this:
>>
>> arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for cmdq
>> arm-smmu-v3 arm-smmu-v3.3.auto: allocated 32768 entries for evtq
>> arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for priq
>> arm-smmu-v3 arm-smmu-v3.3.auto: SMMU currently enabled! Resetting...
> How and why do firmware interrupts affect the IOMMU?
>
> It sounds like you are sharing something with firmware that you shouldn't.
>
>
>> After debugging, we found that the exact place where it hangs is:
>> arm_smmu_device_probe()
>>    arm_smmu_device_reset()
>>      arm_smmu_device_disable()
>>        arm_smmu_write_reg_sync()
>>          readl_relaxed_poll_timeout()
>>            readx_poll_timeout()
>>              read_poll_timeout()
>>                usleep_range() // the hrtimer never fires.
>>
>> So interrupts should be restored before panic(); otherwise kdump will hit
>> this error.
> Why can't firmware finish with the interrupt before injecting the SDEI event?
> If you need it to not happen a second time while the handler runs, you can always disable it.
>
> The text in the spec about the interaction of complete and physical interrupts is for
> bound interrupts. Linux doesn't support these. It isn't possible for linux to know whether
> firmware tied any other kind of event to a physical interrupt or not.
>
>
>> In the SDEI flow, SDEI_EVENT_COMPLETE_AND_RESUME should be called before
>> panic() so that ehf_deactivate_priority() runs to completion.
> SDEI_EVENT_COMPLETE_AND_RESUME is a complete, it tells firmware to restore the execution
> state from before the event. You almost get away with x17-x30 being corrupted as
> panic() won't return - but the stack trace produced will be corrupt. If the original
> exception was from user-space, SP_EL0 will have been restored to be the user value. The
> kernel uses this for 'current'.
>
>
> The way this is supposed to work is the dying kernel calls SDEI_PE_MASK while it does
> the kdump reboot. Once the kdump kernel has started, the SDEI_PRIVATE_RESET and
> SDEI_SHARED_RESET calls should fix anything left over in firmware.
>
>
> Could you debug why firmware interrupts being active prevent the SMMU from being reset? As
> far as I can tell, those should be totally independent.

If ehf_deactivate_priority() is not executed, the pmr_el1 register is not
restored to a value above 0x80, which leaves non-secure interrupts masked.
arm_smmu_device_probe() eventually calls usleep_range(), which relies on
an hrtimer. Because non-secure timer interrupts are masked, usleep_range()
never wakes up.
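
To make the masking concrete, here is a minimal user-space model in C. It
is only a sketch, not ATF or kernel code: struct cpu_model, model_activate(),
model_deactivate() and the priority values used with them are hypothetical
and chosen for illustration. It relies on the GIC rule that an interrupt is
signalled only when its priority value is numerically lower than the
priority mask, and it ignores the secure/non-secure priority views.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CPU's priority mask state. */
struct cpu_model {
    uint8_t pmr;        /* current priority mask */
    uint8_t saved_pmr;  /* mask saved when an exception priority is activated */
};

/*
 * GIC rule: an interrupt is signalled only if its priority value is
 * numerically lower (that is, higher priority) than the priority mask.
 */
bool irq_signalled(uint8_t pmr, uint8_t irq_prio)
{
    return irq_prio < pmr;
}

/* Stand-in for ehf_activate_priority(): lower the mask to 'prio'. */
void model_activate(struct cpu_model *c, uint8_t prio)
{
    c->saved_pmr = c->pmr;
    c->pmr = prio;
}

/* Stand-in for ehf_deactivate_priority(): restore the saved mask. */
void model_deactivate(struct cpu_model *c)
{
    c->pmr = c->saved_pmr;
}
```

With an assumed initial mask of 0xf0 and an assumed timer interrupt priority
of 0xa0, irq_signalled() is true. After model_activate(&c, 0x70) the mask is
below 0x80 and the timer interrupt is no longer signalled, which is the state
panic() runs in. Only model_deactivate() makes it deliverable again.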

Thanks.

Liguang

>
>
> Thanks,
>
> James