linux-kernel - Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bb51485d-1d53-3b67-f123-d9eded545454@linux.ibm.com>
Date:   Fri, 21 Jan 2022 16:40:30 +0100
From:   Christian Borntraeger <borntraeger@...ux.ibm.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
        Michael Ellerman <mpe@...erman.id.au>,
        aleksandar.qemu.devel@...il.com, alexandru.elisei@....com,
        anup.patel@....com, aou@...s.berkeley.edu, atish.patra@....com,
        bp@...en8.de, catalin.marinas@....com, chenhuacai@...nel.org,
        dave.hansen@...ux.intel.com, frankja@...ux.ibm.com,
        frederic@...nel.org, gor@...ux.ibm.com, hca@...ux.ibm.com,
        james.morse@....com, jmattson@...gle.com, joro@...tes.org,
        luto@...nel.org, maz@...nel.org, mingo@...hat.com,
        nsaenzju@...hat.com, palmer@...belt.com, paulmck@...nel.org,
        paul.walmsley@...ive.com, peterz@...radead.org, seanjc@...gle.com,
        suzuki.poulose@....com, svens@...ux.ibm.com, tglx@...utronix.de,
        tsbogend@...ha.franken.de, vkuznets@...hat.com,
        wanpengli@...cent.com, will@...nel.org,
        Anup Patel <anup@...infault.org>,
        Atish Patra <atishp@...shpatra.org>
Subject: Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs



Am 21.01.22 um 16:29 schrieb Mark Rutland:
> On Fri, Jan 21, 2022 at 03:42:48PM +0100, Christian Borntraeger wrote:
>> Am 21.01.22 um 15:30 schrieb Mark Rutland:
>>> On Fri, Jan 21, 2022 at 03:17:01PM +0100, Christian Borntraeger wrote:
>>>> Am 21.01.22 um 10:53 schrieb Christian Borntraeger:
>>>>> Am 20.01.22 um 16:14 schrieb Christian Borntraeger:
>>>>>> Am 20.01.22 um 13:03 schrieb Mark Rutland:
>>>>>>> On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
>>>>>>>> On 1/19/22 20:22, Mark Rutland wrote:
>>>>>>>>> I wonder, is the s390 guest entry/exit*preemptible*  ?
>>>>>>>>>
>>>>>>>>> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
>>>>>>>>> things before a ctx-switch to the idle thread, which would then be able
>>>>>>>>> to hit this.
>>>>>>>>>
>>>>>>>>> I'll need to go audit the other architectures for similar.
>>>>>>>>
>>>>>>>> They don't enable interrupts in the entry/exit path so they should be okay.
>>>>>>>
>>>>>>> True.
>>>>>>>
>>>>>>> So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
>>>>>>> right thing to do. I'll add that and explanatory commentary.
>>>>>>
>>>>>> That would not be trivial I guess. We do allow (and need) page faults on sie for guest
>>>>>> demand paging and
>>>>>>
>>>>>> this piece of arch/s390/mm/fault.c
>>>>>>
>>>>>>           case GMAP_FAULT:
>>>>>>                    if (faulthandler_disabled() || !mm)
>>>>>>                            goto out;
>>>>>>                    break;
>>>>>>            }
>>>>>>
>>>>>> would no longer work since faulthandler_disabled checks for the preempt count.
>>>>>>
>>>>>
>>>>>
>>>>> Something like this
>>>>>
>>>>>
>>>>> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
>>>>> index d30f5986fa85..1c7d45346e12 100644
>>>>> --- a/arch/s390/mm/fault.c
>>>>> +++ b/arch/s390/mm/fault.c
>>>>> @@ -385,10 +385,18 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>>>>>                            return 0;
>>>>>                    goto out;
>>>>>            case USER_FAULT:
>>>>> -       case GMAP_FAULT:
>>>>>                    if (faulthandler_disabled() || !mm)
>>>>>                            goto out;
>>>>>                    break;
>>>>> +               /*
>>>>> +                * We know that we interrupted SIE and we are not in a IRQ.
>>>>> +                * preemption might be disabled thus checking for in_atomic
>>>>> +                * would result in failures
>>>>> +                */
>>>>> +       case GMAP_FAULT:
>>>>> +               if (pagefault_disabled() || !mm)
>>>>> +                       goto out;
>>>>> +               break;
>>>>>            }
>>>>>
>>>>>            perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
>>>>>
>>>>> seems to work with preemption disabled around sie. Not sure yet if this is correct.
>>>>
>>>>
>>>> No it does not work. scheduling while preemption is disabled.
>>>
>>> Hmm... which exceptions do we expect to take from a guest?
>>>
>>> I wonder if we can handle those more like other architectures by getting those
>>> to immediately return from the sie64a() call with some status code that we can
>>> handle outside of the guest_state_{enter,exit}_irqoff() critical section.
>>
>> We take all kind of page faults (invalid->valid, ro->rw) on the sie instruction and
>> run that in the context of the pgm_check handler just like for userspace.
> 
> Do we only expect to tak faults from a guest (which IIUC at the GMAP_FAULT
> cases in the bit above), or are there other esceptions we expect to take from
> the middle of a SIE?
> 
>> the pgm_check handler does a sie_exit (similar to the interrupt handlers) by
>> setting the return IA.
> 
> Sure, but that sie_exit happens *after* the handler runs, where as I'm asking
> whether we can structure this like all the other architectures and turn all
> exceptions into returns from sie64a() that we can handle after having called
> guest_state_exit_irqoff().
> 
>>> On arm64 in VHE mode, we swap our exception vectors when entering/exiting the
>>> guest to allow us to do that; I wonder if we could do similar here?
> 
> Does this idea sound at all plausible?

Maybe. We already have something like that for async_pf (see kvm_arch_setup_async_pf)
where we handle the pagefault later on after first kicking off the resolution for major
page faults (via FAULT_FLAG_RETRY_NOWAIT) in the low level handler.
Let me think about it.