linux-kernel - Re: [PATCH 19/25] mm/s390: Use mm_fault

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8bd8dcf6-f2f0-d44e-9bf8-6fd4fe299aa9@de.ibm.com>
Date:   Wed, 17 Jun 2020 18:14:52 +0200
From:   Christian Borntraeger <borntraeger@...ibm.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     Alexander Gordeev <agordeev@...ux.ibm.com>,
        linux-kernel@...r.kernel.org,
        Gerald Schaefer <gerald.schaefer@...ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>, linux-s390@...r.kernel.org
Subject: Re: [PATCH 19/25] mm/s390: Use mm_fault_accounting()



On 17.06.20 18:06, Peter Xu wrote:
> Hi, Christian,
> 
> On Wed, Jun 17, 2020 at 08:19:29AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 16.06.20 18:35, Peter Xu wrote:
>>> Hi, Alexander,
>>>
>>> On Tue, Jun 16, 2020 at 05:59:33PM +0200, Alexander Gordeev wrote:
>>>>> @@ -489,21 +489,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>>>>>  	if (unlikely(fault & VM_FAULT_ERROR))
>>>>>  		goto out_up;
>>>>>
>>>>> -	/*
>>>>> -	 * Major/minor page fault accounting is only done on the
>>>>> -	 * initial attempt. If we go through a retry, it is extremely
>>>>> -	 * likely that the page will be found in page cache at that point.
>>>>> -	 */
>>>>>  	if (flags & FAULT_FLAG_ALLOW_RETRY) {
>>>>> -		if (fault & VM_FAULT_MAJOR) {
>>>>> -			tsk->maj_flt++;
>>>>> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
>>>>> -				      regs, address);
>>>>> -		} else {
>>>>> -			tsk->min_flt++;
>>>>> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
>>>>> -				      regs, address);
>>>>> -		}
>>>>>  		if (fault & VM_FAULT_RETRY) {
>>>>>  			if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
>>>>>  			    (flags & FAULT_FLAG_RETRY_NOWAIT)) {
> 
> [1]
> 
>>>>
>>>> Seems like the call to mm_fault_accounting() will be missed if
>>>> we entered here with FAULT_FLAG_RETRY_NOWAIT flag set, since it
>>>> jumps to "out_up"...
>>>
>>> This is true as a functional change.  However that also means that we've got a
>>> VM_FAULT_RETRY, which hints that this fault has been requested to retry rather
>>> than handled correctly (for instance, due to some try_lock failed during the
>>> fault process).
>>>
>>> To me, that case should not be counted as a page fault at all?  Or we might get
>>> the same duplicated accounting when the page fault retried from a higher stack.
>>>
>>> Thanks
>>
>> This case below (the one with the gmap) is the KVM case for doing a so called
>> pseudo page fault to our guests. (we notify our guests about major host page
>> faults and let it reschedule to something else instead of halting the vcpu).
>> This is being resolved with either gup or fixup_user_fault asynchronously by
>> KVM code (this can also be sync when the guest does not match some conditions)
>> We do not change the counters in that code as far as I can tell so we should
>> continue to do it here.
>>
>> (see arch/s390/kvm/kvm-s390.c
>> static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>> {
>> [...]
>>         } else if (current->thread.gmap_pfault) {
>>                 trace_kvm_s390_major_guest_pfault(vcpu);
>>                 current->thread.gmap_pfault = 0;
>>                 if (kvm_arch_setup_async_pf(vcpu))
>>                         return 0;
>>                 return kvm_arch_fault_in_page(vcpu, current->thread.gmap_addr, 1);
>>         }
> 
> Please correct me if I'm wrong... but I still think what this patch does is the
> right thing to do.
> 
> Note again that IMHO when reached [1] above it means the page fault is not
> handled correctly so we need to fallback to KVM async page fault, then we
> shouldn't increment the accountings until it's finally handled correctly. That
> final accounting should be done in the async pf path in gup code where the page
> fault is handled:
> 
>   kvm_arch_fault_in_page
>     gmap_fault
>       fixup_user_fault
> 
> Where in fixup_user_fault() we have:
> 
> 	if (tsk) {
> 		if (major)
> 			tsk->maj_flt++;
> 		else
> 			tsk->min_flt++;
> 	}
> 

Right that case does work. Its the case where we do not inject a pseudo pagefault and
instead fall back to synchronous fault-in.
What is about the other case:

kvm_setup_async_pf
	->workqueue
		async_pf_execute
			get_user_pages_remote

Does get_user_pages_remote do the accounting as well? I cant see that.