Message-ID: <896458e8daf87a274ba1ce8ced30ac8e@codeaurora.org>
Date: Thu, 20 Aug 2020 21:17:28 -0700
From: cgoldswo@...eaurora.org
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, linux-arm-msm@...r.kernel.org,
linux-kernel@...r.kernel.org, pratikp@...eaurora.org,
pdaly@...eaurora.org, sudraja@...eaurora.org,
iamjoonsoo.kim@....com, linux-arm-msm-owner@...r.kernel.org
Subject: Re: cma_alloc(), add sleep-and-retry for temporary page pinning
On 2020-08-11 15:20, cgoldswo@...eaurora.org wrote:
> On 2020-08-06 18:31, Andrew Morton wrote:
>> On Wed, 5 Aug 2020 19:56:21 -0700 Chris Goldsworthy
>> <cgoldswo@...eaurora.org> wrote:
>>
>>> On mobile devices, failure to allocate from a CMA area constitutes a
>>> functional failure. Sometimes during CMA allocations, we have observed
>>> that pages in a CMA area allocated through alloc_pages(), which we're
>>> trying to migrate away to make room for a CMA allocation, are
>>> temporarily pinned. This temporary pinning can occur when a process
>>> that owns the pinned page is being forked (the example is explained
>>> further in the commit text). This patch addresses the issue by adding
>>> a sleep-and-retry loop in cma_alloc(). There's another example we know
>>> of, similar to the above, that occurs during exit_mmap() (in
>>> zap_pte_range() specifically), but I need to determine if this is
>>> still relevant today.
>>
>
>> Sounds fairly serious but boy, we're late for 5.9.
>>
>> I can queue it for 5.10 with a cc:stable so that it gets backported
>> into earlier kernels a couple of months from now, if we think the
>> seriousness justifies backporting(?).
>>
>
> Queuing this seems like the best way to proceed, if we were to pick up
> this patch. I think we can forgo back-porting it, as this is something
> that will only be needed as vendors such as ourselves start using
> Google's Generic Kernel Image (we've carried this patch in our tree
> for over four years).
>
>>
>> And... it really is a sad little patch, isn't it? Instead of fixing
>> the problem, it reduces the problem's probability by 5x. Can't we do
>> better than this?
>
> I have one alternative in mind. I have been able to review the
> exit_mmap() case, so before proceeding, let's do a breakdown of the
> problem: we can categorize the pinning issue we're trying to address
> here as being one of
> (1) incrementing _refcount and getting context-switched out before
> incrementing _mapcount (applies to forking a process / copy_one_pte()),
> and
> (2) decrementing _mapcount and getting context-switched out before
> decrementing _refcount (applies to tearing down a process /
> exit_mmap()).
> One alternative would thus be to insert preempt_disable/enable() calls
> at the affected sites. For the copy_one_pte() pinning case, we could do
> the following inside of copy_one_pte():
>
> 	if (page) {
> +		preempt_disable();
> 		get_page(page);
> 		page_dup_rmap(page, false);
> +		preempt_enable();
> 		rss[mm_counter(page)]++;
> 	}
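>
> For reference, here is a minimal sketch (simplified - not the exact
> mm/migrate.c logic, and looks_transiently_pinned() is a hypothetical
> helper) of why either window matters: migration treats a page as
> pinned whenever its _refcount exceeds what its mappings account for,
> which is exactly the state a page is in between the two counter
> updates in case (1) or case (2):
>
> 	/*
> 	 * Simplified sketch: a page whose refcount exceeds the expected
> 	 * "one base ref plus one per mapping" count looks pinned to
> 	 * migration, so the migration done on behalf of cma_alloc()
> 	 * fails until the second counter update lands.
> 	 */
> 	static bool looks_transiently_pinned(struct page *page)
> 	{
> 		return page_count(page) > page_mapcount(page) + 1;
> 	}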
>
> I'm not sure if this approach would be acceptable for the exit_mmap()
> pinning case (applicable when CONFIG_MMU_GATHER_NO_GATHER=y). For the
> purposes of this discussion, we can look at two function calls inside
> of exit_mmap(), in the order they're called, to show how the pinning
> is occurring:
>
> 1. Calling unmap_vmas(): this unmaps the pages in each VMA for an
>    exiting task, using zap_pte_range() - zap_pte_range() reduces the
>    _mapcount for each page in a VMA, using page_remove_rmap(). After
>    calling page_remove_rmap(), the page is placed into a list in
>    __tlb_remove_page(). This list of pages will be used when flushing
>    TLB entries later on during the process teardown.
>
> 2. Calling tlb_finish_mmu(): this will flush the TLB entries
>    associated with those pages before calling put_page() on them,
>    using the pages previously collected by __tlb_remove_page() - the
>    call flow is tlb_flush_mmu() -> tlb_flush_mmu_free() ->
>    tlb_batch_pages_flush() -> free_pages_and_swap_cache() ->
>    release_pages(), where release_pages() is described as a "batched
>    put_page()".
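>
> To make the window explicit, the per-page ordering in the exit path
> is roughly the following (a simplified sketch, not the literal code):
>
> 	/* In zap_pte_range(), per page: */
> 	page_remove_rmap(page, false);  /* _mapcount is dropped here */
> 	__tlb_remove_page(tlb, page);   /* page parked in the batch;
> 	                                   _refcount still elevated */
>
> 	/* ...much later, via tlb_finish_mmu() -> release_pages(): */
> 	put_page(page);                 /* _refcount finally dropped */
>
> so a cma_alloc() racing with the teardown between those two points
> sees the same transient pin as in the fork case.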
>
> The preempt_disable/enable() approach would entail doing the following
> inside of exit_mmap():
>
> +	preempt_disable();
> 	unmap_vmas(&tlb, vma, 0, -1);
> 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> 	tlb_finish_mmu(&tlb, 0, -1);
> +	preempt_enable();
>
> I'm not sure doing this is feasible, given how long it could take to
> do the process teardown.
>
> The good thing about this patch is that it has been stable in our
> kernel for four years (though for some SoCs we increased the retry
> counts). One thing to stress is that there are other instances of CMA
> page pinning that this patch isn't attempting to address. Please let
> me know if you're okay with queuing this for the 5.10 merge window -
> if you are, I can add an option to configure the number of retries,
> and will resend the patch once the 5.9 merge window closes.
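>
> For concreteness, the configurable retry would keep the shape the
> patch already has - roughly the sketch below, where max_retries and
> the 100 ms sleep are placeholders for the values to be made
> configurable:
>
> 	int ret, tries = 0;
>
> 	for (;;) {
> 		ret = alloc_contig_range(pfn, pfn + count,
> 					 MIGRATE_CMA, gfp_mask);
> 		if (ret != -EBUSY || ++tries >= max_retries)
> 			break;
> 		/* back off so the transient pin can be released */
> 		msleep(100);
> 	}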
>
> Thanks,
>
> Chris.
Hi Andrew,
Have you been able to give the patch any further consideration?
Thanks,
Chris.
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project