linux-kernel - Re: [PATCH] mm: release the spinlock on zap_pte

Message-ID: <20190729082052.GA258885@google.com>
Date:   Mon, 29 Jul 2019 17:20:52 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Miguel de Dios <migueldedios@...gle.com>,
        Wei Wang <wvw@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCH] mm: release the spinlock on zap_pte_range

On Mon, Jul 29, 2019 at 09:45:23AM +0200, Michal Hocko wrote:
> On Mon 29-07-19 16:10:37, Minchan Kim wrote:
> > In our testing (camera recording), Miguel and Wei found that
> > unmap_page_range easily takes more than 6ms with preemption disabled.
> > The reason is that it holds the page table spinlock for the entire
> > 512-page operation in a PMD. 6.2ms is not trivial for user experience:
> > if an RT task cannot run during that time, it can cause frame drops or
> > audio glitches.
> 
> Where is the time spent during the tear down? 512 pages doesn't sound
> like a lot to tear down. Is it the TLB flushing?

Miguel confirmed there is no such big latency without mark_page_accessed
in zap_pte_range, so I guess it's contention on the LRU lock as well as
the heavy activate_page overhead, which is not trivial either.

> 
> > This patch adds a preemption point, like copy_pte_range.
> > 
> > Reported-by: Miguel de Dios <migueldedios@...gle.com>
> > Reported-by: Wei Wang <wvw@...gle.com>
> > Cc: Michal Hocko <mhocko@...nel.org>
> > Cc: Johannes Weiner <hannes@...xchg.org>
> > Cc: Mel Gorman <mgorman@...hsingularity.net>
> > Signed-off-by: Minchan Kim <minchan@...nel.org>
> > ---
> >  mm/memory.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 2e796372927fd..bc3e0c5e4f89b 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1007,6 +1007,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >  				struct zap_details *details)
> >  {
> >  	struct mm_struct *mm = tlb->mm;
> > +	int progress = 0;
> >  	int force_flush = 0;
> >  	int rss[NR_MM_COUNTERS];
> >  	spinlock_t *ptl;
> > @@ -1022,7 +1023,16 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >  	flush_tlb_batched_pending(mm);
> >  	arch_enter_lazy_mmu_mode();
> >  	do {
> > -		pte_t ptent = *pte;
> > +		pte_t ptent;
> > +
> > +		if (progress >= 32) {
> > +			progress = 0;
> > +			if (need_resched())
> > +				break;
> > +		}
> > +		progress += 8;
> 
> Why 8?

Just copied from copy_pte_range.
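
For illustration, the counter logic in the hunk above can be modelled in
user space. This is only a sketch, not the kernel code: zap_range_sketch(),
resched_fn, resched_after_countdown() and the checks counter are
hypothetical names, the callback stands in for need_resched(), and 512
assumes 4K pages on x86-64. Because every entry adds 8 and the threshold is
32, the resched check runs once every 4 entries. (As far as I recall,
copy_pte_range adds only 1 for an empty PTE and 8 for a populated one, which
is where the weighting comes from; treat that as an assumption.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTRS_PER_PTE 512	/* assumed: entries per page table, 4K pages */

/* Hypothetical stand-in for the kernel's need_resched(). */
typedef bool (*resched_fn)(void *ctx);

/*
 * Walk entries [start, end) and return the index where the walk stopped.
 * A return value smaller than end means the walk bailed out early; the
 * caller is expected to drop the lock, reschedule, and call again with
 * start set to the returned index.
 */
static size_t zap_range_sketch(size_t start, size_t end,
			       resched_fn need_resched, void *ctx,
			       size_t *checks)
{
	int progress = 0;
	size_t i;

	assert(start <= end);

	for (i = start; i < end; i++) {
		/* Same shape as the hunk: test only after 32 units of work. */
		if (progress >= 32) {
			progress = 0;
			if (checks)
				(*checks)++;
			if (need_resched(ctx))
				break;
		}
		progress += 8;	/* each entry is charged 8 units */

		/* ... tear down entry i here ... */
	}
	return i;
}

/* Example callback: reports a pending resched once the countdown hits 0. */
static bool resched_after_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

With a callback that never fires, a full 512-entry walk performs 127 checks
(at entries 4, 8, ..., 508). If the callback fires on the first check, the
walk stops after 4 entries, which bounds the lock hold time to the cost of
tearing down 4 entries plus the check itself.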