Date:   Mon, 21 Aug 2017 18:56:20 +0000
From:   "Liang, Kan" <kan.liang@...el.com>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Mel Gorman <mgorman@...e.de>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...e.hu>, "Andi Kleen" <ak@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>, Jan Kara <jack@...e.cz>,
        linux-mm <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 1/2] sched/wait: Break up long wake list walk

> > Because that code sequence doesn't actually depend on
> > "wait_on_page_locked()" for _correctness_ anyway, afaik. Anybody who
> > does "migration_entry_wait()" _has_ to retry anyway, since the page
> > table contents may have changed while waiting.
> >
> > So I'm not proud of the attached patch, and I don't think it's really
> > acceptable as-is, but maybe it's worth testing? And maybe it's
> > arguably no worse than what we have now?
> >
> > Comments?
> >
> 
> The transhuge migration path for NUMA balancing doesn't go through the
> migration_entry_wait path, despite similarly named functions that suggest
> it does, so this may only have a real effect when THP is disabled. It's
> worth trying anyway.
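
As an aside on the correctness point quoted above: the wait in these paths
is advisory, because every caller revalidates the page table entry and
retries the fault regardless of how the wait ended. Below is a minimal
userspace analogy of that revalidate-and-retry shape (ptl,
migration_pending, and the function names are invented for illustration,
not kernel APIs):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER; /* stands in for the PTE lock */
static int migration_pending = 1;                       /* the "migration entry" */

static void *migrator(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&ptl);
	migration_pending = 0;  /* migration done: rewrite the "PTE" */
	pthread_mutex_unlock(&ptl);
	return NULL;
}

static void *faulter(void *arg)
{
	(void)arg;
	for (;;) {              /* the implicit "refault" loop */
		pthread_mutex_lock(&ptl);
		int pending = migration_pending; /* revalidate under the lock */
		pthread_mutex_unlock(&ptl);
		if (!pending)
			break;  /* the entry changed: fault resolved */
		sched_yield();  /* the "wait" is an optimization, not load-bearing */
	}
	puts("fault resolved after retry");
	return NULL;
}

int main(void)
{
	pthread_t f, m;
	pthread_create(&f, NULL, faulter, NULL);
	pthread_create(&m, NULL, migrator, NULL);
	pthread_join(f, NULL);
	pthread_join(m, NULL);
	return 0;
}

However well or badly the yield in the middle behaves, the re-check under
the lock keeps the result correct; the same holds for the kernel paths.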

I just finished testing the yield patch (functionality only, not performance).
Yes, it works well with THP disabled.
With THP enabled, I observed one LOCKUP caused by a long queue wait.

Here is the call stack with THP enabled. 
#
   100.00%  (ffffffff9e1aefca)
            |
            ---wait_on_page_bit
               do_huge_pmd_numa_page
               __handle_mm_fault
               handle_mm_fault
               __do_page_fault
               do_page_fault
               page_fault
               |
               |--60.39%--0x2b7b7
               |          |
               |          |--34.26%--0x127d8
               |          |          start_thread
               |          |
               |           --25.95%--0x127a2
               |                     start_thread
               |
                --39.25%--0x2b788
                          |
                           --38.81%--0x127a2
                                     start_thread


> 
> Covering both paths would be something like the patch below, which spins
> until the page is unlocked or the task should reschedule. It's not even
> boot-tested, as I spent what time I had on a test case that I hoped would
> prove it really works.

I will give it a try.

Thanks,
Kan

> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 79b36f57c3ba..31cda1288176 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -517,6 +517,13 @@ static inline void wait_on_page_locked(struct page *page)
>  		wait_on_page_bit(compound_head(page), PG_locked);
>  }
> 
> +void __spinwait_on_page_locked(struct page *page);
> +static inline void spinwait_on_page_locked(struct page *page)
> +{
> +	if (PageLocked(page))
> +		__spinwait_on_page_locked(page);
> +}
> +
>  static inline int wait_on_page_locked_killable(struct page *page)
>  {
>  	if (!PageLocked(page))
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a49702445ce0..c9d6f49614bc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1210,6 +1210,15 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  	}
>  }
> 
> +void __spinwait_on_page_locked(struct page *page)
> +{
> +	do {
> +		cpu_relax();
> +	} while (PageLocked(page) && !cond_resched());
> +
> +	wait_on_page_locked(page);
> +}
> +
>  /**
>   * page_cache_next_hole - find the next hole (not-present entry)
>   * @mapping: mapping
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 90731e3b7e58..c7025c806420 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1443,7 +1443,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
>  		if (!get_page_unless_zero(page))
>  			goto out_unlock;
>  		spin_unlock(vmf->ptl);
> -		wait_on_page_locked(page);
> +		spinwait_on_page_locked(page);
>  		put_page(page);
>  		goto out;
>  	}
> @@ -1480,7 +1480,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
>  		if (!get_page_unless_zero(page))
>  			goto out_unlock;
>  		spin_unlock(vmf->ptl);
> -		wait_on_page_locked(page);
> +		spinwait_on_page_locked(page);
>  		put_page(page);
>  		goto out;
>  	}
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e84eeb4e4356..9b6c3fc5beac 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -308,7 +308,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  	if (!get_page_unless_zero(page))
>  		goto out;
>  	pte_unmap_unlock(ptep, ptl);
> -	wait_on_page_locked(page);
> +	spinwait_on_page_locked(page);
>  	put_page(page);
>  	return;
>  out:
> 
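
To make the patch's waiting strategy concrete, here is a small userspace
analogy of the spin-then-block pattern (page_locked, SPIN_LIMIT, and the
function names are invented for illustration; SPIN_LIMIT crudely stands in
for cond_resched() deciding it is time to stop spinning):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int page_locked = 1;      /* the contended "page lock" bit */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

#define SPIN_LIMIT 1024                 /* crude stand-in for cond_resched() */

static void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();         /* x86 PAUSE, like the kernel's cpu_relax() */
#endif
}

static void spinwait_then_block(void)
{
	/* Short spin first: the lock holder is often about to release. */
	for (int i = 0; i < SPIN_LIMIT; i++) {
		if (!atomic_load_explicit(&page_locked, memory_order_acquire))
			return;
		cpu_relax_hint();
	}
	/* Give up spinning and sleep, like the fallback to wait_on_page_locked(). */
	pthread_mutex_lock(&m);
	while (atomic_load_explicit(&page_locked, memory_order_acquire))
		pthread_cond_wait(&c, &m);
	pthread_mutex_unlock(&m);
}

static void *waiter(void *arg)
{
	(void)arg;
	spinwait_then_block();
	puts("page unlocked");
	return NULL;
}

int main(void)
{
	pthread_t t;
	pthread_create(&t, NULL, waiter, NULL);
	usleep(1000);                   /* hold the "lock" briefly */
	pthread_mutex_lock(&m);
	atomic_store_explicit(&page_locked, 0, memory_order_release);
	pthread_cond_broadcast(&c);     /* waking every sleeper is the analogue of
	                                   the kernel's long wake list walk */
	pthread_mutex_unlock(&m);
	pthread_join(t, NULL);
	return 0;
}

The point of the spin phase is that most page-lock holds are short: a
waiter that spins briefly never joins the wake queue at all, which is
exactly what keeps the wake list walk from growing unboundedly.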
