lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHbLzkoVdxyxKopu7D=VzvnHoiA=KYgxVWwwZwahzTcfXfcg5g@mail.gmail.com>
Date:   Wed, 27 Apr 2022 15:29:25 -0700
From:   Yang Shi <shy828301@...il.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Bibo Mao <maobibo@...ngson.cn>, Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page

On Wed, Apr 27, 2022 at 1:48 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Thu, 17 Mar 2022 02:50:24 -0400 Bibo Mao <maobibo@...ngson.cn> wrote:
>
> > collapse huge page will copy huge page from general small pages,
> > dest node is calculated from most one of source pages, however
> > THP daemon is not scheduled on dest node. The performance may be
> > poor since huge page copying across nodes, also cache is not used
> > for target node. With this patch, khugepaged daemon switches to
> > the same numa node with huge page. It saves copying time and makes
> > use of local cache better.
> >
> > With this patch, specint 2006 base performance is improved with 6%
> > on Loongson 3C5000L platform with 32 cores and 8 numa nodes.
> >
>
> Are there any acks for this one please?

TBH, I'm a little bit reluctant to this patch. I agree running
khugepaged on the same node with the source and dest pages could
reduce cross socket traffic and use cache more efficiently. But I'm
not sure whether it is really worth it or not. For example, on a busy
system, khugepaged may jump from cpus to cpus, that may interfere with
the scheduler, and khugepaged has to wait to run on the target cpu, it
may take indefinite time. In addition the yield also depends on the
locality of source pages (how many of them are on the same node), how
often khugepaged is woken up on a different node, etc.

Even though it was proved worth it, I'd prefer set_cpus_allowed_ptr()
is called between mmap_read_unlock() and mmap_write_lock() in order to
avoid waste effort for some error paths.

>
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> >       struct vm_area_struct *vma;
> >       struct mmu_notifier_range range;
> >       gfp_t gfp;
> > +     const struct cpumask *cpumask;
> >
> >       VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> >
> > @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
> >        * that. We will recheck the vma after taking it again in write mode.
> >        */
> >       mmap_read_unlock(mm);
> > +
> > +     /* sched to specified node before huage page memory copy */
> > +     if (task_node(current) != node) {
> > +             cpumask = cpumask_of_node(node);
> > +             if (!cpumask_empty(cpumask))
> > +                     set_cpus_allowed_ptr(current, cpumask);
> > +     }
> >       new_page = khugepaged_alloc_page(hpage, gfp, node);
> >       if (!new_page) {
> >               result = SCAN_ALLOC_HUGE_PAGE_FAIL;
> > --
> > 2.31.1
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ