Date:   Thu, 26 Mar 2020 15:36:29 -0400
From:   Pavel Tatashin <pasha.tatashin@...een.com>
To:     Daniel Jordan <daniel.m.jordan@...cle.com>
Cc:     Shile Zhang <shile.zhang@...ux.alibaba.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] mm: fix tick timer stall during deferred page init

I agree with Daniel: we should look into an approach where
pgdat_resize_lock is taken only for the duration of updating tracking
values such as pgdat->first_deferred_pfn (perhaps we would need to add
another tracker that would show chunks that are currently being worked
on).

The vast majority of the struct page initialization work should happen
outside of this lock; the lock should only be taken when we update
globally visible data structures: lists, tracking variables. This way we
can solve several problems: 1. allow interrupt threads to grow zones if
required. 2. keep jiffies happy. 3. allow future scaling when we
add intra-node threads to initialize struct pages (i.e. ktasks from
Daniel).
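
To make the idea concrete, here is a rough userspace sketch of the locking
pattern described above, not kernel code and not a proposed patch. Everything
in it is an assumption made for illustration: struct node_sketch, claim_chunk(),
publish_chunk(), sketch_run() and the CHUNK_PAGES value are hypothetical names,
and a pthread mutex stands in for pgdat_resize_lock. The lock is held only
while a worker advances first_deferred_pfn or updates the counter; the
per-page work runs with the lock dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define CHUNK_PAGES 4096UL   /* hypothetical chunk size, not from the patch */
#define MAX_THREADS 8

struct page_sketch { unsigned char initialized; };

/* Hypothetical stand-in for the pgdat fields discussed in the thread. */
struct node_sketch {
	pthread_mutex_t resize_lock;      /* stands in for pgdat_resize_lock */
	unsigned long first_deferred_pfn; /* next pfn not yet claimed */
	unsigned long end_pfn;
	unsigned long nr_initialized;     /* globally visible counter */
	struct page_sketch *pages;
};

/* Take the lock only long enough to claim the next chunk. */
static int claim_chunk(struct node_sketch *n, unsigned long *start,
		       unsigned long *end)
{
	int claimed = 0;

	pthread_mutex_lock(&n->resize_lock);
	if (n->first_deferred_pfn < n->end_pfn) {
		*start = n->first_deferred_pfn;
		*end = *start + CHUNK_PAGES;
		if (*end > n->end_pfn)
			*end = n->end_pfn;
		n->first_deferred_pfn = *end;
		claimed = 1;
	}
	pthread_mutex_unlock(&n->resize_lock);
	return claimed;
}

/* Take the lock again only to publish the result of a finished chunk. */
static void publish_chunk(struct node_sketch *n, unsigned long nr)
{
	pthread_mutex_lock(&n->resize_lock);
	n->nr_initialized += nr;
	pthread_mutex_unlock(&n->resize_lock);
}

static void *init_worker(void *arg)
{
	struct node_sketch *n = arg;
	unsigned long start, end, pfn;

	while (claim_chunk(n, &start, &end)) {
		/* The long-running part happens with the lock dropped. */
		for (pfn = start; pfn < end; pfn++)
			n->pages[pfn].initialized = 1;
		publish_chunk(n, end - start);
	}
	return NULL;
}

/* Returns the number of pages initialized, or 0 on failure. */
static unsigned long sketch_run(unsigned long nr_pages, int nr_threads)
{
	struct node_sketch n = { .end_pfn = nr_pages };
	pthread_t tids[MAX_THREADS];
	unsigned long pfn, done;
	int i, started = 0;

	assert(nr_threads >= 1 && nr_threads <= MAX_THREADS);
	n.pages = calloc(nr_pages, sizeof(*n.pages));
	if (!n.pages)
		return 0;
	pthread_mutex_init(&n.resize_lock, NULL);

	for (i = 0; i < nr_threads; i++)
		if (pthread_create(&tids[started], NULL, init_worker, &n) == 0)
			started++;
	init_worker(&n);  /* the caller helps, so work finishes regardless */
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	done = n.nr_initialized;
	for (pfn = 0; pfn < nr_pages; pfn++)
		if (!n.pages[pfn].initialized)
			done = 0;  /* a page was missed */

	pthread_mutex_destroy(&n.resize_lock);
	free(n.pages);
	return done;
}
```

Calling sketch_run(10000, 4) should initialize every page exactly once
across the workers. In the kernel the equivalent would also need the extra
tracker for in-flight chunks mentioned earlier, since first_deferred_pfn
alone would then only say what has been claimed, not what is finished.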

Pasha

On Thu, Mar 26, 2020 at 2:58 PM Daniel Jordan
<daniel.m.jordan@...cle.com> wrote:
>
> On Thu, Mar 19, 2020 at 03:05:12PM -0400, Daniel Jordan wrote:
> > Regardless,
> > Reviewed-by: Daniel Jordan <daniel.m.jordan@...cle.com>
>
> Darn, I spoke too soon.
>
> On a two-socket Xeon, smaller values of TICK_PAGE_COUNT caused the deferred
> init timestamp to grow by over 25%.  This was with pgdatinit0 bound to the
> timer interrupt CPU to make sure the issue always reproduces.
>
>                TICK_PAGE_COUNT     node 0 deferred
>                                    init time (ms)
>                ---------------     ---------------
>                           4096                 610
>                           8192                 587
>                          16384                 487
>                          32768                 480    // used in the patch
>
> Instead of trying to find a constant that lets the timer interrupt run often
> enough, I think a better way forward is to reconsider how we handle the resize
> lock.  I plan to prototype something and reply back with what I get.
