lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 1 Apr 2020 18:15:59 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>,
        Daniel Jordan <daniel.m.jordan@...cle.com>
Cc:     Shile Zhang <shile.zhang@...ux.alibaba.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] mm: fix tick timer stall during deferred page init

On 01.04.20 18:12, Michal Hocko wrote:
> On Wed 01-04-20 12:09:29, Daniel Jordan wrote:
>> On Wed, Apr 01, 2020 at 06:00:48PM +0200, Michal Hocko wrote:
>>> On Wed 01-04-20 17:50:22, David Hildenbrand wrote:
>>>> On 01.04.20 17:42, Michal Hocko wrote:
>>>>> I am sorry but I have completely missed this patch.
>>>>>
>>>>> On Wed 11-03-20 20:38:48, Shile Zhang wrote:
>>>>>> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdatinit' kthread will
>>>>>> initialise the deferred pages with local interrupts disabled. It is
>>>>>> introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while
>>>>>> initializing deferred pages").
>>>>>>
>>>>>> On machine with NCPUS <= 2, the 'pgdatinit' kthread could be bound to
>>>>>> the boot CPU, which could caused the tick timer long time stall, system
>>>>>> jiffies not be updated in time.
>>>>>>
>>>>>> The dmesg shown that:
>>>>>>
>>>>>>     [    0.197975] node 0 initialised, 32170688 pages in 1ms
>>>>>>
>>>>>> Obviously, 1ms is unreasonable.
>>>>>>
>>>>>> Now, fix it by restore in the pending interrupts for every 32*1204 pages
>>>>>> (128MB) initialized, give the chance to update the systemd jiffies.
>>>>>> The reasonable demsg shown likes:
>>>>>>
>>>>>>     [    1.069306] node 0 initialised, 32203456 pages in 894ms
>>>>>>
>>>>>> Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages").
>>>>>
>>>>> I dislike this solution TBH. It effectivelly conserves the current code
>>>>> and just works around the problem. Why do we hold the IRQ lock here in
>>>>> the first place? This is an early init code and a very limited code is
>>>>> running at this stage. Certainly nothing memory hotplug related which
>>>>> should be the only path really interested in the resize lock AFAIR.
>>>>
>>>> Yeah, I don't think ACPI and friends are up yet.
>>>
>>> Just to save somebody time to check. The deferred initialization blocks
>>> the further boot until all workders are done - see page_alloc_init_late
>>> (kernel_init path).
>>
>> Ha, I just finished following all the hotplug paths to check this out, and as
>> you all know there are a *lot* :-) Well at least we're in agreement.
> 
> Good to have it double checked!
>  
>>>>> This needs a double checking but I strongly believe that the lock can be
>>>>> simply dropped in this path.
>>
>> This is what my fix does, it limits the time the resize lock is held.
> 
> Just remove it from the deferred intialization and add a comment that we
> deliberately not taking the lock here because abc
> 

We should just double check what

commit 3a2d7fa8a3d5ae740bd0c21d933acc6220857ed0
Author: Pavel Tatashin <pasha.tatashin@...cle.com>
Date:   Thu Apr 5 16:22:27 2018 -0700

    mm: disable interrupts while initializing deferred pages
    
    Vlastimil Babka reported about a window issue during which when deferred
    pages are initialized, and the current version of on-demand
    initialization is finished, allocations may fail.  While this is highly
    unlikely scenario, since this kind of allocation request must be large,
    and must come from interrupt handler, we still want to cover it.
    
    We solve this by initializing deferred pages with interrupts disabled,
    and holding node_size_lock spin lock while pages in the node are being
    initialized.  The on-demand deferred page initialization that comes
    later will use the same lock, and thus synchronize with
    deferred_init_memmap().
    
    It is unlikely for threads that initialize deferred pages to be
    interrupted.  They run soon after smp_init(), but before modules are
    initialized, and long before user space programs.  This is why there is
    no adverse effect of having these threads running with interrupts
    disabled.

tried to solve does not apply.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ