linux-kernel - Re: [PATCH] Revert mm/vmstat.c: fix vmstat


Message-ID: <20180411140913.GE793541@devbig577.frc2.facebook.com>
Date:   Wed, 11 Apr 2018 07:09:13 -0700
From:   Tejun Heo <htejun@...il.com>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, "Steven J . Hill" <steven.hill@...ium.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>
Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

Hello,

On Wed, Apr 11, 2018 at 03:56:43PM +0200, Vlastimil Babka wrote:
> > vmstat_update() is invoked by a kworker on a specific CPU. This worker
> > is bound to this CPU. The name of the worker was "kworker/1:1" so it
> > should have been a worker which was bound to CPU1. A worker which can
> > run on any CPU would have a `u' before the first digit.
> 
> Oh my, and I have just been assured by Tejun that this cannot happen :)
> And yet, in the original report [1] I see:
> 
> CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted
> 
> So is this perhaps related to the cpu hotplug that [1] mentions? E.g. is
> the cpu being hotplugged cpu 1, and the worker started too early, before
> stuff can be scheduled on that CPU, so it has to run on a CPU other than
> the designated one?
> 
> [1] https://marc.info/?l=linux-mm&m=152088260625433&w=2

The report says that it happens when hotplug is attempted.  A per-cpu
workqueue doesn't pin the cpu alive, so if the cpu goes down while a
work item is in flight, or a work item is queued while the cpu is
offline, it'll end up executing on some other cpu.  So, if a piece of
code doesn't want that happening, it has to interlock itself, i.e.
start queueing when the cpu comes online, and flush and prevent
further queueing when its cpu goes down.
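
The interlock described above can be modelled in a few lines of
userspace C.  This is only a sketch of the idea, not kernel code: the
struct, the NR_CPUS value and every function name below are
hypothetical, and the model is single-threaded, so it ignores the
locking a real implementation would need.  In the kernel the same roles
would presumably be played by CPU hotplug online/offline callbacks and
a synchronous flush or cancel of the work item.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical size for this model */

/* Hypothetical per-cpu bookkeeping; not a kernel structure. */
struct cpu_slot {
	bool online;	/* gate: queueing is allowed only while set */
	bool pending;	/* a work item is queued and has not run yet */
	int runs;	/* times the work ran on its own cpu */
};

static struct cpu_slot slots[NR_CPUS];

/* Stand-in for the work function: it must only run on its own cpu. */
static void work_fn(int cpu, int running_on)
{
	assert(cpu == running_on);
	slots[cpu].runs++;
}

/* Queue only while the cpu is online; returns false if refused. */
static bool queue_work_interlocked(int cpu)
{
	if (!slots[cpu].online)
		return false;
	slots[cpu].pending = true;
	return true;
}

/* Online callback: start allowing work to be queued for this cpu. */
static void cpu_online_cb(int cpu)
{
	slots[cpu].online = true;
}

/*
 * Offline callback: close the gate first so nothing new is queued,
 * then flush what is pending while the cpu can still run it.
 */
static void cpu_offline_cb(int cpu)
{
	slots[cpu].online = false;
	if (slots[cpu].pending) {
		slots[cpu].pending = false;
		work_fn(cpu, cpu);
	}
}
```

With this ordering a work item is either run on its own cpu before the
cpu goes away or refused at queueing time, so it never ends up
executing somewhere else.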

Thanks.

-- 
tejun