lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 14 Oct 2015 09:30:21 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Tejun Heo <tj@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <cl@...ux.com>, Michal Hocko <mhocko@...e.cz>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Shaohua Li <shli@...com>, linux-mm <linux-mm@...ck.org>
Subject: Re: [GIT PULL] workqueue fixes for v4.3-rc5

On Tue, Oct 13, 2015 at 2:49 PM, Tejun Heo <tj@...nel.org> wrote:
>
> Single patch to fix delayed work being queued on the wrong CPU.  This
> has been broken forever (v2.6.31+) but obviously doesn't trigger in
> most configurations.

So why is this a bugfix? If cpu == WORK_CPU_UNBOUND, then things
_shouldn't_ care which cpu it gets run on.

I don't think the patch is bad per se, because we obviously have other
places where we just do that "WORK_CPU_UNBOUND means current cpu", and
it's documented to "prefer" the current CPU, but at the same time I
get the very strong feeling that anything that *requires* it to mean
the current CPU is just buggy crap.

So the whole "wrong CPU" argument seems bogus. It's not a workqueue
bug, it's a caller bug.

Because I'd argue that any user that *requires* a particular CPU
should specify that. This "fix" doesn't seem to be a fix at all.

So to me, the bug seems to be that vmstat_update() is just bogus, and
uses percpu data but doesn't speicy that it needs to run on the
current cpu.

Look at vmstat_shepherd() that does this *right* and starts the whole thing:

                        schedule_delayed_work_on(cpu,
                                &per_cpu(vmstat_work, cpu), 0);

and then look at vmstat_update() that gets it wrong:

                schedule_delayed_work(this_cpu_ptr(&vmstat_work),
                        round_jiffies_relative(sysctl_stat_interval));

Notice the difference?

So I feel that this is *obviously* a vmstat bug, and that working
around it by adding ah-hoc crap to the workqueues is completely the
wrong thing to do. So I'm not going to pull this, because it seems to
be hiding the real problem rather than really "fixing" anything.

(That said, it's not obvious to me why we don't just specify the cpu
in the work structure itself, and just get rid of the "use different
functions to schedule the work" model. I think it's somewhat fragile
how you can end up using the same work in different "CPU boundedness"
models like this).

Added the mm/vmstat.c suspects to the cc. Particularly Christoph
Lameter - this goes back to commit 7cc36bbddde5 ("vmstat: on-demand
vmstat workers V8").

Comments?

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ