linux-kernel - Re: [PATCH] sched: Revert 14a40ffccd61 ("sched: replace PF_THREAD_BOUND with PF_NO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131202153746.GA27781@gmail.com>
Date:	Mon, 2 Dec 2013 16:37:46 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Tejun Heo <htejun@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] sched: Revert 14a40ffccd61 ("sched: replace
 PF_THREAD_BOUND with PF_NO_SETAFFINITY")


* Peter Zijlstra <peterz@...radead.org> wrote:

> Subject: sched: Revert 14a40ffccd61 ("sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY")
> 
> PF_NO_SETAFFINITY, which crudely turns off affinity control and cgroups
> access to tasks, and which is in use by every workqueue thread in Linux (!),
> is conceptually wrong on many levels:
> 
>  - We should strive to never consciously place artificial limitations on
>    kernel functionality; our main reason to place limitations should be
>    correctness.
> 
>    There are cases where limiting affinity is justified: for example the
>    case of single cpu workqueue threads, which are special for their
>    limited concurrency, esp. when coupled with per-cpu resources --
>    allowing such threads to run on other cpus is a correctness violation
>    and can crash the kernel.
> 
>  - But using it outside this case is overly broad; it dis-allows usage
>    that is functionally fine and in some cases desired.
> 
>    In particular; tj argues ( http://lkml.kernel.org/r/20131128143848.GD3925@htj.dyndns.org )
> 
>    "That's just inviting people to do weirdest things and then
>    reporting things like "crypt jobs on some of our 500 machines end up
>    stuck on a single cpu once in a while" which will eventually be
>    tracked down to some weird shell script setting affinity on workers
>    doing something else."
> 
>    While that is exactly what HPC/RT people _want_ in order to avoid
>    disturbing the other CPUs with said crypt work.
> 
>  - Furthermore, the above example is also wrong in that you should not
>    protect root from itself; there's plenty root can do to shoot his
>    own foot off, let alone shoot his head off.
> 
>    Setting affinities is limited to root, and if root messes up the
>    system he can keep the pieces. But limiting in an overly broad
>    fashion stifles the functionality of the system.
> 
>  - Lastly; the flag actually blocks entry into cgroups as well as
>    sched_setaffinity(), so the name is misleading at best.
> 
> The right fix is to only set PF_THREAD_BOUND on per-cpu workers:
> 
>  --- a/kernel/workqueue.c
>  +++ b/kernel/workqueue.c
> 
>          set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> 
>  -       /* prevent userland from meddling with cpumask of workqueue workers */
>  -       worker->task->flags |= PF_NO_SETAFFINITY;
>  -
>          /*
>           * The caller is responsible for ensuring %POOL_DISASSOCIATED
>           * remains stable across this function.  See the comments above the
>           * flag definition for details.
>           */
>  -       if (pool->flags & POOL_DISASSOCIATED)
>  +       if (pool->flags & POOL_DISASSOCIATED) {
>                  worker->flags |= WORKER_UNBOUND;
>  +       } else {
>  +               /*
>  +                * Prevent userland from meddling with cpumask of workqueue
>  +                * workers:
>  +                */
>  +               worker->task->flags |= PF_THREAD_BOUND;
>  +       }
> 
> Therefore revert the patch and add an annoying but non-destructive warning
> check against abuse.

Hm, I missed these problems with the approach, but I think you are 
right.

Tejun, I suspect you concur with Peter's analysis, can I also add 
Peter's workqueue.c fixlet above to workqueue.c to this patch plus 
your Acked-by, so that the two changes are together?

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/