linux-kernel - Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS


Date:	Thu, 16 Apr 2015 20:02:27 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mel@....ul.ie>,
	Rik van Riel <riel@...hat.com>,
	Jason Low <jason.low2@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Mel Gorman <mgorman@...e.de>,
	Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
	hideaki.kimura@...com, Aswin Chandramouleeswaran <aswin@...com>,
	Scott J Norton <scott.norton@...com>
Subject: Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the
 scheduler


* Peter Zijlstra <peterz@...radead.org> wrote:

> On Wed, Apr 15, 2015 at 09:46:01AM +0200, Ingo Molnar wrote:
> 
>  > @@ -2088,7 +2088,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
>  >  
>  >  static void reset_ptenuma_scan(struct task_struct *p)
>  >  {
>  > -	ACCESS_ONCE(p->mm->numa_scan_seq)++;
>  > +	WRITE_ONCE(p->mm->numa_scan_seq, READ_ONCE(p->mm->numa_scan_seq) + 1);
>  
> vs
> 
> 	seq = ACCESS_ONCE(p->mm->numa_scan_seq);
> 	if (p->numa_scan_seq == seq)
> 		return;
> 	p->numa_scan_seq = seq;
> 
> 
> > So the original ACCESS_ONCE() barriers were misguided to begin with: I 
> > think they tried to handle races with the scheduler balancing softirq 
> > and tried to avoid having to use atomics for the sequence counter 
> > (which would be overkill), but things like ACCESS_ONCE(x)++ never 
> > guaranteed atomicity (or even coherency) of the update.
> > 
> > But since in reality this is only statistical sampling code, all these 
> > compiler barriers can be removed I think. Peter, Mel, Rik, do you 
> > agree?
> 
> ACCESS_ONCE() is not a compiler barrier

It's not a general compiler barrier (and I didn't claim so) but it is 
still a compiler barrier: it's documented as a weak, variable specific 
barrier in Documentation/memory-barriers.txt:

  COMPILER BARRIER
  ----------------

  The Linux kernel has an explicit compiler barrier function that prevents the
  compiler from moving the memory accesses either side of it to the other side:

        barrier();

  This is a general barrier -- there are no read-read or write-write variants
  of barrier().  However, ACCESS_ONCE() can be thought of as a weak form
  for barrier() that affects only the specific accesses flagged by the
  ACCESS_ONCE().

 [...]
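
For readers who want to see what "affects only the specific accesses" 
means in practice, here is a minimal userspace sketch. It assumes the 
volatile-cast form of the macro and a GCC-compatible compiler 
(__typeof__); MY_ACCESS_ONCE, struct mm_sketch and bump_seq() are 
hypothetical names for illustration, not kernel code:

```c
/*
 * Hypothetical stand-in for ACCESS_ONCE(): a volatile cast that forces
 * the compiler to emit exactly one access for this one expression.
 * Accesses to other variables may still be reordered around it.
 */
#define MY_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical cut-down structure holding only the sequence counter. */
struct mm_sketch {
	int numa_scan_seq;
};

/*
 * The '++' on a volatile lvalue is still a separate load followed by a
 * separate store. Nothing here makes the read-modify-write atomic.
 */
static int bump_seq(struct mm_sketch *mm)
{
	MY_ACCESS_ONCE(mm->numa_scan_seq)++;
	return mm->numa_scan_seq;
}
```

The point of the sketch is only that the macro constrains the compiler 
for the flagged access; it adds no CPU barrier and no atomicity.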

> The 'read' side uses ACCESS_ONCE() for two purposes:
>  - to load the value once, we don't want the seq number to change under
>    us for obvious reasons
>  - to avoid load tearing and observe weird seq numbers
> 
> The update side uses ACCESS_ONCE() to avoid write tearing, and 
> strictly speaking it should also worry about read-tearing since it's 
> not hard serialized, although it's very unlikely to actually have 
> concurrency (IIRC).

So what bad effects can there be from the very unlikely read and write 
tearing?

AFAICS nothing particularly bad. On the read side:

        seq = ACCESS_ONCE(p->mm->numa_scan_seq);
        if (p->numa_scan_seq == seq)
                return;
        p->numa_scan_seq = seq;

If p->mm->numa_scan_seq gets loaded twice (very unlikely), and the two 
loads observe different values, then we might get a 'double' NUMA 
placement run - i.e. statistical noise.
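
The read-side pattern can be sketched in userspace roughly as follows. 
This is an illustration under stated assumptions, not the scheduler 
code: struct mm_sketch, struct task_sketch and seq_changed() are 
hypothetical names, and the volatile cast stands in for ACCESS_ONCE():

```c
/* Hypothetical cut-down structures for illustration only. */
struct mm_sketch {
	int numa_scan_seq;	/* shared, bumped by the scanner */
};

struct task_sketch {
	struct mm_sketch *mm;
	int numa_scan_seq;	/* per-task copy of the last seen value */
};

/*
 * Sample the shared counter once into a local, then compare and store
 * from that local. Returns 1 if a new placement pass would run, 0 if
 * the sequence number is unchanged.
 */
static int seq_changed(struct task_sketch *p)
{
	int seq = *(volatile int *)&p->mm->numa_scan_seq;

	if (p->numa_scan_seq == seq)
		return 0;
	p->numa_scan_seq = seq;
	return 1;
}
```

Without the single sample, the compare and the store could in principle 
use two different loads of the shared counter, which is the 'double 
run' case described above.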

On the update side:

        ACCESS_ONCE(p->mm->numa_scan_seq)++;
        p->mm->numa_scan_offset = 0;

If the compiler tears that up we might skip an update - again 
statistical noise at worst.

Nor is compiler tearing the only theoretical worry here: in theory, 
with long cache coherency latencies we might get two updates 'mixed 
up', resulting in a (single) missed update.

Only atomics would solve all the races, but I think that would be 
overdoing it.
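
To make the 'mixed up' case concrete, here is a small deterministic 
userspace sketch. It does not run real threads: racy_two_updates() 
merely replays, by hand, the interleaving in which two CPUs both load 
the old value before either stores. atomic_two_updates() uses C11 
<stdatomic.h> to show what an atomic increment would look like. Both 
function names are hypothetical:

```c
#include <stdatomic.h>

/*
 * Replay of a lost update: both "CPUs" read the same old value, then
 * both store old + 1, so two increments yield only +1 in total.
 */
static int racy_two_updates(int seq)
{
	int cpu0 = seq;		/* CPU 0 loads the counter */
	int cpu1 = seq;		/* CPU 1 loads the same old value */

	seq = cpu0 + 1;		/* CPU 0 stores old + 1 */
	seq = cpu1 + 1;		/* CPU 1 overwrites with old + 1 again */
	return seq;
}

/*
 * With an atomic read-modify-write each increment is indivisible, so
 * no interleaving can drop one. Called single-threaded here, purely to
 * show the API shape.
 */
static int atomic_two_updates(int start)
{
	atomic_int seq;

	atomic_init(&seq, start);
	atomic_fetch_add(&seq, 1);
	atomic_fetch_add(&seq, 1);
	return atomic_load(&seq);
}
```

Starting from 5, the replayed race ends at 6 while the atomic variant 
ends at 7. For a statistical sequence counter the difference is one 
skipped update, which is why atomics look like overkill here.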

This is what I meant when I said there's no harm from this race.

Thanks,

	Ingo