Date:	Wed, 20 Jul 2011 15:44:55 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
	niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
	rostedt@...dmis.org, Valdis.Kletnieks@...edu, dhowells@...hat.com,
	eric.dumazet@...il.com, darren@...art.com, patches@...aro.org,
	greearb@...delatech.com, edt@....ca
Subject: Re: [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock()

On Wed, Jul 20, 2011 at 11:26 AM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> Given some common flag combinations, particularly -Os, gcc will inline
> rcu_read_unlock_special() despite its being in an unlikely() clause.
> Use noinline to prohibit this misoptimization.

Btw, I suspect that we should at least look at what it would mean if
we make the rcu_read_lock_nesting and the preempt counters both be
per-cpu variables instead of making them per-thread/process counters.

Then, when we switch threads, we'd just save/restore them from the
process register save area.

There are a lot of critical code sequences (spin-lock/unlock, RCU
read-lock/unlock) that currently fetch the thread/process pointer
only to then offset it and increment the count. I get the strong
feeling that code generation could be improved and we could avoid one
level of indirection by just making it a per-cpu counter.

For example, instead of __rcu_read_lock looking like this (and being
an external function, partly because of header file dependencies on
the data structures involved):

  push   %rbp
  mov    %rsp,%rbp
  mov    %gs:0xb580,%rax      # load the current task pointer (per-cpu)
  incl   0x100(%rax)          # bump ->rcu_read_lock_nesting in task_struct
  leaveq
  retq

it should inline to just something like

  incl   %gs:0x100            # bump the per-cpu counter directly

instead. Same for the preempt counter.

Of course, it would mean making sure that we pick a good cacheline
etc. that is already always dirty. But other than that, is there any
real downside?

                       Linus