Date: Wed, 20 Jul 2011 22:09:27 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org, niv@...ibm.com,
	tglx@...utronix.de, peterz@...radead.org, rostedt@...dmis.org,
	Valdis.Kletnieks@...edu, dhowells@...hat.com, eric.dumazet@...il.com,
	darren@...art.com, patches@...aro.org, greearb@...delatech.com,
	edt@....ca
Subject: Re: [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock()

On Wed, Jul 20, 2011 at 03:44:55PM -0700, Linus Torvalds wrote:
> On Wed, Jul 20, 2011 at 11:26 AM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > Given some common flag combinations, particularly -Os, gcc will inline
> > rcu_read_unlock_special() despite its being in an unlikely() clause.
> > Use noinline to prohibit this misoptimization.
>
> Btw, I suspect that we should at least look at what it would mean if
> we make the rcu_read_lock_nesting and the preempt counters both be
> per-cpu variables instead of making them per-thread/process counters.
>
> Then, when we switch threads, we'd just save/restore them from the
> process register save area.
>
> There's a lot of critical code sequences (spin-lock/unlock, rcu
> read-lock/unlock) that currently fetches the thread/process pointer
> only to then offset it and increment the count. I get the strong
> feeling that code generation could be improved and we could avoid one
> level of indirection by just making it a per-cpu counter.
>
> For example, instead of __rcu_read_lock looking like this (and being
> an external function, partly because of header file dependencies on
> the data structures involved):
>
>	push   %rbp
>	mov    %rsp,%rbp
>	mov    %gs:0xb580,%rax
>	incl   0x100(%rax)
>	leaveq
>	retq
>
> it should inline to just something like
>
>	incl   %gs:0x100
>
> instead.
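[Editor's note: the contrast Linus draws can be sketched in plain C. This is a hypothetical userspace model, not kernel code: `task_struct`, `get_current()`, and the counter names here are simplified stand-ins, and `__thread` plays the role of a per-CPU variable, since both compile down to a single segment-relative increment on x86-64.]

```c
/* Toy model of the two counter layouts under discussion; names are
 * illustrative, not the kernel's actual definitions. */
struct task_struct {
	int rcu_read_lock_nesting;
};

static struct task_struct current_task;

static struct task_struct *get_current(void)
{
	return &current_task;	/* models the "mov %gs:0xb580,%rax" load */
}

/* Per-task layout: fetch the task pointer, then offset and increment,
 * i.e. two dependent memory operations ("incl 0x100(%rax)"). */
static void rcu_read_lock_per_task(void)
{
	get_current()->rcu_read_lock_nesting++;
}

/* Per-CPU layout: one direct increment.  A thread-local variable is the
 * closest userspace analogue; gcc lowers this to a single %fs-relative
 * incl, just as a per-CPU variable lowers to "incl %gs:0x100". */
static __thread int rcu_read_lock_nesting_pcpu;

static void rcu_read_lock_per_cpu(void)
{
	rcu_read_lock_nesting_pcpu++;
}
```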
> Same for the preempt counter.
>
> Of course, it would need to involve making sure that we pick a good
> cacheline etc that is already always dirty. But other than that, is
> there any real downside?

We would need a form of per-CPU variable access that generated efficient
code, but that didn't complain about being used when preemption was
enabled.  __this_cpu_add_4() might do the trick, but I haven't dug fully
through it yet.

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
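[Editor's note: the save/restore step Linus mentions for the context switch can be modeled as below. This is a hypothetical sketch, not the kernel's actual switch path; `switch_to_counters` and the field names are invented for illustration, and a single CPU is modeled with plain globals.]

```c
/* Hypothetical model of moving both counters per-CPU: at context-switch
 * time they are saved into the outgoing task and reloaded from the
 * incoming one, so the lock/unlock fast paths never touch task_struct. */
struct task_struct {
	int rcu_nesting;	/* saved rcu_read_lock_nesting */
	int preempt_count;	/* saved preempt counter */
};

/* Stand-ins for the per-CPU variables (one CPU modeled here). */
static int pcpu_rcu_nesting;
static int pcpu_preempt_count;

static void switch_to_counters(struct task_struct *prev,
			       struct task_struct *next)
{
	/* Park the outgoing task's counts in its save area... */
	prev->rcu_nesting = pcpu_rcu_nesting;
	prev->preempt_count = pcpu_preempt_count;
	/* ...and load the incoming task's counts into the per-CPU slots. */
	pcpu_rcu_nesting = next->rcu_nesting;
	pcpu_preempt_count = next->preempt_count;
}
```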