linux-kernel - Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex


Message-ID: <alpine.DEB.2.10.1407212025130.20847@nanos>
Date:	Mon, 21 Jul 2014 22:16:37 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Darren Hart <dvhart@...ux.intel.com>
cc:	Andi Kleen <andi@...stfloor.org>, Waiman Long <Waiman.Long@...com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Davidlohr Bueso <davidlohr@...com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
	linux-doc@...r.kernel.org, Jason Low <jason.low2@...com>,
	Scott J Norton <scott.norton@...com>
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex

On Mon, 21 Jul 2014, Darren Hart wrote:
> We observed some significant improvements under some very specific use
> cases, but a more thorough dive into performance impact in the other cases
> as well as security implications with the vdso is still wanting.

The security implication is that the feature can only be available for
process-private futexes, so that there is no way to expose information
across process address spaces.

But the way worse issue is storage.

While you can cache the namespace-specific TID of a thread in the
task_struct, you still need an O(1), zero-overhead mechanism to update
the thread state (only on/off CPU is interesting) in a per-process
shared data structure from the guts of schedule().

For that you have basically two choices:

1) cpu_thread_id[NR_CPUS]

   Simple to update from the scheduler, with a reasonably moderate
   storage size (NR_CPUS * 4 bytes), i.e. 16k in the worst case
   today. Set to 0 on scheduling out and to the namespace-specific TID
   on scheduling in.

   But that requires a linear search in the user-space spin loop, and
   that search has to be repeated on every iteration of the loop. Can
   you imagine how well that works performance-wise?

2) Bitmap threads_on_cpu

   Again, simple to update from the scheduler, cache line bouncing
   issues aside. Clear the bit on schedule out and set it on schedule
   in.

   But the bitmap needs PID_MAX_LIMIT bits, which is a whopping 512k
   per process in the worst case.
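To make the trade-off between the two layouts concrete, here is a minimal, single-threaded C sketch. All names (cpu_thread_id and threads_on_cpu as plain arrays, sim_put_on_cpu, owner_on_cpu_array, owner_on_cpu_bitmap, the SIM_* sizes) are hypothetical illustrations, not kernel or patch code, and the arrays are shrunk so it runs. The storage helpers only reproduce the arithmetic above, assuming worst-case values of NR_CPUS = 4096 (which is what 16k at 4 bytes per entry implies) and PID_MAX_LIMIT = 4 * 1024 * 1024 on 64-bit.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, shrunk sizes so the sketch runs; the worst cases assumed in
 * the text are NR_CPUS = 4096 and PID_MAX_LIMIT = 4 * 1024 * 1024 (64-bit). */
#define SIM_NR_CPUS 8
#define SIM_PID_MAX 1024
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Option 1: per-CPU slot holding the TID running there, 0 when scheduled
 * out (TIDs are never 0, so 0 is a safe "empty" marker). */
static int32_t cpu_thread_id[SIM_NR_CPUS];

/* Option 2: one bit per possible TID, set while that thread is on a CPU. */
static unsigned long threads_on_cpu[SIM_PID_MAX / BITS_PER_WORD];

/* Storage cost of option 1: one 4-byte TID per possible CPU. */
size_t array_storage_bytes(size_t nr_cpus)
{
	return nr_cpus * sizeof(int32_t);
}

/* Storage cost of option 2: one bit per possible PID. */
size_t bitmap_storage_bytes(size_t pid_max)
{
	return pid_max / CHAR_BIT;
}

/* Hypothetical helper: record in both layouts that 'tid' runs on 'cpu'. */
void sim_put_on_cpu(size_t cpu, int32_t tid)
{
	size_t t = (size_t)tid;

	cpu_thread_id[cpu] = tid;
	threads_on_cpu[t / BITS_PER_WORD] |= 1UL << (t % BITS_PER_WORD);
}

/* Option 1 lookup: O(NR_CPUS) scan, repeated on every spin iteration. */
bool owner_on_cpu_array(int32_t owner)
{
	for (size_t cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (cpu_thread_id[cpu] == owner)
			return true;
	return false;
}

/* Option 2 lookup: O(1) test of a single bit. */
bool owner_on_cpu_bitmap(int32_t owner)
{
	size_t t = (size_t)owner;

	return (threads_on_cpu[t / BITS_PER_WORD] >> (t % BITS_PER_WORD)) & 1UL;
}
```

With the assumed worst cases, array_storage_bytes(4096) gives 16384 bytes (16k) and bitmap_storage_bytes(4 * 1024 * 1024) gives 524288 bytes (512k). The bitmap buys its O(1) check with 32 times the storage.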

Anything else would involve search/lookup schemes which are just
overkill in both the scheduler and the user space loop.

Now for enhanced fun you need immutable pages for that storage, as you
can't have page faults in the guts of schedule().

So once you have found a way to make this opt-in, since you don't want
to inflict any of this on all processes by default, it might be a
worthwhile optimization. The probably tolerable impact on schedule()
would then be:

static inline void schedule_out(struct task_struct *curr)
{
	if (curr->threads_on_cpu)
		clear_bit(curr->ns_tid, curr->threads_on_cpu);
}

and

static inline void schedule_in(struct task_struct *curr)
{
	if (curr->threads_on_cpu)
		set_bit(curr->ns_tid, curr->threads_on_cpu);
}

Anything more complex is just going to defeat the whole purpose.
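The following is a user-space simulation of the two opt-in hooks and of the spin decision they enable, written as a minimal sketch. struct sim_task, should_spin and the sim_*_bit helpers are hypothetical stand-ins: the kernel's set_bit()/clear_bit()/test_bit() are atomic, while these are plain single-threaded versions. schedule_in() sets the bit and schedule_out() clears it, as described for the bitmap option above, and a NULL bitmap models a process that did not opt in.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-ins for the kernel's set_bit()/clear_bit()/test_bit(). */
static void sim_set_bit(size_t nr, unsigned long *map)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

static void sim_clear_bit(size_t nr, unsigned long *map)
{
	map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

static bool sim_test_bit(size_t nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Hypothetical slice of task_struct: the namespace TID plus the per-process
 * bitmap, which stays NULL unless the process opted in. */
struct sim_task {
	size_t ns_tid;
	unsigned long *threads_on_cpu;
};

/* Called when curr is switched out: mark it off-CPU (O(1), one branch). */
void schedule_out(struct sim_task *curr)
{
	if (curr->threads_on_cpu)
		sim_clear_bit(curr->ns_tid, curr->threads_on_cpu);
}

/* Called when curr is switched in: mark it on-CPU (O(1), one branch). */
void schedule_in(struct sim_task *curr)
{
	if (curr->threads_on_cpu)
		sim_set_bit(curr->ns_tid, curr->threads_on_cpu);
}

/* User-space side: spinning only pays off while the lock owner is running;
 * once its bit is clear, the waiter should block (e.g. FUTEX_WAIT) instead. */
bool should_spin(const unsigned long *threads_on_cpu, size_t owner_tid)
{
	return sim_test_bit(owner_tid, threads_on_cpu);
}
```

The scheduler-side cost is exactly one NULL check for processes that did not opt in, which matches the "anything more complex defeats the purpose" constraint.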

Thanks,

	tglx
