Message-Id: <1204830243.20004.31.camel@bobble.smo.corp.google.com>
Date:	Thu, 06 Mar 2008 11:04:03 -0800
From:	Frank Mayhar <fmayhar@...gle.com>
To:	Roland McGrath <roland@...hat.com>
Cc:	parag.warudkar@...il.com,
	Alejandro Riveira Fernández 
	<ariveira@...il.com>, Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jakub Jelinek <jakub@...hat.com>
Subject: Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.

On Tue, 2008-03-04 at 20:08 -0800, Roland McGrath wrote:
> check_process_timers can look just like check_thread_timers, but
> consulting the shared fields instead of the per-thread ones for both the
> clock accumulators and the timers' expiry times.  Likewise, arm_timer
> only has to set signal->it_*_expires; process_timer_rebalance goes away.

Okay, my understanding of this is still evolving, so please (please!)
correct me when I get it wrong.  I take what you're saying to mean that,
first, run_posix_cpu_timers() only needs to run once per thread group
per tick.  It _sounds_ like it should be checking the shared fields
rather than the per-task fields for timer expiration (in fact, the more
I think about it, the more sure I am that that's the case).

The old process_timer_rebalance() routine was intended to distribute the
remaining ticks across all the threads, so that the per-task fields
would cause run_posix_cpu_timers() to fire at the appropriate time.
With run_posix_cpu_timers() checking the shared fields instead, that
step is no longer necessary.

Since the shared fields are getting all the ticks, this will work for
per-thread timers as well.
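
Concretely, by "the shared fields are getting all the ticks" I mean the
tick accounting would charge the thread group as well as the task.  A
minimal sketch of what I'm picturing (the helper name, and using
signal->utime as a running per-tick accumulator, are my assumptions,
not anything in the current tree):

	/* Sketch only: charge each tick to a group-wide accumulator so that
	 * run_posix_cpu_timers() can compare group totals against
	 * signal->it_*_expires without walking the threads.
	 */
	static inline void account_group_user_time(struct task_struct *p,
						   cputime_t cputime)
	{
		struct signal_struct *sig = p->signal;

		/* every thread's user ticks land in the shared field */
		sig->utime = cputime_add(sig->utime, cputime);
	}

	/* called from account_user_time() at each tick alongside the existing
	 * p->utime update; system and sched time would get the same treatment */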

The arm_timer() routine, instead of calling process_timer_rebalance(),
should just directly set signal->it_*_expires to the expiration time,
e.g.:
			switch (CPUCLOCK_WHICH(timer->it_clock)) {
			default:
				BUG();
			case CPUCLOCK_VIRT:
				if (!cputime_eq(p->signal->it_virt_expires,
						cputime_zero) &&
				    cputime_lt(p->signal->it_virt_expires,
					       timer->it.cpu.expires.cpu))
					break;
				p->signal->it_virt_expires = timer->it.cpu.expires.cpu;
				break;	/* no rebalance pass any more */
			case CPUCLOCK_PROF:
				if (!cputime_eq(p->signal->it_prof_expires,
						cputime_zero) &&
				    cputime_lt(p->signal->it_prof_expires,
					       timer->it.cpu.expires.cpu))
					break;
				i = p->signal->rlim[RLIMIT_CPU].rlim_cur;
				if (i != RLIM_INFINITY &&
				    i <= cputime_to_secs(timer->it.cpu.expires.cpu))
					break;
				p->signal->it_prof_expires = timer->it.cpu.expires.cpu;
				break;
			case CPUCLOCK_SCHED:
				p->signal->it_sched_expires = timer->it.cpu.expires.sched;
				break;
			}


> If you do all that then the time spent in run_posix_cpu_timers should
> not be affected at all by the number of threads.  The only "walking the
> timer lists" that happens is popping the expired timers off the head of
> the lists that are kept in ascending order of expiry time.  For each
> flavor of timer, there are n+1 steps in the "walk" for n timers that
> have expired.  So already no costs here should scale with the number of
> timers, just with the number of timers that all expire at the same time.
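
Just to be sure I'm reading that the same way you are: the walk is the
head-of-a-sorted-list pattern already in check_thread_timers(), roughly
as below (trimmed down; locking and the actual firing are elided, and
"firing" is the caller's list that collects expired timers):

	struct list_head *timers = tsk->cpu_timers;	/* kept in expiry order */

	while (!list_empty(timers)) {
		struct cpu_timer_list *t = list_entry(timers->next,
						      struct cpu_timer_list,
						      entry);
		/* if the head hasn't expired, nothing behind it has either */
		if (cputime_lt(prof_ticks(tsk), t->expires.cpu))
			break;
		t->firing = 1;
		list_move_tail(&t->entry, firing);	/* collect for firing */
	}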

It's probably still worthwhile to defer the expiry processing to a
workqueue thread, though, simply because it's a lot of work to do at
interrupt time.  I'll probably end up trying it both ways.
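
What I have in mind for the deferred version is roughly the following;
signal->cpu_timer_work is an invented field and group_timers_pending()
is a placeholder for the cheap "anything close to expiring?" test:

	/* Sketch: the tick only notices that something may have expired and
	 * punts the list walking and firing to process context.
	 * INIT_WORK(&sig->cpu_timer_work, posix_cpu_timers_work) would be
	 * done when the signal_struct is set up.
	 */
	static void posix_cpu_timers_work(struct work_struct *work)
	{
		struct signal_struct *sig =
			container_of(work, struct signal_struct, cpu_timer_work);

		/* pop expired entries off sig->cpu_timers[] (sorted lists, as
		 * above) and fire them, outside the tick handler */
	}

	void run_posix_cpu_timers(struct task_struct *tsk)
	{
		if (!group_timers_pending(tsk))	/* the cheap early bail-out */
			return;
		schedule_work(&tsk->signal->cpu_timer_work);
	}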

One thing that's still unclear to me is, if there were only one run of
run_posix_cpu_timers() per thread group per tick, how would per-thread
timers be serviced?

> The simplifications I described above will obviously greatly improve
> your test case (many threads and with some process timers expiring
> pretty frequently).  We need to consider and analyze the other kinds of
> cases too.  That is, cases with a few threads (not many more than the
> number of CPUs); cases where no timer is close to expiring very often.
> The most common cases, from one-thread cases to one-million thread
> cases, are when no timers are going off before next Tuesday (if any are
> set at all).  Then run_posix_cpu_timers always bails out early, and none
> of the costs you've seen become relevant at all.  Any change to what the
> timer interrupt handler does on every tick affects those cases too.

These are all on the roadmap, and in fact the null case should already
be covered. :-)
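
(By the null case I mean the cheap up-front test before any timer list
is touched.  With the shared fields it's a handful of compares,
independent of thread count.  A sketch, again assuming signal->utime and
signal->stime as per-tick accumulators and signal->it_sched_expires as a
shared field:)

	struct signal_struct *sig = tsk->signal;

	/* nothing armed, or nothing close to expiring: get out without
	 * touching any timer lists */
	if ((cputime_eq(sig->it_prof_expires, cputime_zero) ||
	     cputime_lt(cputime_add(sig->utime, sig->stime),
			sig->it_prof_expires)) &&
	    (cputime_eq(sig->it_virt_expires, cputime_zero) ||
	     cputime_lt(sig->utime, sig->it_virt_expires)) &&
	    sig->it_sched_expires == 0)
		return;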

> As I mentioned in my last message, my concern about this originally was
> with the SMP cache/lock effects of multiple CPUs touching the same
> memory in signal_struct on every tick (which presumably all tend to
> happen simultaneously on all the CPUs).  I'd insist that we have
> measurements and analysis as thorough as possible of the effects of
> introducing that frequent/synchronized sharing, before endorsing such
> changes.  I have a couple of guesses as to what might be reasonable ways
> to mitigate that.  But it needs a lot of measurement and wise opinion on
> the low-level performance effects of each proposal.

I've given this some thought.  It seems clear that there will be some
performance penalty when multiple CPUs compete to update the same field
at every tick.  It would be much better to have cacheline-aligned
per-CPU fields associated with either the task or the signal structure;
that way each CPU could update its own field without contending with
the others.  Processing in run_posix_cpu_timers() would still be
constant-time, although slightly more expensive, since it would have to
sum several fields instead of reading just one.  That's one field per
CPU rather than one per thread, though, a much smaller and fixed number.
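
To make that concrete, here is the shape of what I'm picturing.  All
the names are invented and nothing has been measured yet; it's only
meant to show where the cacheline-aligned per-CPU fields would sit
(say a structure allocated with alloc_percpu() and hung off
signal_struct as sig->group_cputime):

	/* one accumulator per CPU, each on its own cache line */
	struct group_cputime {
		cputime_t utime;
		cputime_t stime;
		unsigned long long sched_time;
	} ____cacheline_aligned;

	/* tick path: each CPU dirties only its own line (per-CPU variant of
	 * the earlier accounting sketch) */
	static void account_group_user_time(struct task_struct *p,
					    cputime_t cputime)
	{
		struct group_cputime *gc =
			per_cpu_ptr(p->signal->group_cputime, smp_processor_id());

		gc->utime = cputime_add(gc->utime, cputime);
	}

	/* expiry check: sum a small, fixed number of per-CPU fields */
	static cputime_t group_utime(struct signal_struct *sig)
	{
		cputime_t sum = cputime_zero;
		int cpu;

		for_each_possible_cpu(cpu)
			sum = cputime_add(sum,
				per_cpu_ptr(sig->group_cputime, cpu)->utime);
		return sum;
	}
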
-- 
Frank Mayhar <fmayhar@...gle.com>
Google, Inc.

