Message-Id: <1207603900.23968.18.camel@bobble.smo.corp.google.com>
Date:	Mon, 07 Apr 2008 14:31:40 -0700
From:	Frank Mayhar <fmayhar@...gle.com>
To:	Roland McGrath <roland@...hat.com>
Cc:	frank@...t.com, linux-kernel@...r.kernel.org
Subject: Re: posix-cpu-timers revamp

On Mon, 2008-04-07 at 13:08 -0700, Roland McGrath wrote: 
> > Yeah, I checked that out.  The one difference here is that that was a
> > race between do_exit() (actually release_task()/__exit_signal()) and
> > run_posix_cpu_timers().  While this race was the same in that respect,
> > there was also a race between all of the timer-tick routines that call
> > any of account_group_user_time(), account_group_system_time() or
> > account_group_exec_runtime() and __exit_signal().  This is because those
> > functions all dereference tsk->signal.
> 
> The essence that matters is the same: something that current does with its
> own ->signal on a tick vs something that the release_task path does.

True.  But see below.

> > Erm, well, this isn't reorganizing the data structures per se, since
> > these are new data structures.
> Tomato, tomato.  You're adding new data structures and lifetime rules to
> replace data that was described in a different data structure before, yet
> your new data's meaningful semantic lifetime exactly matches that of
> signal_struct.

This is one thing that has been unclear.  The relationship of
signal_struct to task_struct is, as far as I can tell, an unwritten one.
Certainly the interrupt routines are adjusting values that live only
inside task_struct and (with the exception of run_posix_cpu_timers())
leave signal_struct carefully alone.

>   You could as well make everything release_task cleans up be
> done in __put_task_struct instead, but that would not be a good idea
> either.  You've added a word to task_struct (100000 words per 100000-thread
> process, vs one word).  It's just not warranted.

While true, that's not the only reason to do it.  The tradeoff here is
between performance (i.e. having to check before dereferencing
tsk->signal on every tick) and space.  It's really a judgment call.
(Although adding 100Kwords does have a bit of weight.)

> > The upshot of this is that none of the timer routines dereference
> > tsk->signal, so the races go away, no locking needed.  From my
> > perspective this was the simplest solution, since lock dependency
> > ordering is _really_ a can of worms.
> 
> With the perspective of tunnel vision to just your one test case, adding
> something entirely new considering only that case always seems simplest.

Well, yes.  And not just "seems."

> That's not how we keep the entire system from getting the wrong kinds of
> complexity.

This isn't exactly how I would state it, but yes, it's generally true
as well.  The problem is that knowing exactly what "the wrong kinds"
are relies on knowledge possessed by only a few.  Prying that knowledge
out of you guys can be a chore. :-)

> > Regarding the second approach, without locking wouldn't that still be
> > racy?  Couldn't exit_state change (and therefore __exit_signal() run)
> > between the check and the dereference?
> 
> No.  current->exit_state can go from zero to nonzero only by current
> running code in the do_exit path.  current does not progress on that
> path while current is inside one update_process_times call.

Well, okay, this is the vital bit of data that puts everything above
into perspective.  Had I known this, I would not have made the change I
did.

I guess the key bit of knowledge is that a "task" is really a scheduling
unit, right?  And, really, from the scheduler's perspective, "task" is
the same as "thread."  The only thing that makes a set of threads into a
multithreaded process is that they share a signal struct (well, and
their memory map, of course).  So a "task" can execute on only one cpu
at a time.  Therefore if a "task" is executing and is interrupted, the value
of "current" at the interrupt will be that task, which is entirely
suspended for the duration of the interrupt.

Is this correct?  (This is not just for this fix, but for my general
understanding of Linux scheduling.)

Unfortunately, these things are often implicit in the code but, as far
as I know, aren't written down anywhere.  This whole exercise has been,
for me, a first real immersion in the internals of the Linux kernel.
-- 
Frank Mayhar <fmayhar@...gle.com>
Google, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
