Message-Id: <20080711005243.ADE90154218@magilla.localdomain>
Date:	Thu, 10 Jul 2008 17:52:43 -0700 (PDT)
From:	Roland McGrath <roland@...hat.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86_64: fix delayed signals

> You're ignoring the background question - we expressly _stopped_ doing 
> this long ago. So the real issue was the ".. if you really .." part.
> 
> Do we really? What's the actual downside here?

I'm not convinced it was really "express".  It was never expressed in a
comment or log entry.  The change came in (pre-git) with:
    [PATCH] x86-64 architecture specific sync for 2.5.8
and commit 10ffdbb8d605be88b148f127ec86452f1364d4f0 "cleaned up slightly"
making the other paths match, with no explanation on the subject.

i386 has never behaved this way, and still doesn't.  I would doubt any
other arch ever has.  (My fix makes x86_64 and i386 treatment of
_TIF_WORK_MASK and any related signal race issues identical.)

The behavior of the test case I posted is just demonstrably wrong.  I
know you're never swayed by the fact that it has always been specified
and documented clearly to behave this way (in the case of multiple
pending signals like the test case).  Since it always did on i386, it's
easy to expect that there may be all manner of applications lurking
around that have depended on the correct semantics in subtle (and
probably intermittent) ways their poor users and maintainers may never
figure out.
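The test case referred to above is not reproduced in this message. A hypothetical C sketch of the same kind of scenario, assuming a POSIX system, blocks SIGUSR1 and SIGUSR2, raises both, unblocks them with a single sigprocmask() call, and reports how many handlers had run by the time that call returned. POSIX itself only promises that at least one pending unblocked signal is delivered before sigprocmask() returns. The behavior argued for here, and what i386 did, is that both are acted on before the return to user mode. The function name handlers_run_before_unblock_returns() is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* volatile sig_atomic_t is the type that is safe to write from a handler. */
static volatile sig_atomic_t got_usr1 = 0;
static volatile sig_atomic_t got_usr2 = 0;

static void note_signal(int sig)
{
    if (sig == SIGUSR1)
        got_usr1 = 1;
    else if (sig == SIGUSR2)
        got_usr2 = 1;
}

/* Hypothetical check, not the test case from the original thread.
   Returns how many of the two handlers ran before sigprocmask(SIG_UNBLOCK)
   returned, or -1 if setup failed. */
int handlers_run_before_unblock_returns(void)
{
    struct sigaction sa, old_usr1, old_usr2;
    sigset_t both, old_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_usr1) != 0 ||
        sigaction(SIGUSR2, &sa, &old_usr2) != 0)
        return -1;

    sigemptyset(&both);
    sigaddset(&both, SIGUSR1);
    sigaddset(&both, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &both, &old_mask) != 0)
        return -1;

    got_usr1 = 0;
    got_usr2 = 0;
    raise(SIGUSR1);  /* blocked, so it stays pending */
    raise(SIGUSR2);  /* blocked, so it stays pending */

    /* Both signals become deliverable on this single return to user mode. */
    if (sigprocmask(SIG_UNBLOCK, &both, NULL) != 0)
        return -1;
    int seen = got_usr1 + got_usr2;

    /* Restore the caller's mask and handlers. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_usr1, NULL);
    sigaction(SIGUSR2, &old_usr2, NULL);
    return seen;
}
```

Called from main() on a kernel with the behavior described above, it should return 2; a result of 1 would mean the second signal was left pending past the return from sigprocmask().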

What really irks me about the thought of leaving this wrong is that we
have spent so much effort lately on establishing a simple rule that when
you set TIF_SIGPENDING it will be acted on.  We did this after a lot of
painful time from a lot of people went into tracking down subtle weird
problems and races.  So, KISS.  Make a rule we can rely on, and then be
damn careful that we don't break the rule.  That's been serving us well,
which is to say preventing it from going from two people who can keep track
of what's going on with signals on any given day to zero.  Now that rule
that kept life barely comprehensible is amended with: unless it's
already inside signals code or some nearby arch code, or it's a race,
or, yeah, I think that's all the cases, but check with... well, no one
really knows, so I don't know who you check with, sorry.  You just can't
reason about the code if you don't maintain the invariants.
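As a rough illustration of the invariant (set TIF_SIGPENDING and it will be acted on), here is a toy user-space model in C, not kernel code: an exit path that acts on the flag once versus one that re-checks it after every delivery. Every name in it (struct task_model, exit_path_one_shot(), exit_path_loop(), and so on) is hypothetical, and it ignores details such as adding the delivered signal to the blocked mask while its handler runs.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct task_model {
    unsigned long long pending;  /* bit n set: signal n+1 is pending */
    unsigned long long blocked;  /* same layout, for blocked signals */
    bool tif_sigpending;         /* stands in for TIF_SIGPENDING */
    int frames_set_up;           /* handler frames pushed so far */
};

/* Keep the flag consistent with the pending and blocked sets. */
static void recalc_sigpending_model(struct task_model *t)
{
    t->tif_sigpending = (t->pending & ~t->blocked) != 0;
}

void task_model_init(struct task_model *t, unsigned long long pending)
{
    t->pending = pending;
    t->blocked = 0;
    t->frames_set_up = 0;
    recalc_sigpending_model(t);
}

/* Dequeue the lowest-numbered deliverable signal and "push a frame". */
static void deliver_one_model(struct task_model *t)
{
    unsigned long long ready = t->pending & ~t->blocked;
    if (ready != 0) {
        unsigned long long lowest = ready & (~ready + 1);  /* lowest set bit */
        t->pending &= ~lowest;
        t->frames_set_up++;
    }
    recalc_sigpending_model(t);
}

/* Acts on the flag once, then "returns to user mode" even if it is still
   set: the remaining signal waits for the next kernel entry. */
void exit_path_one_shot(struct task_model *t)
{
    if (t->tif_sigpending)
        deliver_one_model(t);
}

/* Re-reads the flag after every delivery, so a set flag is always acted
   on before the return to user mode. */
void exit_path_loop(struct task_model *t)
{
    while (t->tif_sigpending)
        deliver_one_model(t);
}
```

With two signals pending, exit_path_one_shot() pushes one frame and leaves tif_sigpending set, while exit_path_loop() pushes both and clears it. The loop terminates because each pass either removes a pending bit or finds nothing deliverable and clears the flag.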

The "actual" downsides include numerous unknowns, and I always forget
not to be surprised when you aren't scared that we have no idea what-all
the code might actually do.  The easy scenarios to think of offhand
have downsides like loss of timely signal delivery, where something can
chew 15ms of CPU after you killed it.  If I try all day I can come up
with more specific cases and maybe even some with instantly terrible
outcomes.  But I won't think of them all.  The worst ones will come up
much later (or are already dogging someone unwitting now), when someone
else sinks lots of time and effort trying to figure out strange
misbehaviors in their systems.


Thanks,
Roland
