Message-Id: <200903181002.07584.mega@retes.hu>
Date:	Wed, 18 Mar 2009 10:02:07 +0100
From:	Gábor Melis <mega@...es.hu>
To:	Roland McGrath <roland@...hat.com>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Davide Libenzi <davidel@...ilserver.org>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Chris Friesen <cfriesen@...tel.com>,
	linux-kernel@...r.kernel.org
Subject: Re: RT signal queue overflow (Was: Q: SEGSEGV && uc_mcontext->ip (Was: Signal delivery order))

On Wednesday 18 March 2009, Roland McGrath wrote:
> > First of all, perhaps I missed something and this is solvable
> > without kernel changes, but I can't see how.
>
> It depends what kind of "solved" you mean.
>
> Signals pending for the thread are always delivered before signals
> pending for the process.  POSIX does not guarantee this to the
> application, but it has always been so in Linux and it's fine enough
> to rely on that.  Truly externally-generated and asynchronous signals
> go to the process, so it's really only pthread_kill use within your
> own program that raises the issue.
>
> Among signals pending for the thread, signals < SIGRTMIN are always
> delivered before ones >= SIGRTMIN.  POSIX does not guarantee this to
> the application, but it has always been so in Linux and it's fine
> enough to rely on that.  The most sensible thing to use with
> pthread_kill is some SIGRTMIN+n signal anyway, since they are never
> confused with any other use. If your program is doing that, you don't
> have a problem.

It was just a month or so ago that I finally made the change to use a
non-real-time signal for signalling stop-for-gc. It was motivated by
the fact that even with rt signals there needs to be a fallback
mechanism for when the rt signal queue overflows. Another reason was
that _different processes_ could interfere with each other: if one
filled the queue, the other processes would hang too (there was no
fallback mechanism implemented). From this behaviour, it seemed that
the rt signal queue was global. Attached is a test program that
reproduces this.

$ gcc -lpthread rt-signal-queue-overflow.c
$ (./a.out &); sleep 1; ./a.out
pthread_kill returned EAGAIN, errno=0, count=24566
pthread_kill returned EAGAIN, errno=0, count=0

There are two notable things here. The first is that pthread_kill
returns EAGAIN, which is not mentioned on the man page, but does not
set errno. The other is that the first process filled the rt signal
queue and the second one could not send a single signal successfully.
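
For readers without the attachment, here is a minimal sketch of the
kind of loop that shows the first point. It is not the attached
program: fill_rt_queue, struct overflow_result and both parameters are
names made up for illustration, and it assumes Linux, where
RLIMIT_SIGPENDING bounds the number of queued signals. It lowers the
soft limit first so that the run stays short.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/resource.h>
#include <time.h>

/* Hypothetical result record, for illustration only. */
struct overflow_result {
    int rc;          /* last return value of pthread_kill */
    int saved_errno; /* errno as seen right after the send loop */
    long sent;       /* number of sends that succeeded */
};

/*
 * Queue SIGRTMIN to the calling thread until pthread_kill fails or
 * max_sends is reached. soft_limit is an assumed upper bound used to
 * lower RLIMIT_SIGPENDING for the duration of the call.
 */
static struct overflow_result fill_rt_queue(rlim_t soft_limit, long max_sends)
{
    struct overflow_result res = {0, 0, 0};
    struct rlimit old_lim, new_lim;
    sigset_t set, old_set;
    const struct timespec no_wait = {0, 0};
    int sig = SIGRTMIN;

    /* Block the signal so that every instance stays queued. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    pthread_sigmask(SIG_BLOCK, &set, &old_set);

    /* Lower only the soft limit; never raise it. */
    getrlimit(RLIMIT_SIGPENDING, &old_lim);
    new_lim = old_lim;
    if (soft_limit < new_lim.rlim_cur)
        new_lim.rlim_cur = soft_limit;
    setrlimit(RLIMIT_SIGPENDING, &new_lim);

    errno = 0;
    while (res.sent < max_sends) {
        /* The error code is the return value, not errno. */
        res.rc = pthread_kill(pthread_self(), sig);
        if (res.rc != 0)
            break;
        res.sent++;
    }
    res.saved_errno = errno;

    /* Drain the queued instances without running a handler. */
    while (sigtimedwait(&set, NULL, &no_wait) == sig)
        ;

    /* Restore the original limit and signal mask. */
    setrlimit(RLIMIT_SIGPENDING, &old_lim);
    pthread_sigmask(SIG_SETMASK, &old_set, NULL);
    return res;
}
```

Called as fill_rt_queue(64, 1000), I would expect rc to be EAGAIN and
saved_errno to stay 0, with sent no larger than the lowered limit.
Because the count is shared beyond the calling process, sent can be
smaller than the limit, or even 0 if something else has already filled
the queue.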

Granted, without a fallback mechanism my app deserved to lose. However, 
it seemed to me that there were other programs lacking in this regard 
on my desktop as I managed to hang a few of them.

Even though within my app I could have guaranteed that the number of
pending rt signals is below a reasonable limit, there was no way to
defend against other processes filling up the queue, so I had to
implement a fallback mechanism that used non-rt signals (changing a
few other things as well), and when that was done, there was no reason
to keep the rt signal based one around.

Consider this another quality-of-implementation report.

> So on the one hand it seems pretty reasonable to say it's "solved" by
> accepting it when we say, "Welcome to Unix, these things should have
> stopped surprising you in the 1980s."  It's a strange pitfall of how
> everything fits together, granted.  But you do sort of have to make
> an effort to do things screwily before you can fall into it.
>
> All that said, it's actually probably a pretty easy hack to arrange
> that the signal posted by force_sig_info is the first one dequeued in
> all but the most utterly strange situations.
>
>
> Thanks,
> Roland

View attachment "rt-signal-queue-overflow.c" of type "text/x-csrc" (1037 bytes)
