Message-Id: <200903181002.07584.mega@retes.hu>
Date: Wed, 18 Mar 2009 10:02:07 +0100
From: Gábor Melis <mega@...es.hu>
To: Roland McGrath <roland@...hat.com>
Cc: Oleg Nesterov <oleg@...hat.com>,
Davide Libenzi <davidel@...ilserver.org>,
Ingo Molnar <mingo@...e.hu>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Chris Friesen <cfriesen@...tel.com>,
linux-kernel@...r.kernel.org
Subject: Re: RT signal queue overflow (Was: Q: SEGSEGV && uc_mcontext->ip (Was: Signal delivery order))
On Wednesday 18 March 2009, Roland McGrath wrote:
> > First of all, perhaps I missed something and this is solvable
> > without kernel changes, but I can't see how.
>
> It depends what kind of "solved" you mean.
>
> Signals pending for the thread are always delivered before signals
> pending for the process. POSIX does not guarantee this to the
> application, but it has always been so in Linux and it's fine enough
> to rely on that. Truly externally-generated and asynchronous signals
> go to the process, so it's really only pthread_kill use within your
> own program that raises the issue.
>
> Among signals pending for the thread, signals < SIGRTMIN are always
> delivered before ones >= SIGRTMIN. POSIX does not guarantee this to
> the application, but it has always been so in Linux and it's fine
> enough to rely on that. The most sensible thing to use with
> pthread_kill is some SIGRTMIN+n signal anyway, since they are never
> confused with any other use. If your program is doing that, you don't
> have a problem.
It was only a month or so ago that I finally made the change to use a
non-real-time signal for signalling stop-for-gc. It was motivated by
the fact that even with rt signals there needs to be a fallback
mechanism for when the rt signal queue overflows. Another reason was
that _different processes_ could interfere with each other: if one
filled the queue, the other processes would hang too (there was no
fallback mechanism implemented). From this behaviour, it seemed that
the rt signal queue was global (the limit is in fact per user, set by
RLIMIT_SIGPENDING, so all processes of one user share it). Attached is
a test program that reproduces this.
$ gcc rt-signal-queue-overflow.c -lpthread
$ (./a.out &); sleep 1; ./a.out
pthread_kill returned EAGAIN, errno=0, count=24566
pthread_kill returned EAGAIN, errno=0, count=0
There are two notable things here. The first is that pthread_kill
returns EAGAIN, which is not mentioned on the man page; like other
pthread functions, it reports the error only through its return value
and does not set errno. The other is that the first process filled the
rt signal queue and the second one could not send a single signal.
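The attached program is not shown inline. As a hypothetical sketch of the same experiment (not the attachment itself), the helper below keeps sending a blocked real-time signal to the calling thread until pthread_kill() refuses. It assumes Linux with glibc, where the queue is capped by the per-user RLIMIT_SIGPENDING; to avoid hanging other programs of the same user, it temporarily lowers the soft limit to a small value. The name fill_rt_queue is illustrative only.

```c
#define _GNU_SOURCE            /* for RLIMIT_SIGPENDING */
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <time.h>

/* Hypothetical helper, not the attached program: queue the real-time
 * signal `sig` to the calling thread until pthread_kill() refuses.
 * Returns pthread_kill()'s error code (or -1 if the limit could not be
 * adjusted) and stores the number of successful sends in *sent. */
static int fill_rt_queue(int sig, rlim_t limit, long *sent)
{
    struct rlimit old, rl;
    sigset_t set, oldmask;
    struct timespec zero = { 0, 0 };
    int rc, err;

    *sent = 0;

    /* The cap is the per-user RLIMIT_SIGPENDING. Use a small soft limit
     * (never above the hard limit) so only a few slots are consumed. */
    if (getrlimit(RLIMIT_SIGPENDING, &old) != 0)
        return -1;
    rl = old;
    rl.rlim_cur = limit < old.rlim_max ? limit : old.rlim_max;
    if (setrlimit(RLIMIT_SIGPENDING, &rl) != 0)
        return -1;

    /* Block sig so every queued instance stays pending. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    pthread_sigmask(SIG_BLOCK, &set, &oldmask);

    errno = 0;
    while ((rc = pthread_kill(pthread_self(), sig)) == 0)
        ++*sent;
    err = errno;   /* pthread_kill() reports failure via rc, not errno */

    printf("pthread_kill returned %s, errno=%d, count=%ld\n",
           rc == EAGAIN ? "EAGAIN" : strerror(rc), err, *sent);

    /* Dequeue what we queued, then restore the signal mask and limit. */
    while (sigtimedwait(&set, NULL, &zero) == sig)
        ;
    pthread_sigmask(SIG_SETMASK, &oldmask, NULL);
    setrlimit(RLIMIT_SIGPENDING, &old);
    return rc;
}
```

Built with gcc -pthread and called as fill_rt_queue(SIGRTMIN, 64, &sent) from main(), it should print a line in the same format as the output above, with a count of at most 64 (fewer if the user already has signals pending in other processes).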
Granted, without a fallback mechanism my app deserved to lose. However,
other programs on my desktop seemed to lack such a fallback too, as I
managed to hang a few of them.
Even though within my app I could have guaranteed that the number of
pending rt signals stayed below a reasonable limit, there was no way to
defend against other processes filling up the queue. So I had to
implement a fallback mechanism that used non-rt signals (changing a few
other things as well), and once that was done there was no reason to
keep the rt-signal-based one around.
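The fallback described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from my application: SIGUSR2 stands in for the stop-for-gc signal, a single global flag stands in for per-thread state, and SIG_STOP_FOR_GC, request_stop_for_gc and the other names are hypothetical. On Linux a standard signal that arrives while one instance is already pending is merged, so there is no queue to overflow; the cost is that requests can coalesce, so the request itself is kept in memory and the signal only interrupts the target.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical choice: a standard (non-rt) signal for stop-for-gc. */
#define SIG_STOP_FOR_GC SIGUSR2

/* The request lives in memory; a real runtime would keep one per thread. */
static atomic_int stop_for_gc_pending;
static volatile sig_atomic_t stops_handled;

static void stop_for_gc_handler(int sig)
{
    (void)sig;
    /* Several requests may have collapsed into one delivery, so consult
     * the flag rather than counting signals. */
    if (atomic_exchange(&stop_for_gc_pending, 0))
        stops_handled++;   /* a real runtime would save state and wait here */
}

static int install_stop_for_gc_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = stop_for_gc_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIG_STOP_FOR_GC, &sa, NULL);
}

/* Publish the request first, then interrupt the target thread. */
static int request_stop_for_gc(pthread_t target)
{
    atomic_store(&stop_for_gc_pending, 1);
    /* An already-pending standard signal is merged rather than queued
     * again, so this is not refused with EAGAIN like an rt signal. */
    return pthread_kill(target, SIG_STOP_FOR_GC);
}
```

For example, if the target has the signal blocked while two requests arrive, the handler runs only once after the signal is unblocked, finds the flag set, and clears it.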
Consider this another quality-of-implementation report.
> So on the one hand it seems pretty reasonable to say it's "solved" by
> accepting it when we say, "Welcome to Unix, these things should have
> stopped surprising you in the 1980s." It's a strange pitfall of how
> everything fits together, granted. But you do sort of have to make
> an effort to do things screwily before you can fall into it.
>
> All that said, it's actually probably a pretty easy hack to arrange
> that the signal posted by force_sig_info is the first one dequeued in
> all but the most utterly strange situations.
>
>
> Thanks,
> Roland
Attachment: "rt-signal-queue-overflow.c" (text/x-csrc, 1037 bytes)