Date:	Fri, 11 Jul 2008 10:58:54 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ingo Molnar <mingo@...e.hu>
cc:	Roland McGrath <roland@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Elias Oltmanns <eo@...ensachen.de>,
	Török Edwin <edwintorok@...il.com>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: [PATCH] x86_64: fix delayed signals

On Fri, 11 Jul 2008, Ingo Molnar wrote:
> 
> nice find! Roland, is this related to the thread started by Elias 
> Oltmanns yesterday:
> 
>     http://lkml.org/lkml/2008/7/10/57

No.

First off, the delayed sending of signals isn't actually delayed that 
much. It's the next kernel entry at the most - certainly *not* across 
scheduling points, and the fact that ^Z shows up in the xterm says that 
scheduling did happen.

Secondly, it seems to happen on x86-32 too, and it is claimed to be a 
regression. Neither of which would be true for this case.

Thirdly, there seem to be no other signals involved (ie it's a single 
signal).

I think the IO interactivity report is simply because of IO scheduling 
and/or paging out the shell too aggressively. No signals will be delivered 
while the process is in 'D' state (the new 'killable' thing being an 
exception in _some_ cases, but that's literally just for killing signals, 
not backgrounding).

So to look at the original report:

 "By sprinkling some printk()s all over the place, I've managed to
  establish the following sequence of events taking place in the event of
  delayed signal handling as described above:
  The first Ctrl+Z event enqueues a SIGTSTP signal which eventually
  results in a call to kick_process(). For some reason though, the signal
  isn't handled straight away but remains on the queue for some time."

the thing is, kick_process() only helps if the other thread is _running_ 
on the other CPU. If it's actively waiting for disk, it's a no-op: the 
signal handling code expects that the scheduler will either wake it up in 
"wake_up_state()" (which won't happen if it is in 'D' state, of course!) 
or that the process will just handle it in its own time when it wakes up 
when IO completes.
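
To make that distinction concrete, here is a toy user-space model in C of 
the decision described above. It is a sketch, not kernel code: the enums 
and queue_signal_model() are hypothetical names invented for illustration, 
and the states are a simplification of what the scheduler really tracks.

```c
#include <stdbool.h>

/* Hypothetical, simplified task states (not the kernel's definitions). */
enum model_state {
	MODEL_RUNNING,         /* executing on some CPU right now */
	MODEL_INTERRUPTIBLE,   /* sleeping, a signal may wake it ('S') */
	MODEL_UNINTERRUPTIBLE, /* waiting for IO, signals do not wake it ('D') */
	MODEL_KILLABLE,        /* like 'D', but fatal signals do wake it */
};

/* What queueing a signal achieves right away in this model. */
enum model_action {
	MODEL_KICK,        /* poke the other CPU so the task re-enters the kernel */
	MODEL_WAKE,        /* wake the sleeper so it notices the signal */
	MODEL_STAY_QUEUED, /* nothing happens until the IO completes */
};

static enum model_action queue_signal_model(enum model_state state, bool fatal)
{
	switch (state) {
	case MODEL_RUNNING:
		return MODEL_KICK;
	case MODEL_INTERRUPTIBLE:
		return MODEL_WAKE;
	case MODEL_KILLABLE:
		/* Only killing signals get through, not backgrounding ones. */
		return fatal ? MODEL_WAKE : MODEL_STAY_QUEUED;
	case MODEL_UNINTERRUPTIBLE:
	default:
		return MODEL_STAY_QUEUED;
	}
}
```

In this model a SIGTSTP (not fatal) sent to a task in the 'D' or killable 
state yields MODEL_STAY_QUEUED, which is the delay seen in the report.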

So if things have gotten worse latency, it's most likely simply because 
our IO has worse latency - probably because of having more requests 
outstanding, or because of unlucky/bad IO scheduler behaviour. The first 
thing to do (because it's fairly easy) is to start off testing different 
IO schedulers, but also see if there are some cases where we should try to 
return early.

In the particular case of Edwin Török, his latency problem seems to be with 
"find". That's interesting, because it's one of the cases where we *could* 
easily improve on latency, by making "readdir()" return early not only for 
killable signals, but also for regular signals when we've filled the 
buffer partially (because unlike a "read()" system call, we don't promise 
to fill the whole buffer _anyway_).
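
The early-return idea can be sketched as a small user-space model in C. 
This is an illustration under stated assumptions, not the kernel 
implementation: struct model_buf, model_filldir() and model_getdents() are 
hypothetical names, and the signal_pending field stands in for the kernel's 
signal_pending(current).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-call readdir state. */
struct model_buf {
	size_t capacity;     /* bytes the caller offered */
	size_t used;         /* bytes filled so far */
	bool signal_pending; /* stand-in for signal_pending(current) */
};

/* Model of the fill callback: store one entry of 'reclen' bytes. */
static int model_filldir(struct model_buf *buf, size_t reclen)
{
	if (reclen > buf->capacity - buf->used)
		return -EINVAL; /* no room left for this entry */
	/* Only bail out for a signal once at least one entry is stored. */
	if (buf->used > 0 && buf->signal_pending)
		return -EINTR;
	buf->used += reclen;
	return 0;
}

/* Model of the system call: an error is hidden if anything was filled. */
static long model_getdents(struct model_buf *buf, int nentries, size_t reclen)
{
	int err = 0;

	for (int i = 0; i < nentries && err == 0; i++)
		err = model_filldir(buf, reclen);
	if (buf->used > 0)
		return (long)buf->used; /* short but valid result */
	return err;
}
```

With a signal pending, the model stores exactly one entry and returns its 
length, so the caller always makes progress and never sees -EINTR.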

It may be, for example, that the increased latency isn't actually because 
the *kernel* has increased latencies at all, but perhaps 'find' uses a 
much bigger buffer for its readdir() code? Anyway, _if_ it's readdir() 
that has high latency, the appended patch might make a difference. I think 
it's probably a good idea regardless..
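
For reference, a minimal Linux-specific sketch of how a user-space caller 
consumes directory entries, showing why the buffer size matters and why a 
short return is harmless. It assumes the raw getdents64 system call is 
reachable through syscall(2); count_dir_entries() and struct 
model_dirent64 are hypothetical names, and this is not how 'find' or 
glibc actually implement it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copy of the record layout returned by getdents64. */
struct model_dirent64 {
	uint64_t d_ino;
	int64_t d_off;
	unsigned short d_reclen; /* length of this record in bytes */
	unsigned char d_type;
	char d_name[];
};

/*
 * Count the entries in 'path' using a buffer of 'bufsize' bytes.
 * '*calls' receives the number of system calls that returned data.
 * Returns -1 on error.
 */
static long count_dir_entries(const char *path, size_t bufsize, long *calls)
{
	long entries = 0;
	int fd = open(path, O_RDONLY | O_DIRECTORY);
	char *buf = malloc(bufsize);

	*calls = 0;
	if (fd < 0 || !buf) {
		free(buf);
		if (fd >= 0)
			close(fd);
		return -1;
	}
	for (;;) {
		long n = syscall(SYS_getdents64, fd, buf, bufsize);

		if (n < 0) {
			entries = -1;
			break;
		}
		if (n == 0)
			break; /* end of directory */
		(*calls)++;
		/* A short buffer is fine: walk what we got, then ask again. */
		for (long pos = 0; pos < n; ) {
			struct model_dirent64 *d =
				(struct model_dirent64 *)(buf + pos);
			pos += d->d_reclen;
			entries++;
		}
	}
	free(buf);
	close(fd);
	return entries;
}
```

A caller written this way sees the same entries whether each call fills 
the whole buffer or returns early. A bigger buffer just means fewer, 
longer system calls, and so fewer points at which a signal gets handled.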

But it would be really good to have the thing bisected regardless of what 
it is.

			Linus

---

This makes 'readdir()' return early if it has a signal pending and it has 
filled at least one entry (the -EINTR will never be passed on to user 
space: the readdir() system call will return the length of the filled-in 
buffer)

 fs/readdir.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index 4e026e5..ec8c192 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -159,6 +159,9 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
 		return -EOVERFLOW;
 	dirent = buf->previous;
 	if (dirent) {
+		/* Only check signals if we have filled at least one entry! */
+		if (signal_pending(current))
+			return -EINTR;
 		if (__put_user(offset, &dirent->d_off))
 			goto efault;
 	}
@@ -241,6 +244,9 @@ static int filldir64(void * __buf, const char * name, int namlen, loff_t offset,
 		return -EINVAL;
 	dirent = buf->previous;
 	if (dirent) {
+		/* Only check signals if we have filled at least one entry! */
+		if (signal_pending(current))
+			return -EINTR;
 		if (__put_user(offset, &dirent->d_off))
 			goto efault;
 	}