Date:	Thu, 31 Jan 2013 13:46:25 +0100
From:	Jan Kara <jack@...e.cz>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	LKML <linux-kernel@...r.kernel.org>, jslaby@...e.cz
Subject: Re: [PATCH] printk: Avoid softlockups in console_unlock()

On Wed 30-01-13 16:08:27, Andrew Morton wrote:
> On Tue, 29 Jan 2013 15:54:24 +0100
> Jan Kara <jack@...e.cz> wrote:
> 
> > >   So I was testing the attached patch which does what we discussed. The bad
> > > news is I was able to trigger a situation (twice) when suddenly sda
> > > disappeared and thus all IO requests failed with EIO. There is no trace of
> > > what's happened in the kernel log. I'm guessing that disabled interrupts on
> > > the printing CPU caused scsi layer to time out for some request and fail the
> > > device. So where do we go from here?
> >   Andrew? I guess this fell off your radar via the "hrm, strange, need to
> > have a closer look later" path?
> 
> urgh.  I was hoping that if we left it long enough, one or both of us
> would die :(
  I'm too young for this strategy to work for me :)

> I fear we will rue the day when we changed printk() to bounce some of
> its work up to a kernel thread.
> 
> > Currently I'd be inclined to return to my original solution...
> 
> Can we make it smarter?  Say, take a peek at the current
> softlockup/nmi-watchdog intervals, work out for how long we can
> afford to keep interrupts disabled and then use that period and
> sched_clock() to work out if we're getting into trouble?  IOW, remove
> the hard-wired "1000" thing which will always be too high or too low
> for all situations.
  Yes, I also thought that making offloading more clever (so that offload
doesn't happen unless we really have no choice) could make the approach
more acceptable. 

> Implementation-wise, that would probably end up adding a kernel-wide
> function along the lines of
> 
> /*
>  * Return the maximum number of nanoseconds for which interrupts may be disabled
>  * on the current CPU
>  */
> u64 max_interrupt_disabled_duration(void)
> {
> 	return min(softlockup duration, nmi watchdog duration);
> }
  OK, that sounds good. So I'll write a patch...
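
  To make the idea concrete, here is a rough user-space sketch, not kernel
code and not the eventual patch. Everything named in it is hypothetical:
struct watchdog_limits, max_irq_disabled_ns() and should_stop_printing() are
made up for illustration, the two thresholds are passed in by the caller
instead of being read from the real watchdog code, the plain nanosecond
timestamps stand in for sched_clock(), and the "use half of the limit" safety
margin is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical container for the two watchdog thresholds, in nanoseconds.
 * In a real kernel these would come from the watchdog configuration.
 */
struct watchdog_limits {
	uint64_t softlockup_ns;
	uint64_t nmi_watchdog_ns;
};

/*
 * Longest time interrupts may stay disabled before one of the watchdogs
 * could fire: the smaller of the two thresholds.
 */
static uint64_t max_irq_disabled_ns(const struct watchdog_limits *lim)
{
	return lim->softlockup_ns < lim->nmi_watchdog_ns ?
	       lim->softlockup_ns : lim->nmi_watchdog_ns;
}

/*
 * Decide whether the printing CPU should stop and hand the work off.
 * start_ns and now_ns are timestamps in nanoseconds (a stand-in for
 * sched_clock()). Half of the limit is used as the budget; that margin
 * is an assumption of this sketch.
 */
static bool should_stop_printing(uint64_t start_ns, uint64_t now_ns,
				 const struct watchdog_limits *lim)
{
	uint64_t budget_ns = max_irq_disabled_ns(lim) / 2;

	/* If the clock appears to have gone backwards, be conservative. */
	if (now_ns < start_ns)
		return true;

	return now_ns - start_ns >= budget_ns;
}
```

  The point of the design is that the printing loop compares elapsed time
against a limit derived from the watchdog settings, instead of counting to a
hard-wired "1000".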
 
								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR