linux-kernel - Re: Bug in disk event polling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.44L0.1202121216060.18114-100000@netrider.rowland.org>
Date:	Sun, 12 Feb 2012 12:48:22 -0500 (EST)
From:	Alan Stern <stern@...land.harvard.edu>
To:	Tejun Heo <tj@...nel.org>
cc:	Jens Axboe <axboe@...nel.dk>, "Rafael J. Wysocki" <rjw@...k.pl>,
	Linux-pm mailing list <linux-pm@...r.kernel.org>,
	Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Bug in disk event polling

On Fri, 10 Feb 2012, Tejun Heo wrote:

> Hello,
> 
> On Fri, Feb 10, 2012 at 04:44:48PM -0500, Alan Stern wrote:
> > > I think it should be nrt.  It assumes that no one else is running it
> > > concurrently; otherwise, multiple CPUs could jump into
> > > disk->fops->check_events() concurrently which can be pretty ugly.
> > 
> > Come to mention it, how can a single work item ever run on more than
> > one CPU concurrently?  Are you concerned about cases where some other 
> > thread requeues the work item while it is executing?
> 
> Yeah, there are multiple paths which may queue the work item.  For
> polling work, it definitely was possible but maybe locking changes
> afterwards removed that.  Even then, it would be better to use nrt wq
> as bug caused that way would be very difficult to track down.

Okay, I'll create a new workqueue for this purpose.

> > The problem is that these async threads generally aren't freezable.  
> > They will continue to run and do I/O while a system goes through a
> > sleep transition.  How should this be handled?
> 
> I think it would be better to use wq for most kthreads.  A lot of them
> aren't strictly correct in the way they deal with
> kthread_should_stop() and freezing.  kthread in general simply seems
> way too difficult to use correctly.

Maybe so, but getting rid of it at this point would be a big change.  
Also, kthreads were originally considered more suitable for tasks that
would need to run for a long time; is this no longer true?

> > kthread_run() can be adjusted on a case-by-case basis, by inserting
> > calls to set_freezable() and try_to_freeze() at the appropriate places.  
> > But what about async_schedule()?
> 
> Given the stuff async is used for, maybe just make all async execution
> freezable?

That probably won't work.  What if a driver relies on async thread
execution to carry out its I/O?

As another example, sd_probe() calls async_schedule(sd_probe_async,...)
to handle the long-running parts of probing a SCSI disk.  In turn,
sd_remove() calls async_synchronize_full() to insure that probing is
over before the device is unbound from sd.

What happens if a hot-unpluggable disk drive is unplugged while the
system is asleep?  It's entirely possible that the bus subsystem's
resume routine would see the device was gone and would try to 
unregister it.  Then sd_remove would wait for the async thread 
to finish, which would never happen because the thread would be frozen 
and wouldn't be thawed until all the resume routines had finished.

In this case, the proper solution is to have the SCSI prepare method 
call async_synchronize_full().  Other cases will require other 
solutions.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/