linux-kernel - Re: [linux-pm] Is it supposed to be ok to call del

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 18 May 2010 21:43:27 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Nigel Cunningham <ncunningham@...a.org.au>
Cc:	Alan Stern <stern@...land.harvard.edu>,
	"linux-kernel" <linux-kernel@...r.kernel.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-pm" <linux-pm@...ts.linux-foundation.org>,
	Matt Reimer <mattjreimer@...il.com>
Subject: Re: [linux-pm] Is it supposed to be ok to call del_gendisk while userspace is frozen?

On Tuesday 18 May 2010, Nigel Cunningham wrote:
> Hi.
> 
> On 18/05/10 06:35, Rafael J. Wysocki wrote:
> > On Monday 17 May 2010, Nigel Cunningham wrote:
> >> On 17/05/10 12:22, Alan Stern wrote:
> >>> On Mon, 17 May 2010, Nigel Cunningham wrote:
> >>>>>> I object to the patch.
> >>>>>>
> >>>>>> Tell the patch it ought to exit once thawed, by all means.
> >>>>>
> >>>>> I'm not sure what you mean.  Care to explain?
> >>>>
> >>>> I mean "Set up some sort of flag that it can look at once thawed at
> >>>> resume time, and use that to tell it to exit at that point."
> >>>
> >>> Doesn't the patch do exactly that?  The "flag" is set by virtue of the
> >>> fact that this is part of del_gendisk -- which means the disk is being
> >>> unregistered and hence the writeback thread will exit shortly.
> >>>
> >>>>>> Make the patch unfreezeable to begin with, by all means.
> >>>>>
> >>>>> That wouldn't work.
> >>>>
> >>>> Why not?
> >>>
> >>> It would be nice to know exactly why.  Perhaps the underlying problem
> >>> can be fixed.
> >>>
> >>>>>> If you know a disk is going to be unregistered during resume,
> >>>>>
> >>>>> How do we check that, exactly?
> >>>>
> >>>> Well, if you can figure out that you need to go down this path at this
> >>>> point in the process, you must be able to apply the same logic to come
> >>>> to the same conclusion earlier in the process.
> >>>
> >>> That's not true.  You don't know that a device is going to be unplugged
> >>> until it actually _is_ unplugged.
> >>
> >> Sorry - I got unregistered during suspend (instead of resume) in my
> >> head. That said, I'd argue that we should be...
> >>
> >> 1) Syncing all the data at the start of the suspend/hibernate, so
> >> there's nothing for the workthread to do if we do del_gendisk.
> >> 2) Telling things to exit if we do find the device is gone away at
> >> resume time, but not relying on the going-away happening until post
> >> process thaw, for a couple of reasons:
> >> - Potential for races/confusion/mess etc in having $random process
> >> thawing other processes. Only the thread doing the suspend/hibernate
> >> should be freezing/thawing.
> >
> > I don't see a problem here, as far as kernel threads are concerned.  In this
> > particular case this is a subsystem thawing a thread that belongs to it.  No
> > problem.
> >
> >> - We're dealing with the symptom, not the cause. Almost always a bad idea.
> >
> > I very much prefer to have a fix for a symptom than no fix at all, which is the
> > realistic alternative in this case.
> >
> > So, I think we should merge the patch and if someone finds the root cause
> > at one point in future, then we can just use the *right* approach instead of
> > the present one.
> >
> > The problem is real and people in the field are affected by it, so if you don't
> > have a working alternative patch, please just let go.
> 
> I'm not denying that the problem is real. What I am concerned about is 
> finding a real solution, not just putting a sticky plaster over the 
> wound. It seems to me to be much wiser to deal with the issue properly 
> now instead of doing extra work later to diagnose what might be a harder 
> to reproduce symptom of the same problem. I'd happily put the time in 
> now myself, but I simply don't have the time this week.
> 
> Would it be possible to apply the patch, adding some sort of new tag 
> that can be used to say "This needs further attention", perhaps 
> including an enduring reference to this conversation.

Yeah, /* FIXME: */ is for that.  With some comment why we're doing this.  :-)

> Later, the 'real' fix could include another special tag that says "Proper fix for the 
> symptom addressed in commit 5e94f810"?

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/