lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 20 Jul 2007 14:08:26 -0500
From:	Milton Miller <miltonm@....com>
To:	Alan Stern <stern@...land.harvard.edu>
Cc:	Ying Huang <ying.huang@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>, David Lang <david@...g.hm>,
	linux-pm <linux-pm@...ts.linux-foundation.org>,
	Jeremy Maitin-Shepard <jbms@....edu>
Subject: Re: [linux-pm] Re: Hibernation considerations


On Jul 20, 2007, at 1:17 PM, Alan Stern wrote:

> On Fri, 20 Jul 2007, Milton Miller wrote:
>
>> On Jul 20, 2007, at 11:20 AM, Alan Stern wrote:
>>> On Fri, 20 Jul 2007, Milton Miller wrote:
>>>>> We can't do this unless we have frozen tasks (this way, or another)
>>>>> before
>>>>> carrying out the entire operation.
>>>>
>>>> What can't we do?   We've already worked with the drivers to quesce
>>>> the
>>>> hardware and put any information to resume the device in ram.  Now 
>>>> we
>>>> ask them to put their device in low power mode so we can go to 
>>>> sleep.
>>>> Even if we schedule, the only thing userspace could touch is memory.
>>>
>>> Userspace can submit I/O requests.  Someone will have to audit every
>>> driver to make sure that such I/O requests don't cause a quiesced
>>> device to become active.  If the device is active, it will make the
>>> memory snapshot inconsistent with the on-device data.
>>
>> If a driver is waking a device between the time it was told by
>> hibernation "suspend all operations and save your device state to ram"
>> and "resume your device" then it is a buggy driver.
>
> That's exactly my point.  As far as I know nobody has done a survey,
> but I bet you'd find _many_ drivers are buggy either in this way or the
> converse (forcing an I/O request to fail immediately instead of waiting
> until the suspend is over when it could succeed).  They have this bug
> because they were written -- those which include any suspend/resume
> support at all -- under the assumption that they could rely on the
> freezer.
>
> And that's why Rafael said "We can't do this unless we have frozen
> tasks (this way, or another) before carrying out the entire operation."
> Until the drivers are fixed -- which seems like a tremendous job --
> none of this will work.

So this is in the way of removing the freezer ... but as we are not 
relying on doing any io other than suspend device operation, save state 
to ram, then later put device in low power mode for s3 and/or s4, and 
finally restore and resume to running.

>> I argue the process can make the io request after we write to disk, we
>> just can't service it.  If we are suspended it will go to the request
>> queue, and eventually the process will wait for normal throttling
>> mechanisms until the driver is woken up.
>
> Many drivers don't have request queues.  Even for the ones that do,
> there are I/O pathways that bypass the queue (think of ioctl or sysfs).

So its not a flag in make_request, fine.

>> Actually, my point was more "what kernel services do the drivers need
>> to transition from quiesced to low power for acpi S4 or
>> suspend-to-ram"?  We can't give them allocate-memory (but we give them
>> a call "we are going to suspend" when they can), but does "run this
>> tasklet" help?  What timer facilities are needed?
>
> Some drivers need the ability to schedule.  Some will need the ability
> to allocate memory (although GFP_ATOMIC is probably sufficient).  Some
> will need timers to run.

Can they allocate the memory in advance?  (Call them when we know we 
want to suspend, they make the allocations they will need; we later 
call them again to release the allocations).

If you need timers, you probably want some scheduling?

>> Do we need to differentate init (por by bios) and resume from quiesced
>> (for reboot, kexec start/resume)?  I hope not.
>
> Yes we do.

can you elabrate?   Note I was not asking resume-from-low power vs 
init-from-por.  We still get that distinction.

How do these drivers work today when we kexec?

The reason I'm asking is its hard to tell the first kernel what 
happened.  We can say "we powered off, and we were restarted", but it 
becomes much harder when each device may or may not have a driver in 
the save kernel if we have to differentate for each device if it was 
initialized and later quiesced by the jump kernel during save or never 
touched.  And we need to tell the resume from hybernate code "i touched 
it" "no i didn't" and "we resumed from s4" "no it was from s5".

This is why I've been proposing that we don't create the suspend image 
with devices in the low power state, but only in a quiesced state 
similar to the initial state.


I'm proposing a sequence like:

(1) start allocating pinned memory to reduce saved image size
(2) allocate and map blocks to save maximum image (we know how much ram 
is not in 1, so the max size)
(3) tell drivers we are going to suspend.   userspace is still running, 
swaping still active, etc.  now is the time to allocate memory to save 
device state.
(4) do what we want to slow down userspace making requests (ie run 
freezer today)
(5) call drivers while still scheduling with interrupts "save 
oppertunitiy".  From this point, any new request should be queued or 
the process put on a wait queue.
(6) suspend timers, turn off interrupts
(7) call drivers with interrupts off (final save)
(8) jump to other kernel to save the image
(9) call drivers to transition to low power
(10) finish operations to platform suspend on hybernate
(11) call drivers to resume, telling them if from suspend-to-ram or 
suspend-to-disk, possibly in two stages (interrupts off no scheduling 
and interrupts on scheduling allowed)
(12) unfreeze processes, kill the the thread holding the extra memroy 
used to reserve

So I'm asking what needs to happen in 9.  If we have to turn interrupts 
on and schedule, that's ok.  If the low power state is the initial 
state then fine.

Note that in 11, we could further differentate "from image restore in 
S4" and "from image restore in S5", and "from failed image save", but 
what needs to happen differently?


I'm guessing that the work that will take some time is seperating the 
go to low power from quiesce operations for snapshot, as it sounds like 
this is done with one driver call today?  Making this separation will 
give us our driver audit :-), but only if we decide on the requiements 
before the start.

miton

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ