Message-ID: <4F21A80D.6080200@linux.vnet.ibm.com>
Date: Fri, 27 Jan 2012 00:52:53 +0530
From: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To: "Rafael J. Wysocki" <rjw@...k.pl>
CC: Jiri Slaby <jirislaby@...il.com>, Tejun Heo <tj@...nel.org>,
Jiri Slaby <jslaby@...e.cz>,
LKML <linux-kernel@...r.kernel.org>, Baohua.Song@....com,
"pavel@....cz" <pavel@....cz>,
Linux PM mailing list <linux-pm@...r.kernel.org>
Subject: Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:
> Hi,
>>
>> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:
>>
>> if (data->mode != O_RDONLY || !data->frozen || data->ready) {
>> error = -EPERM;
>> break;
>> }
>>
>> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
>> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
>> This makes it possible for hibernation_snapshot() and hence
>> freeze_workqueues_begin() to be called a second time, which is unfortunate.
>
> Yes, I obviously forgot about that code path when I was working on the commit
> that introduced the problem. :-(
>
> Thanks a lot for the great analysis, it's really helpful!
>
Welcome :-) It was fun!
>> And actually, the patch I posted in my previous mail is not really the right
>> long-term fix, though it might fix the particular issue that Jiri is facing..
>>
>> Because, allowing hibernation_snapshot() to get called a second time while
>> kernel threads are still frozen brings us to the same situation that commit
>> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
>> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
>> done inside hibernation_snapshot(), when kernel threads are frozen.. which
>> is known to break XFS, to give one example as mentioned in the changelog
>> of the above commit.
>
> That's exactly right.
>
>> So, the right way to fix this IMHO, would be to split up thaw_processes()
>> just like freezing phase:
>>
>> /* freezes or thaws user space processes */
>> freeze_processes() - thaw_processes()
>>
>> /* freezes or thaws kernel threads */
>> freeze_kernel_threads() - thaw_kernel_threads()
>>
>> We have to insert this thaw_kernel_threads() at appropriate places in such a
>> way as to not require another ioctl if possible... Then things would be
>> more symmetric (and hence easier to understand) and we can avoid getting
>> into strange situations as discussed here.
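The symmetry proposed above can be illustrated with a toy model. This is a
sketch under stated assumptions, not the kernel implementation: struct
model_task, MODEL_KTHREAD and the model_*() helpers are hypothetical names,
and a plain array stands in for the task list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a "this is a kernel thread" task flag. */
#define MODEL_KTHREAD 0x1u

struct model_task {
	unsigned int flags;
	bool frozen;
};

/* Set the frozen state of every task whose kernel-thread bit equals want_kthread. */
static void model_set_frozen(struct model_task *t, size_t n,
			     bool want_kthread, bool frozen)
{
	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		bool is_kthread = (t[i].flags & MODEL_KTHREAD) != 0;

		if (is_kthread == want_kthread)
			t[i].frozen = frozen;
	}
}

/* Stage 1: user space tasks only. */
static void model_freeze_user(struct model_task *t, size_t n)
{
	model_set_frozen(t, n, false, true);
}

/* Stage 2: kernel threads only. */
static void model_freeze_kernel_threads(struct model_task *t, size_t n)
{
	model_set_frozen(t, n, true, true);
}

/* Undo stage 2 alone, leaving user space frozen. */
static void model_thaw_kernel_threads(struct model_task *t, size_t n)
{
	model_set_frozen(t, n, true, false);
}
```

The point of the model is that stage 2 can be undone and redone any number of
times while the user space tasks stay frozen throughout.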
>>
>> But before we venture into that, it would be good to know if the patch posted
>> in the previous mail fixes the particular problem reported in this thread,
>> at least just to see if there are other problems lurking that we aren't aware
>> of yet..
>
> Jiri has already said that the patch works.
>
> I think we could avoid the issue entirely by introducing thaw_kernel_threads
> and making SNAPSHOT_FREE call it. No other changes should be necessary.
>
> IOW, Jiri, does the patch below help?
>
> [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?]
>
> ---
This is exactly the kind of fix I was suggesting.. Thanks Rafael!
I have a small request for a comment. Please see below.
I have a question too, but for that I'll have to reply to my earlier
thread so that I can comment on the userspace code.
> include/linux/freezer.h | 2 ++
> kernel/power/process.c | 19 +++++++++++++++++++
> kernel/power/user.c | 1 +
> 3 files changed, 22 insertions(+)
>
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
> extern int freeze_processes(void);
> extern int freeze_kernel_threads(void);
> extern void thaw_processes(void);
> +extern void thaw_kernel_threads(void);
>
> static inline bool try_to_freeze(void)
> {
> @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
> static inline int freeze_processes(void) { return -ENOSYS; }
> static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> static inline void thaw_processes(void) {}
> +static inline void thaw_kernel_threads(void) {}
>
> static inline bool try_to_freeze(void) { return false; }
>
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -188,3 +188,22 @@ void thaw_processes(void)
> printk("done.\n");
> }
>
> +void thaw_kernel_threads(void)
> +{
> + struct task_struct *g, *p;
> +
> + pm_nosig_freezing = false;
> + printk("Restarting kernel threads ... ");
> +
> + thaw_workqueues();
> +
> + read_lock(&tasklist_lock);
> + do_each_thread(g, p) {
> + if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> + __thaw_task(p);
> + } while_each_thread(g, p);
> + read_unlock(&tasklist_lock);
> +
> + schedule();
> + printk("done.\n");
> +}
> Index: linux/kernel/power/user.c
> ===================================================================
> --- linux.orig/kernel/power/user.c
> +++ linux/kernel/power/user.c
> @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
> swsusp_free();
> memset(&data->handle, 0, sizeof(struct snapshot_handle));
> data->ready = 0;
It would be nice to have a comment here explaining why thaw_kernel_threads()
is called. Such a comment would avoid confusion when people look at
SNAPSHOT_CREATE_IMAGE and SNAPSHOT_FREE and wonder why there is thawing
involved while the corresponding freezing is nowhere in sight. Of course the
freezing is hidden inside hibernation_snapshot(), but that might not be
immediately apparent to everyone.
> + thaw_kernel_threads();
> break;
>
> case SNAPSHOT_PREF_IMAGE_SIZE:
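For the comment requested in the SNAPSHOT_FREE hunk, one possible wording is
sketched here. The wording is only a suggestion. To let the fragment compile
outside the kernel, struct model_handle, struct model_data,
model_kthreads_frozen and the stub thaw_kernel_threads() are hypothetical
stand-ins, and the swsusp_free() call is omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins so the fragment builds in user space. */
struct model_handle { int cur; };
struct model_data {
	struct model_handle handle;
	int ready;
};

static int model_kthreads_frozen = 1;

/* Stub: the real function would walk the task list and thaw kernel threads. */
static void thaw_kernel_threads(void)
{
	model_kthreads_frozen = 0;
}

/* Models the body of the SNAPSHOT_FREE case with the suggested comment. */
static void model_snapshot_free(struct model_data *data)
{
	assert(data != NULL);
	memset(&data->handle, 0, sizeof(data->handle));
	data->ready = 0;
	/*
	 * Kernel threads were frozen inside hibernation_snapshot(), which
	 * SNAPSHOT_CREATE_IMAGE calls. Thaw them here so that a subsequent
	 * SNAPSHOT_CREATE_IMAGE can preallocate memory with kernel threads
	 * running and then freeze them again.
	 */
	thaw_kernel_threads();
}
```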
Regards,
Srivatsa S. Bhat