Message-ID: <20150316104443.491f4698@notabene.brown>
Date: Mon, 16 Mar 2015 10:44:43 +1100
From: NeilBrown <neilb@...e.de>
To: Torsten Kaiser <just.for.lkml@...glemail.com>
Cc: Prakash Punnoor <prakash@...noor.de>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: udiskd high CPU usage with 4.0 git
On Sat, 14 Mar 2015 21:16:51 +0100 Torsten Kaiser
<just.for.lkml@...glemail.com> wrote:
> On Mon, Mar 9, 2015 at 12:30 AM, NeilBrown <neilb@...e.de> wrote:
> > On Sun, 08 Mar 2015 18:14:39 +0100 Prakash Punnoor <prakash@...noor.de> wrote:
> >
> >> Hi,
> >>
> >> I noticed the udisks daemon (version 2.1.4) suddenly started using high
> >> cpu (one core at 100%) with linux 4.0 git kernel. I bisected it to:
> >>
> >> 750f199ee8b578062341e6ddfe36c59ac8ff2dcb
>
> I had the same problem upgrading from 4.0-rc1 to 4.0-rc3.
> I have just finished bisecting and "fixing" it.
>
> My bisect points to the same commit.
>
> Looking at udisksd with strace shows a loop of polling and then
> accessing several md related sysfs files.
> The only file that udisksd monitors and that was changed by that commit
> is "sync_action".
>
> If I revert this part of the commit, my system works normally again:
>
> static struct md_sysfs_entry md_scan_mode =
> - __ATTR_PREALLOC(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
> + __ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
>
> It seems that polling is broken for prealloc files.
>
> The cause seems to be that kernfs_seq_show() updates ->event, while
> the new sysfs_kf_read() does not.
> So the polling will always trigger and udisksd goes into an infinite
> loop looking for changes that are not there.
>
> I fixed my local system by copying the line "of->event =
> atomic_read(&of->kn->attr.open->event);" from kernfs_seq_show() into
> sysfs_kf_read(). (I also needed to move the definition of struct
> kernfs_open_node from kernfs/file.c to kernfs-internal.h)
>
> udisksd now behaves normally again, but I'm not sending this change as a
> patch, because I do not know enough about the locking and lifetime of
> these objects to evaluate whether that is really the correct fix.
Thanks for the bisection and analysis! Always easier when someone else does
the hard work :-)
There is a much simpler patch (as you probably suspected). I'll post it in a
moment.
Thanks,
NeilBrown
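
For illustration only, the behaviour described in the quoted analysis can be
modelled in user space. This is a minimal sketch, not kernel code and not the
patch mentioned above. It assumes that poll reports a change whenever the
per-open event count differs from the per-attribute count. All names
(open_node, open_file, notify, poll_changed, read_seq, read_prealloc,
wakeups) are hypothetical stand-ins for the kernfs objects.

```c
#include <stdbool.h>

/* Per-attribute state: the count is bumped whenever the attribute changes. */
struct open_node {
    int event;
};

/* Per-open-file state: the event count this reader has last seen. */
struct open_file {
    struct open_node *on;
    int event;
};

/* Stand-in for a change notification on the attribute. */
static void notify(struct open_node *on)
{
    on->event++;
}

/* Assumed poll check: report a change if the two counts differ. */
static bool poll_changed(const struct open_file *of)
{
    return of->event != of->on->event;
}

/* Models the seq_file read path: it records the current event count. */
static void read_seq(struct open_file *of)
{
    of->event = of->on->event;
}

/* Models the prealloc read path as described: the count is not recorded. */
static void read_prealloc(struct open_file *of)
{
    (void)of;
}

/*
 * Run up to `rounds` poll-then-read cycles and count how many times poll
 * reported a change. A reader that syncs its count stops after one cycle.
 */
static int wakeups(struct open_file *of,
                   void (*read_fn)(struct open_file *), int rounds)
{
    int n = 0;

    for (int i = 0; i < rounds; i++) {
        if (!poll_changed(of))
            break;
        read_fn(of);
        n++;
    }
    return n;
}
```

In this model, after a single notify() a reader using read_seq() wakes once
and then goes quiet, while a reader using read_prealloc() wakes on every
round, which matches the busy loop seen in udisksd.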