[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20100216013702.GA1973@localhost.localdomain>
Date: Mon, 15 Feb 2010 20:37:02 -0500
From: Neil Horman <nhorman@...driver.com>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andi Kleen <andi@...stfloor.org>,
Jiri Slaby <jirislaby@...il.com>, linux-kernel@...r.kernel.org,
Greg KH <gregkh@...e.de>, Kay Sievers <kay.sievers@...e.de>
Subject: Re: oops in uevent_helper [was: mmotm 2010-01-13-12-17 uploaded]
On Mon, Feb 15, 2010 at 02:16:07PM -0800, Paul E. McKenney wrote:
> On Mon, Feb 15, 2010 at 04:42:52PM -0500, Neil Horman wrote:
> > On Mon, Feb 15, 2010 at 12:12:33PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 12, 2010 at 12:06:24PM -0500, Neil Horman wrote:
> > > > On Thu, Feb 11, 2010 at 09:27:08PM -0800, Andrew Morton wrote:
> > > > > On Fri, 12 Feb 2010 06:21:26 +0100 Andi Kleen <andi@...stfloor.org> wrote:
> > > > >
> > > > > > > urgh, must I? That trashes Neil's
> > > > > > > kmod-add-init-function-to-usermodehelper.patch and
> > > > > > > kmod-replace-call_usermodehelper_pipe-with-use-of-umh-init-function-and-resolve-limit.patch
> > > > > > > and probably requires repairing other stuff and sets the testing status
> > > > > > > back to "square one".
> > > > > > >
> > > > > > > If you have patches queued, please make the time to support them!
> > > > > >
> > > > > > Ok, understood. I'll try to look into it today.
> > > > >
> > > > > Ta. As I mentioned to Neil, if it looks serious then let's shelve it
> > > > > all and revisit for 2.6.35.
> > > > >
> > > > > > You want incrementals?
> > > > >
> > > > > If convenient, please. Otherwise we can drop--and-remerge.
> > > > >
> > > > >
> > > >
> > > >
> > > > Ok, this fixes the oops Jiri reported for me. Its been tested by me, but only
> > > > minimally, and my rcu-foo is not the greatest, so through reviews appreciated.
> > > > The patch is incremental against the latest mmotm as of this AM.
> > >
> > > A few questions interspersed below...
> > >
> > > Thanx, Paul
> > >
> > > > Thanks!
> > > > Neil
> > > >
> > > >
> > > > Fix up remaining references to uevent_helper to play nice with Andi's
> > > > uevent_helper/rcu changes.
> > > >
> > > > Some changes were made recently which modified uevent_helper to be an rcu
> > > > protected pointer, rather than a static char array. This has led to a few
> > > > missed points in which the sysfs path still assumed that:
> > > > 1) the uevent_helper symbol could still be accessed safely without
> > > > rcu_dereference
> > > > 2) that the sysfs path could copy data to that pointer safely.
> > > >
> > > > I've fixed this by chaging the sysfs path so that it duplicates the string on
> > > > uevent_helper_store, and freeing it (only if it doesn't point to the
> > > > CONFIG_DEFAULT_UEVENT_HELPER string), in a call_rcu post-quiescent point. I've
> > > > also fixed up the remaining references to the uevent_helper pointers to use
> > > > rcu_dereference.
> > > >
> > > > Signed-off-by: Neil Horman <nhorman@...driver.com>
> > > >
> > > >
> > > > kernel/ksysfs.c | 38 ++++++++++++++++++++++++++++++++++++--
> > > > lib/kobject_uevent.c | 4 +++-
> > > > 2 files changed, 39 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> > > > index 21fe3c4..66d1e5b 100644
> > > > --- a/kernel/ksysfs.c
> > > > +++ b/kernel/ksysfs.c
> > > > @@ -37,19 +37,53 @@ KERNEL_ATTR_RO(uevent_seqnum);
> > > > static ssize_t uevent_helper_show(struct kobject *kobj,
> > > > struct kobj_attribute *attr, char *buf)
> > > > {
> > > > - return sprintf(buf, "%s\n", uevent_helper);
> > > > + return sprintf(buf, "%s\n", rcu_dereference(uevent_helper));
> > > > }
> > > > +
> > > > +struct uevent_helper_rcu {
> > > > + char *oldptr;
> > > > + struct rcu_head rcu;
> > > > +};
> > > > +
> > > > +static void free_old_uevent_ptr(struct rcu_head *list)
> > > > +{
> > > > + struct uevent_helper_rcu *ptr;
> > > > + char *dfl = CONFIG_UEVENT_HELPER_PATH;
> > >
> > > Given that you kfree() something that might be equal to dfl, I am
> > > hoping that the CONFIG_UEVENT_HELPER_PATH macro expands to something
> > > that kfree() can do something with...
> > >
> > > Or did you mean to put a "return;" in the then-clause of the "if"
> > > statement below?
> > >
> > No, I meant what I have. CONFIG_UEVENT_HELPER_PATH expands to a string. if the
> > old pointer is not NULL, and doesn't point to the default string (we don't want
> > to free something in the string table), then we kfree it, since it must have
> > been allocated in uevent_helper_store
>
> Got it, thank you! I confused ptr->oldptr with ptr. :-(
>
> > > > + ptr = container_of(list, struct uevent_helper_rcu, rcu);
> > > > + if (ptr->oldptr && (ptr->oldptr != dfl))
> > > > + kfree(ptr->oldptr);
> > > > +
> > > > + kfree(ptr);
> > > > +}
> > > > +
> > > > static ssize_t uevent_helper_store(struct kobject *kobj,
> > > > struct kobj_attribute *attr,
> > > > const char *buf, size_t count)
> > > > {
> > > > + char *kbuf;
> > > > + struct uevent_helper_rcu *old;
> > > > +
> > > > if (count+1 > UEVENT_HELPER_PATH_LEN)
> > > > return -ENOENT;
> > > > - memcpy(uevent_helper, buf, count);
> > > > + kbuf = kstrndup(buf, UEVENT_HELPER_PATH_LEN, GFP_KERNEL);
> > > > + if (!kbuf)
> > > > + return -ENOMEM;
> > > > uevent_helper[count] = '\0';
> > > > if (count && uevent_helper[count-1] == '\n')
> > > > uevent_helper[count-1] = '\0';
> > > > + old = kmalloc(sizeof(struct uevent_helper_rcu), GFP_KERNEL);
> > > > + if (!old)
> > > > + goto out_free;
> > > > +
> > > > + old->oldptr = rcu_dereference(uevent_helper);
> > > > + rcu_assign_pointer(uevent_helper, kbuf);
> > >
> > > Some lock protects this? Or does something else prevent multiple
> > > instances of uevent_helper_store() from executing concurrently?
> > >
> > I had thought that there was a per-file sysfs lock that protected this, but I
> > wasn't sure. Regardless, I would have thought this was safe, since
> > rcu_assign_pointer was serialized by rcu quiescence. Am I mistaken?
>
> Indeed you are -- rcu_assign_pointer() coordinates with any concurrent
> rcu_dereference() invocations, but does nothing to protect against other
> concurrent rcu_assign_pointer() invocations. So you should have something
> coordinating the update, be it a lock or whatever.
>
> > > > + call_rcu(&old->rcu, free_old_uevent_ptr);
> > > > +
> > > > return count;
> > > > +
> > > > +out_free:
> > > > + kfree(kbuf);
> > > > + return -ENOMEM;
> > > > }
> > > > KERNEL_ATTR_RW(uevent_helper);
> > > > #endif
> > > > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> > > > index c2383f3..211f846 100644
> > > > --- a/lib/kobject_uevent.c
> > > > +++ b/lib/kobject_uevent.c
> > > > @@ -126,6 +126,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
> > > > struct kset *kset;
> > > > const struct kset_uevent_ops *uevent_ops;
> > > > u64 seq;
> > > > + const char *helper;
> > > > int i = 0;
> > > > int retval = 0;
> > > >
> > > > @@ -272,7 +273,8 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
> > > > #endif
> > > >
> > > > /* call uevent_helper, usually only enabled during early boot */
> > > > - if (uevent_helper[0])
> > > > + helper = rcu_dereference(uevent_helper);
> > >
> > > This is protected by a pre-existing rcu_read_lock() somewhere?
> > >
> > Does rcu_dereference not start a quiescence point the same way rcu_read_lock
> > does? AS I said, rcu isn't my strong suit :)
>
> What rcu_dereference() does is coordinate with any concurrent
> rcu_assign_pointer() calls to the same pointer. If you (incorrectly)
> replace the rcu_dereference() and rcu_assign_pointer() calls with
> simple assignments, then the helper[0] below might see the garbage
> values that were in place before the initialization that preceded the
> rcu_assign_pointer().
>
> The rcu_read_lock() and rcu_read_unlock() prevent an RCU grace period both
> starting and ending during the enclosed RCU read-side critical section.
> A pre-existing RCU grace period might end during this critical section,
> or a new RCU grace period might start during this critical section, but
> any RCU grace period that starts during a given RCU read-side critical
> section is not permitted to end until after that RCU read-side critical
> section ends.
>
> So the interaction between rcu_read_lock()/rcu_read_unlock on the one
> hand and synchronize_rcu() and call_rcu() on the other hand is what
> guarantees that any RCU-protect data structure referenced within an
> RCU read-side critical section will remain in existence throughout the
> remainder of that RCU read-side critical section.
>
> This arrangement might seem a bit strange at first, but it is what
> gives RCU its read-side performance, scalability, and deadlock immunity.
>
> For more detail, please take a look at:
>
> http://lwn.net/Articles/262464/ (What is RCU, Fundamentally?)
> http://lwn.net/Articles/263130/ (What is RCU's Usage?)
> http://lwn.net/Articles/264090/ (What is RCU's API?)
>
> Thanx, Paul
>
Ah, thank you for that Paul. So there is still some work to do here. On the
positive side, this isn't technically a regression, since the previous
implementation didn't provide any concurrency exclusion either (the use of a
static array prevented odd pointer corruption, but multiple uevent_helper reads
and writes can lead to all sorts of odd/erroneous behavior.
Fortunately, this can be fixed with an incremental addtion. Andrew, I'll do my
best by I don't know if it will be in time for -33, but I think thats ok, since
to trip over this you'd really have to be trying, and you'd have to be root to
do it. Let me know if you think this is critical and I'll try prioritize it/get
it done asap.
Neil
> > Neil
> >
> > > If not, an rcu_read_lock() / rcu_read_unlock() pair is needed that
> > > covers both the rcu_dereference() and any subsequent dereferences of the
> > > pointer returned by rcu_dereference().
> > >
> > > > + if (helper[0])
> > > > retval = uevent_call_helper(subsystem, env);
> > > >
> > > > exit:
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > > the body of a message to majordomo@...r.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > > Please read the FAQ at http://www.tux.org/lkml/
> > >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists