[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090310232027.GC25665@ldl.fc.hp.com>
Date: Tue, 10 Mar 2009 17:20:27 -0600
From: Alex Chiang <achiang@...com>
To: Vegard Nossum <vegard.nossum@...il.com>
Cc: Pekka Enberg <penberg@...helsinki.fi>, Ingo Molnar <mingo@...e.hu>,
jbarnes@...tuousgeek.org, gregkh@...e.de, tj@...nel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH, RFC] sysfs: only allow one scheduled removal callback per
kobj
Hi Vegard, sysfs folks,
Vegard was nice enough to test my PCI remove/rescan patches under
kmemcheck. Maybe "torture" is a more appropriate term. ;)
My patch series introduces a sysfs "remove" attribute for PCI
devices, which will remove that device (and child devices).
http://thread.gmane.org/gmane.linux.kernel.pci/3495
Vegard decided that he wanted to do something like:
# while true ; do echo 1 > /sys/bus/pci/devices/.../remove ; done
which caused a nasty oops in my code. You can see the results of
his testing in the thread I referenced above.
After looking at my code for a bit, I decided that maybe it
wasn't completely my fault. ;) See, I'm using device_schedule_callback()
which really is a wrapper around sysfs_schedule_callback() which
is the way that a sysfs attribute is supposed to remove itself to
prevent deadlock.
The problem that Vegard's test exposed is that if you repeatedly
call a sysfs attribute that's supposed to remove itself using
device_schedule_callback, we'll keep scheduling work queue tasks
with a kobj that we really want to release.
[nb, I bet that /sys/bus/scsi/devices/.../delete will exhibit the
same problems]
This is very racy, and at some point, whatever remove handler
we've scheduled with device_schedule_callback will end up
referencing a freed kobj.
I came up with the below patch which changes the semantics of
device/sysfs_schedule_callback. We now only allow one in-flight
callback per kobj, and return -EBUSY if that kobj already has a
callback scheduled for it.
This patch, along with my updated 07/11 patch in my series,
prevents at least the first oops that Vegard reported, and I
suspect it prevents the second kmemcheck error too, although I
haven't tested under kmemcheck (yet*).
I'm looking for comments on the approach I took, specifically:
- are we ok with the new restriction I imposed?
- is it ok to return -EBUSY to our callers?
- is the simple linked list proof of concept
implementation going to scale too poorly?
To answer my own first two questions, I checked for callers of
both device_ and sysfs_schedule_callback, and it looks like
everyone is using it the same way: to schedule themselves for
removal. That is, although the interface could be used to
schedule any sort of callback, the only use case is the removal
use case. I don't think it will be a problem to limit ourselves
to one remove callback per kobj.
Maybe this patch really wants to be a new interface called
sysfs_schedule_callback_once or _single, where we check for an
already-scheduled callback for a kobj, and if we pass, then we
simply continue on to the existing, unchanged
sysfs_schedule_callback. I don't feel too strongly about creating
a new interface, but my belief is that changing the semantics of
the existing interface is probably the better solution.
My opinion on my own third question is that removing a device is
not in the performance path, so a simple linked list is
sufficient.
Depending on the feedback here, I'll resend this patch with a
full changelog (and giving credit to Vegard/kmemcheck as Ingo
requested I do) or I can rework it.
Thanks.
/ac
*: I googled for kmemcheck and the most recent tree I found was
from 2008. Is that really true? Do the patches apply to current
upstream?
---
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 1f4a3f8..e05a172 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -659,13 +659,16 @@ void sysfs_remove_file_from_group(struct kobject *kobj,
EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
struct sysfs_schedule_callback_struct {
- struct kobject *kobj;
+ struct list_head workq_list;
+ struct kobject *kobj;
void (*func)(void *);
void *data;
struct module *owner;
struct work_struct work;
};
+static DEFINE_MUTEX(sysfs_workq_mutex);
+static LIST_HEAD(sysfs_workq);
static void sysfs_schedule_callback_work(struct work_struct *work)
{
struct sysfs_schedule_callback_struct *ss = container_of(work,
@@ -674,6 +677,9 @@ static void sysfs_schedule_callback_work(struct work_struct *work)
(ss->func)(ss->data);
kobject_put(ss->kobj);
module_put(ss->owner);
+ mutex_lock(&sysfs_workq_mutex);
+ list_del(&ss->workq_list);
+ mutex_unlock(&sysfs_workq_mutex);
kfree(ss);
}
@@ -700,10 +706,19 @@ static void sysfs_schedule_callback_work(struct work_struct *work)
int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
void *data, struct module *owner)
{
- struct sysfs_schedule_callback_struct *ss;
+ struct sysfs_schedule_callback_struct *ss, *tmp;
if (!try_module_get(owner))
return -ENODEV;
+
+ mutex_lock(&sysfs_workq_mutex);
+ list_for_each_entry_safe(ss, tmp, &sysfs_workq, workq_list)
+ if (ss->kobj == kobj) {
+ mutex_unlock(&sysfs_workq_mutex);
+ return -EBUSY;
+ }
+ mutex_unlock(&sysfs_workq_mutex);
+
ss = kmalloc(sizeof(*ss), GFP_KERNEL);
if (!ss) {
module_put(owner);
@@ -715,6 +730,10 @@ int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
ss->data = data;
ss->owner = owner;
INIT_WORK(&ss->work, sysfs_schedule_callback_work);
+ INIT_LIST_HEAD(&ss->workq_list);
+ mutex_lock(&sysfs_workq_mutex);
+ list_add_tail(&ss->workq_list, &sysfs_workq);
+ mutex_unlock(&sysfs_workq_mutex);
schedule_work(&ss->work);
return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists