Message-ID: <4F6B789C.8020201@panasas.com>
Date: Thu, 22 Mar 2012 12:08:12 -0700
From: Boaz Harrosh <bharrosh@...asas.com>
To: Oleg Nesterov <oleg@...hat.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
"Rafael J. Wysocki" <rjw@...k.pl>, <keyrings@...ux-nfs.org>,
<linux-security-module@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
NFS list <linux-nfs@...r.kernel.org>,
Trond Myklebust <Trond.Myklebust@...app.com>,
"Bhamare, Sachin" <sbhamare@...asas.com>,
David Howells <dhowells@...hat.com>,
Eric Paris <eparis@...hat.com>,
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
Kay Sievers <kay.sievers@...y.org>,
James Morris <jmorris@...ei.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Greg KH <gregkh@...uxfoundation.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Tejun Heo <tj@...nel.org>, David Rientjes <rientjes@...gle.com>
Subject: Re: [RFC 4/4] {RFC} kmod.c: Add new call_usermodehelper_timeout() API
On 03/22/2012 07:27 AM, Oleg Nesterov wrote:
> On 03/21, Boaz Harrosh wrote:
>>
>>> @@ -258,7 +262,8 @@ static void __call_usermodehelper(struct work_struct *work)
>>>
>>> switch (wait) {
>>> case UMH_NO_WAIT:
>>> - call_usermodehelper_freeinfo(sub_info);
>>> + kref_put(&sub_info->kref, call_usermodehelper_freeinfo);
>>> + kref_put(&sub_info->kref, call_usermodehelper_freeinfo);
>>> break;
>
> This doesn't look very nice. If you add the refcounting, it should be
> consistent. Imho it is better to change call_usermodehelper_exec() so
> that UMH_NO_WAIT does kref_put() too. Just s/goto unlock/goto out/ afaics.
>
Yes, I saw this after I sent the patch. Hence the RFC tag.
>>> @@ -452,22 +459,27 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info,
>>>
>>> sub_info->complete = &done;
>>> sub_info->wait = wait;
>>> + if (!sub_info->wait_timeout)
>>> + sub_info->wait_timeout = MAX_SCHEDULE_TIMEOUT;
>>>
>>> + /* Balanced in __call_usermodehelper or wait_for_helper */
>>> + kref_get(&sub_info->kref);
>>> queue_work(khelper_wq, &sub_info->work);
>>> if (wait == UMH_NO_WAIT) /* task has freed sub_info */
>>> goto unlock;
>>> - wait_for_completion(&done);
>>> - retval = sub_info->retval;
>>> -
>>> + if (likely(wait_for_completion_timeout(&done, sub_info->wait_timeout)))
>>> + retval = sub_info->retval;
>>> + else
>>> + retval = -ETIMEDOUT;
>>> out:
>>> - call_usermodehelper_freeinfo(sub_info);
>>> + kref_put(&sub_info->kref, call_usermodehelper_freeinfo);
>>> unlock:
>>> helper_unlock();
>>> return retval;
>>> }
>
> This looks obviously wrong. You also need to move *sub_info->complete
> into subprocess_info.
>
Yes, I caught that with further testing. A stupid mistake. Again, RFC.
>> Author: Oleg Nesterov <oleg@...hat.com>
>> Date: Wed Mar 21 10:57:41 2012 +1100
>>
>> usermodehelper: implement UMH_KILLABLE
>>
>> Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC. The
>> caller must ensure that subprocess_info->path/etc can not go away until
>> call_usermodehelper_freeinfo().
>> ...
>>
>> I think that my patch above does a much better/cleaner lifetime management of the
>> subprocess_info struct, with the use of a kref.
>
> This is subjective, you know ;) I specially tried to avoid the
> refcounting.
>
Why?
The whole kref_ abstraction comes down to a simple atomic_inc/dec,
which is in theory a lighter-weight operation than xchg (no memory
bus locking) and in practice is the same. (Except on massively
parallel machines, where it is.)
The last time I submitted a patch with xchg I got clobbered on the head
so hard that I ran away from it as fast as I could.
For object life cycles, the kref_get/put pattern is a much simpler,
more common, and better understood style in the kernel, if only for that
reason. I don't see why it needs to be "avoided".
> In any case. I do not know why do we need timeout, but this is
> orthogonal to KILLABLE. Please redo your patches on top of -mm
> tree? Please note that in this case the change becomes trivial.
>
Yes, you are right.
> And please explain the use-case for the new API.
>
The reason I need a timeout is this: calling from the kernel into
user mode gives me the creeps. I don't trust user-mode programs,
especially when a distribution has final control over them. Bugs can
happen and deadlocks are a possibility. For an operation that should
take 1/2 second and at most 1.5 seconds, I can say with confidence
that after 15 seconds, a dmesg message and a clean error recovery
is better. I don't want any chance of IO operations stuck in D state.
(My code is in the IO path, either fsync or write-back. There is not
always a killable target.)
The code path I have is easily recoverable, and if not for the scary
message in dmesg the user will not notice.
So in short, it is so I can sleep at night.
>> Anyway I thought that we are not
>> suppose to use xhcg() since it is not portable to all ARCHs. ;-)
>
> Hmm. For example, exit_mm() does xchg().
>
Again, personally I like xchg, but not here, not for object
lifetime management. Two threads share a structure that needs
to go away when the last one ends. That's the kref_ abstraction. Kref,
inside, could be implemented with xchg(), but that's not for me to
decide. I should use good abstractions when they exist and do the
job (well). No?
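As a rough illustration of the "last one out frees it" pattern, here is a userspace sketch using C11 atomics. It is only an analogue of kref, not the kernel implementation; `struct ref_sketch`, `struct helper_info` and the function names are hypothetical, and `freed_count` exists purely so the behaviour can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for struct kref (illustration only). */
struct ref_sketch {
    atomic_int count;
};

/* Hypothetical object shared by the caller and the helper thread. */
struct helper_info {
    struct ref_sketch ref;   /* must stay the first member, see release */
    int retval;
};

static int freed_count;      /* demo instrumentation, not part of the pattern */

/* The creator starts out holding one reference. */
static void ref_sketch_init(struct ref_sketch *r)
{
    atomic_init(&r->count, 1);
}

/* Take an extra reference before handing the object to another owner. */
static void ref_sketch_get(struct ref_sketch *r)
{
    atomic_fetch_add(&r->count, 1);
}

/* Drop one reference; returns true if this call ran release(). */
static bool ref_sketch_put(struct ref_sketch *r,
                           void (*release)(struct ref_sketch *))
{
    int old = atomic_fetch_sub(&r->count, 1);

    assert(old > 0);         /* catch unbalanced puts */
    if (old == 1) {
        release(r);
        return true;
    }
    return false;
}

static void helper_info_release(struct ref_sketch *r)
{
    /* Valid because ref is the first member of struct helper_info. */
    struct helper_info *info = (struct helper_info *)r;

    free(info);
    freed_count++;
}
```

With this shape, the submitter takes a second reference before queueing the work, and each side calls ref_sketch_put() exactly once when it is done. Whichever side finishes last frees the structure, regardless of whether the waiter timed out first.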
> Oleg.
>
Thanks Oleg. Yes, I'll rebase. Is there an -mm git tree? I could not
find it on git://git.kernel.org/pub/scm/ . Meanwhile I'll use a
random linux-next/master point, which should do the job.
Thanks
Boaz