Message-ID: <m235983p58.fsf@gmail.com>
Date: Thu, 22 Dec 2022 14:50:21 +0800
From: Schspa Shi <schspa@...il.com>
To: Luis Chamberlain <mcgrof@...nel.org>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org,
syzbot+10d19d528d9755d9af22@...kaller.appspotmail.com,
syzbot+70d5d5d83d03db2c813d@...kaller.appspotmail.com,
syzbot+83cb0411d0fcf0a30fc1@...kaller.appspotmail.com
Subject: Re: [PATCH] umh: fix UAF when the process is being killed
Luis Chamberlain <mcgrof@...nel.org> writes:
> On Thu, Dec 22, 2022 at 01:45:46PM +0800, Schspa Shi wrote:
>>
>> Schspa Shi <schspa@...il.com> writes:
>>
>> > Luis Chamberlain <mcgrof@...nel.org> writes:
>> >
>> >> Peter, Ingo, Steven: I would like your review.
>> >>
>> >> On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote:
>> >>> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote:
>> >>> > I'd like to upload a V2 patch with the new solution if you prefer the
>> >>> > following way.
>> >>> >
>> >>> > diff --git a/kernel/umh.c b/kernel/umh.c
>> >>> > index 850631518665..8023f11fcfc0 100644
>> >>> > --- a/kernel/umh.c
>> >>> > +++ b/kernel/umh.c
>> >>> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
>> >>> >  		/* umh_complete() will see NULL and free sub_info */
>> >>> >  		if (xchg(&sub_info->complete, NULL))
>> >>> >  			goto unlock;
>> >>> > +		/*
>> >>> > +		 * kthreadd (or new kernel thread) will call complete()
>> >>> > +		 * shortly.
>> >>> > +		 */
>> >>> > +		wait_for_completion(&done);
>> >>> >  	}
>> >>>
>> >>> Yes much better. Did you verify it fixes the splat found by the bots?
>> >>
>> >> Wait, I'm not sure yet why this would fix it... I first started thinking
>> >> that this may be a good example of a Coccinelle SmPL rule, something like:
>> >>
>> >> DECLARE_COMPLETION_ONSTACK(done);
>> >> foo *foo;
>> >> ...
>> >> foo->completion = &done;
>> >> ...
>> >> queue_work(system_unbound_wq, &foo->work);
>> >> ...
>> >> ret = wait_for_completion_state(&done, state);
>> >> ...
>> >> if (!ret)
>> >> 	S
>> >> ...
>> >> +wait_for_completion(&done);
>> >>
>> >> But that is pretty complex, and while it may be useful to know how many
>> >> patterns like this we have, it raises the question of whether handling
>> >> the -ERESTARTSYS condition inside each caller is the best way to
>> >> generalize this. What do folks think?
>> >>
>> >> The rationale here is that if you queue work and give it access to a
>> >> completion variable that lives on the stack, the queued work can end up
>> >> calling complete() on a stale on-stack variable. The issue seems to be
>> >> that when wait_for_completion_state() returns -ERESTARTSYS, the already
>> >> queued work may still be *about* to run while the process that owns the
>> >> on-stack completion has finished. So we race between the end of the
>> >> routine and the use of the on-stack completion.
>> >>
>> >> It makes me wonder whether the wait_for_completion() above really is
>> >> doing something more, or whether it just helps with timing and is
>> >> still error prone.
>> >>
>> >> The queued work will try to use the completion as follows:
>> >>
>> >> static void umh_complete(struct subprocess_info *sub_info)
>> >> {
>> >> 	struct completion *comp = xchg(&sub_info->complete, NULL);
>> >> 	/*
>> >> 	 * See call_usermodehelper_exec(). If xchg() returns NULL
>> >> 	 * we own sub_info, the UMH_KILLABLE caller has gone away
>> >> 	 * or the caller used UMH_NO_WAIT.
>> >> 	 */
>> >> 	if (comp)
>> >> 		complete(comp);
>> >> 	else
>> >> 		call_usermodehelper_freeinfo(sub_info);
>> >> }
>> >>
>> >> So the race is getting -ERESTARTSYS on the process with completion
>> >> on-stack and the above running complete(comp). Why would sprinkling
>> >> wait_for_completion(&done) *after* wait_for_completion_state(&done, state)
>> >> fix this UAF?
>> >
>> > The wait_for_completion(&done) is added for the case where
>> > xchg(&sub_info->complete, NULL) returns NULL. A NULL return means
>> > umh_complete() has already taken the completion pointer and will call
>> > complete() on it very shortly.
>> >
>> Hi Luis:
>>
>> Is there any further progress on this problem? Does the above
>> explanation answer your doubts?
>
> I think it would be useful to prove your work: use Coccinelle SmPL
> to hunt for similar flaws and see how widespread this issue is, and
> then also try to reproduce the same UAF there and show how your
> change fixes it.
>
OK, but it will take some time.
> Luis
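
For reference, the xchg() handoff discussed above can be modelled in
userspace. This is only a rough sketch under assumptions, not the kernel
code: the pthread-based struct completion, struct sub_info, run_once(),
run_many() and worker_frees are hypothetical stand-ins, and the
-ERESTARTSYS return is simulated by skipping the killable wait entirely.
It shows that exactly one side receives the non-NULL pointer, and that
the caller only needs the extra wait when its own xchg() returned NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Simplified stand-in for struct subprocess_info. */
struct sub_info {
	_Atomic(struct completion *) complete;
};

static atomic_int worker_frees;	/* times the worker side freed sub_info */

/* Mirrors the shape of umh_complete(): whoever sees NULL owns sub_info. */
static void *worker(void *arg)
{
	struct sub_info *info = arg;
	struct completion *comp = atomic_exchange(&info->complete, NULL);

	if (comp) {
		complete(comp);
	} else {
		free(info);
		atomic_fetch_add(&worker_frees, 1);
	}
	return NULL;
}

/* One simulated "killed" caller. Returns 1 if the caller won the xchg. */
static int run_once(void)
{
	struct completion done = { .done = false };	/* on-stack */
	struct sub_info *info = malloc(sizeof(*info));
	pthread_t t;
	int caller_won;

	assert(info);
	pthread_mutex_init(&done.lock, NULL);
	pthread_cond_init(&done.cond, NULL);
	atomic_init(&info->complete, &done);
	assert(pthread_create(&t, NULL, worker, info) == 0);

	/* Pretend the killable wait just returned -ERESTARTSYS. */
	if (atomic_exchange(&info->complete, NULL)) {
		/* Worker will see NULL, free info, and never touch done. */
		caller_won = 1;
	} else {
		/* Worker holds &done and will complete() it shortly. */
		wait_for_completion(&done);
		free(info);
		caller_won = 0;
	}

	/* Demo cleanup only; the kernel path has no equivalent join. */
	pthread_join(t, NULL);
	pthread_cond_destroy(&done.cond);
	pthread_mutex_destroy(&done.lock);
	return caller_won;
}

/* Runs n rounds and returns how many of them the caller won. */
static int run_many(int n)
{
	int wins = 0;

	for (int i = 0; i < n; i++)
		wins += run_once();
	return wins;
}
```

Calling run_many() from a small main() and building with -pthread
should leave worker_frees equal to the returned win count, since the
worker frees sub_info only in rounds where the caller took the pointer
first. Note that the pthread_join() keeps the model itself safe, so it
illustrates the ownership protocol and does not reproduce the UAF.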
--
BRs
Schspa Shi