Message-ID: <20230216022630.h6mfl5cdqt5vbjno@treble>
Date: Wed, 15 Feb 2023 18:26:30 -0800
From: Josh Poimboeuf <jpoimboe@...nel.org>
To: Petr Mladek <pmladek@...e.com>
Cc: live-patching@...r.kernel.org, linux-kernel@...r.kernel.org,
Seth Forshee <sforshee@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>,
Song Liu <song@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Joe Lawrence <joe.lawrence@...hat.com>,
Miroslav Benes <mbenes@...e.cz>,
Jiri Kosina <jikos@...nel.org>, Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH 2/3] livepatch,sched: Add livepatch task switching to
cond_resched()
On Wed, Feb 15, 2023 at 02:30:36PM +0100, Petr Mladek wrote:
> > static inline int _cond_resched(void)
> > {
> > + klp_sched_try_switch();
> > return __cond_resched();
>
> My only concern is if it might cause any performance problems.
>
> On one hand, cond_resched() is used in code paths that are slow
> on their own. Also, it will do nothing most of the time.
>
> On the other hand, cond_resched() is typically used in loops.
> One iteration might be fast. The code might be slow because there
> are too many iterations. Repeating the same failing test on every
> iteration might prolong the time significantly.
Yes, but it should hopefully be very rare to patch a function in the
call stack of a kthread loop. In general it's a good idea for the patch
author to avoid that.
> An idea is to try the switch only when it was not done during
> a real schedule. Something like:
>
> static inline int _cond_resched(void)
> {
> 	int scheduled;
>
> 	scheduled = __cond_resched();
> 	if (scheduled)
> 		klp_sched_try_switch();
>
> 	return scheduled;
> }
>
> But it would make it less reliable/predictable. Also, it won't work
> in configurations where cond_resched() is always a no-op.
>
> I am probably being too careful. We could keep it simple until
> real-life problems are reported.
If we can get away with it, I much prefer the simple unconditional
klp_sched_try_switch() because of the predictability and quickness with
which the kthread gets patched.
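To make the trade-off between the two call-site variants concrete, here is a small userspace model (not kernel code). All names (model_try_switch(), run_loop(), struct loop_result) are hypothetical, and the assumption that a real reschedule happens on every fourth cond_resched() call is arbitrary. It compares calling the switch unconditionally with calling it only after __cond_resched() actually rescheduled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one kthread; not kernel code. */
struct model_task {
	bool patch_pending; /* stands in for TIF_PATCH_PENDING */
	bool on_stack;      /* a function being patched is on the stack */
};

struct loop_result {
	int switched_at; /* iteration where the task switched, or -1 */
	int attempts;    /* how many times the slow path ran */
};

/* Stand-in for klp_sched_try_switch(): cheap check first, and the
 * (possibly failing) stack check only while a switch is pending. */
static void model_try_switch(struct model_task *t, int *attempts)
{
	if (!t->patch_pending)
		return;
	(*attempts)++;
	if (!t->on_stack)
		t->patch_pending = false; /* stack is clean: switch the task */
}

/* Runs n iterations of a kthread loop that calls cond_resched() once per
 * iteration. 'conditional' selects the "switch only after a real
 * reschedule" variant; 'resched_is_nop' models configs where
 * cond_resched() never reschedules. Otherwise we assume (arbitrarily)
 * that every fourth call really reschedules. */
static struct loop_result run_loop(int n, bool conditional,
				   bool resched_is_nop, bool on_stack)
{
	struct model_task t = { .patch_pending = true, .on_stack = on_stack };
	struct loop_result res = { .switched_at = -1, .attempts = 0 };

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		bool scheduled = !resched_is_nop && (i % 4 == 3);

		if (!conditional || scheduled)
			model_try_switch(&t, &res.attempts);
		if (!t.patch_pending) {
			res.switched_at = i;
			break;
		}
	}
	return res;
}
```

In this model the unconditional variant switches a clean task on the very first call, even when cond_resched() is a no-op, but it repeats a failing stack check on every iteration. The conditional variant runs fewer failing checks, yet it never switches the task when cond_resched() never reschedules.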
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -76,6 +96,8 @@ static void klp_complete_transition(void)
> > klp_transition_patch->mod->name,
> > klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
> >
> > + klp_cond_resched_disable();
> > +
>
> Nit: Strictly speaking, this is not needed when klp_complete_transition()
> is called from klp_cancel_transition(). In this case,
> klp_cond_resched_enable() was not called. So it might be moved into
> klp_try_complete_transition().
Argh, I always forget about that pesky klp_cancel_transition().
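A minimal sketch of the suggested move, as a userspace model built from hypothetical model_* functions. It assumes the hook is enabled when the transition starts (the review only says the cancel path runs without it having been enabled), and it moves the disable out of the shared completion code into the success path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these are the kernel's functions. */
static bool resched_hook_enabled; /* models the cond_resched() hook */
static int disable_calls;

static void model_cond_resched_enable(void)
{
	resched_hook_enabled = true;
}

static void model_cond_resched_disable(void)
{
	/* Stricter than needed: insist that disable follows enable. */
	assert(resched_hook_enabled);
	resched_hook_enabled = false;
	disable_calls++;
}

/* Cleanup shared by the success and cancel paths. With the suggested
 * change it no longer touches the hook. */
static void model_complete_transition(void)
{
	/* free transition state here */
}

/* Cancel path: assumed to run before the transition was started,
 * so the hook was never enabled and is not disabled here. */
static void model_cancel_transition(void)
{
	model_complete_transition();
}

static void model_start_transition(void)
{
	model_cond_resched_enable();
}

/* Success path: only here is the hook known to be enabled. */
static void model_try_complete_transition(bool all_switched)
{
	if (!all_switched)
		return; /* some tasks still pending; retried later */
	model_cond_resched_disable();
	model_complete_transition();
}
```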
> More importantly, thinking out loud:
>
> We need to make sure that no task is in the middle of
> klp_cond_resched_disable() when we modify anything that is used there.
>
> We seem to be on the safe side in klp_complete_transition(). We are
> here only when all tasks have TIF_PATCH_PENDING cleared. In this case,
> __klp_sched_try_switch() just returns. Also, klp_complete_transition()
> calls klp_synchronize_transition() so that all tasks finish the critical
> part in __klp_sched_try_switch() before any new transition starts.
>
> But that is not the case in klp_reverse_transition(). It modifies
> klp_target_state while __klp_sched_try_switch() might be in the middle
> of klp_check_stack(), which might then give a wrong result.
>
> klp_reverse_transition() already solves a similar race with
> klp_update_patch_state() by clearing all TIF_PATCH_PENDING flags
> and calling klp_synchronize_transition(). We just need to do
> it earlier. Something like:
Yes! Thanks, I can always count on you to find the race conditions ;-)
This highlights the similarities between klp_update_patch_state(current)
and __klp_sched_try_switch(): they both access TIF_PATCH_PENDING
out-of-band.
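The ordering problem in klp_reverse_transition() can be sketched as a single-threaded, step-by-step model in which one task has already entered the try-switch path when the transition is reversed. Everything here is hypothetical (reader_begin(), reader_finish(), reverse_old(), reverse_new()); synchronize() stands in for klp_synchronize_transition() by simply letting the in-flight task finish, and the model ignores the memory barriers a real implementation would also need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; the call order in each function
 * stands in for interleaving on two CPUs. Not kernel code. */
enum { MODEL_UNPATCHED = 0, MODEL_PATCHED = 1 };

static int target_state;  /* models klp_target_state */
static bool task_pending; /* models TIF_PATCH_PENDING of one task */

/* One task part-way through the try-switch path. */
struct reader {
	bool in_flight;
	int target_at_start; /* target read when the stack check began */
	bool inconsistent;   /* target changed during the check */
};

static void reset_model(void)
{
	target_state = MODEL_PATCHED;
	task_pending = true;
}

static void reader_begin(struct reader *r)
{
	if (!task_pending)
		return; /* fast path: nothing to switch */
	r->in_flight = true;
	r->target_at_start = target_state;
}

static void reader_finish(struct reader *r)
{
	if (!r->in_flight)
		return;
	/* The stack check reads the target again before deciding. */
	if (target_state != r->target_at_start)
		r->inconsistent = true;
	r->in_flight = false;
}

/* Stand-in for klp_synchronize_transition(): let in-flight readers finish. */
static void synchronize(struct reader *r)
{
	reader_finish(r);
	assert(!r->in_flight);
}

/* Order criticized in the review: flip the target, then clear and sync. */
static void reverse_old(struct reader *r)
{
	target_state = !target_state;
	task_pending = false;
	synchronize(r);
	task_pending = true; /* restart the transition */
}

/* Suggested order: clear, synchronize, and only then flip the target. */
static void reverse_new(struct reader *r)
{
	task_pending = false;
	synchronize(r);
	target_state = !target_state;
	task_pending = true; /* restart the transition */
}
```

With the old ordering the in-flight task sees the target flip halfway through its stack check; clearing the pending flag and synchronizing before flipping the target lets it finish against a stable value.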
Also, I'll update the comment in klp_copy_process(). It should be safe
with respect to __klp_sched_try_switch() for the same reason it is safe
with klp_update_patch_state(current): they all only operate on 'current'.
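The klp_copy_process() argument can be illustrated with a tiny model in which the fork-time copy and the switch path both run as the parent task, so they cannot interleave on it. struct model_task and both functions are hypothetical stand-ins, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task livepatch state; not the kernel's struct. */
struct model_task {
	bool patch_pending; /* models TIF_PATCH_PENDING */
	int patch_state;    /* models the task's patch state */
};

/* Models the try-switch / update-patch-state paths: they only ever
 * modify the task that is running them ('current'). */
static void model_switch_current(struct model_task *current_task, int target)
{
	if (!current_task->patch_pending)
		return;
	current_task->patch_state = target;
	current_task->patch_pending = false;
}

/* Models the fork-time copy: it also runs as 'current' (the parent), so
 * it cannot interleave with model_switch_current() on the same task. */
static void model_copy_process(const struct model_task *current_task,
			       struct model_task *child)
{
	child->patch_pending = current_task->patch_pending;
	child->patch_state = current_task->patch_state;
	assert(child->patch_pending == current_task->patch_pending);
}
```

A child forked while the parent is still pending inherits the pending flag and the old state, and later switches on its own, independently of the parent.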
--
Josh