linux-kernel - Re: [PATCH] kgdb: Resolve races during kgdb_io_register/unregister

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAD=FV=UvBEW_XhsbgdSRYqPLywFDQg=nh0bX=UMtERBoxW9hmg@mail.gmail.com>
Date:   Tue, 30 Jun 2020 15:26:45 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Daniel Thompson <daniel.thompson@...aro.org>
Cc:     Jason Wessel <jason.wessel@...driver.com>,
        Petr Mladek <pmladek@...e.com>,
        kgdb-bugreport@...ts.sourceforge.net,
        LKML <linux-kernel@...r.kernel.org>,
        Patch Tracking <patches@...aro.org>
Subject: Re: [PATCH] kgdb: Resolve races during kgdb_io_register/unregister_module

Hi,

On Tue, Jun 30, 2020 at 8:05 AM Daniel Thompson
<daniel.thompson@...aro.org> wrote:
>
> On Mon, Jun 29, 2020 at 02:03:52PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Jun 29, 2020 at 10:15 AM Daniel Thompson
> > <daniel.thompson@...aro.org> wrote:
> > >
> > > Currently kgdb_register_callbacks() and kgdb_unregister_callbacks()
> > > are called outside the scope of the kgdb_registration_lock. This
> > > allows them to race with each other. This could do all sorts of crazy
> > > things up to and including dbg_io_ops becoming NULL partway through the
> > > execution of the kgdb trap handler (which isn't allowed and would be
> > > fatal).
> > >
> > > Fix this by bringing the trap handler setup and teardown into the scope
> > > of the registration lock.
> > >
> > > Signed-off-by: Daniel Thompson <daniel.thompson@...aro.org>
> > > ---
> > >  kernel/debug/debug_core.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > index 9e5934780f41..9799f2c6dc94 100644
> > > --- a/kernel/debug/debug_core.c
> > > +++ b/kernel/debug/debug_core.c
> > > @@ -1117,9 +1117,8 @@ int kgdb_register_io_module(struct kgdb_io *new_dbg_io_ops)
> > >
> > >         dbg_io_ops = new_dbg_io_ops;
> > >
> > > -       spin_unlock(&kgdb_registration_lock);
> > > -
> > >         if (old_dbg_io_ops) {
> > > +               spin_unlock(&kgdb_registration_lock);
> > >                 old_dbg_io_ops->deinit();
> > >                 return 0;
> > >         }
> > > @@ -1129,6 +1128,8 @@ int kgdb_register_io_module(struct kgdb_io *new_dbg_io_ops)
> > >         /* Arm KGDB now. */
> > >         kgdb_register_callbacks();
> > >
> > > +       spin_unlock(&kgdb_registration_lock);
> >
> > From looking at code paths, I think this is illegal, isn't it?  You're
> > now calling kgdb_register_callbacks() while holding a spinlock, but:
> >
> > kgdb_register_callbacks()
> > -> register_console()
> > --> console_lock()
> > ---> might_sleep()
> > ----> <boom!>
>
> Thanks.
>
> I very nearly didn't press "Send" yesterday because I was worried I was
> rushing it too much (in order to avoid forgetting it ;-) ). Should have
> listened to myself!
>
>
> > I'm a little curious about the exact race we're trying to solve.
> > Calling unregister on an IO module before register even finished seems
> > like an error on the caller, so I guess it would be calling register
> > from a 2nd thread for a different IO module while the first thread was
> > partway through unregistering?  Even that seems awfully sketchy since
> > you're risking registering a 2nd IO ops while the first is still there
> > and that's illegal enough that we do a pr_err() for it (though we
> > don't crash), but let's say we're trying to solve that one.
>
> I didn't follow all the possible paths. Utlimately the
> (un)register_callbacks() functions use a flag variable without a lock
> and that can interact in lots of different ways.
>
> To be honest none are especially likely because the normal case is to
> register once during boot and never unregister. However we can trigger
> register/unregister from userspace so I think they can happen
> in parallel.

This is for kgdboc or one of the other IO modules?  I do know that, at
least for kgdboc, we have the "config_mutex".  I won't promise that
there are no bugs there but in the very least it should mostly prevent
a host of these types of issues.  ...so I guess you'd have to in
parallel be spamming a register of a non kgdboc IO module together
with an unregister of kgdboc?


> Double unregister can lead to some especially nasty schedules...
> although they still remain pretty unlikely since we need the double
> unregister to coincide with a breakpoint:
>
>
> kgdb_unregister_callbacks()     kgdb_unregister_callbacks()
>   .                             .
>   test flag                     .
>   set flag to 0                 .
>   .                             test flag
>   .                             spin_lock()
> *** kgdb trap ***                 .
>   . paranoid dbg_io_ops check     .
>   .                             dbg_io_ops = NULL
>   . stop other CPUs
>   . try to use NULL dbg_io_ops
>
>
> I have drawn the kgdb trap in the first column because otherwise things
> get too wide but the trap could trigger on any CPU in the system and
> provoke the problem.
>
>
> >
> > Looking at it closely, I _think_ the only race in this case is if the
> > one we're trying to unregister had a deinit() function and we going to
> > replace it?  If it didn't have a deinit function:
> >
> > cpu1 (unregister)                 cpu2 (register):
> > -----------------                 ----------------------
> > kgdb_unregister_callbacks()
> >                                   spin_lock() <got>
> > spin_lock() <blocked>
> >                                   if (old_dbg_io_ops) <true>
> >                                     if (has dinit) <false>
> >                                       print error
> >                                       spin_unlock()
> >                                       return -EBUSY
> > <finish unregister>
> >
> > The above is fine and is the same thing that would happen if the
> > whole register function ran before the unregister even started, right?
> >
> > Also: if the unregister won the race that should also be fine.
> >
> > So really the problem is this:
> >
> > cpu1 (unregister)                 cpu2 (register):
> > -----------------                 ----------------------
> > kgdb_unregister_callbacks()
> >                                   spin_lock() <got>
> > spin_lock() <blocked>
> >                                   if (old_dbg_io_ops) <true>
> >                                     if (has dinit) <true>
> >                                       print Replacing
> >                                   init new IO ops
> >                                   spin_unlock()
> >                                   if (old_dbg_io_ops) <true>
> >                                     finish deinit of old
> >                                     return true
> > WARN_ON() <hits and shouts!>
> > dbg_io_ops = NULL
> > spin_unlock()
> > if (deinit) <true>
> >   double-call to deinit of old
> >
> > So in this case we'll hit a WARN_ON(), incorrectly unregister the new
> > IO ops, and call deinit twice.
>
> To be honest I was simply working on "it is racy" and "there's not a
> good reason to allow that", especially as we start to develop tools to
> bring races to the surfaces someone will yell at us about it sooner or
> later ;-).
>
> Of course, implementing it correctly would have been better...

Yeah, still wouldn't hurt to try to figure out how to make it cleaner.  :-)

-Doug