lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190829181536.GC3294@t560>
Date:   Thu, 29 Aug 2019 13:15:36 -0500
From:   Corey Minyard <minyard@....org>
To:     Jes Sorensen <jes.sorensen@...il.com>
Cc:     linux-kernel@...r.kernel.org,
        openipmi-developer@...ts.sourceforge.net, kernel-team@...com
Subject: Re: [PATCH 0/1] Fix race in ipmi timer cleanup

On Wed, Aug 28, 2019 at 08:53:47PM -0400, Jes Sorensen wrote:
> On 8/28/19 6:32 PM, Corey Minyard wrote:
> > On Wed, Aug 28, 2019 at 04:36:24PM -0400, Jes Sorensen wrote:
> >> From: Jes Sorensen <jsorensen@...com>
> >>
> >> I came across this in 4.16, but I believe the bug is still present
> >> in current 5.x, even if it is less likely to trigger.
> >>
> >> Basially stop_timer_and_thread() only calls del_timer_sync() if
> >> timer_running == true. However smi_mod_timer enables the timer before
> >> setting timer_running = true.
> > 
> > All the modifications/checks for timer_running should be done under
> > the si_lock.  It looks like a lock is missing in shutdown_smi(),
> > probably starting before setting interrupt_disabled to true and
> > after stop_timer_and_thread.  I think that is the right fix for
> > this problem.
> 
> Hi Corey,
> 
> I agree a spin lock could deal with this specific issue too, but calling
> del_timer_sync() is safe to call on an already disabled timer. The whole
> flagging of timer_running really doesn't make much sense in the first
> place either.
> 
> As for interrupt_disabled that is even worse. There's multiple places in
> the code where interrupt_disabled is checked, some of them are not
> protected by a spin lock, including shutdown_smi() where you have this
> sequence:
> 
>         while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
>                 poll(smi_info);
>                 schedule_timeout_uninterruptible(1);
>         }
>         if (smi_info->handlers)
>                 disable_si_irq(smi_info);
>         while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
>                 poll(smi_info);
>                 schedule_timeout_uninterruptible(1);
>         }

This one doesn't matter.  At this point the driver is single-threaded,
no interrupts, timeouts, or calls from the upper layer can happen.

> 
> In this case you'll have to drop and retake the long several times.
> 
> You also have this call sequence which leads to disable_si_irq() which
> checks interrupt_disabled:
> 
>   flush_messages()
>     smi_event_handler()
>       handle_transaction_done()
>         handle_flags()
>           alloc_msg_handle_irq()
>             disable_si_irq()

This one only happens in run-to-completion mode.  Which is strange,
but a number of people had issues with getting into a new kernel before
the watchdog timeout went off, so the run-to-completion mode runs at
panic time so the driver can run without scheduling so it can extend
the watchdog and store panic information in the IPMI log.

So you actually *don't* want a lock here, since the panic may have
occurred when the IPMI driver was holding the lock.

> 
> {disable,enable}_si_irq() themselves are racy:
> 
> static inline bool disable_si_irq(struct smi_info *smi_info)
> {
>         if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
>                 smi_info->interrupt_disabled = true;
> 
> Basically interrupt_disabled need to be atomic here to have any value,
> unless you ensure to have a spin lock around every access to it.

It needs to be atomic, yes, but I think just adding the spinlock like
I suggested will work.  You are right, the check for timer_running is
not necessary here, and I'm fine with removing it, but there are other
issues with interrupt_disabled (as you said) and with memory ordering
in the timer case.  So even if you remove the timer running check, the
lock is still required here.

It also might be a good idea to add a WARN_ON() to smi_mod_timer() and
alloc_msg_handle_irq() if the lock is not held, just to be sure.

Thanks,

-corey

> 
> Cheers,
> Jes

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ