linux-kernel - Re: [PATCH 0/1] Fix race in ipmi timer cleanup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <15517bfc-3022-509d-15ea-c2b8e7a91e0a@gmail.com>
Date:   Wed, 28 Aug 2019 20:53:47 -0400
From:   Jes Sorensen <jes.sorensen@...il.com>
To:     minyard@....org
Cc:     linux-kernel@...r.kernel.org,
        openipmi-developer@...ts.sourceforge.net, kernel-team@...com
Subject: Re: [PATCH 0/1] Fix race in ipmi timer cleanup

On 8/28/19 6:32 PM, Corey Minyard wrote:
> On Wed, Aug 28, 2019 at 04:36:24PM -0400, Jes Sorensen wrote:
>> From: Jes Sorensen <jsorensen@...com>
>>
>> I came across this in 4.16, but I believe the bug is still present
>> in current 5.x, even if it is less likely to trigger.
>>
>> Basially stop_timer_and_thread() only calls del_timer_sync() if
>> timer_running == true. However smi_mod_timer enables the timer before
>> setting timer_running = true.
> 
> All the modifications/checks for timer_running should be done under
> the si_lock.  It looks like a lock is missing in shutdown_smi(),
> probably starting before setting interrupt_disabled to true and
> after stop_timer_and_thread.  I think that is the right fix for
> this problem.

Hi Corey,

I agree a spin lock could deal with this specific issue too, but calling
del_timer_sync() is safe to call on an already disabled timer. The whole
flagging of timer_running really doesn't make much sense in the first
place either.

As for interrupt_disabled that is even worse. There's multiple places in
the code where interrupt_disabled is checked, some of them are not
protected by a spin lock, including shutdown_smi() where you have this
sequence:

        while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
                poll(smi_info);
                schedule_timeout_uninterruptible(1);
        }
        if (smi_info->handlers)
                disable_si_irq(smi_info);
        while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
                poll(smi_info);
                schedule_timeout_uninterruptible(1);
        }

In this case you'll have to drop and retake the long several times.

You also have this call sequence which leads to disable_si_irq() which
checks interrupt_disabled:

  flush_messages()
    smi_event_handler()
      handle_transaction_done()
        handle_flags()
          alloc_msg_handle_irq()
            disable_si_irq()

{disable,enable}_si_irq() themselves are racy:

static inline bool disable_si_irq(struct smi_info *smi_info)
{
        if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
                smi_info->interrupt_disabled = true;

Basically interrupt_disabled need to be atomic here to have any value,
unless you ensure to have a spin lock around every access to it.

Cheers,
Jes