lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221224023158.GA254443@nyquist.nev>
Date:   Sat, 24 Dec 2022 15:31:58 +1300
From:   Daniel Beer <dlbeer@...il.com>
To:     Jonathan Cameron <jic23@...nel.org>
Cc:     linux-iio@...r.kernel.org, Lars-Peter Clausen <lars@...afoo.de>,
        Michael Hennerich <Michael.Hennerich@...log.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ad_sigma_delta: fix race between IRQ and completion

On Fri, Dec 23, 2022 at 04:16:59PM +0000, Jonathan Cameron wrote:
> > ad_sigma_delta waits for a conversion which terminates with the firing
> > of a one-shot IRQ handler. In this handler, the interrupt is disabled
> > and a completion is set.
> > 
> > Meanwhile, the thread that initiated the conversion is waiting on the
> > completion to know when the conversion happened. If this wait times out,
> > the conversion is aborted and IRQs are disabled. But the IRQ may fire
> > anyway between the time the completion wait times out and the disabling
> > of interrupts. If this occurs, we get a double-disabled interrupt.
> 
> Ouch and good work tracking it down.  just to check, did you see this
> bug happen in the wild or spotted by code inspection?

Hi Jonathan,

Thanks for reviewing. It was by inspection -- I'd originally thought
about it and fixed in in a similar way in this patch:

    https://lore.kernel.org/all/61dd3e0c.1c69fb81.cea15.8d98@mx.google.com/

But since that's not applied, I thought I'd better put together a
separate fix for the time being.

> Given that timeout generally indicates hardware failure, I'm not sure
> how critical this is to fix.

Probably not very critical. I think you'd have to be pretty unlucky to
encounter it.

> Is this fix sufficient?  If the interrupt is being handled on a different
> CPU to the caller of this function, I think we can still race enough that
> this fails to fix it up.  Might need a spinlock to prevent that.
> 
>   CPU 0                                        CPU 1
> ad_sd_data_rdy_trig_poll()               ad_sd_wait_and_disable()
>                                        
>                                          //wait_for_completion ends
> 					
> Interrupt
>                                           disable_irq()
> 					  if (sigma-delta->irq_dis) !true	
> 					  else
> 						sigma_delta->irq_dis = true
> 
> disable_irq_nosync(irq)
> sigma_delta->irq_dis = true;
> 
> So we still end up with a doubly disabled irq.  Add a spinlock to make the
> disable and the setting of sigma_delta->irq_dis atomic then it should all be fine.                

My understanding is that the suffix-less version of disable_irq would
wait for all running handlers on other CPUs (i.e.
ad_sd_data_rdy_trig_poll) to finish before proceeding, which would
prevent this from happening. Is that not the case?

But now that you mention it, there is another small problem: in the case
where the conversion doesn't time out, the interrupt handler will call
complete() and then perform some operations on the struct
ad_sigma_delta.

This is always ok on a single processor, but if there are multiple CPUs
there is possibly a brief period where both the interrupt handler and
the waiting thread are accessing the ad_sigma_delta struct without
synchronization between them.

Not sure if that's really a problem in practice, but I think an easy way
to rule it out would just be to move the complete() call to the bottom
of the handler and make sure it doesn't touch the structure again after
that.

Cheers,
Daniel

-- 
Daniel Beer <dlbeer@...il.com> http://dlbeer.co.nz/
PGP: BA6E 0B26 1F89 246C E3F3  C910 1E58 C43A 160A 553B

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ