lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20221224023158.GA254443@nyquist.nev> Date: Sat, 24 Dec 2022 15:31:58 +1300 From: Daniel Beer <dlbeer@...il.com> To: Jonathan Cameron <jic23@...nel.org> Cc: linux-iio@...r.kernel.org, Lars-Peter Clausen <lars@...afoo.de>, Michael Hennerich <Michael.Hennerich@...log.com>, linux-kernel@...r.kernel.org Subject: Re: [PATCH] ad_sigma_delta: fix race between IRQ and completion On Fri, Dec 23, 2022 at 04:16:59PM +0000, Jonathan Cameron wrote: > > ad_sigma_delta waits for a conversion which terminates with the firing > > of a one-shot IRQ handler. In this handler, the interrupt is disabled > > and a completion is set. > > > > Meanwhile, the thread that initiated the conversion is waiting on the > > completion to know when the conversion happened. If this wait times out, > > the conversion is aborted and IRQs are disabled. But the IRQ may fire > > anyway between the time the completion wait times out and the disabling > > of interrupts. If this occurs, we get a double-disabled interrupt. > > Ouch and good work tracking it down. just to check, did you see this > bug happen in the wild or spotted by code inspection? Hi Jonathan, Thanks for reviewing. It was by inspection -- I'd originally thought about it and fixed in in a similar way in this patch: https://lore.kernel.org/all/61dd3e0c.1c69fb81.cea15.8d98@mx.google.com/ But since that's not applied, I thought I'd better put together a separate fix for the time being. > Given that timeout generally indicates hardware failure, I'm not sure > how critical this is to fix. Probably not very critical. I think you'd have to be pretty unlucky to encounter it. > Is this fix sufficient? If the interrupt is being handled on a different > CPU to the caller of this function, I think we can still race enough that > this fails to fix it up. Might need a spinlock to prevent that. > > CPU 0 CPU 1 > ad_sd_data_rdy_trig_poll() ad_sd_wait_and_disable() > > //wait_for_completion ends > > Interrupt > disable_irq() > if (sigma-delta->irq_dis) !true > else > sigma_delta->irq_dis = true > > disable_irq_nosync(irq) > sigma_delta->irq_dis = true; > > So we still end up with a doubly disabled irq. Add a spinlock to make the > disable and the setting of sigma_delta->irq_dis atomic then it should all be fine. My understanding is that the suffix-less version of disable_irq would wait for all running handlers on other CPUs (i.e. ad_sd_data_rdy_trig_poll) to finish before proceeding, which would prevent this from happening. Is that not the case? But now that you mention it, there is another small problem: in the case where the conversion doesn't time out, the interrupt handler will call complete() and then perform some operations on the struct ad_sigma_delta. This is always ok on a single processor, but if there are multiple CPUs there is possibly a brief period where both the interrupt handler and the waiting thread are accessing the ad_sigma_delta struct without synchronization between them. Not sure if that's really a problem in practice, but I think an easy way to rule it out would just be to move the complete() call to the bottom of the handler and make sure it doesn't touch the structure again after that. Cheers, Daniel -- Daniel Beer <dlbeer@...il.com> http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B
Powered by blists - more mailing lists