Message-ID: <20160826094800.GD13554@arm.com>
Date: Fri, 26 Aug 2016 10:48:00 +0100
From: Will Deacon <will.deacon@....com>
To: Mark Rutland <mark.rutland@....com>
Cc: Yabin Cui <yabinc@...gle.com>, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH] arm/perf: Fix pmu percpu irq handling at hotplug.

Mark,

On Fri, Aug 19, 2016 at 03:25:14PM +0100, Mark Rutland wrote:
> On Thu, Aug 18, 2016 at 01:24:38PM -0700, Yabin Cui wrote:
> > If the cpu pmu is using a percpu irq:
> >
> > 1. When a cpu is down, we should disable the pmu irq
> > on that cpu. Otherwise, if the cpu is still down when
> > the last perf event is released, the pmu irq can't
> > be freed, because it is still enabled on the
> > offlined cpu, and subsequent perf_event_open()
> > syscalls will fail.
> >
> > 2. When a cpu is up, we should enable the pmu irq on
> > that cpu. Otherwise, profiling tools can't sample
> > events on that cpu until all perf events are
> > released, because the pmu irq stays disabled there.
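
[For reference, a minimal sketch of the per-CPU IRQ handling the two points
above describe, using the generic enable_percpu_irq()/disable_percpu_irq()
helpers. The names (pmu_percpu_irq, arm_pmu_cpu_starting/dying) are
hypothetical, not the actual arm_pmu code.]

#include <linux/interrupt.h>
#include <linux/irq.h>

static int pmu_percpu_irq;      /* hypothetical: the PPI shared by all CPUs */

/* Run on the incoming CPU: unmask its copy of the per-CPU PMU IRQ. */
static int arm_pmu_cpu_starting(unsigned int cpu)
{
        enable_percpu_irq(pmu_percpu_irq, IRQ_TYPE_NONE);
        return 0;
}

/*
 * Run on the outgoing CPU: mask its copy of the per-CPU PMU IRQ, so that a
 * later free_percpu_irq() doesn't find it still enabled on an offline CPU
 * and subsequent perf_event_open() calls keep working.
 */
static int arm_pmu_cpu_dying(unsigned int cpu)
{
        disable_percpu_irq(pmu_percpu_irq);
        return 0;
}

[These callbacks would have to be wired to the CPU hotplug machinery so
they run on the CPU as it comes up or goes down; how that is done is shown
in the later sketch.]
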
>
> It also looks like if a CPU is taken down while events are active, a
> non-percpu interrupt will get migrated to another CPU, yet we don't
> retarget it if/when the CPU is brought back online. So we have at least
> three bugs with IRQ manipulation around hotplug.
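
[One way to picture the missing retarget: on hotplug-in, point the
interrupt back at its CPU. A minimal illustration; arm_pmu_retarget_irq is
a made-up helper, and tracking which irq belongs to which CPU is left out.]

#include <linux/cpumask.h>
#include <linux/interrupt.h>
#include <linux/printk.h>

/* Point a (hypothetical) per-CPU SPI back at its CPU on hotplug-in. */
static void arm_pmu_retarget_irq(unsigned int cpu, int irq)
{
        if (irq_set_affinity(irq, cpumask_of(cpu)))
                pr_warn("unable to retarget PMU IRQ %d to CPU %u\n", irq, cpu);
}
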
>
> Rather than adding more moving parts to the IRQ manipulation logic, I'd
> rather we rework it to:
>
> * At probe time, request all the interrupts. If we can't, bail out and
> fail the probe.
>
> * Upon hotplug in (and at probe time), configure the affinity and
> enable the relevant interrupt(s).
>
> * Upon hotplug out, disable the relevant interrupt.
>
> That way we have fewer moving parts that need to interact with each
> other (e.g. we don't need to inhibit hotplug in places), and we know
> early whether things will or will not work.
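
[A rough sketch of that flow might look like the following. The names
(arm_pmu_handle_irq, arm_pmu_irqs[]), the one-SPI-per-CPU assumption, and
the dynamic cpuhp state are illustrative guesses, not the eventual
implementation.]

#include <linux/cpuhotplug.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/platform_device.h>

static int arm_pmu_irqs[NR_CPUS];       /* hypothetical: one SPI per CPU */

static irqreturn_t arm_pmu_handle_irq(int irq, void *dev)
{
        /* placeholder: the real handler reads the PMU overflow status */
        return IRQ_HANDLED;
}

/* Hotplug in (and probe time): retarget and unmask the CPU's interrupt. */
static int arm_pmu_cpu_online(unsigned int cpu)
{
        irq_set_affinity(arm_pmu_irqs[cpu], cpumask_of(cpu));
        enable_irq(arm_pmu_irqs[cpu]);
        return 0;
}

/* Hotplug out: mask the interrupt while its CPU is down. */
static int arm_pmu_cpu_offline(unsigned int cpu)
{
        disable_irq_nosync(arm_pmu_irqs[cpu]);
        return 0;
}

/* Probe time: request everything up front, or fail the probe. */
static int arm_pmu_request_irqs(struct platform_device *pdev, void *dev_id)
{
        int i, irq, err, nr = platform_irq_count(pdev);

        for (i = 0; i < nr && i < NR_CPUS; i++) {
                irq = platform_get_irq(pdev, i);
                if (irq < 0)
                        return irq;

                /* keep it masked until its CPU is brought online */
                irq_set_status_flags(irq, IRQ_NOAUTOEN);
                err = request_irq(irq, arm_pmu_handle_irq,
                                  IRQF_NOBALANCING | IRQF_NO_THREAD,
                                  "arm-pmu", dev_id);
                if (err)
                        return err;

                arm_pmu_irqs[i] = irq;  /* assumes IRQ i targets CPU i */
        }

        /*
         * cpuhp_setup_state() also invokes the online callback for CPUs
         * that are already up, covering the probe-time enable. A dynamic
         * state is used here purely for illustration.
         */
        err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm/pmu:online",
                                arm_pmu_cpu_online, arm_pmu_cpu_offline);
        return err < 0 ? err : 0;
}

[With that split, the only per-hotplug work is enable/disable plus an
affinity update, and a broken IRQ configuration is caught at probe time
rather than at the first perf_event_open().]
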
>
> The {reserve,release}_hardware dance is largely a legacy thing that was
> there to cater for sharing the PMU with other subsystems, and we should
> be able to get rid of it.
>
> I'm taking a look at doing the above, but I don't yet have a patch.

Any update on this? I'd quite like to do *something* to fix the issues
reported here.

Will