Date:   Mon, 4 Oct 2021 09:36:37 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Mark Brown <broonie@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>
Cc:     Lino Sanfilippo <LinoSanfilippo@....de>, f.fainelli@...il.com,
        rjui@...adcom.com, sbranden@...adcom.com,
        bcm-kernel-feedback-list@...adcom.com, nsaenz@...nel.org,
        linux-spi@...r.kernel.org, linux-rpi-kernel@...ts.infradead.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        p.rosenberger@...bus.com, linux-integrity@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH] spi: bcm2835: do not unregister controller in shutdown
 handler

On 10/4/21 9:31 AM, Mark Brown wrote:
> On Mon, Oct 04, 2021 at 12:44:36PM -0300, Jason Gunthorpe wrote:
>> On Mon, Oct 04, 2021 at 03:12:20PM +0100, Mark Brown wrote:
>>> On Mon, Oct 04, 2021 at 10:17:56AM -0300, Jason Gunthorpe wrote:
> 
>>>> When something like kexec happens we need the machine to be in a state
>>>> where random DMA's are not corrupting memory.
> 
>>> That's all well and good but there's no point in implementing something
>>> half baked that's opening up a whole bunch of opportunities to crash the
>>> system if more work comes in after it's half broken the device setup.  
> 
>> Well, that is up to the driver implementing this. It looks like device
>> shutdown is called before the userspace is all nuked so yes,
>> concurrency with userspace is a possible concern here.
> 
> It's not just userspace that can initiate things - interrupts are also
> an issue, someone could press a button or whatever.  Frankly for SPI the
> quiescing part doesn't seem like logic that should be implemented in
> drivers, it's a subsystem level thing since there's nothing driver
> specific about it.

Surely the SPI subsystem can help by refusing to queue new transfers
towards the SPI controller, while the controller driver shuts down the
resources that only it knows about.
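
A rough sketch of that split, written as a single-threaded userspace
model rather than kernel code. Every name here (toy_ctlr,
toy_queue_transfer, toy_driver_hw_off, toy_shutdown) is hypothetical and
is not an existing SPI core or bcm2835 API; a real implementation would
also need locking against concurrent submitters and interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a controller; not a kernel structure. */
struct toy_ctlr {
	bool quiesced;     /* subsystem-owned: refuse new transfers when set */
	int  queued;       /* number of transfers accepted so far */
	bool clk_enabled;  /* driver-owned: resources only the driver knows */
	bool dma_enabled;
};

/* Subsystem side: every new transfer is gated on the quiesced flag. */
static int toy_queue_transfer(struct toy_ctlr *c)
{
	assert(c != NULL);
	if (c->quiesced)
		return -1;  /* rejected, the controller is going down */
	c->queued++;
	return 0;
}

/* Driver side: turn off the hardware bits the subsystem cannot see. */
static void toy_driver_hw_off(struct toy_ctlr *c)
{
	assert(c != NULL);
	c->dma_enabled = false;
	c->clk_enabled = false;
}

/* Shutdown path: stop accepting work first, then power things down. */
static void toy_shutdown(struct toy_ctlr *c)
{
	assert(c != NULL);
	c->quiesced = true;
	toy_driver_hw_off(c);
}
```

The ordering in toy_shutdown() is the point: the generic gate closes
before the driver touches clocks or DMA, so nothing new arrives at a
half torn down controller.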

> 
>>>> Due to the emergency sort of nature it is not appropriate to do
>>>> locking complicated sorts of things like struct device unregistrations
>>>> here.
> 
>>> That's just not what's actually implemented in a bunch of places, nor
>>> something one would infer from the documentation ("Called at shut-down
>>> to quiesce the device", no mention of emergency cases which I'd guess
>>> would just be kdump) - 
> 
>> Drivers misunderstanding stuff is not new..
> 
> Not just drivers, entire subsystems.  And like I say given the
> documentation I'd be hard pressed to say that it's a misunderstanding.
> 
>>> that's a different thing and definitely abusing the API.  I would guess
>>> that a good proportion of people implementing it are more worried about
>>> clean system shutdown than they are about kdump.
> 
>> The other important case is to get the device cleaned up enough to
>> pass back to firmware for platforms that use a firmware
>> shutdown/reboot path.
> 
> Right, so the other cases I'm aware of are doing pretty much that -
> bringing things down to a state where the system can reboot cleanly.
> That can definitely include things like blocking for some hardware, and
> you're going to need some concurrency handling which means a combination
> of locking and infrequently tested lockless code paths.
> 
> In the case of this specific driver I'm still not clear that the best
> thing isn't just to delete the shutdown callback and let any ongoing
> transfers complete, though I guess there'd be issues in kexec cases with
> long enough transfers.

No, please don't. I should arguably have explained the reasons better,
but the main one is that one of the platforms on which this driver is
used has received extensive power management analysis and changes, and
shutting down every bit of hardware, including something as small as a
SPI controller, its clock and its PLL, helped meet stringent power
targets.

TBH, I still wonder why we have .shutdown() at all and don't simply use
.remove(), which would reduce the amount of work people have to do to
validate that the hardware is put in a low power state, and would also
reduce the burden on the various subsystems.
-- 
Florian
