lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1354221519.7776.10.camel@misato.fc.hp.com>
Date:	Thu, 29 Nov 2012 13:38:39 -0700
From:	Toshi Kani <toshi.kani@...com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Vasilis Liaskovitis <vasilis.liaskovitis@...fitbricks.com>,
	linux-acpi@...r.kernel.org, Wen Congyang <wency@...fujitsu.com>,
	Wen Congyang <wencongyang@...il.com>,
	isimatu.yasuaki@...fujitsu.com, lenb@...nel.org,
	gregkh@...uxfoundation.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on
 rebind scenario

On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > > > 
> > > [...]
> > > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > > friends and I think there's a way to address all of these problems
> > > > without big redesign (for now).
> > > > 
> > > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > > struct acpi_device) called eject_forbidden or something like this such that:
> > > > 
> > > > (1) It will be clear by default.
> > > > (2) It may only be set by a driver's .add() routine if necessary.
> > > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > >     it's safe to physically remove the device after the .remove().
> > > > 
> > > > Then, after the .remove() (which must be successful) has returned, and the
> > > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > > > should be unregistered anyway if that error code is to be returned.
> > > > 
> > > > [By the way, do you know where we free the memory allocated for struct
> > > >  acpi_device objects?]
> > > > 
> > > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > > store it, but continue the trimming normally and finally it should return that
> > > > error code to acpi_bus_hot_remove_device().
> > > 
> > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > fails). Trimming is not continued. 
> > > 
> > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > more a general question, not specific to the eject_forbidden suggestion)
> > 
> > Your change makes sense to me.  At least until we have rollback code in
> > place, we need to fail as soon as we hit an error.
> 
> Are you sure this makes sense?  What happens to the devices that we have
> trimmed already and then there's an error?  Looks like they are just unusable
> going forward, aren't they?

Yes, the devices trimmed already are released from the kernel, and their
memory ranges become unusable.  This is bad.  But I do not think we
should trim further to make more devices unusable after an error. 


> > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > we attempted to eject) and notify the firmware about the failure.
> > > 
> > > sounds like this rollback needs to be implemented in any solution we choose
> > > to implement, correct?
> > 
> > Yes, rollback is necessary.  But I do not think we need to include it
> > into your patch, though.
> 
> As the first step, we should just trim everything and then return an error
> code in my opinion.

But we cannot trim devices with kernel memory.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ