lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2995970.hSi3JdcGZF@vostro.rjw.lan>
Date:	Tue, 13 Aug 2013 02:45:55 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Toshi Kani <toshi.kani@...com>
Cc:	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
	rafael.j.wysocki@...el.com, vasilis.liaskovitis@...fitbricks.com,
	linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
	tangchen@...fujitsu.com, wency@...fujitsu.com
Subject: Re: Cannot hot remove a memory device

On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
> On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
> > On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> > > On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > > > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > > >  :
> > > > > > > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > > > > > > handle is already set\n").  When two ACPI memory objects associate with
> > > > > > > > a same memory block, the bind procedure of the 2nd ACPI memory object
> > > > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
> > > > > > > 
> > > > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > > > 
> > > > > > > There's no way for a single physical device to have two different ACPI
> > > > > > > "companions".  It looks like the memory blocks should be 64 M each in that
> > > > > > > case.  Or we need to create two child devices for each memory block and
> > > > > > > associate each of them with an ACPI object.  That would lead to complications
> > > > > > > in the user space interface, though.
> > > > > > 
> > > > > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > > > > __remove_pages() can add / delete a memory chunk that is less than
> > > > > > 128MB.  128MB is the granularity of them.  So, we may just have to fail
> > > > > > this case gracefully.
> > > > > 
> > > > > FYI: I have submitted the patch blow to close this part of the issue...
> > > > > 
> > > > > https://lkml.org/lkml/2013/8/8/396
> > > > 
> > > > That looks good to me, but we'd still need to make it possible to have
> > > > memory blocks smaller than 128 MB ...
> > > 
> > > Do you mean acpi_bind_one() needs to be able to handle such case?  If
> > > so, it should not be a problem since a memory block device won't be
> > > created when add_memory() fails with the change above.  So, there is no
> > > binding to be done.  If you mean add_memory() needs to be able to handle
> > > a smaller range, that's quite a tough one unless we make the section
> > > size smaller.
> > > 
> > > BTW, when add_memory() fails, the memory hot-add request still succeeds
> > > with no driver attached.  This seems logical, but the added device is
> > > useless when no handler is attached.  And it does not allow ejecting the
> > > device with no handler.  I am not too worry about this since this is a
> > > rare case, but it reminded me that the framework won't handle rollback.
> > 
> > I'm not sure which rollback you mean.  During removal?
> 
> I meant rollback during hot-add.  Ideally, a device should be either
> added in usable state (success) or failed back to the original state
> (rollback).  Added in un-usable state is not really a success for users,
> and creates an odd state to deal with.  But it is still a LOT better
> than crashing the system.  So, I think this outcome is reasonable on
> this framework because adding rollback at this point will complicate the
> things unnecessarily.
> 
> > There are two slight problems here in my view.  First, even if the device
> > cannot be ejected directly, it still will be removed when its parent is
> > ejected, so it may be more consistent to just allow everything to be ejected
> > regardless of whether or not it has a scan handler.  
> 
> Agreed.
> 
> > Second, I guess the
> > removal is undesirable for memory devices for which the registration of the
> > scan handler failed, so it would be good to fail the "offline" of such devices
> > regardless of how we get there.  That's why I thought it would be good to have
> > an "offline disabled" flag in struct acpi_device.
> 
> I see.  But when attach() failed, the memory device may not be used by
> the kernel.  So, I think it should be safe to remove it.

The failure of .attach() need not mean that the memory is not used by the
kernel, though, and the ACPI device object is still there and still may
be involved in some removal scenarios (through its parent).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ