lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 29 Oct 2014 19:03:20 -0700
From:	Neil Zhang <zhangwm@...vell.com>
To:	Dan Streetman <ddstreet@...e.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>
CC:	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"nfont@...ux.vnet.ibm.com" <nfont@...ux.vnet.ibm.com>
Subject: RE: [PATCH V2] Driver cpu: update online when cpu_up/down besides
 sysfs

Dan,

> -----Original Message-----
> From: ddstreet@...il.com [mailto:ddstreet@...il.com] On Behalf Of Dan
> Streetman
> Sent: 2014年10月30日 5:52
> To: Rafael J. Wysocki
> Cc: Rafael J. Wysocki; Neil Zhang; linux-kernel; Greg Kroah-Hartman;
> nfont@...ux.vnet.ibm.com
> Subject: Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides
> sysfs
> 
> On Wed, Oct 29, 2014 at 5:46 PM, Rafael J. Wysocki <rjw@...ysocki.net> wrote:
> > On Monday, October 27, 2014 08:46:08 PM Dan Streetman wrote:
> >> On Mon, Oct 27, 2014 at 5:38 PM, Rafael J. Wysocki
> >> <rafael.j.wysocki@...el.com> wrote:
> >> > On 10/27/2014 3:59 AM, Neil Zhang wrote:
> >> >>
> >> >> The current per-cpu offline info won't be updated when we use any
> >> >> other method besides sysfs to call cpu_up/down.
> >> >> Thus the cpu/online can't reflect the real online status.
> >> >>
> >> >> This patch is going to fix the issue introduced by commit
> >> >> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
> >> >> Use generic offline/online for CPU offline/online)
> >> >>
> >> >> CC: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >> >> Tested-by: Dan Streetman <ddstreet@...e.org>
> >> >> Signed-off-by: Neil Zhang <zhangwm@...vell.com>
> >> >
> >> >
> >> > Oh dear, no.
> >> >
> >> > Please first tell me what exactly the problem you're seeing is.
> >>
> >> For some background, here is my last comment on the first email thread on
> this:
> >> https://lkml.org/lkml/2014/10/27/595
> >>
> >> I didn't create this patch, but the problem essentially is that
> >> before your commit the individual cpu online nodes
> >> (/sys/devices/system/cpu/cpuN/online) stayed in sync during
> >> cpu_down/up, because they used the cpu_online_mask; while after the
> >> commit, they are tracked by the cpu's generic dev->offline flag,
> >> which isn't updated during cpu_down/up.
> >
> > Which is not triggered from sysfs.
> >
> >> So now, any place in the kernel
> >> that brings a cpu up or down must also update the cpu->dev->offline
> >> flag.
> >
> > Not any place.  In particular, system suspend-resume doesn't need to
> > do that, because it takes CPUs offline and then brings them back
> > online.
> >
> > If there's a place in the kernel where CPUs are taken offline and left
> > in that state, then it needs to be updated.
> 
> The only place I know of is ppc's dlpar code, as mentioned below.
> 
> Neil, as you crafted the original patch, I assume you know of some other place
> in the kernel doing cpu_up/down directly, where you're seeing this problem?
> 

As I replied to Rafael many ARM SoCs will use an in kernel profiler to romove/add 
a core via cpu_up /down for power and performance consideration.
But actually we can fix these issues as Rafael suggested.
Thanks for the discussion in these days.

> >
> >> My interest in the patch was coincidental because I was seeing the
> >> same problem when using dlpar operations to hotplug cpus, which uses
> >> the arch/powerpc/platform/pseries/dlpar.c code; that code brings a
> >> cpu offline when it's hot-removed (and the cpu online when it's
> >> hot-added), but it hasn't been changed to also update the cpu's
> >> dev->offline flag.
> >
> > It should be modified to do that.
> >
> >> As I said in the previous email to the first thread, the ppc dlpar
> >> operation might be changed in the future to fully unregister a cpu
> >> when it's hot-removed, which would remove the entire sysfs cpuN
> >> directory.  Alternately and/or until then, it could be updated to
> >> simply update the cpu'd dev->offline flag (that's what I originally
> >> did for my own testing).  However, without a central place to update
> >> the cpu's dev->offline field, like this, or possibly in
> >> set_cpu_online(), or elsewhere during cpu_down/up, each place in the
> >> kernel that calls cpu_down() or cpu_up() also needs to update the
> >> dev->offline flag.  It's possible that the ppc dlpar code is the only
> >> place in the kernel that has this problem; I haven't searched.
> >
> > It is quite likely to be the only place like that.
> >
> > While I'm not familiar with the code in question, the most
> > straightforward way to fix the problem would be to replace cpu_down()
> > in there with device_offline(get_cpu_device(cpu)), but that needs to
> > be called under device_hotplug_lock.
> 
> Ok, will do.
> 
> >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.

Best Regards,
Neil Zhang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ