lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110325174831.GB19214@dirshya.in.ibm.com>
Date:	Fri, 25 Mar 2011 23:18:31 +0530
From:	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
To:	Len Brown <lenb@...nel.org>
Cc:	Trinabh Gupta <trinabh@...ux.vnet.ibm.com>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	peterz@...radead.org, suresh.b.siddha@...el.com,
	benh@...nel.crashing.org, venki@...gle.com,
	Andi Kleen <ak@...ux.intel.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH V1 1/2] cpuidle: Data structure changes for global
 cpuidle device

* Len Brown <lenb@...nel.org> [2011-03-25 04:12:03]:

> I agree it is silly to allocate a cpuidle_device
> for every cpu in the system as we do today.
> 
> Yes, splitting the counters out of cpuidle_device
> is a necessary part of fixing that.
> 
> However, cpuidle_device.cpuidle_state[] is currently not per-driver,
> it is per-cpu, and it is writable.
> 
> In particular, the cpuidle_device->prepare() mechanism
> causes updates to the cpuidle_state[].flags,
> setting and clearing CPUIDLE_FLAG_IGNORE to
> tell the governor not to chose a state
> on a per-cpu basis at run-time.
> 
> I don't like that mechanism.
> I'd like to see it replaced, and when replaced,
> cpuidle_state[] can be per system-wide driver.

Thanks for the detailed review.  I agree that we should rework
handling of the cpuidle_state[].flags.  However, is the prepare()
mechanism used at all?  Can we remove the option completely?

> I think the real problem that prepare() was trying to solve
> is that the driver today does not have the ability to over-rule
> the choice made by the governor.  The driver may discover
> in the course of trying to satisfy the request of the governor
> that it needs to demote to a shallower state; or it may
> do its best to satisfy the governor's request, and the hardware
> may demote its request to a shallower state.
> 
> Unfortunately, when this happens, the driver dutifully
> returns the time spent in the state to cpuidle_idle_call(),
> who then updates the wrong last_residency, time, and usage counters.

I did not get this scenario.  Are you saying 

target_state->enter(dev, target_state) can enter a different state
than the one suggested by target_state?  

I understand the hardware demotion part, but can we really detect the
target 'demoted' state in that case?  I guess not.

> Sure is ironic for the driver to allocate the data structures and
> then hand the timer to the uppper layer, just to have the upper layer
> update the wrong data structures...
> 
> Surely the driver enter routine should update the counters
> that the driver was obligated to allocate, and it should return
> the state actually entered (for tracing), rather than the time spent
> there.

Can we do something like this:

last_state = target_state->enter(dev, target_state)

dev->last_state and dev->last_residency are updated inside
target_state->enter() 

The returned last_state is just for tracing, actual data is already
updated in the cpuidle_dev structure and used for sysfs display.

> The generic cpuidle code should simply handle where the counters live
> in the sysfs namespace, not updating the counters.
> This needs to be addressed before cpuidle_device.cpuidle_state[]
> can be made one/system.

Agreed.

Thanks again for the recommendations.

--Vaidy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ