linux-kernel - Re: [PATCH v3 22/22] thermal/intel_powerclamp: Convert the kthread to kthread worker API

Message-ID: <20160107115531.34279a9b@icelake>
Date:	Thu, 7 Jan 2016 11:55:31 -0800
From:	Jacob Pan <jacob.jun.pan@...ux.intel.com>
To:	Petr Mladek <pmladek@...e.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>, Tejun Heo <tj@...nel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Josh Triplett <josh@...htriplett.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jiri Kosina <jkosina@...e.cz>, Borislav Petkov <bp@...e.de>,
	Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
	Vlastimil Babka <vbabka@...e.cz>, linux-api@...r.kernel.org,
	linux-kernel@...r.kernel.org, Zhang Rui <rui.zhang@...el.com>,
	Eduardo Valentin <edubezval@...il.com>,
	linux-pm@...r.kernel.org, jacob.jun.pan@...ux.intel.com
Subject: Re: [PATCH v3 22/22] thermal/intel_powerclamp: Convert the kthread
 to kthread worker API

On Wed, 18 Nov 2015 14:25:27 +0100
Petr Mladek <pmladek@...e.com> wrote:

> From: Petr Mladek <pmladek@...e.com>
> To: Andrew Morton <akpm@...ux-foundation.org>,
>     Oleg Nesterov <oleg@...hat.com>, Tejun Heo <tj@...nel.org>,
>     Ingo Molnar <mingo@...hat.com>,
>     Peter Zijlstra <peterz@...radead.org>
> Cc: Steven Rostedt <rostedt@...dmis.org>,
>     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
>     Josh Triplett <josh@...htriplett.org>,
>     Thomas Gleixner <tglx@...utronix.de>,
>     Linus Torvalds <torvalds@...ux-foundation.org>,
>     Jiri Kosina <jkosina@...e.cz>, Borislav Petkov <bp@...e.de>,
>     Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
>     Vlastimil Babka <vbabka@...e.cz>, linux-api@...r.kernel.org,
>     linux-kernel@...r.kernel.org, Petr Mladek <pmladek@...e.com>,
>     Zhang Rui <rui.zhang@...el.com>,
>     Eduardo Valentin <edubezval@...il.com>,
>     Jacob Pan <jacob.jun.pan@...ux.intel.com>, linux-pm@...r.kernel.org
> Subject: [PATCH v3 22/22] thermal/intel_powerclamp: Convert the kthread
>     to kthread worker API
> Date: Wed, 18 Nov 2015 14:25:27 +0100
> X-Mailer: git-send-email 1.8.5.6
> 
> Kthreads are currently implemented as an infinite loop. Each
> has its own variant of checks for terminating, freezing, and
> awakening. In many cases it is unclear which state the kthread
> is in, and sometimes the checks are done the wrong way.
> 
> The plan is to convert kthreads to the kthread_worker or workqueue
> API. This allows splitting the functionality into separate operations
> and gives the code a better structure. It also defines a clean state
> in which no locks are taken, no IRQs are blocked, and the kthread may
> sleep or even be safely migrated.
> 
> The kthread worker API is useful when we want to have a dedicated
> single thread for the work. It helps to make sure that the thread is
> available when needed. It also allows better control, e.g.
> defining a scheduling priority.
> 
> This patch converts the intel powerclamp kthreads to the kthread
> worker API because they need good control over the assigned
> CPUs.
> 
I have tested this patchset and found no obvious issues in terms of
functionality, power, or performance. I tested CPU online/offline,
suspend/resume, freeze, etc.
Power numbers are comparable too, e.g. on an IVB 8C system: inject idle
from 5% to 50% and read package power while running a CPU-bound workload.

Before:
IdlePct    Perf      RAPL     WallPower
5          256.28    16.50    0.0
10         248.86    15.64    0.0
15         209.01    14.57    0.0
20         176.17    13.88    0.0
25         161.25    13.37    0.0
30         165.62    13.38    0.0
35         150.94    12.89    0.0
40         137.45    12.47    0.0
45         123.80    11.83    0.0
50         137.59    11.80    0.0

After:

(deb_chroot)root@...ntu-jp-nfs:~/powercap-power# ./test.py -c 5
IdlePct    Perf      RAPL     WallPower
5          266.30    16.34    0.0
10         226.32    15.27    0.0
15         195.52    14.29    0.0
20         200.96    13.98    0.0
25         174.77    13.08    0.0
30         162.05    13.04    0.0
35         166.70    12.90    0.0
40         134.78    12.12    0.0
45         128.08    11.70    0.0
50         117.74    11.74    0.0
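
To make "comparable" concrete, the two tables can be compared row by row.
The sketch below is not part of test.py (whose contents are not shown
here); it only copies the Perf and RAPL columns from the tables and
computes the relative change per idle percentage. The function names are
made up for illustration.

```python
# Rows copied from the Before/After tables: (idle %, Perf, RAPL).
BEFORE = [
    (5, 256.28, 16.50), (10, 248.86, 15.64), (15, 209.01, 14.57),
    (20, 176.17, 13.88), (25, 161.25, 13.37), (30, 165.62, 13.38),
    (35, 150.94, 12.89), (40, 137.45, 12.47), (45, 123.80, 11.83),
    (50, 137.59, 11.80),
]
AFTER = [
    (5, 266.30, 16.34), (10, 226.32, 15.27), (15, 195.52, 14.29),
    (20, 200.96, 13.98), (25, 174.77, 13.08), (30, 162.05, 13.04),
    (35, 166.70, 12.90), (40, 134.78, 12.12), (45, 128.08, 11.70),
    (50, 117.74, 11.74),
]


def rel_change_pct(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def compare(before, after):
    """Return (idle %, Perf change %, RAPL change %) for each table row."""
    rows = []
    for (idle_b, perf_b, rapl_b), (idle_a, perf_a, rapl_a) in zip(before, after):
        if idle_b != idle_a:
            raise ValueError("tables are not aligned on IdlePct")
        rows.append((idle_b,
                     rel_change_pct(perf_b, perf_a),
                     rel_change_pct(rapl_b, rapl_a)))
    return rows


if __name__ == "__main__":
    for idle, d_perf, d_rapl in compare(BEFORE, AFTER):
        print(f"{idle:>3}%  Perf {d_perf:+6.1f}%  RAPL {d_rapl:+5.1f}%")
```

On these numbers the RAPL readings differ by less than 3% at every idle
setting, while the Perf column is noticeably noisier from run to run.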

> IMHO, the most natural way is to split one cycle into two works.
> The first one does some balancing and lets the CPU work the normal
> way for some time. The second work checks what the CPU has done
> in the meantime and puts it into a C-state to reach the required
> idle time ratio. The delay between the two works is achieved
> by the delayed kthread work.
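
The cycle described in the quoted paragraph can be modelled in a few lines
of userspace Python. This is only a toy model under stated assumptions, not
the patch's code and not the kernel API: ToyWorker, balance_work,
idle_inject_work, and the RUN_MS/IDLE_MS values are all hypothetical, and
time is simulated rather than measured.

```python
import heapq

RUN_MS = 18   # assumed time the CPU runs normally in each cycle
IDLE_MS = 6   # assumed forced-idle time in each cycle


class ToyWorker:
    """Single-threaded worker with delayed work, using simulated time."""

    def __init__(self):
        self.now = 0        # simulated clock in ms
        self._queue = []    # heap of (run_at, sequence number, work)
        self._seq = 0       # tie-breaker so work functions are never compared
        self.log = []       # (start time, work name)

    def queue_delayed_work(self, work, delay_ms):
        heapq.heappush(self._queue, (self.now + delay_ms, self._seq, work))
        self._seq += 1

    def run(self, until_ms):
        """Run queued works whose start time is not later than until_ms."""
        while self._queue and self._queue[0][0] <= until_ms:
            run_at, _, work = heapq.heappop(self._queue)
            self.now = max(self.now, run_at)
            work(self)


def balance_work(worker):
    # First work: balancing, then let the CPU run normally for RUN_MS
    # before the second work is due.
    worker.log.append((worker.now, "balance"))
    worker.queue_delayed_work(idle_inject_work, RUN_MS)


def idle_inject_work(worker):
    # Second work: force idle for IDLE_MS, then queue the first work again.
    worker.log.append((worker.now, "idle"))
    worker.now += IDLE_MS
    worker.queue_delayed_work(balance_work, 0)


if __name__ == "__main__":
    w = ToyWorker()
    w.queue_delayed_work(balance_work, 0)
    w.run(until_ms=100)
    for start, name in w.log:
        print(f"{start:>3} ms  {name}")
```

With these assumed values the model idles 6 ms out of every 24 ms, i.e. a
25% idle ratio, and the delay between the two works is the only place where
the run time is encoded.
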
> 
> The two works have to share some data that used to be local
> variables of the single kthread function. This is achieved
> by the new per-CPU struct kthread_worker_data. It might look
> like a complication. On the other hand, the long original kthread
> function was not nice either.
> 
> The patch tries to avoid extra init and cleanup works. All these
> actions can be done outside the thread, so they are moved
> to the functions that create or destroy the worker. In particular,
> I checked that the timers are assigned to the right CPU.
> 
> The two works queue each other, which makes it a bit tricky to
> break the cycle when we want to stop the worker. We use the global and
> per-worker "clamping" variables to make sure that the re-queuing
> eventually stops. We also cancel the works to stop faster.
> Note that the canceling is not reliable because the handling
> of the two variables and the queuing are not synchronized via a lock.
> But it is not a big deal because it is just an optimization.
> The job is stopped faster than before in most cases.
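
The stopping logic in the quoted paragraph can be sketched as follows. This
is a single-threaded toy model with hypothetical names (ToyClamp, run_one,
stop); it collapses the global and per-worker "clamping" variables into one
flag and does not reproduce the unsynchronized race mentioned above. It
only shows why clearing the flag is sufficient and why canceling merely
speeds things up.

```python
from collections import deque


class ToyClamp:
    """Two works that re-queue each other while `clamping` is set."""

    def __init__(self):
        self.clamping = True    # stands in for the "clamping" variables
        self.pending = deque()  # queued works, oldest first
        self.ran = []           # names of works that have run

    def queue(self, work):
        self.pending.append(work)

    def balance_work(self):
        self.ran.append("balance")
        if self.clamping:       # re-queue only while clamping is active
            self.queue(self.idle_inject_work)

    def idle_inject_work(self):
        self.ran.append("idle")
        if self.clamping:
            self.queue(self.balance_work)

    def run_one(self):
        """Run the oldest pending work; return False if nothing is queued."""
        if not self.pending:
            return False
        self.pending.popleft()()
        return True

    def stop(self, cancel=True):
        self.clamping = False   # guarantees the re-queuing eventually stops
        if cancel:
            self.pending.clear()  # best-effort shortcut, not required


if __name__ == "__main__":
    c = ToyClamp()
    c.queue(c.balance_work)
    for _ in range(3):
        c.run_one()
    c.stop(cancel=False)
    while c.run_one():
        pass
    print(c.ran)
```

Without the cancel, the already queued work still runs once and then the
chain ends; with the cancel, nothing further runs in this model.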

I am not convinced this added complexity is necessary. Here are my
concerns about breaking the cycle down into two work items:
- overhead of queuing and of the per-CPU data, as you already mentioned.
- since we need very tight timing control, two items may limit
  our turnaround time. Wouldn't it take one extra tick for the scheduler
  to run the balance work and then add the delay, as opposed to just
  calling schedule_timeout()?
- vulnerability to future changes in the work-queuing code
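
To put a number on the second concern: one tick lasts 1000/HZ ms. Whether
the split actually costs an extra tick is the open question above; the
sketch below only shows how large one tick would be relative to a
forced-idle slot if it did. The 6 ms slot length and the HZ values are
assumptions chosen for illustration, and the function names are made up.

```python
ASSUMED_IDLE_SLOT_MS = 6.0   # assumed forced-idle slot, for illustration only


def tick_ms(hz):
    """Length of one scheduler tick in milliseconds for a given HZ."""
    return 1000.0 / hz


def extra_tick_share_pct(hz, slot_ms=ASSUMED_IDLE_SLOT_MS):
    """One tick expressed as a percentage of the idle slot length."""
    return tick_ms(hz) / slot_ms * 100.0


if __name__ == "__main__":
    for hz in (100, 250, 300, 1000):
        print(f"HZ={hz:<5} tick={tick_ms(hz):5.2f} ms  "
              f"share of slot={extra_tick_share_pct(hz):6.1f}%")
```

Under these assumptions a single extra tick is 4 ms at HZ=250, about two
thirds of the slot, and 1 ms at HZ=1000, about one sixth of it.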

Jacob