linux-kernel - [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20161117231435.891545908@linutronix.de>
Date:   Fri, 18 Nov 2016 00:03:21 -0000
From:   Thomas Gleixner <tglx@...utronix.de>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     Zhang Rui <rui.zhang@...el.com>,
        Eduardo Valentin <edubezval@...il.com>,
        linux-pm@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
        x86@...nel.org, rt@...utronix.de, Borislav Petkov <bp@...en8.de>
Subject: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and
 locking trainwreck

We solely intended to convert that driver to the hotplug state machine and
stumbled over a large pile of insanities, which are all interwoven with the
package management:

 - The work cancelation code, the thermal zone unregistering, the work code
   and the interrupt notification function are racy against each other and
   against cpu hotplug and module exit. The random locking sprinkeled all
   over the place does not help anything and probably exists to make people
   feel good. The resulting issues (mainly use after free) are probably
   hard to trigger, but they clearly exist

 - Work cancelation in the cpu hotplug code can leave the work marked
   scheduled and the package interrupts disabled forever.

 - Storage for a boolean information whether work is scheduled for a
   package is kept in separate allocated storage, which is resized when the
   number of detected packages grows.

 - Delayed work structs are held in a static percpu storage, which makes no
   sense at all because work is strictly per package.

 - Packages are kept in a list, which must be searched over and over.

Fixing the whole pile of races with a few duct tape fixes was pretty much
impossible, so I decided to do a major rewrite to fix all of the
above. Here are the major changes:

 - Rewrite the offline code with proper locking against interrupts and work
   function and make sure that canceled work is rescheduled if there is
   another online cpu in the package.

 - Use the cpu offline callback on module exit to fix the work cancelation
   race.

 - Move the bool which denotes scheduled work into the package struct
   where it belongs.

 - Move the delayed work struct into the package struct, which is the only
   sensible place to have it and schedule the work on the cpu which is the
   target for the sysfs files as this makes the cancellation and
   rescheduling in the cpu offline path simple.

 - Add a large pile of comments documenting the cpu teardown mechanism

 - Code sanitizing, revamp the horrible name choices plus a general coding
   style cleanup.

   Note, that I did the namespace and code cleanup in the middle of the
   series, because staring at that mess just made my eyes bleeding.

 - Store the package pointers in an array which is allocated at init
   time. Sizing of the array is determined from the topology
   information. That makes the package lookup a simple array lookup.

As a last step the driver is converted to the hotplug state machine.

The total damage is:

 x86_pkg_temp_thermal.c |  590 ++++++++++++++++++++-----------------------------
 1 file changed, 245 insertions(+), 345 deletions(-)

So a net reduction of 100 lines and the compiled size of the driver changes
in the following way:

   text	   data	    bss	    dec
   4370	    549	    128	   5047		Before
   2845	    424	     40	   3309	    	After fixes and cleanup
   2690	    404	     40	   3134		After hotplug conversion

Thanks,

	tglx