linux-kernel - Re: [PATCH] x86/split_lock: fix delayed detection enabling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <06eb4591-7b28-b774-01d0-eb980073d1d3@igalia.com>
Date: Sun, 21 Apr 2024 15:27:37 -0300
From: "Guilherme G. Piccoli" <gpiccoli@...lia.com>
To: Maksim Davydov <davydov-max@...dex-team.ru>
Cc: den-plotnikov@...dex-team.ru, x86@...nel.org,
 linux-kernel@...r.kernel.org, dave.hansen@...ux.intel.com,
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de
Subject: Re: [PATCH] x86/split_lock: fix delayed detection enabling

On 19/04/2024 08:26, Maksim Davydov wrote:
> 
> [...]
> Some information that should be taken into account:
> * sld_update_msr() enables/disables SLD on both CPUs on the same core
> * schedule_delayed_work_on() internally checks WORK_STRUCT_PENDING_BIT.
>    If a work has the 'pending' status, then schedule_delayed_work_on()
>    will return an error code and, most importantly, the work will not
>    be placed in the workqueue.
> 
> Let's say we have a multicore system on which split_lock_mitigate=0 and
> a multithreaded application is running that calls splitlock in multiple
> threads. Due to the fact that sld_update_msr() affects the entire core
> (both CPUs), we will consider 2 CPUs from different cores. Let the 2
> threads of this application schedule to CPU0 (core 0) and to CPU 2
> (core 1), then:
> 
> |                                 ||                                   |
> |             CPU 0 (core 0)      ||          CPU 2 (core 1)           |
> |_________________________________||___________________________________|
> |                                 ||                                   |
> | 1) SPLIT LOCK occured           ||                                   |
> |                                 ||                                   |
> | 2) split_lock_warn()            ||                                   |
> |                                 ||                                   |
> | 3) sysctl_sld_mitigate == 0     ||                                   |
> |    (work = &sl_reenable)        ||                                   |
> |                                 ||                                   |
> | 4) schedule_delayed_work_on()   ||                                   |
> |    (reenable will be called     ||                                   |
> |     after 2 jiffies on CPU 0)   ||                                   |
> |                                 ||                                   |
> | 5) disable SLD for core 0       ||                                   |
> |                                 ||                                   |
> |    -------------------------    ||                                   |
> |                                 ||                                   |
> |                                 || 6) SPLIT LOCK occured             |
> |                                 ||                                   |
> |                                 || 7) split_lock_warn()              |
> |                                 ||                                   |
> |                                 || 8) sysctl_sld_mitigate == 0       |
> |                                 ||    (work = &sl_reenable,          |
> |                                 ||     the same address as in 3) )   |
> |                                 ||                                   |
> |            2 jiffies            || 9) schedule_delayed_work_on()     |
> |                                 ||    fials because the work is in   | 
> 
> |                                 ||    the pending state since 4).    |
> |                                 ||    The work wasn't placed to the  |
> |                                 ||    workqueue. reenable won't be   |
> |                                 ||    called on CPU 2                |
> |                                 ||                                   |
> |                                 || 10) disable SLD for core 0        |
> |                                 ||                                   |
> |                                 ||     From now on SLD will          |
> |                                 ||     never be reenabled on core 1  |
> |                                 ||                                   |
> |    -------------------------    ||                                   |
> |                                 ||                                   |
> |    11) enable SLD for core 0 by ||                                   |
> |        __split_lock_reenable    ||                                   |
> |                                 ||                                   |
> 
> 
> If the application threads can be scheduled to all processor cores,
> then over time there will be only one core left, on which SLD will be
> enabled and split lock will be able to be detected; and on all other
> cores SLD will be disabled all the time.
> Most likely, this bug has not been noticed for so long because
> sysctl_sld_mitigate default value is 1, and in this case a semaphore
> is used that does not allow 2 different cores to have SLD disabled at
> the same time, that is, strictly only one work is placed in the
> workqueue.
> 

Hi Maksim, this is awesome! Thanks a lot for the diagram, super clear now.

Well, I think you nailed it and we should get the patch merged, right?
I'm not sure if the diagram should be included or not in the commit
message - it's good but big, maybe include a lore archive mentioning the
diagram in a V2?

Cheers,


Guilherme