linux-kernel - Re: [PATCH v2] x86/resctrl: Clear the stale staged config after the configuration is completed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ad08eee6-cfea-3858-0def-e2e3fef315fb@linux.alibaba.com>
Date:   Thu, 20 Oct 2022 13:55:52 +0800
From:   Shawn Wang <shawnwang@...ux.alibaba.com>
To:     Reinette Chatre <reinette.chatre@...el.com>, fenghua.yu@...el.com
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
        James Morse <james.morse@....com>, jamie@...iainc.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86/resctrl: Clear the stale staged config after the
 configuration is completed

Hi Reinette,

Sorry for replying now due to other things.

On 10/12/2022 7:48 AM, Reinette Chatre wrote:
> Hi Shawn,
> 
> Thank you very much for working on getting this fixed.
> 
> On 10/9/2022 1:36 AM, Shawn Wang wrote:
>> As a temporary storage, array staged_config in struct rdt_domain is not
> 
> staged_config -> staged_config[]
> 
>> cleared after it has been used. The stale value in staged_config could
> staged_config -> staged_config[]
> (please make the above change in rest of changelog, doing so makes it
> easier to read)
> 
>> cause a MSR access error.
> 
> a MSR -> an MSR
> 
>>
>> If resctrl is mounted with CDP enabled and then remounted with CDP
>> disabled, the value of staged_config in domainX via mkdir changes as
>> follows:
>>
>> CDP enabled:
>> mkdir <resource control group>
>>      domainX.staged_config[CDP_NONE].have_new_ctrl = false
>>      domainX.staged_config[CDP_NONE].new_ctrl = 0
>>      domainX.staged_config[CDP_CODE].have_new_ctrl = true
>>      domainX.staged_config[CDP_CODE].new_ctrl = <default mask>
>>      domainX.staged_config[CDP_DATA].have_new_ctrl = true
>>      domainX.staged_config[CDP_CODE].new_ctrl = <default mask>
> 
> Apologies, you copied a copy&paste error from me ... the last one
> should be CDP_DATA.
> 
>>
>> unmount/remount resctrl (CDP disabled):
>> mkdir <resource control group>
>>      domainX.staged_config[CDP_NONE].have_new_ctrl = true
>>      domainX.staged_config[CDP_NONE].new_ctrl = <default mask>
>>      domainX.staged_config[CDP_CODE].have_new_ctrl = true      /* stale */
>>      domainX.staged_config[CDP_CODE].new_ctrl = <default mask> /* stale */
>>      domainX.staged_config[CDP_DATA].have_new_ctrl = true      /* stale */
>>      domainX.staged_config[CDP_CODE].new_ctrl = <default mask> /* stale */
> 
> (same typo here)
> 
>>
>> When creating a new resource control group, hardware will be configured by
>> resctrl_arch_update_domains():
>> rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()
>>
>> Since resctrl_arch_update_domains() iterates and updates all
>> resctrl_conf_type whose have_new_ctrl is true, it will continue to update
>> the stale CDP_CODE and CDP_DATA configurations when CDP is disabled.
>>
>> Based on the above analysis, an error can be reproduced on a system with
>> 16 usable CLOSIDs for a 15-way L3 Cache (MBA should be disabled if the
>> number of CLOSIDs for MB is less than 16.) :
>>      mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>>      mkdir /sys/fs/resctrl/p{1..7}
>>      umount /sys/fs/resctrl/
>>      mount -t resctrl resctrl /sys/fs/resctrl
>>      mkdir /sys/fs/resctrl/p{1..8}
>>
>> dmesg will generate the following error:
> 
> I think it would be helpful to connect this reproducer to the
> explanation of what happens. Maybe just something like:
> 
> "Upon creating directory "p8", to which CLOSID 8 is automatically
>   assigned, the following error is generated:"
> 
> Please consider what is the most useful information found in the backtrace
> and only include that. Interesting enough, there is a very related example
> of what information is useful in Documentation/process/submitting-patches.rst
> (section "Backtraces in commit messages").
> 
>>      [ 6180.939345] unchecked MSR access error: WRMSR to 0xca0 (tried to write
>>      0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
>>      [ 6180.951983] Call Trace:
>>      [ 6180.954516]  <IRQ>
>>      [ 6180.956619]  __flush_smp_call_function_queue+0x11d/0x170
>>      [ 6180.962028]  __sysvec_call_function+0x24/0xd0
>>      [ 6180.966485]  sysvec_call_function+0x89/0xc0
>>      [ 6180.970760]  </IRQ>
>>      [ 6180.972947]  <TASK>
>>      [ 6180.975131]  asm_sysvec_call_function+0x16/0x20
>>      [ 6180.979757] RIP: 0010:cpuidle_enter_state+0xcd/0x400
>>      [ 6180.984821] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 1e e5 77 ff 45 84
>>      ff 74 12 9c 58 f6 c4 02 0f 85 13 03 00 00 31 ff e8 67 70 7d ff fb 45 85
>>      f6 <0f> 88 75 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
>>      [ 6181.003710] RSP: 0018:ffffffff83a03e48 EFLAGS: 00000202
>>      [ 6181.009028] RAX: ffff943400800000 RBX: 0000000000000001 RCX: 000000000000001f
>>      [ 6181.016261] RDX: 0000000000000000 RSI: ffffffff83795059 RDI: ffffffff837c101e
>>      [ 6181.023490] RBP: ffff9434c9352000 R08: 0000059f1cb1a05e R09: 0000000000000008
>>      [ 6181.030717] R10: 0000000000000001 R11: 0000000000005c66 R12: ffffffff83bbf3a0
>>      [ 6181.037944] R13: 0000059f1cb1a05e R14: 0000000000000001 R15: 0000000000000000
>>      [ 6181.045202]  ? cpuidle_enter_state+0xb2/0x400
>>      [ 6181.049678]  cpuidle_enter+0x24/0x40
>>      [ 6181.053370]  do_idle+0x1dd/0x260
>>      [ 6181.056713]  cpu_startup_entry+0x14/0x20
>>      [ 6181.060753]  rest_init+0xbb/0xc0
>>      [ 6181.064097]  arch_call_rest_init+0x5/0xa
>>      [ 6181.068137]  start_kernel+0x668/0x691
>>      [ 6181.071914]  secondary_startup_64_no_verify+0xe0/0xeb
>>      [ 6181.077086]  </TASK>
>>
>> As Reinette Chatre explained:
>> https://lore.kernel.org/all/2728c354-ac75-be4c-66ad-86ebd9c50248@intel.com/
>> "
>>   The value of interest here is the register it tries to write to ... 0xca0.
>>   On a system with 16 CLOSIDs the range of registers available to set the
>>   CBM would be 0xc90 to 0xc9f that corresponds to CLOSID 0 to CLOSID 15.
>>   The error is an attempt to write to an unsupported register - there
>>   appears to have been an attempt to configure non-existent CLOSID 16.
>>
>>   Above becomes an issue when the resource group being created is for a
>>   CLOSID # that is more than half of the CLOSIDs supported. In the
>>   reproducer the issue was encountered when creating resource group for
>>   CLOSID 8 on a system that supports 16 CLOSIDs.
>>
>>   In this case get_config_index() called from resctrl_arch_update_domains()
>>   will return 16 and 17 when processing this resource group and that
>>   translated to an invalid register - 0xca0 in this scenario.
>> "
>>
> 
> I am not sure if adding this full quote is necessary to explain the issue.
> Maybe you could just summarize it and move it to before the "Based on the
> above analysis ..."
> 

Thank you very much for your suggestion, I will revise it in the new 
version.

>> Fix this issue by clearing the staged configs when the configuration is
>> completed.
>>
>> Fixes: 75408e43509ed ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
>> Cc: <stable@...r.kernel.org>
>> Signed-off-by: Shawn Wang <shawnwang@...ux.alibaba.com>
>> Suggested-by: Xin Hao <xhao@...ux.alibaba.com>
>> ---
>> Changes since v1:
>> - Move the clearing from schemata_list_destroy() to resctrl_arch_update_domains().
>> - Update the commit message suggested by Reiniette Chatre.
>> - Add stable tag suggested by James Morse.
>> ---
>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 1dafbdc5ac31..2c719da5544f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -338,6 +338,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
>>   				msr_param.high = max(msr_param.high, idx + 1);
>>   			}
>>   		}
>> +		/* Clear the stale staged config */
>> +		memset(d->staged_config, 0, sizeof(d->staged_config));
>>   	}
>>   
>>   	if (cpumask_empty(cpu_mask))
> 
> Please also ensure that the temporary storage is cleared if there is an
> early exist because of failure. Please do not duplicate the memset() code
> but instead move it to a common exit location.
> 

There are two different resctrl_arch_update_domains() function call paths:

1.rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()
2.rdtgroup_schemata_write()->resctrl_arch_update_domains()

Perhaps there is no common exit location if we want to clear 
staged_config[] after every call of resctrl_arch_update_domains().

How about doing the cleanup at the end of rdtgroup_schemata_write() and 
rdtgroup_mkdir_ctrl_mon() respectively?

Thank you,

Shawn