Date:   Mon, 8 Nov 2021 12:56:21 -0800
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Jarkko Sakkinen <jarkko@...nel.org>
CC:     <dave.hansen@...ux.intel.com>, <tglx@...utronix.de>,
        <bp@...en8.de>, <mingo@...hat.com>, <linux-sgx@...r.kernel.org>,
        <x86@...nel.org>, <seanjc@...gle.com>, <tony.luck@...el.com>,
        <hpa@...or.com>, <linux-kernel@...r.kernel.org>,
        <stable@...r.kernel.org>
Subject: Re: [PATCH] x86/sgx: Fix free page accounting

Hi Jarkko,

On 11/8/2021 12:12 PM, Jarkko Sakkinen wrote:
> On Mon, Nov 08, 2021 at 11:48:18AM -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 11/7/2021 8:47 AM, Jarkko Sakkinen wrote:
>>> On Sun, 2021-11-07 at 18:45 +0200, Jarkko Sakkinen wrote:
>>>> On Thu, 2021-11-04 at 11:28 -0700, Reinette Chatre wrote:
>>>>> The consequence of sgx_nr_free_pages not being protected is that
>>>>> its value may not accurately reflect the actual number of free
>>>>> pages on the system, impacting the availability of free pages in
>>>>> support of many flows. The problematic scenario is when the
>>>>> reclaimer never runs because it believes there to be sufficient
>>>>> free pages while any attempt to allocate a page fails because there
>>>>> are no free pages available. The worst scenario observed was a
>>>>> user space hang because of repeated page faults caused by
>>>>> no free pages ever made available.
>>>>
>>>> Can you go in detail with the "concrete scenario" in the commit
>>>> message? It does not have to describe all the possible scenarios
>>>> but at least one sequence of events.
>>
>>
>> I provided significant detail regarding the "concrete scenario" in a
>> separate response to Greg:
>> https://lore.kernel.org/lkml/a636290d-db04-be16-1c86-a8dcc3719b39@intel.com/
>>
>> That message details the test that was run (the test hangs before the fix
>> and can complete after the fix), the traces captured at the time the test
>> hung, analysis of the traces with root cause of why the system is hung,
>> traces after fix applied demonstrating why user space is able to make
>> progress and explaining why the test can complete.
> 
> For me that sequence looks like something that you could "abstract"
> a bit and get a rough description of the concurrency scenario.
> 
> In this type of patch it is as important as the code change itself,
> not least because having that info in some level of detail in the
> commit log helps with future maintenance.

My apologies. I understood your comment to be a concern with the change 
itself instead of just the commit message. I will add more detail about 
the failing scenario encountered to the commit message.

Reinette
