[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ced9786a-b8ac-2575-02b0-04323c83ca4e@intel.com>
Date: Mon, 8 Nov 2021 12:56:21 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Jarkko Sakkinen <jarkko@...nel.org>
CC: <dave.hansen@...ux.intel.com>, <tglx@...utronix.de>,
<bp@...en8.de>, <mingo@...hat.com>, <linux-sgx@...r.kernel.org>,
<x86@...nel.org>, <seanjc@...gle.com>, <tony.luck@...el.com>,
<hpa@...or.com>, <linux-kernel@...r.kernel.org>,
<stable@...r.kernel.org>
Subject: Re: [PATCH] x86/sgx: Fix free page accounting
Hi Jarkko,
On 11/8/2021 12:12 PM, Jarkko Sakkinen wrote:
> On Mon, Nov 08, 2021 at 11:48:18AM -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 11/7/2021 8:47 AM, Jarkko Sakkinen wrote:
>>> On Sun, 2021-11-07 at 18:45 +0200, Jarkko Sakkinen wrote:
>>>> On Thu, 2021-11-04 at 11:28 -0700, Reinette Chatre wrote:
>>>>> The consequence of sgx_nr_free_pages not being protected is that
>>>>> its value may not accurately reflect the actual number of free
>>>>> pages on the system, impacting the availability of free pages in
>>>>> support of many flows. The problematic scenario is when the
>>>>> reclaimer never runs because it believes there to be sufficient
>>>>> free pages while any attempt to allocate a page fails because there
>>>>> are no free pages available. The worst scenario observed was a
>>>>> user space hang because of repeated page faults caused by
>>>>> no free pages ever made available.
>>>>
>>>> Can you go in detail with the "concrete scenario" in the commit
>>>> message? It does not have to describe all the possible scenarios
>>>> but at least one sequence of events.
>>
>>
>> I provided significant detail regarding the "concrete scenario" in a
>> separate response to Greg:
>> https://lore.kernel.org/lkml/a636290d-db04-be16-1c86-a8dcc3719b39@intel.com/
>>
>> That message details the test that was run (the test hangs before the fix
>> and can complete after the fix), the traces captured at the time the test
>> hung, analysis of the traces with root cause of why the system is hung,
>> traces after fix applied demonstrating why user space is able to make
>> progress and explaining why the test can complete.
>
> For me that sequence looks like something that you could "abstract"
> a bit and get a rough description of the concurrency scenario.
>
> It is as important in this type of patch, as the code change itself,
> not least because it helps with maintaining in the future to have
> that info in some level of detail in the commit log.
My apologies. I understood your comment to be a concern with the change
itself instead of just the commit message. I will add more detail about
the failing scenario encountered to the commit message.
Reinette
Powered by blists - more mailing lists