Message-ID: <YYmEwobYw+jGBSwV@iki.fi>
Date:   Mon, 8 Nov 2021 22:12:50 +0200
From:   Jarkko Sakkinen <jarkko@...nel.org>
To:     Reinette Chatre <reinette.chatre@...el.com>
Cc:     dave.hansen@...ux.intel.com, tglx@...utronix.de, bp@...en8.de,
        mingo@...hat.com, linux-sgx@...r.kernel.org, x86@...nel.org,
        seanjc@...gle.com, tony.luck@...el.com, hpa@...or.com,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] x86/sgx: Fix free page accounting

On Mon, Nov 08, 2021 at 11:48:18AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 11/7/2021 8:47 AM, Jarkko Sakkinen wrote:
> > On Sun, 2021-11-07 at 18:45 +0200, Jarkko Sakkinen wrote:
> > > On Thu, 2021-11-04 at 11:28 -0700, Reinette Chatre wrote:
> > > > The consequence of sgx_nr_free_pages not being protected is that
> > > > its value may not accurately reflect the actual number of free
> > > > pages on the system, impacting the availability of free pages in
> > > > support of many flows. The problematic scenario is when the
> > > > reclaimer never runs because it believes there to be sufficient
> > > > free pages while any attempt to allocate a page fails because there
> > > > are no free pages available. The worst scenario observed was a
> > > > user space hang because of repeated page faults caused by
> > > > no free pages ever made available.
> > > 
> > > Can you go in detail with the "concrete scenario" in the commit
> > > message? It does not have to describe all the possible scenarios
> > > but at least one sequence of events.
> 
> 
> I provided significant detail regarding the "concrete scenario" in a
> separate response to Greg:
> https://lore.kernel.org/lkml/a636290d-db04-be16-1c86-a8dcc3719b39@intel.com/
> 
> That message details the test that was run (the test hangs before the fix
> and can complete after the fix), the traces captured at the time the test
> hung, analysis of the traces with root cause of why the system is hung,
> traces after fix applied demonstrating why user space is able to make
> progress and explaining why the test can complete.

To me, that sequence looks like something you could abstract a bit
into a rough description of the concurrency scenario.

In this type of patch that description is as important as the code
change itself, not least because having that information in some
level of detail in the commit log helps with future maintenance.

/Jarkko
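
For illustration only, here is one way the kind of abstracted concurrency scenario discussed above could be written down. This is a minimal sketch and is not taken from the patch or from the linked traces. It assumes the counter is updated with plain increments and decrements that are not serialized by one common lock. The names nr_free_plain, nr_free_atomic, replay_lost_update() and replay_atomic() are hypothetical stand-ins, and the interleaving is replayed by hand in a single thread so that the outcome is deterministic.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for a shared free page counter. */
static long nr_free_plain;
static atomic_long nr_free_atomic;

/*
 * Replay one unlucky interleaving of a non-atomic "counter++" (CPU A
 * frees a page) and "counter--" (CPU B allocates a page), each split
 * into its load, modify and store steps.
 * Returns the final counter value.
 */
long replay_lost_update(long start)
{
	nr_free_plain = start;

	long a = nr_free_plain;	/* CPU A: load */
	long b = nr_free_plain;	/* CPU B: load the same stale value */

	a = a + 1;		/* CPU A: modify */
	b = b - 1;		/* CPU B: modify */

	nr_free_plain = b;	/* CPU B: store start - 1 */
	nr_free_plain = a;	/* CPU A: store start + 1, B's update is lost */

	return nr_free_plain;
}

/*
 * The same two updates done as indivisible read-modify-write
 * operations: no ordering of the two can lose an update.
 */
long replay_atomic(long start)
{
	atomic_store(&nr_free_atomic, start);
	atomic_fetch_add(&nr_free_atomic, 1);	/* CPU A: free a page */
	atomic_fetch_sub(&nr_free_atomic, 1);	/* CPU B: allocate a page */

	return atomic_load(&nr_free_atomic);
}
```

Starting from 0, the first replay leaves the counter one higher than the net effect of the two updates. That is the shape of the problem in the quoted description: the counter claims that free pages exist, so the reclaimer does not run, while allocations keep failing.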
