Date:   Wed, 7 Jul 2021 11:40:45 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Dennis Zhou <dennis@...nel.org>
Cc:     Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] percpu fixes for v5.14-rc1

On Wed, Jul 7, 2021 at 6:00 AM Dennis Zhou <dennis@...nel.org> wrote:
>
> This is just a single change to fix percpu depopulation. The code relied
> on depopulation code written specifically for the free path and relied
> on vmalloc to do the tlb flush lazily. As we're modifying the backing
> pages during the lifetime of a chunk, we need to also flush the tlb
> accordingly.

I pulled this, but I ended up unpulling after looking at the fix.

The fix may be perfectly correct, but I'm looking at that
pcpu_reclaim_populated() function, and I want somebody to explain to
me why it's ok to drop and re-take the 'pcpu_lock' and just continue.

Because whatever it was protecting is now not protected any more.

It *looks* like it's intended to protect the pcpu_chunk_lists[]
content, and some other functions that do this look ok. So for
example, pcpu_balance_free() at least removes the 'chunk' from the
pcpu_chunk_lists[] before it drops the lock and then works on the
chunk contents.

But pcpu_reclaim_populated() seems to *leave* chunk on the
pcpu_chunk_lists[], drop the lock, and then continue to use 'chunk'.
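The contrast between the two functions can be shown with a minimal user-space sketch. It assumes a pthread mutex in place of 'pcpu_lock' and a single singly linked list in place of pcpu_chunk_lists[]. All names (struct chunk, chunk_list, list_lock, detach_first, reattach, reclaim_one) are hypothetical, and this is not the kernel's code. It follows the pcpu_balance_free()-style ordering: take the chunk off the shared list while the lock is held, drop the lock, work on the chunk, then re-take the lock to put it back.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, loosely modelled on pcpu_chunk_lists[]
 * and pcpu_lock. Not the kernel's data structures. */
struct chunk {
    struct chunk *next;
    int populated;              /* state modified with the lock dropped */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct chunk *chunk_list;    /* protected by list_lock */

/* Caller must hold list_lock: unlink and return the first chunk. */
static struct chunk *detach_first(void)
{
    struct chunk *c = chunk_list;

    if (c) {
        chunk_list = c->next;
        c->next = NULL;
    }
    return c;
}

/* Caller must hold list_lock: push the chunk back on the front. */
static void reattach(struct chunk *c)
{
    c->next = chunk_list;
    chunk_list = c;
}

/* The chunk is off the list before the lock is dropped, so no other
 * thread can find it through chunk_list while it is being modified. */
void reclaim_one(void)
{
    pthread_mutex_lock(&list_lock);
    struct chunk *c = detach_first();
    pthread_mutex_unlock(&list_lock);

    if (!c)
        return;

    /* Slow work done without the lock (e.g. releasing backing pages). */
    c->populated = 0;

    pthread_mutex_lock(&list_lock);
    reattach(c);
    pthread_mutex_unlock(&list_lock);
}
```

The pattern questioned above corresponds to leaving 'c' on chunk_list across the unlock. A concurrent walker could then find it, move it, or free it while reclaim_one() is still modifying it, and nothing protects it.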

That odd "release lock and continue to use the data it's supposed to
protect" seems to be pre-existing, but

 (a) this is the code that caused problems to begin with

and

 (b) it seems to now happen even more.

So maybe this code is right. But it looks very odd to me, and I'd like
to get more explanations of _why_ it would be ok before I pull this
fix, since there seems to be a deeper underlying problem in the code
that this tries to fix.

                 Linus
