Message-ID: <CAGudoHEsqG00UW8DfsCW3t8jyKXXwCcUxaom=t5uAoeeXFuWzg@mail.gmail.com>
Date: Mon, 12 Aug 2024 18:30:22 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, Liam.Howlett@...cle.com, pedro.falcato@...il.com, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into the struct

On Mon, Aug 12, 2024 at 5:27 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Sun, Aug 11, 2024 at 9:29 PM Mateusz Guzik <mjguzik@...il.com> wrote:
> > That aside, as I mentioned earlier, the dedicated vma lock cache
> > results in false sharing between separate vmas, except this
> > particular benchmark does not test for it (which in your setup
> > should be visible even if the cache gains the SLAB_HWCACHE_ALIGN
> > flag).
>
> When implementing VMA locks I did experiment with SLAB_HWCACHE_ALIGN
> for vm_lock cache using different benchmarks and didn't see
> improvements above noise level. Do you know of some specific benchmark
> that would possibly show improvement?
>

I don't know of anything specific; I'm saying basic multicore hygiene
says these locks need to land in dedicated cachelines.

Consider the following: struct rw_semaphore is 40 bytes and the word
modified in the lock/unlock fast path is at offset 0.

I don't know how much padding there is in the allocator, but if it is
anything less than 24 bytes (which will obviously be the case; a
64-byte cacheline leaves 64 - 40 = 24 bytes after one lock) there will
be massive false sharing: the first 24 bytes of the *second* lock land
in the same cacheline as the first one, including the word which is
modified in the fast path. iow the locks allocated this way are
guaranteed to keep bouncing.
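
To put numbers on it, here is a quick illustrative userspace sketch
(not kernel code; it assumes 64-byte cachelines and 40-byte objects
packed back to back with no allocator padding):

	/* prints which cacheline each lock's fast-path word lands in */
	#include <stdio.h>

	#define CACHELINE 64
	#define OBJ_SIZE  40 /* sizeof(struct rw_semaphore) on x86-64 */

	int main(void)
	{
		for (int i = 0; i < 4; i++) {
			unsigned long start = (unsigned long)i * OBJ_SIZE;

			printf("lock %d: bytes %3lu-%3lu, fast-path word in cacheline %lu\n",
			       i, start, start + OBJ_SIZE - 1,
			       start / CACHELINE);
		}
		return 0;
	}

Locks 0 and 1 end up with their fast-path words in cacheline 0, locks
2 and 3 in cacheline 1, and so on -- every pair bounces a line.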

I don't believe any effort is warranted to try to find a real scenario
exhibiting this problem, or to synthetically construct one.

> > If there are still problems and the lock needs to remain separate,
> > the bare minimum damage-controlling measure would be to hwalign the
> > vma lock cache -- it won't affect the pts benchmark, but it should
> > help others.
>
> Sure, but I'll need to measure the improvement, and for that I need
> a benchmark or a workload. Any suggestions?
>

I believe I addressed this above.
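
For reference, the damage-control variant is a one-flag change along
these lines (a sketch; the actual cache name and creation site in mm
may differ):

	vma_lock_cachep = kmem_cache_create("vma_lock",
					    sizeof(struct vma_lock), 0,
					    SLAB_HWCACHE_ALIGN, NULL);

With SLAB_HWCACHE_ALIGN the allocator rounds each object up to a
cacheline boundary, so no two locks share a line.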

If there is an individual who, in your opinion, is going to protest
such a patch on the grounds that no benchmark is provided, I can give
them a talking-to.

Even then, it may be that this bit won't be applicable anyway, so....

> >
> > Should the decision be to bring the lock back into the struct, I'll
> > note my patch is merely slapped together to a state where it can be
> > benchmarked, and I have no interest in beating it into a committable
> > shape. You stated you already had an equivalent (modulo keeping
> > something in the space previously occupied by the pointer to the vma
> > lock), so as far as I'm concerned you can submit that with your
> > authorship.
>
> Thanks! If we end up doing that I'll keep you as Suggested-by and will
> add a link to this thread.

sgtm

-- 
Mateusz Guzik <mjguzik gmail.com>
