lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4iAbWnWUT2d2VhnvuHvJE0-Vxgbf1TYtOPjkR6j3qROtw@mail.gmail.com>
Date:   Wed, 19 Jun 2019 14:53:54 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Bjorn Helgaas <bhelgaas@...gle.com>
Cc:     Nadav Amit <namit@...are.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Borislav Petkov <bp@...e.de>,
        Toshi Kani <toshi.kani@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ingo Molnar <mingo@...nel.org>,
        "Kleen, Andi" <andi.kleen@...el.com>
Subject: Re: [PATCH 3/3] resource: Introduce resource cache

[ add Andi ]

On Wed, Jun 19, 2019 at 6:00 AM Bjorn Helgaas <bhelgaas@...gle.com> wrote:
>
> On Tue, Jun 18, 2019 at 12:40 AM Nadav Amit <namit@...are.com> wrote:
> >
> > > On Jun 17, 2019, at 10:33 PM, Nadav Amit <namit@...are.com> wrote:
> > >
> > >> On Jun 17, 2019, at 9:57 PM, Andrew Morton <akpm@...ux-foundation.org> wrote:
> > >>
> > >> On Wed, 12 Jun 2019 21:59:03 -0700 Nadav Amit <namit@...are.com> wrote:
> > >>
> > >>> For efficient search of resources, as needed to determine the memory
> > >>> type for dax page-faults, introduce a cache of the most recently used
> > >>> top-level resource. Caching the top-level should be safe as ranges in
> > >>> that level do not overlap (unlike those of lower levels).
> > >>>
> > >>> Keep the cache per-cpu to avoid possible contention. Whenever a resource
> > >>> is added, removed or changed, invalidate all the resources. The
> > >>> invalidation takes place when the resource_lock is taken for write,
> > >>> preventing possible races.
> > >>>
> > >>> This patch provides relatively small performance improvements over the
> > >>> previous patch (~0.5% on sysbench), but can benefit systems with many
> > >>> resources.
> > >>
> > >>> --- a/kernel/resource.c
> > >>> +++ b/kernel/resource.c
> > >>> @@ -53,6 +53,12 @@ struct resource_constraint {
> > >>>
> > >>> static DEFINE_RWLOCK(resource_lock);
> > >>>
> > >>> +/*
> > >>> + * Cache of the top-level resource that was most recently use by
> > >>> + * find_next_iomem_res().
> > >>> + */
> > >>> +static DEFINE_PER_CPU(struct resource *, resource_cache);
> > >>
> > >> A per-cpu cache which is accessed under a kernel-wide read_lock looks a
> > >> bit odd - the latency getting at that rwlock will swamp the benefit of
> > >> isolating the CPUs from each other when accessing resource_cache.
> > >>
> > >> On the other hand, if we have multiple CPUs running
> > >> find_next_iomem_res() concurrently then yes, I see the benefit.  Has
> > >> the benefit of using a per-cpu cache (rather than a kernel-wide one)
> > >> been quantified?
> > >
> > > No. I am not sure how easy it would be to measure it. On the other hander
> > > the lock is not supposed to be contended (at most cases). At the time I saw
> > > numbers that showed that stores to “exclusive" cache lines can be as
> > > expensive as atomic operations [1]. I am not sure how up to date these
> > > numbers are though. In the benchmark I ran, multiple CPUs ran
> > > find_next_iomem_res() concurrently.
> > >
> > > [1] http://sigops.org/s/conferences/sosp/2013/papers/p33-david.pdf
> >
> > Just to clarify - the main motivation behind the per-cpu variable is not
> > about contention, but about the fact the different processes/threads that
> > run concurrently might use different resources.
>
> IIUC, the underlying problem is that dax relies heavily on ioremap(),
> and ioremap() on x86 takes too long because it relies on
> find_next_iomem_res() via the __ioremap_caller() ->
> __ioremap_check_mem() -> walk_mem_res() path.
>
> The fact that x86 is the only arch that does this much work in
> ioremap() makes me wonder.  Is there something unique about x86
> mapping attributes that requires this extra work, or is there some way
> this could be reworked to avoid searching the resource map in the
> first place?

The underlying issue is that the x86-PAT implementation wants to
ensure that conflicting mappings are not set up for the same physical
address. This is mentioned in the developer manuals as problematic on
some cpus. Andi, is lookup_memtype() and track_pfn_insert() still
relevant?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ