lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YZQnBoPqMGhtLxnJ@elver.google.com>
Date:   Tue, 16 Nov 2021 22:47:50 +0100
From:   Marco Elver <elver@...gle.com>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Mina Almasry <almasrymina@...gle.com>, paulmck@...nel.org,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shuah Khan <shuah@...nel.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        David Rientjes <rientjes@...gle.com>,
        Jue Wang <juew@...gle.com>, Yang Yao <ygyao@...gle.com>,
        Joanna Li <joannali@...gle.com>,
        Cannon Matthews <cannonmatthews@...gle.com>,
        Linux Memory Management List <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, kasan-dev@...glegroups.com
Subject: Re: [PATCH v6] hugetlb: Add hugetlb.*.numa_stat file

On Tue, Nov 16, 2021 at 12:59PM -0800, Shakeel Butt wrote:
> On Tue, Nov 16, 2021 at 12:48 PM Mina Almasry <almasrymina@...gle.com> wrote:
[...]
> > > Per above, probably unlikely, but allowed. WRITE_ONCE should prevent it,
> > > and at least relieve you to not worry about it (and shift the burden to
> > > WRITE_ONCE's implementation).
> > >
> >
> > Thank you very much for the detailed response. I can add READ_ONCE()
> > at the no-lock read site, that is no issue.
> >
> > However, for the writes that happen while holding the lock, the write
> > is like so:
> > +               h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages;
> >
> > And like so:
> > +               h_cg->nodeinfo[page_to_nid(page)]->usage[idx] -= nr_pages;
> >
> > I.e. they are increments/decrements. Sorry if I missed it but I can't
> > find an INC_ONCE(), and it seems wrong to me to do something like:
> >
> > +               WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx],
> > +
> > h_cg->nodeinfo[page_to_nid(page)] + nr_pages);

>From what I gather there are no concurrent writers, right?

WRITE_ONCE(a, a + X) is perfectly fine. What it says is that you can
have concurrent readers here, but no concurrent writers (and KCSAN will
still check that). Maybe we need a more convenient macro for this idiom
one day..

Though I think for something like

	h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages;

it seems there wants to be an temporary long* so that you could write
WRITE_ONCE(*usage, *usage + nr_pages) or something.

> > I know we're holding a lock anyway so there is no race, but to the
> > casual reader this looks wrong as there is a race between the fetch of
> > the value and the WRITE_ONCE(). What to do here? Seems to me the most
> > reasonable thing to do is just READ_ONCE() and leave the write plain?
> >
> >
> 
> How about atomic_long_t?

That would work of course; if this is very hot path code it might be
excessive if you don't have concurrent writers.

Looking at the patch in more detail, the counter is a stat counter that
can be read from a stat file, correct? Hypothetically, what would happen
if the reader of 'usage' reads approximate values?

If the answer is "nothing", then this could classify as an entirely
"benign" data race and you could only use data_race() on the reader and
leave the writers unmarked using normal +=/-=. Check if it fits
"Data-Racy Reads for Approximate Diagnostics" [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt#n74

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ