lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <db6177a5-1f1-9f2a-3c6e-fbc9563c3e3e@google.com>
Date:   Mon, 28 Mar 2022 20:32:07 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Matthew Wilcox <willy@...radead.org>
cc:     Hugh Dickins <hughd@...gle.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Next Mailing List <linux-next@...r.kernel.org>,
        linux-doc@...r.kernel.org, linux-mm@...ck.org
Subject: Re: linux-next: build warnings after merge of the akpm-current
 tree

On Tue, 29 Mar 2022, Matthew Wilcox wrote:
> On Wed, Feb 09, 2022 at 08:03:26AM -0800, Hugh Dickins wrote:
> > On Wed, 9 Feb 2022, Stephen Rothwell wrote:
> > > include/linux/mm_types.h:272: warning: Function parameter or member '__filler' not described in 'folio'
> > > include/linux/mm_types.h:272: warning: Function parameter or member 'mlock_count' not described in 'folio'
> > 
> > Thank you for including the patches and reporting this, Stephen.
> > Is this a warning you can live with for a week or two?
> > 
> > I've never tried generating htmldocs (I'm tempted just to replace a few
> > "/**"s by "/*"s!), and I'm fairly sure Matthew will have strong feelings
> > about how this new union (or not) will be better foliated - me messing
> > around with doc output here is unlikely to be helpful at this moment.
> 
> I have a substantial question, and then some formatting / appearance
> questions.

I think that substantial question will soon need its own thread,
rather than here in this htmldocs thread.

But while we are here, let's include a link into our previous discussion:
https://lore.kernel.org/linux-mm/3c6097a7-df8c-f39c-36e8-8b5410e76c8a@google.com/

> 
> The first is, what does mlock_count represent for a multi-page folio
> that is partially mlocked?  If you allocate an order-9 page then mmap()
> 13 pages of it and mlockall(), does mlock_count increase by 1, 13 or 512?

You won't like the answer: none of the above; the current answer is 0.

Because before your folio work implementing readahead into compound pages,
we had order-0 pages (mapped by PTEs), and (speaking x86_64 language)
order-9 pages which (typically) get mapped by PMDs.

And Kirill attended to the issue of PTE mappings of huge pages by leaving
them out of the Mlocked accounting, and leaving huge pages mapped that way
on evictable LRUs, so that they could be split under memory pressure.

And I've carried that forward in the mm/munlock series - though I claim
to have simpified and improved it, by leaving PageDoubleMap out of it,
keeping PMD mappings as Mlocked even if there are also PTE mappings.

I think none of us have attended to PageCompound folio mappings changing
the balance of probabilities: they are being left out of the mlocked
accounting just like PTE mappings of THPs are.

If you'd like a number bigger than 0 there, I guess I can add a patch
to, what, not skip PTE mappings of compound pages if !PageSwapBacked?
Although it would be much nicer not to distinguish by PageSwapBacked,
I suspect testing and page reclaim issues require us to make the
distinction, for now at least.

Then the answer to your question would be mlock_count 13
(but Mlocked 2048 kB).

As I said in the linked mail: it doesn't matter at all how you count them
in mlock_count, just so long as they are counted up and down consistently.
Since it's simplest just to count 1 in mlock_count for each PMD or PTE,
I prefer that (as I did with THPs); but if you prefer to count PMDs up
and down by HUGE_PMD_NR, that works too.

mlock_count is just an internal implementation detail.

Mlocked and Unevictable amounts are visible to the user (and to various
tests no doubt) but still just numbers shown; more important is whether
a sparsely mlocked compound page can be split under memory pressure and
its unmlocked portions reclaimed.

I don't know where the folio work stands with splitting (but I think you
have a patch which removes the !PageSwapBacked splitting of huge pages
from vmscan.c - I've a separate mail to send you on that); but it looks
like a lot of huge_memory.c is still HPAGE_PMD_NR-based (I'd say rightly,
because PMD issues are traditional THP's main concern).

If we do move sparsely mlocked folios to the "Unevictable LRU", I'm
not sure how long we can get away with them being invisible to page
reclaim: it will probably need someone (I'm not volunteering) to write
a shrinker for them, along the lines of anon THP's deferred split list,
or shmem THP's unused-beyond-EOF split list.

> 
> Then we have a tradeoff between prettiness of the source code and
> prettiness of the htmldocs.  At one maximum, we can make it look like
> this in the htmldocs:
> 
> struct folio {
>   unsigned long flags;
>   struct list_head lru;
>   unsigned int mlock_count;
>   struct address_space *mapping;
>   pgoff_t index;
>   void *private;
>   atomic_t _mapcount;
>   atomic_t _refcount;
> #ifdef CONFIG_MEMCG;
>   unsigned long memcg_data;
> #endif;
> };
> 
> but at the cost of making the source code look like this:
> 
> struct folio {
>         /* private: don't document the anon union */
>         union {
>                 struct {
>         /* public: */
>                         unsigned long flags;
>         /* private: */
>                         union {
>         /* public: */
>                                 struct list_head lru;
>         /* private: */
>                                 struct {
>                                         void *__filler;
>         /* public: */
>                                         unsigned int mlock_count;
>         /* private: */
>                                 };
>                         };
>         /* public: */
>                         struct address_space *mapping;
> 
> At the other extreme, the htmldocs can look more complicated:
> 
> struct folio {
>   unsigned long flags;
>   union {
>     struct list_head lru;
>     struct {
>       unsigned int mlock_count;
>     };
>   };
>   struct address_space *mapping;
>   pgoff_t index;
>   void *private;
>   atomic_t _mapcount;
>   atomic_t _refcount;
> #ifdef CONFIG_MEMCG;
>   unsigned long memcg_data;
> #endif;
> };
> 
> with the source code changes being merely:
> 
> @@ -227,6 +227,7 @@ struct page {
>   * struct folio - Represents a contiguous set of bytes.
>   * @flags: Identical to the page flags.
>   * @lru: Least Recently Used list; tracks how recently this folio was used.
> + * @mlock_count: Number of times any page in this folio is mlocked.
>   * @mapping: The file this page belongs to, or refers to the anon_vma for
>   *    anonymous memory.
>   * @index: Offset within the file, in units of pages.  For anonymous memory,
> @@ -256,7 +257,9 @@ struct folio {
>                         union {
>                                 struct list_head lru;
>                                 struct {
> +       /* private: */
>                                         void *__filler;
> +       /* public: */
>                                         unsigned int mlock_count;
>                                 };
>                         };
> 
> Various steps in between are possible, such as removing the anonymous
> struct from the documentation, but leaving the union.  We could also
> choose to document __filler, but that seems like a poor choice to me.
> 
> Anyway, that's all arguable and not really too important.  My mild
> preference is for the private/public markers around __filler alone,
> but I'll happily abide by the preferences of others.

Same here: mild preference for this, but defer to others.

> 
> The important part is what mlock_count really means.  We can be
> reasonably verbose here (two or three lines of text, I'd suggest).

mlock_count aims to be the number of page table entries in VM_LOCKED
VMAs for this folio, but may undercount.

That's of course an over-simplification, skirting issues mentioned
above, and that it can only be relied on when PageLRU PageUnevictable;
but let's not get into those here.

But I wrote that "mlock_count aims" sentence above without seeing your
 * @mlock_count: Number of times any page in this folio is mlocked.

Yes, I think I prefer the brevity of what you wrote to mine.

Thanks,
Hugh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ