lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZNY4bz1450enHxlG@qmqm.qmqm.pl>
Date:   Fri, 11 Aug 2023 15:32:31 +0200
From:   Michał Mirosław <mirq-linux@...e.qmqm.pl>
To:     Muhammad Usama Anjum <usama.anjum@...labora.com>
Cc:     Michał Mirosław <emmir@...gle.com>,
        Peter Xu <peterx@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrei Vagin <avagin@...il.com>,
        Danylo Mocherniuk <mdanylo@...gle.com>,
        Paul Gofman <pgofman@...eweavers.com>,
        Cyrill Gorcunov <gorcunov@...il.com>,
        Mike Rapoport <rppt@...nel.org>, Nadav Amit <namit@...are.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Shuah Khan <shuah@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Yun Zhou <yun.zhou@...driver.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Alex Sierra <alex.sierra@....com>,
        Matthew Wilcox <willy@...radead.org>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        "Gustavo A . R . Silva" <gustavoars@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-kselftest@...r.kernel.org,
        Greg KH <gregkh@...uxfoundation.org>, kernel@...labora.com
Subject: Re: [PATCH v28 2/6] fs/proc/task_mmu: Implement IOCTL to get and
 optionally clear info about PTEs

On Fri, Aug 11, 2023 at 05:02:44PM +0500, Muhammad Usama Anjum wrote:
> On 8/10/23 10:32 PM, Michał Mirosław wrote:
> > On Wed, 9 Aug 2023 at 08:16, Muhammad Usama Anjum
> > <usama.anjum@...labora.com> wrote:
[...]
> >> --- a/fs/proc/task_mmu.c
> >> +++ b/fs/proc/task_mmu.c
> > [...]
> >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> +static unsigned long pagemap_thp_category(pmd_t pmd)
> >> +{
> >> +       unsigned long categories = PAGE_IS_HUGE;
> >> +
> >> +       if (pmd_present(pmd)) {
> >> +               categories |= PAGE_IS_PRESENT;
> >> +               if (!pmd_uffd_wp(pmd))
> >> +                       categories |= PAGE_IS_WRITTEN;
> >> +               if (is_zero_pfn(pmd_pfn(pmd)))
> >> +                       categories |= PAGE_IS_PFNZERO;
> >> +       } else if (is_swap_pmd(pmd)) {
> >> +               categories |= PAGE_IS_SWAPPED;
> >> +               if (!pmd_swp_uffd_wp(pmd))
> >> +                       categories |= PAGE_IS_WRITTEN;
> >> +       }
> >> +
> >> +       return categories;
> >> +}
> > I guess THPs can't be file-backed currently, but can we somehow mark
> > this assumption so it can be easily found if the capability arrives?
> Yeah, THPs cannot be file backed. Lets not care for some feature which may
> not arrive in several years or eternity.

Yes, it might not arrive. But please add at least a comment, so that it
is clearly visible that lack if PAGE_IS_FILE here is intentional.

> >> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >> +
> >> +#ifdef CONFIG_HUGETLB_PAGE
> >> +static unsigned long pagemap_hugetlb_category(pte_t pte)
> >> +{
> >> +       unsigned long categories = PAGE_IS_HUGE;
> >> +
> >> +       if (pte_present(pte)) {
> >> +               categories |= PAGE_IS_PRESENT;
> >> +               if (!huge_pte_uffd_wp(pte))
> >> +                       categories |= PAGE_IS_WRITTEN;
> >> +               if (!PageAnon(pte_page(pte)))
> >> +                       categories |= PAGE_IS_FILE;
> >> +               if (is_zero_pfn(pte_pfn(pte)))
> >> +                       categories |= PAGE_IS_PFNZERO;
> >> +       } else if (is_swap_pte(pte)) {
> >> +               categories |= PAGE_IS_SWAPPED;
> >> +               if (!pte_swp_uffd_wp_any(pte))
> >> +                       categories |= PAGE_IS_WRITTEN;
> >> +       }
> > 
> > BTW, can a HugeTLB page be file-backed and swapped out?
> Accourding to pagemap_hugetlb_range(), file-backed HugeTLB page cannot be
> swapped.

Here too a comment that leaving out this case is intentional would be useful.

> > [...]
> >> +       walk_start = p.arg.start;
> >> +       for (; walk_start < p.arg.end; walk_start = p.arg.walk_end) {
[...[
> >> +               ret = mmap_read_lock_killable(mm);
> >> +               if (ret)
> >> +                       break;
> >> +               ret = walk_page_range(mm, walk_start, p.arg.end,
> >> +                                     &pagemap_scan_ops, &p);
> >> +               mmap_read_unlock(mm);
[...]
> >> +               if (ret != -ENOSPC || p.arg.vec_len - 1 == 0 ||
> >> +                   p.found_pages == p.arg.max_pages)
> >> +                       break;
> > 
> > The second condition is equivalent to `p.arg.vec_len == 1`, but why is
> > that an ending condition? Isn't the last entry enough to gather one
> > more range? (The walk could have returned -ENOSPC due to buffer full
> > and after flushing it could continue with the last free entry.)
> Now we are walking the entire range walk_page_range(). We don't break loop
> when we get -ENOSPC as this error may only mean that the temporary buffer
> is full. So we need check if max pages have been found or output buffer is
> full or ret is 0 or any other error. When p.arg.vec_len = 1 is end
> condition as the last entry is in cur. As we have walked over the entire
> range, cur must be full after which the walk returned.
> 
> So current condition is necessary. I've double checked it. I'll change it
> to `p.arg.vec_len == 1`.

If we have walked the whole range, then the loop will end anyway due to
`walk_start < walk_end` not held in the `for()`'s condition.

[...]
> >> +/*
> >> + * struct pm_scan_arg - Pagemap ioctl argument
> >> + * @size:              Size of the structure
> >> + * @flags:             Flags for the IOCTL
> >> + * @start:             Starting address of the region
> >> + * @end:               Ending address of the region
> >> + * @walk_end           Address where the scan stopped (written by kernel).
> >> + *                     walk_end == end informs that the scan completed on entire range.
> > 
> > Can we ensure this holds also for the tagged pointers?
> No, we cannot.

So this need explanation in the comment here. (Though I'd still like to
know how the address tags are supposed to be used from someone that
knows them.)

Best Regards
Michał Mirosław

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ