Date:   Wed, 12 Jan 2022 11:38:23 +0000
From:   Mark Hemment <markhemm@...glemail.com>
To:     Charan Teja Kalla <quic_charante@...cinc.com>
Cc:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>, vbabka@...e.cz,
        rientjes@...gle.com, mhocko@...e.com,
        Suren Baghdasaryan <surenb@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Charan Teja Reddy <charante@...eaurora.org>
Subject: Re: [PATCH v3 RESEND] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED
 for shmem

On Mon, 10 Jan 2022 at 15:14, Charan Teja Kalla
<quic_charante@...cinc.com> wrote:
>
> Thanks again, Mark, for the review comments!
>
> On 1/10/2022 6:06 PM, Mark Hemment wrote:
> > On Thu, 6 Jan 2022 at 17:06, Charan Teja Reddy
> > <quic_charante@...cinc.com> wrote:
> >>
> >> From: Charan Teja Reddy <charante@...eaurora.org>
> >>
> >> Currently fadvise(2) is supported only for files that are not
> >> associated with noop_backing_dev_info; for files such as shmem,
> >> fadvise is therefore a no-op. However, file_operations->fadvise()
> >> lets file systems provide their own fadvise implementation. Use this
> >> hook to implement some of the POSIX_FADV_XXX functionality for shmem
> >> files.
> > ...
> >> +static void shmem_isolate_pages_range(struct address_space *mapping, loff_t start,
> >> +                               loff_t end, struct list_head *list)
> >> +{
> >> +       XA_STATE(xas, &mapping->i_pages, start);
> >> +       struct page *page;
> >> +
> >> +       rcu_read_lock();
> >> +       xas_for_each(&xas, page, end) {
> >> +               if (xas_retry(&xas, page))
> >> +                       continue;
> >> +               if (xa_is_value(page))
> >> +                       continue;
> >> +               if (!get_page_unless_zero(page))
> >> +                       continue;
> >> +               if (isolate_lru_page(page))
> >> +                       continue;
> >
> > Need to unwind the get_page on failure to isolate.
>
> Will be done.
>
> >
> > Should PageUnevictable() pages (SHM_LOCK) be skipped?
> > (That is, does SHM_LOCK override DONTNEED?)
>
>
> Should be skipped. Will be done.
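A userspace model of the two fixes agreed above, only to make the intended control flow concrete. Everything in it is hypothetical: `struct model_page`, `model_get_unless_zero()`, `model_isolate()` and `model_put()` stand in for struct page, get_page_unless_zero(), isolate_lru_page() and put_page(), and the model ignores locking and any reference that isolate_lru_page() itself may take. It shows only the shape: every early `continue` after the reference is taken must drop that reference, and unevictable (SHM_LOCKed) pages are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the fields this model needs. */
struct model_page {
	int refcount;     /* 0 means the page is already being freed */
	bool on_lru;      /* isolation succeeds only while on an LRU list */
	bool unevictable; /* models PageUnevictable(), e.g. SHM_LOCK */
	bool isolated;
};

/* Models get_page_unless_zero(): take a reference unless it is being freed. */
static bool model_get_unless_zero(struct model_page *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount++;
	return true;
}

static void model_put(struct model_page *p)
{
	p->refcount--;
}

/* Models isolate_lru_page(): 0 on success, nonzero on failure. */
static int model_isolate(struct model_page *p)
{
	if (!p->on_lru)
		return -1;
	p->on_lru = false;
	p->isolated = true;
	return 0;
}

/*
 * Isolate eligible pages in pages[0..n) and return how many were isolated.
 * Isolated pages keep the reference taken here; every other page ends up
 * with the refcount it started with.
 */
static size_t model_isolate_range(struct model_page *pages, size_t n)
{
	size_t isolated = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_page *p = &pages[i];

		if (!model_get_unless_zero(p))
			continue;
		if (p->unevictable) {   /* SHM_LOCK overrides DONTNEED */
			model_put(p);
			continue;
		}
		if (model_isolate(p)) { /* unwind the reference on failure */
			model_put(p);
			continue;
		}
		isolated++;
	}
	return isolated;
}
```

With four pages (one isolatable, one off-LRU, one SHM_LOCKed, one already at refcount zero), only the first is isolated and every other refcount returns to its starting value.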
>
> >
> > ...
> >> +static int shmem_fadvise_dontneed(struct address_space *mapping, loff_t start,
> >> +                               loff_t end)
> >> +{
> >> +       int ret;
> >> +       struct page *page;
> >> +       LIST_HEAD(list);
> >> +       struct writeback_control wbc = {
> >> +               .sync_mode = WB_SYNC_NONE,
> >> +               .nr_to_write = LONG_MAX,
> >> +               .range_start = 0,
> >> +               .range_end = LLONG_MAX,
> >> +               .for_reclaim = 1,
> >> +       };
> >> +
> >> +       if (!shmem_mapping(mapping))
> >> +               return -EINVAL;
> >> +
> >> +       if (!total_swap_pages)
> >> +               return 0;
> >> +
> >> +       lru_add_drain();
> >> +       shmem_isolate_pages_range(mapping, start, end, &list);
> >> +
> >> +       while (!list_empty(&list)) {
> >> +               page = lru_to_page(&list);
> >> +               list_del(&page->lru);
> >> +               if (page_mapped(page))
> >> +                       goto keep;
> >> +               if (!trylock_page(page))
> >> +                       goto keep;
> >> +               if (unlikely(PageTransHuge(page))) {
> >> +                       if (split_huge_page_to_list(page, &list))
> >> +                               goto keep;
> >> +               }
> >
> > I don't know the shmem code or the lifecycle of a shm page, so these
> > are genuine questions:
> > When the try-lock succeeds, should there be a test for PageWriteback()
> > (page skipped if true)?  Also, does page->mapping need to be tested
> > for NULL to prevent races with deletion from the page-cache?
>
> I missed both of these; I should have considered these conditions
> here. BTW, I am wondering why we shouldn't use the
> reclaim_pages(page_list) function here, with an extra set_page_dirty()
> on each isolated page? It just calls shrink_page_list(), where all
> these conditions are properly handled. What is your opinion?

It should be possible to use reclaim_pages() (I haven't looked closely).
It might actually be good to use this function, as it will do some
congestion throttling.  However, it will always try to unmap pages
(note: your page_mapped() test is 'unstable' as it is done without the
page locked), so it might give behaviour you want to avoid.
Note: reclaim_pages() is already used for madvise(PAGEOUT).  The shmem
code would need to prepare the page(s) to help shrink_page_list() make
progress (see madvise.c:madvise_cold_or_pageout_pte_range()).
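If the open-coded loop is kept rather than switching to reclaim_pages(), the per-page checks raised earlier in this thread could be ordered roughly as below. This is a userspace sketch: `struct page_state` and `classify_page()` are hypothetical, the booleans merely model trylock_page(), PageWriteback(), page->mapping and page_mapped(), and nothing here is real kernel API. The point is that page_mapped() is consulted only once the page lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the page state the loop inspects. */
struct page_state {
	bool locked;         /* models a successful trylock_page() */
	bool writeback;      /* models PageWriteback() */
	const void *mapping; /* models page->mapping; NULL once truncated */
	bool mapped;         /* models page_mapped() */
};

enum skip_reason {
	WRITE_OUT = 0, /* page may be handed to writepage */
	SKIP_NOT_LOCKED,
	SKIP_WRITEBACK,
	SKIP_TRUNCATED,
	SKIP_MAPPED,
};

/* Decide whether a page may be written out, checking under the lock. */
static enum skip_reason classify_page(const struct page_state *s)
{
	if (!s->locked)
		return SKIP_NOT_LOCKED;
	if (s->writeback)
		return SKIP_WRITEBACK; /* I/O already in flight */
	if (s->mapping == NULL)
		return SKIP_TRUNCATED; /* raced with page-cache deletion */
	if (s->mapped)
		return SKIP_MAPPED;    /* stable only while the lock is held */
	return WRITE_OUT;
}
```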

Taking a step back: is fadvise(DONTNEED) really needed/wanted?  Yes,
you gave a use case (which I cut from this thread in my earlier reply),
but I'm not familiar enough with the various shmem uses to know whether
this feature is needed.  Someone else will need to answer this.
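For anyone weighing the use case, this is roughly what the caller's side looks like. A minimal userspace sketch, assuming Linux with glibc 2.27 or later for memfd_create(); `fill_and_drop()` and the "fadvise-demo" name are illustrative only. On a kernel without this patch the call returns 0 but the advice is ignored for shmem files, since they use noop_backing_dev_info; the patch proposes to act on it instead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a shmem-backed file, fill it with len bytes, then advise the
 * kernel that the whole file is no longer needed. Returns the
 * posix_fadvise() result (0 or an error number), or -1 if setup failed.
 */
static int fill_and_drop(size_t len)
{
	char buf[4096];
	int fd = memfd_create("fadvise-demo", 0);

	if (fd < 0)
		return -1;

	memset(buf, 0xaa, sizeof(buf));
	for (size_t done = 0; done < len; ) {
		size_t chunk = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n <= 0) {
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}

	/* offset 0 with len 0 means "from the start to the end of the file" */
	int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return ret;
}
```

Note that posix_fadvise() reports failure through its return value rather than errno, which is why the result is passed straight back.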

Cheers,
Mark

>
> >
> > ...
> >> +
> >> +               clear_page_dirty_for_io(page);
> >> +               SetPageReclaim(page);
> >> +               ret = shmem_writepage(page, &wbc);
> >> +               if (ret || PageWriteback(page)) {
> >> +                       if (ret)
> >> +                               unlock_page(page);
> >> +                       goto keep;
> >> +               }
> >> +
> >> +               if (!PageWriteback(page))
> >> +                       ClearPageReclaim(page);
> >> +
> >> +               /*
> >> +                * shmem_writepage() places the page in the swapcache.
> >> +                * Delete the page from the swapcache and release the
> >> +                * page.
> >> +                */
> >> +               __mod_node_page_state(page_pgdat(page),
> >> +                               NR_ISOLATED_ANON + page_is_file_lru(page), compound_nr(page));
> >> +               lock_page(page);
> >> +               delete_from_swap_cache(page);
> >> +               unlock_page(page);
> >> +               put_page(page);
> >> +               continue;
> >> +keep:
> >> +               putback_lru_page(page);
> >> +               __mod_node_page_state(page_pgdat(page),
> >> +                               NR_ISOLATED_ANON + page_is_file_lru(page), compound_nr(page));
> >> +       }
> >
> > putback_lru_page() drops the last reference this code holds on
> > 'page'.  Is it safe to use 'page' after dropping this reference?
>
> True. Will correct it in the next revision.
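The agreed correction amounts to reordering the `keep:` path: do the NR_ISOLATED_* accounting while the reference is still held, and call putback_lru_page() last. A userspace model of that ordering follows; `struct iso_page`, `model_putback()` and the `nr_isolated[]` array are hypothetical stand-ins for struct page, putback_lru_page() and a per-node counter assumed to have been raised at isolation time. The model frees the page when its last reference is dropped, so swapping the two statements in `model_keep()` would be a genuine use-after-free.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_NODES 2

/* Hypothetical per-node count of isolated pages, raised at isolation. */
static long nr_isolated[MODEL_NR_NODES];

struct iso_page {
	int refcount;
	int node;     /* stands in for page_pgdat(page) */
	int nr_pages; /* stands in for compound_nr(page) */
};

/* Drops this code's reference; the page is freed when it reaches zero. */
static void model_putback(struct iso_page *p)
{
	if (--p->refcount == 0)
		free(p);
}

/*
 * "keep:" path with the safe ordering: undo the isolation accounting
 * while the reference is still held, then give the reference away.
 */
static void model_keep(struct iso_page *p)
{
	nr_isolated[p->node] -= p->nr_pages;
	model_putback(p); /* p must not be touched after this */
}
```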
>
> >
> > Cheers,
> > Mark
> >
