Date:   Thu, 19 Dec 2019 12:53:38 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Yunsheng Lin <linyunsheng@...wei.com>
Cc:     Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "brouer@...hat.com" <brouer@...hat.com>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
        Li Rongqing <lirongqing@...du.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        peterz@...radead.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        bhelgaas@...gle.com,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE
 condition

On Thu 19-12-19 10:09:33, Yunsheng Lin wrote:
[...]
> > There is no real guarantee that NUMA_NO_NODE is going to imply local
> > node and we do not want to grow any subtle dependency on that behavior.
> 
> Strictly speaking, using numa_mem_id() also does not really guarantee that
> it will allocate local memory when local memory is used up, right?
> Because alloc_pages_node() basically turns the node into numa_mem_id()
> when it is NUMA_NO_NODE.

Yes, both allocations are allowed to fall back to other nodes unless an
explicit nodemask is specified.
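
For reference, this is roughly what the allocator helper does with
NUMA_NO_NODE. This is only a simplified sketch of the include/linux/gfp.h
helper, so details may differ between kernel versions:

	/* Simplified sketch; sanity checks omitted. */
	static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
						    unsigned int order)
	{
		if (nid == NUMA_NO_NODE)
			nid = numa_mem_id();	/* implicit local preference */

		/*
		 * Without an explicit nodemask this is only a preference;
		 * the page allocator may still fall back to other nodes
		 * under memory pressure.
		 */
		return __alloc_pages_node(nid, gfp_mask, order);
	}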

> Unless we disallow passing NUMA_NO_NODE to alloc_pages_node(), I cannot
> see a difference between NUMA_NO_NODE and numa_mem_id().

The difference is in the expressed intention. NUMA_NO_NODE means no node
preference. We turn it into an implicit local node preference because that
is the best assumption we can make in general. If you provide numa_mem_id()
then you explicitly ask for a local node preference because you know that
it is the best choice for your specific code. See the difference?

The NUMA_NO_NODE -> local node mapping is a heuristic which might change
(albeit that is unlikely).
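
To make the distinction concrete, a hypothetical driver snippet (not from
the patch under discussion):

	/* "No preference, let the allocator decide": */
	page = alloc_pages_node(NUMA_NO_NODE, gfp, order);

	/* "I explicitly want memory local to the CPU I am running on": */
	page = alloc_pages_node(numa_mem_id(), gfp, order);

Today both end up preferring the same node, but only the second one
documents why.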
 
> >> And for those drivers, locality is decided by the rx interrupt affinity,
> >> not dev_to_node(). So when the rx interrupt affinity changes, old pages
> >> from the old node will not be recycled (the page_to_nid(page) ==
> >> numa_mem_id() check fails), new pages will be allocated to replace them,
> >> and the new pages will be recycled because allocation and recycling happen
> >> in the same context, which means numa_mem_id() returns the node of the
> >> newly allocated page, see [2].
> > 
> > Well, but my understanding is that the generic page pool implementation
> > has a clear means to change the affinity (namely page_pool_update_nid()).
> > So my primary question is, why should NUMA_NO_NODE be used as a bypass
> > for that?
> 
> In that case, page_pool_update_nid() needs to be called explicitly, which
> may not be the reality: of the drivers using the page pool now, mlx5 seems
> to be the only one that calls page_pool_update_nid(), which may lead to
> copy & paste problems when people are not careful enough.

The API is quite new AFAIU and I think it would be better to use it in
the intended way. Relying on implicit and undocumented behavior is just
going to bend that API in the future and it will impose an additional
burden on any future changes.
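
To illustrate what I mean by the intended way, here is a minimal sketch of
a driver updating the pool from its NAPI poll context. The driver names are
made up and this only loosely follows what mlx5 does:

	/* Hypothetical rx path: keep the pool's preferred node in sync with
	 * where rx processing actually runs, so that recycling (which checks
	 * the page's node against the pool's preferred node) keeps working
	 * after the irq affinity moves.
	 */
	static int foo_napi_poll(struct napi_struct *napi, int budget)
	{
		struct foo_rq *rq = container_of(napi, struct foo_rq, napi);

		if (rq->page_pool->p.nid != numa_mem_id())
			page_pool_update_nid(rq->page_pool, numa_mem_id());

		return foo_poll_rx(rq, budget);
	}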
-- 
Michal Hocko
SUSE Labs
