netdev - Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iK1eyXH_6ViU0sj=yH+3+WVtVdXaAVBUkkcA=0dCvTkOA@mail.gmail.com>
Date:   Mon, 13 Mar 2017 11:58:05 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Alexander Duyck <alexander.duyck@...il.com>
Subject: Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

On Mon, Mar 13, 2017 at 11:31 AM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Mon, Mar 13, 2017 at 10:50:28AM -0700, Eric Dumazet wrote:
>> On Mon, Mar 13, 2017 at 10:34 AM, Alexei Starovoitov
>> <alexei.starovoitov@...il.com> wrote:
>> > On Sun, Mar 12, 2017 at 05:58:47PM -0700, Eric Dumazet wrote:
>> >> @@ -767,10 +814,30 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>> >>                       case XDP_PASS:
>> >>                               break;
>> >>                       case XDP_TX:
>> >> +                             /* Make sure we have one page ready to replace this one */
>> >> +                             npage = NULL;
>> >> +                             if (!ring->page_cache.index) {
>> >> +                                     npage = mlx4_alloc_page(priv, ring,
>> >> +                                                             &ndma, numa_mem_id(),
>> >> +                                                             GFP_ATOMIC | __GFP_MEMALLOC);
>> >
>> > did you test this with xdp2 test ?
>> > under what conditions it allocates ?
>> > It looks dangerous from security point of view to do allocations here.
>> > Can it be exploited by an attacker?
>> > we use xdp for ddos and lb and this is fast path.
>> > If 1 out of 100s XDP_TX packets hit this allocation we will have serious
>> > perf regression.
>> > In general I dont think it's a good idea to penalize x86 in favor of powerpc.
>> > Can you #ifdef this new code somehow? so we won't have these concerns on x86?
>>
>> Normal paths would never hit this point really. I wanted to be extra
>> safe, because who knows, some guys could be tempted to set
>> ethtool -G ethX  rx 512 tx 8192
>>
>> Before this patch, if you were able to push enough frames in TX ring,
>> you would also eventually be forced to allocate memory, or drop frames...
>
> hmm. not following.
> Into xdp tx queues packets don't come from stack. It can only be via xdp_tx.
> So this rx page belongs to driver, not shared with anyone and it only needs to
> be put onto tx ring, so I don't understand why driver needs to allocating
> anything here. To refill the rx ring? but why here?

Because RX ring can be shared, by packets goind to the upper stacks (say TCP)

So there is no guarantee that the pages in the quarantine pool have
their page count to 1.

The normal TX_XDP path will recycle pages in ring->cache_page .

This is exactly where I pick up a replacement.

Pages in ring->cache_page have the guarantee to have no other users
than ourself (mlx4 driver)

You might have not noticed that current mlx4 driver has a lazy refill
of RX ring buffers, that eventually
removes all the pages from RX ring, and we have to recover with this
lazy mlx4_en_recover_from_oom() thing
that will attempt to restart the allocations.

After my patch, we have the guarantee that the RX ring buffer is
always fully populated.

When we receive a frame (XDP or not), we drop it if we can not
replenish the RX slot,
in case the oldest page in quarantine is not a recycling candidate and
we can not allocate a new page.



> rx 512 tx 8192 is meaningless from xdp pov, since most of the tx entries
> will be unused.

Yes, but as an author of a kernel patch, I do not want to be
responsible for a crash
_if_ someone tries something dumb like that.

> why are you saying it will cause this if (!ring->page_cache.index) to trigger?
>
>> This patch does not penalize x86, quite the contrary.
>> It brings a (small) improvement on x86, and a huge improvement on powerpc.
>
> for normal tcp stack. sure. I'm worried about xdp fast path that needs to be tested.

Note that TX_XDP in mlx4 is completely broken as I mentioned earlier.