lists.openwall.net: Open Source and information security mailing list archives
 
Message-ID: <8736sxgei1.fsf@toke.dk>
Date:   Tue, 23 Oct 2018 12:10:30 +0200
From:   Toke Høiland-Jørgensen <toke@...e.dk>
To:     Saeed Mahameed <saeedm@...lanox.com>,
        "netdev\@vger.kernel.org" <netdev@...r.kernel.org>
Cc:     Eran Ben Elisha <eranbe@...lanox.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        "brouer\@redhat.com" <brouer@...hat.com>
Subject: Re: Kernel oops with mlx5 and dual XDP redirect programs

Saeed Mahameed <saeedm@...lanox.com> writes:

> On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
>> Saeed Mahameed <saeedm@...lanox.com> writes:
>> 
>> > I think that the mlx5 driver doesn't know how to tell the other
>> > device to stop transmitting to it while it is resetting. Maybe Tariq
>> > or Jesper know more about this?
>> > I will look at this tomorrow afternoon and will try to repro...
>> 
>> Hi Saeed
>> 
>> Did you have a chance to poke at this? :)
>
> Hi Toke, yes, I have been planning to respond, but I also wanted to
> dig deeper.
>
> So the root cause is very clear:
>
> 1. core 1 is doing tx_dev->ndo_xdp_xmit()
> 2. core 2 is doing tx_dev->xdp_set() //remove xdp program.

Right, it was also my guess that it was related to this interaction.
Thanks for looking into it!

> and the problem is beyond mlx5, since we don't have a way to tell a
> different core/different netdev to stop xmitting, or at least
> synchronize with it.

Hmm, ideally there should be some way for the higher level XDP API to
notice this and abort the call before it even reaches the driver on the
TX side, shouldn't there? At LPC, Jesper and I will be talking about a
proposal for decoupling the ndo_xdp_xmit() resource allocation from
loading and unloading XDP programs, which I guess could be a way to deal
with this as well.

In the meantime...

> I will be waiting for your confirmation that the fix did work.

I tested your patch, and it does indeed fix the crash. However, it also
seems to have the effect that the XDP redirect continues to function
even after removing the XDP program on the target device.

I.e., after the call to ./xdp_fwd -d $TX_IF, I still see packets being
redirected out $TX_IF. Is this intentional?

-Toke
