Message-ID: <30b4e9a4-c0ad-8ac5-0bbb-f186f1caf7a7@huawei.com>
Date:   Tue, 6 Apr 2021 20:24:14 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Michal Kubecek <mkubecek@...e.cz>
CC:     Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Kosina <jikos@...nel.org>,
        Hillf Danton <hdanton@...a.com>,
        John Fastabend <john.fastabend@...il.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Kehuan Feng <kehuan.feng@...il.com>,
        "Jike Song" <albcamus@...il.com>,
        Michael Zhivich <mzhivich@...mai.com>,
        "David Miller" <davem@...emloft.net>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>, Josh Hunt <johunt@...mai.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>
Subject: Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

On 2021/4/6 15:31, Michal Kubecek wrote:
> On Tue, Apr 06, 2021 at 10:46:29AM +0800, Yunsheng Lin wrote:
>> On 2021/4/6 9:49, Cong Wang wrote:
>>> On Sat, Apr 3, 2021 at 5:23 AM Jiri Kosina <jikos@...nel.org> wrote:
>>>>
>>>> I am still planning to have Yunsheng Lin's (CCing) fix [1] tested in the
>>>> coming days. If it works, then we can consider proceeding with it,
>>>> otherwise I am all for reverting the whole NOLOCK stuff.
>>>>
>>>> [1] https://lore.kernel.org/linux-can/1616641991-14847-1-git-send-email-linyunsheng@huawei.com/T/#u
>>>
>>> I personally prefer to just revert that bit, as it brings more troubles
>>> than gains. Even with Yunsheng's patch, there are still some issues.
>>> Essentially, I think the core qdisc scheduling code is not ready for
>>> lockless, just look at those NOLOCK checks in sch_generic.c. :-/
>>
>> I am also aware of the NOLOCK checks :), and I am willing to
>> take care of them if that is possible.
>>
>> As the number of cores in a system keeps increasing, the trend is
>> to go lockless, right? Even when only one CPU is involved, taking
>> and releasing the spinlock costs about 30ns on our arm64 system
>> when CONFIG_PREEMPT_VOLUNTARY is enabled (IP forwarding test).
> 
> I agree with the benefits but currently the situation is that we have
> a race condition affecting the default qdisc which is being hit in
> production and can cause serious trouble which is made worse by commit
> 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host
> queues") preventing the retransmits of the stuck packet being sent.
> 
> Perhaps rather than patching over current implementation which requires
> more and more complicated hacks to work around the fact that we cannot
> make the "queue is empty" check and leaving the critical section atomic,
> it would make sense to reimplement it in a way which would allow us
> making it atomic.

Yes, reimplementing it is also an option.
But what if the reimplementation has the same problem because we did
not find the root cause of this one? I think it is better to find the
root cause first.
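
The window Michal describes, between the "queue is empty" check and
leaving the critical section, can be illustrated with a toy user-space
model. This is only a sketch under assumptions, not the sch_generic.c
code: struct toy_qdisc and every toy_* function are hypothetical names,
"busy" stands in for the qdisc seqlock, the two CPUs are replayed as one
fixed interleaving in a single thread, and toy_release_if_empty() is
simply assumed to be one indivisible step (making it so on real hardware
is the hard part).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel structures. */
struct toy_qdisc {
	bool busy;	/* stands in for the qdisc seqlock */
	int qlen;	/* packets waiting in the queue */
	int sent;	/* packets handed to the driver */
};

/* Become the owner unless another CPU already is. */
bool toy_trylock(struct toy_qdisc *q)
{
	if (q->busy)
		return false;
	q->busy = true;
	return true;
}

/* Dequeue and "send" everything that is queued right now. */
void toy_drain(struct toy_qdisc *q)
{
	q->sent += q->qlen;
	q->qlen = 0;
}

/* Assumed atomic: check for empty and release in one step. */
bool toy_release_if_empty(struct toy_qdisc *q)
{
	if (q->qlen != 0)
		return false;
	q->busy = false;
	return true;
}

/* Check and release are separate steps; B slips in between them. */
int toy_racy_stuck(void)
{
	struct toy_qdisc q = { false, 0, 0 };
	bool b_owner;

	toy_trylock(&q);		/* A: becomes the owner */
	toy_drain(&q);			/* A: queue looks empty now */
	q.qlen++;			/* B: enqueues one packet */
	b_owner = toy_trylock(&q);	/* B: fails, so B relies on A */
	assert(!b_owner);
	q.busy = false;			/* A: releases without rechecking */
	return q.qlen;			/* packets left with no owner */
}

/* Same traffic, but A leaves only through the combined step. */
int toy_atomic_stuck(bool b_enqueues_first)
{
	struct toy_qdisc q = { false, 0, 0 };

	toy_trylock(&q);		/* A: becomes the owner */
	toy_drain(&q);
	if (b_enqueues_first) {
		q.qlen++;		/* B: enqueues one packet */
		(void)toy_trylock(&q);	/* B: fails, A still owns */
	}
	while (!toy_release_if_empty(&q))
		toy_drain(&q);		/* A: rechecks before leaving */
	if (!b_enqueues_first) {
		q.qlen++;		/* B: enqueues after A left */
		if (toy_trylock(&q)) {	/* B: succeeds, becomes owner */
			while (!toy_release_if_empty(&q))
				toy_drain(&q);
		}
	}
	return q.qlen;			/* packets left with no owner */
}
```

In this model toy_racy_stuck() returns 1 (one packet is queued while
nobody owns the qdisc), and toy_atomic_stuck() returns 0 for both
orderings. The model does not say anything about the root cause in the
real code; it only shows the shape of the window being discussed.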

> 
> Michal
