Message-ID: <BYAPR15MB2278A68E25BE057956B7D46BC0840@BYAPR15MB2278.namprd15.prod.outlook.com>
Date:   Thu, 10 Jan 2019 22:41:24 +0000
From:   Pieter Noordhuis <pietern@...com>
To:     "saeedm@...lanox.com" <saeedm@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     Jes Sorensen <jsorensen@...com>, Alexei Starovoitov <ast@...com>,
        "John Reumann" <bold@...com>, Mark Marchukov <march@...com>,
        Yonghong Song <yhs@...com>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>
Subject: Sticky packet drops on mlx5 RX queue

I'm looking into an issue with mlx5 on 4.11.3. It is triggered by high memory pressure but continues long after the memory pressure is gone. The driver starts to continuously use pfmemalloc pages, some of which appear to be coming from an RX queue's page cache.

Attached is a log file showing a second-by-second diff of ethtool counters for a single RX queue that was showing this behavior. This log doesn't capture the start of these drops, because the ethtool monitoring is only started after the first drops are detected. Every increase of the “cache_waive” counter means mlx5 refused to add a page to its page cache because it was a pfmemalloc page. It also means the corresponding packet gets dropped in sk_filter_trim_cap.
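
For anyone who wants to collect a comparable log, here is a minimal sketch of the per-second diffing. It is not the script that produced the attachment. It assumes the usual "name: value" layout of `ethtool -S` output and assumes per-queue counter names of the form "rx25_cache_waive"; the "rx25_" prefix is a guess based on the attachment's file name.

```python
import re

# Matches lines such as "     rx25_cache_waive: 12" (assumed `ethtool -S` layout).
_STAT_LINE = re.compile(r"^\s*(\S+):\s*(\d+)\s*$")


def parse_ethtool_stats(text, prefix="rx25_"):
    """Return {counter_name: value} for counters whose name starts with prefix.

    The default prefix is an assumption, not taken from the attached log.
    """
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match and match.group(1).startswith(prefix):
            stats[match.group(1)] = int(match.group(2))
    return stats


def diff_stats(previous, current):
    """Return only the counters that changed, mapped to their increase."""
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
        if value != previous.get(name, 0)
    }
```

Feeding the text of two consecutive `ethtool -S <interface>` samples, taken one second apart, through parse_ethtool_stats and then diff_stats gives one line of such a log.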

Initially, the log shows the “cache_busy” counter increasing, meaning that the first page in the page cache has >1 references, so it can't be used. Then after roughly a minute, it switches to increasing the “cache_reuse” and “cache_waive” counters. This means that the pages are coming from the RX queue's page cache *and* are not put back because they are pfmemalloc pages. This is highly suspicious, as they shouldn't end up in the page cache in the first place. Then, after reusing 255 pages from the page cache, the “cache_empty” counter starts to increase, in lock step with the “cache_waive” counter. This means that the pages are allocated with dev_alloc_pages and not placed in the page cache, because they are pfmemalloc pages. This is also suspicious, because with the memory pressure gone, dev_alloc_pages shouldn't be returning pfmemalloc pages. By the time it stops incrementing “cache_waive”, a total of 3804 pages were waived (and packets were dropped), over a duration of 1895 seconds.
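
The sequence above can be reproduced with a toy model. This is only a sketch built from the counter meanings described in this message, not the mlx5 code: the class and function names are hypothetical, the capacity of 255 is an assumption taken from the number of reused pages, and the cache is seeded with pfmemalloc pages by hand because how they got there is exactly what is unexplained.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Page:
    pfmemalloc: bool = False
    refcount: int = 1


class PageCacheModel:
    """Toy RX page cache; counters follow the meanings given in the text."""

    def __init__(self, capacity=255):  # capacity is an assumption
        self.capacity = capacity
        self.pages = deque()
        self.stats = Counter()

    def get(self):
        if not self.pages:
            self.stats["cache_empty"] += 1
            return None
        if self.pages[0].refcount > 1:
            self.stats["cache_busy"] += 1
            return None
        self.stats["cache_reuse"] += 1
        return self.pages.popleft()

    def put(self, page):
        if len(self.pages) >= self.capacity:
            return False  # cache full; not part of the behavior discussed here
        if page.pfmemalloc:
            self.stats["cache_waive"] += 1
            return False
        self.pages.append(page)
        return True


def receive(cache, packets, alloc_pfmemalloc):
    """For each packet, take a page (cache or fresh), then try to recycle it."""
    for _ in range(packets):
        page = cache.get()
        if page is None:
            page = Page(pfmemalloc=alloc_pfmemalloc)
        cache.put(page)


cache = PageCacheModel()
# The unexplained starting state: pfmemalloc pages already in the cache.
cache.pages.extend(Page(pfmemalloc=True) for _ in range(255))
receive(cache, packets=300, alloc_pfmemalloc=True)
print(dict(cache.stats))
```

In this model the first 255 packets bump “cache_reuse” and “cache_waive” together, and the remaining 45 bump “cache_empty” and “cache_waive” together, which matches the shape of the log. It says nothing about why the real cache ends up in that state.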

What I would expect is for the “cache_reuse” and “cache_waive” counters to never be incremented in lock step, as pfmemalloc pages must never be added to the RX queue page cache to begin with. Similarly, I would expect “cache_empty” and “cache_waive” to never be incremented in lock step if there is no memory pressure.
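
These expectations can be checked mechanically against a per-second log. The sketch below assumes each interval is available as a dict of counter increases keyed by bare counter name (a hypothetical representation, not the format of the attachment). It flags intervals in which both counters of a pair went up, which is a coarser test than strict lock step, since one-second sampling cannot tie individual pages together.

```python
def co_increments(diffs, first, second):
    """Return indices of intervals in which both named counters increased."""
    return [
        index
        for index, diff in enumerate(diffs)
        if diff.get(first, 0) > 0 and diff.get(second, 0) > 0
    ]


# Hypothetical per-second diffs, made up for illustration only.
example = [
    {"cache_busy": 4},
    {"cache_reuse": 2, "cache_waive": 2},
    {"cache_empty": 3, "cache_waive": 3},
    {"cache_reuse": 5},
]
suspicious_reuse = co_increments(example, "cache_reuse", "cache_waive")
suspicious_empty = co_increments(example, "cache_empty", "cache_waive")
```

Any index returned for the reuse/waive pair is unexpected at all times; an index returned for the empty/waive pair is only unexpected once the memory pressure is gone.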

Static analysis of mlx5 on 4.11.3 has so far not led to any insights as to why this is happening. Any help in this investigation is much appreciated. If there is any additional information I can provide, please let me know.

Pieter
View attachment "rx25_ethtool.txt" of type "text/plain" (455900 bytes)
