Date:   Sun, 13 Jan 2019 10:19:12 +0000
From:   Tariq Toukan <tariqt@...lanox.com>
To:     Pieter Noordhuis <pietern@...com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     Jes Sorensen <jsorensen@...com>, Alexei Starovoitov <ast@...com>,
        John Reumann <bold@...com>, Mark Marchukov <march@...com>,
        Yonghong Song <yhs@...com>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>
Subject: Re: Sticky packet drops on mlx5 RX queue

Hi Pieter,


On 1/11/2019 12:41 AM, Pieter Noordhuis wrote:
> I'm looking into an issue with mlx5 on 4.11.3. It is triggered by high memory pressure but continues for long after the memory pressure is gone. It starts to continuously use pfmemalloc pages, some of which appear to be coming from an RX queue's page cache.
> 
> Attached is a log file showing a second-by-second diff of ethtool counters for a single RX queue that was showing this behavior. This log doesn't capture the start of these drops, because the ethtool monitoring is only started after the first drops are detected. Every increase of the “cache_waive” counter means mlx5 refused to add a page to its page cache because it was a pfmemalloc page. It also means the corresponding packet gets dropped in sk_filter_trim_cap.
> 
> Initially, the log shows the “cache_busy” counter increasing, meaning that the first page in the page cache has >1 references, so can't be used.

Right, this is head-of-queue blocking, so new pages are allocated instead.

> Then after roughly a minute, it switches to increasing the “cache_reuse” and “cache_waive” counters. This means that the pages are coming from the RX queue's page cache *and* are not put back because they are pfmemalloc pages.

This means the head of the queue is released, and pages are popped from
the queue but fail to get re-pushed, due to the mlx5e_page_is_reserved()
check. So the cache eventually becomes empty.
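
To make that sequence concrete, here is a minimal user-space sketch of
the cache behavior as described in this thread. It is not the driver
code: the class and field names (PageCacheModel, Page, local_nid) are
made up for illustration, and the change of local_nid in the scenario
is only a hypothetical trigger that makes already-cached pages fail
the reserved check.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Page:
    """Stand-in for a page; all fields are illustrative assumptions."""
    refcount: int = 1
    pfmemalloc: bool = False
    nid: int = 0


class PageCacheModel:
    """Simplified FIFO model of a per-RX-queue page cache."""

    def __init__(self, capacity, local_nid=0):
        self.capacity = capacity
        self.local_nid = local_nid
        self.queue = deque()
        self.stats = Counter()

    def is_reserved(self, page):
        # Mirrors the two conditions discussed in this thread:
        # [1] pfmemalloc page, [2] page not on the local NUMA node.
        return page.pfmemalloc or page.nid != self.local_nid

    def get(self):
        if not self.queue:
            self.stats["cache_empty"] += 1
            return None
        if self.queue[0].refcount != 1:
            # Head-of-queue blocking: only the head is ever inspected.
            self.stats["cache_busy"] += 1
            return None
        self.stats["cache_reuse"] += 1
        return self.queue.popleft()

    def put(self, page):
        if len(self.queue) >= self.capacity:
            self.stats["cache_full"] += 1
            return False
        if self.is_reserved(page):
            self.stats["cache_waive"] += 1
            return False
        self.queue.append(page)
        return True


def run_scenario():
    cache = PageCacheModel(capacity=4, local_nid=0)
    pages = [Page(nid=0) for _ in range(3)]
    for page in pages:
        cache.put(page)

    # Phase 1: the head page is still referenced elsewhere.
    pages[0].refcount = 2
    cache.get()

    # Phase 2: the head is released, but the cached pages now count as
    # reserved (hypothetical trigger: the local node changed).
    pages[0].refcount = 1
    cache.local_nid = 1
    page = cache.get()
    while page is not None:
        cache.put(page)
        page = cache.get()

    # Phase 3: cache is empty; freshly allocated pages are waived too.
    for _ in range(2):
        cache.get()
        cache.put(Page(nid=0))
    return cache.stats
```

In this model the counters move in the same order as in the log:
first cache_busy, then cache_reuse together with cache_waive, and
finally cache_empty together with cache_waive.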

> This is highly suspicious, as they shouldn't end up in the page cache in the first place. Then, after reusing 255 pages from the page cache, the “cache_empty” counter starts to increase, in lock step with the “cache_waive” counter. This means that the pages are allocated with dev_alloc_pages and not placed in the page cache, because they are pfmemalloc pages. This is also suspicious, because with the memory pressure gone, dev_alloc_pages shouldn't be returning pfmemalloc pages.

Notice that the mlx5e_page_is_reserved() combines two conditions:
[1] page_is_pfmemalloc(page)
[2] page_to_nid(page) != numa_mem_id();

In your case [2] could hold. Can you repro and check that?

> By the time it stops incrementing “cache_waive”, a total of 3804 pages were waived (and packets were dropped), over a duration of 1895 seconds.
> 
> What I would expect to happen is the “cache_reuse” and “cache_waive” to never be incremented in lock step, as pfmemalloc pages must never be added to the RX queue page cache to begin with. Similarly, I would expect “cache_empty” and “cache_waive” to never be incremented in lock step if there is no memory pressure.
> 
> Static analysis of mlx5 on 4.11.3 has so far not led to any insights as to why this is happening. Any help in this investigation is much appreciated. If there is any additional information I can provide please let me know.

Please try to identify which specific condition makes
mlx5e_page_is_reserved() return true; you might need to hook/modify
the driver.
If we see that [2] holds, then it would explain the behavior.
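
As a rough illustration of what that instrumentation would report,
here is a small user-space sketch that splits the single "reserved"
verdict into per-condition tallies. The function names and the sample
format are hypothetical; in the driver, the equivalent would be
separate counters or a debug print at the point where cache_waive is
incremented.

```python
from collections import Counter


def reserved_reasons(is_pfmemalloc, page_nid, local_nid):
    """Return which of the two conditions hold for one page."""
    reasons = []
    if is_pfmemalloc:
        reasons.append("pfmemalloc")       # condition [1]
    if page_nid != local_nid:
        reasons.append("remote_node")      # condition [2]
    return reasons


def tally_reasons(samples, local_nid):
    """Count reasons over (is_pfmemalloc, page_nid) samples."""
    counts = Counter()
    for is_pfmemalloc, page_nid in samples:
        for reason in reserved_reasons(is_pfmemalloc, page_nid, local_nid):
            counts[reason] += 1
    return counts
```

For example, tally_reasons([(False, 1), (False, 1), (True, 0)],
local_nid=0) counts two pages under "remote_node" and one under
"pfmemalloc". A waive count dominated by "remote_node" would point at
[2] rather than at pfmemalloc pages.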

> 
> Pieter
> 

Regards,
Tariq
