Message-ID: <20190618093431.GA2577@redhat.com>
Date:   Tue, 18 Jun 2019 11:34:31 +0200
From:   Stanislaw Gruszka <sgruszka@...hat.com>
To:     Soeren Moch <smoch@....de>
Cc:     Helmut Schaa <helmut.schaa@...glemail.com>,
        Kalle Valo <kvalo@...eaurora.org>,
        "David S. Miller" <davem@...emloft.net>,
        linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] rt2x00: fix rx queue hang

Hi

On Mon, Jun 17, 2019 at 11:46:56AM +0200, Soeren Moch wrote:
> Since commit ed194d136769 ("usb: core: remove local_irq_save() around
>  ->complete() handler") the handlers rt2x00usb_interrupt_rxdone() and
> rt2x00usb_interrupt_txdone() are not running with interrupts disabled
> anymore. So these handlers are not guaranteed to run completely before
> workqueue processing starts. So only mark entries ready for workqueue
> processing after proper accounting in the dma done queue.

On SMP machines it has always been the case that
rt2x00usb_interrupt_{tx,rx}done can run concurrently with
rt2x00_work_{rx,tx}done, so I do not understand how removing
local_irq_save() around the complete handler broke things.

Have you reverted commit ed194d136769, and does the revert solve the problem?

Between 4.19 and 4.20 we had some quite big changes in the rt2x00 driver:

0240564430c0 rt2800: flush and txstatus rework for rt2800mmio
adf26a356f13 rt2x00: use different txstatus timeouts when flushing
5022efb50f62 rt2x00: do not check for txstatus timeout every time on tasklet
0b0d556e0ebb rt2800mmio: use txdone/txstatus routines from lib
5c656c71b1bf rt2800: move usb specific txdone/txstatus routines to rt2800lib

so I'm a bit afraid that one of those changes is the real cause of
the issue, not ed194d136769.

> Note that rt2x00usb_work_rxdone() processes all available entries, not
> only such for which queue_work() was called.
> 
> This fixes a regression on a RT5370 based wifi stick in AP mode, which
> suddenly stopped data transmission after some period of heavy load. Also
> stopping the hanging hostapd resulted in the error message "ieee80211
> phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
> Other operation modes are probably affected as well, this just was
> the used testcase.

Do you know what actually makes the traffic stop:
a TX queue hang or an RX queue hang?

> diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> index 1b08b01db27b..9c102a501ee6 100644
> --- a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> @@ -263,9 +263,9 @@ EXPORT_SYMBOL_GPL(rt2x00lib_dmastart);
> 
>  void rt2x00lib_dmadone(struct queue_entry *entry)
>  {
> -	set_bit(ENTRY_DATA_STATUS_PENDING, &entry->flags);
>  	clear_bit(ENTRY_OWNER_DEVICE_DATA, &entry->flags);
>  	rt2x00queue_index_inc(entry, Q_INDEX_DMA_DONE);
> +	set_bit(ENTRY_DATA_STATUS_PENDING, &entry->flags);

Unfortunately I do not understand how this is supposed to fix the problem.
Could you elaborate on this change?
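
To make the question concrete, here is a small user-space model of the two
orderings in the hunk above. It is only a sketch, not driver code: the
struct, the flag values, the step enum and the worker check (entry not owned
by the device and status pending) are hypothetical stand-ins chosen for
illustration. It replays the three statements of rt2x00lib_dmadone() one at
a time and asks, after each one, whether a worker running concurrently would
accept the entry before the DMA-done index has been advanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the entry flag bits; not the kernel values. */
enum { OWNER_DEVICE_DATA = 1u << 0, STATUS_PENDING = 1u << 1 };

struct model_entry {
	unsigned int flags;
	unsigned int dma_done_idx;	/* models Q_INDEX_DMA_DONE */
};

/* The three statements of the dmadone path, as separate steps. */
enum step { STEP_SET_PENDING, STEP_CLEAR_OWNER, STEP_INDEX_INC };

static void run_step(struct model_entry *e, enum step s)
{
	switch (s) {
	case STEP_SET_PENDING:
		e->flags |= STATUS_PENDING;
		break;
	case STEP_CLEAR_OWNER:
		e->flags &= ~OWNER_DEVICE_DATA;
		break;
	case STEP_INDEX_INC:
		e->dma_done_idx++;
		break;
	}
}

/* Assumed worker check: entry is not device-owned and is marked pending. */
static bool worker_would_process(const struct model_entry *e)
{
	return !(e->flags & OWNER_DEVICE_DATA) && (e->flags & STATUS_PENDING);
}

/*
 * Run the steps in the given order and look at the entry after each one,
 * as a concurrent worker could. Returns true if there is a point where the
 * worker would accept the entry while the index is still not advanced.
 */
static bool has_early_window(const enum step order[3])
{
	struct model_entry e = { .flags = OWNER_DEVICE_DATA, .dma_done_idx = 0 };
	bool window = false;

	for (int i = 0; i < 3; i++) {
		run_step(&e, order[i]);
		if (worker_would_process(&e) && e.dma_done_idx == 0)
			window = true;
	}
	assert(e.dma_done_idx == 1);	/* every order ends with the index advanced */
	return window;
}

/* Statement order before the patch and after the patch. */
static const enum step old_order[3] = {
	STEP_SET_PENDING, STEP_CLEAR_OWNER, STEP_INDEX_INC
};
static const enum step new_order[3] = {
	STEP_CLEAR_OWNER, STEP_INDEX_INC, STEP_SET_PENDING
};
```

In this model has_early_window(old_order) is true and
has_early_window(new_order) is false. That only shows the window exists
under the assumed worker check; it does not show that this window is what
hangs the queue, nor why it would matter only after ed194d136769.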

Stanislaw
