lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 27 Jan 2024 03:13:45 -0800
From: Adam Goldman <adamg@...ox.com>
To: Takashi Sakamoto <o-takashi@...amocchi.jp>
Cc: linux1394-devel@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] firewire: core: send bus reset promptly on gap count
 error

Hi,

On Sat, Jan 27, 2024 at 05:37:30PM +0900, Takashi Sakamoto wrote:
> On Tue, Jan 23, 2024 at 12:11:40AM -0800, Adam Goldman wrote:
> > If we are bus manager and the bus has inconsistent gap counts, send a 
> > bus reset immediately instead of trying to read the root node's config 
> > ROM first. Otherwise, we could spend a lot of time trying to read the 
> > config ROM but never succeeding.
> > 
> > This eliminates a 50+ second delay before the FireWire bus is usable 
> > after a newly connected device is powered on in certain circumstances.
> 
> At first, would I request you to explain about the certain
> circumstances in the patch comment? It is really helpful to understand
> the change itself.

The delay occurs if a gap count inconsistency occurs, we are not the 
root node, and we become bus manager. One scenario that causes this is 
with a TI XIO2213B OHCI, the first time a Sony DSR-25 is powered on 
after being connected to the FireWire cable.

> > Link: https://sourceforge.net/p/linux1394/mailman/message/58727806/

This link has a longer description and kernel logs.


> > --- linux-source-6.1.orig/drivers/firewire/core-card.c	2023-09-23 02:11:13.000000000 -0700
> > +++ linux-source-6.1/drivers/firewire/core-card.c	2024-01-22 04:23:06.000000000 -0800
> > @@ -435,6 +435,16 @@
> >  		 * config rom.  In either case, pick another root.
> >  		 */
> >  		new_root_id = local_id;
> > +	} else if (card->gap_count == 0) {
> > +		/* 
> > +		 * If self IDs have inconsistent gap counts, do a
> > +		 * bus reset ASAP. The config rom read might never
> > +		 * complete, so don't wait for it. However, still
> > +		 * send a PHY configuration packet prior to the bus
> > +		 * reset, as permitted by IEEE 1394-2008 8.4.5.2.
> > +		 */
> > +		new_root_id = local_id;
> > +		card->bm_retries = 0;
> >  	} else if (!root_device_is_running) {
> >  		/*
> > 		 * If we haven't probed this device yet, bail out now
> 
> Next, after the condition branches, we can see below lines:
> 
> ```
> 	/*
> 	 * Finally, figure out if we should do a reset or not.  If we have
> 	 * done less than 5 resets with the same physical topology and we
> 	 * have either a new root or a new gap count setting, let's do it.
> 	 */
> 
> 	if (card->bm_retries++ < 5 &&
> 	    (card->gap_count != gap_count || new_root_id != root_id))
> 		do_reset = true;
> ```
> 
> When the value of "card->gap_count" is zero, it would hit the condition of
> "card->gap_count != gap_count". I think the transmission of phy config
> packet and scheduling of short bus reset would be done, regardless of the
> change. Would I ask the main intention to the additional branch?

Without the additional branch, the !root_device_is_running branch will 
be taken (because the root node's config ROM hasn't been read yet), and 
bm_work will go to sleep. Eventually we will give up trying to read the 
config ROM, the root_device==NULL branch will be taken, and the bus 
reset will be done. The additional branch eliminates waiting for the 
config ROM read when gap_count is zero.

Here is the full sequence of events:

1. Bus reset occurs due to newly active device.

2. Self identification process completes. We are not root node. Gap 
counts are inconsistent.

3. build_tree() notices the gap count error and sets gap_count to 0:

> 		/*
> 		 * If PHYs report different gap counts, set an invalid count
> 		 * which will force a gap count reconfiguration and a reset.
> 		 */
> 		if (SELF_ID_GAP_COUNT(q) != gap_count)
> 			gap_count = 0;

4. bm_work() starts and makes us bus manager:

> 		rcode = fw_run_transaction(card, TCODE_LOCK_COMPARE_SWAP,
> 				irm_id, generation, SCODE_100,
> 				CSR_REGISTER_BASE + CSR_BUS_MANAGER_ID,
> 				transaction_data, 8);

5. read_config_rom() starts reading the root node's config ROM.

6. bm_work() wants to know if the root node is cycle master capable. If 
the root node is cycle master capable, we will leave it as the root 
node. Otherwise, we will make ourself the root node. To determine if the 
root node is cycle master capable, we must wait until its config ROM has 
been read:

> 	} else if (!root_device_is_running) {
> 		/*
> 		 * If we haven't probed this device yet, bail out now
> 		 * and let's try again once that's done.
> 		 */
> 		spin_unlock_irq(&card->lock);
> 		goto out;

7a. Without the patch: read_config_rom() reads the first few quadlets 
from the config ROM. Due to the gap count inconsistency, eventually one 
of the reads times out. When read_config_rom() fails, fw_device_init() 
calls it again until MAX_RETRIES is reached. This takes 50+ seconds.

8a. bm_work() sees that we have given up trying to read the config ROM. 
It makes us the root node and does a bus reset:

> 	if (root_device == NULL) {
> 		/*
> 		 * Either link_on is false, or we failed to read the
> 		 * config rom.  In either case, pick another root.
> 		 */
> 		new_root_id = local_id;
> ...
> 	if (card->bm_retries++ < 5 &&
> 	    (card->gap_count != gap_count || new_root_id != root_id))
> 		do_reset = true;

7b. With the patch: Because of the gap count inconsistency, bm_work() 
does not wait for the config ROM to be read. It makes us the root node 
and does a bus reset immediately:

> 	} else if (card->gap_count == 0) {
> 		/*
> 		 * If self IDs have inconsistent gap counts, do a
> 		 * bus reset ASAP. The config rom read might never
> 		 * complete, so don't wait for it. However, still
> 		 * send a PHY configuration packet prior to the
> 		 * bus reset, as permitted by 1394-2008 8.4.5.2.
> 		 */
> 		new_root_id = local_id;
> 		card->bm_retries = 0;
> ...
> 	if (card->bm_retries++ < 5 &&
> 	    (card->gap_count != gap_count || new_root_id != root_id))
> 		do_reset = true;

-- Adam

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ