| Commit message (Collapse) | Author |
|
commit 104739aca4488909175e9e31d5cd7d75b82a2046 upstream.
If the device is power-cycled, it takes time for the initiator to transmit
the periodic NOTIFY (ENABLE SPINUP) SAS primitive, and for the device to
respond to the primitive to become ACTIVE. Retry the I/O request to allow
the device time to become ACTIVE.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210629155826.48441-1-quat.le@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Quat Le <quat.le@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e7661a8e5ce10b5321882d0bbaf3f81070903319 upstream.
When instrumenting the SCSI layer to run into the
!blk_rq_nr_phys_segments(rq) case the following warning emitted from the
block layer:
blk_peek_request: bad return=-22
This happens because since commit fd3fc0b4d730 ("scsi: don't BUG_ON()
empty DMA transfers") we return the wrong error value from
scsi_prep_fn() back to the block layer.
[mkp: silenced checkpatch]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: fd3fc0b4d730 scsi: don't BUG_ON() empty DMA transfers
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[iwamatsu: - backport for 4.4.y and 4.9.y
- Use rq->nr_phys_segments instead of blk_rq_nr_phys_segments]
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 857de6e00778738dc3d61f75acbac35bdc48e533 upstream.
The device handler needs to check if a given queue belongs to a scsi
device; only then does it make sense to attach a device handler.
[mkp: dropped flags]
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fd3fc0b4d7305fa7246622dcc0dec69c42443f45 upstream.
Don't crash the machine just because of an empty transfer. Use WARN_ON()
combined with returning an error.
Found by Dmitry Vyukov and syzkaller.
[ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root
cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON()
might still be a cause of excessive log spamming.
NOTE! If this warning ever triggers, we may end up leaking resources,
since this doesn't bother to try to clean the command up. So this
WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is
much worse.
People really need to stop using BUG_ON() for "this shouldn't ever
happen". It makes pretty much any bug worse. - Linus ]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93 upstream.
When SCSI was written, all commands coming from the filesystem
(REQ_TYPE_FS commands) had data. This meant that our signal for needing
to complete the command was the number of bytes completed being equal to
the number of bytes in the request. Unfortunately, with the advent of
flush barriers, we can now get zero length REQ_TYPE_FS commands, which
confuse this logic because they satisfy the condition every time. This
means they never get retried even for retryable conditions, like UNIT
ATTENTION because we complete them early assuming they're done. Fix
this by special casing the early completion condition to recognise zero
length commands with errors and let them drop through to the retry code.
Reported-by: Sebastian Parschauer <s.parschauer@gmx.de>
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Tested-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep. Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep. The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
them prevents it.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.
Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we known we won the race to complete the
request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Add a ->handler and a ->handler_data field to struct scsi_device and kill
this indirection. Also move struct scsi_device_handler to scsi_dh.h so that
changes to it don't require rebuilding every SCSI LLDD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
|
Log the ALUA state change unit attention correctly with
the message log and emit an event to allow user-space
tools to react to it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
|
The 'sd' driver is calling scsi_mode_sense() to figure out
internal details. But scsi_mode_sense() never checks for
any pending unit attentions, so we're getting annoying error
messages like:
MODE SENSE: unimplemented page/subpage: 0x00/0x00
and a possible wrong decision for device cache handling.
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
|
Fix a memory leak with scsi-mq triggered by commands with large data
transfer length.
__sg_alloc_table() sets both table->nents and table->orig_nents to the
same value. When the scatterlist is DMA-mapped, table->nents is
overwritten with the (possibly smaller) size of the DMA-mapped
scatterlist, while table->orig_nents retains the original size of the
allocated scatterlist. scsi_free_sgtable() should therefore check
orig_nents instead of nents, and all code that initializes sdb->table
without calling __sg_alloc_table() should set both nents and orig_nents.
Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
|
SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:
BUG: unable to handle kernel paging request at ffffc9001a6bc084
IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100)
Call Trace:
[<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
[<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
[<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
[<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
[<ffffffff81223b37>] __blk_run_queue+0x27/0x30
[<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
[<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
[<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
[<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
[<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
[<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
[<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
[<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
[<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
[<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
[<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
[<ffffffff811589de>] vfs_write+0xce/0x140
[<ffffffff81158b53>] sys_write+0x53/0xa0
[<ffffffff81464592>] system_call_fastpath+0x16/0x1b
[<00007f611c9d9300>] 0x7f611c9d92ff
Reported-by: Max Gurtuvoy <maxg@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
|
This is the blk-mq part to support tag allocation policy. The default
allocation policy isn't changed (though it's not a strict FIFO). The new
policy is round-robin for libata. But it's a try-best implementation. If
multiple tasks are competing, the tags returned will be mixed (which is
unavoidable even with !mq, as requests from different tasks can be
mixed in queue)
Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This can happen if a multipathed device uses DIX and another path is
added via an adapter that does not support it. Multipath should not
allow this path to be added, but we should not depend upon that to avoid
crashing.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The blk-mq ->queue_rq method is always called from process context,
but might have preemption disabled. This means we still always
have to use GFP_ATOMIC for memory allocations, and thus need to
revert part of commit 3c356bde1 ("scsi: stop passing a gfp_mask
argument down the command setup path").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
|
|
This fixes random memory corruption triggered when all three of the
following are true:
* scsi-mq enabled
* T10 Protection Information (DIF) enabled
* SCSI host with sg_tablesize > SCSI_MAX_SG_SEGMENTS (128)
The symptoms of this bug are unpredictable memory corruption, BUG()s,
oopses, lockups, etc., any of which may appear to be completely
unrelated to the root cause.
Cc: <stable@vger.kernel.org> # 3.17.x, 3.18.x
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
scsi_lib.c is where the rest of the I/O submission path lives, so move
scsi_dispatch_cmd there and mark it static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|
There is no reason for ULDs to pass in a flag on how to allocate the S/G
lists. While we don't need GFP_ATOMIC for the blk-mq case because we
don't hold locks, that decision can be made way down the chain without
having to pass a pointless gfp_mask argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|
There's only one caller left, so inline it and reduce the blk-mq vs !blk-mq
diff a litte bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|
Currently scsi piggy backs on the block layer to define the concept
of a tagged command. But we want to be able to have block-level host-wide
tags assigned even for untagged commands like the initial INQUIRY, so add
a new SCSI-level flag for commands that are tagged at the scsi level, so
that even commands without that set can have tags assigned to them. Note
that this alredy is the case for the blk-mq code path, and this just lets
the old path catch up with it.
We also set this flag based upon sdev->simple_tags instead of the block
queue flag, so that it is entirely independent of the block layer tagging,
and thus always correct even if a driver doesn't use block level tagging
yet.
Also remove the old blk_rq_tagged; it was only used by SCSI drivers, and
removing it forces them to look for the proper replacement.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|
Allow a SCSI LLD to declare how many hardware queues it supports
by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
There can be quite a lot of I/O error messages, even on smaller
machines. So we need to ratelimit them to not overwhelm logging.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Simplify scsi_log_(send|completion) by externalizing
scsi_mlreturn_string() and always print the command address.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Convert scsi_normalize_sense() and friends to return 'bool'
instead of an integer.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We should be using the standard dev_printk() variants for
sense code printing.
[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Resolve some missing-field-initializers warnings by using
designated initialization.
[hch: W=2 with modern gcc warns about this. Pretty pointless to me, but
I'd prefer to keep us warning free]
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Since we have the notion of a 'last' request in a chain, we can use
this to have the hardware optimize the issuing of requests. Add
a list_head parameter to queue_rq that the driver can use to
temporarily store hw commands for issue when 'last' is true. If we
are doing a chain of requests, pass in a NULL list for the first
request to force issue of that immediately, then batch the remainder
for deferred issue until the last request has been sent.
Instead of adding yet another argument to the hot ->queue_rq path,
encapsulate the passed arguments in a blk_mq_queue_data structure.
This is passed as a constant, and has been tested as faster than
passing 4 (or even 3) args through ->queue_rq. Update drivers for
the new ->queue_rq() prototype. There are no functional changes
in this patch for drivers - if they don't use the passed in list,
then they will just queue requests individually like before.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
To generate the right SPI tag messages we need to properly set
QUEUE_FLAG_QUEUED in the request_queue and mirror it to the
request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Cc: stable@vger.kernel.org
|
|
Some ATA drivers need the dma drain size workaround, and thus need to
call blk_mq_start_request before the S/G mapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Allow blk-mq to pass an argument to the timeout handler to indicate
if we're timing out a reserved or regular command. For many drivers
those need to be handled different.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that we've changed the driver API on the submission side use the
opportunity to fix up the name on the completion side to fit into the
general scheme.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When we call blk_mq_start_request from the core blk-mq code before calling into
->queue_rq there is a racy window where the timeout handler can hit before we've
fully set up the driver specific part of the command.
Move the call to blk_mq_start_request into the driver so the driver can start
the request only once it is fully set up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Pass an explicit parameter for the last request in a batch to ->queue_rq
instead of using a request flag. Besides being a cleaner and non-stateful
interface this is also required for the next patch, which fixes the blk-mq
I/O submission code to not start a time too early.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When ending a bi-directionional SCSI request, blk_finish_request()
cleans up and frees the request, but scsi_release_bidi_buffers() tries
to indirect through the request to find it's data buffers. This causes
a panic due to a null pointer dereference.
Move the call to scsi_release_bidi_buffers() before the call to
blk_finish_request().
Signed-off-by: Daniel Gryniewicz <dang@linuxbox.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Add a use_cmd_list flag in struct Scsi_Host to request keeping track of
all outstanding commands per device.
Default behaviour is not to keep track of cmd_list per sdev, as this may
introduce lock contention. (overhead is more on multi-node NUMA.), and
only enable it on the two drivers that need it.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.
For consistency, make a similar change to the blk_mq_alloc_request leg
of blk_get_request. It may fail if the queue is dead, or the caller was
unwilling to wait.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch fixes code such as the following with scsi-mq enabled:
rq = blk_get_request(...);
blk_rq_set_block_pc(rq);
rq->cmd = my_cmd_buffer; /* separate CDB buffer */
blk_execute_rq_nowait(...);
Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for
large CDBs only). Without this patch, scsi_mq_prep_fn() will set
rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The latest kernel fails to boot qemu arm images when using scsi
for disk access. Boot gets stuck after the following messages.
brd: module loaded
sym53c8xx 0000:00:0c.0: enabling device (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 93
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi host0: sym-2.2.3
Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to
atomic_t"). Code inspection shows the following suspicious change
in scsi_request_fn.
out_delay:
- if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
+ if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
}
'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)',
meaning the logic was reversed. Changing this expression to
'!atomic_read(&sdev->device_busy)' fixes the problem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The latest kernel fails to boot qemu arm images when using scsi
for disk access. Boot gets stuck after the following messages.
brd: module loaded
sym53c8xx 0000:00:0c.0: enabling device (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 93
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi host0: sym-2.2.3
Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to
atomic_t"). Code inspection shows the following suspicious change
in scsi_request_fn.
out_delay:
- if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
+ if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
}
'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)',
meaning the logic was reversed. Changing this expression to
'!atomic_read(&sdev->device_busy)' fixes the problem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 71e75c97f97a
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
This patch adds support for an alternate I/O path in the scsi midlayer
which uses the blk-mq infrastructure instead of the legacy request code.
Use of blk-mq is fully transparent to drivers, although for now a host
template field is provided to opt out of blk-mq usage in case any unforseen
incompatibilities arise.
In general replacing the legacy request code with blk-mq is a simple and
mostly mechanical transformation. The biggest exception is the new code
that deals with the fact the I/O submissions in blk-mq must happen from
process context, which slightly complicates the I/O completion handler.
The second biggest differences is that blk-mq is build around the concept
of preallocated requests that also include driver specific data, which
in SCSI context means the scsi_cmnd structure. This completely avoids
dynamic memory allocations for the fast path through I/O submission.
Due the preallocated requests the MQ code path exclusively uses the
host-wide shared tag allocator instead of a per-LUN one. This only
affects drivers actually using the block layer provided tag allocator
instead of their own. Unlike the old path blk-mq always provides a tag,
although drivers don't have to use it.
For now the blk-mq path is disable by defauly and must be enabled using
the "use_blk_mq" module parameter. Once the remaining work in the block
layer to make blk-mq more suitable for slow devices is complete I hope
to make it the default and eventually even remove the old code path.
Based on the earlier scsi-mq prototype by Nicholas Bellinger.
Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and
various sugestions and code contributions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Blk-mq drivers usually preallocate their S/G list as part of the request,
but if we want to support the very large S/G lists currently supported by
the SCSI code that would tie up a lot of memory in the preallocated request
pool. Add support to the scatterlist code so that it can initialize a
S/G list that uses a preallocated first chunks and dynamically allocated
additional chunks. That way the scsi-mq code can preallocate a first
page worth of S/G entries as part of the request, and dynamically extend
the S/G list when needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Replace the calls to the various blk_end_request variants with opencode
equivalents. Blk-mq is using a model that gives the driver control
between the bio updates and the actual completion, and making the old
code follow that same model allows us to keep the code more similar for
both paths.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
This saves us an atomic operation for each I/O submission and completion
for the usual case where the driver doesn't set a per-target can_queue
value. Only a few iscsi hardware offload drivers set the per-target
can_queue value at the moment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Seems like these counters are missing any sort of synchronization for
updates, as a over 10 year old comment from me noted. Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.
With the new model the _busy counters can temporarily go negative,
so all the readers are updated to check for > 0 values. Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Avoid taking the queue_lock to check the per-device queue limit. Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.
Unlike the host and target busy counters this doesn't allow us to avoid the
queue_lock in the request_fn due to the way the interface works, but it'll
allow us to prepare for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock round trip in
scsi_device_unbusy, which is still important given how busy the queue_lock
is.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Avoid taking the host-wide host_lock to check the per-host queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Avoid taking the host-wide host_lock to check the per-target queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Prepare for not taking a host-wide lock in the dispatch path by pushing
the lock down into the places that actually need it. Note that this
patch is just a preparation step, as it will actually increase lock
roundtrips and thus decrease performance on its own.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
The blk-mq code path will set this to a different function, so make the
code simpler by setting it up in a legacy-request specific place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|
|
Make sure we only have the logic for requeing commands in one place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
|