Life of a SCSI Command (scsi_cmnd) — Linux kernel 4.0

Excerpted from a patch submitted by Rajat Jain. Note that this document was never merged.

           ==================================
           Life of a SCSI Command (scsi_cmnd)
           ==================================

        Rajat Jain <rajatja@google.com> on 12-May-2015

(This document roughly matches the Linux kernel 4.0)

This document describes the various phases of a SCSI command (struct scsi_cmnd)
lifecycle as it flows through different parts of the SCSI mid level driver. It
describes under what conditions and how a scsi_cmnd may be aborted, retried,
or scheduled for error handling, how it is recovered, and in general how a
block request is handled by the SCSI mid level driver. It goes into detail about
which functions get called and the purpose of each of them.

To help explain, it follows an example scsi_cmnd that goes
through it all: timeout, abort, error handling and retry (it also results in
CHECK_CONDITION and gets sense info). The last section traces the path taken by
this example scsi_cmnd in its lifetime.

TABLE OF CONTENTS

[1] Lifecycle of a scsi_cmnd
[2] How does a scsi_cmnd get queued to the LLD for processing?
[3] How does a scsi_cmnd complete?
[3.1] Command completing via scsi_softirq_done()
[3.2] Command completing via scsi_times_out()
[4] SCSI Error Handling
[4.1] How did we Get here?
[4.2] When does Error Handling actually run?
[4.3] SCSI Error Handler thread
[5] SCSI Commands can be “hijacked”
[6] SCSI Command Aborts
[6.1] When would mid level try to abort a command?
[6.2] How SCSI command abort works?
[6.3] Aborts can fail too
[7] SCSI command Retries
[7.1] When would mid level retry a command?
[7.2] Eligibility criteria for Retry
[8] Example: Following a scsi_cmnd (that results in CHECK_CONDITION)
[8.1] High level view of path taken by example scsi_cmnd
[8.2] Actual Path taken
[9] References

1. Lifecycle of a scsi_cmnd
===========================

The SCSI mid level (ML) interfaces with the block layer just like any other
block driver. For each block device that the SCSI ML adds to the system, it
registers a set of functions to serve the corresponding request queue.

The following functions are relevant to the scsi_cmnd in its lifetime. Note
that depending on the situation, it may not go through some of these
stages, or may have to go through some stages multiple times.

scsi_prep_fn()
    is called by the block layer to prepare the request. This
    function actually allocates a new scsi_cmnd for the request (from
    scsi_host->cmd_pool) and sets it up. This is where a scsi_cmnd is “born”.
    Note, a new scsi_cmnd is allocated only if the blk req does not already
    have one associated with it (an existing one is indicated by
    req->special != NULL). A req may already have a scsi_cmnd if the req was
    tried by SCSI earlier, and it resulted in a decision to retry later (and
    hence the req was put back on the queue).

scsi_request_fn()
    is the actual function to serve the request queue. It basically checks
    whether the host is ready for new commands, and if so, it submits the
    command to the LLD:
        scsi_request_fn()
         ->scsi_dispatch_cmd()
          ->hostt->queuecommand()
    In case a scsi_cmnd could not be queued to the LLD for some reason, the
    req is put back on the original request queue (for retry later).

scsi_softirq_done()
    is the handler that gets called once the LLD indicates that the command
    completed.
        scsi_done()
         ->blk_complete_request()
          ->causes softirq
           ->blk_done_softirq()
            ->scsi_softirq_done()
    The most important goal of this function is to determine the course of
    further action for this req (based on the scsi_cmnd->result and sense data
    if present), and take that course. The options could be to finish off the
    request to the block layer, requeue it to the block layer, or schedule it
    for error handling (if that is deemed necessary). This is discussed in
    much detail later.

scsi_times_out()
    is the function that gets called if the LLD does not respond with the
    result of a scsi_cmnd for a long time, and a timeout happens. It tries
    to see if the situation can be fixed by the LLD timeout handlers (if
    available) or by aborting the command. If not, it schedules the command
    for EH (discussed at length later).

scsi_unprep_fn()
    is the function that gets called to unprepare the request. It is supposed
    to undo whatever scsi_prep_fn() does.
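
The three-way choice that scsi_softirq_done() makes can be sketched in plain
C. This is only a user-space illustration, not kernel code: the enum names
(DISP_*, ACT_*) and the function softirq_done_action() are hypothetical
stand-ins, loosely modelled on the disposition values that
scsi_decide_disposition() returns.

```c
/* Hypothetical stand-ins for the mid-level disposition values. */
enum disposition {
    DISP_SUCCESS,
    DISP_NEEDS_RETRY,
    DISP_ADD_TO_MLQUEUE,
    DISP_FAILED
};

/* What the completion handler ends up doing with the request. */
enum action {
    ACT_FINISH,       /* finish the request off to the block layer */
    ACT_REQUEUE,      /* put the request back on the request queue */
    ACT_SCHEDULE_EH   /* add the command to the host EH queue */
};

/*
 * Map a disposition to the course of action. Note that DISP_SUCCESS only
 * means "finish the request"; it says nothing about whether the I/O worked.
 */
enum action softirq_done_action(enum disposition d)
{
    switch (d) {
    case DISP_SUCCESS:
        return ACT_FINISH;
    case DISP_NEEDS_RETRY:
    case DISP_ADD_TO_MLQUEUE:
        return ACT_REQUEUE;
    case DISP_FAILED:
    default:
        /* Anything unrecognised is treated like a failure. */
        return ACT_SCHEDULE_EH;
    }
}
```

Treating unknown values like FAILED mirrors the "FAILED/default" branch
described later: when in doubt, the command goes to error handling.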
2. How does a scsi_cmnd get queued to the LLD for processing?
=============================================================

The submission part is very simple. Once scsi_request_fn() gets called
and picks up a new block request via
blk_peek_request(), the scsi_cmnd has already been set up and is ready to be
sent to the LLD:
    scsi_request_fn()
     ->scsi_dispatch_cmd()
      ->hostt->queuecommand()
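
A toy model of this prepare-then-dispatch flow is sketched below. It is not
the kernel implementation: every toy_* name is hypothetical, the fixed-size
pool stands in for the host command pool, and the queue_to_lld pointer
stands in for the LLD queuing hook in the host template. What it shows is
that a command is allocated only once per request, and that a request which
cannot be dispatched simply stays queued for a later attempt.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_cmnd { int retries; };

struct toy_request {
    struct toy_cmnd *special;  /* models req->special */
    bool queued;               /* true while the req sits on the queue */
};

struct toy_host {
    bool ready;
    /* Models the LLD queuing hook; returns 0 when the LLD accepts. */
    int (*queue_to_lld)(struct toy_cmnd *cmd);
};

#define TOY_POOL_SIZE 4
static struct toy_cmnd toy_pool[TOY_POOL_SIZE];
static size_t toy_pool_used;

/* Prepare step: allocate a command only if the request has none yet. */
struct toy_cmnd *toy_prep(struct toy_request *req)
{
    if (req->special == NULL && toy_pool_used < TOY_POOL_SIZE)
        req->special = &toy_pool[toy_pool_used++];
    return req->special;
}

/* Request function: returns true if the command reached the LLD. */
bool toy_request_fn(struct toy_host *host, struct toy_request *req)
{
    struct toy_cmnd *cmd = toy_prep(req);

    if (cmd == NULL || !host->ready || host->queue_to_lld(cmd) != 0) {
        req->queued = true;    /* left on the queue for a later retry */
        return false;
    }
    req->queued = false;
    return true;
}

/* Example LLD hook that accepts every command. */
int toy_lld_accept(struct toy_cmnd *cmd)
{
    (void)cmd;
    return 0;
}
```

Calling toy_request_fn() twice on the same request, first with the host not
ready and then with it ready, reuses the command allocated on the first
call.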
3. How does a scsi_cmnd complete?
=================================

Once a scsi_cmnd is submitted to the LLD, there are only 2 ways it can get
completed:

 a. Either the LLD responds in time
    (i.e. resulting in scsi_softirq_done() for the command)

 b. Or, the LLD does not respond in time and a timeout occurs
    (i.e. resulting in scsi_times_out() for the command)

We discuss both these cases below.

Note 1:
There may be scsi_cmnd(s) that are retried. But the completion of a
retried scsi_cmnd is no different from the completion of a new
scsi_cmnd. Thus irrespective of retries, a scsi_cmnd will always end up
in one of the above 2 scenarios.

Note 2:
A scsi_cmnd may be “hijacked” during error handling in
scsi_send_eh_cmnd(), to send one of the EH commands (TUR / STU /
REQUEST_SENSE). However, the completion of these EH commands does not end up
in the above two scenarios. This is the only exception. Once the scsi_cmnd is
“un-hijacked”, the result of the original scsi_cmnd will still go through
the same 2 scenarios.

3.1 Command completing via scsi_softirq_done()

This is the case when the LLD responded in time, i.e. completed the command.
Note that here “completed” does not mean that the command was successfully
completed. In fact, the SCSI host hardware
may have failed without even accepting the command. However, the fact that
scsi_softirq_done() was called indicates that there is a “result” available
in a timely fashion, and we have to examine this result in order to
decide the next course of action.

scsi_softirq_done()
 |
 +--> scsi_decide_disposition()
 |    Takes a look at the scsi_cmnd->result and sense data to determine
 |    what is the best course of action to take. While reading this
 |    function code, one should not confuse SUCCESS as meaning the command
 |    was successful, or FAILED to mean the command failed etc. The return
 |    value of this function merely indicates the course of action to take.
 |
 +--> case SUCCESS:
 |    (Finish off the command to the block layer. E.g. the device may be
 |    offline, and hence we complete the command. The block layer may retry
 |    on its own later, but that doesn't concern the SCSI ML)
 |    |
 |    +--> scsi_finish_command()
 |         |
 |         +--> scsi_io_completion() (*see Note 3 below)
 |              |
 |              +--> blk_finish_request()
 |
 +--> case NEEDS_RETRY/ADD_TO_MLQUEUE:
 |    (Requeue the command to the request queue. E.g. the device HW was
 |    busy, and thus the SCSI ML knows that retrying may help)
 |    |
 |    +--> scsi_queue_insert()
 |         |
 |         +--> blk_requeue_request()
 |
 +--> case FAILED/default:
      (Schedule the scsi_cmnd for EH. E.g. there was a bus error that
      might need a bus reset. Or we got CHECK_CONDITION and we need to issue
      REQ_SENSE to get more info about the failure, etc.)
      |
      +--> scsi_eh_scmd_add()
           Add scsi_cmnd to the host EH queue
           scsi_eh_wakeup()

Note 3:
scsi_io_completion() has secondary logic similar to
scsi_decide_disposition() in that it also looks at the result and sense data
and figures out what to do with the request. It makes similar choices on the
course of action to take. There is a special case in this function that
involves “unprepping” a scsi_cmnd before requeuing it, and we'll discuss
it in the sections below.

3.2 Command completing via scsi_times_out()

This happens when the LLD does not respond in time, the block layer times
out, and as a result calls the timeout function for the request queue of
the SCSI device in question.

scsi_times_out()
 |
 +--> scsi_transport_template->eh_timed_out() - Successful? If not...
 |    (Gives transportt a chance to deal with it)
 |
 +--> scsi_host_template->eh_timed_out() - Successful? If not...
 |    (Gives hostt a chance to deal with it)
 |
 +--> scsi_abort_command() - Successful? If not...
 |    (Schedule an ABORT of the scsi_cmnd. The abort handler will also
 |    requeue it if needed)
 |
 +--> scsi_eh_scmd_add()
      (Schedule the scsi_cmnd for EH. This'll definitely work. Because if it
      doesn't work, the EH handler will mark the device as offline, which
      counts as a good fix :-))
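
The timeout path above is essentially a chain of optional handlers, each
given a chance before the next, with EH as the final fallback. A minimal
sketch of that pattern in plain C follows. All names here
(run_timeout_chain(), STEP_HANDLED and so on) are hypothetical; a NULL entry
models a handler that the transport or host template does not provide.

```c
#include <stddef.h>

enum step_result { STEP_HANDLED, STEP_NOT_HANDLED };

/* One stage of the chain: transport handler, host handler, abort, ... */
typedef enum step_result (*timeout_step)(void *cmd);

/*
 * Try each available step in order. Returns the index of the step that
 * dealt with the timeout, or n if none did, in which case the caller
 * would schedule the command for EH.
 */
size_t run_timeout_chain(const timeout_step *steps, size_t n, void *cmd)
{
    for (size_t i = 0; i < n; i++) {
        /* Skip handlers that are not provided. */
        if (steps[i] != NULL && steps[i](cmd) == STEP_HANDLED)
            return i;
    }
    return n;
}

/* Example steps used to exercise the chain. */
enum step_result step_decline(void *cmd)
{
    (void)cmd;
    return STEP_NOT_HANDLED;
}

enum step_result step_accept(void *cmd)
{
    (void)cmd;
    return STEP_HANDLED;
}
```

With a chain of { NULL, step_decline, step_accept } the third entry handles
the timeout; with only declining steps the function returns n and the
command falls through to error handling.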
4. SCSI Error Handling
======================

SCSI error handling should be thought of as the action the mid level decides
to take when it knows that merely retrying a request may not help, and it
needs to do something else (possibly disruptive) in order to fix the issue.
E.g. a stalled host may require a host reset, and only after that may a retry
of the request complete.

Note 4:
(Random thoughts): Contrast “Error Handling” with “Retries”. A retry
is a normal thing to do when the mid level believes that it has seen an
error which is transient in nature, and will go away on its own without
explicitly doing anything. Thus retrying the request makes sense in
this case. (On the other hand, a cmnd is scheduled for EH when the mid level
knows that it needs to do “something” before retrying the cmnd can give good
results.)

Note 5:
The SCSI mid level maintains a (per-host) list of all the scsi_cmnd(s)
that have been scheduled for EH at that host, using scsi_host->eh_cmd_q.
This is the list that gets processed by the EH thread when it runs.

4.1 How did we Get here?

A scsi_cmnd could be marked for EH in the following cases:

 1. The command “error completed”, i.e. scsi_decide_disposition() returned
    FAILED or something that indicates a failure that requires some sort of
    error recovery. E.g. device hardware failed, or we have a CHECK_CONDITION.
        scsi_softirq_done()
         ->scsi_decide_disposition() = FAILED
          ->scsi_eh_scmd_add()

 2. A scsi_cmnd timed out, and the attempt to abort it failed.
        scsi_times_out()
         ->scsi_abort_command() != SUCCESS
          ->scsi_eh_scmd_add()

4.2 When does Error Handling actually run?

A SCSI error handler thread is scheduled whenever there is a scsi_cmnd that
is marked for EH (inserted in the Scsi_Host->eh_cmd_q). Once a scsi_cmnd is
marked for EH, the ML does not accept any more scsi_cmnds for that
particular Scsi_Host. However, the EH thread does not actually run until all
the pending IOs to the LLD for that particular Scsi_Host have either
completed or failed. In other words, it runs once the only commands pending
at the LLD for that host are the ones that need EH (host_busy == host_failed).

The idea is to quiesce the bus, so that the EH thread can recover the devices,
as it may need to reset different components in order to do its job.
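
The quiescing rule can be written down as a pair of small predicates. This
is a simplified sketch: struct toy_host_state and both functions are
hypothetical, and "no new commands while anything is marked for EH" is
modelled here purely from the failed counter rather than from an explicit
host state.

```c
#include <stdbool.h>

struct toy_host_state {
    unsigned int host_busy;    /* commands outstanding at the LLD */
    unsigned int host_failed;  /* of those, commands marked for EH */
};

/* The EH thread runs only when every outstanding command needs EH. */
bool toy_eh_can_run(const struct toy_host_state *h)
{
    return h->host_failed > 0 && h->host_busy == h->host_failed;
}

/* Once any command is marked for EH, no new commands are accepted. */
bool toy_host_accepts_new_cmds(const struct toy_host_state *h)
{
    return h->host_failed == 0;
}
```

For example, with three commands outstanding and one of them failed, the
host stops taking new commands but EH still waits for the other two to
complete or fail.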

4.3 SCSI Error Handler thread

scsi_error_handler()
 |
 +--> transportt->eh_strategy_handler() if exists, else...
 |    (Use transportt's own error recovery handler, if available)
 |
 +--> scsi_unjam_host()
 |    (The SCSI ML error handler described below. Also described in
 |    Documentation/scsi/scsi_eh.txt. The basic goal is to do whatever
 |    is needed to recover from the current error condition, and requeue
 |    the eligible commands after recovery)
 |
 +--> scsi_restart_operations()
      (Restart the operations of the SCSI request queue)
      |
      +--> scsi_run_host_queues()
           |
           +--> scsi_run_queue()
                |
                +--> blk_run_queue()

scsi_unjam_host()

The idea is to create 2 lists: work_q, done_q.
Initially, work_q = (the scsi_cmnds on the host's eh_cmd_q), done_q = NULL.
Then error handle all the requests in work_q by taking actions of
sequentially higher severity that may recover the cmnd or device. Keep
moving the requests from work_q to done_q, and in the end finish them all
in one go rather than finishing them up individually.

scsi_unjam_host()
 |
 +--> Create 2 lists: work_q, done_q
 |    work_q = (the scsi_cmnds on the host's eh_cmd_q), done_q = NULL
 |
 +--> scsi_eh_get_sense() - Are we done? If not...
 |    (For the commands that have CHECK_CONDITION, get sense_info)
 |    |
 |    +--> scsi_request_sense()
 |    |    (Use scsi_send_eh_cmnd() to send a “hijacked” REQ_SENSE cmnd)
 |    |
 |    +--> scsi_decide_disposition()
 |    |
 |    +--> Arrange to finish the scsi_cmnd if SUCCESS (by setting
 |         retries=allowed)
 |
 +--> scsi_eh_abort_cmds() - Are we done? If not...
 |    (Abort the commands that had timed out)
 |    |
 |    +--> scsi_try_to_abort_cmd()
 |    |    (Results in a call to hostt->eh_abort_handler(), which is
 |    |    responsible for making the LLD and the HW forget about the
 |    |    scsi_cmnd)
 |    |
 |    +--> scsi_eh_test_devices()
 |         (Test if the device is responding now by sending appropriate EH
 |         commands (STU / TEST_UNIT_READY). Again, sending these EH
 |         commands involves hijacking the original scsi_cmnd, and later
 |         restoring the context)
 |
 +--> scsi_eh_ready_devs() - Are we done? If not...
 |    (Take actions of increasing severity in order to recover)
 |    |
 |    +--> scsi_eh_bus_device_reset()
 |    |    (Reset the scsi_device. Results in a call to
 |    |    hostt->eh_device_reset_handler())
 |    |
 |    +--> scsi_eh_target_reset()
 |    |    (Reset the scsi_target. Results in a call to
 |    |    hostt->eh_target_reset_handler())
 |    |
 |    +--> scsi_eh_bus_reset()
 |    |    (Reset the SCSI bus. Results in a call to
 |    |    hostt->eh_bus_reset_handler())
 |    |
 |    +--> scsi_eh_host_reset()
 |    |    (Reset the Scsi_Host. Results in a call to
 |    |    hostt->eh_host_reset_handler())
 |    |
 |    +--> If nothing has worked - scsi_eh_offline_sdevs()
 |         (The device is not recoverable, put it offline)
 |
 +--> scsi_eh_flush_done_q()
      (For all the EH commands on the done_q, either requeue them (via
      scsi_queue_insert()) if eligible, or finish them up to the block layer
      (via scsi_finish_command()))

Note 6:
At each recovery stage we test if we are done (using
scsi_eh_test_devices()), and take the next severity action only if needed.

Note 7:
The error handler takes care that for multiple scsi_cmnds that can be
recovered by resetting the same component (e.g. same scsi_device), the
device is reset only once.
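
The work_q/done_q idea and the escalation ladder can be illustrated with a
small user-space model. This is a deliberately simplified sketch with
hypothetical names: each command carries the lowest ladder step that would
recover it, all commands are assumed to share one device, and a single flag
stands in for moving a command from work_q to done_q.

```c
#include <stddef.h>

enum { LADDER_STEPS = 4 };  /* device, target, bus and host reset */

struct toy_eh_cmd {
    int fixed_by;  /* lowest ladder step that recovers this command */
    int done;      /* non-zero once moved from work_q to done_q */
};

struct toy_eh_stats {
    int resets_issued;  /* ladder steps actually taken */
    int offlined;       /* commands that no step could recover */
};

struct toy_eh_stats toy_unjam(struct toy_eh_cmd *work, size_t n)
{
    struct toy_eh_stats st = { 0, 0 };
    size_t remaining = n;

    /* Escalate only while some command is still unrecovered. */
    for (int step = 0; step < LADDER_STEPS && remaining > 0; step++) {
        st.resets_issued++;  /* one reset serves every pending command */
        for (size_t i = 0; i < n; i++) {
            if (!work[i].done && work[i].fixed_by <= step) {
                work[i].done = 1;
                remaining--;
            }
        }
    }

    /* Whatever is left is unrecoverable: offline it, then finish it. */
    for (size_t i = 0; i < n; i++) {
        if (!work[i].done) {
            work[i].done = 1;
            st.offlined++;
        }
    }
    return st;
}
```

The loop stops escalating as soon as nothing is left to recover, which
corresponds to the "Are we done?" checks in the tree above.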

5. SCSI Commands can be “hijacked”
==================================

As seen above, the EH thread may need to send some EH commands in order to
check the health and responsiveness of the SCSI device:

TUR - Test Unit Ready
STU - Start / Stop Unit
REQUEST_SENSE - To get the sense data in response to CHECK_CONDITION

However, instead of allocating and setting up a new scsi_cmnd for such
temporary purposes, the EH thread hijacks the current scsi_cmnd that it is
trying to recover, in order to send the EH commands. This whole process is
done in scsi_send_eh_cmnd().

scsi_send_eh_cmnd() saves the context of the current command before hijacking
it, replaces the scsi_done ptr with its own before dispatching it to the LLD,
and restores the context later once it is done. The EH commands sent in this
manner are subject to the same problems of timeouts / abort failures /
completions, but they do not take the route taken by normal commands (i.e.
they don't take the scsi_softirq_done() or scsi_times_out() route).
Everything is handled within scsi_send_eh_cmnd(). This is discussed in the
following sections.
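
The save / hijack / restore pattern can be sketched as follows. This is a
stand-alone illustration with hypothetical toy_* types, not the kernel's
data structures: only the CDB, the result and the completion pointer are
saved, whereas the real context contains more fields.

```c
#include <string.h>

#define TOY_CDB_LEN 16

struct toy_scmd {
    unsigned char cdb[TOY_CDB_LEN];   /* command descriptor block */
    int result;
    void (*done)(struct toy_scmd *);  /* completion callback */
};

/* Saved context of the original command while it is hijacked. */
struct toy_eh_save {
    unsigned char cdb[TOY_CDB_LEN];
    int result;
    void (*done)(struct toy_scmd *);
};

/* Save the original context, then turn cmd into an EH command. */
void toy_eh_prep(struct toy_scmd *cmd, struct toy_eh_save *save,
                 unsigned char eh_opcode, void (*eh_done)(struct toy_scmd *))
{
    memcpy(save->cdb, cmd->cdb, TOY_CDB_LEN);
    save->result = cmd->result;
    save->done = cmd->done;

    memset(cmd->cdb, 0, TOY_CDB_LEN);
    cmd->cdb[0] = eh_opcode;   /* e.g. 0x03 for REQUEST SENSE */
    cmd->result = 0;
    cmd->done = eh_done;       /* completion no longer takes the normal path */
}

/* Put the original context back once the EH command is finished. */
void toy_eh_restore(struct toy_scmd *cmd, const struct toy_eh_save *save)
{
    memcpy(cmd->cdb, save->cdb, TOY_CDB_LEN);
    cmd->result = save->result;
    cmd->done = save->done;
}

/* Placeholder completion callbacks for the example. */
void toy_normal_done(struct toy_scmd *cmd) { (void)cmd; }
void toy_eh_done(struct toy_scmd *cmd) { (void)cmd; }
```

Swapping the completion pointer is what keeps the EH command's completion
away from the normal completion path: it returns to the code that sent it.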

6. SCSI Command Aborts
======================

An abort refers to the scenario where the SCSI mid level wants the LLD
and the hardware below it to forget everything about a scsi_cmnd that
was given to the LLD earlier. The most common reason is that the LLD failed
to respond in time.

6.1 When would mid level try to abort a command?

The SCSI ML may try to abort a scsi_cmnd in the following conditions:

 1. The SCSI mid layer times out on a command, and tries to abort it.
        scsi_times_out()
         -> scsi_abort_command()
    What happens if this abort fails? The command is scheduled for EH.

 2. The EH thread tries to abort all the pending commands while trying to
    unjam a host.
        scsi_unjam_host()
         -> scsi_eh_abort_cmds()
    What happens if this abort fails? We move to higher severity recovery
    steps (start resetting HW components etc.) because that is likely to make
    both the LLD and the HW forget about those commands.

 3. This is a nasty one. During error recovery, the EH thread may “hijack”
    a scsi_cmnd to send an EH command (TUR/STU/REQ_SENSE) to the LLD using
    scsi_send_eh_cmnd(). If such a “hijacked” EH command times out, the SCSI
    EH thread will try to abort it.
        scsi_send_eh_cmnd()
         -> scsi_abort_eh_cmnd()
          -> scsi_try_to_abort_cmd()
    What happens if this abort fails? Similar to the previous case,
    scsi_abort_eh_cmnd() will try to take higher severity actions (reset bus
    etc.) but will not send EH commands such as TUR again in order to
    verify whether the devices started to respond.

6.2 How SCSI command abort works?

Unlike EH commands such as TUR, ABORT is not a SCSI command that the mid layer
driver sends to the LLD. The LLD provides an eh_abort_handler() function
pointer that is used to abort the command. It is up to the LLD to do
whatever is needed to abort the command. It may need to send some
proprietary command to the HW, or fiddle some bits, or do whatever magic
is necessary.
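
The "abort is a callback, not a command" point can be shown with a small
function-pointer sketch. The types and names below are hypothetical
user-space stand-ins, and the rule that a missing handler counts as a failed
abort is an assumption of this sketch.

```c
#include <stddef.h>

enum toy_abort_status { TOY_ABORT_SUCCESS, TOY_ABORT_FAILED };

struct toy_abort_cmd { int tag; };

struct toy_host_template {
    /* Optional hook; NULL when the LLD offers no way to abort. */
    enum toy_abort_status (*eh_abort_handler)(struct toy_abort_cmd *cmd);
};

/* Mid-level side: delegate the abort entirely to the LLD's hook. */
enum toy_abort_status toy_try_to_abort(const struct toy_host_template *hostt,
                                       struct toy_abort_cmd *cmd)
{
    if (hostt->eh_abort_handler == NULL)
        return TOY_ABORT_FAILED;
    return hostt->eh_abort_handler(cmd);
}

/*
 * Example LLD hook. A real driver would talk to its hardware here; this
 * one just pretends that only even-numbered tags can be aborted.
 */
enum toy_abort_status toy_lld_abort(struct toy_abort_cmd *cmd)
{
    return (cmd->tag % 2 == 0) ? TOY_ABORT_SUCCESS : TOY_ABORT_FAILED;
}
```

The caller only sees success or failure; how the driver makes the hardware
forget the command stays private to the driver.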

6.3 Aborts can fail too

As with other things, abort attempts can also fail. The SCSI mid layer does
the right thing in such situations as depicted in the section above.

Note 8:
Once a block layer hands off a command to the SCSI subsystem, there is no
way currently for the block layer to cancel / abort a request. This needs
some work.

7. SCSI command Retries
=======================

The SCSI mid level maintains no queues for the SCSI commands it is processing
(other than the EH command queue). Thus whenever the SCSI ML thinks it needs
to retry a command, it requeues the request back to the corresponding request
queue, so that the retries will be made “naturally” when the request function
picks up the next request for processing.

When such requests are requeued, they are put at the
head of the request queue so that they go before the other (existing)
requests in that queue.

7.1 When would mid level retry a command?

Following are the conditions that will cause a SCSI command to be retried
(by putting the blk request back at the request queue):

 1. The mid layer times out on a scsi_cmnd, aborts it successfully, and
    requeues it.
        scsi_times_out()
         -> scsi_abort_command()
          -> schedules scmd_eh_abort_handler()
           -> scsi_queue_insert()
            -> blk_requeue_request()

 2. The EH thread, after recovering a host, requeues all the scsi_cmnds that
    are eligible for a retry:
        scsi_error_handler()
         -> scsi_unjam_host()
          -> scsi_eh_flush_done_q()
           -> scsi_queue_insert()
            -> blk_requeue_request()

 3. The LLD completes the scsi_cmnd, and scsi_decide_disposition() looks at
    the scsi_cmnd->result and decides it needs to be retried (e.g. because
    the bus was busy).
        scsi_softirq_done()
         -> scsi_decide_disposition() returns NEEDS_RETRY
          -> scsi_queue_insert()
           -> blk_requeue_request()

 4. In scsi_request_fn(), the SCSI ML finds out that the host is busy and
    the scsi_cmnd could not be sent to the LLD, hence it requeues the req
    back on the queue.
        scsi_request_fn()
         -> case not_ready:
          -> blk_requeue_request()

 5. scsi_finish_command() is called from a variety of places to finish
    off a request to the block layer. However, it calls scsi_io_completion(),
    which may look at the request and decide to retry it (if it qualifies).
        scsi_finish_command()
         -> scsi_io_completion()
          -> __scsi_queue_insert()
           -> blk_requeue_request()

Note 9:
Case 5 above has a very special case. There may be cases where
scsi_io_completion() decides that a blk request has to be retried,
but the scsi_cmnd for this req should be released and a new
scsi_cmnd should be allocated and used for this request at the next
retry. This can happen, e.g., if it sees an ILLEGAL REQUEST as a
response to a READ10 command, and thinks that it may be because the
device supports only READ6. Thus it may make sense to switch to READ6
(hence a new scsi_cmnd) at the time of the next retry.

7.2 Eligibility criteria for Retry

Note that SCSI mid level always checks for retry eligibility before it goes
ahead and requeues the command for retries. The eligibility criteria for a
scsi_cmnd includes (some of these may not apply in all situations described
above):

 - retries < allowed (number of retries should be less than allowed retries)
 - no more than host->eh_deadline jiffies spent in EH
 - scsi_noretry_cmd() should return 0 for the command
 - scsi_device must be online
 - req->timeout must not have expired
 - etc.
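
A few of these criteria can be combined into a single predicate, as sketched
below. This is an illustrative model only: the struct and its fields are
hypothetical, time is expressed in arbitrary ticks, and the eh_deadline
check is left out because it applies only inside error handling.

```c
#include <stdbool.h>

struct toy_retry_state {
    int retries;         /* retries already made */
    int allowed;         /* maximum retries allowed */
    bool device_online;
    bool noretry;        /* models scsi_noretry_cmd() returning non-zero */
    long now;            /* current time, in arbitrary ticks */
    long deadline;       /* tick at which the request's timeout expires */
};

/* A command is requeued for retry only if every criterion holds. */
bool toy_retry_eligible(const struct toy_retry_state *s)
{
    return s->retries < s->allowed &&
           s->device_online &&
           !s->noretry &&
           s->now < s->deadline;
}
```

Failing any single criterion is enough to finish the command instead of
retrying it, which is why the checks are joined with a logical AND.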

8. Example: Following a scsi_cmnd
=================================

8.1 High level view of path taken by example scsi_cmnd

We take the example of a block request that wants to read a
block off a SCSI disk, but whose LBA is out of range for the
current device (hypothetically). The ML submits it to the LLD, but the HW
takes the command and chokes on it (again hypothetically, so that we can
trace through the abort sequence). So the timeout happens, and the ML aborts
the command and requeues it. In the next run, the LLD completes the command
with CHECK_CONDITION. We assume that the SCSI host does not automatically
get the sense info. The ML schedules the cmnd for EH. The EH thread sends
REQUEST_SENSE, gets the sense info ILLEGAL_REQUEST, and based on it
completes the request to the block layer.

8.2 Actual Path taken

Dispatched:

Times out:

scsi_times_out()
|
+—> scsi_abort_command() – returns SUCCESS
|
+—> queue_delayed_work(abort_work)

Abort Handler:

Request picked up again:

Command is completed with a CHECK_CONDITION:

The SCSI Error handler thread runs to get the sense info, and completes the
request once it is done.

References
==========
The following are excellent sources of references:
Documentation/scsi/scsi_eh.txt

http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf

Original source:

https://lwn.net/Articles/644318/
https://lkml.org/lkml/2015/5/12/853
