{"id":52,"date":"2023-03-15T23:58:44","date_gmt":"2023-03-15T15:58:44","guid":{"rendered":"http:\/\/fudong.tech\/?p=52"},"modified":"2023-03-15T23:58:44","modified_gmt":"2023-03-15T15:58:44","slug":"life-of-a-scsi-command-scsi_cmnd-linux-kernel-4-0","status":"publish","type":"post","link":"http:\/\/fudong.tech\/?p=52","title":{"rendered":"Life of a SCSI Command (scsi_cmnd) &#8212; Linux kernel 4.0"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Excerpted from a patch submitted by Rajat Jain. Note, however, that this document was never merged!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>           ==================================\n           Life of a SCSI Command (scsi_cmnd)\n           ==================================\n\n        Rajat Jain &lt;rajatja@google.com&gt; on 12-May-2015<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">(This document roughly matches the Linux kernel 4.0)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This document describes the various phases of the lifecycle of a SCSI command (struct scsi_cmnd),<br \/>as it flows through different parts of the SCSI mid level driver. It<br \/>describes under what conditions and how a scsi_cmnd may be aborted, or retried,<br \/>or scheduled for error handling, and how it is recovered, and in general how a<br \/>block request is handled by the SCSI mid level driver. It goes into detail about<br \/>which functions get called and the purpose of each one of them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To illustrate, it follows an example scsi_cmnd that goes<br \/>through it all &#8211; timeout, abort, error handling, retry (it also results in<br \/>CHECK_CONDITION and gets sense info). 
The last section traces the path taken by<br \/>this example scsi_cmnd in its lifetime.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">TABLE OF CONTENTS<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">[1] Lifecycle of a scsi_cmnd<br \/>[2] How does a scsi_cmnd get queued to the LLD for processing?<br \/>[3] How does a scsi_cmnd complete?<br \/>[3.1] Command completing via scsi_softirq_done()<br \/>[3.2] Command completing via scsi_times_out()<br \/>[4] SCSI Error Handling<br \/>[4.1] How did we Get here?<br \/>[4.2] When does Error Handling actually run?<br \/>[4.3] SCSI Error Handler thread<br \/>[5] SCSI Commands can be &#8220;hijacked&#8221;<br \/>[6] SCSI Command Aborts<br \/>[6.1] When would mid level try to abort a command?<br \/>[6.2] How SCSI command abort works?<br \/>[6.3] Aborts can fail too<br \/>[7] SCSI command Retries<br \/>[7.1] When would mid level retry a command?<br \/>[7.2] Eligibility criteria for Retry<br \/>[8] Example: Following a scsi_cmnd (that results in CHECK_CONDITION)<br \/>[8.1] High level view of path taken by example scsi_cmnd<br \/>[8.2] Actual Path taken<br \/>[9] References<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Lifecycle of a scsi_cmnd<br \/><br \/>The SCSI mid level interfaces with the block layer just like any other block<br \/>driver. For each block device that the SCSI ML adds to the system, it registers<br \/>a set of functions to serve the corresponding request queue.<br \/><br \/>The following functions are relevant to the scsi_cmnd in its lifetime. Note<br \/>that depending on the situation, it may not go through some of these<br \/>stages, or may have to go through some stages multiple times.<br \/><br \/>scsi_prep_fn()<br \/>is called by the block layer to prepare the request. This<br \/>function actually allocates a new scsi_cmnd for the request (from<br \/>scsi_host-&gt;cmd_pool) and sets it up. 
This is where a scsi_cmnd is &#8220;born&#8221;.<br \/>Note, a new scsi_cmnd is allocated only if the blk req does not already have<br \/>one associated with it (i.e. req-&gt;special is NULL). A req may already have a<br \/>scsi_cmnd (req-&gt;special != NULL) if the req was tried by SCSI earlier, and it resulted in a<br \/>decision to retry later (and hence the req was put back on the queue).<br \/><br \/>scsi_request_fn()<br \/>is the actual function to serve the request queue. It basically checks<br \/>whether the host is ready for new commands, and if so, it submits the command to the<br \/>LLD:<br \/>scsi_request_fn()<br \/>-&gt;scsi_dispatch_cmd()<br \/>-&gt;hostt-&gt;queuecommand()<br \/>In case a scsi_cmnd could not be queued to the LLD for some reason, the req<br \/>is put back on the original request queue (for retry later).<br \/><br \/>scsi_softirq_done()<br \/>is the handler that gets called once the LLD indicates that the command has completed.<br \/>scsi_done()<br \/>-&gt;blk_complete_request()<br \/>-&gt;causes softirq<br \/>-&gt;blk_done_softirq()<br \/>-&gt;scsi_softirq_done()<br \/>The most important goal of this function is to determine the course of<br \/>further action for this req (based on the scsi_cmnd-&gt;result and sense data<br \/>if present), and take that course. The options are to finish off the<br \/>request to the block layer, requeue it to the block layer, or schedule it for error<br \/>handling (if that is deemed necessary). This is discussed in much detail<br \/>later.<br \/><br \/>scsi_times_out()<br \/>is the function that gets called if the LLD does not respond with the<br \/>result of a scsi_cmnd for a long time, and a timeout occurs. It tries<br \/>to see if the situation can be fixed by LLD timeout handlers (if available)<br \/>or by aborting the command. If not, it schedules the command for EH<br \/>(discussed at length later).<br \/><br \/>scsi_unprep_fn()<br \/>is the function that gets called to unprepare the request. 
It is supposed<br \/>to undo whatever scsi_prep_fn() does.<\/li>\n\n\n\n<li>How does a scsi_cmnd get queued to the LLD for processing?<br \/><br \/>The submission part is very simple. Once scsi_request_fn() gets called<br \/>for a request queue and it picks up a new block request via<br \/>blk_peek_request(), the scsi_cmnd has already been set up and is ready to be<br \/>sent to the LLD:<br \/>scsi_request_fn()<br \/>-&gt;scsi_dispatch_cmd()<br \/>-&gt;hostt-&gt;queuecommand()<\/li>\n\n\n\n<li>How does a scsi_cmnd complete?<br \/><br \/>Once a scsi_cmnd is submitted to the LLD, there are only 2 ways it can get<br \/>completed:<br \/>a. Either the LLD responds in time<br \/>(i.e. resulting in scsi_softirq_done() for the command)<br \/>b. Or, the LLD does not respond in time and a timeout occurs<br \/>(i.e. resulting in scsi_times_out() for the command)<br \/><br \/>We discuss both these cases below.<br \/><br \/>Note 1: There may be scsi_cmnd(s) that are retried. But completion of a<br \/>retried scsi_cmnd is not any different from the completion of a new<br \/>scsi_cmnd. Thus irrespective of retries, the scsi_cmnds will always end up<br \/>in one of the above 2 scenarios.<br \/><br \/>Note 2: A scsi_cmnd may be &#8220;hijacked&#8221; during error handling in<br \/>scsi_send_eh_cmnd(), to send one of the EH commands (TUR \/ STU \/<br \/>REQUEST_SENSE). However, the completion of these EH commands does not end up<br \/>in the above two scenarios. This is the only exception. Once the scsi_cmnd is<br \/>&#8220;un-hijacked&#8221;, the result of this original scsi_cmnd will still go through<br \/>the same 2 scenarios.<br \/><br \/>3.1 Command completing via scsi_softirq_done()<br \/><br \/>This is the case when the LLD responded in time, i.e. completed the command.<br \/>Note that here &#8220;completed&#8221; does not mean that the command was successfully<br \/>completed. In fact, the SCSI host hardware<br \/>may have failed without even accepting the command. 
However, the fact that<br \/>scsi_softirq_done() was called indicates that there is a &#8220;result&#8221; available<br \/>in a timely fashion. And we&#8217;ll have to examine this result in order to<br \/>decide the next course of action.<br \/><br \/>scsi_softirq_done()<br \/>|<br \/>+&#8212;&gt; scsi_decide_disposition()<br \/>| Takes a look at the scsi_cmnd-&gt;result and sense data to determine<br \/>| what is the best course of action to take. While reading this<br \/>| function code, one should not confuse SUCCESS as meaning the command<br \/>| was successful, or FAILED to mean the command failed etc. The return<br \/>| value of this function merely indicates the course of action to take<br \/>|<br \/>+&#8212;&gt; case SUCCESS:<br \/>| (Finish off the command to block layer. E.g. the device may be<br \/>| offline, and hence complete the command &#8211; the block layer may retry<br \/>| on its own later, but that doesn&#8217;t concern the SCSI ML)<br \/>| |<br \/>| +&#8212;&gt; scsi_finish_command()<br \/>| |<br \/>| +&#8212;&gt; scsi_io_completion() (see Note 3 below)<br \/>| |<br \/>| +&#8212;&gt; blk_finish_request()<br \/>|<br \/>+&#8212;&gt; case NEEDS_RETRY\/ADD_TO_MLQUEUE:<br \/>| (Requeue the command to request queue. E.g. the device HW was<br \/>| busy, and thus SCSI ML knows that retrying may help)<br \/>| |<br \/>| +&#8212;&gt; scsi_queue_insert()<br \/>| |<br \/>| +&#8212;&gt; blk_requeue_request()<br \/>|<br \/>+&#8212;&gt; case FAILED\/default:<br \/>(Schedule the scsi_cmnd for EH. E.g. there was a bus error that<br \/>might need bus reset. Or we got CHECK_CONDITION and we need to issue<br \/>REQ_SENSE to get more info about the failure, etc.)<br \/>|<br \/>+&#8212;&gt; scsi_eh_scmd_add()<br \/>Add scsi_cmnd to the host EH queue<br \/>scsi_eh_wakeup()<br \/><br \/>Note 3:<br \/>scsi_io_completion() has secondary logic similar to<br \/>scsi_decide_disposition() in that it also looks at the result and sense data<br \/>and figures out what to do with the request. 
It makes similar choices on the<br \/>course of action to take. There is a special case in this function that<br \/>involves &#8220;unprepping&#8221; a scsi_cmnd before requeuing it, and we&#8217;ll discuss<br \/>it in the sections below.<br \/><br \/>3.2 Command completing via scsi_times_out()<br \/><br \/>This happens when the LLD does not respond in time, the block layer times<br \/>out, and as a result calls the timeout function for the request queue for<br \/>the SCSI device in question.<br \/><br \/>scsi_times_out()<br \/>|<br \/>+&#8212;&gt; scsi_transport_template-&gt;eh_timed_out() &#8211; Successful? If not\u2026<br \/>| (Gives transportt a chance to deal with it)<br \/>|<br \/>+&#8212;&gt; scsi_host_template-&gt;eh_timed_out() &#8211; Successful? If not\u2026<br \/>| (Gives hostt a chance to deal with it)<br \/>|<br \/>+&#8212;&gt; scsi_abort_command() &#8211; Successful? If not\u2026<br \/>| (Schedule an ABORT of the scsi_cmnd. The abort handler will also<br \/>| requeue it if needed)<br \/>|<br \/>+&#8212;&gt; scsi_eh_scmd_add()<br \/>(Schedule the scsi_cmnd for EH. This&#8217;ll definitely work, because if it<br \/>doesn&#8217;t, the EH handler will mark the device as offline, which<br \/>counts as a good fix :-))<\/li>\n\n\n\n<li>SCSI Error Handling<br \/><br \/>SCSI error handling should be thought of as the action the mid level decides to<br \/>take when it knows that merely retrying a request may not help, and it needs<br \/>to do something else (possibly disruptive) in order to fix the issue.<br \/>E.g. a stalled host may require a host reset, and only after that may a retry of<br \/>the request complete.<br \/><br \/>Note 4:<br \/>(Random thoughts): Contrast &#8220;Error Handling&#8221; with &#8220;Retries&#8221;. A retry<br \/>is a normal thing to do when the mid level believes that it has seen an<br \/>error which is transient in nature, and will go away on its own without<br \/>explicitly doing anything. Thus retrying the request makes sense in<br \/>this case. 
(On the other hand, a cmnd is scheduled for EH when the mid level knows<br \/>that it needs to do &#8220;something&#8221; before retrying the cmnd can give good<br \/>results).<br \/><br \/>Note 5:<br \/>The SCSI mid level maintains a (per-host) list of all the scsi_cmnd(s)<br \/>that have been scheduled for EH at that host using scsi_host-&gt;eh_cmd_q.<br \/>This is the list that gets processed by the EH thread when it runs.<br \/><br \/>4.1 How did we Get here?<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">A scsi_cmnd could be marked for EH in the following cases:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The command &#8220;error completed&#8221; i.e. scsi_decide_disposition() returned<br \/>FAILED or something that indicates a failure that requires some sort of<br \/>error recovery. E.g. device hardware failed, or we have a CHECK_CONDITION.<br \/>scsi_softirq_done()<br \/>-&gt;scsi_decide_disposition() returns FAILED<br \/>-&gt;scsi_eh_scmd_add()<\/li>\n\n\n\n<li>A scsi_cmnd times out, and the attempt to abort it fails.<br \/>scsi_times_out()<br \/>-&gt;scsi_abort_command() != SUCCESS<br \/>-&gt;scsi_eh_scmd_add()<br \/><br \/>4.2 When does Error Handling actually run?<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">A SCSI error handler thread is scheduled whenever there is a scsi_cmnd that<br \/>is marked for EH (inserted in the Scsi_Host-&gt;eh_cmd_q). Once a scsi_cmnd is<br \/>marked for EH, the ML does not accept any more scsi_cmnds for that<br \/>particular Scsi_Host. However, the EH thread does not actually run until all<br \/>the pending IOs to the LLD for that particular Scsi_Host have either<br \/>completed or failed. 
In other words, the only commands pending at the LLD<br \/>for that host are the ones that need EH (host_busy == host_failed).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The idea is to quiesce the bus, so that the EH thread can recover the devices,<br \/>as it may need to reset different components in order to do its job.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">4.3 SCSI Error Handler thread<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_error_handler()<br \/>|<br \/>+&#8212;&gt; transportt-&gt;eh_strategy_handler() if exists, else\u2026<br \/>| (Use transportt&#8217;s own error recovery handler, if available)<br \/>|<br \/>+&#8212;&gt; scsi_unjam_host()<br \/>| (The SCSI ML error handler described below. Also described in<br \/>| Documentation\/scsi\/scsi_eh.txt. The basic goal is to do whatever<br \/>| is needed to recover from the current error condition, and to requeue the<br \/>| eligible commands after recovery)<br \/>|<br \/>+&#8212;&gt; scsi_restart_operations()<br \/>(Restart the operations of the SCSI request queue)<br \/>|<br \/>+&#8212;&gt; scsi_run_host_queues()<br \/>|<br \/>+&#8212;&gt; scsi_run_queue()<br \/>|<br \/>+&#8212;&gt; blk_run_queue()<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_unjam_host()<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">The idea is to create 2 lists: work_q, done_q.<br \/>Initially, work_q holds the commands taken from the host&#8217;s eh_cmd_q (see Note 5), and done_q is empty (NULL).<br \/>Then error-handle all the requests in work_q by taking actions of sequentially<br \/>higher severity that may recover the cmnd or device. 
Keep<br \/>moving the requests from work_q to done_q and in the end finish them all<br \/>in one go rather than individually finishing them up.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_unjam_host()<br \/>|<br \/>+&#8211;&gt; Create 2 lists: work_q, done_q<br \/>| work_q = (commands taken from eh_cmd_q), done_q = NULL<br \/>|<br \/>+&#8211;&gt; scsi_eh_get_sense() &#8211; Are we done? If not\u2026<br \/>| (For the commands that have CHECK_CONDITION, get sense_info)<br \/>| |<br \/>| +&#8211;&gt; scsi_request_sense()<br \/>| | (Use scsi_send_eh_cmnd() to send a &#8220;hijacked&#8221; REQ_SENSE cmnd)<br \/>| |<br \/>| +&#8211;&gt; scsi_decide_disposition()<br \/>| |<br \/>| +&#8211;&gt; Arrange to finish the scsi_cmnd if SUCCESS (by setting<br \/>| retries=allowed)<br \/>|<br \/>+&#8211;&gt; scsi_eh_abort_cmds() &#8211; Are we done? If not\u2026<br \/>| (Abort the commands that had timed out)<br \/>| |<br \/>| +&#8211;&gt; scsi_try_to_abort_cmd()<br \/>| | (Results in call to hostt-&gt;eh_abort_handler() which is responsible for<br \/>| | making the LLD and the HW forget about the scsi_cmnd)<br \/>| |<br \/>| +&#8211;&gt; scsi_eh_test_devices()<br \/>| (Test if the device is responding now by sending appropriate EH<br \/>| commands (STU \/ TEST_UNIT_READY). Again, sending these EH<br \/>| commands involves hijacking the original scsi_cmnd, and later<br \/>| restoring the context)<br \/>|<br \/>+&#8211;&gt; scsi_eh_ready_devs() &#8211; Are we done? If not\u2026<br \/>| (Take actions of increasing severity in order to recover)<br \/>| |<br \/>| +&#8211;&gt; scsi_eh_bus_device_reset()<br \/>| | (Reset the scsi_device. Results in call to<br \/>| | hostt-&gt;eh_device_reset_handler())<br \/>| |<br \/>| +&#8211;&gt; scsi_eh_target_reset()<br \/>| | (Reset the scsi_target. Results in call to<br \/>| | hostt-&gt;eh_target_reset_handler())<br \/>| |<br \/>| +&#8211;&gt; scsi_eh_bus_reset()<br \/>| | (Reset the SCSI bus. 
Results in call to<br \/>| | hostt-&gt;eh_bus_reset_handler())<br \/>| |<br \/>| +&#8211;&gt; scsi_eh_host_reset()<br \/>| | (Reset the Scsi_Host. Results in call to<br \/>| | hostt-&gt;eh_host_reset_handler())<br \/>| |<br \/>| +&#8211;&gt; If nothing has worked &#8211; scsi_eh_offline_sdevs()<br \/>| (The device is not recoverable, put it offline)<br \/>|<br \/>+&#8211;&gt; scsi_eh_flush_done_q()<br \/>(For all the EH commands on the done_q, either requeue them (via<br \/>scsi_queue_insert()) if eligible, or finish them up to block layer<br \/>(via scsi_finish_command()))<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Note 6:<br \/>At each recovery stage we test if we are done (using<br \/>scsi_eh_test_devices()), and take the next, more severe action only if needed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Note 7:<br \/>The error handler takes care that for multiple scsi_cmnds that can be<br \/>recovered by resetting the same component (e.g. same scsi_device), the<br \/>component is reset only once.<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"5\">\n<li>SCSI Commands can be &#8220;hijacked&#8221;<br \/><br \/>As seen above, the EH thread may need to send some EH commands in order to<br \/>check the health and responsiveness of the SCSI device:<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>TUR &#8211; Test Unit Ready<\/li>\n\n\n\n<li>STU &#8211; Start \/ Stop Unit<\/li>\n\n\n\n<li>REQUEST_SENSE &#8211; To get the Sense data in response to CHECK_CONDITION<br \/><br \/>However, instead of allocating and setting up a new scsi_cmnd for such<br \/>temporary purposes, the EH thread hijacks the current scsi_cmnd that it is<br \/>trying to recover, in order to send the EH commands. This whole process is<br \/>done in scsi_send_eh_cmnd().<br \/><br \/>scsi_send_eh_cmnd() saves the context of the current command before hijacking<br \/>it, replaces the scsi_done ptr with its own before dispatching it to the LLD,<br \/>and restores the context later once it is done. 
The EH commands sent in this<br \/>manner are subject to the same problems of timeouts \/ abort failures \/<br \/>completions &#8211; but they do not take the route taken by normal commands (i.e.<br \/>don&#8217;t take the scsi_softirq_done() or scsi_times_out() route).<br \/>Everything is handled within scsi_send_eh_cmnd(). This is discussed in the following<br \/>sections.<\/li>\n<\/ul>\n\n\n\n<ol class=\"wp-block-list\" start=\"6\">\n<li>SCSI Command Aborts<br \/><br \/>An abort refers to the scenario where the SCSI mid level wants to have the LLD<br \/>and the hardware below it forget everything about a scsi_cmnd that<br \/>was given to the LLD earlier. The most common reason is that the LLD failed<br \/>to respond in time.<br \/><br \/>6.1 When would mid level try to abort a command?<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">The SCSI ML may try to abort a scsi_cmnd in the following conditions:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The SCSI mid layer times out on a command, and tries to abort it.<br \/>scsi_times_out()<br \/>-&gt; scsi_abort_command()<br \/>What happens if this abort fails? Schedule the command for EH.<\/li>\n\n\n\n<li>The EH thread tries to abort all the pending commands while trying to<br \/>unjam a host.<br \/>scsi_unjam_host()<br \/>-&gt; scsi_eh_abort_cmds()<br \/>What happens if this abort fails? We move to higher severity recovery<br \/>steps (start resetting HW components etc) because that is likely to make<br \/>both the LLD and the HW forget about those commands.<\/li>\n\n\n\n<li>This is a nasty one. During error recovery, the EH thread may &#8220;hijack&#8221;<br \/>a scsi_cmnd to send an EH command (TUR\/STU\/REQ_SENSE) to the LLD using<br \/>scsi_send_eh_cmnd(). If such a &#8220;hijacked&#8221; EH command times out, the SCSI<br \/>EH thread will try to abort it.<br \/>scsi_send_eh_cmnd()<br \/>-&gt; scsi_abort_eh_cmnd()<br \/>-&gt; scsi_try_to_abort_cmd()<br \/>What happens if this abort fails? 
Similar to the previous case,<br \/>scsi_abort_eh_cmnd() will try to take higher severity actions (reset bus<br \/>etc) but will not send EH commands such as TUR again in order to<br \/>verify if the devices started to respond.<br \/><br \/>6.2 How SCSI command abort works?<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">Unlike EH commands such as TUR, the ABORT is not a SCSI command that the mid layer<br \/>sends to the LLD. The LLD provides an eh_abort_handler() function<br \/>pointer that is used to abort the command. It is up to the LLD to do<br \/>whatever is needed to abort the command. This may require sending some<br \/>proprietary command to the HW, or fiddling with some bits, or doing whatever magic<br \/>is necessary.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">6.3 Aborts can fail too<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">As with other things, abort attempts can also fail. The SCSI mid layer handles<br \/>each such failure as described in section 6.1 above.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Note 8:<br \/>Once the block layer hands off a command to the SCSI subsystem, there is<br \/>currently no way for the block layer to cancel \/ abort a request. This needs<br \/>some work.<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"7\">\n<li>SCSI command Retries<br \/><br \/>The SCSI mid level maintains no queues for the SCSI commands it is processing<br \/>(other than the EH command queue). Thus whenever the SCSI ML thinks it needs<br \/>to retry a command, it requeues the request to the corresponding request<br \/>queue, so that the retries will be made &#8220;naturally&#8221; when the request function<br \/>picks up the next request for processing.<br \/><br \/>When requeuing such requests to the request queue, they are put at the<br \/>head so that they go before the other (existing) requests in that request<br \/>queue. 
<br \/><br \/>7.1 When would mid level retry a command?<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">The following conditions cause a SCSI command to be retried<br \/>(by putting the blk request back on the request queue):<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The mid layer times out on a scsi_cmnd, aborts it successfully, and requeues<br \/>it.<br \/>scsi_times_out()<br \/>-&gt; scsi_abort_command()<br \/>-&gt; schedules scmd_eh_abort_handler()<br \/>-&gt; scsi_queue_insert()<br \/>-&gt; blk_requeue_request()<\/li>\n\n\n\n<li>The EH thread, after recovering a host, requeues all the scsi_cmnds that<br \/>are eligible for a retry:<br \/>scsi_error_handler()<br \/>-&gt; scsi_unjam_host()<br \/>-&gt; scsi_eh_flush_done_q()<br \/>-&gt; scsi_queue_insert()<br \/>-&gt; blk_requeue_request()<\/li>\n\n\n\n<li>The LLD completes the scsi_cmnd, and scsi_decide_disposition() looks at the<br \/>scsi_cmnd-&gt;result and decides it needs to be retried (e.g. because the<br \/>bus was busy).<br \/>scsi_softirq_done()<br \/>-&gt; scsi_decide_disposition() returns NEEDS_RETRY<br \/>-&gt; scsi_queue_insert()<br \/>-&gt; blk_requeue_request()<\/li>\n\n\n\n<li>In scsi_request_fn(), the SCSI ML finds out that the host is busy and<br \/>the scsi_cmnd could not be sent to the LLD, hence it requeues the req<br \/>back on the queue.<br \/>scsi_request_fn()<br \/>-&gt; not_ready: label<br \/>-&gt; blk_requeue_request()<\/li>\n\n\n\n<li>scsi_finish_command() is called from a variety of places to finish<br \/>off a request to the block layer. However, it calls scsi_io_completion(),<br \/>which may look at the request and decide to retry it (if it qualifies).<br \/>scsi_finish_command()<br \/>-&gt; scsi_io_completion()<br \/>-&gt; __scsi_queue_insert()<br \/>-&gt; blk_requeue_request()<br \/><br \/>Note 9:<br \/>Case 5 above has a very special sub-case. 
There may be some cases where<br \/>scsi_io_completion() decides that a blk request has to be retried,<br \/>but the scsi_cmnd for this req should be released and a new<br \/>scsi_cmnd should be allocated and used for this request at the next<br \/>retry. This can happen, e.g., if it sees an ILLEGAL REQUEST as a<br \/>response to a READ10 command, and thinks that it may be because the<br \/>device supports only READ6. Thus it may make sense to switch to READ6<br \/>(hence a new scsi_cmnd) at the time of the next retry.<br \/><br \/>7.2 Eligibility criteria for Retry<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">Note that the SCSI mid level always checks for retry eligibility before it goes<br \/>ahead and requeues the command for a retry. The eligibility criteria for a<br \/>scsi_cmnd include (some of these may not apply in all situations described<br \/>above):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>retries &lt; allowed (Num of retries should be less than allowed retries)<\/li>\n\n\n\n<li>no more than host-&gt;eh_deadline jiffies spent in EH.<\/li>\n\n\n\n<li>scsi_noretry_cmd() should return 0 for the command.<\/li>\n\n\n\n<li>scsi_device must be online<\/li>\n\n\n\n<li>req-&gt;timeout must not have expired<\/li>\n\n\n\n<li>etc.<\/li>\n<\/ul>\n\n\n\n<ol class=\"wp-block-list\" start=\"8\">\n<li>Example: Following a scsi_cmnd<br \/><br \/>8.1 High level view of path taken by example scsi_cmnd<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">We take the example of a block request that wants to read a<br \/>block off a SCSI disk, but the LBA is out of range for the<br \/>current device (hypothetically). The ML submits it to the LLD, but the HW takes<br \/>the command and chokes on it (again hypothetically, to trace through the<br \/>abort sequence). 
So the timeout happens and the ML aborts the<br \/>command, and requeues it. In the next run, the LLD completes the command<br \/>with CHECK_CONDITION. We assume that the SCSI host does not automatically<br \/>get the sense info. The ML schedules the cmnd for EH. The EH thread sends<br \/>REQUEST_SENSE, gets the sense info (ILLEGAL_REQUEST), and based on it<br \/>completes the request to the block layer.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">8.2 Actual Path taken<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">Dispatched:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_request_fn()<br \/>|<br \/>+&#8212;&gt; blk_peek_request()<br \/>| |<br \/>| +&#8212;&gt; scsi_prep_fn()<br \/>| (Allocate and set up scsi_cmnd)<br \/>|<br \/>+&#8212;&gt; scsi_dispatch_cmd()<br \/>|<br \/>+&#8212;&gt; hostt-&gt;queuecommand()<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Times out:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_times_out()<br \/>|<br \/>+&#8212;&gt; scsi_abort_command() &#8211; returns SUCCESS<br \/>|<br \/>+&#8212;&gt; queue_delayed_work(abort_work)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Abort Handler:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scmd_eh_abort_handler()<br \/>|<br \/>+&#8212;&gt; scsi_try_to_abort_cmd() &#8211; returns SUCCESS<br \/>| |<br \/>| +&#8212;&gt; hostt-&gt;eh_abort_handler()<br \/>|<br \/>+&#8212;&gt; scsi_queue_insert()<br \/>|<br \/>+&#8212;&gt; __scsi_queue_insert()<br \/>|<br \/>+&#8212;&gt; blk_requeue_request()<br \/>(the req is requeued, with req-&gt;special pointing<br \/>to scsi_cmnd)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Request picked up again:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_request_fn()<br \/>|<br \/>+&#8212;&gt; blk_peek_request()<br \/>| (req-&gt;cmd_flags has REQ_DONTPREP set, so does not call<br \/>| scsi_prep_fn() again)<br \/>|<br \/>+&#8212;&gt; scsi_dispatch_cmd()<br \/>|<br \/>+&#8212;&gt; 
hostt-&gt;queuecommand()<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Command is completed with a CHECK_CONDITION:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_softirq_done()<br \/>|<br \/>+&#8212;&gt; scsi_decide_disposition()<br \/>| (Sees the CHECK_CONDITION)<br \/>| |<br \/>| +&#8212;&gt; scsi_check_sense() &#8211; returns FAILED<br \/>| |<br \/>| +&#8212;&gt; scsi_command_normalize_sense()<br \/>| (Fails to find valid sense data)<br \/>|<br \/>+&#8212;&gt; case FAILED:<br \/>|<br \/>+&#8212;&gt; scsi_eh_scmd_add()<br \/>Add scsi_cmnd to the host EH queue<br \/>|<br \/>+&#8212;&gt; scsi_eh_wakeup()<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The SCSI Error handler thread runs to get the sense info, and completes the<br \/>request once it is done.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">scsi_error_handler()<br \/>|<br \/>+&#8212;&gt; scsi_unjam_host()<br \/>|<br \/>+&#8212;&gt; scsi_eh_get_sense()<br \/>| |<br \/>| +&#8212;&gt; scsi_request_sense()<br \/>| | |<br \/>| | +&#8212;&gt; scsi_send_eh_cmnd()<br \/>| | (Hijacks the scsi_cmnd to send the EH command)<br \/>| | |<br \/>| | +&#8211;&gt; scsi_eh_prep_cmnd()<br \/>| | | (saves the context of the existing scsi_cmnd,<br \/>| | | allocates a sense buffer, and sets up the<br \/>| | | scsi_cmnd for REQUEST_SENSE)<br \/>| | |<br \/>| | +&#8211;&gt; hostt-&gt;queuecommand(), and then wait\u2026<br \/>| | | (gets the sense data for the cmnd)<br \/>| | |<br \/>| | +&#8211;&gt; scsi_eh_completed_normally() &#8211; returns SUCCESS<br \/>| | |<br \/>| | +&#8211;&gt; scsi_eh_restore_cmnd()<br \/>| | (restores the context of original scsi_cmnd)<br \/>| |<br \/>| +&#8212;&gt; scsi_decide_disposition() &#8211; returns SUCCESS<br \/>| | (This time can see the sense info)<br \/>| |<br \/>| +&#8212;&gt; Set scmd-&gt;retries = scmd-&gt;allowed (to avoid retries)<br \/>| |<br \/>| +&#8212;&gt; scsi_eh_finish_cmd()<br \/>| (Puts the scsi_cmnd on the done_q)<br \/>|<br \/>+&#8212;&gt; scsi_eh_flush_done_q()<br \/>(Sees that 
scsi_cmnd is not eligible for retries)<br \/>|<br \/>+&#8212;&gt; scsi_finish_command()<br \/>|<br \/>+&#8212;&gt; scsi_io_completion()<br \/>|<br \/>+&#8212;&gt; scsi_end_request()<br \/>|<br \/>+&#8212;&gt; scsi_put_command()<br \/>(Releases the scsi_cmnd)<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"9\">\n<li>References<br \/><br \/>The following are excellent references:<br \/>Documentation\/scsi\/scsi_eh.txt<\/li>\n<\/ol>\n\n\n\n<h2 class=\"has-small-font-size wp-block-heading\">http:\/\/events.linuxfoundation.org\/sites\/events\/files\/slides\/SCSI-EH.pdf<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Original source:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">https:\/\/lwn.net\/Articles\/644318\/<br \/>https:\/\/lkml.org\/lkml\/2015\/5\/12\/853<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Excerpted from a patch submitted by Rajat Jain. Note, however, that this document was never merged! (This document  
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[],"class_list":["post-52","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"http:\/\/fudong.tech\/index.php?rest_route=\/wp\/v2\/posts\/52","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/fudong.tech\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/fudong.tech\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/fudong.tech\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/fudong.tech\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=52"}],"version-history":[{"count":0,"href":"http:\/\/fudong.tech\/index.php?rest_route=\/wp\/v2\/posts\/52\/revisions"}],"wp:attachment":[{"href":"http:\/\/fudong.tech\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=52"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/fudong.tech\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=52"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/fudong.tech\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=52"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}