Author: ghad_wp_uadname

  • YUM repository problems on the free (UFU) edition of UnionTech UOS

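    I recently wanted to get to know UOS v20, so I downloaded the free UFU edition from the official site and installed it in a virtual machine for testing. Running a dnf/yum update, however, fails with HTTP 401 errors.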

    The Anolis-based edition (uniontechos-server-20-1050a-amd64-UFU.iso) reports the following errors:

    # dnf update
    UniontechOS 20 AppStream                                                                                                                                                                                173  B/s | 172  B     00:00
    Errors during downloading metadata for repository 'UniontechOS-20-AppStream':
      - Status code: 401 for https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/AppStream/x86_64/repodata/repomd.xml (IP: 61.54.25.98)
    Error: Failed to download metadata for repo 'UniontechOS-20-AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
    UniontechOS 20 BaseOS                                                                                                                                                                                   186  B/s | 172  B     00:00
    Errors during downloading metadata for repository 'UniontechOS-20-BaseOS':
      - Status code: 401 for https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/BaseOS/x86_64/repodata/repomd.xml (IP: 61.54.25.98)
    Error: Failed to download metadata for repo 'UniontechOS-20-BaseOS': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
    UniontechOS 20 UFU                                                                                                                                                                                      190  B/s | 172  B     00:00
    Errors during downloading metadata for repository 'UniontechOS-20-UFU':
      - Status code: 401 for https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/UFU/x86_64/repodata/repomd.xml (IP: 61.54.25.98)
    Error: Failed to download metadata for repo 'UniontechOS-20-UFU': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
    Ignoring repositories: UniontechOS-20-AppStream, UniontechOS-20-BaseOS, UniontechOS-20-UFU
    Dependencies resolved.
    Nothing to do.
    Complete!
    

    The openEuler-based edition (uniontechos-server-20-1050e-amd64-UFU.iso) reports:

    # dnf update
    UnionTechOS-Server-20-1050-UFU                                                                                                                                                                          184  B/s | 172  B     00:00
    Errors during downloading metadata for repository 'UnionTechOS-Server-20-UFU':
      - Status code: 401 for https://euler-packages.chinauos.com/server-euler/fuyu/1050/UFU/x86_64/repodata/repomd.xml (IP: 61.54.25.98)
    Error: Failed to download metadata for repo 'UnionTechOS-Server-20-UFU': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
    

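    Judging from the output above, both failures come from the baseurl values the repositories point to. I had not modified any file before running the test, so I inspected the shipped repo files:

    Anolis-based edition: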
    # ls /etc/yum.repos.d/
    UniontechOS.repo
    
    # cat /etc/yum.repos.d/UniontechOS.repo
    [UniontechOS-$releasever-AppStream]
    name = UniontechOS $releasever AppStream
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/AppStream/$basearch
    enabled = 1
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-BaseOS]
    name = UniontechOS $releasever BaseOS
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/BaseOS/$basearch
    enabled = 1
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-UFU]
    name = UniontechOS $releasever UFU
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/UFU/$basearch
    enabled = 1
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-PowerTools]
    name = UniontechOS $releasever PowerTools
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/PowerTools/$basearch
    enabled = 0
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-Plus]
    name = UniontechOS $releasever Plus
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/Plus/$basearch
    enabled = 0
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-Extras]
    name = UniontechOS $releasever Extras
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/Extras/$basearch
    enabled = 0
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-Update]
    name = UniontechOS $releasever Update
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/Update/$basearch
    enabled = 0
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-HA]
    name = UniontechOS $releasever HighAvailability
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/HighAvailability/$basearch
    enabled = 0
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    [UniontechOS-$releasever-OpenStack-U]
    name = UniontechOS $releasever OpenStack-Ussuri
    baseurl = https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/1050/OpenStack-U/$basearch
    enabled = 0
    username=$auth_u
    password=$auth_p
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-uos-release
    gpgcheck = 0
    skip_if_unavailable = 1
    
    openEuler-based edition:
    # ls /etc/yum.repos.d/
    UnionTechOS-everything-x86_64.repo  UnionTechOS-modular-x86_64.repo  UnionTechOS-UFU-x86_64.repo  UnionTechOS-update-x86_64.repo  UnionTechOS-x86_64.repo
    
    # cat /etc/yum.repos.d/UnionTechOS-UFU-x86_64.repo
    #Copyright (c) [2019] Huawei Technologies Co., Ltd.
    #generic-repos is licensed under the Mulan PSL v1.
    #You can use this software according to the terms and conditions of the Mulan PSL v1.
    #You may obtain a copy of Mulan PSL v1 at:
    #    http://license.coscl.org.cn/MulanPSL
    #THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR
    #IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR
    #PURPOSE.
    #See the Mulan PSL v1 for more details.
    
    [UnionTechOS-Server-20-UFU]
    name=UnionTechOS-Server-20-$releasever-UFU
    baseurl=https://euler-packages.chinauos.com/server-euler/fuyu/$releasever/UFU/$basearch
    enabled=1
    gpgcheck=1
    gpgkey=https://euler-packages.chinauos.com/server-euler/fuyu/$releasever/UFU/$basearch/RPM-GPG-KEY-UnionTech
    username=$auth_u
    password=$auth_p
    
    
    
    # cat /etc/yum.repos.d/UnionTechOS-x86_64.repo
    #Copyright (c) [2019] Huawei Technologies Co., Ltd.
    #generic-repos is licensed under the Mulan PSL v1.
    #You can use this software according to the terms and conditions of the Mulan PSL v1.
    #You may obtain a copy of Mulan PSL v1 at:
    #    http://license.coscl.org.cn/MulanPSL
    #THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR
    #IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR
    #PURPOSE.
    #See the Mulan PSL v1 for more details.
    
    [UnionTechOS-Server-20]
    name=UnionTechOS-Server-20-$releasever
    baseurl=https://euler-packages.chinauos.com/server-euler/fuyu/$releasever/OS/$basearch
    enabled=1
    gpgcheck=1
    gpgkey=https://euler-packages.chinauos.com/server-euler/fuyu/$releasever/OS/$basearch/RPM-GPG-KEY-UnionTech
    username=$auth_u
    password=$auth_p
    
    ....

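    Opening the URLs above directly in a browser likewise returns 401 Authorization Required. Note that every URL for the Anolis-based edition sits under https://enterprise-c-packages.chinauos.com/server-enterprise-c/kongzi/*** , while those for the openEuler-based edition sit under https://euler-packages.chinauos.com/server-euler/fuyu/*** .

    A quick search shows that this is a known issue on the official forum: https://bbs.chinauos.com/en/post/14401 . Trying the method mentioned there on the Anolis-based edition confirms that yum-utils has to be installed first: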

    # rpm -ivh https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/Extras/x86_64/Packages/UnionTech-repos-ufu-1-2.uelc20.x86_64.rpm
    Retrieving https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/Extras/x86_64/Packages/UnionTech-repos-ufu-1-2.uelc20.x86_64.rpm
    warning: /var/tmp/rpm-tmp.0DMq1e: Header V4 RSA/SHA256 Signature, key ID 8df595ed: NOKEY
    error: Failed dependencies:
    	yum-utils is needed by UnionTech-repos-ufu-1-2.uelc20.x86_64
    

    Under the same URL prefix, the latest yum-utils is https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/BaseOS/x86_64/Packages/yum-utils-4.0.21-11.uelc20.02.noarch.rpm , published 07-Apr-2023 10:07. The repo configuration package is still https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/Extras/x86_64/Packages/UnionTech-repos-ufu-1-2.uelc20.x86_64.rpm , published 07-Apr-2023 18:30. Installing these latest versions (as of 28 May 2023) directly fails, because the dependency packages shipped on the ISO are too old to satisfy their requirements:

    # rpm -ivh https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/Extras/x86_64/Packages/UnionTech-repos-ufu-1-2.uelc20.x86_64.rpm https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/BaseOS/x86_64/Packages/yum-utils-4.0.21-11.uelc20.02.noarch.rpm
    Retrieving https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/Extras/x86_64/Packages/UnionTech-repos-ufu-1-2.uelc20.x86_64.rpm
    Retrieving https://enterprise-c-packages.chinauos.com/server-enterprise-c/ufu/kongzi/1050/BaseOS/x86_64/Packages/yum-utils-4.0.21-11.uelc20.02.noarch.rpm
    warning: /var/tmp/rpm-tmp.DYnYHL: Header V4 RSA/SHA256 Signature, key ID 8df595ed: NOKEY
    error: Failed dependencies:
    	dnf >= 4.7.0-6 is needed by yum-utils-4.0.21-11.uelc20.02.noarch
    	dnf-plugins-core = 4.0.21-11.uelc20.02 is needed by yum-utils-4.0.21-11.uelc20.02.noarch
    	python3-dnf >= 4.7.0-6 is needed by yum-utils-4.0.21-11.uelc20.02.noarch
    

    On the openEuler-based edition, I found by experiment that using rpm to upgrade the installed UnionTech-repos-1.0-3.6.UFU.02.x86_64.rpm to https://euler-packages.chinauos.com/server-euler/ufu/fuyu/1050/everything/x86_64/Packages/UnionTech-repos-ufu-1-2.uel20.x86_64.rpm lets dnf run normally.

    # rpm -ivh https://euler-packages.chinauos.com/server-euler/ufu/fuyu/1050/UFU/x86_64/Packages/UnionTech-repos-1.0-3.6.UFU.02.x86_64.rpm
    Retrieving https://euler-packages.chinauos.com/server-euler/ufu/fuyu/1050/UFU/x86_64/Packages/UnionTech-repos-1.0-3.6.UFU.02.x86_64.rpm
    warning: /var/tmp/rpm-tmp.afssOG: Header V4 RSA/SHA256 Signature, key ID 8df595ed: NOKEY
    Verifying...                          ################################# [100%]
    Preparing...                          ################################# [100%]
    	package UnionTech-repos-1:1.0-3.6.UFU.02.x86_64 is already installed
    
    
    # rpm -Uvh https://euler-packages.chinauos.com/server-euler/ufu/fuyu/1050/everything/x86_64/Packages/UnionTech-repos-ufu-1-2.uel20.x86_64.rpm
    Retrieving https://euler-packages.chinauos.com/server-euler/ufu/fuyu/1050/everything/x86_64/Packages/UnionTech-repos-ufu-1-2.uel20.x86_64.rpm
    warning: /var/tmp/rpm-tmp.3yJpS8: Header V4 RSA/SHA256 Signature, key ID 8df595ed: NOKEY
    Verifying...                          ################################# [100%]
    Preparing...                          ################################# [100%]
    Updating / installing...
       1:UnionTech-repos-ufu-1-2.uel20    ################################# [ 50%]
    Cleaning up / removing...
       2:UnionTech-repos-1:1.0-3.6.UFU.02 ################################# [100%]
    
    # dnf update
    UnionTechOS 1050 OS                                                                                                                                                                                     1.8 MB/s | 5.3 MB     00:02
    UnionTechOS 1050 everything                                                                                                                                                                             1.7 MB/s |  26 MB     00:15
    UnionTechOS 1050 UFU                                                                                                                                                                                    119 kB/s |  30 kB     00:00
    Dependencies resolved.
    ========================================================================================================================================================================================================================================
     Package                                                     Architecture                       Version                                                                   Repository                                               Size
    ========================================================================================================================================================================================================================================
    Installing:
     kernel                                                      x86_64                             4.19.90-2211.5.0.0178.22.uel20                                            UnionTechOS-1050-OS                                      47 M
    Upgrading:
     NetworkManager                                              x86_64                             1:1.26.2-13.uel20                                                         UnionTechOS-1050-OS                                     2.0 M
     NetworkManager-config-server                                noarch                             1:1.26.2-13.uel20                                                         UnionTechOS-1050-OS                                     8.9 k
     NetworkManager-help                                         noarch                             1:1.26.2-13.uel20                                                         UnionTechOS-1050-OS                                     835 k
     NetworkManager-libnm                                        x86_64                             1:1.26.2-13.uel20                                                         UnionTechOS-1050-OS                                     1.6 M
     PackageKit                                                  x86_64                             1.1.12-10.up3.uel20                                                       UnionTechOS-1050-OS                                     566 k
    ....
     rpm-plugin-systemd-inhibit                                  x86_64                             4.15.1-43.uel20                                                           UnionTechOS-1050-OS                                      14 k
    
    Transaction Summary
    ========================================================================================================================================================================================================================================
    Install   20 Packages
    Upgrade  564 Packages
    
    Total download size: 702 M
    Is this ok [y/N]:
    

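    So, for the Anolis-based edition, you can simply back up the repo file and edit it: replace **/server-enterprise-c/kongzi/** with **/server-enterprise-c/ufu/kongzi/** in the existing file, after which dnf/yum can update the system or install additional packages. For the openEuler-based edition, replace **/server-euler/fuyu/** with **/server-euler/ufu/fuyu/** instead.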
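
    The path substitution described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `add_ufu_segment` and `patch_repo_file` are made up for this example, and the backup is written next to the original with a `.bak` suffix, on the assumption that dnf only reads files ending in `.repo`.

```python
from pathlib import Path
import shutil

# The two substitutions described in the post: insert "ufu" after the product prefix.
REWRITES = {
    "/server-enterprise-c/kongzi/": "/server-enterprise-c/ufu/kongzi/",  # Anolis-based
    "/server-euler/fuyu/": "/server-euler/ufu/fuyu/",                    # openEuler-based
}


def add_ufu_segment(repo_text: str) -> str:
    """Return repo file text with the UFU path segment inserted.

    Plain substring replacement is idempotent here: an already patched
    path no longer contains the old substring.
    """
    for old, new in REWRITES.items():
        repo_text = repo_text.replace(old, new)
    return repo_text


def patch_repo_file(path: Path) -> bool:
    """Back up and rewrite one .repo file; return True if it was changed."""
    original = path.read_text()
    patched = add_ufu_segment(original)
    if patched == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # hypothetical backup name
    path.write_text(patched)
    return True


if __name__ == "__main__":
    # Needs root on a real system; the directory is the one listed in the post.
    for repo in sorted(Path("/etc/yum.repos.d").glob("*.repo")):
        print(repo, "patched" if patch_repo_file(repo) else "unchanged")
```

    On the openEuler-based edition the same substitution also rewrites the gpgkey URL, because it shares the /server-euler/fuyu/ prefix.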

  • Life of a SCSI Command (scsi_cmnd), based on Linux kernel v4.0

                ==================================
                Life of a SCSI Command (scsi_cmnd)
                ==================================
    
             Rajat Jain <rajatja@google.com> on 12 May 2015

    Chinese translation dated 15 March 2023

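    | &&& Translator's notes &&& |
    | The original of this document was submitted to the upstream community, but it does not appear in the current kernel tree. |
    | For the submission, see the following links: |
    | https://lwn.net/Articles/644318/ |
    | https://lkml.org/lkml/2015/5/12/853 |
    | or see the body text excerpted at http://fudong.tech/?p=52 |
    | |
    | Terms used in this document: |
    | Host: the SCSI host, i.e. the corresponding host adapter. It may be a RAID card, an FC HBA, etc. |
    |
    | Because of the page template, the layout of this web page may be garbled; refer to the PDF file at the end.

    (This document roughly matches Linux kernel v4.0)

    This document describes the various phases in the life of a SCSI command (struct scsi_cmnd) as it flows through the different parts of the SCSI mid level driver. It describes under what conditions and how a scsi_cmnd may be aborted, retried, or scheduled for error handling, how it is recovered, and in general how the SCSI mid level driver handles a block request. It goes into detail about which functions get called and what each of them is for.

    To help explain with an example, it follows a scsi_cmnd that goes through it all: timeout, abort, error handling, and retry (which also results in CHECK_CONDITION and the fetching of meaningful sense info). The last section traces the path this example scsi_cmnd takes in its lifetime (excluding the low-level parts).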

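    Table of contents

    [1] Lifecycle of a scsi_cmnd
    [2] How does a scsi_cmnd get queued to the LLD (low-level driver) for processing?
    [3] How does a scsi_cmnd complete?
    [3.1] Command completing via scsi_softirq_done()
    [3.2] Command completing via scsi_times_out()
    [4] SCSI Error Handling
    [4.1] How did we get here?
    [4.2] When does Error Handling actually run?
    [4.3] SCSI Error Handler thread
    [5] SCSI Commands can be “hijacked”
    [6] SCSI Command Aborts
    [6.1] When would the mid level try to abort a command?
    [6.2] How does SCSI command abort work?
    [6.3] Aborts can fail too
    [7] SCSI command Retries
    [7.1] When would the mid level retry a command?
    [7.2] Eligibility criteria for Retry
    [8] Example: following a scsi_cmnd (that results in CHECK_CONDITION)
    [8.1] High level view of the path taken by the example scsi_cmnd
    [8.2] Actual path taken
    [9] References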

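    1. Lifecycle of a scsi_cmnd

      The SCSI mid level (ML) interfaces with the block layer just like any other block driver. For each block device that the SCSI mid level adds to the system, it provides a set of functions to serve the corresponding request queue.

      The following functions are relevant to a scsi_cmnd during its lifetime. Note that, depending on the situation, a command may skip some of these stages or go through some of them several times.

      scsi_prep_fn()
      is called by the block layer to prepare the request. This function actually allocates a new scsi_cmnd for the request (from scsi_host->cmd_pool) and sets it up. This is where a scsi_cmnd is “born”.
      Note that a new scsi_cmnd is allocated only if the blk req does not already have one associated with it (req->special == NULL). A req may already have a scsi_cmnd if SCSI tried the request earlier and decided to retry it later (so the req was put back on the queue).

      scsi_request_fn()
      is the function that actually services the request queue. It simply checks whether the host is ready to accept a new command and, if so, submits it to the LLD:
      scsi_request_fn()
      ->scsi_dispatch_cmd()
      ->hostt->queuecommand()
      If the scsi_cmnd cannot be queued to the LLD for some reason, the request is put back on the original request queue (to be retried later).

      scsi_softirq_done()
      is the handler called after the LLD reports completion of the command.
      scsi_done()
      ->blk_complete_request()
      ->raises a softirq
      ->blk_done_softirq()
      ->scsi_softirq_done()
      The most important goal of this function is to decide the further course of action for the request (based on scsi_cmnd->result and the sense data, if present) and to carry it out. The options are to complete the request to the block layer, to requeue it to the block layer, or to schedule it for error handling (if that is deemed necessary). This is discussed in more detail later.

      scsi_times_out()
      is the function called when the LLD has not reported a result for the scsi_cmnd for a long time and a timeout occurs. It tries to see whether the situation can be fixed by the LLD timeout handler (if available) or by aborting the command. If not, it schedules the command for EH (discussed in detail later).

      scsi_unprep_fn()
      is the function called to unprepare the request. It should undo whatever scsi_prep_fn() did.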
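
      The allocation rule of scsi_prep_fn() can be illustrated with a small Python model. It is not kernel code: `Request`, `ScsiCmnd` and `prep` are hypothetical stand-ins, and only the "allocate once per request" behaviour described above is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScsiCmnd:
    """Stand-in for struct scsi_cmnd (illustrative only)."""
    retries: int = 0


@dataclass
class Request:
    """Stand-in for a block-layer request; `special` mirrors req->special."""
    special: Optional[ScsiCmnd] = None


def prep(req: Request) -> ScsiCmnd:
    """Allocate a command only if the request does not carry one yet."""
    if req.special is None:
        req.special = ScsiCmnd()  # the command is "born" here
    return req.special


req = Request()
first = prep(req)
again = prep(req)  # e.g. the request was requeued and is prepared again
print(first is again)  # prints True: the existing command is reused
```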
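    2. How does a scsi_cmnd get queued to the LLD for processing?

      The submission part is simple. Once scsi_request_fn() is called and obtains a new block request via blk_peek_request(), the scsi_cmnd has already been set up and is ready to be sent to the LLD:
      scsi_request_fn()
      ->scsi_dispatch_cmd()
      ->hostt->queuecommand()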
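    3. How does a scsi_cmnd complete?

      Once a scsi_cmnd has been submitted to the LLD, there are only two ways it can complete:

      a. Either the LLD responds in time
         (i.e. the command continues through scsi_softirq_done()),
      b. or the LLD does not respond in time and a timeout occurs
         (i.e. the command continues through scsi_times_out()).

      Both cases are discussed below.

      Note 1:
      A scsi_cmnd may be retried. However, completion of a retried scsi_cmnd is no different from completion of a new one. So regardless of how the retries turn out, a scsi_cmnd always ends in one of the two cases above.

      Note 2:
      During error handling, a scsi_cmnd may be “hijacked” in scsi_send_eh_cmnd() to send one of the EH commands (TUR / STU / REQUEST_SENSE). Completion of these EH commands does not land in either of the two scenarios above; this is the only exception. Once the scsi_cmnd is “un-hijacked”, the result of the original scsi_cmnd still goes through one of the same two scenarios.

      3.1 Command completing via scsi_softirq_done()

      This is the case when the LLD responds in time, for example when the command completes normally.
      Note that “completion” here does not mean that the command succeeded. In fact, the SCSI host hardware may even have failed without accepting the command. What the call to scsi_softirq_done() does tell us is that a “result” is available in time. That result must be examined to decide the next step.

      scsi_softirq_done()
      |
      +--> scsi_decide_disposition()
      |    Looks at scsi_cmnd->result and the sense data to decide the best course of action. When reading the code of this function, do not take SUCCESS to mean that the command succeeded, or FAILED to mean that it failed. The return value only indicates the course of action to take.
      |
      +--> Result is SUCCESS:
      |    (Complete the command to the block layer. For example, the device may be offline, so the command is completed; the block layer may retry later on its own, but that does not concern the SCSI ML.)
      |    |
      |    +--> scsi_finish_command()
      |         |
      |         +--> scsi_io_completion() (*see Note 3 below)
      |              |
      |              +--> blk_finish_request()
      |
      +--> Result is NEEDS_RETRY/ADD_TO_MLQUEUE:
      |    (Requeue the command to the request queue. For example, the device hardware was busy, so the SCSI ML knows that a retry may help.)
      |    |
      |    +--> scsi_queue_insert()
      |         |
      |         +--> blk_requeue_request()
      |
      +--> Result is FAILED/default:
           (Schedule the scsi_cmnd for EH. For example, there was a bus error that may need a bus reset, or we got CHECK_CONDITION and need to issue REQ_SENSE to learn more about the failure.)
           |
           +--> scsi_eh_scmd_add()
                (adds the scsi_cmnd to the host EH queue)
                |
                +--> scsi_eh_wakeup()

      Note 3:
      scsi_io_completion() contains secondary logic similar to scsi_decide_disposition(): it also looks at the result and the sense data and decides what to do with the request, choosing among similar courses of action. One special case in this function involves “unprepping” a scsi_cmnd before requeueing it; it is discussed in a later section.

      3.2 Command completing via scsi_times_out()

      This happens when the LLD does not respond in time, the block layer times out, and as a result calls the timeout function of the request queue of the relevant SCSI device.

      scsi_times_out()
      |
      +--> scsi_transport_template->eh_timed_out() - did it succeed? If not...
      |    (give the transportt a chance to handle it)
      |
      +--> scsi_host_template->eh_timed_out() - did it succeed? If not...
      |    (give the hostt a chance to handle it)
      |
      +--> scsi_abort_command() - did it succeed? If not...
      |    (schedule an ABORT of the scsi_cmnd; the abort handler also requeues it if needed)
      |
      +--> scsi_eh_scmd_add()
           (schedule the scsi_cmnd for EH. This is guaranteed to work, because if it does not, the EH handler marks the device offline, which counts as a pretty good fix :-))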
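
      The two completion paths can be summarised in a small Python model. It is a sketch, not kernel code: the `Disposition` enum mirrors the return values named above, while `softirq_done_action` and `times_out_action` are hypothetical helpers that only return the name of the next step.

```python
from enum import Enum, auto
from typing import Callable, Sequence, Tuple


class Disposition(Enum):
    SUCCESS = auto()
    NEEDS_RETRY = auto()
    ADD_TO_MLQUEUE = auto()
    FAILED = auto()


def softirq_done_action(disposition: Disposition) -> str:
    """Map a disposition to the step taken on the 'LLD responded' path."""
    if disposition is Disposition.SUCCESS:
        return "scsi_finish_command"   # complete to the block layer
    if disposition in (Disposition.NEEDS_RETRY, Disposition.ADD_TO_MLQUEUE):
        return "scsi_queue_insert"     # requeue for a retry
    return "scsi_eh_scmd_add"          # FAILED or anything else: schedule for EH


def times_out_action(handlers: Sequence[Tuple[str, Callable[[], bool]]]) -> str:
    """Walk the timeout handlers in order; the first that succeeds ends the chain."""
    for name, handled in handlers:
        if handled():
            return name
    return "scsi_eh_scmd_add"          # nothing helped: schedule for EH


chain = [
    ("transportt->eh_timed_out", lambda: False),
    ("hostt->eh_timed_out", lambda: False),
    ("scsi_abort_command", lambda: True),
]
print(softirq_done_action(Disposition.NEEDS_RETRY))  # scsi_queue_insert
print(times_out_action(chain))                       # scsi_abort_command
```

      Note that SUCCESS in this model, as in the text, names a course of action rather than the outcome of the command.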
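    4. SCSI Error Handling

      SCSI error handling should be thought of as the action the ML decides to take when it knows that merely retrying the request will probably not help, and that something else (possibly disruptive) has to be done to fix the problem. For example, a stalled host may need a host reset, and only after that can the request be retried.

      Note 4:
      (A stray thought) Contrast “Error Handling” with “Retries”. A retry is a normal thing: the mid level retries when it believes the error it saw is temporary in nature and will go away on its own without anything being done explicitly, so trying the request again makes sense. A cmnd is scheduled for EH, on the other hand, when the mid level knows it must do “something” before retrying the cmnd can give good results.

      Note 5:
      The SCSI mid level maintains a (per-host) list, scsi_host->eh_cmd_q, of all scsi_cmnds that have been scheduled for EH on that host.
      This is the list the EH thread processes when it runs.

      4.1 How did we get here?

    A scsi_cmnd can be marked for EH in the following cases:

    • The command “error completed”, i.e. scsi_decide_disposition() returned FAILED or something else indicating that some kind of error recovery is needed. For example, the device hardware failed, or we got CHECK_CONDITION.
      scsi_softirq_done()
      ->scsi_decide_disposition() == FAILED
      ->scsi_eh_scmd_add()
    • The scsi_cmnd timed out and the attempt to abort it failed.
      scsi_times_out()
      ->scsi_abort_command() != SUCCESS
      ->scsi_eh_scmd_add()

    4.2 When does Error Handling actually run?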

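    As soon as a scsi_cmnd is marked for EH (inserted into Scsi_Host->eh_cmd_q), the SCSI error handler thread is scheduled. Once a scsi_cmnd has been marked for EH, the ML accepts no more scsi_cmnds for that particular Scsi_Host. However, the EH thread does not actually run until all I/O pending at the LLD for that Scsi_Host has either completed or failed. In other words, it runs when the only commands still pending at the LLD for that host are those that need EH (host_busy == host_failed).

    The idea is to quiesce the bus so that the EH thread can recover the devices, since it may need to reset various components to get its job done.

    Supplementary note 1 (translator):
    The components here are the four HBTL levels, Host:Bus:Target:LUN, i.e. host, bus, target and disk. Resets are attempted in the reverse of that order, starting at the LUN and ending at the host.
    Which operations are actually carried out during a reset depends not only on the problem at hand but also on the design of the low-level device driver (LLD).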
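
    The gating condition from section 4.2 can be written down as a tiny Python model. The `Host` class is hypothetical; only the two counters named in the text (host_busy, host_failed) and the rule that connects them are taken from the document.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Illustrative stand-in for a Scsi_Host; not the kernel structure."""
    host_busy: int = 0    # commands currently outstanding at the LLD
    host_failed: int = 0  # of those, commands already marked for EH

    def accepts_new_commands(self) -> bool:
        # Once any command is marked for EH, nothing new goes to this host.
        return self.host_failed == 0

    def eh_may_run(self) -> bool:
        # EH runs only when every outstanding command is one that needs EH.
        return self.host_failed > 0 and self.host_busy == self.host_failed


host = Host(host_busy=3, host_failed=1)
print(host.accepts_new_commands(), host.eh_may_run())  # False False
host.host_busy = 1  # the two healthy commands have completed
print(host.eh_may_run())  # True
```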

    4.3 SCSI Error Handler thread


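    scsi_error_handler()
    |
    +--> transportt->eh_strategy_handler() if it exists, otherwise...
    |    (use the transportt's own error recovery handler, if available)
    |
    +--> scsi_unjam_host()
    |    (the SCSI ML error handler described below; also described in Documentation/scsi/scsi_eh.txt. The basic goal is to do whatever is needed to recover from the current error condition, and to requeue eligible commands after the recovery)
    |
    +--> scsi_restart_operations()
         (restarts operations on the SCSI request queues)
         |
         +--> scsi_run_host_queues()
              |
              +--> scsi_run_queue()
                   |
                   +--> blk_run_queue()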

    scsi_unjam_host()


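    The idea is to build two lists: work_q and done_q.
    Initially work_q = <all EH scsi_cmnds> and done_q = NULL. Error handling is then applied to all requests on work_q by taking actions of increasing severity that may recover a cmnd or the device. Requests keep moving from work_q to done_q, and at the end all of them are completed together rather than one by one.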

    scsi_unjam_host()
    |
    +--> Create two lists: work_q, done_q
    |    work_q = <all EH scsi_cmnds>, done_q = NULL
    |
    +--> scsi_eh_get_sense() - are we done? If not...
    |    (fetch the sense info for commands that got CHECK_CONDITION)
    |    |
    |    +--> scsi_request_sense()
    |    |    (uses scsi_send_eh_cmnd() to send a “hijacked” REQ_SENSE command)
    |    |
    |    +--> scsi_decide_disposition()
    |    |
    |    +--> on SUCCESS, arrange to finish the scsi_cmnd (by setting retries = allowed)
    |
    +--> scsi_eh_abort_cmds() - are we done? If not...
    |    (abort the commands that timed out)
    |    |
    |    +--> scsi_try_to_abort_cmd()
    |    |    (results in a call to hostt->eh_abort_handler(), which is responsible for making the LLD and the HW forget the scsi_cmnd)
    |    |
    |    +--> scsi_eh_test_devices()
    |         (tests whether the device is responding now by sending the appropriate EH commands (STU / TEST_UNIT_READY). Again, sending these EH commands involves hijacking the original scsi_cmnd and restoring its context afterwards)
    |
    +--> scsi_eh_ready_devs() - are we done? If not...
    |    (take actions of increasing severity in order to recover)
    |    |
    |    +--> scsi_eh_bus_device_reset()
    |    |    (resets the scsi_device; results in a call to hostt->eh_device_reset_handler())
    |    |
    |    +--> scsi_eh_target_reset()
    |    |    (resets the scsi_target; results in a call to hostt->eh_target_reset_handler())
    |    |
    |    +--> scsi_eh_bus_reset()
    |    |    (resets the SCSI bus; results in a call to hostt->eh_bus_reset_handler())
    |    |
    |    +--> scsi_eh_host_reset()
    |    |    (resets the Scsi_Host; results in a call to hostt->eh_host_reset_handler())
    |    |
    |    +--> if nothing worked - scsi_eh_offline_sdevs()
    |         (the device is not recoverable; take it offline)
    |
    +--> scsi_eh_flush_done_q()
         (for every EH command on done_q, either requeue it via scsi_queue_insert() if it is eligible, or complete it to the block layer via scsi_finish_command())

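    Note 6:
    At each recovery stage we test whether error recovery is complete for that stage (using scsi_eh_test_devices()), and take the next, more severe action only if it is needed.

    Note 7:
    The error handler checks whether several scsi_cmnds can be recovered by resetting the same component (for example, the same scsi_device); if so, that component is reset only once.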
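
    The work_q/done_q escalation can be sketched as a short Python model. This is an illustration, not the kernel algorithm: `unjam` and its `steps` argument are hypothetical, each step is reduced to a predicate that says whether it recovers a given command, and the "reset each component only once" refinement from Note 7 is not modelled.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[str], bool]]


def unjam(failed_cmds: Sequence[str], steps: Sequence[Step]):
    """Apply recovery steps of increasing severity until work_q is empty.

    Returns (done_q, applied_steps, unrecovered). Harsher steps are skipped
    as soon as nothing is left on work_q.
    """
    work_q: List[str] = list(failed_cmds)
    done_q: List[str] = []
    applied: List[str] = []
    for name, recovers in steps:
        if not work_q:
            break  # we are done; no need to escalate further
        applied.append(name)
        still_failing = []
        for cmd in work_q:
            (done_q if recovers(cmd) else still_failing).append(cmd)
        work_q = still_failing
    unrecovered = work_q            # their devices would be taken offline
    done_q.extend(unrecovered)      # everything is flushed together at the end
    return done_q, applied, unrecovered


steps = [
    ("get_sense", lambda cmd: cmd == "A"),
    ("abort", lambda cmd: cmd == "B"),
    ("device_reset", lambda cmd: True),
    ("target_reset", lambda cmd: True),
]
done_q, applied, unrecovered = unjam(["A", "B", "C"], steps)
print(applied)  # ['get_sense', 'abort', 'device_reset']
```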

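    5. SCSI Commands can be “hijacked”

      As shown above, the EH thread may need to send some EH commands to check the health and responsiveness of a SCSI device:

    • TUR - Test Unit Ready
    • STU - Start / Stop Unit
    • REQUEST_SENSE - fetch the sense data in response to a CHECK_CONDITION

      However, instead of allocating and setting up a new scsi_cmnd for such a temporary purpose, the EH thread hijacks the current scsi_cmnd it is trying to recover and uses it to send the EH command. All of this happens in scsi_send_eh_cmnd(). scsi_send_eh_cmnd() saves the context of the current command before hijacking it, replaces the scsi_done pointer with its own before dispatching the command to the LLD, and restores the context once it is done. EH commands sent this way are subject to the same timeouts, abort failures and completions, but they do not take the routes that normal commands take (i.e. neither scsi_softirq_done() nor scsi_times_out()). Everything is handled inside scsi_send_eh_cmnd(). This is discussed in the following sections.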
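
      The save, replace and restore sequence can be illustrated with a Python context manager. This is a sketch under assumptions: `Cmd`, `hijacked` and the `"eh_done"` marker are hypothetical names, and a real scsi_cmnd carries far more context than the two fields modelled here. The opcodes used (0x28 for READ(10), 0x00 for TEST UNIT READY) are the standard SCSI values.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Cmd:
    """Illustrative stand-in for a scsi_cmnd: a CDB plus a completion callback name."""
    cdb: bytes
    done: str = "scsi_done"


@contextmanager
def hijacked(cmd: Cmd, eh_cdb: bytes):
    """Temporarily turn `cmd` into an EH command, then restore its context."""
    saved = (cmd.cdb, cmd.done)                # save the original context
    cmd.cdb, cmd.done = eh_cdb, "eh_done"      # EH command with its own completion
    try:
        yield cmd
    finally:
        cmd.cdb, cmd.done = saved              # "un-hijack", even on error


READ_10 = bytes([0x28]) + bytes(9)   # 10-byte CDB, opcode READ(10)
TEST_UNIT_READY = bytes(6)           # 6-byte CDB, opcode 0x00

cmd = Cmd(cdb=READ_10)
with hijacked(cmd, TEST_UNIT_READY) as eh_cmd:
    print(eh_cmd.cdb[0], eh_cmd.done)  # 0 eh_done
print(cmd.cdb[0] == 0x28, cmd.done)    # True scsi_done
```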
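    6. SCSI Command Aborts

      An abort is the scenario in which the SCSI mid level wants the low-level driver and the hardware beneath it to forget everything about a scsi_cmnd that was previously handed to the LLD. The most common reason is that the LLD failed to respond in time.

      6.1 When would the mid level try to abort a command?

    The SCSI ML may try to abort a scsi_cmnd in the following cases:

    1. The SCSI mid level times out on a command and tries to abort it.
      scsi_times_out()
      -> scsi_abort_command()
      What if the abort fails? The command is scheduled for EH.
    2. The EH thread tries to abort all pending commands while trying to unjam the host.
      scsi_unjam_host()
      -> scsi_eh_abort_cmds()
      What if the abort fails? We move on to recovery steps of higher severity (resetting HW components and so on), since these may make the LLD and the HW forget the commands.
    3. This one is nasty. During error recovery, the EH thread may “hijack” a scsi_cmnd to send an EH command (TUR/STU/REQ_SENSE) to the LLD using scsi_send_eh_cmnd(). If such a “hijacked” EH command times out, the SCSI EH thread tries to abort it.
      scsi_send_eh_cmnd()
      -> scsi_abort_eh_cmnd()
      -> scsi_try_to_abort_cmd()
      What if the abort fails? As in the previous case, scsi_abort_eh_cmnd() tries actions of higher severity (bus reset and so on), but it does not send EH commands again (e.g. TUR) to verify that the device has started responding.

    6.2 How does SCSI command abort work?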

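    Unlike EH commands such as TUR, ABORT is not a SCSI command that the mid level driver sends to the LLD. Instead, the LLD provides an eh_abort_handler() function pointer that is used to abort a command. It is up to the LLD to do whatever is needed to abort the command: it may have to send some proprietary command to the hardware, flip some bits, or perform whatever other magic is necessary.

    6.3 Aborts can fail too


    Like everything else, abort attempts can fail too. The SCSI mid level does the right thing in each of the cases described in section 6.1.

    Note 8:
    Once the block layer has handed a command to the SCSI subsystem, the block layer currently has no way to cancel/abort the request. This needs some work.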

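    7. SCSI command Retries

      The SCSI mid level does not maintain any queue of the SCSI commands it is processing (other than the EH command queue). So whenever the SCSI ML thinks it needs to retry a command, it requeues the request back to the corresponding request queue, so that the retry happens “naturally” when the request function picks the next request for processing. When such requests are put back on the request queue, they are placed at its head, so that they go ahead of the other (existing) requests in that queue.

      7.1 When would the mid level retry a command?

    The following conditions lead to a SCSI command being retried (by putting the blk request back on the request queue):

    1. The mid level times out on a scsi_cmnd, aborts it successfully, and requeues it.
      scsi_times_out()
      -> scsi_abort_command()
      -> schedules scmd_eh_abort_handler()
      -> scsi_queue_insert()
      -> blk_requeue_request()
    2. After recovering the host, the EH thread requeues all scsi_cmnds that are eligible for retry:
      scsi_error_handler()
      -> scsi_unjam_host()
      -> scsi_eh_flush_done_q()
      -> scsi_queue_insert()
      -> blk_requeue_request()
    3. The LLD completes the scsi_cmnd, and scsi_decide_disposition() looks at scsi_cmnd->result and decides that a retry is needed (for example, because the bus was busy).
      scsi_softirq_done()
      -> scsi_decide_disposition() returns NEEDS_RETRY
      -> scsi_queue_insert()
      -> blk_requeue_request()
    4. In scsi_request_fn(), the SCSI ML finds that the host is busy and the scsi_cmnd cannot be sent to the LLD, so it puts the request back on the queue.
      scsi_request_fn()
      -> not_ready: label
      -> blk_requeue_request()
    5. scsi_finish_command() can be called from several places to complete a request to the block layer. However, it may also call scsi_io_completion(), which looks at the request and decides to retry it (if eligible).
      scsi_finish_command()
      -> scsi_io_completion()
      -> __scsi_queue_insert()
      -> blk_requeue_request()

      Note 9:
      Case 5 above has one very special situation. In some cases scsi_io_completion() decides that the blk request must be retried, but that the scsi_cmnd of this request should be released and a new scsi_cmnd allocated and used for the request at the next retry. This can happen, for example, if it sees ILLEGAL REQUEST in response to a READ10 command and thinks this may be because the device supports only READ6. Switching to READ6 at the next retry (hence a new scsi_cmnd) may then make sense.

    7.2 Eligibility criteria for Retry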

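    Note that the SCSI mid level always checks retry eligibility before it goes ahead and requeues a command for retry. The eligibility criteria for a scsi_cmnd include the following (some of them may not apply to every case above):

    • retries < allowed (the number of retries so far must be less than the number of retries allowed)
    • the time spent in EH does not exceed host->eh_deadline jiffies
    • scsi_noretry_cmd() should return 0 for the command
    • the scsi_device must be online
    • req->timeout must not have expired
    • and so on...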
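
    The listed criteria combine into a single predicate, sketched below in Python. The `RetryState` fields are hypothetical stand-ins for the kernel state named in the bullets, and the check is deliberately limited to the criteria the text lists.

```python
from dataclasses import dataclass


@dataclass
class RetryState:
    """Illustrative snapshot of what the retry check looks at."""
    retries: int           # retries made so far
    allowed: int           # maximum retries allowed
    eh_elapsed: int        # jiffies spent in EH so far
    eh_deadline: int       # stand-in for host->eh_deadline
    noretry: bool          # what scsi_noretry_cmd() would report
    device_online: bool
    request_expired: bool  # req->timeout has run out


def eligible_for_retry(s: RetryState) -> bool:
    """All listed criteria must hold for the command to be requeued."""
    return (
        s.retries < s.allowed
        and s.eh_elapsed <= s.eh_deadline
        and not s.noretry
        and s.device_online
        and not s.request_expired
    )


state = RetryState(retries=1, allowed=5, eh_elapsed=10, eh_deadline=100,
                   noretry=False, device_online=True, request_expired=False)
print(eligible_for_retry(state))  # True

# The EH thread finishes a command by setting retries = allowed,
# which makes this check fail and prevents a further retry.
state.retries = state.allowed
print(eligible_for_retry(state))  # False
```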
    8. Example: following a scsi_cmnd (that results in CHECK_CONDITION)

    8.1 High level view of the path taken by the example scsi_cmnd

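    Take a block request as an example: say it wants to read a block from a SCSI disk, but the LBA is (hypothetically) out of range for the device. The ML submits it to the LLD, but the HW accepts the command and chokes on it (again hypothetically, so that we can trace the abort sequence). So a timeout occurs, and the ML aborts the command and requeues it. On the next run, the LLD completes the command with CHECK_CONDITION. We assume the SCSI host does not fetch sense info automatically. The ML schedules the cmnd for EH. The EH thread sends REQUEST_SENSE, gets the sense info ILLEGAL_REQUEST, and completes the request to the block layer accordingly.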

    8.2 Actual path taken


    Dispatched:

    scsi_request_fn()
    |
    +--> blk_peek_request()
    |    |
    |    +--> scsi_prep_fn()
    |         (allocates and sets up the scsi_cmnd)
    |
    +--> scsi_dispatch_cmd()
         |
         +--> hostt->queuecommand()

    Times out:

    scsi_times_out()
    |
    +--> scsi_abort_command() - returns SUCCESS
         |
         +--> queue_delayed_work(abort_work)

    Abort handler:

    scmd_eh_abort_handler()
    |
    +--> scsi_try_to_abort_cmd() - returns SUCCESS
    |    |
    |    +--> hostt->eh_abort_handler()
    |
    +--> scsi_queue_insert()
         |
         +--> __scsi_queue_insert()
              |
              +--> blk_requeue_request()
                   (the req is requeued, with req->special pointing to the scsi_cmnd)

    The request is picked up again:

    scsi_request_fn()
    |
    +--> blk_peek_request()
    |    (req->cmd_flags has REQ_DONTPREP set, so scsi_prep_fn() is not called again)
    |
    +--> scsi_dispatch_cmd()
         |
         +--> hostt->queuecommand()

    The command completes with CHECK_CONDITION:

    scsi_softirq_done()
    |
    +--> scsi_decide_disposition()
    |    (sees CHECK_CONDITION)
    |    |
    |    +--> scsi_check_sense() - returns FAILED
    |         |
    |         +--> scsi_command_normalize_sense()
    |              (fails to find valid sense data)
    |
    +--> Result is FAILED:
         |
         +--> scsi_eh_scmd_add()
              (adds the scsi_cmnd to the host EH queue)
              |
              +--> scsi_eh_wakeup()

    The SCSI error handler thread runs to fetch the sense info, and completes the request once it is done.

    scsi_error_handler()
    |
    +--> scsi_unjam_host()
         |
         +--> scsi_eh_get_sense()
         |    |
         |    +--> scsi_request_sense()
         |    |    |
         |    |    +--> scsi_send_eh_cmnd()
         |    |         (hijacks the scsi_cmnd to send the EH command)
         |    |         |
         |    |         +--> scsi_eh_prep_cmnd()
         |    |         |    (saves the context of the existing scsi_cmnd, allocates a sense buffer, and sets up the scsi_cmnd for REQUEST_SENSE)
         |    |         |
         |    |         +--> hostt->queuecommand(), then wait...
         |    |         |    (the sense data for the cmnd is fetched)
         |    |         |
         |    |         +--> scsi_eh_completed_normally() - returns SUCCESS
         |    |         |
         |    |         +--> scsi_eh_restore_cmnd()
         |    |              (restores the context of the original scsi_cmnd)
         |    |
         |    +--> scsi_decide_disposition() - returns SUCCESS
         |    |    (the sense info is visible this time)
         |    |
         |    +--> set scmd->retries = scmd->allowed (to avoid a retry)
         |    |
         |    +--> scsi_eh_finish_cmd()
         |         (puts the scsi_cmnd on done_q)
         |
         +--> scsi_eh_flush_done_q()
              (sees that the scsi_cmnd is not eligible for retry)
              |
              +--> scsi_finish_command()
                   |
                   +--> scsi_io_completion()
                        |
                        +--> scsi_end_request()
                             |
                             +--> scsi_put_command()
                                  (releases the scsi_cmnd)

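    9. References
      ==========
      The following are excellent sources of reference material:
      Documentation/scsi/scsi_eh.txt

    http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf <===

    Supplementary note 2 (translator):
    The original SCSI-EH.pdf link above is dead. Judging by its content, it should now be the following link:
    http://events17.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf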

  • Life of a SCSI Command (scsi_cmnd) — Linux kernel 4.0

    Excerpted from the patch submitted by Rajat Jain. Note that this document was never merged!

               ==================================
               Life of a SCSI Command (scsi_cmnd)
               ==================================
    
            Rajat Jain <rajatja@google.com> on 12-May-2015

    (This document roughly matches the Linux kernel 4.0)

    This document describes the various phases of a SCSI command (struct scsi_cmnd)
    lifecycle, as it flows through different parts of the SCSI mid level driver. It
    describes under what conditions and how a scsi_cmnd may be aborted, or retried,
    or scheduled for error handling, how it is recovered, and in general how a
    block request is handled by the SCSI mid level driver. It goes into detail about
    what functions get called and the purpose of each one of them.

    To help explain with an example, it takes the example of a scsi_cmnd that goes
    through it all: timeout, abort, error handling, retry (also results in
    CHECK_CONDITION and gets sense info). The last section traces the path taken by
    this example scsi_cmnd in its lifetime.

    TABLE OF CONTENTS

    [1] Lifecycle of a scsi_cmnd
    [2] How does a scsi_cmnd get queued to the LLD for processing?
    [3] How does a scsi_cmnd complete?
    [3.1] Command completing via scsi_softirq_done()
    [3.2] Command completing via scsi_times_out()
    [4] SCSI Error Handling
    [4.1] How did we Get here?
    [4.2] When does Error Handling actually run?
    [4.3] SCSI Error Handler thread
    [5] SCSI Commands can be “hijacked”
    [6] SCSI Command Aborts
    [6.1] When would mid level try to abort a command?
    [6.2] How SCSI command abort works?
    [6.3] Aborts can fail too
    [7] SCSI command Retries
    [7.1] When would mid level retry a command?
    [7.2] Eligibility criteria for Retry
    [8] Example: Following a scsi_cmnd (that results in CHECK_CONDITION)
    [8.1] High level view of path taken by example scsi_cmnd
    [8.2] Actual Path taken
    [9] References

    1. Lifecycle of a scsi_cmnd

    The SCSI mid level interfaces with the block layer just like any other block
    driver. For each block device that the SCSI ML adds to the system, it provides
    a set of functions to serve the corresponding request queue.

    The following functions are relevant to a scsi_cmnd during its lifetime. Note
    that, depending on the situation, a command may skip some of these
    stages, or may go through some stages multiple times.

    scsi_prep_fn()
      is called by the block layer to prepare the request. This
      function actually allocates a new scsi_cmnd for the request (from
      scsi_host->cmd_pool) and sets it up. This is where a scsi_cmnd is “born”.
      Note that a new scsi_cmnd is allocated only if the blk req does not already
      have one associated with it (a non-NULL req->special means it already has
      one). A req may already have a scsi_cmnd if the req was tried by SCSI
      earlier, and that resulted in a decision to retry later (and hence the req
      was put back on the queue).

    scsi_request_fn()
      is the actual function that serves the request queue. It basically checks
      whether the host is ready for new commands, and if so, it submits the
      command to the LLD:
      scsi_request_fn()
      ->scsi_dispatch_cmd()
      ->hostt->queuecommand()
      In case a scsi_cmnd cannot be queued to the LLD for some reason, the req
      is put back on the original request queue (to be retried later).

    scsi_softirq_done()
      is the handler that gets called once the LLD indicates that the command
      has completed:
      scsi_done()
      ->blk_complete_request()
      ->causes softirq
      ->blk_done_softirq()
      ->scsi_softirq_done()
      The most important goal of this function is to determine the course of
      further action for this req (based on the scsi_cmnd->result and sense data
      if present), and take that course. The options are to finish off the
      request to the block layer, requeue it to the block layer, or schedule it
      for error handling (if that is deemed necessary). This is discussed in
      much detail later.

    scsi_times_out()
      is the function that gets called if the LLD does not respond with the
      result of a scsi_cmnd for a long time, and a timeout happens. It tries
      to see if the situation can be fixed by LLD timeout handlers (if available)
      or by aborting the command. If not, it schedules the command for EH
      (discussed at length later).

    scsi_unprep_fn()
      is the function that gets called to unprepare the request. It is supposed
      to undo whatever scsi_prep_fn() does.
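
    The allocation rule described for scsi_prep_fn() can be illustrated with a small Python sketch. This is a toy model and not kernel code: Request, ScsiCmnd and prep() are names made up for this illustration, and the special field merely stands in for req->special.

```python
# Toy model of the allocation rule in scsi_prep_fn(); not kernel code.
# Request, ScsiCmnd and prep() are illustrative names only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScsiCmnd:
    retries: int = 0


@dataclass
class Request:
    special: Optional[ScsiCmnd] = None  # stands in for req->special


def prep(req: Request) -> ScsiCmnd:
    """Return the command for req, allocating one only on first use."""
    if req.special is None:
        req.special = ScsiCmnd()  # the command is "born" here
    return req.special


req = Request()
first = prep(req)
first.retries += 1   # pretend the command was tried and then requeued
second = prep(req)   # the requeued request keeps its existing command
print(second is first, second.retries)
```

    The second call returns the same object with its retry count intact, which is the point of the rule: a requeued request carries its scsi_cmnd with it.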
    2. How does a scsi_cmnd get queued to the LLD for processing?

    The submission part is very simple. Once scsi_request_fn() gets called
    and picks up a new block request via
    blk_peek_request(), the scsi_cmnd has already been set up and is ready to be
    sent to the LLD:
      scsi_request_fn()
      ->scsi_dispatch_cmd()
      ->hostt->queuecommand()
    3. How does a scsi_cmnd complete?

    Once a scsi_cmnd is submitted to the LLD, there are only 2 ways it can
    complete:

    a. Either the LLD responds in time
       (i.e. resulting in scsi_softirq_done() for the command).
    b. Or the LLD does not respond in time and a timeout occurs
       (i.e. resulting in scsi_times_out() for the command).

    We discuss both these cases below.

    Note 1:
    Some scsi_cmnd(s) are retried. But the completion of a
    retried scsi_cmnd is no different from the completion of a new
    scsi_cmnd. Thus, irrespective of retries, a scsi_cmnd will always end up
    in one of the above 2 scenarios.

    Note 2:
    A scsi_cmnd may be “hijacked” during error handling in
    scsi_send_eh_cmnd(), to send one of the EH commands (TUR / STU /
    REQUEST_SENSE). However, the completion of these EH commands does not end up
    in the above two scenarios. This is the only exception. Once the scsi_cmnd is
    “un-hijacked”, the result of the original scsi_cmnd will still go through
    the same 2 scenarios.

    3.1 Command completing via scsi_softirq_done()

    This is the case when the LLD responded in time, i.e. completed the command.
    Note that here “completed” does not mean that the command completed
    successfully. In fact, the SCSI host hardware
    may have failed without even accepting the command. However, the fact that
    scsi_softirq_done() was called indicates that a “result” is available
    in a timely fashion, and we have to examine this result in order to
    decide the next course of action.

      scsi_softirq_done()
      |
      +—> scsi_decide_disposition()
      | Takes a look at the scsi_cmnd->result and sense data to determine
      | what is the best course of action to take. While reading this
      | function code, one should not confuse SUCCESS as meaning the command
      | was successful, or FAILED to mean the command failed etc. The return
      | value of this function merely indicates the course of action to take
      |
      +—> case SUCCESS:
      | (Finish off the command to block layer. For e.g, the device may be
      | offline, and hence complete the command – the block layer may retry
      | on its own later, but that doesn’t concern the SCSI ML)
      | |
      | +—> scsi_finish_command()
      | |
      | +—> scsi_io_completion() (*see note below)
      | |
      | +—> blk_finish_request()
      |
      +—> case NEEDS_RETRY/ADD_TO_MLQUEUE:
      | (Requeue the command to request queue. For e.g. the device HW was
      | busy, and thus SCSI ML knows that retrying may help)
      | |
      | +—> scsi_queue_insert()
      | |
      | +—> blk_requeue_request()
      |
      +—> case FAILED/default:
      (Schedule the scsi_cmnd for EH. For e.g. there was a bus error that
      might need bus reset. Or we got CHECK_CONDITION and we need to issue
      REQ_SENSE to get more info about the failure. etc)
      |
      +—> scsi_eh_scmd_add()
      Add scsi_cmnd to the host EH queue
      scsi_eh_wakeup()

    Note 3:
    scsi_io_completion() has secondary logic similar to
    scsi_decide_disposition(), in that it also looks at the result and sense data
    and figures out what to do with the request. It makes similar choices about
    the course of action to take. There is a special case in this function that
    involves “unprepping” a scsi_cmnd before requeuing it, which we discuss
    in the sections below.

    3.2 Command completing via scsi_times_out()

    This happens when the LLD does not respond in time, the block layer times
    out, and as a result calls the timeout function of the request queue for
    the SCSI device in question.

      scsi_times_out()
      |
      +—> scsi_transport_template->eh_timed_out() – Successful? If not…
      | (Gives transportt a chance to deal with it)
      |
      +—> scsi_host_template->eh_timed_out() – Successful? If not…
      | (Gives hostt a chance to deal with it)
      |
      +—> scsi_abort_command() – Successful? If not…
      | (Schedule an ABORT of the scsi_cmnd. The abort handler will also
      | requeue it if needed)
      |
      +—> scsi_eh_scmd_add()
      (Schedule the scsi_cmnd for EH. This’ll definitely work. Because if it
      doesn’t work, the EH handler will mark the device as offline, which
      counts as a good fix :-))
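
    The fallback order in the scsi_times_out() tree above can be sketched as follows. This is a simplified Python model, not kernel code: handle_timeout() and its parameters are hypothetical names, and each handler is reduced to a callable that returns True when it dealt with the timeout, whereas the real handlers return richer status values.

```python
# Toy model of the fallback order in scsi_times_out(); not kernel code.
# handle_timeout() and the boolean handlers are assumptions for illustration.
from typing import Callable, Optional

Handler = Optional[Callable[[], bool]]  # True means "I dealt with the timeout"


def handle_timeout(transport_timed_out: Handler,
                   host_timed_out: Handler,
                   try_abort: Callable[[], bool]) -> str:
    """Return the name of the stage that took ownership of the command."""
    if transport_timed_out is not None and transport_timed_out():
        return "transport"
    if host_timed_out is not None and host_timed_out():
        return "host"
    if try_abort():
        return "abort scheduled"
    return "error handling"  # last resort: hand the command to the EH thread


# No optional handlers are registered and the abort cannot be scheduled.
print(handle_timeout(None, None, lambda: False))
```

    Each stage is tried only if every earlier stage declined, so the EH thread is reached only when nothing cheaper worked.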
    4. SCSI Error Handling

    SCSI error handling should be thought of as the action the mid level decides
    to take when it knows that merely retrying a request may not help, and it
    needs to do something else (possibly disruptive) in order to fix the issue.
    For example, a stalled host may require a host reset, and only after that
    may a retry of the request complete.

    Note 4:
    (Random thoughts): Contrast “Error Handling” with “Retries”. A retry
    is a normal thing to do when the mid level believes that it has seen an
    error which is transient in nature and will go away on its own without
    any explicit action. Retrying the request therefore makes sense in
    this case. (On the other hand, a cmnd is scheduled for EH when the mid level
    knows that it needs to do “something” before retrying the cmnd can give good
    results.)

    Note 5:
    The SCSI mid level maintains a (per-host) list of all the scsi_cmnd(s)
    that have been scheduled for EH at that host, in scsi_host->eh_cmd_q.
    This is the list that gets processed by the EH thread when it runs.

    4.1 How did we Get here?

    A scsi_cmnd could be marked for EH in the following cases:

    • The command “error completed” i.e. scsi_decide_disposition() returned
      FAILED or something that indicates a failure that requires some sort of
      error recovery. E.g. device hardware failed, or we have a CHECK_CONDITION.
      scsi_softirq_done()
      ->scsi_decide_disposition = FAILED
      ->scsi_eh_scmd_add()
    • A scsi_cmnd timed out, and the attempt to abort it failed.
      scsi_times_out()
      ->scsi_abort_command() != SUCCESS
      ->scsi_eh_scmd_add()

    4.2 When does Error Handling actually run?

    A SCSI error handler thread is scheduled whenever there is a scsi_cmnd that
    is marked for EH (inserted in the Scsi_Host->eh_cmd_q). Once a scsi_cmnd is
    marked for EH, the ML does not accept any more scsi_cmnds for that
    particular Scsi_Host. However, the EH thread does not actually run until all
    the pending IOs to the LLD for that particular Scsi_Host have either
    completed or failed. In other words, the only commands pending at the LLD
    for that host are the ones that need EH (host_busy == host_failed).

    The idea is to quiesce the bus so that the EH thread can recover the devices,
    as it may need to reset different components in order to do its job.
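
    The wake-up condition described in this section can be written down as a tiny Python model. This is only a sketch under simplifying assumptions: Host is a made-up class, and host_busy and host_failed are plain integers here, which ignores all locking and accounting details of the real Scsi_Host.

```python
# Toy model of when the EH thread may start; not kernel code.
# Host and its methods are illustrative names only.
from dataclasses import dataclass


@dataclass
class Host:
    host_busy: int = 0    # commands currently outstanding at the LLD
    host_failed: int = 0  # of those, commands marked for EH

    def accepts_new_commands(self) -> bool:
        # Once anything is marked for EH, no new commands are accepted.
        return self.host_failed == 0

    def eh_can_run(self) -> bool:
        # EH runs only when the failed commands are the only ones left.
        return self.host_failed > 0 and self.host_busy == self.host_failed


host = Host(host_busy=3, host_failed=1)
print(host.accepts_new_commands(), host.eh_can_run())
host.host_busy -= 2  # the two healthy commands complete
print(host.accepts_new_commands(), host.eh_can_run())
```

    In the first state the host already refuses new work but EH must wait for the two in-flight commands; in the second state the bus is quiet and EH may proceed.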

    4.3 SCSI Error Handler thread


    scsi_error_handler()
    |
    +—> transportt->eh_strategy_handler() if exists, else…
    | (Use transportt’s own error recovery handler, if available)
    |
    +—> scsi_unjam_host()
    | (The SCSI ML error handler described below. Also described in
    | Documentation/scsi/scsi_eh.txt. The basic goal is to do whatever
    | is needed to recover from the current error condition, and to requeue the
    | eligible commands after recovery)
    |
    +—> scsi_restart_operations()
    (Restart the operations of the SCSI request queue)
    |
    +—> scsi_run_host_queues()
    |
    +—> scsi_run_queue()
    |
    +—> blk_run_queue()

    scsi_unjam_host()


    The idea is to create 2 lists: work_q and done_q.
    Initially, work_q holds the commands queued for EH on this host (eh_cmd_q) and done_q is empty.
    Then error-handle all the requests in work_q by taking actions of sequentially
    higher severity that may recover the cmnd or device. Keep
    moving the requests from work_q to done_q, and in the end finish them all
    in one go rather than finishing them individually.

    scsi_unjam_host()
    |
    +–> Create 2 lists: work_q, done_q
    | work_q = commands from eh_cmd_q, done_q = empty
    |
    +–> scsi_eh_get_sense() – Are we done? if not…
    | (For the commands that have CHECK_CONDITION, get sense_info)
    | |
    | +–> scsi_request_sense()
    | | (Use scsi_send_eh_cmnd() to send a “hijacked” REQ_SENSE cmnd)
    | |
    | +–> scsi_decide_disposition()
    | |
    | +–> Arrange to finish the scsi_cmnd if SUCCESS (by setting
    | retries=allowed)
    |
    +–> scsi_eh_abort_cmds() – Are we done? If not…
    | (Abort the commands that had timed out)
    | |
    | +–> scsi_try_to_abort_cmd()
    | | (Results in call to hostt->eh_abort_handler() which is responsible
    | | for making the LLD and the HW forget about the scsi_cmnd)
    | |
    | +–> scsi_eh_test_devices()
    | (Test if the device is responding now by sending appropriate EH
    | commands (STU / TEST_UNIT_READY). Again, sending these EH
    | commands involves hijacking the original scsi_cmnd, and later
    | restoring the context)
    |
    +–> scsi_eh_ready_devs() – Are we done? if not…
    | (Take actions of increasing severity in order to recover)
    | |
    | +–> scsi_eh_bus_device_reset()
    | | (Reset the scsi_device. Results in call to
    | | hostt->eh_device_reset_handler())
    | |
    | +–> scsi_eh_target_reset()
    | | (Reset the scsi_target. Results in call to
    | | hostt->eh_target_reset_handler())
    | |
    | +–> scsi_eh_bus_reset()
    | | (Reset the SCSI bus. Results in call to
    | | hostt->eh_bus_reset_handler())
    | |
    | +–> scsi_eh_host_reset()
    | | (Reset the Scsi_Host. Results in call to
    | | hostt->eh_host_reset_handler())
    | |
    | +–> If nothing has worked – scsi_eh_offline_sdevs()
    | (The device is not recoverable, put it offline)
    |
    +–> scsi_eh_flush_done_q()
    (For all the EH commands on the done_q, either requeue them (via
    scsi_queue_insert()) if eligible, or finish them up to the block layer
    (via scsi_finish_command()))

    Note 6:
    At each recovery stage we test if we are done (using
    scsi_eh_test_devices()), and take the next severity action only if needed.

    Note 7:
    The error handler takes care that for multiple scsi_cmnds that can be
    recovered by resetting the same component (e.g. same scsi_device), the
    device is reset only once.
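
    The work_q / done_q escalation described above can be modelled in a few lines of Python. This is a deliberately simplified sketch, not kernel code: unjam(), the step names and the string "commands" are all made up for illustration, and each recovery step is reduced to a predicate saying whether it recovers a given command.

```python
# Toy model of the escalation in scsi_unjam_host(); not kernel code.
# unjam(), the step names and the string commands are illustrative only.
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], bool]]  # (name, does this step recover cmd?)


def unjam(work_q: List[str], steps: List[Step]) -> Tuple[List[str], List[str]]:
    """Run steps in increasing severity; return (done_q, log of steps run)."""
    done_q: List[str] = []
    log: List[str] = []
    for name, recovers in steps:
        if not work_q:
            break  # everything is recovered, skip the harsher steps
        log.append(name)
        recovered = [cmd for cmd in work_q if recovers(cmd)]
        work_q = [cmd for cmd in work_q if cmd not in recovered]
        done_q.extend(recovered)
    if work_q:
        log.append("offline")  # nothing worked: give up on what is left
        done_q.extend(work_q)
    return done_q, log


steps = [
    ("get_sense", lambda cmd: cmd == "check_condition"),
    ("abort", lambda cmd: cmd == "timed_out"),
    ("device_reset", lambda cmd: True),
]
done_q, log = unjam(["check_condition", "timed_out"], steps)
print(done_q, log)
```

    In this run the device reset is never attempted, because both commands have already moved to done_q after the two milder steps.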

    5. SCSI Commands can be “hijacked”

    As seen above, the EH thread may need to send some EH commands in order to
    check the health and responsiveness of the SCSI device:

    • TUR – Test Unit Ready
    • STU – Start / Stop Unit
    • REQUEST_SENSE – To get the sense data in response to CHECK_CONDITION

    However, instead of allocating and setting up a new scsi_cmnd for such
    temporary purposes, the EH thread hijacks the current scsi_cmnd that it is
    trying to recover, in order to send the EH commands. This whole process is
    done in scsi_send_eh_cmnd().

    scsi_send_eh_cmnd() saves the context of the current command before hijacking
    it, replaces the scsi_done ptr with its own before dispatching it to the LLD,
    and restores the context later once it is done. The EH commands sent in this
    manner are subject to the same problems of timeouts / abort failures /
    completions, but they do not take the route taken by normal commands (i.e.
    they don’t take the scsi_softirq_done() or scsi_times_out() route).
    Everything is handled within scsi_send_eh_cmnd(). This is discussed in the
    following sections.
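
    The save / hijack / restore pattern can be illustrated with a Python context manager. This is a toy model and not kernel code: Cmnd and hijacked() are names invented for this sketch, the done field is just a string standing in for the scsi_done completion pointer, and "eh_done" is a placeholder for the EH thread's own completion callback.

```python
# Toy model of the save / hijack / restore pattern; not kernel code.
# Cmnd, hijacked() and the "eh_done" marker are illustrative assumptions.
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator


@dataclass
class Cmnd:
    opcode: str
    done: str  # stands in for the scsi_done completion pointer


@contextmanager
def hijacked(cmd: Cmnd, eh_opcode: str) -> Iterator[Cmnd]:
    saved = replace(cmd)  # copy of the original context
    cmd.opcode, cmd.done = eh_opcode, "eh_done"
    try:
        yield cmd  # the caller sends the EH command here
    finally:
        # Restore the original context even if sending the EH command failed.
        cmd.opcode, cmd.done = saved.opcode, saved.done


cmd = Cmnd(opcode="READ_10", done="scsi_done")
with hijacked(cmd, "REQUEST_SENSE") as eh_cmd:
    during = (eh_cmd.opcode, eh_cmd.done)
print(during, (cmd.opcode, cmd.done))
```

    The same object is reused for the EH command, and the original fields come back once the block exits, which mirrors why the original scsi_cmnd can continue on its normal path afterwards.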
    6. SCSI Command Aborts

    An abort refers to the scenario where the SCSI mid level wants the LLD
    and the hardware below it to forget everything about a scsi_cmnd that
    was given to the LLD earlier. The most common reason is that the LLD failed
    to respond in time.

    6.1 When would mid level try to abort a command?

    The SCSI ML may try to abort a scsi_cmnd in the following conditions:

    1. The SCSI mid layer times out on a command and tries to abort it.
      scsi_times_out()
      -> scsi_abort_command()
      What happens if this abort fails? The command is scheduled for EH.
    2. The EH thread tries to abort all the pending commands while trying to
      unjam a host.
      scsi_unjam_host()
      -> scsi_eh_abort_cmds()
      What happens if this abort fails? We move on to higher severity recovery
      steps (resetting HW components etc.), because that is likely to make
      both the LLD and the HW forget about those commands.
    3. This is a nasty one. During error recovery, the EH thread may “hijack”
      a scsi_cmnd to send an EH command (TUR/STU/REQ_SENSE) to the LLD using
      scsi_send_eh_cmnd(). If such a “hijacked” EH command times out, the SCSI
      EH thread will try to abort it.
      scsi_send_eh_cmnd()
      -> scsi_abort_eh_cmnd()
      -> scsi_try_to_abort_cmd()
      What happens if this abort fails? Similar to the previous case,
      scsi_abort_eh_cmnd() will try to take higher severity actions (reset the
      bus etc.), but it will not send EH commands such as TUR again in order to
      verify whether the devices have started to respond.

    6.2 How SCSI command abort works?

    Unlike EH commands such as TUR, ABORT is not a SCSI command that the mid
    layer sends to the LLD. The LLD provides an eh_abort_handler() function
    pointer that is used to abort the command. It is up to the LLD to do
    whatever is needed to abort the command. It may need to send some
    proprietary command to the HW, or fiddle some bits, or do whatever magic
    is necessary.

    6.3 Aborts can fail too


    As with other things, abort attempts can also fail. The SCSI mid layer does
    the right thing in such situations as depicted in the section above.

    Note 8:
    Once a block layer hands off a command to the SCSI subsystem, there is no
    way currently for the block layer to cancel / abort a request. This needs
    some work.

    7. SCSI command Retries

    The SCSI mid level maintains no queues for the SCSI commands it is processing
    (other than the EH command queue). Thus whenever the SCSI ML thinks it needs
    to retry a command, it requeues the request back to the corresponding request
    queue, so that the retries will be made “naturally” when the request function
    picks up the next request for processing.

    When requeuing such requests back to the request queue, they are put at the
    head so that they go before the other (existing) requests in that request
    queue.

    7.1 When would mid level retry a command?

    Following are the conditions that will cause a SCSI command to be retried
    (by putting the blk request back at the request queue):

    1. Mid layer times out on a scsi_cmnd, aborts it successfully, and requeues
      it.
      scsi_times_out()
      -> scsi_abort_command()
      -> schedules scmd_eh_abort_handler()
      -> scsi_queue_insert()
      -> blk_requeue_request()
    2. EH thread, after recovering a host, requeues back all the scsi_cmnds that
      are eligible for a retry:
      scsi_error_handler()
      -> scsi_unjam_host()
      -> scsi_eh_flush_done_q()
      -> scsi_queue_insert()
      -> blk_requeue_request()
    3. LLD completes the scsi_cmnd, and scsi_decide_disposition() looks at the
      scsi_cmnd->result and thinks it needs to be retried (For e.g. because the
      bus was busy).
      scsi_softirq_done()
      -> scsi_decide_disposition() returns NEEDS_RETRY
      -> scsi_queue_insert()
      -> blk_requeue_request()
    4. In the scsi_request_fn(), the SCSI ML finds out that the host is busy and
      the scsi_cmnd could not be sent to the LLD, hence it requeues the req
      back on the queue.
      scsi_request_fn()
      -> case not_ready:
      -> blk_requeue_request()
    5. scsi_finish_command() is called from a variety of places to finish
      off a request to the block layer. However, it calls scsi_io_completion(),
      which may look at the request and decide to retry it (if it qualifies).
      scsi_finish_command()
      -> scsi_io_completion()
      -> __scsi_queue_insert()
      -> blk_requeue_request()

    Note 9:
    Case 5 above includes a very special case. Sometimes
    scsi_io_completion() decides that a blk request has to be retried,
    but the scsi_cmnd for this req should be released and a new
    scsi_cmnd allocated and used for the request at the next
    retry. This can happen, e.g., if it sees an ILLEGAL REQUEST in
    response to a READ10 command, and suspects that this may be because the
    device supports only READ6. It may then make sense to switch to READ6
    (hence a new scsi_cmnd) at the next retry.

    7.2 Eligibility criteria for Retry

    Note that SCSI mid level always checks for retry eligibility before it goes
    ahead and requeues the command for retries. The eligibility criteria for a
    scsi_cmnd includes (some of these may not apply in all situations described
    above):

    • retries < allowed (Num of retries should be less than allowed retries)
    • no more than host->eh_deadline jiffies spent in EH.
    • scsi_noretry_cmd() should return 0 for the command.
    • scsi_device must be online
    • req->timeout must not have expired
    • etc.
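
    The listed criteria can be combined into a single predicate, sketched below in Python. This is a toy model, not kernel code: Cmnd, may_retry() and the boolean fields are invented for illustration and only cover the criteria named in the list above, which itself ends with "etc."

```python
# Toy model of the retry eligibility check; not kernel code.
# Cmnd, may_retry() and the boolean fields are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Cmnd:
    retries: int
    allowed: int
    device_online: bool = True
    no_retry: bool = False   # stands in for scsi_noretry_cmd() != 0
    timed_out: bool = False  # stands in for an expired req->timeout


def may_retry(cmd: Cmnd, eh_deadline_passed: bool = False) -> bool:
    """Return True only if every listed criterion allows another attempt."""
    return (cmd.retries < cmd.allowed
            and not eh_deadline_passed
            and not cmd.no_retry
            and cmd.device_online
            and not cmd.timed_out)


cmd = Cmnd(retries=1, allowed=5)
print(may_retry(cmd))
cmd.retries = cmd.allowed  # the trick used to force completion, not retry
print(may_retry(cmd))
```

    Setting retries equal to allowed is the same device the EH path uses when it wants a command finished rather than requeued.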
    8. Example: Following a scsi_cmnd

    8.1 High level view of path taken by example scsi_cmnd

    We take the example of a block request that wants to read a
    block off a SCSI disk; however, the LBA is out of range for the
    current device (hypothetically). The ML submits it to the LLD, but the HW
    takes the command and chokes on it (again hypothetically, so that we can
    trace through the abort sequence). So the timeout happens, and the ML aborts
    the command and requeues it. In the next run, the LLD completes the command
    with CHECK_CONDITION. We assume that the SCSI host does not automatically
    get the sense info. The ML schedules the cmnd for EH. The EH thread sends
    REQUEST_SENSE to get the sense info (ILLEGAL_REQUEST), and based on it
    completes the request to the block layer.

    8.2 Actual Path taken


    Dispatched:

    scsi_request_fn()
    |
    +—> blk_peek_request()
    | |
    | +—> scsi_prep_fn()
    | (Allocate and setup scsi_cmnd)
    |
    +—> scsi_dispatch_cmd()
    |
    +—> hostt->queuecommand()

    Times out:

    scsi_times_out()
    |
    +—> scsi_abort_command() – returns SUCCESS
    |
    +—> queue_delayed_work(abort_work)

    Abort Handler:

    scmd_eh_abort_handler()
    |
    +—> scsi_try_to_abort_cmd() – returns SUCCESS
    | |
    | +—> hostt->eh_abort_handler()
    |
    +—> scsi_queue_insert()
    |
    +—> __scsi_queue_insert()
    |
    +—> blk_requeue_request()
    (the req is requeued, with req->special pointing
    to scsi_cmnd)

    Request picked up again:

    scsi_request_fn()
    |
    +—> blk_peek_request()
    | (req->cmd_flags has REQ_DONTPREP set, so does not call
    | scsi_prep_fn() again)
    |
    +—> scsi_dispatch_cmd()
    |
    +—> hostt->queuecommand()

    Command is completed with a CHECK_CONDITION:

    scsi_softirq_done()
    |
    +—> scsi_decide_disposition()
    | (Sees the CHECK_CONDITION)
    | |
    | +—> scsi_check_sense() – returns FAILED
    | |
    | +—> scsi_command_normalize_sense()
    | (Fails to find a valid sense data)
    |
    +—> case FAILED:
    |
    +—> scsi_eh_scmd_add()
    Add scsi_cmnd to the host EH queue
    |
    +—> scsi_eh_wakeup()
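
    The dispatch on the value returned by scsi_decide_disposition(), as described in section 3.1 and exercised by the FAILED branch above, can be sketched in Python. This is a toy model and not kernel code: next_action() and the action strings are made up, and the dispositions are plain strings rather than the kernel's constants.

```python
# Toy model of the dispatch in scsi_softirq_done(); not kernel code.
# next_action() and the action strings are illustrative assumptions.
SUCCESS, NEEDS_RETRY, ADD_TO_MLQUEUE, FAILED = (
    "SUCCESS", "NEEDS_RETRY", "ADD_TO_MLQUEUE", "FAILED")


def next_action(disposition: str) -> str:
    """Map a disposition to the course of action the mid level takes."""
    if disposition == SUCCESS:
        return "finish"      # hand the request back to the block layer
    if disposition in (NEEDS_RETRY, ADD_TO_MLQUEUE):
        return "requeue"     # put it back on the request queue
    return "error handling"  # FAILED and anything unexpected go to EH


for d in (SUCCESS, NEEDS_RETRY, FAILED):
    print(d, "->", next_action(d))
```

    Note that "finish" only means the mid level is done with the command; as the document stresses, it says nothing about whether the command itself succeeded.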

    The SCSI Error handler thread runs to get the sense info, and completes the
    request once it is done.

    scsi_error_handler()
    |
    +—> scsi_unjam_host()
    |
    +—> scsi_eh_get_sense()
    | |
    | +—> scsi_request_sense()
    | | |
    | | +—> scsi_send_eh_cmnd()
    | | (Hijacks the scsi_cmnd to send the EH command)
    | | |
    | | +–> scsi_eh_prep_cmnd()
    | | | (saves the context of the existing scsi_cmnd,
    | | | allocates a sense buffer, and sets up the
    | | | scsi_cmnd for REQUEST_SENSE)
    | | |
    | | +–> hostt->queuecommand(), and then wait…
    | | | (gets the sense data for the cmnd)
    | | |
    | | +–> scsi_eh_completed_normally() – returns SUCCESS
    | | |
    | | +–> scsi_eh_restore_cmnd()
    | | (restores the context of original scsi_cmnd)
    | |
    | +—> scsi_decide_disposition() – returns SUCCESS
    | | (This time can see the sense info)
    | |
    | +—> Set scmd->retries = scmd->allowed (to avoid retries)
    | |
    | +—> scsi_eh_finish_cmd()
    | (Puts the scsi_cmnd on the done_q)
    |
    +—> scsi_eh_flush_done_q()
    (Sees that scsi_cmnd is not eligible for retries)
    |
    +—> scsi_finish_command()
    |
    +—> scsi_io_completion()
    |
    +—> scsi_end_request()
    |
    +—> scsi_put_command()
    (Releases the scsi_cmnd)

    9. References
      ==========
      The following are excellent sources of reference:
      Documentation/scsi/scsi_eh.txt

    http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf

    Original sources:

    https://lwn.net/Articles/644318/
    https://lkml.org/lkml/2015/5/12/853

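  • Some sites for obtaining resources

    A record of some websites where you can get what you need for free or at very low cost.

    Books:

    1. Z-Library (Z-Lib). If you know, you know. Its previous domains were seized by the US government; those who need access should look up its onion address themselves.
    2. Library Genesis (libgen). This is not Z-Lib! When downloading, pay attention to which mirror you choose, as some allow only three simultaneous downloads. It also offers torrents of resource collections. Not all of its sites can be reached directly. Current site information is published at https://libgen.onl/library-genesis/, although that site itself may require a proxy to reach.
    3. 搬书匠 (Banshujiang). Enough said. Address: http://banshujiang.cn/
    4. 书格 (Shuge). This one mainly covers Chinese classical texts. Address: https://new.shuge.org/

    Magazines:

    1. Magazine Lib is a site for downloading e-magazines in PDF format for free. It is usually very fast, but some items may be removed within three days of being posted, or even sooner (the download link dies because the file is deleted from the vk backend). Address: https://magazinelib.com/
    2. docutr is a site for downloading e-books, magazines, and newspapers. It does not have many e-books, so it is mainly useful for magazines and newspapers. Download speed is quite good. Address: https://www.docutr.com/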