翼度科技»论坛 云主机 LINUX 查看内容

FCoE测试重启调试记录

10

主题

10

帖子

30

积分

新手上路

Rank: 1

积分
30
环境

CPU:Phytium,S2500/64 C00
内核版本:4.19.90-25.10
网讯网卡:txgbe
共两台设备,光纤直连
复现步骤

设备A、B分别执行以下操作,即可复现
  1. modprobe fcoe
  2. systemctl start lldpad
  3. systemctl start fcoe
复制代码
总结

重启问题是SCSI存储模块libfcoe中fcoe_ctlr_timer_work(drivers/scsi/fcoe/fcoe_ctlr.c)函数访问了非法的内存地址,地址异常原因是编码问题导致,使用结构体强制赋值而忽略了list指针成员的值。
调试记录

查看内核日志

麒麟4.19.90 25.10版本,开启CONFIG_FCOE后,加载txgbe系统发生重启,查看系统日志未发现异常信息,需要抓取串口信息
获取串口信息

串口信息抓取失败,内核在bios阶段串口日志正常,但内核阶段无输出,根据日志可知console enable时ttyS1还未创建。(4.19.90 25.10 51.40现象一样)
  1. [root@compute ~]# dmesg | grep tty
  2. [    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.19.90-51.0.v2207.fortest.ky10.aarch64 root=/dev/mapper/klas-root ro crashkernel=auto rd.lvm.lv=klas/root rd.lvm.lv=klas/swap acpi=on video=VGA-1:640x480-32@60me smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15 crashkernel=1024M,high video=efifb:off video=VGA-1:640x480-32@60me console=ttyS1,115200 loglevel=7
  3. [   14.695352] 00:02: ttyS0 at MMIO 0x200002f8 (irq = 0, base_baud = 115200) is a 16550A
  4. [   15.478192] console [ttyS0] enabled
  5. [   15.479627] HISI0031:00: ttyS1 at MMIO 0x28001000 (irq = 7, base_baud = 3125000) is a 16550A
  6. [   90.445804] audit: type=1300 audit(1676610550.480:126): arch=c00000b7 syscall=105 success=yes exit=0 a0=aaaaf7e3bcb0 a1=2ab7 a2=aaaae0aff1c8 a3=aaaaf7e322d0 items=0 ppid=3107 pid=5502 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="modprobe" exe="/usr/bin/kmod" key=(null)
  7. [  182.268682] audit: type=1006 audit(1676610660.130:141): pid=5598 uid=0 old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=1 res=1
  8. [  333.116270] audit: type=1006 audit(1676610810.970:231): pid=6326 uid=0 old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=3 res=1
复制代码
使用kdump

kdump部署命令:
  1. yum install -y kexec-tools
  2. systemctl restart kdump.service
  3. kdumpctl restart
复制代码
4.19.90 25.10 kdump 宕机
4.19.90 51.40 kdump 测试正常
kdump日志:
  1. 1950 [  811.607882] Unable to handle kernel paging request at virtual address fffffffffffffed8
  2. 1951 [  811.608613] Mem abort info:
  3. 1952 [  811.608882]   ESR = 0x96000005
  4. 1953 [  811.609175]   Exception class = DABT (current EL), IL = 32 bits
  5. 1954 [  811.609697]   SET = 0, FnV = 0
  6. 1955 [  811.609989]   EA = 0, S1PTW = 0
  7. 1956 [  811.610285] Data abort info:
  8. 1957 [  811.610576]   ISV = 0, ISS = 0x00000005
  9. 1958 [  811.610933]   CM = 0, WnR = 0
  10. 1959 [  811.611219] swapper pgtable: 64k pages, 48-bit VAs, pgdp = 000000003e78cc5f
  11. 1960 [  811.611855] [fffffffffffffed8] pgd=0000000000000000, pud=0000000000000000
  12. 1961 [  811.612454] Internal error: Oops: 96000005 [#1] SMP
  13. 1962 [  811.612930] Modules linked in: qedf qed crc8 fcoe libfcoe libfc scsi_transport_fc xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_f     ilter ip6_tables iptable_filter tun ebtable_filter ebtable_nat ebtables iptable_raw iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi br_netfilter bridge 8021q garp mrp ipmi_ssif stp rfkill llc sunrpc vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif     _ce ghash_ce joydev ch341 ses sha2_ce usbserial sha256_arm64 enclosure txgbe ngbe sha1_ce sbsa_gwdt ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ip_tables megaraid_sas ast dm_mirror dm_region_hash dm_log gb
  14. 1963 [  811.618343] Process kworker/6:2 (pid: 1264, stack limit = 0x0000000070e328c3)
  15. 1964 [  811.618970] CPU: 6 PID: 1264 Comm: kworker/6:2 Kdump: loaded Not tainted 4.19.90-51.0.v2207.fortest.ky10.aarch64 #1
  16. 1965 [  811.619951] Source Version: 8a40033b0116040c501be072732bf7ac6ebdb4f7
  17. 1966 [  811.620504] Hardware name: \xe4\xb8\xad\xe5\x9b\xbd\xe9\x95\xbf\xe5\x9f\x8e\xe7\xa7\x91\xe6\x8a\x80\xe9\x9b\x86\xe5\x9b\xa2\xe8\x82\xa1\xe4\xbb\xbd\xe6\x9c\x89\xe9\x99\x90\xe5\x85\xac\xe5\x8f\xb8 \xe6\x93\x8e\xe5\xa4\xa9EF862/GW-748E2A-FTG,
  18. 1967 [  811.621673] Workqueue: events fcoe_ctlr_timer_work [libfcoe]
  19. 1968 [  811.622203] pstate: a0000005 (NzCv daif -PAN -UAO)
  20. 1969 [  811.622640] pc : fcoe_ctlr_timer_work+0x988/0xd30 [libfcoe]
  21. 1970 [  811.623165] lr : fcoe_ctlr_timer_work+0x934/0xd30 [libfcoe]
  22. 1971 [  811.623698] sp : ffffb5c314a17cf0
  23. 1972 [  811.624001] x29: ffffb5c314a17cf0 x28: ffffb81b23c91a50
  24. 1973 [  811.624518] x27: ffff288838b08910 x26: ffffb81b23c91a68
  25. 1974 [  811.625005] x25: 000000010000c5a1 x24: ffffb7db23311e08
  26. 1975 [  811.625547] x23: ffffb81b23c91a08 x22: ffff2888802b5000
  27. 1976 [  811.626025] x21: ffffb7db23311c80 x20: fffffffffffffec8
  28. 1977 [  811.626543] x19: ffffb5c30f555010 x18: ffffffffffffffff
  29. 1978 [  811.627129] x17: 0000000000000000 x16: 0000000000000000
  30. 1979 [  811.627630] x15: ffff2888802b5a38 x14: ffff288900b6cb8f
  31. 1980 [  811.628112] x13: ffff288880b6cba2 x12: ffff28888092f918
  32. 1981 [  811.628618] x11: 0000000000000040 x10: ffff288880927e40
  33. 1982 [  811.629086] x9 : ffff288880927e38 x8 : ffffb5bb0000c8e8
  34. 1983 [  811.629561] x7 : fefefefefefefeff x6 : 0000000000000000
  35. 1984 [  811.630027] x5 : 000073746e657665 x4 : 0000000000000006
  36. 1985 [  811.630499] x3 : 0000000000000000 x2 : ffffb5c3144c7200
  37. 1986 [  811.631000] x1 : 0000000000000000 x0 : 0000000000000000
  38. 1987 [  811.631474] Call trace:
  39. 1988 [  811.631753]  fcoe_ctlr_timer_work+0x988/0xd30 [libfcoe]
  40. 1989 [  811.632238]  process_one_work+0x1fc/0x490
  41. 1990 [  811.632655]  worker_thread+0x50/0x4b8
  42. 1991 [  811.633001]  kthread+0x134/0x138
  43. 1992 [  811.633315]  ret_from_fork+0x10/0x18
  44. 1993 [  811.633659] Code: f9409e80 d104e014 eb00039f 540003e0 (b9401283)
  45. 1994 [  811.634313] SMP: stopping secondary CPUs
  46. 1995 [  811.643778] Starting crashdump kernel...
  47. 1996 [  811.644615] Bye!
复制代码
crash日志:
  1. WARNING: kernel relocated [99295086MB]: patching 104404 gdb minimal_symbol values
  2.       KERNEL: /lib/debug/lib/modules/4.19.90-51.0.v2207.fortest.ky10.aarch64/vmlinux
  3.     DUMPFILE: /var/crash/127.0.0.1-2023-02-17-11:02:11/vmcore  [PARTIAL DUMP]
  4.         CPUS: 128
  5.         DATE: Fri Feb 17 10:58:52 CST 2023
  6.       UPTIME: 00:57:06
  7. LOAD AVERAGE: 1.01, 0.23, 0.08
  8.        TASKS: 1555
  9.     NODENAME: localhost.localdomain
  10.      RELEASE: 4.19.90-51.0.v2207.fortest.ky10.aarch64
  11.      VERSION: #1 SMP Thu Feb 16 14:14:39 CST 2023
  12.      MACHINE: aarch64  (unknown Mhz)
  13.       MEMORY: 127.8 GB
  14.        PANIC: "Unable to handle kernel paging request at virtual address fffffffffffffed8"
  15.          PID: 810
  16.      COMMAND: "kworker/122:1"
  17.         TASK: ffff870061ec1300  [THREAD_INFO: ffff870061ec1300]
  18.          CPU: 122
  19.        STATE: TASK_RUNNING (PANIC)
  20. crash>
复制代码
pc指针指向fcoe_ctlr_timer_work+0x988/0xd30 , 对应汇编代码
  1. crash> bt
  2. PID: 810    TASK: ffff870061ec1300  CPU: 122  COMMAND: "kworker/122:1"
  3. #0 [ffff870061f3f7e0] machine_kexec at ffff5eb1feed40c4
  4. #1 [ffff870061f3f840] __crash_kexec at ffff5eb1feff4708
  5. #2 [ffff870061f3f9d0] crash_kexec at ffff5eb1feff4830
  6. #3 [ffff870061f3fa00] die at ffff5eb1feebfe9c
  7. #4 [ffff870061f3fa40] die_kernel_fault at ffff5eb1feee3634
  8. #5 [ffff870061f3fa70] __do_kernel_fault at ffff5eb1feee32d8
  9. #6 [ffff870061f3faa0] do_translation_fault at ffff5eb1ffb0c22c
  10. #7 [ffff870061f3fad0] do_mem_abort at ffff5eb1feeb130c
  11. #8 [ffff870061f3fcd0] el1_ia at ffff5eb1feeb310c
  12.      PC: ffff5eb1bd8d513c  [fcoe_ctlr_timer_work+3180]
  13.      LR: ffff5eb1bd8d513c  [fcoe_ctlr_timer_work+3180]
  14.      SP: ffff870061f3fce0  PSTATE: 40000005
  15.     X29: ffff870061f3fce0  X28: ffff84a8510d1410  X27: ffff5eb1bd8e0e98
  16.     X26: fffffffffffffec8  X25: ffff5eb1bd8e0248  X24: ffff8700630cc480
  17.     X23: 000000010004c592  X22: ffff5eb1bd8e0000  X21: ffff5eb1bd8e0000
  18.     X20: ffff8700630cc608  X19: ffff5eb200135000  X18: ffffffffffffffff
  19.     X17: 0000000000000001  X16: 0000000000000007  X15: ffff5eb200135a38
  20.     X14: ffff5eb2809ecb8f  X13: ffff5eb2009ecba2  X12: ffff5eb2007af918
  21.     X11: 0000000005f5e0ff  X10: ffff5eb200785b70   X9: ffff5eb200350018
  22.      X8: ffff5eb1ff63f658   X7: 6c74635f656f6366   X6: 0000000000000773
  23.      X5: ffff8709fc3a1500   X4: 0000000000000006   X3: ffff8709fc3a1500
  24.      X2: 32ccc9c32ff8a200   X1: 0000000000000000   X0: 0000000000000017
  25. #9 [ffff870061f3fce0] fcoe_ctlr_timer_work at ffff5eb1bd8d5138 [libfcoe]
  26. #10 [ffff870061f3fdb0] process_one_work at ffff5eb1fef477a0
  27. #11 [ffff870061f3fe00] worker_thread at ffff5eb1fef47a84
  28. #12 [ffff870061f3fe70] kthread at ffff5eb1fef4f7e0
  29. crash> dis -l ffff5eb1bd8d513c
  30. 0xffff5eb1bd8d513c <fcoe_ctlr_timer_work+3180>: ldr     w4, [x26,#16]
  31. crash>
复制代码
drivers/scsi/fcoe/fcoe_ctlr.c:fcoe_ctlr_timer_work函数对应内核模块drivers/scsi/fcoe/libfcoe.ko
加载libfcoe、libfc模块
  1. crash> mod -s libfcoe
  2.      MODULE       NAME                    SIZE  OBJECT FILE
  3. ffff5eb1bd8f0680  libfcoe               262144  /usr/lib/debug/usr/lib/modules/4.19.90-51.0.v2207.fortest.ky10.aarch64/kernel/drivers/scsi/fcoe/libfcoe.ko.debug
  4. crash> mod -s libfc
  5.      MODULE       NAME                    SIZE  OBJECT FILE
  6. ffff5369cc050bc0  libfc                 327680  /usr/lib/debug/usr/lib/modules/4.19.90-51.0.v2207.fortest.ky10.aarch64/kernel/drivers/scsi/libfc/libfc.ko.debug
复制代码
测试步骤:

通过在fcoe_ctlr_timer_work函数中增加调试信息(scsi/libfc/libfc.ko、scsi/fcoe/libfcoe.ko、scsi/bnx2fc/bnx2fc.ko等模块也添加了调试信息)
  1. 陪测端modprobe fcoe
  2. systemctl start lldpad
  3. systemctl start fcoe被测端modprobe scsi_transport_fcinsmod libfc.ko debug_logging=0xffinsmod libfcoe3.ko debug_logging=0xffmodprobe cnicinsmod bnx2fc.ko debug_logging=0xffmodprobe fcoe
  4. systemctl start lldpad
  5. systemctl start fcoe
复制代码
  1. static void fcoe_ctlr_vn_disc(struct fcoe_ctlr *fip)
  2. {
  3.     struct fc_lport *lport = fip->lp;
  4.     struct fc_disc *disc = &lport->disc;
  5.     struct fc_rport_priv *rdata;
  6.     struct fcoe_rport *frport;
  7.     void (*callback)(struct fc_lport *, enum fc_disc_event);
  8.     printk("%s-%d \n", __func__, __LINE__);
  9.     mutex_lock(&disc->disc_mutex);
  10.     printk("%s-%d \n", __func__, __LINE__);
  11.     callback = disc->pending ? disc->disc_callback : NULL;
  12.     disc->pending = 0;
  13.     printk("%s-%d \n", __func__, __LINE__);
  14.     list_for_each_entry_rcu(rdata, &disc->rports, peers) {                                                                                                                                                                                                                     
  15.         printk("%s-%d rdata 0x%llx \n", __func__, __LINE__, (long long) rdata);
  16.         if (!kref_get_unless_zero(&rdata->kref)) {
  17.             printk("%s-%d \n", __func__, __LINE__);
  18.             continue;
  19.         }   
  20. ......
复制代码
  1. [ 1403.774784] fc_rport_create-173
  2. [ 1403.775174] host1: Assigned Port ID 005b49
  3. [ 1403.775483] fc_rport_create-176 rdata 0xffffec6d50468800  lport 0xffffecc562e98a08
  4. [ 1403.775900] host1: lport 005b49: Entered READY from state FLOGI
  5. [ 1403.776661] fcoe_ctlr_vn_add-2536 rdata 0xffffec6d50468800
  6. [ 1403.777796] host1: fip: vn_add rport 00f848 new state 0
  7. [ 1403.777804] fcoe_ctlr_timer_work-1820
  8. [ 1403.777811] fcoe_ctlr_timer_work-1822
  9. [ 1403.779145] fcoe_ctlr_timer_work-1824
  10. [ 1403.779607] fcoe_ctlr_vn_timeout-3147
  11. [ 1403.779998] fcoe_ctlr_vn_timeout-3149
  12. [ 1403.780366] fcoe_ctlr_vn_timeout-3178
  13. [ 1403.780766] fcoe_ctlr_vn_timeout-3183
  14. [ 1403.781161] fcoe_ctlr_vn_timeout-3185
  15. [ 1403.781571] fcoe_ctlr_vn_timeout-3196
  16. [ 1403.781971] fcoe_ctlr_vn_disc-3099
  17. [ 1403.782345] fcoe_ctlr_vn_disc-3101
  18. [ 1403.782731] fcoe_ctlr_vn_disc-3104
  19. [ 1403.783096] fcoe_ctlr_vn_disc-3106 rdata 0xffffec6d50468800
  20. [ 1403.784697] fcoe_ctlr_vn_disc-3115
  21. [ 1403.785090] fcoe_ctlr_vn_disc-3106 rdata 0xfffffffffffffec8
  22. [ 1403.785621] Unable to handle kernel paging request at virtual address fffffffffffffed8
复制代码
  1. crash> struct -o fc_rport_priv
  2. struct fc_rport_priv {
  3.     [0] struct fc_lport *local_port;
  4.     [8] struct fc_rport *rport;
  5.    [16] struct kref kref;
  6. ......
  7. }
  8. SIZE: 432
复制代码
由于&rdata->kref(0xfffffffffffffed8) == rdata(0xfffffffffffffec8) + 16,
因此程序出错位置为if (!kref_get_unless_zero(&rdata->kref)) {
可以确定的是&disc->rports链表中遍历出了2个rdata,第2个rdata是一个错误的地址,然而根据日志显示fc_rport_create-176 rdata 0xffffec6d50468800  lport 0xffffecc562e98a08 rdata只被创建了1次。
值得说明的是任何一次复现问题第2个rdata始终为0xfffffffffffffec8, 在代码中添加以下过滤条件:
  1.    list_for_each_entry_rcu(rdata, &disc->rports, peers) {
  2.         if ((long long)rdata == 0xfffffffffffffec8) {
  3.             printk("%s-%d rdata 0x%llx break\n", __func__, __LINE__, (long long) rdata);
  4.             break;
  5.         }
复制代码
不会触发重启问题,但无法确定FCOE功能是否可用,影响域未知,该方案仅用于调试。
后续测试上述方案依然存在地址异常问题
定位地址异常原因

在fcoe_ctlr_vn_add、fc_disc_init、fc_rport_create中定位异常点,添加调试信息确定&lport->disc.rports列表是否被正常初始化,并在多处遍历&lport->disc.rports以确定异常地址0xfffffffffffffec8是何时进入该列表的。
drivers/scsi/fcoe/fcoe_ctlr.c
  1. static void fcoe_ctlr_vn_add(struct fcoe_ctlr *fip, struct fc_rport_priv *new) {
  2. ... ...
  3.           frport = fcoe_ctlr_rport(rdata);
  4.     LIBFCOE_FIP_DBG(fip, "vn_add rport %6.6x %s state %d\n",
  5.             port_id, frport->fcoe_len ? "old" : "new",
  6.             rdata->rp_state);
  7.     list_for_each_entry_rcu(tmprdata, &lport->disc.rports, peers) {
  8.         printk("%s-%d rdata 0x%llx lport 0x%llx rports 0x%llx\n", __func__, __LINE__,
  9.                 (long long) tmprdata, (long long) lport, (long long) &lport->disc.rports);
  10.     }
  11.     //*frport = *fcoe_ctlr_rport(new);                                                                                                                                                                                                                                            
  12.     printk("%s-%d frport 0x%llx *frport 0x%llx fcoe_ctlr_rport(new) 0x%llx *fcoe_ctlr_rport(new) 0x%llx\n",
  13.             __func__, __LINE__, (long long) frport, (long long) &(*frport),
  14.             (long long)(fcoe_ctlr_rport(new)), (long long)&(*fcoe_ctlr_rport(new)));
  15.     list_for_each_entry_rcu(tmprdata, &lport->disc.rports, peers) {
  16.         printk("%s-%d rdata 0x%llx lport 0x%llx rports 0x%llx\n", __func__, __LINE__,
  17.                 (long long) tmprdata, (long long) lport, (long long) &lport->disc.rports);
  18.     }
  19.     frport = fcoe_ctlr_rport(new);
  20.     frport->time = 0;
  21.     list_for_each_entry_rcu(tmprdata, &lport->disc.rports, peers) {
  22.         printk("%s-%d rdata 0x%llx lport 0x%llx rports 0x%llx\n", __func__, __LINE__,
  23.                 (long long) tmprdata, (long long) lport, (long long) &lport->disc.rports);
  24.     }
  25. }
复制代码
  1. [ 1437.536857] fc_rport_create-182 rdata 0xffffd1aa9dedf400  lport 0xffffd202a4a0da08 rports 0xffffd202a4a0da50
  2. [ 1437.538199] fcoe_ctlr_vn_add-2527 rdata 0xffffd1aa9dedf400 lport 0xffffd202a4a0da08 rports 0xffffd202a4a0da50
  3. [ 1437.539130] fcoe_ctlr_vn_add-2540 rdata 0xffffd1aa9dedf400 lport 0xffffd202a4a0da08 rports 0xffffd202a4a0da50
  4. [ 1437.539963] fcoe_ctlr_vn_add-2554 rdata 0xffffd1aa9dedf400 lport 0xffffd202a4a0da08 rports 0xffffd202a4a0da50
  5. [ 1437.540916] fcoe_ctlr_vn_add-2560 frport 0xffffd1aa9dedf400 *frport 0xffffd1aa9dedf400 fcoe_ctlr_rport(new) 0xffffd1aa8b04bb28 *fcoe_ctlr_rport(new) 0xffffd1aa8b04bb28
  6. [ 1437.542148] fcoe_ctlr_vn_add-2563 rdata 0xffffd1aa9dedf400 lport 0xffffd202a4a0da08 rports 0xffffd202a4a0da50
复制代码
分析定位地址异常原因

定位发现*frport = *fcoe_ctlr_rport(new); 该代码执行后遍历&lport->disc.rports,出现了异常地址节点0xfffffffffffffec8,去掉该代码后地址正常
frport = fcoe_ctlr_rport(rdata); frport是rdata(fc_rport_priv)父类结构体fcoe_rportA指针
*frport = *fcoe_ctlr_rport(new);将new(fc_rport_priv)父类结构体fcoe_rportB赋值给rdata(fc_rport_priv)父类结构体fcoe_rportA,但是由于fcoe_rportA.rdata.peers已经加入链表&lport->disc.rports,该赋值会破坏fcoe_rportA.rdata.peers指针内容,从而导致&lport->disc.rports中地址异常。
  1. static inline struct fcoe_rport *fcoe_ctlr_rport(struct fc_rport_priv *rdata)
  2. {
  3.     return container_of(rdata, struct fcoe_rport, rdata);
  4. }
  5. struct fcoe_rport {
  6.     struct fc_rport_priv rdata;
  7.     unsigned long time;
  8.     u16 fcoe_len;
  9.     u16 flags;
  10.     u8 login_count;
  11.     u8 enode_mac[ETH_ALEN];
  12.     u8 vn_mac[ETH_ALEN];
  13. };
  14. struct fc_rport_priv {
  15.     struct fc_lport         *local_port;
  16.     struct fc_rport         *rport;
  17.     struct kref         kref;
  18.     enum fc_rport_state         rp_state;
  19.     struct fc_rport_identifiers ids;
  20.     u16             flags;
  21.     u16                 max_seq;
  22.     u16             disc_id;
  23.     u16             maxframe_size;
  24.     unsigned int                retries;
  25.     unsigned int                major_retries;
  26.     unsigned int                e_d_tov;
  27.     unsigned int                r_a_tov;
  28.     struct mutex                rp_mutex;
  29.     struct delayed_work     retry_work;
  30.     enum fc_rport_event         event;
  31.     struct fc_rport_operations  *ops;
  32.     struct list_head            peers;
  33.     struct work_struct          event_work;
  34.     u32             supported_classes;
  35.     u16                         prli_count;
  36.     struct rcu_head         rcu;
  37.     u16             sp_features;
  38.     u8              spp_type;
  39.     void                (*lld_event_callback)(struct fc_lport *,                                                                                                                                                                                                                  
  40.                               struct fc_rport_priv *,
  41.                               enum fc_rport_event);
  42. };
复制代码
解决方案

此处不直接使用强制结构体赋值,替换为结构体成员手动赋值
  1. -       *frport = *fcoe_ctlr_rport(new);
  2. +       frport->fcoe_len = new->fcoe_len;
  3. +       frport->flags = new->flags;
  4. +       frport->login_count = new->login_count;
  5. +       memcpy(frport->enode_mac, new->enode_mac, ETH_ALEN);
  6. +       memcpy(frport->vn_mac, new->vn_mac, ETH_ALEN);
复制代码
测试正常,不在重启,经排查该问题在开源社区版内核已有成熟的补丁方案https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d478418703d6bcdd163d5d8127683c6c471539f0
  1. scsi: fcoe: pass in fcoe_rport structure instead of fc_rport_priv
  2. Instead of using the generic 'fc_rport_priv' structure as argument and then
  3. having to painstakingly outcast this to fcoe_rport we should be passing the
  4. fcoe_rport structure itself and reduce complexity.
  5. Signed-off-by: Hannes Reinecke <hare@suse.de>
  6. Reviewed-by: Christoph Hellwig <hch@lst.de>
  7. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
复制代码
来源:https://www.cnblogs.com/forwards/p/17144235.html
免责声明:由于采集信息均来自互联网,如果侵犯了您的权益,请联系我们【E-Mail:cb@itdo.tech】 我们会及时删除侵权内容,谢谢合作!

举报 回复 使用道具