A simple hands-on walkthrough: locating the buggy source line from a Linux Oops
Oops log
root@bsp84:/home/xxx/xin.sun# dmesg
oops module init!
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#2] SMP NOPTI
CPU: 0 PID: 16257 Comm: insmod Tainted: G D WOE 5.4.269 #13
Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.4 10/30/2020
RIP: 0010:init_oopsdemo+0x15/0x30
Code: Bad RIP value.
RSP: 0018:ffffabe0c3cafc60 EFLAGS: 00010286
RAX: 0000000000000012 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff927efbe1c8c8 RDI: ffff927efbe1c8c8
RBP: ffffabe0c3cafc60 R08: 0000000000000cf7 R09: ffffffff8ada5c58
R10: 0000000000000000 R11: ffffabe0c3cafad0 R12: ffffffffc09f5000
R13: ffff927ec396cae0 R14: ffffabe0c3cafe68 R15: ffffffffc09f7000
FS:00007f961aea8540(0000) GS:ffff927efbe00000(0000) knlGS:0000000000000000
CS:0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc09f4feb CR3: 000000004506e004 CR4: 00000000007606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? show_regs+0x54/0x60
? __die+0x87/0xd0
? no_context+0x1b1/0x580
? __bad_area_nosemaphore+0x50/0x1f0
? bad_area_nosemaphore+0x16/0x20
? __do_page_fault+0x20d/0x4d0
? __irq_work_queue_local+0x57/0x60
? do_page_fault+0x2c/0xe0
? page_fault+0x34/0x40
? 0xffffffffc09f5000
? init_oopsdemo+0x15/0x30
do_one_initcall+0x4a/0x210
? _cond_resched+0x19/0x40
? kmem_cache_alloc_trace+0x170/0x230
do_init_module+0x4f/0x20f
load_module+0x1e77/0x22f0
__do_sys_finit_module+0xfc/0x120
? __do_sys_finit_module+0xfc/0x120
__x64_sys_finit_module+0x1a/0x20
do_syscall_64+0x57/0x1a0
entry_SYSCALL_64_after_hwframe+0x5c/0xc1
RIP: 0033:0x7f961a9c0539
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f9 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc74d636d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055c130d767c0 RCX: 00007f961a9c0539
RDX: 0000000000000000 RSI: 000055c130513cee RDI: 0000000000000003
RBP: 000055c130513cee R08: 0000000000000000 R09: 00007f961ac93000
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
R13: 000055c130d76770 R14: 0000000000000000 R15: 0000000000000000
Modules linked in: oops_module(OE+) xt_REDIRECT xt_mark ip6table_filter ip6table_nat xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype iptable_filter iptable_nat nf_nat ip6table_mangle ip6_tables iptable_mangle bpfilter xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 aufs lyn_drv(OE) nls_iso8859_1 intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ast drm_vram_helper kvm_intel ttm kvm drm_kms_helper ipmi_ssif crct10dif_pclmul drm crc32_pclmul ghash_clmulni_intel binfmt_misc aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate i2c_algo_bit fb_sys_fops syscopyarea sysfillrect joydev sysimgblt input_leds dax_pmem_compat device_dax nd_pmem dax_pmem_core nd_btt mei_me lpc_ich mei ioatdma ipmi_si ipmi_devintf ipmi_msghandler acpi_pad mac_hid acpi_power_meter sch_fq_codel coretemp overlay br_netfilter bridge stp llc nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c parport_pc ppdev lp parport ip_tables
x_tables autofs4 hid_generic usbhid hid ixgbe xfrm_algo ahci dca i40e mdio libahci wmi
CR2: 0000000000000000
---[ end trace 65a5e940388f4905 ]---
RIP: 0010:0x2000000000
Code: Bad RIP value.
RSP: 0018:ffffabe0c13b7a10 EFLAGS: 00010206
RAX: 0000002000000000 RBX: ffff927a9f406580 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 84b0c0b88b841b74
RBP: ffffabe0c13b7a38 R08: 0000000000000001 R09: ffffffff8ada4478
R10: ffff927e81fdd700 R11: ffffabe0c13b7900 R12: ffff927a9f406588
R13: 0000000000000000 R14: ffff927ecf6bf7e0 R15: ffff927eebc09900
FS:00007f961aea8540(0000) GS:ffff927efbe00000(0000) knlGS:0000000000000000
CS:0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001fffffffd6 CR3: 000000004506e004 CR4: 00000000007606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
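Reading the Oops log
1. BUG: kernel NULL pointer dereference, address: 0000000000000000
The kernel dereferenced a NULL pointer; the faulting address is 0000000000000000.
2. PGD 0 P4D 0
These are the page-table entries the kernel found while walking the page tables for the faulting address, printed level by level (PGD and P4D here; other oopses also show PUD, PMD and PTE). The walk stops at the first entry that is not present. In this example the PGD entry is already 0, so nothing is mapped at that address.
3. Oops: 0002 [#2] SMP NOPTI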
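0002 is the page-fault error code. Bit 1 is set and every other bit is clear, which means a write to a not-present page from kernel mode, matching the two #PF lines in the log. The 2 in [#2] is the oops counter: this is the second oops since boot (I had already triggered one earlier). The common x86 error_code bits are listed below; for the remaining ones, consult the kernel source.
  error_code:
    bit 0: 0 means no page found, 1 means protection fault
    bit 1: 0 means read, 1 means write
    bit 2: 0 means kernel mode, 1 means user mode
    bit 3: 1 means a reserved bit was set in a page-table entry
    bit 4: 1 means the fault was an instruction fetch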
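The bit layout can be turned into a small decoder. The following is a minimal user-space sketch, not kernel code: the function name describe_pf_error_code and the PF_* constants are made up for illustration, and it assumes the x86 page-fault layout in which bit 3 flags a reserved-bit violation and bit 4 flags an instruction fetch.

```c
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits (illustrative names, not the kernel's). */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection fault */
    PF_WRITE = 1 << 1, /* 0: read, 1: write */
    PF_USER  = 1 << 2, /* 0: kernel mode, 1: user mode */
    PF_RSVD  = 1 << 3, /* 1: reserved bit set in a page-table entry */
    PF_INSTR = 1 << 4  /* 1: fault was an instruction fetch */
};

/*
 * Write a one-line description of a page-fault error code into buf.
 * Returns whatever snprintf returns (the length of the full description).
 */
int describe_pf_error_code(unsigned long code, char *buf, size_t len)
{
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write"
                       : "read";
    const char *mode   = (code & PF_USER) ? "user" : "kernel";
    const char *cause  = (code & PF_PROT) ? "protection fault"
                                          : "not-present page";
    const char *rsvd   = (code & PF_RSVD) ? ", reserved bit set" : "";

    return snprintf(buf, len, "%s in %s mode: %s%s",
                    access, mode, cause, rsvd);
}
```

For the 0002 in this log, describe_pf_error_code(0x0002, buf, sizeof buf) fills buf with "write in kernel mode: not-present page", which agrees with the two #PF lines printed by the kernel.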
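SMP NOPTI lists notable kernel features. SMP means the kernel was built with SMP support (PREEMPT would also appear here on a preemptible kernel), and NOPTI means page table isolation (PTI, the Meltdown mitigation) is not active.
4. CPU: 0 PID: 16257 Comm: insmod Tainted: G D WOE 5.4.269 #13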
The number after CPU is the id of the CPU on which the oops occurred, PID is the process id, and Comm is the name of the command that was running. 5.4.269 is the release of the running kernel, and #13 is the kernel's build number (the same counter that uname -v prints).
The Tainted field in the middle records why the kernel is considered tainted. The flag letters are defined in kernel/panic.c (around line 316), although the code alone does not make their meaning very clear.
The following table circulates online:
  Flag  Description
  'G'   if all modules loaded have a GPL or compatible license
  'P'   if any proprietary module has been loaded. Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by insmod as GPL compatible are assumed to be proprietary.
  'F'   if any module was force loaded by "insmod -f".
  'S'   if the Oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable.
  'R'   if a module was force unloaded by "rmmod -f".
  'M'   if any processor has reported a Machine Check Exception.
  'B'   if a page-release function has found a bad page reference or some unexpected page flags.
  'U'   if a user or user application specifically requested that the Tainted flag be set.
  'D'   if the kernel has died recently, i.e. there was an OOPS or BUG.
  'W'   if a warning has previously been issued by the kernel.
  'C'   if a staging module / driver has been loaded.
  'I'   if the kernel is working around a severe bug in the platform's firmware (BIOS or similar).
The table does not cover two flags that appear in this log: 'O' (an out-of-tree module has been loaded) and 'E' (an unsigned module has been loaded on a kernel that supports module signatures).
5. Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.4 10/30/2020
Self-explanatory.
6. RIP: 0010:init_oopsdemo+0x15/0x30
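init_oopsdemo+0x15 is the faulting function plus the offset of the faulting instruction within it, and 0x30 is the size of the function. The leading 0010 is the code segment selector (CS): 0010 is the kernel code segment, whereas the user-space RIP near the end of the log shows 0033.
RIP may also come without a function name, as in RIP: 0010:0x2000000000, where only the faulting address is given.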
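Both forms of the RIP location can be told apart mechanically. The sketch below is a user-space illustration under stated assumptions: the struct oops_location type and the parse_oops_location function are hypothetical names, and the input is assumed to be the text that follows "RIP: 0010:" on the RIP line.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative container for the location printed after "RIP: 0010:". */
struct oops_location {
    int has_symbol;              /* 1: symbol+offset/size, 0: bare address */
    char symbol[128];            /* valid when has_symbol is 1 */
    unsigned long long offset;   /* valid when has_symbol is 1 */
    unsigned long long size;     /* valid when has_symbol is 1 */
    unsigned long long address;  /* valid when has_symbol is 0 */
};

/* Returns 0 on success, -1 if the text matches neither form. */
int parse_oops_location(const char *text, struct oops_location *loc)
{
    char symbol[128];
    unsigned long long offset, size, address;

    memset(loc, 0, sizeof(*loc));

    /* Symbolic form, e.g. "init_oopsdemo+0x15/0x30". */
    if (sscanf(text, "%127[^+]+%llx/%llx", symbol, &offset, &size) == 3) {
        loc->has_symbol = 1;
        strcpy(loc->symbol, symbol);
        loc->offset = offset;
        loc->size = size;
        return 0;
    }

    /* Bare address form, e.g. "0x2000000000". */
    if (sscanf(text, "%llx", &address) == 1) {
        loc->address = address;
        return 0;
    }

    return -1;
}
```

With "init_oopsdemo+0x15/0x30" the function reports the symbol init_oopsdemo, offset 0x15 and size 0x30; with "0x2000000000" it reports only the address, in which case there is no symbol to hand to gdb or addr2line.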
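Some oopses (on ARM, for example) report PC and LR instead:
PC is at init_oopsdemo+0x24/0x38
LR is at init_oopsdemo+0x18/0x38
PC is the address of the instruction the CPU was executing, and LR (the link register) holds the return address of the current subroutine. The corresponding instructions can be found by disassembling the code at those addresses.
7. Everything from RSP: 0018:ffffabe0c13b7a10 EFLAGS: 00010206 down to PKRU: 55555554
These are the CPU register values at the time the exception was caught.
8. Call Trace: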
? show_regs+0x54/0x60
? __die+0x87/0xd0
? no_context+0x1b1/0x580
? __bad_area_nosemaphore+0x50/0x1f0
? bad_area_nosemaphore+0x16/0x20
? __do_page_fault+0x20d/0x4d0
? __irq_work_queue_local+0x57/0x60
? do_page_fault+0x2c/0xe0
? page_fault+0x34/0x40
? 0xffffffffc09f5000
? init_oopsdemo+0x15/0x30
do_one_initcall+0x4a/0x210
? _cond_resched+0x19/0x40
? kmem_cache_alloc_trace+0x170/0x230
do_init_module+0x4f/0x20f
load_module+0x1e77/0x22f0
__do_sys_finit_module+0xfc/0x120
? __do_sys_finit_module+0xfc/0x120
__x64_sys_finit_module+0x1a/0x20
do_syscall_64+0x57/0x1a0
entry_SYSCALL_64_after_hwframe+0x5c/0xc1
RIP: 0033:0x7f961a9c0539
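This is the call stack dumped after the fault. Callers are at the bottom and callees at the top; entries prefixed with ? are addresses found on the stack that the unwinder could not confirm as part of the call chain. Right below the do_page_fault entries sits init_oopsdemo+0x15 (the bare address 0xffffffffc09f5000 has no symbol and is ignored for now). At this point, go and read the source: if the problem is obvious there, nothing further is needed. The methods below use the relative offset to obtain the exact line number of the faulting code.
9. Getting the line number with gdb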
root@bsp84:/home/xxx/demo/oops_module# gdb oops_module.ko -q
Reading symbols from oops_module.ko...done.
(gdb) list *init_oopsdemo+0x15
0x45 is in init_oopsdemo (/home/xxx/demo/oops_module/oops_module.c:10).
5 MODULE_AUTHOR("ZHONGYI");
6
7 static int init_oopsdemo(void)
8 {
9 printk("oops module init! \n");
10 *((int*)0x00) = 0x19760817;
11 return 0;
12 }
13
14 module_init(init_oopsdemo);
(gdb)
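Run gdb oops_module.ko and then list *init_oopsdemo+0x15. The console prints the full file path, the file name and line number 10, together with the ten surrounding lines of source.
Note that the .ko file must contain debug information and symbols, which can be checked with file:
oops_module.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID=41b822641db831d48e67d8a67501ce3dc7b9bed9, with debug_info, not stripped
The "with debug_info, not stripped" part shows that the module was built with debug information. If it is missing, rebuild with -g.
One way is to add a line to the Makefile: EXTRA_CFLAGS=-g (ccflags-y += -g is the form recommended by current kbuild).
10. Using addr2line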
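objdump -t oops_module.ko | grep init_oopsdemo
addr2line -e ./oops_module.ko 0x15
demo/oops_module/oops_module.c:10
addr2line takes an address within the object file rather than an offset relative to a function. In general the value to pass is the start address of init_oopsdemo reported by objdump -t plus the 0x15 offset (the gdb session above resolved the same location to 0x45); 0x15 on its own is only right when the function starts at offset 0 of its section.
Commonly used addr2line options:
  -a: print the address in hexadecimal before the function name, file name and line number.
  -b: specify the object file format as bfdname.
  -C: demangle low-level symbol names into user-level names.
  -e: specify the executable whose addresses are to be translated; the default is a.out.
  -f: print the function name along with the file name and line number.
  -s: print only the base of each file name, without the directory.
  -i: if the address belongs to an inlined function, also print the enclosing scopes back to the first non-inlined function.
  -j: read offsets relative to the specified section instead of absolute addresses.
  -p: make the output more human friendly, with the information for each location on one line.
  -r / -R: enable or disable the recursion limit applied while demangling.
  --help: print help.
  --version: print the version number.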
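The address arithmetic for the addr2line step can be sketched in a few lines of C. This is only an illustration under stated assumptions: symbol_plus_offset is a hypothetical helper, it assumes a symbol-table line whose first field is the hexadecimal symbol value and whose last field is the symbol name (the layout printed by objdump -t and nm), and the 0x30 start address in the usage note is a made-up value, not output captured from the real module.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Given one symbol-table line, check that it describes `symbol` and
 * compute symbol value + offset, the address to hand to addr2line.
 * Returns 0 on success, -1 if the line does not match.
 */
int symbol_plus_offset(const char *line, const char *symbol,
                       unsigned long long offset,
                       unsigned long long *result)
{
    char *end;
    unsigned long long value = strtoull(line, &end, 16);
    size_t len, start;

    if (end == line)
        return -1;                       /* no leading hex value */

    /* Trim trailing whitespace, then locate the last field. */
    len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;
    start = len;
    while (start > 0 && !isspace((unsigned char)line[start - 1]))
        start--;

    /* The last field must be exactly the requested symbol name. */
    if (len - start != strlen(symbol) ||
        strncmp(line + start, symbol, len - start) != 0)
        return -1;

    *result = value + offset;
    return 0;
}
```

If objdump -t hypothetically reported init_oopsdemo at 0000000000000030, then symbol_plus_offset(line, "init_oopsdemo", 0x15, &addr) would set addr to 0x45, and that sum, not the bare 0x15, is the value to pass to addr2line -e.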
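11. faddr2line
Similar to addr2line. It is a script shipped in the kernel source tree (scripts/faddr2line) that accepts the function+offset/size form straight from the oops, for example init_oopsdemo+0x15/0x30. It is not installed on my server, so I have not tried it.
Appendix
Source code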
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("Dual BSD/GPL");	/* "BSD/GPL" is not a license string the kernel recognises */
MODULE_AUTHOR("ZHONGYI");

static int init_oopsdemo(void)
{
	printk("oops module init! \n");
	*((int*)0x00) = 0x19760817;	/* deliberate write through a NULL pointer, triggers the oops */
	return 0;
}

module_init(init_oopsdemo);

static void cleanup_oopsdemo(void)
{
	printk("oops module exit! \n");
}

module_exit(cleanup_oopsdemo);
Makefile
KVERS = $(shell uname -r)

# oops_module.c
# Kernel modules
obj-m += oops_module.o

# Specify flags for the module compilation.
#EXTRA_CFLAGS=-g -O0

build: kernel_modules

kernel_modules:
	$(MAKE) -C /lib/modules/$(KVERS)/build M=$(CURDIR) modules

clean:
	$(MAKE) -C /lib/modules/$(KVERS)/build M=$(CURDIR) clean
Source: https://www.cnblogs.com/sunsuns/p/18493011