Awk in Practice: Worked Examples

嘁哩喀喳 · posted on 2023-11-3 14:27:26

Inserting several new fields

In "a b c d", insert the three fields e f g after b.

echo a b c d|awk '{$3="e f g "$3}1'
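
A variation, offered as a sketch rather than as the author's method: appending to `$2` instead of prepending to `$3` gives the same line and also works when b happens to be the last field. The second command shows that assigning to a field rebuilds `$0` without re-splitting it, so `NF` stays at 4 until `$0=$0` forces a re-split.

```shell
# Append the new fields to $2; awk rebuilds $0 using OFS (one space by default).
echo a b c d | awk '{$2=$2" e f g"}1'

# NF is unchanged by the assignment; $0=$0 re-splits the rebuilt record.
echo a b c d | awk '{$2=$2" e f g"; before=NF; $0=$0; print before, NF}'
```

The first command prints `a b e f g c d`; the second prints `4 7`.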

Normalizing whitespace

Strip the leading and trailing whitespace from each line and left-align the fields.

aaaa bbb ccc
bbb aaa ccc
ddd fff eee gg hh ii jj

awk 'BEGIN{OFS="\t"}{$1=$1;print}' a.txt

Result:

aaaa bbb ccc
bbb aaa ccc
ddd fff eee gg hh ii jj
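
Why `$1=$1` matters can be seen with a visible separator. The sketch below uses a made-up input line and `OFS="|"` purely for illustration: the assignment forces awk to rebuild `$0` from its fields joined by `OFS`, which discards leading, trailing, and repeated blanks.

```shell
# With the assignment, $0 is rebuilt from the fields joined by OFS.
printf '   aaaa    bbb  ccc  \n' | awk 'BEGIN{OFS="|"}{$1=$1;print}'

# Without it, $0 is never rebuilt, so the original blanks survive.
printf '   aaaa    bbb  ccc  \n' | awk 'BEGIN{OFS="|"}{print}'
```

The first command prints `aaaa|bbb|ccc`; the second prints the line unchanged.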

Filtering IPv4 addresses

From the output of the ifconfig command, pick out every IPv4 address except those of the lo interface.

## 1. Method 1:
ifconfig | awk '/inet / && !($2 ~ /^127/){print $2}'
# Read by paragraph
## 2. Method 2:
ifconfig | awk 'BEGIN{RS=""}!/lo/{print $6}'
## 3. Method 3:
ifconfig |\
awk '
BEGIN{RS="";FS="\n"}
!/lo/{$0=$2;FS=" ";$0=$0;print $2;FS="\n"}
'
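
To try the first two methods without a live interface list, they can be fed canned text. The sample below is a trimmed, made-up imitation of the `inet 192.168.2.17 netmask ...` layout that the field numbers above rely on; older ifconfig builds that print `inet addr:...` would need different field handling. Anchoring the pattern as `/^lo/` is a slightly stricter variant of `/lo/`, not the author's original.

```shell
# Illustrative ifconfig-style text; interface names and addresses are made up.
sample='eth0: flags=4163  mtu 1500
        inet 192.168.2.17  netmask 255.255.255.0  broadcast 192.168.2.255
        inet6 fe80::20c:29ff:fe12:3456  prefixlen 64

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128'

# Method 1: line by line; "inet " (with the space) does not match "inet6".
printf '%s\n' "$sample" | awk '/inet / && $2 !~ /^127/ {print $2}'

# Method 2: paragraph mode; each interface block is one record.
printf '%s\n' "$sample" | awk 'BEGIN{RS=""} !/^lo/ {print $6}'
```

Both commands print `192.168.2.17` for this sample.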

Reading one section of an .ini configuration file

[base]
name=os_repo
baseurl=https://xxx/centos/$releasever/os/$basearch
gpgcheck=0
enable=1
[mysql]
name=mysql_repo
baseurl=https://xxx/mysql-repo/yum/mysql-5.7-community/el/$releasever/$basearch
gpgcheck=0
enable=1
[epel]
name=epel_repo
baseurl=https://xxx/epel/$releasever/$basearch
gpgcheck=0
enable=1
[percona]
name=percona_repo
baseurl = https://xxx/percona/release/$releasever/RPMS/$basearch
enabled = 1
gpgcheck = 0

awk '
BEGIN{RS=""} # paragraph mode: assumes a blank line between sections in a.txt
/\[mysql\]/{
print;
while( (getline)>0 ){
if(/\[.*\]/){
exit
}
print
}
}' a.txt
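
If the sections are not separated by blank lines, paragraph mode sees the whole file as one record. A line-based alternative, shown here as a sketch with a hypothetical variable `sec` and a shortened made-up sample, toggles a flag on every section header instead:

```shell
# Shortened sample with no blank lines between sections.
printf '%s\n' '[base]' 'name=os_repo' '[mysql]' 'name=mysql_repo' 'gpgcheck=0' '[epel]' 'name=epel_repo' |
awk -v sec=mysql '
/^\[.*\]$/ { want = ($0 == "[" sec "]") }  # a header line switches printing on or off
want                                        # print lines while inside the wanted section
'
```

For this input it prints the `[mysql]` header followed by its two settings and stops at `[epel]`.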

Deduplicating by a field

Remove the lines whose uid=xxx value has already appeared.

2019-01-13_12:00_index?uid=123
2019-01-13_13:00_index?uid=123
2019-01-13_14:00_index?uid=333
2019-01-13_15:00_index?uid=9710
2019-01-14_12:00_index?uid=123
2019-01-14_13:00_index?uid=123
2019-01-15_14:00_index?uid=333
2019-01-16_15:00_index?uid=9710

awk -F"?" '!arr[$2]++{print}' a.txt

Result:

2019-01-13_12:00_index?uid=123
2019-01-13_14:00_index?uid=333
2019-01-13_15:00_index?uid=9710
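
The compact `!arr[$2]++` pattern can be hard to read at first. The following spelled-out equivalent is a sketch on shortened, made-up lines; the array name `seen` is only illustrative.

```shell
printf '%s\n' '12:00_index?uid=123' '13:00_index?uid=123' '14:00_index?uid=333' |
awk -F'?' '{
  if (seen[$2] == 0) print   # first time this uid value appears
  seen[$2]++                 # remember it for later lines
}'
```

Only the first line for each uid is printed, here the 12:00 and 14:00 lines.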

Counting occurrences

portmapper
portmapper
portmapper
portmapper
portmapper
portmapper
status
status
mountd
mountd
mountd
mountd
mountd
mountd
nfs
nfs
nfs_acl
nfs
nfs
nfs_acl
nlockmgr
nlockmgr
nlockmgr
nlockmgr
nlockmgr

awk '
{arr[$1]++}
END{
OFS="\t";
for(idx in arr){print arr[idx],idx}
}
' a.txt
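
The order in which `for (idx in arr)` visits keys is unspecified, so the counts can come out in any order. One way to get a stable listing, sketched here on a shortened made-up input, is to pipe the result through `sort`:

```shell
printf '%s\n' nfs nfs status nfs mountd mountd |
awk '{count[$1]++} END{for (name in count) print count[name] "\t" name}' |
sort -k1,1nr -k2,2   # highest count first, then by name
```

For this input the lines come out as 3 nfs, 2 mountd, 1 status.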

Counting TCP connections by state

$ netstat -tnap
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1139/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 2285/master
tcp 0 96 192.168.2.17:22 192.168.2.1:2468 ESTABLISHED 87463/sshd: root@pt
tcp 0 0 192.168.2.17:22 192.168.2.1:5821 ESTABLISHED 89359/sshd: root@no
tcp6 0 0 :::3306 :::* LISTEN 2289/mysqld
tcp6 0 0 :::22 :::* LISTEN 1139/sshd
tcp6 0 0 ::1:25 :::* LISTEN 2285/master

The resulting counts:

5: LISTEN
2: ESTABLISHED

netstat -tnap |\
awk '
/^tcp/{
arr[$6]++
}
END{
for(state in arr){
print arr[state] ": " state
}
}
'

One-liners:

netstat -tna | awk '/^tcp/{arr[$6]++}END{for(state in arr){print arr[state] ": " state}}'
netstat -tna | /usr/bin/grep 'tcp' | awk '{print $6}' | sort | uniq -c
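
The awk one-liner can be checked against canned text instead of live netstat output. This sketch feeds it a few lines in the layout shown earlier and adds `sort -rn` so the order is stable; the `/^tcp/` pattern is what skips the header line.

```shell
awk '/^tcp/{arr[$6]++} END{for(state in arr) print arr[state] ": " state}' <<'EOF' | sort -rn
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1139/sshd
tcp 0 96 192.168.2.17:22 192.168.2.1:2468 ESTABLISHED 87463/sshd: root@pt
tcp6 0 0 :::3306 :::* LISTEN 2289/mysqld
EOF
```

For these three connections it prints `2: LISTEN` and then `1: ESTABLISHED`.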

Counting non-200 responses per IP in a log

Sample log line:

111.202.100.141 - - [2019-11-07T03:11:02+08:00] "GET /robots.txt HTTP/1.1" 301 169

Count the requests with a non-200 status code per IP and take the 10 IPs with the highest counts.

# Method 1
awk '
$8!=200{arr[$1]++}
END{
for(i in arr){print arr[i],i}
}
' access.log | sort -k1nr | head -n 10
# Method 2 (PROCINFO["sorted_in"] is gawk-specific):
awk '
$8!=200{arr[$1]++}
END{
PROCINFO["sorted_in"]="@val_num_desc";
for(i in arr){
if(cnt++==10){exit}
print arr[i],i
}
}' access.log
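
Method 1 can be tried on a few lines before pointing it at a real log. The lines below are made up in the same layout as the sample; `$8` is the status code only while the request part splits into exactly three space-separated tokens. The extra `-k2,2` sort key is an addition here to make ties come out in a fixed order.

```shell
# Made-up log lines; only the first follows the sample verbatim.
awk '$8!=200{arr[$1]++} END{for(i in arr) print arr[i], i}' <<'EOF' | sort -k1,1nr -k2,2 | head -n 10
111.202.100.141 - - [2019-11-07T03:11:02+08:00] "GET /robots.txt HTTP/1.1" 301 169
203.0.113.5 - - [2019-11-07T03:11:05+08:00] "GET /missing HTTP/1.1" 404 153
111.202.100.141 - - [2019-11-07T03:11:09+08:00] "GET /old HTTP/1.1" 301 169
203.0.113.5 - - [2019-11-07T03:11:12+08:00] "GET / HTTP/1.1" 200 612
EOF
```

Here 111.202.100.141 is listed first with 2 non-200 requests, followed by 203.0.113.5 with 1.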

Counting unique IPs

Columns: URL, visitor IP, access time, visitor

a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.23|2015-11-20 20:34:48|guest
c.com.cn|202.109.134.24|2015-11-20 20:34:48|guest
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
a.com.cn|202.109.134.24|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.25|2015-11-20 20:34:48|guest

Requirement: count how many distinct visitor IPs each URL has (deduplicated), and save one file per URL. The result should look like this:

a.com.cn 2
b.com.cn 2
c.com.cn 1

Along with three corresponding files:

a.com.cn.txt
b.com.cn.txt
c.com.cn.txt

Code:

BEGIN{
FS="|"
}
!arr[$1,$2]++{
arr1[$1]++
}
END{
for(i in arr1){
print i,arr1[i] >(i".txt")
}
}
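
The script above writes each count only to its per-URL file. A possible way to run it and also see the summary on the terminal is sketched below; the file name `access.txt`, the temporary directory, and the array names `seen` and `uniq` are assumptions made for this example.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > access.txt <<'EOF'
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.23|2015-11-20 20:34:48|guest
c.com.cn|202.109.134.24|2015-11-20 20:34:48|guest
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
a.com.cn|202.109.134.24|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.25|2015-11-20 20:34:48|guest
EOF

awk -F'|' '
!seen[$1,$2]++ { uniq[$1]++ }           # count each (URL, IP) pair once
END {
  for (url in uniq) {
    print url, uniq[url]                  # summary on stdout
    print url, uniq[url] > (url ".txt")   # one file per URL
  }
}' access.txt | sort
```

With this data the summary lists a.com.cn 2, b.com.cn 2, c.com.cn 1, and the three `.txt` files appear in the working directory.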

Printing every line whose second field is duplicated

Given the following text:

1 zhangsan
2 lisi
3 zhangsan
4 lisii
5 a
6 b
7 c
8 d
9 a
10 b

Problem 1: print every line whose second column is duplicated, that is, produce this output:

1 zhangsan
3 zhangsan
5 a
9 a
6 b
10 b

Code:

awk '{
arr[$2]++;
if(arr[$2]>1){
if(arr[$2]==2){
print first[$2]
};
print $0
}else{
first[$2]=$0
}
}' a.txt

Problem 2: print every line whose second column is not duplicated, that is, produce this output:

2 lisi
4 lisii
7 c
8 d

Code:

awk '{
arr[$2]++
first[$2]=$0
}
END{
for(i in arr){
if(arr[i]==1){print first[i]}
}
}' a.txt
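
Because `for (i in arr)` visits keys in an unspecified order, the END loop may not print lines in file order. A two-pass alternative, sketched here on a shortened made-up file, reads the input twice: the first pass counts, the second filters. It handles both problems and keeps file order, which for Problem 1 differs from the grouped order shown earlier.

```shell
tmp=$(mktemp)
printf '%s\n' '1 zhangsan' '2 lisi' '3 zhangsan' '4 a' '5 a' > "$tmp"

# Pass 1 (NR==FNR) counts each second field; pass 2 filters on that count.
awk 'NR==FNR{cnt[$2]++; next} cnt[$2]>1'  "$tmp" "$tmp"   # duplicated: lines 1, 3, 4, 5
awk 'NR==FNR{cnt[$2]++; next} cnt[$2]==1' "$tmp" "$tmp"   # not duplicated: line 2
rm -f "$tmp"
```

The trade-off is that the file must be readable twice, so this form does not work on piped input.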

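Deduplicating adjacent lines, keeping the last one

Compare lines by a field, remove adjacent duplicates, and keep the last line of each run of duplicates together with the lines that are not duplicated.
Contents of a.log: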

TCP 10.33.4.149:19404 wrr
-> 10.27.4.197:19404 FullNat 10 2 0
TCP 10.33.4.150:19039 wrr
TCP 10.33.4.150:19089 wrr
-> 10.27.4.201:19089 FullNat 10 2 0
TCP 10.33.4.150:19094 wrr
TCP 10.33.4.150:19102 wrr
TCP 10.33.4.150:19107 wrr
-> 10.27.100.150:19107 FullNat 10 18 0
TCP 10.33.4.150:19111 wrr
TCP 10.33.4.150:19112 wrr
TCP 10.33.4.150:19113 wrr
TCP 10.33.4.150:19114 wrr
TCP 10.33.4.150:19207 wrr
-> 10.27.100.150:19207 FullNat 10 18 0

Using the first field to detect duplicates and removing adjacent duplicate lines, the final output is:

TCP 10.33.4.149:19404 wrr
-> 10.27.4.197:19404 FullNat 10 2 0
TCP 10.33.4.150:19089 wrr
-> 10.27.4.201:19089 FullNat 10 2 0
TCP 10.33.4.150:19107 wrr
-> 10.27.100.150:19107 FullNat 10 18 0
TCP 10.33.4.150:19207 wrr
-> 10.27.100.150:19207 FullNat 10 18 0

Solution:
[code]# Compare on the first field. To compare on another field, replace $1 with $N, where N is the field number.
awk '{ if($1!=prev){ a[++n]=$0; }else{ a[n]=$0 }}{prev=$1}END{ for(i=1;i