grep search

grep
##Usage: grep [OPTION]... PATTERN [FILE]...
##Try 'grep --help' for more information.

1. Quick start

## Create a test file
echo -e "ab\na\nb\nc\nd\ne\nf\ng\nA\nB\nC" > test.txt
## Search for lines containing 'a'
grep a test.txt

2. Common usage

## "." matches any single character
grep a. test.txt
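
To make the behaviour of "." concrete, here is a minimal sketch that rebuilds the test file from section 1 and checks which lines match. Using printf instead of echo -e, quoting the pattern, and the variable names are my own choices for illustration, not part of the original commands.

```shell
# Recreate the test file from section 1 (printf is used here for portability)
printf 'ab\na\nb\nc\nd\ne\nf\ng\nA\nB\nC\n' > test.txt

# "." requires exactly one character after the "a",
# so the line that contains only "a" does not match
dot_match=$(grep 'a.' test.txt)
echo "$dot_match"    # prints: ab

# To search for a literal dot, escape it; test.txt contains none
literal_dot=$(grep 'a\.' test.txt || true)
echo "literal matches: ${literal_dot:-none}"
```

Quoting the pattern keeps the shell from treating characters such as "*" or "|" specially before grep sees them.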

## Show the line number of each match
grep -n a test.txt

## Print the number of matches (counted as matching lines)
grep -c a test.txt
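
Since -c counts matching lines rather than individual occurrences, the two numbers can differ. The sketch below shows the difference on a small made-up file (count_demo.txt is hypothetical), using grep -o to print every occurrence on its own line.

```shell
# Hypothetical demo file: "banana" contains "a" three times
printf 'banana\napple\ncherry\n' > count_demo.txt

# -c counts lines that contain at least one match
line_count=$(grep -c a count_demo.txt)

# -o prints each match on its own line, so wc -l counts occurrences
occurrence_count=$(grep -o a count_demo.txt | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $occurrence_count"    # prints: lines: 2, occurrences: 4
```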

## Ignore case
grep -i a test.txt

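## Match multiple patterns: separate the alternatives with "|". Note: if I remember correctly, up to about 1000 alternatives can be matched at once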
grep -E 'a|b' test.txt
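
As a side note, the same result can be had without a regular-expression alternation by repeating the -e option. The following is only a sketch on the test file from section 1; the variable names are illustrative.

```shell
# Recreate the test file from section 1
printf 'ab\na\nb\nc\nd\ne\nf\ng\nA\nB\nC\n' > test.txt

# Alternation inside one extended regular expression
alternation=$(grep -E 'a|b' test.txt)

# One -e option per pattern; a line matches if any pattern matches
repeated_e=$(grep -e a -e b test.txt)

echo "$repeated_e"
```

Both commands print the lines ab, a and b.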

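## Print one or more lines of context around each match
grep -nA 1 A test.txt # same as grep -n -A 1 A test.txt; -n shows line numbers
grep -nB 1 A test.txt # show the line before each match
grep -nA 1 -B 1 A test.txt # show one line before and one line after each match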
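
When the same number of lines is wanted on both sides, -C can stand in for the -A/-B pair. A minimal sketch on the test file from section 1 (variable names are illustrative):

```shell
# Recreate the test file from section 1
printf 'ab\na\nb\nc\nd\ne\nf\ng\nA\nB\nC\n' > test.txt

# Separate flags for "after" and "before" context
both_flags=$(grep -n -A 1 -B 1 A test.txt)

# -C 1 asks for one line of context on each side
context_flag=$(grep -n -C 1 A test.txt)

echo "$context_flag"
```

With -n, the matching line is marked with ":" (9:A) and context lines with "-" (8-g and 10-B).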

3. Quick application:

Extracting FASTA sequences:
Test file: test.fa.txt
First, normalize the FASTA file so that every record is one ID line followed by one sequence line.

## With a bit of trickery, join each multi-line sequence into a single line
cat test.fa.txt | tr '\n' '#'| sed 's/#>/\n>/g'|sed 's/#/\n/;s/#//g' > test.fa
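
The tr/sed pipeline above relies on "#" never appearing in the file. An alternative sketch with awk avoids the placeholder character by buffering sequence lines until the next header. This is not the original method, and the demo file and its sequences below are made up for illustration.

```shell
# Hypothetical two-record FASTA file; the sequences are invented
printf '>sp|Q6GZX4|demo_1\nMAFS\nAEDV\n>sp|Q6GZX3|demo_2\nMSII\nGATR\nLQ\n' > demo.fa.txt

awk '
  # Header line: flush the buffered sequence, print the ID line, reset the buffer
  /^>/ { if (seq != "") print seq; print; seq = ""; next }
  # Sequence line: append it to the buffer
       { seq = seq $0 }
  # End of file: flush the last sequence
  END  { if (seq != "") print seq }
' demo.fa.txt > demo.fa

cat demo.fa
```

The result has exactly two lines per record, which is the layout the grep -A 1 commands in this section expect.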

3.1 Get all FASTA IDs

grep ">" test.fa

3.2 Get the ID between the "|" separators

grep ">" test.fa| awk -F\| '{print $2}'

3.3 Use grep multi-pattern matching to get the first two IDs and their sequences

grep -EA 1 $(grep ">" test.fa| awk -F\| '{print $2}'|head -n 2| tr '\n' '|'|sed 's/^/"|/;s/$/"/') test.fa
## or, the same command split across lines
grep -EA 1 $(grep ">" test.fa|\
awk -F\| '{print $2}'|\
head -n 2|\
tr '\n' '|'|\
sed 's/^/"|/;s/$/"/') test.fa

'''
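Here, $(grep ">" test.fa| awk -F\| '{print $2}'|head -n 2| tr '\n' '|'|sed 's/^/"|/;s/$/"/') evaluates to:
"|Q6GZX4|Q6GZX3|"
so it can be passed directly to grep -E.
The double quotes are literal characters in the pattern, not shell quoting: the closing quote gives the trailing "|" a non-empty alternative, which would otherwise be empty and (in GNU grep) match every line.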
'''
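
A variation on 3.3 that needs no literal quote characters: paste -s joins the IDs with "|" and adds no trailing separator. This is a sketch rather than the original method; demo.fa, the DEMO03 record, and the sequences are made up for illustration.

```shell
# Hypothetical single-line FASTA file with three records (invented sequences)
printf '>sp|Q6GZX4|demo_1\nMAFSAEDV\n>sp|Q6GZX3|demo_2\nMSIIGATRLQ\n>sp|DEMO03|demo_3\nMASN\n' > demo.fa

# Take the first two IDs and join them with "|"; paste adds no leading or trailing "|"
pattern=$(grep '>' demo.fa | awk -F'|' '{print $2}' | head -n 2 | paste -sd '|' -)
echo "$pattern"    # prints: Q6GZX4|Q6GZX3

# Quote the expansion so the shell hands it to grep as a single argument
selected=$(grep -E -A 1 "$pattern" demo.fa)
echo "$selected"
```

The last command prints the first two ID lines, each followed by its sequence line, and skips the third record.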

4. For more details, see:

Option reference: https://www.runoob.com/linux/linux-comm-grep.html
Complex regex matching: https://www.cnblogs.com/keithtt/p/6820540.html

Multi-words/strings grep

The shell limits how many arguments fit on one command line (around 1000 in my experience), and exceeding the limit ends in an error. To match a large number of words or strings, store them in a file and let grep read the patterns from it (grep -f). Note that the run time rises sharply beyond 1000 patterns.

Number of words    Time
1000               0m1.954s
2000               0m4.825s
5000               0m26.232s

Although you can match thousands of words in a single call, you don't have to, and it is rarely worth it. In that case, a loop that greps in smaller batches would be much faster than one single grep run.
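
The file-based approach and the batched loop can be sketched as follows. The file names, the batch size, and the use of -F (fixed strings instead of regular expressions) are assumptions for illustration; whether batching is actually faster depends on the grep version and the data.

```shell
# Hypothetical inputs: a word list (one pattern per line) and a file to search
printf 'apple\nbanana\ncherry\ndate\n' > words.txt
printf 'apple pie\nbanana split\nplain rice\ndate cake\ncherry apple tart\n' > data.txt

# Single call: grep reads every pattern from the file
all_at_once=$(grep -F -f words.txt data.txt)

# Batched variant: split the word list into chunks
# (2 lines each for this demo; something like 1000 in practice)
split -l 2 words.txt words_chunk_

# A line can match patterns from several chunks, so de-duplicate with sort -u
batched=$(for chunk in words_chunk_*; do
  grep -F -f "$chunk" data.txt
done | sort -u)

echo "$batched"
```

The batched output contains the same lines as the single call, but in sorted order rather than file order.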

Author: Karobben
Posted on: 2020-06-26
Updated on: 2024-01-11
