曲径通幽 Forum

Title: Using Perl regular expressions in grep (the -P option)

Author: beyes    Posted: 2012-4-28 08:55
Test text (saved as url.txt for the sed and grep commands):
[url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg[/url][url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img]]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img][/url][/img]

Many URLs are run together here. We want to put each image URL on its own line. How can we do that?

Method 1, using sed:
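[beyes@beyes shell]$ sed 's/\.jpg/.jpg\n/g' url.txt | sed '/^$/d'
http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg
http://imgp.39yst.com/uploads/al ... 120221130449-50.jpg
http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-50.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-51.jpg

This works by appending a newline after every .jpg image suffix. The dot is escaped as \. so that it matches only a literal period (an unescaped . matches any character), and the \n in the replacement is a GNU sed extension. Because a newline is also added after the last .jpg on the line, the first sed leaves an empty line behind; the second sed, '/^$/d', deletes it.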
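A minimal, self-contained sketch of the same two-step sed pipeline, assuming GNU sed (needed for \n in the replacement). The sample string and the function name `split_jpg` are hypothetical placeholders, not part of the original test data:

```shell
# Hypothetical sample: three image URLs run together on one line.
sample='http://example.com/a.jpghttp://example.com/b-50.jpghttp://example.com/c.jpg'

# Hypothetical helper; assumes GNU sed.
split_jpg() {
    # 1st sed: append a newline after every literal ".jpg"
    # 2nd sed: delete the empty line left after the final ".jpg"
    sed 's/\.jpg/.jpg\n/g' | sed '/^$/d'
}

printf '%s\n' "$sample" | split_jpg
# prints:
# http://example.com/a.jpg
# http://example.com/b-50.jpg
# http://example.com/c.jpg
```

Running `split_jpg < url.txt` should behave like the sed command above.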

Method 2, using Perl:
[code=perl]#!/usr/bin/perl
use strict;
use warnings;

$_ = "[url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg[/url][url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img]]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img][/url][/img]";

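# Non-greedy (.*?) match: each one ends at the first ".jpg" after "http",
# and "\n" puts a newline after it. "\." matches a literal dot only.
s#http(.*?)(\.jpg)#http$1$2\n#g;

print;[/code]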
Output:
[beyes@beyes shell]$ perl get.pl
http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg
http://imgp.39yst.com/uploads/al ... 120221130449-50.jpg
http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-50.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-51.jpg

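The idea is the same as with sed: add a newline after each .jpg. Note the (.*?) in the regex: it is a non-greedy (lazy) match. A greedy .* would consume everything to the end of the line and then backtrack only as far as the last .jpg, so the whole line would become a single match; .*? instead stops at the first .jpg after each http.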

Method 3, using grep:
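Does grep support non-greedy matching? Yes. We usually use the -E option (equivalent to egrep), but its extended regular expressions have no non-greedy quantifier like the one in the Perl script above. With -P, however, grep interprets the pattern as a Perl-compatible regular expression (PCRE), which does support it. Note that -P is a GNU grep option and may be missing from other grep implementations or from builds without PCRE support: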
[beyes@beyes shell]$ grep -P -o 'http.*?\.jpg' url.txt
http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg
http://imgp.39yst.com/uploads/al ... 120221130449-50.jpg
http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-50.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-51.jpg
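The -E versus -P difference can be checked with a short sketch, assuming GNU grep built with PCRE support; the sample string is hypothetical. The same input gives one long match under ERE, where .* is always greedy, and three separate matches under PCRE with .*?:

```shell
# Hypothetical sample: three image URLs run together on one line.
sample='http://example.com/a.jpghttp://example.com/b.jpghttp://example.com/c.jpg'

# ERE (-E): only greedy repetition, so one match spans the whole line.
printf '%s\n' "$sample" | grep -E -o 'http.*\.jpg'

# PCRE (-P): the lazy .*? stops at each ".jpg", giving one URL per line.
printf '%s\n' "$sample" | grep -P -o 'http.*?\.jpg'
```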

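This is more concise than the Perl script above. The -o option prints only the matched parts of a line, each match on its own output line. Without -o, grep prints every line that contains a match in full, and such lines can be very long (common in HTML source). Combining -P and -o therefore prints just the parts of each line that match the Perl-style pattern.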
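A sketch of the -o difference, again assuming GNU grep with PCRE support. The three input lines, produced by a hypothetical `sample` function, stand in for url.txt:

```shell
# Hypothetical input: two lines with matches, one without.
sample() {
    printf '%s\n' \
        'http://example.com/a.jpghttp://example.com/b.jpg' \
        'no images on this line' \
        'http://example.com/c.jpg'
}

# Without -o: each line containing a match is printed in full (2 lines here).
sample | grep -P 'http.*?\.jpg'

# With -o: each match is printed on its own line (3 lines here).
sample | grep -P -o 'http.*?\.jpg'
```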

曲径通幽 Forum (http://www.groad.net/bbs/), powered by Discuz! X3.2