曲径通幽论坛

Title: Using Perl regular expressions in grep (the -P option)

Author: beyes  Date: 2012-4-28 08:55
Test text:

[url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg[/url][url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img]]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img][/url][/img]

Many image URLs are run together on a single line. We want each image URL on its own line. How can that be done?

Method 1: use sed:

[beyes@beyes shell]$ sed 's/\.jpg/&\n/g' url.txt | sed '/^$/d'
http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg
http://imgp.39yst.com/uploads/al ... 120221130449-50.jpg
http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-50.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-51.jpg

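The idea is to append a newline after every .jpg suffix; the second sed then deletes the empty line that this leaves behind. (Treating \n in the replacement as a newline is GNU sed behavior.)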
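One detail worth checking is the dot: in a sed pattern an unescaped . matches any character, so .jpg can also match text such as "yjpg". Below is a minimal sketch of the difference, assuming GNU sed (for \n in the replacement). The sample string is made up for illustration and is not taken from url.txt.

```shell
# Hypothetical sample: two made-up URLs glued together; the first file name contains "jpg".
sample='http://a.example/myjpg1.jpghttp://a.example/2.jpg'

# Unescaped dot: ".jpg" also matches "yjpg", so the first URL is cut in the wrong place.
loose=$(printf '%s\n' "$sample" | sed 's/.jpg/&\n/g' | sed '/^$/d')

# Escaped dot: only a literal ".jpg" gets a newline appended.
strict=$(printf '%s\n' "$sample" | sed 's/\.jpg/&\n/g' | sed '/^$/d')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

With the escaped dot, the sample should split into exactly two URLs; with the unescaped dot, the first URL is broken into two pieces.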

Method 2: use Perl:
[code=perl]#!/usr/bin/perl

$_ = "[url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130449-50.jpg[/url][url=http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img]]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg[img]http://imgp.39yst.com/uploads/allimg/120221/11-120221130450-50.jpghttp://imgp.39yst.com/uploads/al ... 120221130450-51.jpg[/img][/url][/img]";

# Append a newline after each URL; .*? is non-greedy and \. matches a literal dot.
s#(http.*?\.jpg)#$1\n#g;

print $_;[/code]
Output:

[beyes@beyes shell]$ perl get.pl
http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg
http://imgp.39yst.com/uploads/al ... 120221130449-50.jpg
http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-50.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-51.jpg

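This follows the same idea as the sed version: a newline is added after each .jpg. Note the (.*?) form in the regex. It is a non-greedy (lazy) match: it stops at the first .jpg it can reach, whereas a greedy .* would run on to the last .jpg on the line.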

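Method 3: use grep:
Does grep support non-greedy matching? Yes. We usually reach for the -E option (equivalent to egrep), but extended regular expressions have no non-greedy quantifier like Perl's. With GNU grep's -P option (Perl-compatible regular expressions) it works, as shown below: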

[beyes@beyes shell]$ grep -P -o 'http.*?\.jpg' url.txt
http://imgp.39yst.com/uploads/allimg/120221/11-120221130449.jpg
http://imgp.39yst.com/uploads/al ... 120221130449-50.jpg
http://imgp.39yst.com/uploads/allimg/120221/11-120221130450.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-50.jpg
http://imgp.39yst.com/uploads/al ... 120221130450-51.jpg

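This is more concise than the Perl script above. The -o option prints only the matched text; without it, grep prints every whole line that contains a match, and such lines can be very long (as is common in HTML source). Together, -o -P prints just the parts of each line that match the Perl regex, one match per line.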
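To compare the options side by side, here is a small sketch that runs three invocations on the same input. It assumes GNU grep built with -P support, and the HTML line and its URLs are made up for illustration.

```shell
# Hypothetical one-line HTML sample with two made-up image URLs.
line='<p><img src="http://a.example/1.jpg"><img src="http://a.example/2.jpg"></p>'

# -E -o: ERE has no lazy quantifier, so the greedy .* runs to the last ".jpg".
ere_only=$(printf '%s\n' "$line" | grep -E -o 'http.*\.jpg')

# -P -o: the lazy .*? stops at the first ".jpg", giving one match per URL.
pcre_only=$(printf '%s\n' "$line" | grep -P -o 'http.*?\.jpg')

# -P without -o: the whole matching line is printed.
pcre_line=$(printf '%s\n' "$line" | grep -P 'http.*?\.jpg')

printf '%s\n' "-E -o:" "$ere_only" "-P -o:" "$pcre_only" "-P:" "$pcre_line"
```

Only the -P -o form yields the two URLs on separate lines; -E -o returns a single span covering both, and dropping -o returns the full HTML line.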
