曲径通幽 forum

diff -- compare files and directories

GROAD

Posted on 2011-6-2 14:04:13
Syntax
diff [OPTION]... FILES

Options:

-i  --ignore-case : ignore case differences in file contents.

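--ignore-file-name-case : ignore case when comparing file names.
--no-ignore-file-name-case : do not ignore case when comparing file names.
On some distributions these two options have no effect. On Fedora 15, for example, possibly because the bug was never patched. Debian and Ubuntu do not have this problem; judging from Debian's changelog, the patch for this option came from Ubuntu. Without and with the option, the output looks like this: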
beyes@debian:~/command/diff$ diff diffdir1 diffdir2
Only in diffdir1: filename
Only in diffdir2: FILENAME
beyes@debian:~/command/diff$ diff --ignore-file-name-case diffdir1 diffdir2
Above, diffdir1 and diffdir2 are two directories. They contain the files filename and FILENAME respectively, and the two files have identical contents.
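The pairing step that this option changes can be sketched in Python. This is only an illustration of the idea, not diff's own code; `pair_names` and its arguments are made-up names.

```python
def pair_names(left, right, ignore_case=False):
    """Pair directory entries by name before their contents are compared."""
    # With ignore_case, names are matched on their lowercased form.
    key = str.lower if ignore_case else (lambda name: name)
    left_by_key = {key(name): name for name in left}
    right_by_key = {key(name): name for name in right}

    common = sorted(left_by_key.keys() & right_by_key.keys())
    pairs = [(left_by_key[k], right_by_key[k]) for k in common]
    only_left = sorted(left_by_key[k] for k in left_by_key.keys() - right_by_key.keys())
    only_right = sorted(right_by_key[k] for k in right_by_key.keys() - left_by_key.keys())
    return pairs, only_left, only_right


# Case-sensitive: nothing pairs up, so each name is "Only in" its directory.
print(pair_names(["filename"], ["FILENAME"]))
# Case-insensitive: the two names pair up and their contents can be compared.
print(pair_names(["filename"], ["FILENAME"], ignore_case=True))
```

Once the names pair up, the two files are compared as usual; since their contents are identical here, nothing is printed.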

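-E  --ignore-tab-expansion : ignore changes due to TAB expansion. With this option, diff treats a TAB and the spaces it would expand to as the same. diff assumes tab stops every 8 columns by default (GNU diff can change this with --tabsize=NUM), so a TAB at the start of a line and 8 spaces are considered equal.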
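A small Python sketch of what "the same after TAB expansion" means, assuming tab stops every 8 columns (the documented default of GNU diff's --tabsize). `expand` is a made-up helper built on Python's `str.expandtabs`; it is not how diff is implemented.

```python
TAB_SIZE = 8  # assumption: tab stops every 8 columns


def expand(line, tabsize=TAB_SIZE):
    """Replace each TAB with the spaces needed to reach the next tab stop."""
    return line.expandtabs(tabsize)


with_tab = "a\tb"
with_spaces = "a" + " " * 7 + "b"  # "a" fills column 1, 7 spaces reach column 8

print(expand(with_tab) == expand(with_spaces))  # True: equal under -E
print(expand(with_tab) == expand("a   b"))      # False: 3 spaces is not a tab stop
```

Note that a TAB does not stand for a fixed number of spaces: how many it expands to depends on the column where it appears.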

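-b  --ignore-space-change : ignore changes in the amount of whitespace. With this option, any run of whitespace between words (TABs and spaces alike) counts as the same regardless of its length, and whitespace at the end of a line is ignored.
For example, the following two files are considered identical: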
[beyes@localhost diff]$ cat filename.txt
hello         world
[beyes@localhost diff]$ cat FILENAME.TXT
hello                world

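-w  --ignore-all-space : ignore all whitespace. This goes further than -b: whitespace is ignored even where one line has it and the other has none. With this option, diff considers the following two files identical: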
[beyes@localhost diff]$ cat filename.txt
h e llo         world
[beyes@localhost diff]$ cat FILENAME.TXT
hello                world
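The difference between -b and -w can be pictured as two ways of normalizing a line before comparing it. The helpers below are a rough Python sketch under that assumption (`normalize_b` and `normalize_w` are made-up names), not diff's real algorithm.

```python
import re


def normalize_b(line):
    """-b style: collapse each whitespace run to one space, drop trailing whitespace."""
    return re.sub(r"[ \t]+", " ", line).rstrip(" ")


def normalize_w(line):
    """-w style: remove every space and TAB."""
    return re.sub(r"[ \t]+", "", line)


a = "hello         world"
b = "hello                world"
c = "h e llo         world"

print(normalize_b(a) == normalize_b(b))  # True: only the amount of whitespace differs
print(normalize_b(c) == normalize_b(b))  # False: -b still sees the spaces inside "h e llo"
print(normalize_w(c) == normalize_w(b))  # True: -w drops all whitespace
```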

-B  --ignore-blank-lines : ignore changes that only insert or delete blank lines.

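-I RE  --ignore-matching-lines=RE : ignore changes whose lines all match RE. RE is a regular expression; it can be as simple as a word or a single character that occurs in the line. If every line a change inserts or deletes matches RE, diff ignores that change even though the lines differ in other ways. See the example below: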
[beyes@localhost diff]$ cat filename.txt
hello         world



are you ok?
[beyes@localhost diff]$ cat FILENAME.TXT
hello                world
are you here?
Note how the contents above differ. Here are two ways of comparing them:
[beyes@localhost diff]$ diff -bB  filename.txt FILENAME.TXT
2,5c2
<
<
<
< are you ok?
---
> are you here?
Now with the -I option:
[beyes@localhost diff]$ diff -bB -I are filename.txt FILENAME.TXT

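are you ok? and are you here? differ in the words ok and here, but both lines match the RE given to -I, so diff ignores the difference between them. The extra blank lines are covered by -B, so nothing is printed.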
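The rule at work here can be sketched in Python: a change is dropped only if every line it inserts or deletes is ignorable. `change_is_ignored` is a made-up name, and the sketch assumes that -B and -I combine line by line as the example above suggests.

```python
import re


def change_is_ignored(changed_lines, regex=None, ignore_blank=False):
    """Return True if every inserted or deleted line may be ignored."""
    pattern = re.compile(regex) if regex else None
    for line in changed_lines:
        if ignore_blank and line == "":
            continue  # -B: blank lines do not count
        if pattern is not None and pattern.search(line):
            continue  # -I: lines matching RE do not count
        return False  # one ordinary line is enough to report the change
    return True


# Lines 2-5 of filename.txt plus line 2 of FILENAME.TXT from the example.
change = ["", "", "", "are you ok?", "are you here?"]

print(change_is_ignored(change, ignore_blank=True))               # False: -bB alone reports it
print(change_is_ignored(change, regex="are", ignore_blank=True))  # True: -bB -I are prints nothing
print(change_is_ignored(change, regex="are"))                     # False: blank lines do not match "are"
```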

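--strip-trailing-cr : strip the carriage return at the end of each input line. This option is mainly for handling the way different systems end a line: Linux ends a line with \n (LF), while Windows ends it with \r\n (CRLF). With this option the trailing \r is left out of the comparison. See the example below:
Suppose one file was created on Linux and one on Windows, with the same content but different line endings. A default diff gives: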
[beyes@localhost diff]$ diff filename.txt FILENAME.txt
1,2c1,2
< are you ok?
< how are you?
---
> are you ok?
> how are you?
The comparison with --strip-trailing-cr added:
[beyes@localhost diff]$ diff --strip-trailing-cr filename.txt FILENAME.txt
No differences are reported now, which is exactly the result we want.
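A minimal Python sketch of the same idea, using in-memory strings instead of real files. `strip_trailing_cr` is a made-up helper, and the two texts are stand-ins for the Linux and Windows files.

```python
def strip_trailing_cr(line):
    """Drop one carriage return at the end of a line, if present."""
    return line[:-1] if line.endswith("\r") else line


unix_text = "are you ok?\nhow are you?\n"     # LF line endings
dos_text = "are you ok?\r\nhow are you?\r\n"  # CRLF line endings

unix_lines = unix_text.split("\n")
dos_lines = dos_text.split("\n")  # each line still ends in "\r"

print(unix_lines == dos_lines)                                      # False
print(unix_lines == [strip_trailing_cr(line) for line in dos_lines])  # True
```

This also explains why the default diff output above shows lines that look identical: the only difference is the invisible \r at the end.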

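-a  --text : treat all files as text. This forces a line-by-line comparison. For example, the option can be used to compare two binary files: sometimes two binaries behave the same when run, but some copyright information has been stripped from one of them, and this option lets you see where they differ. The purpose is similar to comparing md5 checksums, except that a checksum only tells you that the files differ, while diff -a also shows where.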
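To illustrate the contrast with a checksum, here is a small Python sketch. The two byte strings are made-up stand-ins for binaries, and `difflib` is used in place of diff itself.

```python
import difflib
import hashlib

# Invented "binaries": the second one has its copyright line stripped.
old = b"\x7fELF\x00code\n(c) 2011 Example\nmore\x00code\n"
new = b"\x7fELF\x00code\nmore\x00code\n"

# A checksum only says whether the two differ.
same_digest = hashlib.md5(old).hexdigest() == hashlib.md5(new).hexdigest()
print(same_digest)  # False

# Treating the bytes as lines of text shows which lines are gone.
old_lines = old.split(b"\n")
new_lines = new.split(b"\n")
removed = []
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old_lines, new_lines).get_opcodes():
    if tag in ("delete", "replace"):
        removed.extend(old_lines[i1:i2])
print(removed)
```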

-c  -C NUM  --context[=NUM] : output NUM lines of context around each difference. NUM defaults to 3. See the example below:
beyes@debian:~/command/diff$ cat filename
ignore changes whose lines
another sample
for proper operation
the context output format
how are you
are you ok
what is this
what is that
welcome to gorad.net
show the most recent line
ouput ed script
ouput a normal diff
The other file, FILENAME, differs from filename above only in that a question mark was added after "what is this". The comparison gives:
beyes@debian:~/command/diff$ diff -C 2 filename FILENAME
*** filename    2011-06-02 16:19:52.000000000 +0800
--- FILENAME    2011-06-02 16:20:04.000000000 +0800
***************
*** 5,9 ****
  how are you
  are you ok
! what is this
  what is that
  welcome to gorad.net
--- 5,9 ----
  how are you
  are you ok
! what is this?
  what is that
  welcome to gorad.net
In the output, the line that differs is marked with "!". Because -C sets NUM=2, the two lines above it and the two lines below it are printed as well.
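The same kind of output can be reproduced with Python's standard `difflib` module, which serves here only as a stand-in for diff; its parameter `n` plays the role of NUM. The file contents are copied from the example above.

```python
import difflib

old = """ignore changes whose lines
another sample
for proper operation
the context output format
how are you
are you ok
what is this
what is that
welcome to gorad.net
show the most recent line
ouput ed script
ouput a normal diff
""".splitlines(keepends=True)

new = list(old)
new[6] = "what is this?\n"  # the only change: a question mark on line 7

# n=2 asks for two lines of context, like diff -C 2.
out = list(difflib.context_diff(old, new, fromfile="filename", tofile="FILENAME", n=2))
print("".join(out), end="")
```

Since no file dates are passed in, the two header lines carry only the names, with no timestamps.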

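--label LABEL : use LABEL in place of the file name and timestamp in the output header. The "output header" here means the lines shown under the -C and -c options above: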
*** filename    2011-06-02 16:19:52.000000000 +0800
--- FILENAME    2011-06-02 16:20:04.000000000 +0800
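If this header feels too long, we can give a label to replace it and keep the output concise. The option only makes sense with an output format that prints a header, such as -C or -c (or the unified format, -u and -U). For example: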
beyes@debian:~/command/diff$ diff -C 2  --label=filename --label=FILENAME filename FILENAME
*** filename
--- FILENAME
***************
*** 5,9 ****
  how are you
  are you ok
! what is this
  what is that
  welcome to gorad.net
--- 5,9 ----
  how are you
  are you ok
! what is this?
  what is that
  welcome to gorad.net

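-p  --show-c-function : show which C function each change is in. This option is quite practical for reviewing C code. When no output format is given, -p selects the context format. For example, suppose hello.c contains: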
#include <stdio.h>

void diff_test(void)
{
    printf ("hello diff command");
}

int main()
{
    diff_test();

    return 0;
}

Later we notice that the printf() in diff_test() is missing the newline '\n', and fix it in the file hello2.c. Use diff to see how the function changed:
beyes@debian:~/command/diff$ diff -p hello.c hello2.c
*** hello.c    2011-06-02 16:50:09.000000000 +0800
--- hello2.c    2011-06-02 16:50:25.000000000 +0800
***************
*** 2,8 ****
  
  void diff_test(void)
  {
!     printf ("hello diff command");
  }
  
  int main()
--- 2,8 ----
  
  void diff_test(void)
  {
!     printf ("hello diff command\n");
  }
  
  int main()
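The difference inside the function is also marked with "!". No function name appears after the row of asterisks here because the hunk starts at line 2, and no line before it looks like the start of a function.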

-q  --brief : report only whether the files differ, without showing the differences:
beyes@debian:~/command/diff$ diff -q hello.c hello2.c
Files hello.c and hello2.c differ
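A rough Python counterpart uses `filecmp.cmp` from the standard library. `brief` is a made-up helper; unlike diff -q, which prints the paths as given, it prints only the base names, and the file contents are cut down to the one line that differs.

```python
import filecmp
import os
import tempfile


def brief(path_a, path_b):
    """Return a one-line report in the style of diff -q, or '' if the files match."""
    # shallow=False forces a byte-by-byte comparison of the contents.
    if filecmp.cmp(path_a, path_b, shallow=False):
        return ""
    return "Files {} and {} differ".format(
        os.path.basename(path_a), os.path.basename(path_b)
    )


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "hello.c")
    second = os.path.join(tmp, "hello2.c")
    with open(first, "w") as handle:
        handle.write('printf ("hello diff command");\n')
    with open(second, "w") as handle:
        handle.write('printf ("hello diff command\\n");\n')
    report = brief(first, second)
    same_report = brief(first, first)

print(report)
```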

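-F RE  --show-function-line=RE : RE here is a regular expression. This option does not select the context format or the unified format by itself; if you want that output, you must give the corresponding option as well. For each hunk of differences, it shows the nearest preceding unchanged line that matches the given regular expression.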
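The lookup behind -p and -F can be sketched in Python as a backward search from the start of a hunk. `function_line` is a made-up name. The default pattern is an ASCII approximation of '^[[:alpha:]$_]', the expression the GNU manual gives for -p, and the sketch assumes the search starts just before the first line of the hunk.

```python
import re

C_FUNCTION_RE = r"^[A-Za-z$_]"  # approximation of '^[[:alpha:]$_]'


def function_line(lines, hunk_start, regex=C_FUNCTION_RE):
    """Return the nearest line before hunk_start (1-based) that matches regex."""
    pattern = re.compile(regex)
    for index in range(hunk_start - 2, -1, -1):  # walk backwards from the previous line
        if pattern.search(lines[index]):
            return lines[index]
    return None


hello_c = [
    "#include <stdio.h>",
    "",
    "void diff_test(void)",
    "{",
    '    printf ("hello diff command");',
    "}",
    "",
    "int main()",
    "{",
    "    diff_test();",
    "",
    "    return 0;",
    "}",
]

print(function_line(hello_c, 2))  # None: only "#include <stdio.h>" comes before line 2
print(function_line(hello_c, 5))  # void diff_test(void)
print(function_line(hello_c, 9))  # int main()
```

The hunk start lines 5 and 9 are hypothetical; they only show what a hunk further down in the file would pick up.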

--normal : output in the default (normal) format.

Reply 1, by the original poster, posted on 2011-6-3 10:26:13

Context format

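diff can print its comparison results in "context format". This is also the standard format for distributing updates to source code.

Context format prints a few lines of surrounding text near each difference it finds. To select this format, use -C lines, --context[=lines], or -c.

The following compares two files with different contents:

Contents of the file lao: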
The Way that can be told of is not the eternal Way;
The name that can be named is not the eternal name.
The Nameless is the origin of Heaven and Earth;
The Named is the mother of all things.
Therefore let there always be non-being,
so we may see their subtlety,
And let there always be being,
so we may see their outcome.
The two are the same,
But after they are produced,
they have different names.

Contents of the file tzu:
The Nameless is the origin of Heaven and Earth;
The named is the mother of all things.

Therefore let there always be non-being,
so we may see their subtlety,
And let there always be being,
so we may see their outcome.
The two are the same,
But after they are produced,
they have different names.
They both may be called deep and profound.
Deeper and more profound,
The door of all subtleties!

When comparing the two files, use -c to select context format:
$ diff -c lao tzu
*** lao    2011-06-03 09:20:59.000000000 +0800
--- tzu    2011-06-03 09:22:00.000000000 +0800
***************
*** 1,7 ****
- The Way that can be told of is not the eternal Way;
- The name that can be named is not the eternal name.
  The Nameless is the origin of Heaven and Earth;
! The Named is the mother of all things.
  Therefore let there always be non-being,
  so we may see their subtlety,
  And let there always be being,
--- 1,6 ----
  The Nameless is the origin of Heaven and Earth;
! The named is the mother of all things.
!
  Therefore let there always be non-being,
  so we may see their subtlety,
  And let there always be being,
***************
*** 9,11 ****
--- 8,13 ----
  The two are the same,
  But after they are produced,
  they have different names.
+ They both may be called deep and profound.
+ Deeper and more profound,
+ The door of all subtleties!
Explanation of the output:
*** lao    2011-06-03 09:20:59.000000000 +0800
--- tzu    2011-06-03 09:22:00.000000000 +0800
These two lines are the header. They contain the file name, the timestamp, and the time zone. They can be changed with the --label=LABEL option.

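Lines such as *** 1,7 ****, --- 1,6 ----, *** 9,11 ****, and --- 8,13 ---- give the line range that the hunk covers in each file (first line to last line).

In the comparison output:
"!" marks a line whose content changed between the two files.
"+" marks a line that is in the second file and has no counterpart in the first.
"-" marks a line that is in the first file and has no counterpart in the second.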

Reply 2, by the original poster, posted on 2011-6-3 11:51:24

Unified format

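Unified format is a variant of the context format above that leaves out redundant output. To select it, use -U lines, --unified[=lines], or -u, where lines is the number of unchanged lines printed around each difference. If it is not given, the default is 3. For example:
lao and tzu have the same contents as in the context format section above.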
$ diff -U 1 lao tzu
--- lao    2011-06-03 09:20:59.000000000 +0800
+++ tzu    2011-06-03 09:22:00.000000000 +0800
@@ -1,5 +1,4 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
@@ -11 +10,4 @@
 they have different names.
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!

$ diff -U 2 lao tzu
--- lao    2011-06-03 09:20:59.000000000 +0800
+++ tzu    2011-06-03 09:22:00.000000000 +0800
@@ -1,6 +1,5 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
 so we may see their subtlety,
@@ -10,2 +9,5 @@
 But after they are produced,
 they have different names.
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!

$ diff -u lao tzu
--- lao    2011-06-03 09:20:59.000000000 +0800
+++ tzu    2011-06-03 09:22:00.000000000 +0800
@@ -1,7 +1,6 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
 so we may see their subtlety,
 And let there always be being,
@@ -9,3 +8,6 @@
 The two are the same,
 But after they are produced,
 they have different names.
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!

$ diff -U 9 lao tzu
--- lao    2011-06-03 09:20:59.000000000 +0800
+++ tzu    2011-06-03 09:22:00.000000000 +0800
@@ -1,11 +1,13 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
 so we may see their subtlety,
 And let there always be being,
 so we may see their outcome.
 The two are the same,
 But after they are produced,
 they have different names.
         # there are only 7 unchanged lines here, so even with 9 requested only 7 are printed
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!
In the output above:
--- lao    2011-06-03 09:20:59.000000000 +0800
+++ tzu    2011-06-03 09:22:00.000000000 +0800
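This is the output header: file name, timestamp, and time zone, in that order. The header can be changed with the --label=LABEL option.

@@ -10,2 +9,5 @@ gives the range covered by the hunk. Note the "+" and "-" signs in front of the numbers. They correspond to the "+" and "-" signs in the header: "-" stands for the file lao and "+" stands for the file tzu. 10,2 and 9,5 have the form start,count: the line where the hunk starts (start) and the number of lines it covers (count). When count is 1 it is left out, as in @@ -11 +10,4 @@.

In the comparison output, lines with neither "-" nor "+" in front are lines that both files have in common. Lines with "-" in front are lines that lao has and tzu does not, or that read differently in tzu. "+" means exactly the reverse: lines that tzu has and lao does not, or that read differently in lao.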
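The start,count form can be checked against a hunk body with a few lines of Python. This is only a sketch: `parse_hunk_header` and `counts_from_body` are made-up names, and the body lines assume the usual unified layout in which unchanged lines begin with a space.

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count); a missing count means 1."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError("not a unified hunk header: " + line)
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        1 if old_count is None else int(old_count),
        int(new_start),
        1 if new_count is None else int(new_count),
    )


def counts_from_body(body):
    """Count the lines a hunk body covers in the old file and in the new file."""
    old = sum(1 for line in body if line[:1] in (" ", "-"))
    new = sum(1 for line in body if line[:1] in (" ", "+"))
    return old, new


# Second hunk of "diff -U 2 lao tzu".
body = [
    " But after they are produced,",
    " they have different names.",
    "+They both may be called deep and profound.",
    "+Deeper and more profound,",
    "+The door of all subtleties!",
]

print(parse_hunk_header("@@ -10,2 +9,5 @@"))  # (10, 2, 9, 5)
print(parse_hunk_header("@@ -11 +10,4 @@"))   # (11, 1, 10, 4)
print(counts_from_body(body))                 # (2, 5)
```

The two counts in the header match the body: 2 lines belong to lao (the unchanged ones) and 5 lines belong to tzu (the unchanged ones plus the three added).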