12 Mathematical correctness of dircomp
xfarrow edited this page 2023-05-20 10:34:11 +00:00

Mathematical correctness of dircomp.

Warning This proof does not invalidate the "AS IS" nature of this program, without any warranty whatsoever.

Definitions

Two directories are considered to be equivalent if and only if these conditions apply:

  1. They have the same number of files and folders;
  2. For each file in a directory, there is the same file in the other. Two files are considered to be the same if
    1. they have the same name, including extension;
    2. they have the same content;
  1. For each folder in a directory, there is a folder with the same name in the other;
  2. [Will apply only if the -r option is specified]. For each folder in a directory, there is an equivalent folder in the other, with the same name.

Proof

The main function responsable for telling if two directories are the same is analyze_directories(). We will prove its correctness in order to prove the correctness of the whole program.

Point 1

It is straightforward to note that in every loop over both the directories (while ((element = readdir(directory1)) != NULL) and while ((element = readdir(directory2)) != NULL)) it is checked if an element (which can be a file or a folder) with the same name exists in the other directory. In particular, through the function access() for files and stat() for folders. Hence, knowing that cannot coexist two elements with the same name in the same directory, if one directory has one or more files than the other, it will be catched and the variable is_directory_equal will be set to false.

Point 2

When the first directory is scanned through (in the while ((element = readdir(directory1)) != NULL)), the following actions will be performed for each file:

  • It is checked if a file with the same name exists in the second directory;
  • If it exists there, their content is compared through the functions byte_by_byte_file_comparison() or hash_by_hash_file_comparison().

If one of these conditions fails, then is_directory_equal will be set to false.

When the second directory is scanned through (in the while ((element = readdir(directory2)) != NULL)) only the following action will be performed for each file:

  • It is checked if a file with the same name exists in the first directory. If it does not, is_directory_equal will be set to false.

Because if it does, it's guaranteed that its content is the same, since it's already been analyzed when the first directory was scanned through.

Point 3

As explained in "Point 1", it is checked if each element (including folders) with the same name exists in the other directory. If it does not, is_directory_equal will be set to false.

Point 4

While looping over the elements of the first directory, if a folder is found and if one with the same name exists in the second directory, analyze_directories() will be called again on said folders, generating a recursion. It is guaranteed that the recursion will terminate because it will eventually be reached a leaf of the File System (no subfolders). The result of this recursive call will be put in and with the current value of is_directory_equal and its result assigned to is_directory_equal. This is done because if two subdirectories are not equivalent, then the root directories will not be equal (hence, false and anything = false), but if said subdirectories are equivalent, then the overall result depends on the current result of is_directory_equal (and potentially of the parent recursive calls) (hence, true and anything = [depends on the 'anything']).

Conclusion

We've seen that if one of these four cases fails, the variable is_directory_equal gets set to false. Since nowhere else is_directory_equal gets set to true except in its definition, if it gets set to false, it is guaranteed that the directories will be considered to be not equivalent since this is the value which will be returned from the function.

Instead, if the flow of execution never hits these points (hence, all the cases are true), is_directory_equal will keep being true, which will tell the calling function that the directories are equivalent.