* Update domains-blacklist.conf
0. Add more comments, so it should be much easier for anyone to understand how to choose among the rules, which are delivered at various levels.
1. Sort the rules from Energized by size, which makes the ordering more sensible.
* Add rule from AdAway
AdAway seems to be a project that has been around for more than 9 years. I tried it for several days and haven't experienced any false positives yet.
* Update Energized Protection URLs
The EnergizedProtection URLs have changed. It seems they had to delete the lists from GitHub and move them to their self-hosted domain (block.energized.pro).
* Re-enable EnergizedProtection BLU
I commented it out by mistake, oops :)
This is very clumsy, as it doesn't handle time-based rules properly,
and doesn't handle whitelists at all.
Adding globs to the "names" list is also an ugly hack just to have
them included in the final output.
* Improve script to remove redundant lines
Let the script remove lines that are already covered by regular expressions.
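A minimal sketch of the redundancy check described above, assuming the names are held in a set and the covering rules are plain regular-expression strings. The function name remove_covered and its inputs are hypothetical, not the script's actual code.

```python
import re


def remove_covered(names, patterns):
    """Split names into (kept, removed) depending on regex coverage.

    Hypothetical helper: `names` is a set of domain names and `patterns`
    is a list of regular-expression strings taken from the local rules.
    """
    compiled = [re.compile(p) for p in patterns]
    kept, removed = set(), set()
    for name in names:
        # A name is redundant if any rule already matches it in full.
        if any(rx.fullmatch(name) for rx in compiled):
            removed.add(name)
        else:
            kept.add(name)
    return kept, removed


if __name__ == "__main__":
    names = {"ads.example.com", "tracker.example.net", "example.org"}
    patterns = [r".*\.example\.com", r"tracker\..*"]
    kept, removed = remove_covered(names, patterns)
    print(sorted(kept))     # ['example.org']
    print(sorted(removed))  # ['ads.example.com', 'tracker.example.net']
```

Returning the removed names separately makes it easy to report what was dropped instead of discarding it silently.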
* add optional "-o OUTPUT_FILE" argument
This ensures that the output file is written as UTF-8.
The previous redirect-to-file behavior is preserved, because "default=None" is used for the -o argument, so output still goes to stdout when -o is omitted.
I also fixed the formatting slightly to avoid newlines at the beginning of the file.
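A sketch of how an optional -o argument with default=None can keep stdout redirection working while forcing UTF-8 for files. The helper write_output, the sample entries, and the exact help text are illustrative assumptions; only the -o flag, the OUTPUT_FILE name, and default=None come from the change description.

```python
import argparse
import sys


def write_output(lines, output_file=None):
    """Write one entry per line, without a leading blank line.

    Hypothetical helper: when output_file is None, fall back to stdout so
    that `script > file` keeps working as before.
    """
    text = "".join(line + "\n" for line in lines)
    if output_file is None:
        sys.stdout.write(text)
    else:
        # Explicit encoding, so the result does not depend on the locale.
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(text)


def main(argv=None):
    parser = argparse.ArgumentParser(description="sketch of an -o option")
    parser.add_argument("-o", dest="output_file", metavar="OUTPUT_FILE",
                        default=None,
                        help="write the list to this file (UTF-8) instead of stdout")
    args = parser.parse_args(argv)
    write_output(["example.com", "ads.example.net"], args.output_file)


if __name__ == "__main__":
    main()
```

Running it as `python sketch.py -o blocked.txt` writes the file, while `python sketch.py > blocked.txt` still works through stdout.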
* improve glob matching
- rename regexes to globs
- only check trusted (local) files for globs
- use fnmatch instead of manually converting globs into regular expressions and matching them
- modify is_glob function to check only for the following characters: * [ ] ?
- improve get_lines_with_globs function, by using the built-in filter function with a lambda
- improve covered_by_glob function, by checking if line is part of glob_list, instead of calling is_glob again
- print "ignored entries due to globs in local-additions" to the output as well, to better distinguish these entries from other duplicates
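The glob changes listed above can be sketched with the standard fnmatch module. The function names is_glob, get_lines_with_globs, and covered_by_glob come from the list; their signatures and bodies here are assumptions, not the script's actual code.

```python
from fnmatch import fnmatch

# Only these characters mark a line as a glob.
GLOB_CHARS = set("*[]?")


def is_glob(line):
    """Return True if the line contains any of * [ ] ?"""
    return any(c in GLOB_CHARS for c in line)


def get_lines_with_globs(lines):
    """Collect the glob lines using the built-in filter and a lambda."""
    return set(filter(lambda line: is_glob(line), lines))


def covered_by_glob(line, glob_list):
    """Return True if some glob (other than the line itself) matches it."""
    # Membership check instead of calling is_glob again: a glob that is
    # itself in the list must stay in the output.
    if line in glob_list:
        return False
    return any(fnmatch(line, glob) for glob in glob_list)


if __name__ == "__main__":
    local = ["*.doubleclick.net", "ads[0-9].example.com", "example.org"]
    globs = get_lines_with_globs(local)
    remote = ["stats.doubleclick.net", "ads7.example.com",
              "ads.example.com", "*.doubleclick.net"]
    kept = [n for n in remote if not covered_by_glob(n, globs)]
    print(kept)  # ['ads.example.com', '*.doubleclick.net']
```

Note that fnmatch applies os.path.normcase, so matching is case-insensitive on Windows but case-sensitive elsewhere. fnmatch.fnmatchcase is the platform-independent alternative if that matters.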
Not all URLs are extracted from the complicated CSV file.
However, they do offer a txt file for the same list, which does work correctly with the current regex:
https://www.malwaredomainlist.com/forums/index.php?topic=3270.0
This URL replacement pull request is easier than rewriting the entire regex (which would then break other lists).
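To illustrate why a one-entry-per-line text file is easier to parse than a CSV export, here is a sketch of line-based extraction. LINE_RX is a hypothetical stand-in accepting an optional hosts-style IP prefix, a domain name, and an optional trailing comment; it is not the script's real regex.

```python
import re

# Hypothetical pattern (not the script's actual regex): optional IPv4
# prefix as in hosts files, then a dotted name, then an optional comment.
LINE_RX = re.compile(
    r"^\s*(?:\d{1,3}(?:\.\d{1,3}){3}\s+)?([a-z0-9_.-]+\.[a-z0-9-]+)\s*(?:#.*)?$",
    re.IGNORECASE,
)


def parse_list(text):
    """Return the lower-cased names found in a plain-text list."""
    names = []
    for line in text.splitlines():
        m = LINE_RX.match(line)
        if m:
            names.append(m.group(1).lower())
    return names


if __name__ == "__main__":
    sample = "# comment\n0.0.0.0 ads.example.com\nExample.ORG  # note\n\n"
    print(parse_list(sample))  # ['ads.example.com', 'example.org']
```

A quoted CSV row such as `"2017/12/04_18:50","example.com/path",...` does not match a line-oriented pattern like this one, which is the kind of mismatch that switching to the txt URL avoids.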