* remove isc.sans.edu lists
It says "Service Suspended" when opening these links
* change Peter Lowe's list to domains only
There is no need to manually strip away all the 127.0.0.1 at the beginning of each line if there is already a method for domains only provided on the website. Could also be modified to ignore old entries with `&startdate%5Byear%5D=2015`.
Adding `&mimetype=plaintext` doesn't seem to change anything for `hostformat=nohtml`, but could be added as well.
* Remove lists intended for adblockers
The Adblock Warning Removal List currently has 559 lines, only two of which are actually useable for dnscypt-proxy (adscat.ru, shellcat.ru).
Fanboy Social currently has 20162 lines and only 118 lines can be used, which is about 0.6%.
CJX Annoyance List: 512 lines, 19 lines usable, but it's just a lite version of the already included Easylists.
Prebake: 1160 lines, 4 lines usable (also not updated since May 2018)
Most of the remaining domains should be covered by a larger domains blocklist, such as Energized BLU, therefore I think it's safe to remove them.
* remove lists included in Energized Blu
Since Energized Blu is enabled by default, there is no need to also enable lists by default that are already contained in it.
Energized Blu contains the following sources:
1hosts, add.2o7Net, add.Dead, Risk & Spam, Adguard Filters, Ador, Anti-PopAds, Coin Blocker, Disconnectme Ads, Malware & Malvertising, EasyPrivacy Specific, hBlock, Lightswitch Ads & Tracking, Spam404, KADhosts, MoaAB, MobileAdTrackers, No Tracking, NSABlocklist, someonewhocares, StevenBlack, Wally3K_Blacklist & Zeus Tracker
* Update domains-blacklist.conf
0. Add more comments so it should be much easier for anyone to get understanding how to choose the rules which is delivered in varies levels.
1. Sort rules from Energized so it is ordered in the sort of size, which would make sense.
* Add rule from AdAway
AdAway seems to be a project last more than 9 years. I tried it for several days and haven't experienced any false positive yet.
* Update Energized Protection URLs
EnergizedProtection url links have changed, it seems they had to delete them from github and moved them to their self hosted domain (block.energized.pro).
* Re enabling EnergizedProtection BLU
I commented it out by mistake oops :)
This is very clumsy, as it doesn't handle time-based rules properly,
and doesn't handle whitelists at all.
Adding globs to the "names" list is also an ugly hack just to have
them included in the final output.
* Improve script to remove redundant lines
Let the script remove those lines that are covered by regular expressions already
* add optional "-o OUTPUT_FILE" argument
This ensures that UTF-8 is used.
The redirect to file functionality from before is maintained, because "default=None" is used for the -o argument
I also fixed the formatting slightly to avoid newlines at the beginning of the file.
* improve glob matching
- rename regexes into globs
- only check trusted (local) files for globs
- use fnmatch instead of manually converting globs into regular expressions and matching them
- modify is_glob function to check only for the following characters: * [ ] ?
- improve get_lines_with_globs function, by using the native filter and lambda functions
- improve covered_by_glob function, by checking if line is part of glob_list, instead of calling is_glob again
- print "ignored entries due to globs in local-additions" to the output as well to better differentiate from other duplicates
Not all URLs are extracted from the complicated csv file.
However, they do offer a txt file for the same list, which does work correctly with the current regex:
https://www.malwaredomainlist.com/forums/index.php?topic=3270.0
This url replacement pull request is easier than rewriting the entire regex (which then breaks other lists).