Hosts Tool: Making Use of Bad Hosts Lists

A number of on-line providers publish hosts files containing the domain names of "bad" servers. Here I introduce a tool that makes efficient use of them. The angry hosts file tool (originally by Steven Black, currently being merged back) is free and open source; you can get it on my GitHub here.

The angry hosts file tool generates a hosts file for your machine that blocks the domains published by the well-known bad-hosts listing web sites. In this article we first discuss the idea behind these lists, briefly explaining domain name resolution and hosts files. We then introduce the tool and explain what exactly it does. Finally, we discuss the security implications and several related questions.



Update: See the newer article about updating the angry hosts file using dnsmasq here.


We continue below describing the hosts lists and their purpose.

What Are the Bad-Hosts Lists

Here are a few examples of hosts lists published on-line:

  • hosts-file.net - one of the most comprehensive lists, categorized
  • MVPS - Windows-targeting, but generally useful list
  • AdzHosts - a list blocking mostly advertising networks


There are some other sources online.

The purpose of such lists is to block, by domain name, the numerous servers that carry content unwanted by the end user: tracking services, advertisements, malware, viruses, phishing, spam, etc. See for example this classification. We will call these hosts "bad" simply to have a name for them. Not everybody considers all of them "bad", e.g. some ads might be desired by some users.

A user could add information from such lists to the "hosts" file of their own system and thus try to filter out bad sites by domain name.

Below we discuss the DNS basics necessary to understand what stands behind this blocking technique.

DNS Basics

If you already know how DNS and hosts files work, feel free to skip to the next section.

Domain names are the web-site names we are used to, e.g. "google.com". Unlike people, who address websites by name, computers address each other on the Internet using IP addresses. For example, instead of google.com a computer has to use an address such as 173.194.113.110. The Domain Name System (DNS) is a distributed service that converts domain names into IP addresses. Computers need it to know which IP address to contact when a certain domain name is given. This process is called domain name resolution.

DNS is distributed. When a DNS query appears, an OS service on the user's machine reacts first. It might already know some domain names, like "localhost", some locally configured addresses, and usually the addresses queried before. It also knows the address of a DNS server, which is responsible for resolving a domain name that is not known locally. Usually the address of the DNS server is provided by the network configuration (often via the DHCP protocol), but it can also be set up manually (we discuss why this can be useful further down in the article).

Every machine has a simple local DNS database, the so-called "hosts" file. On Unix-like systems it is /etc/hosts. A quick web search will tell you where your system keeps it. The hosts file has a simple format:

# Comments describe what is happening there.
# Usually an IP address on the left is associated
# with a domain name on the right.
127.0.0.1 localhost

The hosts file normally takes priority in the domain name resolution process. That is, if you specify the address of some host there, its address will never actually be resolved through the upstream DNS server, but will be taken from the hosts file.
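
The lookup order described above can be sketched in a few lines of Python. This is only a toy model, not how any operating system implements its resolver: resolve, hosts_table and fake_upstream are hypothetical names, and both addresses are made up for illustration.

```python
def resolve(name, hosts_table, ask_upstream):
    """Toy resolver: the local hosts table wins over the upstream DNS server."""
    name = name.lower().rstrip(".")
    if name in hosts_table:
        # Found locally: the upstream server is never contacted.
        return hosts_table[name]
    return ask_upstream(name)


# Hypothetical local table, as if it had been read from a hosts file.
hosts_table = {"localhost": "127.0.0.1", "myserver.example": "10.0.0.5"}


def fake_upstream(name):
    # Stand-in for a real DNS query; 192.0.2.10 is a documentation-only address.
    return "192.0.2.10"


print(resolve("myserver.example", hosts_table, fake_upstream))
print(resolve("other.example", hosts_table, fake_upstream))
```

The first call is answered from the local table, the second one falls through to the stand-in upstream function.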

Putting bad hosts in your hosts file and making them resolve to localhost will effectively block the bad domain names.

127.0.0.1 example.badhost.com

As the majority of user machines do not run any servers, such a request (originally meant for a bad host) will simply fail.

On-line hosts lists are formatted exactly like this, making thousands of domain names resolve to localhost. Below we discuss how to make use of the multitude of such lists in a practical way.
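
To make the format concrete, here is a minimal sketch of how one line of such a file could be read in Python. The function name parse_hosts_line is my own choice for illustration and is not taken from the tool.

```python
def parse_hosts_line(line):
    """Return (ip, [host names]) for one hosts-file line, or None if it has no entry."""
    # Everything after '#' is a comment.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None  # blank line or pure comment
    fields = content.split()
    if len(fields) < 2:
        return None  # an address without a name is not a usable entry
    return fields[0], fields[1:]


print(parse_hosts_line("127.0.0.1 example.badhost.com"))
print(parse_hosts_line("# Comments describe what is happening there."))
```

A line may carry several names after the address, so the names come back as a list.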

The Angry Hosts File

Here I argue for automation: a tool that builds a good hosts file for you. You can find mine here: https://github.com/qutorial/angryhostsfile. It is an improvement on Steven Black's tool.

The major differences are:

  • It uses more hosts lists and so captures more "bad" hosts: 400K vs. 30K.
  • It supports a custom preamble for when you already have a custom hosts file. Just save it into the myhosts file and you are good to go.

To try it out, clone the repository:

git clone --depth 1 https://github.com/qutorial/angryhostsfile.git hosts
cd hosts

Then back up your hosts file, if it contains anything important:

cp /etc/hosts ./myhosts

And run the program:

./updateHostsFile.py

It's an interactive program and it tells you everything it does. By default it does not modify any of your system settings, so feel safe to run it. A number of default choices are preselected, so you can just press Enter and get a new hosts file compiled.

Do you want to update all data sources? [Y/n]
Updating source someonewhocares.org from http://someonewhocares.org/hosts/zero/hosts
Updating source hosts-file.net from http://hosts-file.net/download/hosts.txt
Updating source adzhosts from http://skylink.dl.sourceforge.net/project/adzhosts/HOSTS.txt
. . .
Do you want to exclude any domains?
For example, hulu.com video streaming must be able to access its tracking and ad servers in order to play video. [y/N]
OK, we won't exclude any domains.
Success! Your shiny new hosts file has been prepared.
It contains 391,669 unique entries.
!! Make sure to backup your hosts file before replacing it !!
Do you want to replace your existing hosts file with the newly generated file? [y/N] 

Now you should have a hosts file in the working directory containing a compiled collection of bad hosts.

If you want your own custom hosts configuration to appear at the top, just save it into the myhosts file before running the script. It will be prepended automatically.

If you answer "yes" to the last question, you will be asked for your password, and the hosts file will be replaced with the newly generated one. If your old hosts file is not empty and contains important settings, make sure they are taken into account first!


What exactly does this tool do?

There are a number of needs you might want to satisfy when making use of the hosts lists provided on-line. The angry hosts file tool takes care of them for you. Below we describe the steps a tool could take to make use of the hosts files available on-line.

First, you want to gather the lists from a number of sources you like. Then you compile them into one list without duplicates.
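
The merging step can be sketched roughly as follows. This is a simplified illustration, not the tool's actual code: merge_hosts_lists is a hypothetical name, the two sample lists are made up, and dropping "localhost" from the result is my own assumption.

```python
def merge_hosts_lists(texts):
    """Collect the unique domain names from several hosts-format texts."""
    domains = set()
    for text in texts:
        for line in text.splitlines():
            fields = line.split("#", 1)[0].split()
            # fields[0] is the IP address, the rest are host names.
            for name in fields[1:]:
                domains.add(name.lower())
    # Assumption: "localhost" belongs to the user's own rules, not to the block list.
    domains.discard("localhost")
    return sorted(domains)


# Two made-up lists standing in for downloaded sources.
list_a = "127.0.0.1 localhost\n127.0.0.1 ads.example\n127.0.0.1 tracker.example\n"
list_b = "# another provider\n0.0.0.0 ADS.example\n0.0.0.0 malware.example\n"

print(merge_hosts_lists([list_a, list_b]))
```

A set removes the duplicates, and lowercasing the names first makes sure that "ADS.example" and "ads.example" count as the same host.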

Next, you might have your own hosts rules: local machines, some aliases for your servers, your own bad hosts. They have to be added to the final hosts file, preferably at the beginning, so that they stay visible and easy to edit.

Afterwards, updates might be needed, as the hosts lists are updated on-line. The procedure of downloading and merging then has to be repeated.

Surely you want this final list to be "secure": it should only block the bad hosts and never override "good" DNS records, e.g. by substituting the IP address of Facebook or, worse, the IP address of your bank with a malicious IP. You want to make sure that the lists only mention the localhost IP address as the destination (this can also lead to a problem, though a less likely one; see the Discussion part below).
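
As an illustration of the prepending idea, a sketch could look like this. The function name build_hosts_text and the "# Blocked hosts" separator comment are assumptions of mine and not the tool's actual output format.

```python
def build_hosts_text(custom_rules, blocked_domains, target="127.0.0.1"):
    """Put the user's own rules first, then one blocking line per domain."""
    lines = [custom_rules.rstrip("\n"), "", "# Blocked hosts"]
    lines += [f"{target} {domain}" for domain in blocked_domains]
    return "\n".join(lines) + "\n"


# Hypothetical content of a myhosts file and a tiny block list.
my_rules = "127.0.0.1 localhost\n10.0.0.5 myserver.example\n"
print(build_hosts_text(my_rules, ["ads.example", "tracker.example"]))
```

Keeping the custom rules in a separate input means they survive every regeneration of the block list.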
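
One way to enforce this is to never copy an address from a downloaded list at all, and to keep only those entries which already point to the local machine. The sketch below shows the idea with Python's standard ipaddress module. It is an illustration under my own assumptions (the name sanitize_entries, treating both loopback and 0.0.0.0 destinations as acceptable), not a description of the tool's code.

```python
import ipaddress


def sanitize_entries(lines, target="127.0.0.1"):
    """Keep only blocking entries and rewrite their destination to `target`."""
    safe = []
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment, blank line or incomplete entry
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # the first field is not an IP address at all
        if not (address.is_loopback or address.is_unspecified):
            continue  # this entry would redirect to a real machine: drop it
        for name in fields[1:]:
            safe.append(f"{target} {name}")
    return safe


downloaded = [
    "0.0.0.0 ads.example",
    "203.0.113.66 my-bank.example",  # suspicious: points to a real address
    "127.0.0.1 tracker.example # a comment",
]
print(sanitize_entries(downloaded))
```

The entry for my-bank.example is dropped because its destination is neither a loopback address nor 0.0.0.0.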


This is actually all the tool does for you, so you just have to wait while the lengthy lists are downloaded. Currently they contain about 400K entries.

Discussing the Solution

Here we very briefly discuss a number of related topics. Some of them are of great importance to this article, as they show the limitations of this blocking approach.

Hosts and security

The DNS service in general and the hosts file in particular are subject to attacks: link, link, another link. The general idea is to redirect your computer, without you knowing it, to another server instead of the wanted one: bad-attacker.com instead of my-bank.com. Another is to block access to wanted sites, e.g. when a website like get-security-updates.com gets blocked on your machine.

Another issue with DNS is tracking based on it. The idea is that your DNS provider (maybe your ISP?) might track your DNS requests and thus know well which websites you are visiting.

Communicating with your DNS server securely is important as well. The DNSCrypt protocol, supported by some DNS servers, is a significant improvement over unprotected DNS exchange.

Data volumes reduction

By blocking unwanted servers, in particular web-ad services, you can significantly reduce data volume. I compared the volumes for a famous weather forecast website: the data reduction was 25% with the hosts file generated by the angry hosts file tool.

You can try it on your own. Switch off the blockers in your web browser and save a given web page completely, with all its resources. Then replace the hosts file with the one containing blocks for the bad hosts. Save the page again and compare the sizes.
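
If you save both versions of the page into two folders, the comparison can be done with a small script like the one below. It is only a sketch: the function names and the folder names in the commented-out usage lines are made up, and the numbers in the last line are an arithmetic example, not a measurement.

```python
import os


def directory_size(path):
    """Sum the sizes in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.isfile(full_path):  # skips broken symbolic links
                total += os.path.getsize(full_path)
    return total


def reduction_percent(before_bytes, after_bytes):
    """How much smaller `after_bytes` is than `before_bytes`, in percent."""
    return 100.0 * (before_bytes - after_bytes) / before_bytes


# Hypothetical folder names for the two saved copies of the page:
# before = directory_size("page_without_blocking")
# after = directory_size("page_with_blocking")
# print(reduction_percent(before, after))

print(reduction_percent(1000, 750))  # 1000 bytes shrinking to 750 bytes is 25.0 percent
```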

Redirecting to 127.0.0.1

Is redirecting to the local machine a good blocking method? Well, it is not optimal. First of all, querying the local host takes a short but non-zero time. Next, you could have a server listening on your local ports, and requests arriving at it as redirects from bad hosts might not be expected. Additionally, if this server gets infected, it can become a "bad host" itself. Of course, in this case you already have a problem anyway.

Malware with IP addresses

Bad hosts which are addressed directly by IP do not need DNS resolution. Thus the blocking method described here will not work in this case. Blocking IPs in your firewall could potentially be the way to go then.
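
To see whether a given URL falls into this category, one can check whether its host part is already an IP address. Here is a minimal sketch using only the Python standard library; the function name bypasses_dns and the sample URLs are made up for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def bypasses_dns(url):
    """True if the URL addresses its server by IP, so no name lookup is needed."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # a regular domain name: the hosts file can block it
    return True


print(bypasses_dns("http://203.0.113.7/payload.js"))
print(bypasses_dns("http://example.badhost.com/payload.js"))
```

The first URL never touches the hosts file, while the second one can be blocked there.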

Subdomains problem

Update: This problem is solved by the new version of the angry hosts file tool with dnsmasq, read about it here.

If your hosts file blocks hacker.com, malware.hacker.com will still go through the DNS resolution process and will not be blocked. This is a major weakness of this approach.
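
The difference between the exact matching of a hosts file and a rule that also covers subdomains can be sketched like this. Both functions are hypothetical illustrations; the second one only shows the general idea of suffix matching and is not the code of any particular resolver.

```python
blocked = {"hacker.com"}


def blocked_by_exact_match(name):
    """Hosts-file behaviour: only the exact name listed is affected."""
    return name.lower().rstrip(".") in blocked


def blocked_by_suffix_match(name):
    """A rule for a domain that also covers every subdomain below it."""
    labels = name.lower().rstrip(".").split(".")
    # Try the full name first, then each parent domain in turn.
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


for name in ("hacker.com", "malware.hacker.com", "nothacker.com"):
    print(name, blocked_by_exact_match(name), blocked_by_suffix_match(name))
```

With exact matching, malware.hacker.com slips through. With suffix matching it is caught, while the unrelated nothacker.com is left alone.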

0.0.0.0 vs. 127.0.0.1

While studying Steven Black's solution I had to change the 0.0.0.0 destination address to 127.0.0.1, despite his intentional migration of all the lists to 0.0.0.0.

Although in theory 0.0.0.0 should be an invalid address, it resolves to "all the addresses localhost has" in modern OSes. To check it out, try pinging it on your machine: ping 0.0.0.0

Another approach: uBlock Origin, Adblock Plus and similar

These are so-called HTML filters: plug-ins for your browser which filter out calls to bad domains. Usually they take bad-hosts lists as the basic set of hosts to exclude. These tools work at a higher level, removing parts of the HTML itself. This can be a more fine-grained solution for the web browser, as you can configure a lot of details there, blocking domains, subdomains, and different elements like CSS and JS separately.

Usually plug-ins are installed per user, so by default only one user is protected when you install these filters. Other users of the system and other programs (an e-mail client, for instance) are not protected by this solution. When using a hosts file, you yourself control what is blocked and what is not. When using third-party blockers, you give up this right in favor of ease of use. The consequences are not always the best, e.g. Adblock does not block all of the hosts in our program's hosts list because of the acceptable ads program.

Browser plug-ins are more flexible in the sense that you can finely configure what gets blocked. This, and the fact that they are implemented in a high-level language like JavaScript, makes them consume much more of your machine's resources than an edited hosts file or the new solution using dnsmasq does. Adblock Plus, for instance, takes more than 60MB of RAM on my machine! This is not the best option if your machine is not the newest one.

Available alternative DNS providers

It might make sense to explicitly configure the DNS server your system is using. This can be a good solution for:

  • security, when you trust a specific DNS server more
  • tracking, when the selected DNS server is known not to track
  • reliability, when the DNS server is expected to be more reliable
  • avoiding censorship, as some DNS servers might be blocking certain hosts


Here is a big list of DNS servers: https://servers.opennicproject.org/
More servers can be found with a web search, of course.

Proxies

When browsing through proxies, DNS resolution might bypass the local DNS subsystem and thus the hosts file. The blocking described here will then not work. Examples of situations in which you do not use the local DNS include:

  • using the Opera browser in Opera Turbo mode, which is the default in Opera Mini on Android
  • Tor Browser Bundle

This is how content from ad servers that would normally be blocked might come to light again.

Alternatives for Other Platforms

Although the tool described here is written in Python and is cross-platform, you might want to use other tools on platforms where Python does not work as naturally as on PCs.

Sophisticated users could of course change the hosts file on these platforms after first generating it on a PC.

Conclusion

Due to the limitations described above, we consider this blocking approach far from ideal. Still, it adds a bit to your system's security and won't hurt your experience, while potentially saving you some traffic.



Thanks for reading my blog!
Created: 10/29/2015
Last edited on: 10/29/2015