2023-02-04
In 2022, I set up a local DNS server so I could try out blocking ads at the DNS level with something like Pi-hole. I changed my DHCP server to hand out this server (using an IP address like 192.168.1.225) as the primary DNS server, and for redundancy I kept Google's Public DNS server 8.8.8.8 as the secondary DNS. That way I could restart my machine without any issues. Things were great. I was able to see devices connect to my new local DNS server and see ads blocked at the DNS level.
However, I eventually noticed my Android devices would curiously ignore my DNS server. After a little investigation, I realized that Android has something called "Private DNS".
By default, Android sets Private DNS to "Automatic", also known as opportunistic mode.
"With the opportunistic privacy profile, the DNS server IP address may be configured directly by the user or obtained from the local network (using DHCP or some other means). The client resolver attempts to establish a secure connection on port 853 to the specified DNS server. If a secure connection is established, this provides privacy for the user's queries from passive observers on the path. Since the client does not verify the authenticity of the server it is not protected from an active attacker. If the client cannot establish a secure connection on port 853, it falls back to communicating with the DNS server on the standard DNS port 53 over UDP or TCP without any security or privacy. The use of Opportunistic Privacy is intended to support incremental deployment of increased privacy with a view to widespread adoption of the strict privacy profile." (from "DNS-over-TLS - How it Works")
Since I didn't have DNS-over-TLS (DoT) set up on my local server, Android would go to the secondary DNS server, 8.8.8.8, which does support DoT. I configured my DNS server to accept DoT connections and got a certificate. Since the authenticity of the server is not supposed to be validated, I assumed a generic cert from Let's Encrypt would get the job done. Unfortunately, by default, Let's Encrypt uses a certificate chain cross-signed with an expired root certificate (for more background see these blog posts). Android rejects this immediately upon seeing it. The solution is to regenerate the cert using "ISRG Root X1" as the preferred issuer.
After confirming with kdig that my DoT server was set up with the right certs and responded correctly on port 853, I tried again with my Android devices. Still no good. Only a very small number of regular DNS requests were made to my local DNS server, likely for bootstrapping. The majority of requests when browsing were going to Google's Public DNS (8.8.8.8). I assumed I had another configuration issue of some sort. To confirm, I temporarily changed my DHCP server to hand out only my local DNS server. Surprisingly, this worked and all the requests were indeed being sent via DoT to my local DNS server. But when there were 2 servers in my DHCP configuration, why did my local server get deprioritized compared to 8.8.8.8?
My first guess was that Android special-cased Google's Public DNS, since they are both Google products. I then tried some other public DNS servers like Cloudflare's 1.1.1.1 and OpenDNS's 208.67.222.222. I saw the same behavior, where my server was always being deprioritized compared to these public DNS servers. Given the variety of servers I tried, I felt it was unlikely to be special casing by Android. I scoured the web to see if someone had dug into this and written up an explanation, but I didn't find anything.
At this point, I had no other leads, so I started looking at the Android source code to try to figure out the mystery. I found the main folder for DNS resolution related code and started digging. I found a function called res_tls_send, which does the DNS query over TLS. I then found DnsTlsDispatcher::getOrderedAndUsableServerList, whose comments state the preferred ordering:
// Our preferred DnsTlsServer order is:
// 1) reuse existing IPv6 connections
// 2) reuse existing IPv4 connections
// 3) establish new IPv6 connections
// 4) establish new IPv4 connections
However, I didn't think any of these sorting criteria applied, since all the addresses were IPv4 and, right after rebooting my Android device, there would be no existing connection to either a well known public DNS or my own local DNS server. In that case, the ordering would just be the order of the list of tlsServers that was passed in. That ordering comes from a function called validatedServers. The ordering thus comes from a std::map called dotServersMap, which sorts based on the 32-bit integer representing an IP address.
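To illustrate why a std::map determines the order, here is a minimal sketch. It assumes the key is a plain 32-bit integer, which is a simplification of whatever key type Android actually uses, and `iteration_order` is a hypothetical helper, not a function from the resolver.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the resolver's server map. The key here is a
// plain 32-bit integer (an assumption made for illustration).
std::vector<std::string> iteration_order(
        const std::vector<std::pair<std::uint32_t, std::string>>& servers) {
    std::map<std::uint32_t, std::string> by_address;
    for (const auto& entry : servers) {
        by_address.emplace(entry.first, entry.second);
    }

    std::vector<std::string> names;
    for (const auto& entry : by_address) {
        // std::map visits its elements in ascending key order,
        // regardless of the order in which they were inserted.
        names.push_back(entry.second);
    }
    return names;
}
```

Whatever order DHCP lists the servers in, the server with the smaller integer key comes out first.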
Under that ordering, my local DNS server's IP of 192.168.1.225 would correspond to 3232236001, while 8.8.8.8 would correspond to 134744072, which is smaller. That would explain why 8.8.8.8 is prioritized. But OpenDNS's 208.67.222.222 would correspond to 3494108894, which is larger than my local server's value, and yet OpenDNS was still being prioritized over my local DNS.
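The arithmetic behind these numbers can be sketched as follows. `ipv4_to_uint32` is a hypothetical helper written for this post, and it uses the "textbook" interpretation where the first octet is the most significant byte.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: parse "a.b.c.d" and return the 32-bit value with the
// first octet as the most significant byte (a * 2^24 + b * 2^16 + c * 2^8 + d).
std::uint32_t ipv4_to_uint32(const char* dotted) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    const int parsed = std::sscanf(dotted, "%u.%u.%u.%u", &a, &b, &c, &d);
    assert(parsed == 4 && a < 256 && b < 256 && c < 256 && d < 256);
    (void)parsed;  // silence unused-variable warnings when asserts are off
    return (a << 24) | (b << 16) | (c << 8) | d;
}
```

For example, `ipv4_to_uint32("8.8.8.8")` gives 134744072 and `ipv4_to_uint32("208.67.222.222")` gives 3494108894.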
After staring at the code a ton, I realized that with my limited C++ and Android internals knowledge, I probably wasn't going to get much further without running some code. I compiled Android's DnsResolver code, added some logging, and ran some unit tests. With that I was able to find out that the actual integer values used in the std::map for 192.168.1.225, 8.8.8.8, and 208.67.222.222 were 3774982336, 134744072, and 3739108304 respectively. So my local IP address in practice was indeed larger than the public DNS servers. Google's public DNS matched my calculated value, but the other IP addresses didn't. I wasn't able to figure out why the numbers differed from what I expected, but my brother, who is more proficient in C++, was able to deduce the reason. IP addresses are represented in big endian order, aka network byte order. But my ARM processor read the value in little endian order. Thus, the bytes were reversed, which explains the difference in the integer values. It also explains why 8.8.8.8 matched: all four of its bytes are identical, so reversing them changes nothing.
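The byte reversal can be checked with a small sketch. `byte_swap` is a hypothetical helper, not something from Android's code. It reverses the four bytes of a 32-bit value, which is the difference between reading the same four bytes as big endian and as little endian.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the byte order of a 32-bit value.
std::uint32_t byte_swap(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |  // lowest byte becomes the highest
           ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) |
           ((v & 0xFF000000u) >> 24);   // highest byte becomes the lowest
}
```

Applied to the expected value for 208.67.222.222, `byte_swap(3494108894)` gives 3739108304, the value the logging showed. `byte_swap(134744072)` gives 134744072 again, because every byte of 8.8.8.8 is the same.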
Since the byte order was being reversed, the last octet of an IP address ends up as the most significant byte of the integer. So I just needed to make sure that the last octet of my local DNS server's IP was lower than the last octet of the public DNS server I used. Thus, I switched the IP of my local DNS server to 192.168.1.4, which has an integer representation of 67217600, less than that of Google's Public DNS. Using 192.168.1.4 and OpenDNS's 208.67.222.222, I was finally able to get the expected behavior where my local DNS server was prioritized. Yet using Google's 8.8.8.8 would still prioritize Google's server over my own.
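A rough way to check a candidate address pair is sketched below. Both function names are hypothetical, and the sketch only models the integer comparison described here, not anything else Android does when choosing a server.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: the integer a little-endian CPU sees for "a.b.c.d"
// when the network-order bytes are read without conversion, so the last
// octet (d) is the most significant byte.
std::uint32_t reversed_key(const char* dotted) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    const int parsed = std::sscanf(dotted, "%u.%u.%u.%u", &a, &b, &c, &d);
    assert(parsed == 4 && a < 256 && b < 256 && c < 256 && d < 256);
    (void)parsed;  // silence unused-variable warnings when asserts are off
    return (d << 24) | (c << 16) | (b << 8) | a;
}

// True if the local server would sort ahead of the public one under the
// reversed-byte ordering.
bool local_sorts_first(const char* local_ip, const char* public_ip) {
    return reversed_key(local_ip) < reversed_key(public_ip);
}
```

With this, `local_sorts_first("192.168.1.4", "208.67.222.222")` is true, while `local_sorts_first("192.168.1.225", "208.67.222.222")` is false. Note that `local_sorts_first("192.168.1.4", "8.8.8.8")` is also true, so the integer ordering alone does not account for 8.8.8.8 still winning.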
At this point, I was truly stumped. Luckily, looking at the history of the Git commits to Android, I was able to find the contact for one of the people who wrote this networking code. I asked this person and they explained the situation in about 5 minutes. It turns out there is special casing for Google's 8.8.8.8 after all! This special casing would make 8.8.8.8 automatically be categorized as a DoH (DNS-over-HTTPS) server. Then, in the code right before res_tls_send was called, there is a conditional that checks if there are DoH servers available. If DoH servers are available, it queries using the DoH servers instead of using DoT!
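The decision described above can be sketched roughly like this. This is not Android's code: `Transport`, `ResolverState`, and `choose_transport` are hypothetical names, and the cleartext fallback is taken from the opportunistic mode description quoted earlier.

```cpp
// Hypothetical model of which transport a query ends up using.
enum class Transport { DoH, DoT, Cleartext };

struct ResolverState {
    bool doh_server_available;  // e.g. 8.8.8.8 categorized as a DoH server
    bool dot_server_validated;  // a DoT server answered on port 853
};

Transport choose_transport(const ResolverState& state) {
    if (state.doh_server_available) {
        // Checked first, so the sorted DoT server list is never consulted.
        return Transport::DoH;
    }
    if (state.dot_server_validated) {
        return Transport::DoT;
    }
    // Opportunistic mode falls back to plain DNS on port 53.
    return Transport::Cleartext;
}
```

Under this model, the ordering of DoT servers only matters when no DoH server is available, which would be why changing my local IP helped against OpenDNS but not against 8.8.8.8.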
This investigation took months and I was frequently out of my depth, but I'm glad to have finally figured out most of the points of confusion. Because I only want to use a public DNS as a backup in case my local DNS fails, I unfortunately can't use 8.8.8.8 due to the DoH prioritization. I also can't use Cloudflare's 1.1.1.1 because it's really hard to get a lower integer representation than that without changing my subnet. I just picked another public DNS, and I now finally have the setup I want. It was a long journey.
Any error corrections or comments can be made by sending me a pull request.