Address Bar Spoofing Attacks against Microsoft Internet Explorer 6

                    Amit Klein, Trusteer


Summary
=======

IE6 is the second most popular web browser (after IE7), with
market share of around 25% (according to recent surveys e.g.
http://marketshare.hitslink.com/report.aspx?qprid=2).

This write-up presents two new phishing attack techniques,
abusing an address bar issue (security vulnerability) with IE6 in
combination with non-standard DNS domain names. The net result is
that a phishing site may present itself via a link that when
clicked in IE6 displays an almost indistinguishable URL from the
one in used by the genuine site. The technique is new, i.e. it's
different than the ASCII similar characters and IDN homographs
attacks.

There are two techniques: the first technique presents an address
bar which is very similar (visually) to the address bar expected
for the genuine domain, by abusing the NBSP character. The second
technique presents an address bar visually identical to the one
expected for the genuine domain, using the fact that a non-DNSish
characters are not displayed in the address bar in some cases.
This technique requires registration of a non-standard domain,
hence it is probably theoretic only (although "site down"
imitation is still possible).

The attacks were verified with Windows XP SP2 and Windows XP SP3.


Introduction
============

URLs typically include host name, which tells the browser (after
DNS resolution) where to fetch the resource from. While regular
host names contain alphanumeric characters (a-z, A-Z and 0-9),
dots, hyphens and (in Intranets only) underscores, it is possible
to construct (at least syntactically) URLs whose host part
contain any octet (as explained in RFC 1035 section 3.1). The
interpretation of such characters when presented as links (when
IDN is not supported by the browser, see below) by the browser
and by the DNS infrastructure, as well as the way those
characters are presented by the browser (in the address bar) are
the subject of this write-up.

Non-DNS characters can be provided to the browser in several ways
(assuming e.g. an anchor HTML tag context):
*    In raw form, i.e. as a byte (octet), e.g. $
*    In HTML-encoded form, e.g. &#x24;
*    In URL-encoded form, e.g. %24

In raw form, the data is provided as-is. In HTML-encoded form,
the data is considered Unicode, and may undergo encoding. In URL-
encoded format, the data is (again) directly decodable into raw
form. The difference is subtle, but important. The octet values
00-7F (corresponding to the ASCII characters) have a single
interpretation across all systems. However, octet values 80-FF
may have different interpretation depending on the code page and
encoding system in use.


Address bar spoofing in IE6
===========================

Non-DNS characters
------------------
Within the ASCII range (00-7F), only the DNS subset of ASCII
characters is allowed.
As for higher values (e.g. A9 or %A9): IE6 uses DnsQuery_A to
resolve the name. DnsQuery_A assumes that the characters are in
the "current" Windows ANSI codepage (e.g. Windows-1252 or
Windows-1255, see
http://www.microsoft.com/globaldev/reference/WinCP.mspx for a
list of Single Byte code pages). It translates the characters
into UTF-8 representation and sends them this way. So %A9 is URL-
decoded into the byte (\xA9) by IE6, then this raw byte is
forwarded to DnsQuery_A, which interprets it according to the
current codepage (e.g. Windows-1252 or Windows-1255) as
COPYRIGHT_SIGN, moves to Unicode (U+00A9), and UTF-8 encodes this
symbol (into the 2 byte sequence (\xC2) (\xA9)). The net result
is that http://www.foo%A9bar.com goes out as a DNS query on
www.foo(\xC2)(\xA9)bar.com.
As it happens, almost all single-byte character sets (Windows-
1250...Windows-1258) interpret (\xA9) as COPYRIGHT_SIGN, and the
one exception being Windows-874 (Thai) which does not.

NOTE: the code page for a particular Windows box is determined
through the Control Panel (Regional and Language Options ->
advanced [tab], in the Languages for non-Unicode programs). The
Windows ANSI code page is derived from the language specified via
the table as provided in http://msdn.microsoft.com/en-
us/library/ms776260.aspx. For example, if the language is English
(all variants) then the Windows ANSI code page is 1252, whereas
if the language is Hebrew, then the Windows ANSI code page is
1255. As can be seen, the only languages whose code page is not
Windows-1250...Windows-1258 are the far east languages Chinese,
Japanese, Korean and Thai. So with the exception of these
languages, IE6 will request a DNS resolution for
www.foo(\xC2)(\xA9)bar.com when it navigates to
http://www.foo%A9bar.com/.


Attack #1: Raw/HTML-encoded characters
--------------------------------------

IE6 allows "raw" high-bit characters to be typed in the address
bar, e.g.

http://www.foo(c)bar.com/

In such case, the character is displayed in the address bar
(unlike %A9 which is not).

It is possible to present this URL in a link, e.g.:

<a href="http://www.foo&#x00A9;bar.com">FooBar</a>

NOTE: An HTML-encoded character is displayed as the corresponding
Unicode symbol. However, if this symbol is not mapped to the
current code page, IE will not resolve the host name (it shows an
"invalid syntax" error page).

A more interesting, and phishing related example is using the
Non-Blocking Space character (NBSP, Unicode U+00A0). This
character is rendered in the address bar as a space (NBSP is
mapped as 0xA0 in all single-byte character set codepages, i.e.
Windows-1250...Windows-1258 and Windows-874). Thus it opens up an
address bar spoofing trick similar in effect to a one already
disclosed (first reported in BugTraq December 2003:
http://www.securityfocus.com/archive/1/346948, then picked up by
CERT http://www.kb.cert.org/vuls/id/652278 and fixed by Microsoft
as MS04-004).
For example, consider the following phishing link (mimicking
www.yourbankhere.com, yet the real page is served from the domain
phish.site):

<a
href="http://www.yourbankhere.com&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n
bsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbs
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n
bsp;&nbsp;.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;.phish.site/">YourBankHere</a>

It should be noted that auto-complete does work for these URLs.

When the address bar box is not wide enough to show the whole
URL, the picture is almost identical to that of the genuine URL
(notice there's no slash after the host name, and the additional
dots). When the address bar is at its full width, some users may
still be fooled as the real domain is way off to the right,
separated from the left part of the hostname by many white
spaces. This shows up visually as (may wrap around in the text):

http://www.yourbankhere.com                              
.                               .             .phish.site/

The attack can be easily implemented using DNS wildcard mapping,
assuming the attacker controls the phish.site domain. The
attacker simply needs to add the following line for the
phish.site zone configuration file (tested with BIND9):

*.phish.site.    IN    A    ...IP address...

Note that the Host header will contain raw 0xA0 bytes. So by
including the following PHP code in the index.php of the phishing
server, the attacker can cater for multiple simultaneous phishing
attacks:

<?php
$match=array();
preg_match("/^([a-zA-Z0-9_.-]+)\\xA0/",
           $_SERVER['HTTP_HOST'],$match);
echo "This is a phishing site for ".$match[1];
?>


Attack #2 (theoretic): URL-encoded characters
---------------------------------------------
It's possible to include URL-encoded characters in the address
bar of IE6. IE6 URL-decodes them before querying the DNS, and
internally this is how they are kept.

Now, here's where it gets interesting: high-bit characters will
not be displayed in the address bar. So instead of showing
visually as "http://www.foo%A9bar.com/" (or
"http://www.foo(c)bar.com/") as one may expect, the address bar
will show "http://www.foobar.com/".

Theoretically, this can be used for phishing. A phisher can
register, say foo(\xC2)(\xA9)bar.com and use that in a phishing
URL (http://www.foo%A9bar.com/). When clicked, the IE6 address
bar will display the expected URL, http://www.foobar.com/.
However, this vulnerability seems to be theoretic only, since (in
the author's limited experience), it's not possible
administratively to register such domain names.

As for domain security, as far as IE6 is concerned, these are two
different domains. Cookies are not shared, access across domains
is denied, SSL certificate will not match, etc. Also, the Host
header includes the value with the original raw character - i.e.
the Host header is:

Host: www.foo(\xA9)bar.com

Even if no real domain can be registered, this can still be
somewhat of an annoyance. For example, spam can offer a URL as
evidence that a company's site is not available, or was hacked.
So if an attacker wants to defame www.foobar.com, he may do so by
sending spam with text such as "foobar inc. went chapter 11 -
site is down. Check out http://www.foo%A9bar.com/". This will end
up in DNS resolution failure.

Auto-completion applies to the address bar string (not the real
URL), hence auto-completing, say, www.fo will result in
www.foobar.com (the real domain name), and the browser will
navigate to the genuine site.


Vendor status
=============

Microsoft (MSRC) was informed of the two issues on January 13th,
2008. MSRC acknowledged the two problems and assigned the first
one the ticket MSRC7899, and the second one MSRC7900. However,
Microsoft declined to fix the issues.


Additional notes and observations
=================================


Label and name lengths
----------------------

Labels are limited (per RFC 1034 section 2.3.4) to 63 octets.
This means that no more than 31 consecutive NBSPs can be used.
The trick is to split them into labels (by inserting a dot).
Names are limited to 255 octets (per RFC 1034 section 2.3.4).
This includes the accumulated length of all labels, plus a length
octet preceding all labels (including the 0-length root label).
Apparently, both restrictions are enforced by DnsQuery_A (in
fact, names are limited to 254-256 bytes, including dots).

Additionally, it seems that a non-DNS character counts as 3
octets towards the total name limit (but not towards the label
length limit). A possible explanation is that when a non-ASCII
character is encountered, the worst case UTF-8 representation
length is used (3 octets) rather than the actual UTF-8
rerpesentation length (2 bytes for characters whose Unicode index
is smaller than 0x800 e.g. NBSP, 3 bytes for all other Unicode
characters). Thus, it is impossible to use more than 85 such
characters.


HTTP Caching
------------

There seems to be an additional bug in IE6 regarding how
resources whose URL contain high-bit set bytes are cached. The
key URL is constructed in an erroneous manner. It seems that the
key is constructed as following: take the string consisting of
the URL-encoded URL, e.g. http://www.foo%A9bar.com/, and
overwrite it with the decoded (shorter) URL,
http://www.foo(\xA9)bar.com/, in this case resulting in
http://www.foo(\xA9)bar.comom/. Obviously no regular URL will
match this key, so the caching is meaningless.

The key used for caching retrieval is probably the URL-encoded
version of the URL. And since it's never there, the effect is of
no-caching.


DNS caching
-----------

It was also verified that BIND 9 (the most popular DNS server
software) is capable of serving such domains, both as an
authoritative server and as a caching DNS server (verified with
BIND 9.2.4 as an authoritative and caching server, and 9.4.1-P1
as a caching server). In order to configure BIND to serve the
domain as an authoritative name server, the high-bit bytes
(\xC2)(\xA9) should be inserted to the zone file. Care should be
exercised here with the choice of editor, since some text editors
don't handle high-bit bytes well. It is advised to review the
file contents with a hex-dump tool (e.g. od) to ensure that the
correct bytes were entered.

Windows DNS server (verified with Windows 2003 for Small Business
Server SP2) as a cache server also supports such domain names (no
testing was done regarding Windows DNS server as an authoritative
server).


Root and TLD support
--------------------

Apparently, the root servers and the .COM/.NET gTLD servers are
indifferent to non-ASCII characters in sub-domains, and they will
happily respond to such queries pointing at the authoritative DNS
server (which would be the attacker's server).


IDN and homographs
------------------

The first attack, while abusing the same underlying phenomenon
(different logical symbols which are rendered into graphically
identical or almost indistinguishable forms - SP and NBSP in our
case) is nonetheless completely different than the homograph
attack (http://www.shmoo.com/idn/homograph.txt), which makes use
of the IDN extension to DNS.

The second (theoretic) attack is completely different because the
non-DNS characters simply do not appear in the address bar.


Author's additional notes:


I noticed that some of the crucial information may have been "reformatted" on the way.                                                
So please note that the HTML markup in attack #1 is                                                                                     
                                                                                                                                        
<a href="http://www.yourbankhere.com                                                                                                    
followed by 30 ampersand-NBSP-semicolon, followed by a dot followed by                                                                  
another 31 ampersand-NBSP-semicolon followed by a dot, followed by 13                                                                   
ampersand-NBSP-semicolon followed by a dot followed by                                                                                  
phish.site/">YourBankHere</a>                                                                                                           
                                                                                                                                        
The address bar appears like                                                                                                            
"http://www.yourbankhere.com" followed by 30 spaces, a dot, 31 spaces, a                                                                
dot, 13 spaces, a dot and finally "phish.site/"                                                                                         
                                                                                                                                        
Thanks,                                                                                                                                 
-Amit