Content Ownership and Validating File Types
I was referred to Billy (BK) Rios’s blog because an article there relates to research I conducted on how browsers react when faced with different combinations of content types, dispositions, and data. That is where I first came across the term “Content Ownership”: the concept is familiar, but the pretty term was not.
It’s pretty clear that taking ownership of other people’s content is risky business, especially given Internet Explorer’s willingness to render data as HTML despite the advertised Content-Type. However, serving up other people’s content is often a requirement. Having that content look pretty is also a requirement. Flickr surely can’t send images with a Content-Disposition of ‘attachment’ without killing their user experience. We are left with a need for additional validation. So how do we do that? File extensions surely don’t work. Nor will anything else that originates from the user and makes claims about the content format. What about file(1) and magic(4)?
For those unfamiliar, magic(4) contains a grammar that is used by file(1) to determine a file type based on the file’s content. For example:
blake@cecilia ~ $ file /bin/ls
/bin/ls: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), ...
blake@cecilia ~ $ file /etc/passwd
/etc/passwd: ASCII text
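To make the idea concrete, here is a minimal sketch of signature-based detection in Python. It is a toy stand-in for file(1), not its actual implementation: the `SIGNATURES` table and the `identify` function are names made up for this illustration, and the table is far smaller than a real magic(4) database. The point is only that the decision rests on the leading bytes of the content.

```python
# Toy stand-in for file(1): label content by its leading bytes only.
# SIGNATURES is an illustrative table, not the real magic(4) database.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"GIF8", "GIF image data"),
    (b"\x7fELF", "ELF"),
]


def identify(data: bytes) -> str:
    """Return a label chosen from the first few bytes of the content."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    # No signature matched: fall back to a crude text/binary guess.
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        return "data"


print(identify(b"\x7fELF\x01\x01\x01"))               # ELF
print(identify(b"root:x:0:0:root:/root:/bin/sh\n"))   # ASCII text
print(identify(b"GIF8 followed by anything at all"))  # GIF image data
```

The last call shows the property that matters here: once the signature matches, nothing after it is examined.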
At first glance, file(1) seems like a viable option. However, as some simple testing will show, anyone using file(1) and magic(4) to perform content-type validation is in for a surprise.
I was able to construct GIF, PDF, Macromedia Flash (FWS), Macromedia Flash Video (FLV), and compressed Macromedia Flash (CWS) files that satisfy file(1) yet render as HTML/JavaScript in Internet Explorer 7. It’s worth noting that I also created a simple JPEG and PNG in a similar manner, but IE would not render them as HTML. I suspect this is due to the upper ASCII characters (bytes above 0x7F) found within these files; more testing on that later. The implication is that anyone relying on file(1) and magic(4) to make content ownership decisions may have a problem.
Below are the files I used for testing:
JPEG:
0000h: FF D8 FF E0 00 10 4A 46 49 46 3C 68 74 6D 6C 3E  ......JFIF<html>
0010h: 3C 73 63 72 69 70 74 3E 61 6C 65 72 74 28 29 3B  <script>alert();
0020h: 3C 2F 73 63 72 69 70 74 3E 3C 2F 68 74 6D 6C 3E  </script></html>
PDF:
0000h: 50 44 46 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  PDF<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
FWS:
0000h: 46 57 53 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  FWS<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
FLV:
0000h: 46 4C 56 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  FLV<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
CWS:
0000h: 43 57 53 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  CWS<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
PNG:
0000h: 89 50 4E 47 0D 0A 1A 0A 3C 68 74 6D 6C 3E 3C 73  .PNG....<html><s
0010h: 63 72 69 70 74 3E 61 6C 65 72 74 28 29 3B 3C 2F  cript>alert();</
0020h: 73 63 72 69 70 74 3E 3C 2F 68 74 6D 6C 3E        script></html>
GIF:
0000h: 47 49 46 38 3C 68 74 6D 6C 3E 3C 73 63 72 69 70  GIF8<html><scrip
0010h: 74 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69  t>alert();</scri
0020h: 70 74 3E 3C 2F 68 74 6D 6C 3E                    pt></html>
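For anyone who wants to reproduce the test files, the dumps above can be rebuilt with a short Python script. This is a sketch rather than the tooling I originally used: each file is simply the leading bytes from its dump followed by the same HTML payload. The names `PAYLOAD`, `SAMPLES`, and `write_samples` are made up for this example, the file names are taken from the file(1) listing, and the temporary output directory is an assumption.

```python
import tempfile
from pathlib import Path

# The HTML/JavaScript payload that follows the signature in every dump.
PAYLOAD = b"<html><script>alert();</script></html>"

# Leading bytes copied from the hex dumps, followed by the shared payload.
SAMPLES = {
    "jpeg.jpg": bytes.fromhex("FF D8 FF E0 00 10") + b"JFIF" + PAYLOAD,
    "pdf.pdf": b"PDF" + PAYLOAD,
    "flash.fla": b"FWS" + PAYLOAD,
    "flashvideo.fla": b"FLV" + PAYLOAD,
    "compressed_flash.cws": b"CWS" + PAYLOAD,
    "png.png": bytes.fromhex("89 50 4E 47 0D 0A 1A 0A") + PAYLOAD,
    "gif.gif": b"GIF8" + PAYLOAD,
}


def write_samples(out_dir):
    """Write every sample into out_dir and return the paths created."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, data in SAMPLES.items():
        path = out / name
        path.write_bytes(data)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Write into a fresh temporary directory (an assumption for this sketch).
    for path in write_samples(tempfile.mkdtemp(prefix="sniff-samples-")):
        print(path, path.stat().st_size, "bytes")
```

Running file(1) over the resulting directory should then let you compare its verdicts with what a browser does when served the same bytes.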
And here is the file(1) output for these files:
blake@cecilia ~ $ for i in `ls`; do file $i;done
compressed_flash.cws: Macromedia Flash data (compressed), version 60
flash.fla: Macromedia Flash data, version 60
flashvideo.fla: Macromedia Flash Video
gif.gif: GIF image data 28020 x 15980
jpeg.jpg: JPEG image data, JFIF standard 104.116, thumbnail 99x114
pdf.pdf: Macintosh PDF File (data) : F<html><script>alert();</script
png.png: PNG image data, 1668442480 x 1950245228, 101-bit
This brings up a bigger issue: the need for those writing browsers and those writing web platforms to agree on a standardized method for identifying content types. As we can see above, differences between detection mechanisms are more than likely to have very real security consequences.
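Until such agreement exists, one conservative option is to refuse content that carries the expected signature but also contains HTML-like markup near the start. The Python sketch below illustrates that idea only. It is not a model of Internet Explorer's sniffing logic, and it is not a complete defense. The `SNIFF_WINDOW` size, the token list, and the function names are all assumptions made for this example. It is also stricter than the behavior I observed, since it would flag the JPEG and PNG samples that IE declined to render as HTML.

```python
import re

# Assumption for this sketch: only this many leading bytes are inspected.
SNIFF_WINDOW = 256

# Illustrative, incomplete list of tags that suggest HTML content.
HTML_TOKENS = re.compile(
    rb"<\s*(?:!doctype|html|head|body|script|iframe|img|title|table|a)\b",
    re.IGNORECASE,
)


def contains_html(data: bytes, window: int = SNIFF_WINDOW) -> bool:
    """True if an HTML-like tag appears within the leading window of bytes."""
    return HTML_TOKENS.search(data[:window]) is not None


def conflicting(data: bytes, claimed_magic: bytes) -> bool:
    """True when content has the claimed signature and HTML-like markup."""
    return data.startswith(claimed_magic) and contains_html(data)


gif = b"GIF8<html><script>alert();</script></html>"
print(conflicting(gif, b"GIF8"))                    # True
print(conflicting(b"GIF89a" + bytes(20), b"GIF8"))  # False
```

A check like this trades false positives for safety: a legitimate binary file that happens to contain one of these tokens early on would be rejected too.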