While I was at the Midsize Enterprise Summit, one of the other attendees was telling me something interesting that he learned about the new Microsoft Office 2007 documents. It appears that the new Office document standard is really just a zip file with XML and media files! It is referred to as the Office Open XML File Format.

Let’s check out the new file format by creating a document in Word 2007. We will then take that document, open it up using a zip archive program, and examine the contents of that file. If my colleague is correct, it may be very easy to open, edit, and create Microsoft Office 2007 documents without using Office at all!

The first thing that we are going to do is create a Word document. The document that I have created looks similar to this one:

[Image: Word Document]

The bottom section is a graphic, while the "Daily Cup of Tech" and "Bean There, Done Tech" portions are text.

Next, let’s open up this file in our zip archive viewer of choice. I’m just going to use the archiver built into Windows by renaming the file from .docx to .zip and then double clicking on the file.
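If you would rather not rename anything, the same peek inside can be scripted. Here is a minimal sketch using Python's standard zipfile module. The function name list_parts is my own invention for illustration, and the sketch assumes the file is an ordinary, unencrypted .docx.

```python
import zipfile


def list_parts(docx_path):
    """Return the names of every file stored inside a .docx package.

    docx_path may be a file name or an open binary file object.
    """
    # zipfile looks at the file's contents, not its extension,
    # so the .docx does not need to be renamed to .zip first.
    with zipfile.ZipFile(docx_path) as package:
        return package.namelist()
```

Calling list_parts("example.docx") (example.docx being a placeholder name) should return entries such as [Content_Types].xml and word/document.xml.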

There are several files in this archive, spread across several folders. The vast majority of them are XML files, with a few notable exceptions that we will talk about later. Some of the more important files include:

  • /[Content_Types].xml - content type (MIME type) of each part stored in the file.
  • /word/document.xml - actual content of the document.
  • /word/styles.xml - style and style settings used in the document.
  • /docProps/app.xml - application-level properties (application name, page and word counts, etc.) for the document.
  • /word/settings.xml - all settings (zoom factor, etc.) for the Word application when opening this document.
  • /word/theme/theme1.xml - theme and theme settings for this document.
  • /word/fontTable.xml - fonts and font settings for this document.
  • /word/webSettings.xml - web related settings (e.g. web root, etc.).
  • /docProps/core.xml - core information about the document (e.g. title, author, time last changed, user last changed, etc.).

There is also a folder called /word/media that contains all of the non-text content embedded in the document.
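To show how little is needed to get at the content, here is a rough sketch that pulls the paragraph text out of /word/document.xml and lists whatever sits in the media folder. The function names are hypothetical. The sketch assumes the standard WordprocessingML namespace and simply concatenates the text runs in each paragraph, so it ignores formatting and may repeat text that lives in nested structures such as text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main WordprocessingML elements (w:p, w:r, w:t).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_paragraphs(docx_path):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; each run keeps its text in w:t.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def list_media(docx_path):
    """Return the names of the embedded media files (images, etc.)."""
    with zipfile.ZipFile(docx_path) as package:
        return [n for n in package.namelist() if n.startswith("word/media/")]
```

For a document like mine, read_paragraphs should give back the two text lines, and list_media should show the single image file.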

I noted a couple of interesting things about the contents of the file that seemed to be rather un-Microsoft (for lack of a better term). First of all, it was incredibly easy to get into the file archive. No proprietary algorithms. No heavily licensed encryption. Just pure XML text in a nice ZIP wrapper.
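Because it is just XML in a ZIP wrapper, editing a document without Office looks feasible too. The sketch below copies every file from one archive into a new one and swaps a piece of text in /word/document.xml along the way. The name replace_text is made up for this example, and the approach is deliberately naive: it assumes the XML is UTF-8 and that the text being replaced sits inside a single run, which Word does not guarantee.

```python
import zipfile
from xml.sax.saxutils import escape


def replace_text(src_path, dst_path, old, new):
    """Copy a .docx, replacing `old` with `new` in word/document.xml."""
    with zipfile.ZipFile(src_path) as src:
        with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == "word/document.xml":
                    xml = data.decode("utf-8")
                    # Escape both strings so characters such as & and <
                    # match, and stay valid, inside the XML.
                    xml = xml.replace(escape(old), escape(new))
                    data = xml.encode("utf-8")
                # Every other part is copied across unchanged.
                dst.writestr(item.filename, data)
```

Writing to a new file rather than editing in place keeps the original intact if something goes wrong.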

The other thing that surprised me is the media format. When I pasted the graphic into the Word document, I did not link it to a file. It was just an image in my computer’s clipboard. Yet, when Word saved the image in the /word/media folder of the archive, it saved it as a PNG file. This seemed a bit unusual to me because Microsoft has always loved its BMP files, and seeing a PNG file caught me off guard. But maybe I shouldn’t have been surprised, since Windows Vista uses PNG compression for its icons.

The final thing that I thought was interesting was the references to websites such as http://purl.org and http://schemas.openxmlformats.org, which show up as XML namespace identifiers in the files. Again, this feels more open source-ish and less Microsoft-ish. Who knows? Maybe Microsoft has turned over a new leaf!
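Those URLs matter when you read the XML yourself, because elements can only be found by their full namespace. As a small sketch, this reads a few fields from /docProps/core.xml, where the purl.org addresses are the Dublin Core namespaces. The function name and the choice of fields are mine, and a field missing from the file comes back as None.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefixes used in docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(docx_path):
    """Return a few core properties of a .docx as a dictionary."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    wanted = {
        "title": "dc:title",
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "modified": "dcterms:modified",
    }
    props = {}
    for key, tag in wanted.items():
        node = root.find(tag, NS)
        props[key] = node.text if node is not None else None
    return props
```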

If you are interested in reading more about the new Office Open XML File Format, you can check out the Microsoft Developer Network article.

In general, I think this is a good move on Microsoft’s part. If they want to be seen as a gentler company that is part of the whole computer/Internet community, adopting some open standards like this will really help. When anyone can access their files with basic tools such as a zip archiver and a text editor, they come across as having nothing to hide.
