Posted on

Word 2007 Blank Pages Between Chapters

While working on a documentation project for a client, we ran into an unusual problem. According to good technical writing practice, chapters should always start on an odd page, which puts each new chapter on the right-hand page of a bound book. You also want to ensure that any preceding blank page is not 100% blank; most standards dictate at least a footer with a page number and possibly a header with the title of the prior chapter. (The old-school method was to put a “This page intentionally left blank.” message on the blank page, which is one of my all-time favorite ironies.) After digging around for hours (OK, maybe 10 minutes), I found the solution to this blank page problem.

It turns out that when you force pages to start on an odd page number (for example, so that a chapter always appears on the right-hand side of a bound book), you end up with blank pages in front of some of those pages. That is just how Word works. If the previous chapter ends on an odd page, Word automatically inserts a blank page so that the new chapter also starts on an odd page. However, it is clearly documented that Word does not put ANYTHING on that page, not even headers or footers. There is no setting that says “carry headers/footers over to blank pages”.

The official Microsoft solution: insert a hard page break (CTRL-ENTER) just before the odd-page section break wherever you want Word to print the header/footer on the blank page.

http://support.microsoft.com/kb/264905

The problem with that solution? If your document changes, the CTRL-ENTER will ALWAYS force a new page. If your content is updated and the prior chapter grows from ending on an odd page to ending on an even page, guess what? You get TWO BLANK PAGES: the forced blank page, which is now an odd page with a footer, and an even page with NOTHING on it… the exact problem you were trying to solve the first time around.
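
To make the page arithmetic concrete, here is a toy Python model of the behavior described above. It is only a sketch of the rules as this post describes them (an odd-page break jumps to the next odd page and leaves a truly empty filler page; a CTRL-ENTER page always appears and keeps its header/footer). The function name layout is made up for illustration, and nothing here talks to Word itself.

```python
# Toy model of Word's odd-page section break, as described in this post.
# Assumption: it mimics the described behavior only; it is not a Word API.


def layout(prev_last_page: int, hard_break_before: bool) -> dict:
    """Return the chapter's start page and the blank pages in front of it.

    Each blank page is reported as (page_number, has_header_footer).
    """
    blanks = []
    page = prev_last_page
    if hard_break_before:
        # CTRL-ENTER always forces one new page, and Word treats it as an
        # ordinary page, so it keeps the header/footer.
        page += 1
        blanks.append((page, True))
    # The odd-page section break: the chapter starts on the next odd page.
    start = page + 1
    if start % 2 == 0:
        # Word inserts a completely empty filler page to reach an odd page.
        blanks.append((start, False))
        start += 1
    return {"start": start, "blanks": blanks}


for last in (5, 6):
    for hard in (False, True):
        print(f"prior chapter ends on {last}, CTRL-ENTER={hard}: {layout(last, hard)}")
```

With the prior chapter ending on page 5, the hard break yields a single blank page 6 that carries a footer, as intended. Once that chapter grows to end on page 6, the same setup yields page 7 (with a footer) and page 8 (empty) before the chapter starts on page 9.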

Bottom line: to get technical documentation standard page footers AND chapters starting on the right-hand page, you will constantly need to scan and manually revise the page breaks in the document every time you update the content, especially if you lengthen any chapter.

Wonderful. Thanks, Microsoft, thank you very much. Why even provide an “odd page break”? Might as well keep things simple and force 100% manual management of pages with CTRL-ENTER throughout. Grrrrr… sometimes working with these high-tech tools can be so aggravating. After 20+ years of MS-Word development, they still don’t have an elegant solution for this common problem. It was documented as far back as 2000, and Microsoft had enough inquiries to write TWO Knowledge Base articles about it back then.


Versioning Word Documents In Git

We need your help!


Cyber Sprocket is looking to qualify for a small business grant so we can continue our development efforts. We are working on a custom application builder platform so you can build custom mobile apps for your business. If we reach our 250-person goal, we have a better chance of being selected.

It is free and takes less than 2 minutes!

Go to www.missionsmallbusiness.com.
Click on the “Login and Vote” button.
Put “Cyber Sprocket” in the search box and click search.
When our name comes up click on the vote button.

 

And now on to our article…

 

At first I didn’t know whether I should write this email. I really, really, really do not like dealing with Word documents. It has nothing to do with Word specifically as a product; I hate documents in that kind of format in general, including the ones OpenOffice.org produces. I don’t like working with WYSIWYG documents at all. One argument I can make against using Word files on projects is that you can’t meaningfully put them in a repository.

Well, that isn’t true. You can do it, and you can even do things like diff Word documents. So ultimately I decided it is more helpful to share this information than to hide it in an attempt to keep people from using that God-awful format. Of course, I’m going to regret it as soon as a Word document shows up in one of our repositories…

A rarely used feature of Git (in my experience) is its ability to assign ‘attributes’ to files. You do this by creating a .gitattributes file in the repository: a text file that maps file names or glob patterns to attributes. A simple example:

*.fl[av] binary

This tells Git that all ‘flv’ and ‘fla’ files are binary, and therefore Git should never try to diff them, attempt a textual merge on them, or perform any CRLF conversions, regardless of any other settings.
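
Python’s fnmatch module understands the same shell-style [av] character class, so it offers a quick, approximate way to see which names that pattern covers. Git’s own matching rules (directories, leading slashes, ‘**’) are richer, so treat this only as an illustration; the file names and the is_marked_binary helper are made up.

```python
from fnmatch import fnmatchcase

PATTERN = "*.fl[av]"  # the pattern from the .gitattributes example above


def is_marked_binary(basename: str) -> bool:
    # Pass only the base name: Git matches a slash-free pattern against the
    # file's base name in any directory. fnmatch is an approximation here.
    return fnmatchcase(basename, PATTERN)


# Hypothetical file names, for illustration only.
for name in ["intro.flv", "banner.fla", "notes.flac", "clip.flx"]:
    print(f"{name:12} binary={is_marked_binary(name)}")
```

Note that ‘notes.flac’ does not match: the pattern requires the name to end right after the single ‘a’ or ‘v’.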

Something else we can do with attributes is control how diffs are generated for files. For our specific task here, we want to tell Git to use our customized ‘diff driver’ for Word documents. We can start out by putting this in our attributes file:

*.doc diff=word

Now whenever Git diffs ‘doc’ files, it will invoke the ‘word’ driver, which means we now have to define that driver. We can do this in one of three places:

    1. Our personal, global .gitconfig file.
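    2. A .gitconfig file committed to the repository, which can be shared by developers. Git does not read this file automatically, so each developer has to include it once, for example with ‘git config include.path ../.gitconfig’.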
    3. The .git/config file in the repository, which is not shared.

Adding support for diffing certain files is something we typically want to share with everyone on a project, so the second choice makes the most sense here. But the way we define the driver is the same regardless of where we actually do it. First I will show you what we have to put in the file to define the driver, then discuss it.

    [diff "word"]
        textconv = strings

The first line should look familiar if you have messed around with your .gitconfig file before; it is your typical INI file section header. Because we assigned the attribute ‘diff=word’, Git will look for the section ‘[diff "word"]’ for the driver definition. The second line sets the ‘textconv’ property of the driver; this property names a program or command that can translate the file into a text format, which Git can then diff like normal. The ‘strings’ program is part of the GNU binutils package, which you can get on all major platforms. It rips out all of the printable strings from a binary file.
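
If ‘strings’ is not at hand, or you just want to see what it does, the core idea fits in a few lines of Python. This is a sketch that approximates the default behavior of GNU strings (runs of at least four printable characters), not a drop-in replacement; the function name extract_strings is ours.

```python
import re
import sys


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least `min_len` printable ASCII characters.

    Roughly mirrors the default output of GNU `strings`: tabs and the
    characters from space through '~' count as printable.
    """
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git runs a textconv program with the path of the file to convert as
    # its argument and diffs whatever the program prints on stdout.
    with open(sys.argv[1], "rb") as fh:
        for s in extract_strings(fh.read()):
            print(s)
```

Saved as a script (say, a hypothetical doc_strings.py), it could stand in for ‘strings’ in the driver definition with something like ‘textconv = python3 /path/to/doc_strings.py’.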

With that said, it should now be clear how this helps us diff Word documents. Our driver passes the ‘doc’ file to a program that extracts all of the printable strings. Even though Word uses a binary format, it stores the text of the document as strings that we can pull out. Once we have done that, Git can diff the file like normal, and we can meaningfully use tools like ‘git log -p’ to get an idea of the changes a commit made to a Word document.
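
The whole pipeline (convert each version to text, then diff the text) can be imitated with Python’s difflib. This is only an illustration of the idea: Git runs its own diff machinery on the textconv output, the byte strings below are fake stand-ins for two versions of a Word document, and report.doc is a made-up name.

```python
import difflib
import re


def textconv(data: bytes) -> list[str]:
    # Same idea as `strings`: keep runs of 4+ printable ASCII characters.
    return [m.group().decode("ascii") for m in re.finditer(rb"[\t\x20-\x7e]{4,}", data)]


def diff_binary(old: bytes, new: bytes, name: str = "report.doc") -> str:
    # Convert both versions to text lines, then produce a unified diff.
    return "".join(
        difflib.unified_diff(
            [line + "\n" for line in textconv(old)],
            [line + "\n" for line in textconv(new)],
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


# Fake "binary" blobs standing in for two versions of a Word document.
old = b"\x00\x07Chapter One\x00\x13Draft text\x00"
new = b"\x00\x07Chapter One\x00\x13Final text\x00"
print(diff_binary(old, new))
```

One caveat worth checking against your own files: Word may store some text as 16-bit characters, which a plain ‘strings’ run (or this sketch) will skip. GNU strings has an ‘-e l’ option for 16-bit little-endian text that may help in that case.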

This technique can be used with any file format for which you can generate meaningful text output. For example, if you use a tool that extracts the metadata from image files, you can make a driver for that and get useful diff info. This never affects the way Git stores these files; they are still handled just like any other binary file. The benefits are only cosmetic: they let us use Git’s diffing tools to get a better idea of what changes have been applied to those binary files. Nonetheless, that information can be very useful when working with such files.
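
As a sketch of the image-metadata idea, here is a small Python textconv program for PNG files that prints the keyword/text pairs stored in uncompressed tEXt chunks. It assumes PNG as the example format, ignores compressed zTXt/iTXt chunks and EXIF data, and does not verify chunk CRCs; the function and script names are hypothetical.

```python
import struct
import sys

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data: bytes) -> list[str]:
    """Return 'keyword: text' lines for every tEXt chunk in a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    lines = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt body: Latin-1 keyword, a NUL separator, Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            lines.append(f"{keyword.decode('latin-1')}: {text.decode('latin-1')}")
        if ctype == b"IEND":
            break
        pos += 12 + length  # skip length, type, data and CRC
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Used as a textconv program: Git passes the file path as the argument.
    with open(sys.argv[1], "rb") as fh:
        print("\n".join(png_text_chunks(fh.read())))
```

To wire it up, you might add a line such as ‘*.png diff=pngmeta’ to .gitattributes and define a matching [diff "pngmeta"] section whose textconv points at the script.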