Two and a half millennia ago (give or take a few decades), the Philosopher King wrapped up the Book of Ecclesiastes by saying, "There is no end to the writing of books."
He must have seen the Internet coming, and the problems of content management on the Web. Thousands of technical documentation professionals can still identify with the Biblical sentiment.
Today, of course, we are less concerned with books than with documents. Documents are containers of content: text, spreadsheets, graphics, or any combination of these. In addition to their traditional printed form, documents are increasingly being made available on the Internet, intranets, and the World Wide Web.
At one time, documents were placed online mainly as downloadable files. This meant readers had to have whatever proprietary software was required to open them.
The latest development is to make documents available as Web pages. In this form, anyone with a browser can read them. While this is great for the end user, it creates additional problems for the document owner. To begin with, how does the owner transform a document from its native, proprietary format into HTML (HyperText Markup Language) so that browsers can display it?
The Classic "One-Way" Trip to HTML
Getting documents onto the Web and maintaining them there can be a real challenge. The traditional method took considerable work, and the results were often unsatisfactory:
- The process was "one-way." You could create a document as a binary file in a word processor, for example, and then convert it to HTML, but you couldn't bring the HTML file back into the word processor for updating.
- With a "WYSIWYG" ("What You See Is What You Get") word processor, the document on screen looked much like the document that came out of the printer. Unfortunately, the converted document usually looked quite different in the browser than it did in the word processor or on paper.
- Once the document was converted, the two files could drift apart: an edit made to the HTML file was not reflected in the word processor file, and vice versa. Unless someone carefully kept them in step, the two versions could end up containing significantly different information.
Similar problems came up if a spreadsheet or a presentation was converted for use on the Web. Something had to be done.
The answer was a long time in coming, but it is now possible to "round trip" content. Using new technology in Microsoft Office 2000 and later versions, you can:
- Create a document in MS Word, Excel, or PowerPoint and save it in native Office binary format.
- Save the document as an HTML file and publish it on the Web or an intranet.
- While viewing the HTML document in a browser, open and edit the document in the Office application that created it, save it again to the Web as an HTML document, and also save it as an Office document in its native binary format.
MS Office applications can export complex documents to HTML in this way and bring them back into the application that created them, without losing data or formatting. When the HTML file is displayed in a compatible browser, it looks nearly the same as the binary file does in its native Office application.
When MS Word, Excel, or PowerPoint saves a document as a Web page, the HTML file contains the document's text, and any graphics are saved as separate files alongside it. The HTML file also contains a combination of HTML tags, XML tags, and Cascading Style Sheets (CSS).
HTML (HyperText Markup Language) describes the appearance of information in a document. With HTML, content can be displayed the same way across platforms. HTML also allows linking within and between documents that were created by different people and software and that reside on different systems across the network. Readers don't need proprietary software; they need only a browser that can correctly interpret the HTML and display the results.
XML (eXtensible Markup Language) describes the structure of a document. XML makes it possible to treat documents as data. In this way, a document can move across platforms and preserve its meaning. This simplifies production of documents that can serve multiple purposes: print, online help, Web delivery, and so on.
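To make the idea of "documents as data" concrete, the Python sketch below pulls document properties out of a saved Web page. It assumes the layout seen in Office 2000-era files, where properties such as the author sit in an <o:DocumentProperties> XML element inside a conditional comment (<!--[if gte mso 9]><xml>...</xml><![endif]-->) and the o: prefix stands for the urn:schemas-microsoft-com:office:office namespace. The function name document_properties and the sample page are illustrative only, not part of any Office API.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace that Office binds to the "o:" prefix (assumed Office 2000-era layout).
OFFICE_NS = "urn:schemas-microsoft-com:office:office"


def document_properties(html: str) -> Dict[str, str]:
    """Return the Office document properties embedded in a saved Web page."""
    match = re.search(r"<o:DocumentProperties>.*?</o:DocumentProperties>",
                      html, re.DOTALL)
    if match is None:
        return {}
    # The fragment uses the "o:" prefix without declaring it, so wrap it in an
    # element that binds the prefix to the Office namespace before parsing.
    wrapped = f'<root xmlns:o="{OFFICE_NS}">{match.group(0)}</root>'
    props = ET.fromstring(wrapped).find(f"{{{OFFICE_NS}}}DocumentProperties")
    # Strip the "{namespace}" part from each tag, keeping e.g. "Author".
    return {child.tag.split("}", 1)[1]: (child.text or "").strip()
            for child in props}


# Hypothetical sample page in the assumed layout.
saved_page = """<html xmlns:o="urn:schemas-microsoft-com:office:office">
<head>
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>Jane Writer</o:Author>
  <o:Revision>3</o:Revision>
 </o:DocumentProperties>
</xml><![endif]-->
</head>
<body><p>Hello, Web.</p></body>
</html>"""

print(document_properties(saved_page))  # {'Author': 'Jane Writer', 'Revision': '3'}
```

Because the properties are ordinary XML, any XML-aware tool could read them the same way; the regular expression is only a shortcut for locating the block inside the HTML comment.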
Cascading Style Sheets (CSS) specify rules that govern how HTML documents are displayed. You can think of them as similar to the "styles" you set in a word processor. These styles override the browser's defaults so that, for example, text appears just as it would in a native MS Word document.
The tags and the CSS preserve the document's structure and formatting. The file also identifies the Office application that created it. Together, these elements let the document look and behave consistently, whether the user is viewing it in a browser or editing it in Office.
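The sketch below shows one way to check which Office application produced a saved Web page. It assumes the page carries a meta tag such as <meta name=ProgId content=Word.Document>, which Office 2000-era files typically include; the table mapping ProgId values to applications is an assumption to verify against your own files. MetaCollector and parent_application are hypothetical helper names built on Python's standard html.parser module.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# Assumed ProgId values for Office 2000-era Web pages; verify against your own files.
KNOWN_PROGIDS = {
    "Word.Document": "Microsoft Word",
    "Excel.Sheet": "Microsoft Excel",
    "PowerPoint.Slide": "Microsoft PowerPoint",
}


class MetaCollector(HTMLParser):
    """Collect name/content pairs from every <meta> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = dict(attrs)
        name, content = values.get("name"), values.get("content")
        if name and content is not None:
            # Compare meta names case-insensitively ("ProgId" vs "progid").
            self.meta[name.lower()] = content


def parent_application(html: str) -> Optional[str]:
    """Name the Office application recorded in the page, or None if absent."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    progid = parser.meta.get("progid")
    if progid is None:
        return None
    return KNOWN_PROGIDS.get(progid, f"unknown ({progid})")


word_page = ("<html><head><meta name=ProgId content=Word.Document>"
             '<meta name=Generator content="Microsoft Word 9"></head></html>')
print(parent_application(word_page))  # Microsoft Word
```

A page without the tag returns None, which is a reasonable hint that it was not saved by an Office application (or that the tag was stripped along the way).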
The process is by no means perfect at this point. The Office-specific markup can sometimes considerably increase the size of the HTML file. This means more storage space may be needed and the resulting files may download more slowly.
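To get a rough sense of how much of a saved page is Office-only overhead, the following heuristic counts the characters in conditional-comment blocks addressed to Office (<!--[if ... mso ...]> ... <![endif]-->) and in CSS declarations whose property names start with mso-. The patterns are assumptions about typical Office output rather than a specification, the result is only an estimate, and the function name office_markup_share is hypothetical.

```python
import re

# Heuristic patterns (assumptions about typical Office output, not a spec):
# conditional-comment blocks addressed to Office ("mso") and CSS declarations
# whose property names begin with "mso-".
CONDITIONAL_BLOCK = re.compile(
    r"<!--\[if [^\]]*mso[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL)
MSO_DECLARATION = re.compile(r"mso-[\w-]+\s*:[^;\"'}]*;?")


def office_markup_share(html: str) -> float:
    """Estimate the fraction of a page's characters that only Office uses."""
    if not html:
        return 0.0
    office_chars = sum(len(m.group(0)) for m in CONDITIONAL_BLOCK.finditer(html))
    # Look for mso- declarations only outside the blocks already counted,
    # so no character is counted twice.
    remainder = CONDITIONAL_BLOCK.sub("", html)
    office_chars += sum(len(m.group(0)) for m in MSO_DECLARATION.finditer(remainder))
    return office_chars / len(html)


page = ('<p class=MsoNormal style="mso-line-height-rule:exactly">Hello</p>\n'
        '<!--[if gte mso 9]><xml><o:Revision>3</o:Revision></xml><![endif]-->')
print(f"Office-specific markup: {office_markup_share(page):.0%} of the file")
```

Running this over a real saved page, and over the same page after the Office-specific markup has been filtered out, gives a quick before-and-after comparison of the storage and download cost.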
Despite these drawbacks, the user gains a number of benefits. For example, documents keep their Office-specific formatting. Features that Web browsers don't support, such as line colors and character effects, are stored with the document. They won't appear when you view the document in a Web browser, but when you reopen it in the Office program that created it, you can still view and edit them.
When Would You Round Trip?
You wouldn't just convert Office documents to HTML documents and back again for entertainment. Here are six specific problems that can be solved partly or completely through round tripping.
- You must collaborate with people in remote locations to produce a document.
- You want to put together a Web site but you don't know how to write HTML markup.
- You are maintaining a Web site, but not always from the same location or machine, and other people also work on the site.
- You would like to save a document to the Web so that it doesn't have to be kept on your hard drive.
- You want to put together a Web site, but MS Office is the only tool you have for developing content.
- You want documents that are reusable, portable between platforms, readable by anyone anywhere in the world, and easy to maintain.
How Do You Round Trip?
Round tripping is basically a three-step process:
- Create an MS Office document using Word, Excel, or PowerPoint.
- Save the document as a Web page (HTML file) and publish it to the Web.
- Open the HTML file in a compatible browser, choose Edit With… from the toolbar or the File menu, and edit as required in the original Office application.
Round tripping is easiest in MS Word. Begin by creating your document and formatting it the way you want it to appear, or simply open an existing file that you want to publish as a Web page.
On the File menu, select Save as Web Page. You can save the resulting HTML file to your local hard drive, to a Web folder, or to an FTP (File Transfer Protocol) location.
Word will tell you which, if any, of the features in the document are not supported by Web browsers, and how many times each of these features is used.
You can choose to remove the unsupported formatting and apply formatting that is supported. For example, shadow text is removed and the text is formatted as bold. These changes will only affect the HTML document's appearance in the browser. The Word document's appearance will not be changed.
Word's Help also explains how to create Web pages that use formatting supported only by specific browsers.
This completes the first half of the trip. Your file has now been saved as HTML.
NOTE: Although the menu command is called "Save as Web Page," you may still need to publish the HTML file to make it available to other Web or intranet users. The graphics and certain other files that browsers need to display your document are saved to a separate folder named after your document file (Office typically appends "_files," so report.htm is accompanied by a report_files folder). Publish that folder along with the HTML file. Check with your Webmaster or your Internet Service Provider (ISP) technical staff for details on how to publish your document correctly.
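Because the supporting files live in a separate folder, it is easy to publish the HTML page and forget its graphics. The sketch below collects everything that should be uploaded together. It assumes Office's usual convention of a companion folder named <name>_files (report.htm alongside report_files); files_to_publish is a hypothetical helper, and the actual upload step depends on your server and is not shown.

```python
import tempfile
from pathlib import Path
from typing import List


def files_to_publish(html_path: Path) -> List[Path]:
    """Return the Web page plus every file in its companion folder.

    Assumes the "<name>_files" folder convention (report.htm -> report_files);
    adjust the folder name if your Office version uses a different one.
    """
    html_path = Path(html_path)
    companion = html_path.with_name(html_path.stem + "_files")
    files = [html_path]
    if companion.is_dir():
        # rglob("*") walks subfolders too; keep only regular files.
        files.extend(sorted(p for p in companion.rglob("*") if p.is_file()))
    return files


# Usage: build a throwaway page and companion folder, then list what to upload.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    page = root / "report.htm"
    page.write_text("<html></html>")
    (root / "report_files").mkdir()
    (root / "report_files" / "image001.gif").write_bytes(b"GIF89a")
    publish_list = [f.relative_to(root).as_posix() for f in files_to_publish(page)]

print(publish_list)  # ['report.htm', 'report_files/image001.gif']
```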
Now for the return trip. Open a compatible browser and point it to the HTML document file on your local system. When the file opens, you will notice that an Edit button with the MS Word logo has appeared on the toolbar. There is also a new item on the File menu: "Edit with Microsoft Word for Windows."
Choose either the button or the File menu item, and Microsoft Word will open (if it is installed on the system you are using). The HTML document opens as a valid Microsoft Word document, with all of its original formatting intact. Make your edits. When you are finished, save the document as a Web page again and republish it on the Web. At the same time, save the edited document as a binary Word file, so that you have a backup in case you ever need one.
NOTE: If the HTML document is not on the system you are using but is published on the Web or an intranet, you can still edit it. Open the document in your browser and save it as an HTML file on your local hard drive. Then switch your browser to Work Offline and open the HTML file from your hard drive. From there, follow the instructions above.
So, while we may still be busy with the writing of books and documents, at least technology is making some parts of the process a bit easier. You have to wonder what the Philosopher King would have thought.