The E-Zine Approach
  Turning the (HTML) Tables: Importing Data from the Web Into Microsoft Excel

Lots of information on the Web appears in the form of tables. These tables may contain text or numbers, or both. If you happen to need that information in a document you are creating, there could be a lengthy retyping session involved. Of course, errors creep in any time information is copied through the keyboard. Extended proofreading could also be in store as a result.

Fortunately, you can avoid these problems and turn any table you find on the Web into data in a Microsoft Excel spreadsheet very quickly. In fact, you can easily summarize data from multiple Web sources into a single spreadsheet or workbook.

But wait, there's more. Information from tables on the Web can be placed side-by-side with data imported from other Excel worksheets, from databases, and from text files. All of this data can then be formatted and operated on using Excel, just as you might do with any spreadsheet.

Excel's main tools for summarizing and analyzing data from existing lists, tables, and databases are PivotTables and PivotCharts. These will be covered in a later article. But the first step in creating PivotTables and PivotCharts is to import the data into a worksheet. If the data exists in a Web page, intranet page, or HTML file, the tool used to import it is called a Web query.

Web queries are powerful even if you aren't creating a PivotTable or PivotChart. You might think of Web queries as Copy and Paste on steroids. To understand how these tools work, we need to look at how information is stored in an HTML file to be presented as a table by a browser.

Tables on the Web
You may already know that there are two basic ways to create a table in an HTML file.

The first method is the easiest to understand, but you don't see it too often. It involves using preformatted text or <PRE> tags to set off tabular data from other information on a Web page. A table author can use this tag to create a basic unadorned table on a Web page. The information is typed directly into the HTML file, using tabs and carriage returns to create columns and rows.

For example:

REGION      JANUARY      FEBRUARY      MARCH
NORTH       1500         2000          2500
SOUTH       2500         3000          3500
EAST        4000         4500          5000
WEST        5500         6000          6500

comes out looking like this in the browser:

The second method is a bit more complicated, and it allows some refinements to the appearance of the table in the browser.

To produce a table with borders around the cells, the designer uses <TABLE>, <TR>, and <TD> tags:

<table border=1>
 <tr>
 		<td width=100><p>Region</p></td>
  		<td width=100><p>January</p></td>
  		<td width=100><p>February</p></td>
<td width=100><p>March</p></td>
</tr>
 	<tr>
  		<td width=100><p>North</p></td>
  		<td width=100><p>1500</p></td>
  		<td width=100><p>2000</p></td>
  		<td width=100><p>2500</p></td>
 	</tr>
<tr>
  		<td width=100><p>South</p></td>
 		<td width=100><p>2300</p></td>
  		<td width=100><p>2800</p></td>
  		<td width=100><p>3200</p></td>
 	</tr>
 	<tr>
  		<td width=100><p>East</p></td>
  		<td width=100><p>1750</p></td>
  		<td width=100><p>2250</p></td>
  		<td width=100><p>2750</p></td>
 	</tr>
 	<tr>
  		<td width=100><p>West</p></td>
  		<td width=100><p>2500</p></td>
  		<td width=100><p>3000</p></td>
  		<td width=100><p>3500</p></td>
 	</tr>
</table>

This produces a table with evenly sized cells:

Excel can import both of the example tables directly into a worksheet. There are only two requirements. First, the tables must be created using either <TABLE> or <PRE> tags. There are other methods for presenting tabular data on a Web page, and these cannot be imported into Excel. Second, the first row in any table to be imported must contain the labels for the columns, and every column must have a label.

Importing a Table From the Web Into Excel
You will need to have an open Excel workbook when you start this process.

You will also need to know where the HTML file containing the data you want to import is located. The easiest way to do this is to simply open the file in your Web browser and leave it open on your desktop. As an alternative, you could have the URL of the Web or intranet page handy. If the HTML file is on your own system, you should know its address in standard format (e.g., "C:\My Documents\Production Figures.html").

To verify the format of the table you want to import, right-click on the table and choose View Source from the Shortcut menu. Scroll down the code listing until you find the tags that were used. They must be either <PRE> or <TABLE> tags.

In Excel, open the Data menu.

Find the item Get External Data and select it. Then on the submenu that opens, choose New Web Query…. The New Web Query dialog will open.

The dialog needs to know where to look for the data. The information goes in the text box following Item 1 in the dialog. To enter the information, you can do one of the following things:

  • Enter the URL of the Web page containing the table you want; or
  • Enter the address on your local system of the HTML file containing the table you want; or
  • Click on the Browse Web button. If you already have the Web page or the HTML file open in your browser on the desktop, the URL or address will be entered automatically for you.

Under Item 2 in the dialog, you must indicate which table(s) you want. By default, "Only the tables" is selected. In a case where there are several tables on the page, you may be able to specify the one you want if the tables have titles.

In Item 3, you will usually choose the default formatting (None). However, if you want to retain the fonts and other formatting seen on the Web, choose Rich text or Full HTML formatting.

Choose the OK button at the bottom of the dialog. The Returning External Data to Microsoft Excel dialog will open immediately.

Here you may enter the name of the spreadsheet cell where you want the top left corner of the imported table to appear. You can also decide whether to place the imported table on the current worksheet or on a new sheet. The Properties button allows you to fine-tune the query and the way the data is imported but in most cases you will probably choose OK.

Your data table will be immediately transplanted into the Excel spreadsheet.

Finishing Touches: The Round Trip
Just as with any other Microsoft Office document, you can "round trip" documents between Excel and HTML.

As discussed in the companion article this month ("Microsoft Office Round Trips"), you can save an Excel spreadsheet or workbook as a Web Page (HTML file). When you later open such a document on the Web, using MS Internet Explorer, the icon bar will show an Edit button with the Excel logo on it. By clicking on that Edit button, you can modify the contents of that page in Excel and save the revised document back to the Web. This can be very useful in dealing with price sheets or parts lists maintained online, for example.

Compared to other Office applications, MS Excel offers some additional functionality when saving spreadsheets as HTML.

  • You may save a worksheet individually as an HTML file, or you may choose to save the entire workbook;
  • If you save only the worksheet, you may add interactivity to the Web page so that users can modify the contents of the spreadsheet and see the results on line (users must have the MS Office Web Components installed to take advantage of this); and
  • You may choose to publish the page directly to the Web or to an intranet from within Excel.

Web queries, the ability to import a table from an HTML file, and "round tripping" add up to a lot less effort for professionals who must maintain large amounts of tabular information. In coming months, we'll look at how these tools, together with PivotTables and PivotCharts, give writers the ability to create smart summaries of information from many different sources.



Glencoe McGraw-Hill   A Division of the McGraw-Hill Companies
©2001 The McGraw-Hill Companies, Inc. All rights reserved.