|Turning the (HTML) Tables: Importing Data from the Web Into Microsoft Excel
Lots of information on the Web appears in the form of tables. These tables may contain text or numbers, or both. If you happen to need that information in a document you are creating, there could be a lengthy retyping session involved. Of course, errors creep in any time information is copied through the keyboard. Extended proofreading could also be in store as a result.
Fortunately, you can avoid these problems and turn any table you find on the Web into data in a Microsoft Excel spreadsheet very quickly. In fact, you can easily summarize data from multiple Web sources into a single spreadsheet or workbook.
But wait, there's more. Information from tables on the Web can be placed side-by-side with data imported from other Excel worksheets, from databases, and from text files. All of this data can then be formatted and operated on using Excel, just as you might do with any spreadsheet.
Excel's main tools for summarizing and analyzing data from existing lists, tables, and databases are PivotTables and PivotCharts. These will be covered in a later article. But the first step in creating PivotTables and PivotCharts is to import the data into a worksheet. If the data exists in a Web page, intranet page, or HTML file, the tool used to import it is called a Web query.
Web queries are powerful even if you aren't creating a PivotTable or PivotChart. You might think of Web queries as Copy and Paste on steroids. To understand how these tools work, we need to look at how information is stored in an HTML file to be presented as a table by a browser.
Tables on the Web
The first method is the easiest to understand, but you don't see it too often. It involves using preformatted text or <PRE> tags to set off tabular data from other information on a Web page. A table author can use this tag to create a basic unadorned table on a Web page. The information is typed directly into the HTML file, using tabs and carriage returns to create columns and rows.
REGION JANUARY FEBRUARY MARCH NORTH 1500 2000 2500 SOUTH 2500 3000 3500 EAST 4000 4500 5000 WEST 5500 6000 6500
comes out looking like this in the browser:
The second method is a bit more complicated, and it allows some refinements to the appearance of the table in the browser.
To produce a table with borders around the cells, the designer uses <TABLE>, <TR>, and <TD> tags:
<table border=1> <tr> <td width=100><p>Region</p></td> <td width=100><p>January</p></td> <td width=100><p>February</p></td> <td width=100><p>March</p></td> </tr> <tr> <td width=100><p>North</p></td> <td width=100><p>1500</p></td> <td width=100><p>2000</p></td> <td width=100><p>2500</p></td> </tr> <tr> <td width=100><p>South</p></td> <td width=100><p>2300</p></td> <td width=100><p>2800</p></td> <td width=100><p>3200</p></td> </tr> <tr> <td width=100><p>East</p></td> <td width=100><p>1750</p></td> <td width=100><p>2250</p></td> <td width=100><p>2750</p></td> </tr> <tr> <td width=100><p>West</p></td> <td width=100><p>2500</p></td> <td width=100><p>3000</p></td> <td width=100><p>3500</p></td> </tr> </table>
This produces a table with evenly sized cells:
Excel can import both of the example tables directly into a worksheet. There are only two requirements. First, the tables must be created using either <TABLE> or <PRE> tags. There are other methods for presenting tabular data on a Web page, and these cannot be imported into Excel. Second, the first row in any table to be imported must contain the labels for the columns, and every column must have a label.
Importing a Table From the Web Into Excel
You will also need to know where the HTML file containing the data you want to import is located. The easiest way to do this is to simply open the file in your Web browser and leave it open on your desktop. As an alternative, you could have the URL of the Web or intranet page handy. If the HTML file is on your own system, you should know its address in standard format (e.g., "C:\My Documents\Production Figures.html").
To verify the format of the table you want to import, right-click on the table and choose View Source from the Shortcut menu. Scroll down the code listing until you find the tags that were used. They must be either <PRE> or <TABLE> tags.
In Excel, open the Data menu.
Find the item Get External Data and select it. Then on the submenu that opens, choose New Web Query…. The New Web Query dialog will open.
The dialog needs to know where to look for the data. The information goes in the text box following Item 1 in the dialog. To enter the information, you can do one of the following things:
Under Item 2 in the dialog, you must indicate which table(s) you want. By default, "Only the tables" is selected. In a case where there are several tables on the page, you may be able to specify the one you want if the tables have titles.
In Item 3, you will usually choose the default formatting (None). However, if you want to retain the fonts and other formatting seen on the Web, choose Rich text or Full HTML formatting.
Choose the OK button at the bottom of the dialog. The Returning External Data to Microsoft Excel dialog will open immediately.
Here you may enter the name of the spreadsheet cell where you want the top left corner of the imported table to appear. You can also decide whether to place the imported table on the current worksheet or on a new sheet. The Properties button allows you to fine-tune the query and the way the data is imported but in most cases you will probably choose OK.
Your data table will be immediately transplanted into the Excel spreadsheet.
Finishing Touches: The Round Trip
As discussed in the companion article this month ("Microsoft Office Round Trips"), you can save an Excel spreadsheet or workbook as a Web Page (HTML file). When you later open such a document on the Web, using MS Internet Explorer, the icon bar will show an Edit button with the Excel logo on it. By clicking on that Edit button, you can modify the contents of that page in Excel and save the revised document back to the Web. This can be very useful in dealing with price sheets or parts lists maintained online, for example.
Compared to other Office applications, MS Excel offers some additional functionality when saving spreadsheets as HTML.
Web queries, the ability to import a table from an HTML file, and "round tripping" add up to a lot less effort for professionals who must maintain large amounts of tabular information. In coming months, we'll look at how these tools, together with PivotTables and PivotCharts, give writers the ability to create smart summaries of information from many different sources.