Convert an html table to excel spreadsheet
A quick and dirty post for a quick and dirty technique that I used to get the Oracle CPU Database Risk Matrix into a spreadsheet. This won’t always work (the html table I was interested in happened to have the data in the same line of html as the tag), but it’s a start.
- Do ‘View Source’ to get the html for the table.
- Find the table
- Cut and paste the table into a text file. I called mine “risk_matrix_raw.txt”
- Create an awk file (mine was called risk_matrix.awk) with the following lines:
/<tr>/{printf("\n")}
/<td/{printf("%s~", $2)}
- awk the text file, then pipe into sed to remove some of the tags
awk95 -F">" -f risk_matrix.awk risk_matrix_raw.txt |^ sed -e "s/<\/tr//g" -e "s/<\/td//g" \> risk_matrix.txt
- Open the .txt file in Excel. When I did this, Excel asks whether its a fixed width or a delimited file. I specified that the file was delimited with a ‘~’ and Excel opened it.
Note (April 2020): for some reason in the original version of this post I had the code in .jpg files. Mea culpa. I’ve typed the code in, but not been able to test as yet, so, just for reference here are the original .jpgs