I have vendors that send me reports in XLSX format. I easily convert them into data frames and write them to a csv for import into a database. However, one of them starting sending an HTML file with an xls ending. Using the pandas read_html I can read it into a list. This needs to be converting into a data frame. The list output looks very strange. I have no idea how to tackle it. I have the html fir one row shown below.
<html>
<head>
<style type="text/css">
br {mso-data-placement:same-cell;}
.formattext { mso-number-format:\@;}
.formatdate { mso-number-format:"d MMM yy"; text-align: right;}
.formatdatetime { mso-number-format:"d MMM yy HH:mm"; text-align: right;}
</style>
</head>
<div id="dvHeader">
<table>
<tr>
<td rowspan="5"><img src="https://tracking.agility.com//image/brand_logos/AgilityEmailLogo.gif" alt="Agility" /></td>
</tr>
</table>
<table id="tbl2" border="0" cellpadding="3" cellspacing="0">
<tr>
<td colspan="10" style="font-size:12px;"><b>Agility Tracking report: JP to SG Ocean Shipments view Shipments for Abbott</b></td>
</tr>
</table>
</div>
<div id="divJobList" class="jobList-container">
<table border="0" cellpadding="3" cellspacing="0" width="100%">
<thead id="thead1">
<tr>
<th id="th2" align="left"></th>
</tr>
<tr>
<th id="th1" align="left">Confirmed (1)</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<table border="1" cellpadding="0" cellspacing="0" class="printTable" >
<thead id="thead2"><tr style="height:50px;">
<th oncontextmenu="return false;" bgcolor="#dfdfdf">Agility reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Shipper</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Consignee</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Origin</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Destination</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Consignee reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Shipper reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Mode</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Pieces</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Containers</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Actual Wgt (kg)</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Chg Wgt (kg)</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Volume (m<sup>3</sup>)</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Collection on wheels</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Delivery on wheels</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Orig Flt / Vsl</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Arr Flt / Vsl</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">House Bill</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Master Bill</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Master shipment</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Container number</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Cargo description</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Carrier</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Booking reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Routing</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Booked</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Received</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Estimated departure</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Departed</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Estimated arrival</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Arrived</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Estimated delivery</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Delivered</th>
</tr></thead>
<tbody>
<tr class="normalRow">
<td style="overflow:hidden;"><span style="color:#808080">TYO0535587</span></td>
<td style="overflow:hidden;"><span style="color:#808080">CANON MEDICAL SYSTEMS CORPORATION</span></td>
<td style="overflow:hidden;"><span style="color:#808080">ABBOTT LABORATORIES (SINGAPORE)</span></td>
<td style="overflow:hidden;">Tokyo</td>
<td style="overflow:hidden;">Singapore</td>
<td style="overflow:hidden;"> </td>
<td style="overflow:hidden;">15X0ZYU </td>
<td style="overflow:hidden;">Sea FCL</td>
<td style="overflow:hidden;">4</td>
<td style="overflow:hidden;">1</td>
<td style="overflow:hidden;">1572</td>
<td style="overflow:hidden;">9762</td>
<td style="overflow:hidden;">9.762</td>
<td style="overflow:hidden;">N</td>
<td style="overflow:hidden;">N</td>
<td style="overflow:hidden;">ONE ORPHEUS 063W </td>
<td style="overflow:hidden;"> </td>
<td style="overflow:hidden;">TYO0535587</td>
<td style="overflow:hidden;">TYO210914625</td>
<td style="overflow:hidden;"> </td>
<td style="overflow:hidden;"><span class='ContainerMiscDetails'>TCLU2164763</span></td>
<td style="overflow:hidden;">MEDICAL EQUIPMENT --------------------------</td>
<td style="overflow:hidden;">Hapag Lloyd</td>
<td style="overflow:hidden;">73397670</td>
<td style="overflow:hidden;"><span style="color:#808080">TYO SIN</span></td>
<td style="overflow:hidden;" class="formatdate">7 Sep 21</td>
<td style="overflow:hidden;" class="formatdatetime">24 Sep 21 17:30</td>
<td style="overflow:hidden;" class="formatdatetime">29 Sep 21 20:00</td>
<td style="overflow:hidden;" class="formatdatetime">29 Sep 21 20:06</td>
<td style="overflow:hidden;" class="formatdatetime">6 Oct 21 08:00</td>
<td style="overflow:hidden;" class="formatdatetime">6 Oct 21 08:54</td>
<td style="overflow:hidden;" class="formatdatetime">10 Oct 21 08:00</td>
<td style="overflow:hidden;" class="formatdatetime">7 Oct 21 19:30</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<br/>
</div>
</html>
The output looks like this…
STDOUT message(s) from external script:
The command completed successfully.
[ 0
0 Agility Tracking report: JP to SG Ocean Shipme..., Unnamed: 0 \
0 Agility referenceShipperConsigneeOriginDestina...
1 Agility reference
2 TYO0535587
Confirmed (1) Agility reference \
0 Agility reference Shipper
1 Shipper Consignee
2 CANON MEDICAL SYSTEMS CORPORATION ABBOTT LABORATORIES (SINGAPORE)
Shipper Consignee Origin Destination \
0 Consignee Origin Destination Consignee reference
1 Origin Destination Consignee reference Shipper reference
2 Tokyo Singapore NaN 15X0ZYU
STDOUT message(s) from external script:
Consignee reference Shipper reference Mode ... \
0 Shipper reference Mode Pieces ...
1 Mode Pieces Containers ...
2 Sea FCL 4 1 ...
Unnamed: 57 Unnamed: 58 Unnamed: 59 Unnamed: 60 Unnamed: 61 \
0 73397670.0 TYO SIN 7 Sep 21 24 Sep 21 17:30 29 Sep 21 20:00
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
Unnamed: 62 Unnamed: 63 Unnamed: 64 Unnamed: 65 \
0 29 Sep 21 20:06 6 Oct 21 08:00 6 Oct 21 08:54 10 Oct 21 08:00
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
Unnamed: 66
0 7 Oct 21 19:30
1 NaN
2 NaN
STDOUT message(s) from external script:
[3 rows x 67 columns], Agility reference Shipper \
0 TYO0535587 CANON MEDICAL SYSTEMS CORPORATION
Consignee Origin Destination Consignee reference \
0 ABBOTT LABORATORIES (SINGAPORE) Tokyo Singapore NaN
Shipper reference Mode Pieces Containers ... \
0 15X0ZYU Sea FCL 4 1 ...
Booking reference Routing Booked Received Estimated departure \
0 73397670 TYO SIN 7 Sep 21 24 Sep 21 17:30 29 Sep 21 20:00
Departed Estimated arrival Arrived Estimated delivery \
0 29 Sep 21 20:06 6 Oct 21 08:00 6 Oct 21 08:54 10 Oct 21 08:00
Delivered
0 7 Oct 21 19:30
[1 rows x 33 columns]]
Completion time: 2022-01-26T16:30:41.8386531-06:00
Where do I start? Sorry for the format…I don’t normally post in forums.