First attempt to read HTML into DF

I have vendors that send me reports in XLSX format. I easily convert them into data frames and write them to a csv for import into a database. However, one of them starting sending an HTML file with an xls ending. Using the pandas read_html I can read it into a list. This needs to be converting into a data frame. The list output looks very strange. I have no idea how to tackle it. I have the html fir one row shown below.

<html>
<head>
    <style type="text/css">
        
        br {mso-data-placement:same-cell;}
        
        .formattext { mso-number-format:\@;}
        .formatdate { mso-number-format:"d MMM yy"; text-align: right;}
        .formatdatetime { mso-number-format:"d MMM yy HH:mm"; text-align: right;}
    </style>
</head>

<div id="dvHeader">
    <table>
	    <tr>
		    <td rowspan="5"><img src="https://tracking.agility.com//image/brand_logos/AgilityEmailLogo.gif" alt="Agility" /></td>
	    </tr>
    </table>
    <table id="tbl2" border="0" cellpadding="3" cellspacing="0">
	<tr>
		<td colspan="10" style="font-size:12px;"><b>Agility Tracking report: JP to SG Ocean Shipments view Shipments for Abbott</b></td>
	</tr>
</table>

</div>
<div id="divJobList" class="jobList-container">
    

    
    <table border="0" cellpadding="3" cellspacing="0" width="100%">
	<thead id="thead1">
			<tr>
				<th id="th2" align="left"></th>

			</tr>
			<tr>
				<th id="th1" align="left">Confirmed (1)</th>

			</tr>
		</thead>
		<tbody>
			<tr>
				<td>
					<table border="1" cellpadding="0" cellspacing="0" class="printTable" >
						<thead id="thead2"><tr style="height:50px;">
	<th oncontextmenu="return false;" bgcolor="#dfdfdf">Agility reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Shipper</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Consignee</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Origin</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Destination</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Consignee reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Shipper reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Mode</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Pieces</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Containers</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Actual Wgt (kg)</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Chg Wgt (kg)</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Volume (m<sup>3</sup>)</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Collection on wheels</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Delivery on wheels</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Orig Flt / Vsl</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Arr Flt / Vsl</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">House Bill</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Master Bill</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Master shipment</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Container number</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Cargo description</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Carrier</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Booking reference</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Routing</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Booked</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Received</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Estimated departure</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Departed</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Estimated arrival</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Arrived</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Estimated delivery</th><th oncontextmenu="return false;" bgcolor="#dfdfdf">Delivered</th>
</tr></thead>
						
								<tbody>
							<tr class="normalRow">
	<td style="overflow:hidden;"><span style="color:#808080">TYO0535587</span></td>
	<td style="overflow:hidden;"><span style="color:#808080">CANON MEDICAL SYSTEMS CORPORATION</span></td>
	<td style="overflow:hidden;"><span style="color:#808080">ABBOTT LABORATORIES (SINGAPORE)</span></td>
	<td style="overflow:hidden;">Tokyo</td>
	<td style="overflow:hidden;">Singapore</td>
	<td style="overflow:hidden;">&nbsp;</td>
	<td style="overflow:hidden;">15X0ZYU&nbsp;</td>
	<td style="overflow:hidden;">Sea FCL</td>
	<td style="overflow:hidden;">4</td>
	<td style="overflow:hidden;">1</td>
	<td style="overflow:hidden;">1572</td>
	<td style="overflow:hidden;">9762</td>
	<td style="overflow:hidden;">9.762</td>
	<td style="overflow:hidden;">N</td>
	<td style="overflow:hidden;">N</td>
	<td style="overflow:hidden;">ONE ORPHEUS 063W </td>
	<td style="overflow:hidden;">  </td>
	<td style="overflow:hidden;">TYO0535587</td>
	<td style="overflow:hidden;">TYO210914625</td>
	<td style="overflow:hidden;">&nbsp;</td>
	<td style="overflow:hidden;"><span class='ContainerMiscDetails'>TCLU2164763</span></td>
	<td style="overflow:hidden;">MEDICAL EQUIPMENT --------------------------</td>
	<td style="overflow:hidden;">Hapag Lloyd</td>
	<td style="overflow:hidden;">73397670</td>
	<td style="overflow:hidden;"><span style="color:#808080">TYO SIN</span></td>
	<td style="overflow:hidden;" class="formatdate">7 Sep 21</td>
	<td style="overflow:hidden;" class="formatdatetime">24 Sep 21 17:30</td>
	<td style="overflow:hidden;" class="formatdatetime">29 Sep 21 20:00</td>
	<td style="overflow:hidden;" class="formatdatetime">29 Sep 21 20:06</td>
	<td style="overflow:hidden;" class="formatdatetime">6 Oct 21 08:00</td>
	<td style="overflow:hidden;" class="formatdatetime">6 Oct 21 08:54</td>
	<td style="overflow:hidden;" class="formatdatetime">10 Oct 21 08:00</td>
	<td style="overflow:hidden;" class="formatdatetime">7 Oct 21 19:30</td>
</tr>

								</tbody>
							
					</table>
				</td>
			</tr>
		</tbody>
	</table>
    
    <br/>
</div>







    




    

</html>

The output looks like this…

STDOUT message(s) from external script: 
The command completed successfully.

[                                                   0
0  Agility Tracking report: JP to SG Ocean Shipme...,                                           Unnamed: 0  \
0  Agility referenceShipperConsigneeOriginDestina...   
1                                  Agility reference   
2                                         TYO0535587   

                       Confirmed (1)                Agility reference  \
0                  Agility reference                          Shipper   
1                            Shipper                        Consignee   
2  CANON MEDICAL SYSTEMS CORPORATION  ABBOTT LABORATORIES (SINGAPORE)   

     Shipper    Consignee               Origin          Destination  \
0  Consignee       Origin          Destination  Consignee reference   
1     Origin  Destination  Consignee reference    Shipper reference   
2      Tokyo    Singapore                  NaN              15X0ZYU   


STDOUT message(s) from external script: 
  Consignee reference Shipper reference        Mode       ...        \
0   Shipper reference              Mode      Pieces       ...         
1                Mode            Pieces  Containers       ...         
2             Sea FCL                 4           1       ...         

  Unnamed: 57 Unnamed: 58 Unnamed: 59      Unnamed: 60      Unnamed: 61  \
0  73397670.0     TYO SIN    7 Sep 21  24 Sep 21 17:30  29 Sep 21 20:00   
1         NaN         NaN         NaN              NaN              NaN   
2         NaN         NaN         NaN              NaN              NaN   

       Unnamed: 62     Unnamed: 63     Unnamed: 64      Unnamed: 65  \
0  29 Sep 21 20:06  6 Oct 21 08:00  6 Oct 21 08:54  10 Oct 21 08:00   
1              NaN             NaN             NaN              NaN   
2              NaN             NaN             NaN              NaN   

      Unnamed: 66  
0  7 Oct 21 19:30  
1             NaN  
2             NaN  


STDOUT message(s) from external script: 
[3 rows x 67 columns],   Agility reference                            Shipper  \
0        TYO0535587  CANON MEDICAL SYSTEMS CORPORATION   

                         Consignee Origin Destination  Consignee reference  \
0  ABBOTT LABORATORIES (SINGAPORE)  Tokyo   Singapore                  NaN   

  Shipper reference     Mode  Pieces  Containers       ...        \
0           15X0ZYU  Sea FCL       4           1       ...         

   Booking reference  Routing    Booked         Received Estimated departure  \
0           73397670  TYO SIN  7 Sep 21  24 Sep 21 17:30     29 Sep 21 20:00   

          Departed  Estimated arrival         Arrived Estimated delivery  \
0  29 Sep 21 20:06     6 Oct 21 08:00  6 Oct 21 08:54    10 Oct 21 08:00   

        Delivered  
0  7 Oct 21 19:30  

[1 rows x 33 columns]]


Completion time: 2022-01-26T16:30:41.8386531-06:00

Where do I start? Sorry for the format…I don’t normally post in forums.

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.