Hello,
I am using Python with the jsonlines library to load JSON files. I convert each JSON dictionary element to a one-row frame with json_normalize and concatenate them all into a single DataFrame.
import jsonlines
import pandas as pd
import tqdm
from pandas import json_normalize

# First pass: count the records so tqdm can show progress
filecount = 0
with jsonlines.open("test.ndjson") as counter:
    for j in counter:
        filecount += 1
print(filecount)

# Second pass: flatten each JSON object, then concatenate everything
with tqdm.tqdm(jsonlines.open("/home/jovyan/data/onlyyouhotels.ndjson"),
               total=filecount, unit="json files") as reader:
    series = [json_normalize(j) for j in reader]
data_uonly = pd.concat(series)
data_uonly.to_pickle('raw_Uonly_pickle')
This takes a long time to load the data, and CPU usage spikes to 100% while it runs. Please suggest a way to use parallel processing in this code so that loading takes less time and less memory. Right now it takes 2 to 3 hours to load a 1.3 GB file.
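For context, something like the following chunked multiprocessing sketch is what I have in mind. I am not sure it is the right pattern; the chunk size and worker count are placeholders I picked arbitrarily, and the file path would be my own:

import json
import multiprocessing as mp

import pandas as pd

def normalize_chunk(lines):
    # Parse a batch of raw JSON lines and flatten them into one frame
    return pd.json_normalize([json.loads(line) for line in lines])

def load_ndjson_parallel(path, chunk_size=10_000, workers=4):
    # Read the raw lines once; batching into chunks keeps the
    # per-task overhead low compared to one task per record
    with open(path) as f:
        lines = f.readlines()
    chunks = [lines[i:i + chunk_size]
              for i in range(0, len(lines), chunk_size)]
    # Each worker normalizes one chunk; the parent concatenates the results
    with mp.Pool(workers) as pool:
        frames = pool.map(normalize_chunk, chunks)
    return pd.concat(frames, ignore_index=True)

Would calling load_ndjson_parallel("/home/jovyan/data/onlyyouhotels.ndjson") like this actually cut the load time, or does the pickling cost of sending DataFrames back to the parent process eat the gains?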