Hello,
I am using Python with the jsonlines library to load JSON files. I convert each JSON dictionary element to a one-row frame with json_normalize and concatenate them all into a single DataFrame.
import jsonlines
import pandas as pd
import tqdm
from pandas import json_normalize

# First pass: count the records so tqdm can show progress
filecount = 0
with jsonlines.open("test.ndjson") as counter:
    for j in counter:
        filecount += 1
print(filecount)

# Second pass: flatten each JSON object, then concatenate everything
with tqdm.tqdm(jsonlines.open("/home/jovyan/data/onlyyouhotels.ndjson"),
               total=filecount, unit="json files") as reader:
    series = [json_normalize(j) for j in reader]
data_uonly = pd.concat(series)
data_uonly.to_pickle('raw_Uonly_pickle')
This takes a long time to load the data, and CPU usage spikes to 100% while it runs. Please suggest a way to use parallel processing in this code so that loading takes less time and less memory. Right now it takes 2 to 3 hours to load a 1.3 GB file.
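For context, something like the following chunked multiprocessing sketch is what I have in mind. I am not sure it is the right pattern; the chunk size and worker count are placeholders I picked arbitrarily, and the file path would be my own:

import json
import multiprocessing as mp

import pandas as pd

def normalize_chunk(lines):
    # Parse a batch of raw JSON lines and flatten them into one frame
    return pd.json_normalize([json.loads(line) for line in lines])

def load_ndjson_parallel(path, chunk_size=10_000, workers=4):
    # Read the raw lines once; batching into chunks keeps the
    # per-task overhead low compared to one task per record
    with open(path) as f:
        lines = f.readlines()
    chunks = [lines[i:i + chunk_size]
              for i in range(0, len(lines), chunk_size)]
    # Each worker normalizes one chunk; the parent concatenates the results
    with mp.Pool(workers) as pool:
        frames = pool.map(normalize_chunk, chunks)
    return pd.concat(frames, ignore_index=True)

Would calling load_ndjson_parallel("/home/jovyan/data/onlyyouhotels.ndjson") like this actually cut the load time, or does the pickling cost of sending DataFrames back to the parent process eat the gains?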