Hi Campers,
I’m currently working with a very complicated .csv file and I have a problem with extracting data. One cell could have the following content:
test_string2 = 'jam = 12, jom = kom, dim_cm = 33,5 x 33,5, mark_o = STRU L EDA P ... hjhkj kdk, hjd hdjk; d-hd-kj dhkhdkjdh '
I would like to extract the data as follows:
k = ['jam', 'jom', 'dim_cm', 'mark_o']
v = ['12', 'kom', '33,5 x 33,5', 'STRU L EDA P ... hjhkj kdk, hjd hdjk; d-hd-kj dhkhdkjdh ']
And the final result would be:
        0                                                  1
0     jam                                                 12
1     jom                                                kom
2  dim_cm                                        33,5 x 33,5
3  mark_o  STRU L EDA P...hjhkj kdk,hjd hdjk; d-hd-kj dhk...
At the moment I can only extract the list ‘k’ with a regex. Could you help me get the list ‘v’ as well?
My code so far:
>>> import pandas as pd
>>> import re
>>> test_string2 = 'jam=12,jom=kom,dim_cm=33,5 x 33,5,mark_o=STRU L EDA P...hjhkj kdk,hjd hdjk; d-hd-kj dhkhdkjdh '
>>> regex2 = r'(\w+?)='
>>> k = re.findall(regex2, test_string2)
>>> k
['jam', 'jom', 'dim_cm', 'mark_o']
>>> k = ['jam', 'jom', 'dim_cm', 'mark_o']
>>> # v is typed in by hand here, only to show the result I am after
>>> v = ['12', 'kom', '33,5 x 33,5', 'STRU L EDA P...hjhkj kdk,hjd hdjk; d-hd-kj dhkhdkjdh ']
>>> pd.DataFrame(list(map(list, zip(k,v))))
        0                                                  1
0     jam                                                 12
1     jom                                                kom
2  dim_cm                                        33,5 x 33,5
3  mark_o  STRU L EDA P...hjhkj kdk,hjd hdjk; d-hd-kj dhk...
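One possible way to get ‘v’ together with ‘k’ is sketched below. It is not a definitive solution. It assumes that a new key only ever starts at a comma followed by a word and "=", so commas inside a value (as in 33,5 x 33,5) are left alone. It also assumes that a cell contains no line breaks. The names PAIR_RE, split_pairs and pairs are made up for this illustration.

```python
import re

# Assumption: a value ends right before ", <word> =" or at the end of the cell.
# The lazy (.*?) grows until the lookahead sees the start of the next key.
PAIR_RE = re.compile(r'(\w+)\s*=\s*(.*?)\s*(?=,\s*\w+\s*=|$)')


def split_pairs(cell):
    """Return a list of (key, value) tuples found in one cell (hypothetical helper)."""
    return PAIR_RE.findall(cell)


test_string2 = 'jam=12,jom=kom,dim_cm=33,5 x 33,5,mark_o=STRU L EDA P...hjhkj kdk,hjd hdjk; d-hd-kj dhkhdkjdh '

pairs = split_pairs(test_string2)
k = [key for key, _ in pairs]
v = [value for _, value in pairs]

print(k)
print(v)
```

The pattern strips whitespace around each value, so the trailing space of the last value is dropped. If pandas is installed, pd.DataFrame(pairs) should give the same two-column table as above. A value that itself contains a comma followed by a word and "=" would be split in the wrong place, so this only fits data shaped like the sample.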
Thanks!