I am not getting the desired result - please help!

Hello,

I have multiple JSON files and want to extract all of their keys into a CSV. My code is below, but unfortunately I am not getting the result I want. After execution I do get an output file, but it contains only the first level of keys from the JSON file, whereas I expect all keys.
Please help me find where the issue is.

import json  # For JSON loading
import csv  # For CSV dict writer


def get_leaves(item, key=None, key_prefix=""):
    """
    This function converts nested dictionary structure to flat
    """
    if isinstance(item, dict):
        leaves = {}
        # Iterate over the dictionary and call get_leaves recursively on each value until the leaf level is reached.
        for item_key in item.keys():
            # Leaves and parents (or other leaves) may share a key name, so the parent prefix is added to tell them apart.
            temp_key_prefix = (
                item_key if (key_prefix == "") else (key_prefix + "_" + str(item_key))
            )
            leaves.update(get_leaves(item[item_key], item_key, temp_key_prefix))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        elements = []
        # Iterate over the list: nested dicts and lists are flattened recursively,
        # while plain values are collected into the current key's list.
        for element in item:
            if isinstance(element, dict) or isinstance(element, list):
                leaves.update(get_leaves(element, key, key_prefix))
            else:
                elements.append(element)
        if len(elements) > 0:
            leaves[key] = elements
        return leaves
    else:
        return {key_prefix: item}


with open("4.json") as f_input, open("output.csv", "w", newline="") as f_output:
    json_data = json.load(f_input, strict=False)
    # First pass: collect the unique field names. The file is already loaded in memory,
    # so keeping every flattened entry in a list as well could exhaust memory.
    # Instead, gather the keys first, then flatten each entry again while writing it to the CSV.
    fieldnames = set()
    for entry in json_data:
        fieldnames.update(get_leaves(entry).keys())
    csv_output = csv.DictWriter(f_output, delimiter=";", fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)

Hello there,

I’ve edited your post for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.

See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (’).


I am sorry, I was not aware. Thank you.

Can you post an example of input and desired output? I’m honestly not sure what you’re trying to accomplish here.


This is an example where I have successfully extracted the first level of keys for “behavior”“apistats”, but unfortunately the rest of the keys contain nested dicts and lists.

Please let me know how I can share a sample of the JSON file here, so that you have something meaningful to test with.

Still unclear. What was the input? What are the numbers?

Can you just open the json and print it - then copy paste the output?
Also consider printing your result to test it, before exporting the csv.
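
A minimal sketch of that kind of check, assuming a small inline JSON string in place of the real file (the sample data and the names `sample` and `entries` are made up for illustration). It prints the top-level type first. If the file holds a single object rather than a list of objects, `for entry in json_data` only iterates over the top-level key strings, which might explain seeing just the first level of keys. That is a guess until the real input is posted.

```python
import json

# Hypothetical stand-in for the contents of the JSON file.
sample = '{"behavior": {"apistats": {"1234": {"NtOpenFile": 3}}}, "info": {"id": 1}}'

json_data = json.loads(sample, strict=False)

# Check what the top level actually is before looping over it.
print(type(json_data).__name__)

# Iterating over a dict yields its keys (plain strings), not the nested values.
print([entry for entry in json_data])

# If the top level is a single object, wrap it in a list so that a loop
# written for "a list of records" sees the whole object as one record.
entries = json_data if isinstance(json_data, list) else [json_data]
print(len(entries), type(entries[0]).__name__)
```

If the first line printed is `dict`, the loop in the original script is probably walking over key names rather than records.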

Personally, I just created a nested dict/list thing to test your function.
It looks fine to me - except it didn’t create the correct “dict_d” key but only “d”, so that’s a minor issue.
So maybe tell us what different output you would expect from this:

thing = {
    "dict": {
        "b": "c",
        "c": "b",
        "d": ["list1", "list2"]
    },
    "list": [
        {
            "l_dict": "e"
        },
        "list_entry"
    ]
}

get_leaves(thing)
# output
{'dict_b': 'c',
 'dict_c': 'b',
 'd': ['list1', 'list2'], 
 'list_l_dict': 'e',
 'list': ['list_entry']
 }
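
For the key issue, here is a rough sketch of one possible variant, not a drop-in fix for your data. The function name `flatten` and the choice to put a list index into the key are my own assumptions. It stores list values under the full prefix (so `dict_d` instead of `d`) and numbers dicts that sit inside a list, so that several dicts in the same list do not overwrite each other's keys.

```python
def flatten(item, key_prefix=""):
    """Flatten nested dicts and lists into a single-level dict of leaves."""
    leaves = {}
    if isinstance(item, dict):
        for key, value in item.items():
            child_prefix = str(key) if key_prefix == "" else f"{key_prefix}_{key}"
            leaves.update(flatten(value, child_prefix))
    elif isinstance(item, list):
        scalars = []
        for index, element in enumerate(item):
            if isinstance(element, (dict, list)):
                # Add the list index so sibling dicts get distinct keys.
                leaves.update(flatten(element, f"{key_prefix}_{index}"))
            else:
                scalars.append(element)
        if scalars:
            # Use the full prefix here, not only the last key.
            leaves[key_prefix] = scalars
    else:
        leaves[key_prefix] = item
    return leaves


thing = {
    "dict": {"b": "c", "c": "b", "d": ["list1", "list2"]},
    "list": [{"l_dict": "e"}, "list_entry"],
}

result = flatten(thing)
print(result)
```

With the same test input, the list under `"d"` ends up as `dict_d`, and the dict inside `"list"` becomes `list_0_l_dict`. Whether indexed keys are what you want in the CSV header depends on your data, so treat the naming as a placeholder.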
