xorbits.pandas.json_normalize#

Normalize semi-structured JSON data into a flat table.

Parameters

data (dict or list of dicts) – Unserialized JSON objects.
record_path (str or list of str, default None) – Path in each object to list of records. If not passed, data will be assumed to be an array of records.
meta (list of paths (str or list of str), default None) – Fields to use as metadata for each record in resulting table.
meta_prefix (str, default None) – If True, prefix records with dotted (?) path, e.g. foo.bar.field if meta is [‘foo’, ‘bar’].
record_prefix (str, default None) – If True, prefix records with dotted (?) path, e.g. foo.bar.field if path to records is [‘foo’, ‘bar’].
errors ({'raise', 'ignore'}, default 'raise') –
Configures error handling.
- ’ignore’ : will ignore KeyError if keys listed in meta are not always present.
- ’raise’ : will raise KeyError if keys listed in meta are not always present.
sep (str, default '.') – Nested records will generate names separated by sep. e.g., for sep=’.’, {‘foo’: {‘bar’: 0}} -> foo.bar.
max_level (int, default None) –
Max number of levels(depth of dict) to normalize. if None, normalizes all levels.

New in version 0.25.0.

Returns

frame (DataFrame)
Normalize semi-structured JSON data into a flat table.

Examples

>>> data = [  
...     {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
...     {"name": {"given": "Mark", "family": "Regner"}},
...     {"id": 2, "name": "Faye Raker"},
... ]
>>> pd.json_normalize(data)  
    id name.first name.last name.given name.family        name
0  1.0     Coleen      Volk        NaN         NaN         NaN
1  NaN        NaN       NaN       Mark      Regner         NaN
2  2.0        NaN       NaN        NaN         NaN  Faye Raker

>>> data = [  
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=0)  
    id        name                        fitness
0  1.0   Cole Volk  {'height': 130, 'weight': 60}
1  NaN    Mark Reg  {'height': 130, 'weight': 60}
2  2.0  Faye Raker  {'height': 130, 'weight': 60}

Normalizes nested data up to level 1.

>>> data = [  
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=1)  
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60

>>> data = [  
...     {
...         "state": "Florida",
...         "shortname": "FL",
...         "info": {"governor": "Rick Scott"},
...         "counties": [
...             {"name": "Dade", "population": 12345},
...             {"name": "Broward", "population": 40000},
...             {"name": "Palm Beach", "population": 60000},
...         ],
...     },
...     {
...         "state": "Ohio",
...         "shortname": "OH",
...         "info": {"governor": "John Kasich"},
...         "counties": [
...             {"name": "Summit", "population": 1234},
...             {"name": "Cuyahoga", "population": 1337},
...         ],
...     },
... ]
>>> result = pd.json_normalize(  
...     data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result  
         name  population    state shortname info.governor
0        Dade       12345   Florida    FL    Rick Scott
1     Broward       40000   Florida    FL    Rick Scott
2  Palm Beach       60000   Florida    FL    Rick Scott
3      Summit        1234   Ohio       OH    John Kasich
4    Cuyahoga        1337   Ohio       OH    John Kasich

>>> data = {"A": [1, 2]}  
>>> pd.json_normalize(data, "A", record_prefix="Prefix.")  
    Prefix.0
0          1
1          2

Returns normalized data with columns prefixed with the given string.

This docstring was copied from pandas.