
How To Export Dataframe To_json In Append Mode - Python Pandas?

I have an existing JSON file in the format of a list of dicts:

$ cat output.json
[{'a':1, 'b':2}, {'a':2, 'b':3}]

And I have a DataFrame df = pd.DataFrame({'a':pd.Series([1,2], index=l

Solution 1:

No, you can't append to a JSON file without re-writing the whole file using pandas or the json module. You might be able to modify the file "manually" by opening it in read/write mode, seeking to the position just before the closing bracket, and inserting your data. I wouldn't recommend this, though. Better to just use a file format other than JSON if your file is going to be larger than your RAM.

This answer also might help. It doesn't create valid JSON files (instead, each line is a JSON string), but its goal is very similar to yours.
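A sketch of that line-per-record idea using the standard json module (append_records and the sample frames here are illustrative names, not from the linked answer): each row becomes one JSON object per line, so new rows can be appended without rewriting anything.

```python
import json
import os
import tempfile
import pandas as pd

def append_records(df, path):
    # One JSON object per line ("JSON lines"): the file as a whole is not
    # valid JSON, but appends never touch existing content.
    with open(path, "a") as fh:
        for rec in df.to_dict(orient="records"):
            fh.write(json.dumps(rec) + "\n")

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "output.jsonl")
append_records(pd.DataFrame({"a": [1, 2], "b": [2, 3]}), path)  # creates the file
append_records(pd.DataFrame({"a": [3], "b": [4]}), path)        # true append
df_all = pd.read_json(path, lines=True)  # 3 rows
```

pandas reads the result back directly with lines=True, as shown on the last line.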

Solution 2:

Maybe you need to think in terms of orient='records':

import pandas as pd

def to_json_append(df, file):
    '''
    Load the file with
    pd.read_json(file, orient='records', lines=True)
    '''
    df.to_json('tmp.json', orient='records', lines=True)
    # append
    with open('tmp.json', 'r') as f:
        k = f.read()
    with open(file, 'a') as f:
        f.write('\n')  # prepare the next data entry
        f.write(k)

df = pd.read_json('output.json')
# save again as lines
df.to_json('output.json', orient='records', lines=True)
# new data
df = pd.DataFrame({'a': pd.Series([1, 2], index=list('CD')),
                   'b': pd.Series([3, 4], index=list('CD'))})
# append:
to_json_append(df, 'output.json')

To load the full data:

pd.read_json('output.json', orient='records', lines=True)
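The tmp.json round-trip above can be avoided, since to_json with no path returns the JSON Lines text as a string. A minimal sketch of that variant (the newline handling is defensive, because older pandas versions do not emit a trailing newline with lines=True):

```python
import os
import tempfile
import pandas as pd

def to_json_append(df, path):
    # Serialize to a JSON Lines string in memory instead of via tmp.json.
    payload = df.to_json(orient="records", lines=True)
    if not payload.endswith("\n"):
        payload += "\n"          # ensure the next append starts on a fresh line
    with open(path, "a") as fh:  # "a" creates the file if it does not exist
        fh.write(payload)

tmp = tempfile.mkdtemp()
out = os.path.join(tmp, "output.json")
to_json_append(pd.DataFrame({"a": [1, 2], "b": [2, 3]}), out)
to_json_append(pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=list("CD")), out)
df_all = pd.read_json(out, orient="records", lines=True)  # 4 rows
```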

Solution 3:

I've solved it just by using built-in pandas.DataFrame methods. You need to keep performance in mind in the case of huge dataframes (there are ways to deal with it). Code:

import os
import pandas as pd

if os.path.isfile(dir_to_json_file):
    # if the file exists, open and read it
    df_read = pd.read_json(dir_to_json_file, orient='index')
    # add the data that you want to save
    df_read = pd.concat([df_read, df_to_append], ignore_index=True)
    # in case you add too much unnecessary data (if you need it)
    df_read.drop_duplicates(inplace=True)

    # save it back to the json file
    df_read.to_json(dir_to_json_file, orient='index')
else:
    df_to_append.to_json(dir_to_json_file, orient='index')
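Wrapped up as a self-contained sketch (upsert_json and the sample frames are names made up here, not from the answer). Note the performance caveat above concretely: the whole file is re-read and re-written on every call, so the cost grows with the file size, not with the appended data.

```python
import os
import tempfile
import pandas as pd

def upsert_json(df_to_append, path):
    # Read-modify-write: fine for small files, expensive for huge ones.
    if os.path.isfile(path):
        df_read = pd.read_json(path, orient="index")
        df_read = pd.concat([df_read, df_to_append], ignore_index=True)
        df_read.drop_duplicates(inplace=True)
        df_read.to_json(path, orient="index")
    else:
        df_to_append.to_json(path, orient="index")

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "data.json")
upsert_json(pd.DataFrame({"a": [1, 2], "b": [2, 3]}), path)  # creates the file
upsert_json(pd.DataFrame({"a": [5, 6], "b": [7, 8]}), path)  # merges, dedupes
df_all = pd.read_json(path, orient="index")  # 4 rows
```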

Solution 4:

Use case: writing a big amount of data to a JSON file with little memory:

Let's say we have 1,000 dataframes, each one like 1,000,000 lines of JSON. If each dataframe needs 100MB, the total file size would be 1,000 * 100MB = 100GB.

Answer:

  1. use buffer to store content of each dataframe
  2. use pandas to dump it to text
  3. use append mode to write text to the end of file
import io
import pandas as pd
from pathlib_mate import Path

n_lines_per_df = 10
n_df = 3
columns = ["id", "value"]
value = "alice@example.com"
f = Path(__file__).change(new_basename="big-json-file.json")
if not f.exists():
    for nth_df in range(n_df):
        data = list()
        for nth_line in range(nth_df * n_lines_per_df, (nth_df + 1) * n_lines_per_df):
            data.append((nth_line, value))
        df = pd.DataFrame(data, columns=columns)
        # buffer the JSON Lines text for this dataframe only
        buffer = io.StringIO()
        df.to_json(
            buffer,
            orient="records",
            lines=True,
        )
        # append mode: only one dataframe's worth of text is in memory at a time
        with open(f.abspath, "a") as file:
            file.write(buffer.getvalue())
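Reading such a file back under the same memory bound works too, via read_json's chunksize iterator (a sketch; the small generated file below stands in for the 100GB one):

```python
import os
import tempfile
import pandas as pd

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "big-json-file.json")

# Stand-in for the big file: 30 JSON Lines records.
pd.DataFrame(
    {"id": range(30), "value": ["alice@example.com"] * 30}
).to_json(path, orient="records", lines=True)

# chunksize (requires lines=True) yields DataFrames one at a time,
# so peak memory is one chunk rather than the whole file.
total = 0
for chunk in pd.read_json(path, lines=True, chunksize=10):
    total += len(chunk)
# total == 30
```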
