operations
convert_csv_to_pq(csv_file=None, pq_file=None, dedupe=False)
Read a CSV file into a DataFrame, then write the DataFrame to a Parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `csv_file` | `str \| Path` | Path to a CSV file to read from | `None` |
| `pq_file` | `str \| Path` | Path to a Parquet file to write to | `None` |
| `dedupe` | `bool` | Whether to run `.drop_duplicates()` on the DataFrame | `False` |
Raises
(Exception): If file cannot be saved, an `Exception` is raised instead of returning
a bool value
Returns
(bool): `True` if `csv_file` is converted to `pq_file` successfully
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
convert_pq_to_csv(pq_file=None, csv_file=None, dedupe=False)
Read a Parquet file into a DataFrame, then write the DataFrame to a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `pq_file` | `str \| Path` | Path to a Parquet file to read from | `None` |
| `csv_file` | `str \| Path` | Path to a CSV file to write to | `None` |
| `dedupe` | `bool` | Whether to run `.drop_duplicates()` on the DataFrame | `False` |
Raises
(Exception): If file cannot be saved, an `Exception` is raised instead of returning
a bool value
Returns
(bool): `True` if `pq_file` is converted to `csv_file` successfully
count_df_rows(df=None)
Return the number of rows in a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A Pandas `DataFrame` | `None` |
Returns
(int): Count of rows in a `DataFrame`
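As a minimal sketch of the behaviour described above, the row count can be taken from the length of the DataFrame's index. The name `count_rows_sketch` is hypothetical and is not the library's implementation; it only illustrates the idea with plain pandas.

```python
import pandas as pd


def count_rows_sketch(df: pd.DataFrame) -> int:
    """Hypothetical stand-in: return the number of rows in a DataFrame."""
    # The index has exactly one entry per row, so its length is the row count.
    return len(df.index)


df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
row_count = count_rows_sketch(df)
print(row_count)
```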
get_oldest_newest(df=None, date_col=None, filter_cols=None)
Get the oldest and newest rows in a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Pandas DataFrame to work on | `None` |
| `date_col` | `str` | Name of the column to sort by | `None` |
| `filter_cols` | `list[str]` | List of column names to return with the oldest/newest record | `None` |
Returns
(pandas.Series|pandas.DataFrame): A Pandas `DataFrame` or `Series` containing oldest & newest records
in the input `DataFrame`.
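One way to picture this is sorting by the date column and taking the first and last rows. The sketch below is an assumption about the general approach, not the library's code; `oldest_newest_sketch` is a hypothetical name, and it assumes a non-empty DataFrame whose `date_col` is sortable.

```python
import pandas as pd


def oldest_newest_sketch(df, date_col, filter_cols=None):
    """Hypothetical stand-in: return the oldest and newest rows by date_col."""
    # Sort ascending so the oldest row is first and the newest is last.
    ordered = df.sort_values(by=date_col)
    result = ordered.iloc[[0, -1]]
    # Optionally keep only the requested columns.
    if filter_cols:
        result = result[filter_cols]
    return result


df = pd.DataFrame(
    {
        "created": pd.to_datetime(["2023-05-01", "2021-01-15", "2024-03-09"]),
        "name": ["b", "a", "c"],
        "size": [10, 20, 30],
    }
)
oldest_newest = oldest_newest_sketch(df, "created", filter_cols=["created", "name"])
print(oldest_newest)
```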
load_csv(csv_file=None, delimiter=',')
Load a CSV file into a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `csv_file` | `str \| Path` | The path to a `.csv` file | `None` |
| `delimiter` | `str` | The delimiter symbol the `csv_file` uses | `','` |
Returns
(pandas.DataFrame): A Pandas `DataFrame` with data loaded from the `csv_file`
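The effect of the `delimiter` parameter is easiest to see with a file that does not use commas. The following is a minimal sketch using `pandas.read_csv` directly; `load_csv_sketch` is a hypothetical name, and the real function may do additional validation that is not shown here.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_sketch(csv_file, delimiter=","):
    """Hypothetical stand-in: read a delimited text file into a DataFrame."""
    return pd.read_csv(Path(csv_file), sep=delimiter)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small semicolon-delimited file to a temporary location.
    csv_path = Path(tmp_dir) / "example.csv"
    csv_path.write_text("id;name\n1;alpha\n2;beta\n")
    loaded = load_csv_sketch(csv_path, delimiter=";")

print(loaded)
```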
load_pq(pq_file=None)
Return a DataFrame from a previously saved .parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `pq_file` | `str \| Path` | Path to a `.parquet` file | `None` |
Returns
(pandas.DataFrame): A Pandas `DataFrame` loaded from a `.parquet` file
load_pqs_to_df(search_dir=None, filetype='.parquet')
Load data export files in `search_dir` into a list of DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `search_dir` | `str` | The directory to search for files in | `None` |
| `filetype` | `str` | The file extension to filter results by | `'.parquet'` |
Returns
(list[pandas.DataFrame]): A list of Pandas `DataFrame`s created from files in `search_dir`
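The two steps implied here, finding files by extension and then reading each one, might look like the sketch below. Both function names are hypothetical, the search is assumed to be non-recursive, and `pandas.read_parquet` needs a Parquet engine such as `pyarrow` or `fastparquet` installed. The demonstration only exercises file discovery, so it runs without one.

```python
import tempfile
from pathlib import Path

import pandas as pd


def find_files_sketch(search_dir, filetype=".parquet"):
    """Hypothetical helper: list files in search_dir with the given extension."""
    return sorted(
        p for p in Path(search_dir).iterdir() if p.is_file() and p.suffix == filetype
    )


def load_files_sketch(search_dir, filetype=".parquet"):
    """Hypothetical stand-in: read every matching file into a DataFrame."""
    # Requires a Parquet engine (pyarrow or fastparquet) at call time.
    return [pd.read_parquet(p) for p in find_files_sketch(search_dir, filetype)]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create empty placeholder files just to demonstrate extension filtering.
    for name in ("b.parquet", "a.parquet", "notes.txt"):
        (Path(tmp_dir) / name).touch()
    found_names = [p.name for p in find_files_sketch(tmp_dir)]

print(found_names)
```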
rename_df_cols(df=None, col_rename_map=None)
Return a DataFrame with columns renamed based on input col_rename_map.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A Pandas `DataFrame` | `None` |
| `col_rename_map` | `dict[str, str]` | A Python `dict` mapping existing column names to new column names | `None` |
Returns
(pandas.DataFrame): A renamed Pandas `DataFrame`.
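A rename map of this shape pairs each existing column name with its replacement. The sketch below shows the idea with `DataFrame.rename`; `rename_cols_sketch` is a hypothetical name, and whether the library returns a copy or renames in place is an assumption here (the sketch returns a copy).

```python
import pandas as pd


def rename_cols_sketch(df, col_rename_map):
    """Hypothetical stand-in: return a copy of df with columns renamed."""
    # Keys are existing column names, values are the new names.
    return df.rename(columns=col_rename_map)


df = pd.DataFrame({"fname": ["Ada"], "lname": ["Lovelace"]})
renamed = rename_cols_sketch(df, {"fname": "first_name", "lname": "last_name"})
print(list(renamed.columns))
```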
save_csv(df=None, csv_file=None, columns=None, dedupe=False)
Save DataFrame to a .csv file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
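| `df` | `DataFrame` | A Pandas `DataFrame` | `None` |
| `csv_file` | `str \| Path` | The path to a `.csv` file | `None` |
| `columns` | `list[str]` | A list of string values representing column names for the `.csv` file | `None` |
| `dedupe` | `bool` | If `True`, run `.drop_duplicates()` on the `DataFrame` before saving | `False` |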
Raises
(Exception): If file cannot be saved, an `Exception` is raised
Returns
(bool): `True` if `DataFrame` is saved to `csv_file` successfully
(bool): `False` if `DataFrame` is not saved to `csv_file` successfully
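To illustrate how `columns` and `dedupe` interact, here is a minimal sketch built on `DataFrame.to_csv`. The name `save_csv_sketch` is hypothetical, and several details are assumptions rather than the library's behaviour: the index is omitted from the output, missing parent directories are created, and errors simply propagate instead of being converted to a `False` return.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_csv_sketch(df, csv_file, columns=None, dedupe=False):
    """Hypothetical stand-in: write a DataFrame to a .csv file."""
    if dedupe:
        # Drop rows that are exact duplicates across all columns.
        df = df.drop_duplicates()
    csv_path = Path(csv_file)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the index is not written to the file.
    df.to_csv(csv_path, columns=columns, index=False)
    return True


df = pd.DataFrame({"id": [1, 1, 2], "name": ["a", "a", "b"], "extra": [0, 0, 0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "out" / "example.csv"
    saved = save_csv_sketch(df, out_path, columns=["id", "name"], dedupe=True)
    written = out_path.read_text()

print(written)
```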
save_pq(df=None, pq_file=None, dedupe=False)
Save DataFrame to a .parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
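| `df` | `DataFrame` | A Pandas `DataFrame` | `None` |
| `pq_file` | `str \| Path` | The path to a `.parquet` file | `None` |
| `dedupe` | `bool` | If `True`, run `.drop_duplicates()` on the `DataFrame` before saving | `False` |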
Raises:
| Type | Description |
|---|---|
| `Exception` | If file cannot be saved, an `Exception` is raised |
Returns:
| Type | Description |
|---|---|
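| `bool` | `True` if `DataFrame` is saved to `pq_file` successfully |
| `bool` | `False` if `DataFrame` is not saved to `pq_file` successfully |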