operations
convert_csv_to_pq(csv_file=None, pq_file=None, dedupe=False)
Read a CSV file into a DataFrame, then write the DataFrame to a Parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| csv_file | `str \| Path` | Path to a CSV file to read from | `None` |
| pq_file | `str \| Path` | Path to a Parquet file to write to | `None` |
| dedupe | `bool` | Whether to run `.drop_duplicates()` on the DataFrame | `False` |
Returns:
| Type | Description |
|---|---|
| `bool` | |
Raises:
| Type | Description |
|---|---|
| `Exception` | If the file cannot be saved, an `Exception` is raised |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
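The library's own source is not reproduced here. As a rough illustration of the read, optionally dedupe, then write flow described above, here is a minimal sketch built directly on pandas. The name `csv_to_parquet_sketch` is hypothetical, and skipping the index on write is an assumption, not documented behaviour of `convert_csv_to_pq`.

```python
from pathlib import Path

import pandas as pd


def csv_to_parquet_sketch(csv_file, pq_file, dedupe=False):
    """Hypothetical stand-in for convert_csv_to_pq, using plain pandas."""
    df = pd.read_csv(Path(csv_file))
    if dedupe:
        # Drop rows that exactly duplicate an earlier row.
        df = df.drop_duplicates()
    # to_parquet needs a Parquet engine (pyarrow or fastparquet) installed.
    # index=False is an assumption; the library may keep the index.
    df.to_parquet(Path(pq_file), index=False)
    return True
```

Calling `csv_to_parquet_sketch("in.csv", "out.parquet", dedupe=True)` would write a Parquet file without repeated rows, provided a Parquet engine is available.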
convert_pq_to_csv(pq_file=None, csv_file=None, dedupe=False)
Read a Parquet file into a DataFrame, then write the DataFrame to a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pq_file | `str \| Path` | Path to a Parquet file to read from | `None` |
| csv_file | `str \| Path` | Path to a CSV file to write to | `None` |
| dedupe | `bool` | Whether to run `.drop_duplicates()` on the DataFrame | `False` |
Returns:
| Type | Description |
|---|---|
| `bool` | |
Raises:
| Type | Description |
|---|---|
| `Exception` | If the file cannot be saved, an `Exception` is raised |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
count_df_rows(df=None)
Return the number of rows in a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | `DataFrame` | A Pandas `DataFrame` | `None` |
Returns:
| Type | Description |
|---|---|
| `int` | Count of rows in a `DataFrame` |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
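The implementation is not shown on this page. For orientation, one common way to count rows in pandas looks like the sketch below; `count_rows_sketch` is a hypothetical name, and the library may compute the count differently.

```python
import pandas as pd


def count_rows_sketch(df):
    """Hypothetical equivalent of count_df_rows."""
    # The length of the index is the number of rows.
    return len(df.index)


df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
row_count = count_rows_sketch(df)
```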
get_oldest_newest(df=None, date_col=None, filter_cols=None)
Get the oldest and newest rows in a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | `DataFrame` | Pandas DataFrame to work on | `None` |
| date_col | `str` | Name of the column to sort by | `None` |
| filter_cols | `list[str]` | List of column names to return with the oldest/newest record | `None` |
Returns:
| Type | Description |
|---|---|
| `Series \| DataFrame` | A Pandas `Series` or `DataFrame` holding the oldest and newest rows in the input `DataFrame` |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
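The source is not reproduced here, so the following is only a sketch of one way to find the oldest and newest rows: sort by the date column and take the first and last row. The function name `oldest_newest_sketch` and the tuple return shape are assumptions made for illustration; the documented function returns a `Series` or `DataFrame`.

```python
import pandas as pd


def oldest_newest_sketch(df, date_col, filter_cols=None):
    """Hypothetical helper: return (oldest_row, newest_row) as Series."""
    ordered = df.sort_values(by=date_col)
    oldest = ordered.iloc[0]   # earliest date after sorting
    newest = ordered.iloc[-1]  # latest date after sorting
    if filter_cols is not None:
        # Keep only the requested columns of each row.
        oldest = oldest[filter_cols]
        newest = newest[filter_cols]
    return oldest, newest


df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "created": pd.to_datetime(["2024-03-01", "2023-01-15", "2024-06-30"]),
    }
)
oldest, newest = oldest_newest_sketch(df, date_col="created", filter_cols=["id"])
```

Sorting requires the date column to hold comparable values, so converting it with `pd.to_datetime` first, as in the example, avoids ordering dates as plain strings.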
load_csv(csv_file=None, delimiter=',')
Load a CSV file into a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| csv_file | `str \| Path` | The path to a `.csv` file | `None` |
| delimiter | `str` | The delimiter symbol the `csv_file` uses | `','` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A Pandas `DataFrame` |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
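To show what the `delimiter` parameter is for, the sketch below reads semicolon-separated text with `pandas.read_csv`. It assumes `load_csv` forwards `delimiter` to pandas in a similar way, which this page does not confirm; an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Semicolon-separated data; a real call would pass a file path instead.
raw = io.StringIO("id;name\n1;alpha\n2;beta\n")

# Without delimiter=";" pandas would read each line as a single column.
df = pd.read_csv(raw, delimiter=";")
```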
load_pq(pq_file=None)
Return a DataFrame from a previously saved .parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pq_file | `str \| Path` | Path to a `.parquet` file | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A Pandas `DataFrame` |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
load_pqs_to_df(search_dir=None, filetype='.parquet')
Load the data export files in search_dir into a list of DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| search_dir | `str` | The directory to search for files in | `None` |
| filetype | `str` | The file extension to filter results by | `'.parquet'` |
Returns:
| Type | Description |
|---|---|
| `list[DataFrame]` | A list of Pandas `DataFrame` objects |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
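The source is not shown on this page. A minimal sketch of the described behaviour, assuming a non-recursive scan of `search_dir` and a plain suffix match on `filetype`, could look like this. `load_dir_sketch` is a hypothetical name, and the library may search or filter differently.

```python
from pathlib import Path

import pandas as pd


def load_dir_sketch(search_dir, filetype=".parquet"):
    """Hypothetical helper: read every matching file into a DataFrame."""
    files = sorted(
        p
        for p in Path(search_dir).iterdir()
        if p.is_file() and p.suffix == filetype
    )
    # read_parquet needs a Parquet engine (pyarrow or fastparquet) installed.
    return [pd.read_parquet(p) for p in files]
```

A directory with no matching files yields an empty list, so callers may want to check for that before concatenating the results.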
rename_df_cols(df=None, col_rename_map=None)
Return a DataFrame with columns renamed based on input col_rename_map.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | `DataFrame` | A Pandas `DataFrame` | `None` |
| col_rename_map | `dict[str, str]` | A Python `dict` | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A renamed Pandas `DataFrame` |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
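As an illustration of a rename map, the sketch below uses `DataFrame.rename` directly. It assumes `col_rename_map` maps existing column names to new ones, as pandas expects; the column names are made up for the example, and the library's own implementation is not shown here.

```python
import pandas as pd

df = pd.DataFrame({"fname": ["Ada"], "lname": ["Lovelace"]})

# Keys are existing column names, values are the replacements.
col_rename_map = {"fname": "first_name", "lname": "last_name"}

# rename returns a new DataFrame and leaves df untouched.
renamed = df.rename(columns=col_rename_map)
```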
save_csv(df=None, csv_file=None, columns=None, dedupe=False)
Save DataFrame to a .csv file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | `DataFrame` | A Pandas `DataFrame` | `None` |
| csv_file | `str \| Path` | The path to a `.csv` file | `None` |
| columns | `list[str]` | A list of string values representing column names | `None` |
| dedupe | `bool` | If `True`, run `.drop_duplicates()` on the DataFrame | `False` |
Returns:
| Type | Description |
|---|---|
| `bool` | |
| `bool` | |
Raises:
| Type | Description |
|---|---|
| `Exception` | If the file cannot be saved, an `Exception` is raised |
Source code in src/red_utils/ext/dataframe_utils/pandas_utils/operations.py
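The library's source is not reproduced here. To make the interaction of `columns` and `dedupe` concrete, here is a minimal sketch on top of `DataFrame.to_csv`. `save_csv_sketch` is a hypothetical name; deduplicating before selecting columns and omitting the index are assumptions, not documented behaviour of `save_csv`.

```python
import io

import pandas as pd


def save_csv_sketch(df, target, columns=None, dedupe=False):
    """Hypothetical stand-in for save_csv; target is a path or buffer."""
    if dedupe:
        # Remove rows that exactly duplicate an earlier row.
        df = df.drop_duplicates()
    # columns=None writes every column; a list writes only those named.
    df.to_csv(target, columns=columns, index=False)
    return True


df = pd.DataFrame(
    {"id": [1, 1, 2], "name": ["a", "a", "b"], "note": ["x", "x", "y"]}
)
buffer = io.StringIO()
save_csv_sketch(df, buffer, columns=["id", "name"], dedupe=True)
csv_text = buffer.getvalue()
```

Here the duplicate row is dropped and only `id` and `name` are written. Note that with this ordering, rows that differ only in an unselected column would both be kept.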
save_pq(df=None, pq_file=None, dedupe=False)
Save DataFrame to a .parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | `DataFrame` | A Pandas `DataFrame` | `None` |
| pq_file | `str \| Path` | The path to a `.parquet` file | `None` |
| dedupe | `bool` | If `True`, run `.drop_duplicates()` on the DataFrame | `False` |
Returns:
| Type | Description |
|---|---|
| `bool` | |
| `bool` | |
Raises:
| Type | Description |
|---|---|
| `Exception` | If the file cannot be saved, an `Exception` is raised |