tools.helpers.advanced_utils package¶
Submodules¶
tools.helpers.advanced_utils.dataframe_utils module¶
Simple functions to manipulate dataframes with a user interface.
-
tools.helpers.advanced_utils.dataframe_utils.
dataframe_column_selection
(df, multiple_selection=True) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series]¶ Select certain columns of a dataframe
-
tools.helpers.advanced_utils.dataframe_utils.
dataframe_describe
(df, columns=None)¶ Apply describe method to certain columns of a dataframe
-
tools.helpers.advanced_utils.dataframe_utils.
dataframe_groupby
(df, synthesis_col=None, agg_dict=None, agg_methods=None)¶ Group by the columns selected by the user.
-
tools.helpers.advanced_utils.dataframe_utils.
dataframe_merge
(df1, df2, how_methods='outer', **kwargs)¶ Merge two dataframes
- Parameters
df1 – left dataframe
df2 – right dataframe
how_methods – string or list of strings in ‘outer’, ‘inner’, ‘left’ or ‘right’
kwargs – keyword arguments for pandas.merge method
- Returns
list of merged dataframes
-
tools.helpers.advanced_utils.dataframe_utils.
dataframe_quick_analysis
(df)¶ Filter certain columns and aggregate results.
-
tools.helpers.advanced_utils.dataframe_utils.
dataframe_values_selection
(df, columns=None)¶ Select the rows of a dataframe that hold specific values in certain columns
tools.helpers.advanced_utils.date_utils module¶
Utils to manipulate dates:
- reset time of a date
- add a period of time to a date
- get the first date of a period
- get the number of periods separating two dates
- get dates of a specified period
- get a list of multiple periods
- support of datetime objects, pandas timestamps, series and dataframes (most cases)
-
tools.helpers.advanced_utils.date_utils.
add_month
(date, number_of_months=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)¶ Add number_of_months months to date. number_of_months can be negative or positive.
-
tools.helpers.advanced_utils.date_utils.
add_period
(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], number_of_period=0, period_type=None, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, inplace=False, **kwargs)¶ Add a certain number of periods (year, month, week, day, hour, minute, second) to the input date.
>>> add_period = handle_datetime_pandas_objects(add_period)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 1, 5, 8, 2, 3)
>>> add_period(date, 2, period_type='week', reset_time=True)
Timestamp('2019-01-19 00:00:00')

>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 3, period_type='week', reset_time=True)
0   2019-01-26
dtype: datetime64[ns]

>>> date = pd.DataFrame(data=[datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 4, period_type='week', reset_time=True)
           0
0 2019-02-02
>>> add_period(None, 4, period_type='week', reset_time=True)
NaT
- Parameters
date – initial date
number_of_period – number of periods to add
period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.
reset_time – if True, time is reset to 00:00:00.
output_type – type of the output / function to apply to the output. pd.Timestamp by default.
inplace – if date is a dataframe and inplace is True, convert columns inplace
kwargs – keyword arguments (unused)
- Returns
date of type output_type
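The week case above can be sketched with the standard library alone. This is only an illustration of the documented behaviour, not the package's implementation: `add_weeks_sketch` is a hypothetical name, and it returns a plain `datetime.datetime` rather than a `pd.Timestamp`.

```python
import datetime


def add_weeks_sketch(date, number_of_weeks=0, reset_time=False):
    """Hypothetical stand-in for add_period(..., period_type='week')."""
    # number_of_weeks may be negative or positive.
    shifted = date + datetime.timedelta(weeks=number_of_weeks)
    if reset_time:
        # Drop the time-of-day part, mirroring reset_time=True.
        shifted = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    return shifted


start = datetime.datetime(2019, 1, 5, 8, 2, 3)
shifted = add_weeks_sketch(start, 2, reset_time=True)
print(shifted)
```

Note that the date is shifted by whole weeks and is not snapped to a Monday, which agrees with the first doctest above.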
-
tools.helpers.advanced_utils.date_utils.
add_week
(date, number_of_weeks=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)¶ Add number_of_weeks weeks to date. number_of_weeks can be negative or positive. Monday is the first day of week (following ISO 8601).
-
tools.helpers.advanced_utils.date_utils.
datetime_delta
(date_1: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], date_2, period_type='day', abs_val=False)¶ Get the number of periods separating date_1 from date_2. Monday is the first day of week (following ISO 8601).
>>> d1 = datetime.datetime(2018, 12, 30)
>>> d2 = datetime.datetime(2018, 12, 31)
>>> d3 = datetime.datetime(2019, 1, 1)
>>> d4 = datetime.datetime(2019, 1, 14)
>>> datetime_delta(d1, d2, period_type='week')  # difference of less than 7 days but different week
1
>>> datetime_delta(d2, d1, period_type='week')  # the contrary
-1
>>> datetime_delta(d2, d3, period_type='week')  # different year but same week (ISO 8601)
0
>>> datetime_delta(d3, d4, period_type='week')  # difference of 13 days, but 2 weeks of difference
2
>>> datetime_delta(d4, d3, period_type='week')  # the contrary
-2
>>> df1 = pd.DataFrame(columns=['a'], data=[[datetime.datetime(2019, 1, 8)]])
>>> df2 = pd.DataFrame(columns=['b'], data=[[datetime.datetime(2018, 3, 4)]])
>>> datetime_delta(df1, d1, period_type='month')
   2018-12-30 00:00:00 - a
0                       -1
>>> datetime_delta(df1['a'], df2['b'], period_type='year')
0   -1
dtype: int64
>>> datetime_delta(df1, df2, period_type='month')
   b - a
0    -10
- Parameters
date_1 – first date
date_2 – second date. If date_2 >= date_1, the output is >= 0.
period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.
abs_val – if True, output is an absolute value
- Returns
number of periods from date_1 to date_2 (date_2 minus date_1)
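The week and month deltas count calendar boundaries crossed rather than elapsed days. A minimal sketch of that idea for scalar dates follows, using only the standard library. The helper names (`week_start`, `week_delta_sketch`, `month_delta_sketch`) are hypothetical and the package's own code may differ.

```python
import datetime


def week_start(date):
    """Monday 00:00:00 of the ISO week containing `date`."""
    monday = date - datetime.timedelta(days=date.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def week_delta_sketch(date_1, date_2, abs_val=False):
    """Number of week boundaries between date_1 and date_2 (date_2 minus date_1)."""
    # Both operands are Mondays at midnight, so the gap is a multiple of 7 days.
    delta = (week_start(date_2) - week_start(date_1)).days // 7
    return abs(delta) if abs_val else delta


def month_delta_sketch(date_1, date_2, abs_val=False):
    """Number of month boundaries between date_1 and date_2 (date_2 minus date_1)."""
    delta = (date_2.year - date_1.year) * 12 + (date_2.month - date_1.month)
    return abs(delta) if abs_val else delta


d1 = datetime.datetime(2018, 12, 30)  # Sunday
d2 = datetime.datetime(2018, 12, 31)  # Monday, start of the next ISO week
print(week_delta_sketch(d1, d2))
```

Comparing period starts instead of raw timestamps is what makes two dates one day apart count as one week apart when they straddle a Monday.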
-
tools.helpers.advanced_utils.date_utils.
get_month_periods
(date_start=None, date_end=None, nb_period=1)¶ Returns nb_period month periods between date_start and date_end. See get_periods.
-
tools.helpers.advanced_utils.date_utils.
get_period
(nb_period=1, period_type='week', date_start=None, date_end=None, first_day_of_period=True, reset_time=True, **askdate_kwargs)¶
>>> get_period(period_type='week', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-04-30 00:00:00'), Timestamp('2018-05-07 00:00:00'))
>>> get_period(period_type='month', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-05-01 00:00:00'), Timestamp('2018-06-01 00:00:00'))
>>> get_period(period_type='month', date_start=pd.Series([datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
>>> get_period(period_type='month', date_start=pd.DataFrame(data=[datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
- Parameters
nb_period – number of periods
period_type – week or month
date_start – first day of the period. If None, date_start is computed from date_end. If both date_start and date_end are None, (None, None) is returned. If ‘ask’, the user is asked for the date.
date_end – last day of the period. If None, date_end is computed from date_start. If both date_start and date_end are None, (None, None) is returned. If ‘ask’, the user is asked for the date. If ‘latest’, ‘today’ or ‘now’, today's date is used as date_end.
first_day_of_period – if True, the first days of the periods are used for date_start and date_end
reset_time – if True, all times are reset to 00:00:00
askdate_kwargs – keyword arguments for simpledialog.askdate function (in case of use)
- Returns
date_start, date_end
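The two doctests with a known start date can be reproduced with a small standard-library sketch. `period_bounds_sketch` is a hypothetical helper that assumes first_day_of_period=True, reset_time=True and nb_period=1. It skips the ‘ask’ dialog and the pandas inputs entirely.

```python
import datetime


def period_bounds_sketch(date, period_type="week"):
    """Return (start, end) of the week or month containing `date`."""
    midnight = date.replace(hour=0, minute=0, second=0, microsecond=0)
    if period_type == "week":
        # Monday is the first day of the week (ISO 8601).
        start = midnight - datetime.timedelta(days=midnight.weekday())
        end = start + datetime.timedelta(weeks=1)
    elif period_type == "month":
        start = midnight.replace(day=1)
        # Roll over to January of the next year after December.
        if start.month == 12:
            end = start.replace(year=start.year + 1, month=1)
        else:
            end = start.replace(month=start.month + 1)
    else:
        raise ValueError("period_type must be 'week' or 'month'")
    return start, end


reference = datetime.datetime(2018, 5, 2, 8, 2, 1)
week_bounds = period_bounds_sketch(reference, "week")
month_bounds = period_bounds_sketch(reference, "month")
print(week_bounds, month_bounds)
```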
-
tools.helpers.advanced_utils.date_utils.
get_periods
(date_start=None, date_end=None, nb_period=None, period_type=None, reset_periods=True)¶ Get a list of nb_period periods from date_start and/or to date_end. Monday is the first day of week (following ISO 8601).
>>> from datetime import datetime as dt
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week')
[(Timestamp('2019-01-28 00:00:00'), Timestamp('2019-02-04 00:00:00'))]
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week', reset_periods=False)
[(Timestamp('2019-01-29 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2019, 2, 3), date_end = dt(2019, 2, 5))
[(Timestamp('2019-02-03 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month')  # None nb_period: auto
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00')), (Timestamp('2018-04-01 00:00:00'), Timestamp('2018-05-01 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month', nb_period=1)
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00'))]
- Parameters
date_start – minimum date
date_end – maximum date
nb_period – number of periods. If None, nb_period is set to 1 when date_start or date_end is None; otherwise it is set to the datetime delta between date_start and date_end.
period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’. If None, date_start and date_end must both be provided (otherwise an error is raised); the nb_period and full_periods arguments are ignored and [(date_start, date_end)] is returned.
reset_periods – if True, get first date of periods
- Returns
list of (date_start, date_end) tuples, where date_start is included and date_end is excluded
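The first two doctests, where only date_end is given, can be sketched as follows. `week_periods_before_sketch` is a hypothetical helper restricted to week periods. The oldest-first ordering for nb_period greater than 1 is an assumption, since the examples above only show a single period in this case.

```python
import datetime


def week_periods_before_sketch(date_end, nb_period=1, reset_periods=True):
    """Return nb_period consecutive week periods ending at date_end."""
    end = date_end
    if reset_periods:
        # Snap to Monday 00:00:00 of the week containing date_end.
        end = end - datetime.timedelta(days=end.weekday())
        end = end.replace(hour=0, minute=0, second=0, microsecond=0)
    one_week = datetime.timedelta(weeks=1)
    # Oldest period first (assumed); each tuple is (start included, end excluded).
    return [
        (end - one_week * k, end - one_week * (k - 1))
        for k in range(nb_period, 0, -1)
    ]


periods = week_periods_before_sketch(datetime.datetime(2019, 2, 5))
print(periods)
```

Because each period's end is the next period's start, the half-open convention keeps consecutive periods from overlapping.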
-
tools.helpers.advanced_utils.date_utils.
get_quarter
(date: datetime.datetime, inplace=False)¶ Returns the quarter of a date
# >>> get_quarter = handle_datetime_dataframe(get_quarter)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 4, 5, 8, 2, 3)
>>> get_quarter(date)
2

>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> get_quarter(date)
0    1
dtype: int64

>>> date = pd.DataFrame(data=[datetime.datetime(2019, 12, 5, 8, 2, 3)])
>>> get_quarter(date)
   0
0  4
>>> get_quarter(date, inplace=True)
>>> date
   0
0  4
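For a scalar date, the quarter follows directly from the month number. The sketch below uses a hypothetical `quarter_sketch` helper and does not cover the Series, DataFrame or inplace cases shown above.

```python
import datetime


def quarter_sketch(date):
    """Quarter (1 to 4) of the calendar year containing `date`."""
    # Months 1-3 map to 1, 4-6 to 2, 7-9 to 3, 10-12 to 4.
    return (date.month - 1) // 3 + 1


quarter = quarter_sketch(datetime.datetime(2019, 4, 5, 8, 2, 3))
print(quarter)
```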
-
tools.helpers.advanced_utils.date_utils.
month_delta
(date_1, date_2, abs_val=False)¶
-
tools.helpers.advanced_utils.date_utils.
reset_month
(date: datetime.datetime, month_offset: int = 0, reset_time=True) → datetime.datetime¶ Get the first day of the month of ‘date’ with an offset of ‘month_offset’ month(s).
>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_month(date_1)
Timestamp('2017-12-01 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_month(date_2, month_offset=1)
Timestamp('2018-01-01 00:00:00')
>>> reset_month(date_2, month_offset=1, reset_time=False)
Timestamp('2018-01-01 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_month(date_3, month_offset=-12)
>>> first_day
Timestamp('2016-12-01 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
- Parameters
date – selected date in the month.
month_offset – add or remove a certain number of months. Default: 0.
reset_time – if True, the time (hours, minutes, seconds, microseconds) is set to 00:00:00
- Returns
first day in the month of ‘date’ (pd.Timestamp object).
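The month offset is easiest to handle by converting the year and month to a single month index, so that offsets crossing a year boundary need no special case. This is a sketch under that assumption, with a hypothetical `reset_month_sketch` helper that returns a `datetime.datetime` instead of a `pd.Timestamp`.

```python
import datetime


def reset_month_sketch(date, month_offset=0, reset_time=True):
    """First day of the month of `date`, shifted by month_offset months."""
    # Count months since year 0 so that the offset is a plain addition.
    index = date.year * 12 + (date.month - 1) + month_offset
    first_day = date.replace(year=index // 12, month=index % 12 + 1, day=1)
    if reset_time:
        first_day = first_day.replace(hour=0, minute=0, second=0, microsecond=0)
    return first_day


date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
next_month = reset_month_sketch(date_2, month_offset=1)
print(next_month)
```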
-
tools.helpers.advanced_utils.date_utils.
reset_period
(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], period_type: str, offset: int = 0, reset_time=True, inplace=False) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None]¶ Get the first day of the period of ‘date’ with an offset of ‘offset’ period(s). Monday is the first day of week (following ISO 8601).
- Parameters
date – selected date in the period.
period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.
offset – add or remove a certain number of periods. Default: 0.
reset_time – if True, the time (hours, minutes, seconds, microseconds) is set to 00:00:00
inplace – if date is a dataframe and inplace is True, convert columns inplace
- Returns
first day in the period of ‘date’ (pd.Timestamp object).
-
tools.helpers.advanced_utils.date_utils.
reset_timing
(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime], inplace=False) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime]¶ Set time to 00:00:00
-
tools.helpers.advanced_utils.date_utils.
reset_week
(date: datetime.datetime, week_offset: int = 0, reset_time=True) → datetime.datetime¶ Get the first day of the week of ‘date’ with an offset of ‘week_offset’ week(s). Monday is the first day of week (following ISO 8601).
>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_week(date_1, week_offset=0)
Timestamp('2017-12-18 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_week(date_2, week_offset=+8)
Timestamp('2018-02-12 00:00:00')
>>> reset_week(date_2, week_offset=+8, reset_time=False)
Timestamp('2018-02-12 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_week(date_3, week_offset=-1)
>>> first_day
Timestamp('2017-12-11 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
- Parameters
date – selected date in the week.
week_offset – add or remove a certain number of weeks. Default: 0.
reset_time – if True, the time (hours, minutes, seconds, microseconds) is set to 00:00:00
- Returns
first day in the week of ‘date’ (pd.Timestamp object).
-
tools.helpers.advanced_utils.date_utils.
week_delta
(date_1, date_2, abs_val=False)¶ Get the number of weeks separating date_1 from date_2. Monday is the first day of week (following ISO 8601).
tools.helpers.advanced_utils.text_utils module¶
Module contents¶
Advanced utils that may need import of other modules.
Modules: date_utils, dataframe_utils
-
tools.helpers.advanced_utils.
get_period
(nb_period=1, period_type='week', date_start=None, date_end=None, first_day_of_period=True, reset_time=True, **askdate_kwargs)¶
>>> get_period(period_type='week', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-04-30 00:00:00'), Timestamp('2018-05-07 00:00:00'))
>>> get_period(period_type='month', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-05-01 00:00:00'), Timestamp('2018-06-01 00:00:00'))
>>> get_period(period_type='month', date_start=pd.Series([datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
>>> get_period(period_type='month', date_start=pd.DataFrame(data=[datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
- Parameters
nb_period – number of periods
period_type – week or month
date_start – first day of the period. If None, date_start is computed from date_end. If both date_start and date_end are None, (None, None) is returned. If ‘ask’, the user is asked for the date.
date_end – last day of the period. If None, date_end is computed from date_start. If both date_start and date_end are None, (None, None) is returned. If ‘ask’, the user is asked for the date. If ‘latest’, ‘today’ or ‘now’, today's date is used as date_end.
first_day_of_period – if True, the first days of the periods are used for date_start and date_end
reset_time – if True, all times are reset to 00:00:00
askdate_kwargs – keyword arguments for simpledialog.askdate function (in case of use)
- Returns
date_start, date_end
-
tools.helpers.advanced_utils.
get_periods
(date_start=None, date_end=None, nb_period=None, period_type=None, reset_periods=True)¶ Get a list of nb_period periods from date_start and/or to date_end. Monday is the first day of week (following ISO 8601).
>>> from datetime import datetime as dt
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week')
[(Timestamp('2019-01-28 00:00:00'), Timestamp('2019-02-04 00:00:00'))]
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week', reset_periods=False)
[(Timestamp('2019-01-29 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2019, 2, 3), date_end = dt(2019, 2, 5))
[(Timestamp('2019-02-03 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month')  # None nb_period: auto
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00')), (Timestamp('2018-04-01 00:00:00'), Timestamp('2018-05-01 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month', nb_period=1)
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00'))]
- Parameters
date_start – minimum date
date_end – maximum date
nb_period – number of periods. If None, nb_period is set to 1 when date_start or date_end is None; otherwise it is set to the datetime delta between date_start and date_end.
period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’. If None, date_start and date_end must both be provided (otherwise an error is raised); the nb_period and full_periods arguments are ignored and [(date_start, date_end)] is returned.
reset_periods – if True, get first date of periods
- Returns
list of (date_start, date_end) tuples, where date_start is included and date_end is excluded
-
tools.helpers.advanced_utils.
get_quarter
(date: datetime.datetime, inplace=False)¶ Returns the quarter of a date
# >>> get_quarter = handle_datetime_dataframe(get_quarter)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 4, 5, 8, 2, 3)
>>> get_quarter(date)
2

>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> get_quarter(date)
0    1
dtype: int64

>>> date = pd.DataFrame(data=[datetime.datetime(2019, 12, 5, 8, 2, 3)])
>>> get_quarter(date)
   0
0  4
>>> get_quarter(date, inplace=True)
>>> date
   0
0  4
-
tools.helpers.advanced_utils.
reset_week
(date: datetime.datetime, week_offset: int = 0, reset_time=True) → datetime.datetime¶ Get the first day of the week of ‘date’ with an offset of ‘week_offset’ week(s). Monday is the first day of week (following ISO 8601).
>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_week(date_1, week_offset=0)
Timestamp('2017-12-18 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_week(date_2, week_offset=+8)
Timestamp('2018-02-12 00:00:00')
>>> reset_week(date_2, week_offset=+8, reset_time=False)
Timestamp('2018-02-12 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_week(date_3, week_offset=-1)
>>> first_day
Timestamp('2017-12-11 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
- Parameters
date – selected date in the week.
week_offset – add or remove a certain number of weeks. Default: 0.
reset_time – if True, the time (hours, minutes, seconds, microseconds) is set to 00:00:00
- Returns
first day in the week of ‘date’ (pd.Timestamp object).
-
tools.helpers.advanced_utils.
reset_month
(date: datetime.datetime, month_offset: int = 0, reset_time=True) → datetime.datetime¶ Get the first day of the month of ‘date’ with an offset of ‘month_offset’ month(s).
>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_month(date_1)
Timestamp('2017-12-01 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_month(date_2, month_offset=1)
Timestamp('2018-01-01 00:00:00')
>>> reset_month(date_2, month_offset=1, reset_time=False)
Timestamp('2018-01-01 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_month(date_3, month_offset=-12)
>>> first_day
Timestamp('2016-12-01 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
- Parameters
date – selected date in the month.
month_offset – add or remove a certain number of months. Default: 0.
reset_time – if True, the time (hours, minutes, seconds, microseconds) is set to 00:00:00
- Returns
first day in the month of ‘date’ (pd.Timestamp object).
-
tools.helpers.advanced_utils.
add_period
(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], number_of_period=0, period_type=None, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, inplace=False, **kwargs)¶ Add a certain number of periods (year, month, week, day, hour, minute, second) to the input date.
>>> add_period = handle_datetime_pandas_objects(add_period)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 1, 5, 8, 2, 3)
>>> add_period(date, 2, period_type='week', reset_time=True)
Timestamp('2019-01-19 00:00:00')

>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 3, period_type='week', reset_time=True)
0   2019-01-26
dtype: datetime64[ns]

>>> date = pd.DataFrame(data=[datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 4, period_type='week', reset_time=True)
           0
0 2019-02-02
>>> add_period(None, 4, period_type='week', reset_time=True)
NaT
- Parameters
date – initial date
number_of_period – number of periods to add
period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.
reset_time – if True, time is reset to 00:00:00.
output_type – type of the output / function to apply to the output. pd.Timestamp by default.
inplace – if date is a dataframe and inplace is True, convert columns inplace
kwargs – keyword arguments (unused)
- Returns
date of type output_type
-
tools.helpers.advanced_utils.
add_week
(date, number_of_weeks=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)¶ Add number_of_weeks weeks to date. number_of_weeks can be negative or positive. Monday is the first day of week (following ISO 8601).
-
tools.helpers.advanced_utils.
add_month
(date, number_of_months=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)¶ Add number_of_months months to date. number_of_months can be negative or positive.