tools.helpers.advanced_utils package

Submodules

tools.helpers.advanced_utils.dataframe_utils module

Simple functions to manipulate dataframes with a user interface.

tools.helpers.advanced_utils.dataframe_utils.dataframe_column_selection(df, multiple_selection=True) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series]

Select certain columns of a dataframe

tools.helpers.advanced_utils.dataframe_utils.dataframe_describe(df, columns=None)

Apply describe method to certain columns of a dataframe

tools.helpers.advanced_utils.dataframe_utils.dataframe_groupby(df, synthesis_col=None, agg_dict=None, agg_methods=None)

Group by the columns selected by the user.

tools.helpers.advanced_utils.dataframe_utils.dataframe_merge(df1, df2, how_methods='outer', **kwargs)

Merge two dataframes

Parameters
  • df1 – left dataframe

  • df2 – right dataframe

  • how_methods – string or list of strings in ‘outer’, ‘inner’, ‘left’ or ‘right’

  • kwargs – keyword arguments for pandas.merge method

Returns

list of merged dataframes
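
The return value is a list because how_methods may hold several join types. The following is a minimal sketch of that idea, assuming the function simply calls pandas.merge once per requested method; merge_with_methods and the sample frames are hypothetical names, and the real function may also involve a user interface.

```python
import pandas as pd


def merge_with_methods(df1, df2, how_methods="outer", **kwargs):
    """Hypothetical sketch: one pandas.merge call per requested join type."""
    if isinstance(how_methods, str):
        how_methods = [how_methods]  # accept a single string as well as a list
    return [pd.merge(df1, df2, how=how, **kwargs) for how in how_methods]


left = pd.DataFrame({"key": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"key": [2, 3], "b": ["u", "v"]})

# 'on' is forwarded to pandas.merge through **kwargs
merged = merge_with_methods(left, right, how_methods=["inner", "outer"], on="key")
for how, frame in zip(["inner", "outer"], merged):
    print(how, len(frame))
```

Here the inner join keeps only the shared key, while the outer join keeps every key from both frames.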

tools.helpers.advanced_utils.dataframe_utils.dataframe_quick_analysis(df)

Filter certain columns and aggregate results.

tools.helpers.advanced_utils.dataframe_utils.dataframe_values_selection(df, columns=None)

Select the rows of a dataframe that match specific values in certain columns

tools.helpers.advanced_utils.date_utils module

Utils to manipulate dates:

  • reset the time of a date

  • add a period of time to a date

  • get the first date of a period

  • get the number of periods separating two dates

  • get the dates of a specified period

  • get a list of multiple periods

  • support for datetime objects, pandas timestamps, series and dataframes (most cases)

tools.helpers.advanced_utils.date_utils.add_month(date, number_of_months=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)

Add number_of_months months to date. number_of_months can be negative or positive.

tools.helpers.advanced_utils.date_utils.add_period(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], number_of_period=0, period_type=None, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, inplace=False, **kwargs)

Add a certain number of periods (year, month, week, day, hour, minute, second) to the input date.

>>> add_period = handle_datetime_pandas_objects(add_period)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 1, 5, 8, 2, 3)
>>> add_period(date, 2, period_type='week', reset_time=True)
Timestamp('2019-01-19 00:00:00')
>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 3, period_type='week', reset_time=True)
0   2019-01-26
dtype: datetime64[ns]
>>> date = pd.DataFrame(data=[datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 4, period_type='week', reset_time=True)
           0
0 2019-02-02
>>> add_period(None, 4, period_type='week', reset_time=True)
NaT
Parameters
  • date – initial date

  • number_of_period – number of periods to add

  • period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.

  • reset_time – if True, time is reset to 00:00:00.

  • output_type – type of the output / function to apply to the output. pd.Timestamp by default.

  • inplace – if date is a dataframe and inplace is True, convert columns inplace

  • kwargs – keyword arguments (unused)

Returns

date of type output_type
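
As a rough illustration of the behaviour shown in the doctest above, the sketch below adds fixed-length periods with the standard library only. It is an assumption about the general approach, not the library's implementation: add_fixed_period and _FIXED_PERIODS are hypothetical names, calendar periods (year, quarter, month) are left out, and None is returned for a missing date where the documented function returns NaT.

```python
import datetime

# Hypothetical lookup table: only period types with a fixed length are covered.
_FIXED_PERIODS = {
    "week": datetime.timedelta(weeks=1),
    "day": datetime.timedelta(days=1),
    "hour": datetime.timedelta(hours=1),
    "minute": datetime.timedelta(minutes=1),
    "second": datetime.timedelta(seconds=1),
}


def add_fixed_period(date, number_of_period=0, period_type="day", reset_time=False):
    """Add number_of_period periods to date; optionally reset the time to 00:00:00."""
    if date is None:
        return None  # the documented function returns NaT in this case
    result = date + number_of_period * _FIXED_PERIODS[period_type]
    if reset_time:
        result = result.replace(hour=0, minute=0, second=0, microsecond=0)
    return result


shifted = add_fixed_period(
    datetime.datetime(2019, 1, 5, 8, 2, 3), 2, period_type="week", reset_time=True
)
print(shifted)  # 2019-01-19 00:00:00
```

Note that adding weeks does not move the date to a Monday: the result keeps the weekday of the input date.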

tools.helpers.advanced_utils.date_utils.add_week(date, number_of_weeks=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)

Add number_of_weeks weeks to date. number_of_weeks can be negative or positive. Monday is the first day of week (following ISO 8601).

tools.helpers.advanced_utils.date_utils.datetime_delta(date_1: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], date_2, period_type='day', abs_val=False)

Get the number of periods separating date_1 from date_2. Monday is the first day of week (following ISO 8601).

>>> d1 = datetime.datetime(2018, 12, 30)
>>> d2 = datetime.datetime(2018, 12, 31)
>>> d3 = datetime.datetime(2019, 1, 1)
>>> d4 = datetime.datetime(2019, 1, 14)
>>> datetime_delta(d1, d2, period_type='week')  # difference of less than 7 days but different week
1
>>> datetime_delta(d2, d1, period_type='week')  # the contrary
-1
>>> datetime_delta(d2, d3, period_type='week')  # different year but same week (ISO 8601)
0
>>> datetime_delta(d3, d4, period_type='week')  # difference of 13 days, but 2 weeks of difference
2
>>> datetime_delta(d4, d3, period_type='week')  # the contrary
-2
>>> df1 = pd.DataFrame(columns=['a'], data=[[datetime.datetime(2019, 1, 8)]])
>>> df2 = pd.DataFrame(columns=['b'], data=[[datetime.datetime(2018, 3, 4)]])
>>> datetime_delta(df1, d1, period_type='month')
   2018-12-30 00:00:00 - a
0                       -1
>>> datetime_delta(df1['a'], df2['b'], period_type='year')
0   -1
dtype: int64
>>> datetime_delta(df1, df2, period_type='month')
   b - a
0    -10
Parameters
  • date_1 – first date

  • date_2 – second date. If date_2 >= date_1, the output is >= 0.

  • period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.

  • abs_val – if True, output is an absolute value

Returns

number of periods between date_1 and date_2 (date_2 minus date_1)
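
The doctests above suggest that the delta counts calendar boundaries crossed rather than elapsed time. The sketch below reproduces those results for scalar dates with the standard library; calendar_delta and week_start are hypothetical names, only 'week', 'month' and 'year' are handled, and the pandas Series and DataFrame support of the documented function is not covered.

```python
import datetime


def week_start(date):
    """Monday 00:00:00 of the ISO week that contains date."""
    monday = date - datetime.timedelta(days=date.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def calendar_delta(date_1, date_2, period_type="week", abs_val=False):
    """Number of calendar periods from date_1 to date_2 (date_2 minus date_1)."""
    if period_type == "week":
        # Both operands are Mondays, so the day difference is a multiple of 7.
        delta = (week_start(date_2) - week_start(date_1)).days // 7
    elif period_type == "month":
        delta = (date_2.year - date_1.year) * 12 + (date_2.month - date_1.month)
    elif period_type == "year":
        delta = date_2.year - date_1.year
    else:
        raise ValueError("period_type not handled in this sketch: %r" % period_type)
    return abs(delta) if abs_val else delta


d1 = datetime.datetime(2018, 12, 30)  # Sunday
d2 = datetime.datetime(2018, 12, 31)  # Monday of the following week
d3 = datetime.datetime(2019, 1, 1)    # same ISO week as d2
print(calendar_delta(d1, d2))  # 1
print(calendar_delta(d2, d3))  # 0
```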

tools.helpers.advanced_utils.date_utils.get_month_periods(date_start=None, date_end=None, nb_period=1)

Returns nb_period month periods between date_start and date_end. See get_periods.

tools.helpers.advanced_utils.date_utils.get_period(nb_period=1, period_type='week', date_start=None, date_end=None, first_day_of_period=True, reset_time=True, **askdate_kwargs)

>>> get_period(period_type='week', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-04-30 00:00:00'), Timestamp('2018-05-07 00:00:00'))
>>> get_period(period_type='month', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-05-01 00:00:00'), Timestamp('2018-06-01 00:00:00'))
>>> get_period(period_type='month', date_start=pd.Series([datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
>>> get_period(period_type='month', date_start=pd.DataFrame(data=[datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
Parameters
  • nb_period – number of periods

  • period_type – ‘week’ or ‘month’

  • date_start – first day of the period. If None, use date_end to compute date_start. If date_start and date_end are both None, return (None, None). If ‘ask’, ask the user.

  • date_end – last day of the period. If None, use date_start to compute date_end. If date_start and date_end are both None, return (None, None). If ‘ask’, ask the user. If ‘latest’, ‘today’ or ‘now’, use today's date as date_end.

  • first_day_of_period – if True, the first days of the periods are used for date_start and date_end

  • reset_time – if True, all times are reset to 00:00:00

  • askdate_kwargs – keyword arguments for simpledialog.askdate function (in case of use)

Returns

date_start, date_end

tools.helpers.advanced_utils.date_utils.get_periods(date_start=None, date_end=None, nb_period=None, period_type=None, reset_periods=True)

Get a list of nb_period periods from date_start and/or to date_end. Monday is the first day of week (following ISO 8601).

>>> from datetime import datetime as dt
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week')
[(Timestamp('2019-01-28 00:00:00'), Timestamp('2019-02-04 00:00:00'))]
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week', reset_periods=False)
[(Timestamp('2019-01-29 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2019, 2, 3), date_end = dt(2019, 2, 5))
[(Timestamp('2019-02-03 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month')  # None nb_period: auto
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00')), (Timestamp('2018-04-01 00:00:00'), Timestamp('2018-05-01 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month', nb_period=1)
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00'))]
Parameters
  • date_start – minimum date

  • date_end – maximum date

  • nb_period – number of periods. If None, nb_period is set to 1 when date_start or date_end is None; otherwise it is set to the datetime delta between date_start and date_end.

  • period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’. If None, date_start and date_end must be different from None (otherwise, an error is raised), nb_period and full_periods arguments are ignored and then [(date_start, date_end)] is returned.

  • reset_periods – if True, get first date of periods

Returns

list of tuples of date_start (to include), date_end (to exclude)
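
The returned tuples are half-open intervals: each date_start is included, each date_end is excluded and serves as the start of the next period. A minimal sketch of that layout for month periods, assuming both bounds are given and nb_period is left automatic, could look as follows; month_periods and next_month_start are hypothetical names, not part of the package.

```python
import datetime


def next_month_start(date):
    """First day (00:00:00) of the month following date."""
    if date.month == 12:
        return datetime.datetime(date.year + 1, 1, 1)
    return datetime.datetime(date.year, date.month + 1, 1)


def month_periods(date_start, date_end):
    """Consecutive (start included, end excluded) month periods covering [date_start, date_end)."""
    periods = []
    start = datetime.datetime(date_start.year, date_start.month, 1)
    while start < date_end:
        end = next_month_start(start)
        periods.append((start, end))
        start = end  # the excluded end is the included start of the next period
    return periods


periods = month_periods(datetime.datetime(2018, 3, 1), datetime.datetime(2018, 5, 1))
for start, end in periods:
    print(start.date(), "->", end.date())
```

Because the end of one tuple equals the start of the next, filtering rows with start <= date < end assigns every date to exactly one period.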

tools.helpers.advanced_utils.date_utils.get_quarter(date: datetime.datetime, inplace=False)

Returns the quarter of a date

# >>> get_quarter = handle_datetime_dataframe(get_quarter)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 4, 5, 8, 2, 3)
>>> get_quarter(date)
2

>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> get_quarter(date)
0    1
dtype: int64
>>> date = pd.DataFrame(data=[datetime.datetime(2019, 12, 5, 8, 2, 3)])
>>> get_quarter(date)
   0
0  4
>>> get_quarter(date, inplace=True)
>>> date
   0
0  4
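
For a single date, the quarter follows directly from the month number. The one-line sketch below is consistent with the doctest results above; quarter_of is a hypothetical name and the pandas handling of the documented function is not reproduced.

```python
import datetime


def quarter_of(date):
    """Quarter (1 to 4) of a date: months 1-3 give 1, 4-6 give 2, and so on."""
    return (date.month - 1) // 3 + 1


print(quarter_of(datetime.datetime(2019, 4, 5, 8, 2, 3)))  # 2
```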
tools.helpers.advanced_utils.date_utils.month_delta(date_1, date_2, abs_val=False)
tools.helpers.advanced_utils.date_utils.reset_month(date: datetime.datetime, month_offset: int = 0, reset_time=True) → datetime.datetime

Get the first day of the month of ‘date’ with an offset of ‘month_offset’ month(s).

>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_month(date_1)
Timestamp('2017-12-01 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_month(date_2, month_offset=1)
Timestamp('2018-01-01 00:00:00')
>>> reset_month(date_2, month_offset=1, reset_time=False)
Timestamp('2018-01-01 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_month(date_3, month_offset=-12)
>>> first_day
Timestamp('2016-12-01 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
Parameters
  • date – selected date in the month.

  • month_offset – add or remove a certain number of months. Default: 0.

  • reset_time – if True, the time (hours, minutes, seconds, milliseconds) is set to 00:00:00

Returns

first day in the month of ‘date’ (pd.Timestamp object).
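
The month offset can cross year boundaries in either direction. One way to handle this, shown here as a sketch under the assumption that plain datetime objects are enough, is to count months from year 0 and split the total back into a year and a month; first_day_of_month is a hypothetical name and the result is a datetime.datetime rather than a pandas Timestamp.

```python
import datetime


def first_day_of_month(date, month_offset=0, reset_time=True):
    """First day of the month of date, shifted by month_offset months."""
    # Count months since year 0 so that the offset can cross year boundaries.
    months = date.year * 12 + (date.month - 1) + month_offset
    year, month_index = divmod(months, 12)
    result = date.replace(year=year, month=month_index + 1, day=1)
    if reset_time:
        result = result.replace(hour=0, minute=0, second=0, microsecond=0)
    return result


date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
print(first_day_of_month(date_2, month_offset=1))                    # 2018-01-01 00:00:00
print(first_day_of_month(date_2, month_offset=1, reset_time=False))  # 2018-01-01 23:54:59.092584
```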

tools.helpers.advanced_utils.date_utils.reset_period(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], period_type: str, offset: int = 0, reset_time=True, inplace=False) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None]

Get the first day of the period of ‘date’ with an offset of ‘offset’ period(s). Monday is the first day of week (following ISO 8601).

Parameters
  • date – selected date in the period.

  • period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.

  • offset – add or remove a certain number of periods. Default: 0.

  • reset_time – if True, the time (hours, minutes, seconds, milliseconds) is set to 00:00:00

  • inplace – if date is a dataframe and inplace is True, convert columns inplace

Returns

first day in the period of ‘date’ (pd.Timestamp object).

tools.helpers.advanced_utils.date_utils.reset_timing(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime], inplace=False) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime]

Set time to 00:00:00

tools.helpers.advanced_utils.date_utils.reset_week(date: datetime.datetime, week_offset: int = 0, reset_time=True) → datetime.datetime

Get the first day of the week of ‘date’ with an offset of ‘week_offset’ week(s). Monday is the first day of week (following ISO 8601).

>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_week(date_1, week_offset=0)
Timestamp('2017-12-18 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_week(date_2, week_offset=+8)
Timestamp('2018-02-12 00:00:00')
>>> reset_week(date_2, week_offset=+8, reset_time=False)
Timestamp('2018-02-12 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_week(date_3, week_offset=-1)
>>> first_day
Timestamp('2017-12-11 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
Parameters
  • date – selected date in the week.

  • week_offset – add or remove a certain number of weeks. Default: 0.

  • reset_time – if True, the time (hours, minutes, seconds, milliseconds) is set to 00:00:00

Returns

first day in the week of ‘date’ (pd.Timestamp object).
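
With Monday as the first day of the week, the start of the week is found by subtracting the weekday index (Monday is 0 in datetime.weekday()). The sketch below mirrors the doctest results above with the standard library; first_day_of_week is a hypothetical name and the result is a datetime.datetime rather than a pandas Timestamp.

```python
import datetime


def first_day_of_week(date, week_offset=0, reset_time=True):
    """Monday of the week of date, shifted by week_offset weeks."""
    result = date - datetime.timedelta(days=date.weekday())  # back to Monday
    result += datetime.timedelta(weeks=week_offset)
    if reset_time:
        result = result.replace(hour=0, minute=0, second=0, microsecond=0)
    return result


date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)  # a Wednesday
print(first_day_of_week(date_2))                 # 2017-12-18 00:00:00
print(first_day_of_week(date_2, week_offset=8))  # 2018-02-12 00:00:00
```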

tools.helpers.advanced_utils.date_utils.week_delta(date_1, date_2, abs_val=False)

Get the number of weeks separating date_1 from date_2. Monday is the first day of week (following ISO 8601).

tools.helpers.advanced_utils.text_utils module

Module contents

Advanced utils that may need import of other modules.

Modules: date_utils, dataframe_utils

tools.helpers.advanced_utils.get_period(nb_period=1, period_type='week', date_start=None, date_end=None, first_day_of_period=True, reset_time=True, **askdate_kwargs)

>>> get_period(period_type='week', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-04-30 00:00:00'), Timestamp('2018-05-07 00:00:00'))
>>> get_period(period_type='month', date_start='ask', first_day_of_period=True, default_date=datetime.datetime(2018, 5, 2, 8, 2, 1), bypass_dialog=True)
(Timestamp('2018-05-01 00:00:00'), Timestamp('2018-06-01 00:00:00'))
>>> get_period(period_type='month', date_start=pd.Series([datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
>>> get_period(period_type='month', date_start=pd.DataFrame(data=[datetime.datetime(2019, 1, 7)]), first_day_of_period=True)
Parameters
  • nb_period – number of periods

  • period_type – ‘week’ or ‘month’

  • date_start – first day of the period. If None, use date_end to compute date_start. If date_start and date_end are both None, return (None, None). If ‘ask’, ask the user.

  • date_end – last day of the period. If None, use date_start to compute date_end. If date_start and date_end are both None, return (None, None). If ‘ask’, ask the user. If ‘latest’, ‘today’ or ‘now’, use today's date as date_end.

  • first_day_of_period – if True, the first days of the periods are used for date_start and date_end

  • reset_time – if True, all times are reset to 00:00:00

  • askdate_kwargs – keyword arguments for simpledialog.askdate function (in case of use)

Returns

date_start, date_end

tools.helpers.advanced_utils.get_periods(date_start=None, date_end=None, nb_period=None, period_type=None, reset_periods=True)

Get a list of nb_period periods from date_start and/or to date_end. Monday is the first day of week (following ISO 8601).

>>> from datetime import datetime as dt
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week')
[(Timestamp('2019-01-28 00:00:00'), Timestamp('2019-02-04 00:00:00'))]
>>> get_periods(date_end = dt(2019, 2, 5), nb_period=1, period_type='week', reset_periods=False)
[(Timestamp('2019-01-29 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2019, 2, 3), date_end = dt(2019, 2, 5))
[(Timestamp('2019-02-03 00:00:00'), Timestamp('2019-02-05 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month')  # None nb_period: auto
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00')), (Timestamp('2018-04-01 00:00:00'), Timestamp('2018-05-01 00:00:00'))]
>>> get_periods(date_start = dt(2018, 3, 1), date_end = dt(2018, 5, 1), period_type='month', nb_period=1)
[(Timestamp('2018-03-01 00:00:00'), Timestamp('2018-04-01 00:00:00'))]
Parameters
  • date_start – minimum date

  • date_end – maximum date

  • nb_period – number of periods. If None, nb_period is set to 1 when date_start or date_end is None; otherwise it is set to the datetime delta between date_start and date_end.

  • period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’. If None, date_start and date_end must be different from None (otherwise, an error is raised), nb_period and full_periods arguments are ignored and then [(date_start, date_end)] is returned.

  • reset_periods – if True, get first date of periods

Returns

list of tuples of date_start (to include), date_end (to exclude)

tools.helpers.advanced_utils.get_quarter(date: datetime.datetime, inplace=False)

Returns the quarter of a date

# >>> get_quarter = handle_datetime_dataframe(get_quarter)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 4, 5, 8, 2, 3)
>>> get_quarter(date)
2

>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> get_quarter(date)
0    1
dtype: int64
>>> date = pd.DataFrame(data=[datetime.datetime(2019, 12, 5, 8, 2, 3)])
>>> get_quarter(date)
   0
0  4
>>> get_quarter(date, inplace=True)
>>> date
   0
0  4
tools.helpers.advanced_utils.reset_week(date: datetime.datetime, week_offset: int = 0, reset_time=True) → datetime.datetime

Get the first day of the week of ‘date’ with an offset of ‘week_offset’ week(s). Monday is the first day of week (following ISO 8601).

>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_week(date_1, week_offset=0)
Timestamp('2017-12-18 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_week(date_2, week_offset=+8)
Timestamp('2018-02-12 00:00:00')
>>> reset_week(date_2, week_offset=+8, reset_time=False)
Timestamp('2018-02-12 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_week(date_3, week_offset=-1)
>>> first_day
Timestamp('2017-12-11 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
Parameters
  • date – selected date in the week.

  • week_offset – add or remove a certain number of weeks. Default: 0.

  • reset_time – if True, the time (hours, minutes, seconds, milliseconds) is set to 00:00:00

Returns

first day in the week of ‘date’ (pd.Timestamp object).

tools.helpers.advanced_utils.reset_month(date: datetime.datetime, month_offset: int = 0, reset_time=True) → datetime.datetime

Get the first day of the month of ‘date’ with an offset of ‘month_offset’ month(s).

>>> date_1 = datetime.datetime(2017, 12, 20)  # support of datetime.datetime
>>> reset_month(date_1)
Timestamp('2017-12-01 00:00:00')
>>> date_2 = datetime.datetime(2017, 12, 20, 23, 54, 59, 92584)
>>> reset_month(date_2, month_offset=1)
Timestamp('2018-01-01 00:00:00')
>>> reset_month(date_2, month_offset=1, reset_time=False)
Timestamp('2018-01-01 23:54:59.092584')
>>> date_3 = pd.Timestamp(2017, 12, 20)  # support of pandas TimeStamp
>>> first_day = reset_month(date_3, month_offset=-12)
>>> first_day
Timestamp('2016-12-01 00:00:00')
>>> isinstance(first_day, datetime.datetime)
True
>>> isinstance(first_day, pd.Timestamp)
True
Parameters
  • date – selected date in the month.

  • month_offset – add or remove a certain number of months. Default: 0.

  • reset_time – if True, the time (hours, minutes, seconds, milliseconds) is set to 00:00:00

Returns

first day in the month of ‘date’ (pd.Timestamp object).

tools.helpers.advanced_utils.add_period(date: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, datetime.datetime, None], number_of_period=0, period_type=None, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, inplace=False, **kwargs)

Add a certain number of periods (year, month, week, day, hour, minute, second) to the input date.

>>> add_period = handle_datetime_pandas_objects(add_period)  # apply decorator for doctest
>>> date = datetime.datetime(2019, 1, 5, 8, 2, 3)
>>> add_period(date, 2, period_type='week', reset_time=True)
Timestamp('2019-01-19 00:00:00')
>>> date = pd.Series([datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 3, period_type='week', reset_time=True)
0   2019-01-26
dtype: datetime64[ns]
>>> date = pd.DataFrame(data=[datetime.datetime(2019, 1, 5, 8, 2, 3)])
>>> add_period(date, 4, period_type='week', reset_time=True)
           0
0 2019-02-02
>>> add_period(None, 4, period_type='week', reset_time=True)
NaT
Parameters
  • date – initial date

  • number_of_period – number of periods to add

  • period_type – one of: ‘year’, ‘quarter’, ‘month’, ‘week’, ‘day’, ‘hour’, ‘minute’, ‘second’.

  • reset_time – if True, time is reset to 00:00:00.

  • output_type – type of the output / function to apply to the output. pd.Timestamp by default.

  • inplace – if date is a dataframe and inplace is True, convert columns inplace

  • kwargs – keyword arguments (unused)

Returns

date of type output_type

tools.helpers.advanced_utils.add_week(date, number_of_weeks=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)

Add number_of_weeks weeks to date. number_of_weeks can be negative or positive. Monday is the first day of week (following ISO 8601).

tools.helpers.advanced_utils.add_month(date, number_of_months=0, reset_time=False, output_type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>, **kwargs)

Add number_of_months months to date. number_of_months can be negative or positive.