graphpkg.static package

Module contents

graphpkg.static.grid_classification_boundary(models_list: list, data: Optional[numpy.ndarray] = None, size: int = 4, n_plot_cols: int = 3, figsize: tuple = (5, 5), canvas_details: int = 50, canvas_opacity: float = 0.4, canvas_palette='coolwarm') -> None

Plot a grid of classification boundaries for multiple ML models.

Only models whose prediction function returns a 1D array are supported.

Parameters
  • models_list (list) – List of dictionaries, one per model, each with a "name" entry and a "function" entry (the model's prediction callable), as in the example below.

  • data (np.ndarray, optional) – Source data, restricted to 3 columns in total: 2 features and 1 target. Defaults to None.

  • size (int, optional) – Size of canvas. Defaults to 4.

  • n_plot_cols (int, optional) – Number of plot columns. Defaults to 3.

  • figsize (tuple, optional) – Figure size. Defaults to (5, 5).

  • canvas_details (int, optional) – Level of detail of the canvas, i.e. how detailed the boundary should be. Defaults to 50.

  • canvas_opacity (float, optional) – Canvas transparency parameter. Defaults to 0.4.

  • canvas_palette (str, optional) – Palette from matplotlib. Defaults to ‘coolwarm’.

Raises

ValueError – If data does not have exactly 3 columns (2 features and 1 target).
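
The 3-column requirement can be checked before plotting. The following is a minimal sketch, assuming NumPy inputs; build_plot_data is a hypothetical helper that is not part of graphpkg, and the library's own validation may differ.

```python
import numpy as np


def build_plot_data(X, y):
    """Stack 2 feature columns and 1 target column into a (k, 3) array.

    Hypothetical helper for illustration; not part of graphpkg.
    """
    X = np.asarray(X)
    y = np.asarray(y).reshape(-1, 1)  # target as a single column
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError("expected exactly 2 feature columns")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    return np.hstack((X, y))


X = np.array([[0.1, 1.2], [0.4, -0.3], [1.5, 0.7]])
y = np.array([0, 1, 1])
data = build_plot_data(X, y)
print(data.shape)  # (3, 3)
```

The resulting array has the layout the data parameter describes: two feature columns followed by the target column.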

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.datasets import make_classification
>>> from graphpkg.static import grid_classification_boundary
>>> # Two-feature binary classification problem
>>> X, y = make_classification(n_samples=500, n_features=2, random_state=25,
...                            n_informative=1, n_classes=2, n_clusters_per_class=1,
...                            n_repeated=0, n_redundant=0)
>>> lr_model = LogisticRegression().fit(X, y)
>>> dt_model = DecisionTreeClassifier().fit(X, y)
>>> # One dictionary per model: a display name and its prediction function
>>> models_list = [{
...     "name": "Logistic Regression Classifier",
...     "function": lr_model.predict
... }, {
...     "name": "Decision Tree Classifier",
...     "function": dt_model.predict
... }]
>>> # data is the two feature columns followed by the target column
>>> grid_classification_boundary(models_list=models_list, data=np.hstack((X, y.reshape(-1, 1))),
...                              figsize=(7, 5), canvas_details=100)
>>> plt.show()

graphpkg.static.multi_distplots(df: pandas.core.frame.DataFrame, n_cols: int = 4, bins: int = 20, kde: bool = True, class_col: Optional[str] = None, legend: bool = True, legend_loc: str = 'best', figsize: Optional[tuple] = None, palette: str = 'dark', grid_flag: bool = True, xticks_rotation: int = 60) -> None

Multiple distribution plots from a pandas DataFrame.

Seaborn’s histplot draws each distribution; this function adds the ability to arrange multiple distributions in one grid.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • n_cols (int, optional) – Number of columns in the grid. Defaults to 4.

  • bins (int, optional) – Number of bins in each distribution. Defaults to 20.

  • kde (bool, optional) – Whether to draw the kernel density estimate (KDE) line. Defaults to True.

  • class_col (str, optional) – Name of the class column used to separate the distributions and label the legend. Defaults to None.

  • legend (bool, optional) – Whether to show a legend. Defaults to True.

  • legend_loc (str, optional) – Legend location; takes the same values as matplotlib.pyplot. Defaults to ‘best’.

  • figsize (tuple, optional) – Figure size, as in matplotlib.pyplot. Defaults to None.

  • palette (str, optional) – Seaborn color palette. Defaults to ‘dark’.

  • grid_flag (bool, optional) – Whether to show a grid. Defaults to True.

  • xticks_rotation (int, optional) – Rotation angle of the x tick labels. Defaults to 60.
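
How n_cols shapes the grid can be sketched as follows. The sketch assumes one subplot per plotted column, with rows added as needed; that is an assumption about the layout, not a description of the library's internals, and grid_shape is a hypothetical helper.

```python
import math


def grid_shape(n_plots, n_cols=4):
    """Return (n_rows, n_cols) for a grid holding n_plots subplots.

    Hypothetical helper for illustration; not part of graphpkg.
    """
    if n_plots < 1 or n_cols < 1:
        raise ValueError("n_plots and n_cols must be positive")
    n_cols = min(n_cols, n_plots)  # assumption: no empty columns
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


print(grid_shape(9, n_cols=2))  # (5, 2)
print(grid_shape(3, n_cols=4))  # (1, 3)
```

Under this assumption, a dataframe with 9 columns and n_cols=2 would need 5 rows, with the last row only partly filled.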

Examples

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import fetch_california_housing
>>> from graphpkg.static import multi_distplots
>>> # Build a dataframe of features plus the target column
>>> dataset = fetch_california_housing()
>>> df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
>>> df['target'] = dataset.target
>>> multi_distplots(df, n_cols=2)
>>> plt.show()

graphpkg.static.plot_boxed_timeseries(df: pandas.core.frame.DataFrame, ts_col: str, data_col: str, box: Optional[str] = 'MONTH', figsize: Optional[tuple] = None)

Plot time series data combined with box plots to show how the data varies within each time window.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • ts_col (str) – Name of the timestamp column.

  • data_col (str) – Name of the data column.

  • box (Optional[str], optional) – Time window (box) over which each box plot is drawn. Defaults to ‘MONTH’.

  • figsize (Optional[tuple], optional) – Figure size. Defaults to None.

Returns

Matplotlib figure and axes.

Return type

Figure, Axes
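
The idea of a time box can be illustrated without plotting: observations are grouped by window, and each group holds the values that one box would summarize. The sketch below uses only the standard library and hourly windows. It illustrates the grouping only and is not graphpkg's implementation; group_by_hour is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def group_by_hour(timestamps, values):
    """Group values by the hour their timestamp falls in."""
    boxes = defaultdict(list)
    for ts, value in zip(timestamps, values):
        # Truncate the timestamp to the start of its hour
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        boxes[window_start].append(value)
    return dict(boxes)


# Two hours of minute-level data
start = datetime(2018, 1, 1)
timestamps = [start + timedelta(minutes=i) for i in range(120)]
values = list(range(120))

boxes = group_by_hour(timestamps, values)
for window_start, window_values in sorted(boxes.items()):
    # One line per window: start time, sample count, median
    print(window_start, len(window_values), median(window_values))
```

Each printed line corresponds to one window; a box plot of window_values would show the spread that the median alone hides.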

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> from graphpkg.static import plot_boxed_timeseries
>>> size = 1000
>>> # Minute-frequency timestamps with normally distributed values
>>> df = pd.DataFrame({
...     "data": np.random.normal(size=(size,)) * 100,
...     "timestamps": pd.date_range(start='1/1/2018', periods=size, freq='min')
... })
>>> fig, ax = plot_boxed_timeseries(df, data_col='data', ts_col='timestamps', box='hour', figsize=(10, 5))
>>> plt.tight_layout()
>>> plt.show()

graphpkg.static.plot_classification_boundary(func: Callable, data: Optional[numpy.ndarray] = None, size: int = 4, n_plot_cols: int = 1, figsize: tuple = (5, 5), canvas_details: int = 50, canvas_opacity: float = 0.5, canvas_palette: str = 'coolwarm')

Plot classification model’s decision boundary.

Parameters
  • func (function) – Prediction function of the ML model, e.g. model.predict.

  • data (np.ndarray, optional) – Source data, restricted to 3 columns in total: 2 features and 1 target. Defaults to None.

  • size (int, optional) – Size of canvas. Defaults to 4.

  • n_plot_cols (int, optional) – Number of plot columns. Defaults to 1.

  • figsize (tuple, optional) – Matplotlib figure size. Defaults to (5, 5).

  • canvas_details (int, optional) – How detailed the boundary should be. Defaults to 50.

  • canvas_opacity (float, optional) – Canvas transparency parameter. Defaults to 0.5.

  • canvas_palette (str, optional) – palette of canvas. Defaults to ‘coolwarm’.

Raises

ValueError – If the shape of the input data is not (k, 3), where k is the number of rows.
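
A decision-boundary canvas is commonly built by evaluating the prediction function on a regular grid of points and coloring each cell by its predicted class. The sketch below shows that general technique with NumPy. It assumes size is the half-width of a square canvas centered at the origin and canvas_details is the number of grid points per axis; both readings are assumptions, and predict_on_canvas and toy_predict are hypothetical names, not graphpkg code.

```python
import numpy as np


def predict_on_canvas(func, size=4, canvas_details=50):
    """Evaluate func on a canvas_details x canvas_details grid.

    Assumes the canvas spans [-size, size] on both axes.
    """
    axis = np.linspace(-size, size, canvas_details)
    xx, yy = np.meshgrid(axis, axis)
    # One row per grid point, two feature columns
    points = np.column_stack((xx.ravel(), yy.ravel()))
    labels = np.asarray(func(points))
    if labels.ndim != 1:
        raise ValueError("func must return a 1D array of predictions")
    return xx, yy, labels.reshape(xx.shape)


def toy_predict(points):
    """Stand-in classifier: class 1 where x0 + x1 > 0."""
    return (points[:, 0] + points[:, 1] > 0).astype(int)


xx, yy, zz = predict_on_canvas(toy_predict, size=4, canvas_details=5)
print(zz.shape)  # (5, 5)
```

Arrays of this form could be drawn with matplotlib's contourf(xx, yy, zz, alpha=..., cmap=...), which is one plausible role for the canvas_opacity and canvas_palette parameters. The 1D check mirrors the restriction that the prediction function must return one label per point.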

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import make_classification
>>> from graphpkg.static import plot_classification_boundary
>>> X, y = make_classification(n_samples=500, n_features=2, random_state=25,
...                            n_informative=1, n_classes=2, n_clusters_per_class=1,
...                            n_repeated=0, n_redundant=0)
>>> model = LogisticRegression().fit(X, y)
>>> # data is the two feature columns followed by the target column
>>> plot_classification_boundary(func=model.predict, data=np.hstack((X, y.reshape(-1, 1))), canvas_details=100)
>>> plt.show()

graphpkg.static.plot_distribution(x: numpy.ndarray, kde: Optional[bool] = True, indicate_data: Optional[Union[list, numpy.ndarray]] = None, figsize: Optional[tuple] = None) -> None

Plot a distribution with additional information.

Combines a distribution plot and a box plot, drawn with matplotlib and seaborn.

Parameters
  • x (np.ndarray) – Input 1D array.

  • kde (Optional[bool], optional) – kde parameter from seaborn. Defaults to True.

  • indicate_data (Optional[Union[list, np.ndarray]], optional) – Data points to observe/indicate in the plot. Defaults to None.

  • figsize (Optional[tuple], optional) – figure size from matplotlib. Defaults to None.

Raises

AssertionError – If x is not a 1D array.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from graphpkg.static import plot_distribution
>>> x = np.random.normal(size=(200,))
>>> plot_distribution(x, indicate_data=[0.6])
>>> plt.show()