Description
Describe the bug
When creating a multi-group paired estimation plot from a dataset in which the groups have different numbers of paired values (so NaN values are present for some groups), DABEST only takes into account the smallest number of values shared by all groups. This does not happen when the same dataset is used in the online tool on estimationstats.com.
This problem is similar to closed issue #79.
To Reproduce
import numpy as np
import pandas as pd
import dabest
print("We're using DABEST v{}".format(dabest.__version__))
from scipy.stats import norm
np.random.seed(9999)
c1DF = pd.DataFrame({'Test 1_pre':norm.rvs(loc=3, scale=0.4, size=20)})
t1DF = pd.DataFrame({'Test 1_post': norm.rvs(loc=3.5, scale=0.5, size=20)})
t2DF = pd.DataFrame({'Test 2_pre': norm.rvs(loc=2.5, scale=0.6, size=10)})
t3DF = pd.DataFrame({'Test 2_post': norm.rvs(loc=3, scale=0.75, size=10)})
t4DF = pd.DataFrame({'Test 3_pre': norm.rvs(loc=3.5, scale=0.75, size=40)})
t5DF = pd.DataFrame({'Test 3_post': norm.rvs(loc=3.25, scale=0.4, size=40)})
df = pd.concat([c1DF,t1DF,t2DF,t3DF,t4DF,t5DF],axis=1)
df["ID"] = pd.Series(range(1, len(df)+1))
multi_paired = dabest.load(
    df,
    idx=(("Test 1_pre", "Test 1_post"),
         ("Test 2_pre", "Test 2_post"),
         ("Test 3_pre", "Test 3_post")),
    paired="baseline",
    id_col="ID",
)
multi_paired.mean_diff.plot();
Expected behavior
I would expect DABEST to use all available values for each group (20, 10 and 40 pairs, respectively). However, the Python version uses only 10 values for each group. The online version does not show this problem.
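To make the expected counts concrete, here is a minimal pandas-only sketch that compares two ways of handling the NaN values: dropping incomplete rows separately for each pair versus dropping every row that has a NaN in any column. I have not checked the DABEST source, so this is only an illustration of the two counting strategies, not a claim about how DABEST implements it. The helper `paired_counts` is a hypothetical name, and the data is random filler with the same group sizes as above.

```python
import numpy as np
import pandas as pd


def paired_counts(df, idx):
    """Return (complete pairs per (pre, post) tuple, rows complete across all columns).

    Hypothetical helper for illustration only; not part of DABEST.
    """
    # Drop NaN rows separately within each pair of columns.
    per_pair = {pair: len(df[list(pair)].dropna()) for pair in idx}
    # Drop any row that has a NaN in any of the plotted columns.
    all_cols = [col for pair in idx for col in pair]
    shared = len(df[all_cols].dropna())
    return per_pair, shared


rng = np.random.default_rng(9999)
sizes = {"Test 1": 20, "Test 2": 10, "Test 3": 40}

# Columns of unequal length are padded with NaN when concatenated side by side.
df = pd.concat(
    [
        pd.DataFrame({f"{name}_{suffix}": rng.normal(size=n)})
        for name, n in sizes.items()
        for suffix in ("pre", "post")
    ],
    axis=1,
)
idx = tuple((f"{name}_pre", f"{name}_post") for name in sizes)

per_pair, shared = paired_counts(df, idx)
print(per_pair)  # complete pairs per group: 20, 10 and 40
print(shared)    # rows complete across all six columns: 10
```

The per-pair counts (20, 10, 40) are what I would expect the plot to use; the shared count (10) matches what the Python version currently shows.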
Screenshots
Output of python version:
Output of web version:
Your package version (please complete the following information):
- dabest: v2025.10.20
- pandas: 2.2.3
- numpy: 2.1.3
- matplotlib: 3.10.5
- seaborn: 0.13.2
- scipy: 1.16.2
- python: 3.11.13