
Conversation

@shayan74 commented Nov 7, 2025

Dear Jadi,

Thank you for creating such a wonderful machine learning course — I’ve been recommending it to Persian-speaking students who are eager to learn ML.

While reviewing the code, I noticed a small detail in the train/test split logic that might cause slight variations in the ratio. The current approach:

np.random.rand(len(df)) < 0.8

works well in general, but due to randomness it may yield training ratios anywhere between roughly 77% and 82%. This is perfectly acceptable for large datasets, but on smaller datasets it can lead to noticeable deviations and potential confusion for learners.
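For illustration, here is a quick simulation of that drift (a sketch, assuming a seeded NumPy generator and a hypothetical 50-row dataset, where the effect is most visible):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # hypothetical small dataset size

# Repeat the per-row Bernoulli split many times and record the training ratio
ratios = [(rng.random(n) < 0.8).mean() for _ in range(1000)]
print(f"training ratio ranged from {min(ratios):.0%} to {max(ratios):.0%}")
```

Each row is an independent 80% coin flip, so the realized ratio is binomial with a standard deviation of about sqrt(0.8 * 0.2 / n), which shrinks only as the dataset grows.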

To make the ratio more consistent, I suggest using:

def random_boolean_array(x, true_ratio=0.8):
    n_true = int(x * true_ratio)   # exact number of training rows
    n_false = x - n_true           # the rest go to the test set
    arr = np.array([True] * n_true + [False] * n_false)
    np.random.shuffle(arr)         # randomize which rows get which label
    return arr

This approach produces an exact 80/20 split (up to integer rounding); only the positions of the True values vary between runs.
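As a self-contained check (repeating the function above so the snippet runs on its own), the fixed-count shuffle always marks exactly int(n * 0.8) rows for training:

```python
import numpy as np

def random_boolean_array(x, true_ratio=0.8):
    n_true = int(x * true_ratio)   # exact number of training rows
    n_false = x - n_true           # the rest go to the test set
    arr = np.array([True] * n_true + [False] * n_false)
    np.random.shuffle(arr)         # randomize which rows get which label
    return arr

mask = random_boolean_array(50)
print(mask.sum())  # always 40: the 80/20 ratio is exact, only positions vary
```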

Or, to keep the inline coding style (and avoid a manual splitting function), we can use:

np.random.choice([True, False], size=len(df), p=[0.8, 0.2])
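A minimal usage sketch (with a hypothetical 10-row df standing in for the course's data). Worth noting: np.random.choice here still draws each row independently, so the exact counts can drift slightly between runs, just like the original one-liner; the gain is readability, since the probabilities are spelled out explicitly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # hypothetical stand-in for the course's df

# Boolean mask: each row independently has an 80% chance of landing in train
mask = np.random.choice([True, False], size=len(df), p=[0.8, 0.2])
train, test = df[mask], df[~mask]
print(len(train), len(test))  # always sums to 10; the exact split varies
```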

Thank you for your time and for the excellent educational content you share.

Damet Garm! (Persian for "well done!")
Shayan

@jadijadi (Owner) commented

Thanks for the contribution. Your logic is valid and raises an important point. But this is a basic educational lesson, and a one-line choice is good enough. I think changing it to something that needs a lot of explanation would frighten the students.

But it would be great if you could add this point as a comment below the actual code. It's fine to have a multi-line comment describing the issue with my simple one-liner and proposing the fix, as long as it's all commented out.
