@sarahcreighton

Numpy Slides

Overview

A few suggestions for improving the clarity of the lecture notes for python/01_materials/slides/10_numpy.ipynb. Feel free to use or discard as you see fit.

Content clarity

  1. NumPy documentation (line 182): it was not clear what the list was without clicking the link.
  2. Section: logic and filtering (line 1234): changed to clearer, more intuitive wording that matches the second code chunk.
To filter an array, we pass the mask (e.g., `tens % 3 == 0`) as the array index:
# also works 
mask = tens % 3 == 0
tens[mask]

Alternative wording:

To filter an array, we apply the Boolean expression (e.g., `tens % 3 == 0`) as a mask over the entire array by passing the mask as the array index (e.g., `tens[tens % 3 == 0]`).
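As a quick check of the wording above, here is a minimal sketch showing that the named mask and the inline expression select the same elements. The definition of `tens` is an assumption (the notebook's actual array may differ); only the masking pattern comes from the slides.

```python
import numpy as np

# Assumed stand-in for the notebook's `tens` array: multiples of 10 up to 100.
tens = np.arange(10, 101, 10)

# The Boolean expression yields one True/False value per element.
mask = tens % 3 == 0

# Passing the mask as the index keeps only the elements where it is True.
filtered_with_name = tens[mask]
filtered_inline = tens[tens % 3 == 0]

print(mask)
print(filtered_with_name)  # [30 60 90]
```

Both forms build the same Boolean array; naming it `mask` first just makes the intermediate step visible.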
  3. The intention of the paragraph and code below isn't entirely clear: are the code blocks demonstrating pandas or NumPy? Some inline code comments might be helpful here. Perhaps also a link to the pandas documentation?
In this case, `genfromtxt` returned a _structured array_, a different type of array than the ones we have seen 
so far. We can refer to fields by putting the field name as a string in square brackets, similar to referencing 
a dictionary key. However, we will soon see a data type in another package, `pandas`, that is even better 
suited to working with columns in tabular data like this.
np.median(housing_data['housing_median_age'])
housing_data['median_income'].mean()
  4. Possible typo in the last code block: either the function should be `.median()` or the column reference should be `['mean_income']`. Note that changing it to `.median()` throws an error (presumably because NumPy arrays have a `.mean()` method but no `.median()` method, rather than because pandas has not been imported). If `.mean()` was intended, it seems odd given that the line above uses `np.median()`.
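The `.median()` error appears to come from NumPy itself rather than from a missing pandas import: arrays expose `.mean()` as a method, while the median is only available through the function `np.median()`. A minimal sketch, using an invented two-column CSV in place of the real housing file (the values are made up for illustration):

```python
import io
import numpy as np

# Invented sample data standing in for the housing CSV used in the slides.
csv_text = (
    "housing_median_age,median_income\n"
    "41,2.0\n"
    "21,4.0\n"
    "52,9.0\n"
)

# names=True reads the header row, so genfromtxt returns a structured array.
housing_data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

income = housing_data["median_income"]  # a plain 1-D float array

mean_income = income.mean()         # the method exists on NumPy arrays
median_income = np.median(income)   # the median is only available as a function
has_median_method = hasattr(income, "median")

print(mean_income, median_income, has_median_method)
```

If the slide is meant to show both call styles (function versus method), a one-line comment saying so would remove the ambiguity.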

Minor typos

  1. changed "until" to "to" in a comment for consistency (line 242)
  2. housing_data['median_income'].mean() should either be .median() or ['mean_income'] (last code block)
housing_data['median_income'].mean()

Other notes

Modifications were made on a system running macOS, using Python 3.11.14 (python-env) as the kernel.

  • execution_count may need to be set back to null (line 97)
  • display_name may need to be set back to Python 3 (ipykernel) (line 1549)
  • version may need to be set back to 3.11.8 (line 1563)
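If it is easier than editing the JSON by hand, the three fields could be reset with a short script. This is only a sketch: the helper name `reset_notebook_metadata` is hypothetical, it clears `execution_count` on every code cell rather than only the one at line 97, and it leaves cell outputs untouched.

```python
import json


def reset_notebook_metadata(nb: dict) -> dict:
    """Reset execution counts and kernel metadata on a notebook dict (hypothetical helper)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # serialised as null in the .ipynb JSON

    metadata = nb.setdefault("metadata", {})
    metadata.setdefault("kernelspec", {})["display_name"] = "Python 3 (ipykernel)"
    metadata.setdefault("language_info", {})["version"] = "3.11.8"
    return nb


# Tiny in-memory notebook used only to demonstrate the helper.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 7, "outputs": [], "source": ["1 + 1"]},
    ],
    "metadata": {
        "kernelspec": {"display_name": "python-env", "name": "python3"},
        "language_info": {"name": "python", "version": "3.11.14"},
    },
}

cleaned = reset_notebook_metadata(example)
print(json.dumps(cleaned["metadata"], indent=1))
```

To apply it to the real file, the notebook would be read with `json.load` and written back with `json.dump`.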

Checklist

  • I can confirm that my changes are working as intended

@sarahcreighton
Author

Re: Point 4 - changing the column name from median_income to mean_income throws an error as the column does not exist. Having code that references a median column while using the mean function is still confusing, and it might be worth adding a note if it is intentional.
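For anyone checking which column names are actually available, the field names of a structured array are listed in `dtype.names`. A minimal sketch, again with invented data standing in for the real file:

```python
import io
import numpy as np

# Invented sample data; only the header names mirror the columns mentioned above.
csv_text = "housing_median_age,median_income\n41,2.0\n21,4.0\n"
housing_data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# dtype.names is a tuple of the field (column) names read from the header row.
available_fields = housing_data.dtype.names
print(available_fields)

has_median_income = "median_income" in available_fields
has_mean_income = "mean_income" in available_fields
print(has_median_income, has_mean_income)
```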
