15 changes: 8 additions & 7 deletions learning_curve.py
@@ -7,20 +7,21 @@
from sklearn.linear_model import LogisticRegression

data = load_digits()
print(data.DESCR)
-num_trials = 10
+num_trials = 100
train_percentages = range(5,95,5)
test_accuracies = numpy.zeros(len(train_percentages))

# train a model with training percentages between 5 and 90 (see train_percentages) and evaluate
# the resultant accuracy.
# You should repeat each training percentage num_trials times to smooth out variability
# for consistency with the previous example use model = LogisticRegression(C=10**-10) for your learner
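for i in range(len(train_percentages)):
    for j in range(num_trials):
        # train_size must be a fraction here; an int is read as an absolute sample count
        X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, train_size=train_percentages[i]/100.0)
        model = LogisticRegression(C=5**-5)
        model.fit(X_train, y_train)
        test_accuracies[i] = test_accuracies[i]+model.score(X_test,y_test)
    # turn the summed scores into a mean accuracy over the trials
    test_accuracies[i] = test_accuracies[i]/num_trials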

# TODO: your code here

fig = plt.figure()
plt.plot(train_percentages, test_accuracies)
plt.xlabel('Percentage of Data Used for Training')
plt.ylabel('Accuracy on Test Set')
plt.show()

4 changes: 4 additions & 0 deletions questions.txt
@@ -0,0 +1,4 @@
1. The general trend is upward.
2. The beginning and end of the curve appear to be noisiest. This is likely because both training and testing need a certain amount of data to work well, so when almost all of the data goes to one or the other, the accuracy estimate becomes unreliable.
3. The curve was fairly smooth at around 5000 trials. With a large number of trials, changing the step size of the training percentages from 5 to 1 smoothed the curve further.
4. Higher C values (weaker regularization) emphasize noise, and lower C values (stronger regularization) smooth the curve.
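
Answer 3 can be illustrated with a small simulation. This is only a sketch under stated assumptions: each trial's score is modeled as a Gaussian draw around a fixed accuracy of 0.9 with standard deviation 0.05 (both numbers are made up, and the draw is a stand-in for `model.score`, not the real classifier). The helper names `averaged_score` and `spread_of_average` are hypothetical. It shows that the spread of the averaged score shrinks as `num_trials` grows, which is why more trials give a smoother curve.

```python
import random
import statistics


def averaged_score(true_accuracy, noise_sd, num_trials, rng):
    """Mean of num_trials noisy scores (a made-up stand-in for model.score)."""
    scores = [rng.gauss(true_accuracy, noise_sd) for _ in range(num_trials)]
    return sum(scores) / num_trials


def spread_of_average(num_trials, repeats=200, seed=0):
    """Standard deviation of the averaged score across repeated experiments."""
    rng = random.Random(seed)
    # Assumed values: true accuracy 0.9, per-trial noise 0.05.
    means = [averaged_score(0.9, 0.05, num_trials, rng) for _ in range(repeats)]
    return statistics.stdev(means)


spread_10 = spread_of_average(10)
spread_100 = spread_of_average(100)
spread_1000 = spread_of_average(1000)

# Each tenfold increase in trials should cut the spread by roughly sqrt(10).
print(spread_10, spread_100, spread_1000)
```

Under these assumptions the noise falls with the square root of the number of trials, so going from 10 to 100 trials helps far more in absolute terms than going from 1000 to 5000.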