Matplotlib, How To Loop?
Solution 1:
Probably the simplest way to display your data is with a single plot containing multiple colors.
The key is to label the data more efficiently. You have the right idea with np.intersect1d(Y, Y), but though clever, this is not the best way to set up unique values. Instead, I recommend using np.unique. Not only will that remove the need to hard-code the argument to plt.legend, but the return_inverse argument will allow you to construct attributes directly.
A minor point is that you can index single columns with a single index, rather than a slice.
For example,
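import numpy as np
import matplotlib.pyplot as plt
X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)
labels, indices = np.unique(Y, return_inverse=True)
# Pass the integer labels through c= so they are mapped to colors via the colormap
scatter = plt.scatter(X[:, 0], X[:, 2], c=indices)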
The array indices indexes into the three unique values in labels to get the original array back. You can therefore supply the index as a label for each element.
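To see what return_inverse produces, here is a minimal sketch on a made-up label array. The species strings are illustrative stand-ins for the last column of iris.csv, not values read from the file:

```python
import numpy as np

# Hypothetical labels standing in for the species column
Y = np.array(["setosa", "virginica", "setosa", "versicolor", "virginica"])

labels, indices = np.unique(Y, return_inverse=True)

print(labels)   # sorted unique values
print(indices)  # position of each element of Y within labels

# Indexing labels with indices reconstructs the original array
assert np.array_equal(labels[indices], Y)
```

Because indices holds small integers, it can be passed straight to the c argument of plt.scatter.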
Constructing a legend for such a labeled dataset is something that matplotlib fully supports out of the box, as I learned from "matplotlib add legend with multiple entries for a single scatter plot", which was inspired by this solution. The gist of it is that the object that plt.scatter returns has a method legend_elements which does all the work for you:
plt.legend(scatter.legend_elements()[0], labels)
legend_elements returns a tuple with two items. The first is a list of handles to the elements with distinct labels, which can be used as the first argument to legend. The second is a set of default text labels based on the numerical labels you supplied. We discard these in favor of our actual text labels.
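The pieces above can be put together without iris.csv. The following is only a sketch: the random points, the group names "a", "b", "c", and the Agg backend (chosen so it runs without a display) are all assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 30 random points in three groups of 10
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))
Y = np.repeat(["a", "b", "c"], 10)

labels, indices = np.unique(Y, return_inverse=True)

fig, ax = plt.subplots()
scatter = ax.scatter(points[:, 0], points[:, 1], c=indices)

# One handle per distinct value of c; the default text is discarded
handles, default_text = scatter.legend_elements()
legend = ax.legend(handles, labels)
```

The handles come back in the order of the sorted numeric values, which is the same order as labels, so the two line up.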
Solution 2:
You can do a much better job with the indexing by splitting the data properly.
The indexing expression X[:, 0:1][Y == n] extracts a view of the first column of X. It then applies the boolean mask Y == n to the view. Both steps can be done more concisely as a single step: X[Y == n, 0]. Even so, this is a bit inefficient, since you will compute and apply a mask over the whole array for every unique value in Y.
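A quick check of the two indexing forms on a small made-up array (the values and the label "a" are illustrative only). Note that the results hold the same numbers but differ in shape, because the slice 0:1 keeps the column axis while the integer 0 drops it:

```python
import numpy as np

X = np.arange(12.0).reshape(6, 2)   # hypothetical 6x2 feature matrix
Y = np.array(["a", "b", "a", "c", "b", "a"])
n = "a"

two_step = X[:, 0:1][Y == n]   # slice the column, then mask rows: shape (3, 1)
one_step = X[Y == n, 0]        # mask rows and pick the column at once: shape (3,)

print(two_step.shape, one_step.shape)
assert np.array_equal(two_step.ravel(), one_step)
```

plt.scatter accepts either shape, so the single-step form is a drop-in replacement for plotting.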
My other solution called for np.unique to group the labels. But np.unique works by sorting the array. We can do that ourselves:
X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)
ind = np.argsort(Y)
X = X[ind, :]
Y = Y[ind]
To find where Y changes, you can apply an operation like np.diff, but tailored to strings:
diffs = Y[:-1] != Y[1:]
The mask can be converted to split indices with np.flatnonzero:
inds = np.flatnonzero(diffs) + 1
And finally, you can split the data:
data = np.split(X, inds, axis=0)
For good measure, you can even convert the split data into a dictionary instead of a list:
labels = np.concatenate(([Y[0]], Y[inds]))
data = dict(zip(labels, data))
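To make the intermediate values concrete, here is the same sort-and-split sequence run on a tiny made-up dataset. The arrays are hypothetical, and kind="stable" is an addition that only keeps the row order within each group predictable:

```python
import numpy as np

# Hypothetical stand-ins for the iris labels and features
Y = np.array(["b", "a", "c", "a", "b", "c", "a"])
X = np.arange(14).reshape(7, 2)

# Sort rows so that equal labels become contiguous
ind = np.argsort(Y, kind="stable")
X, Y = X[ind], Y[ind]

# Positions where the label differs from its predecessor
inds = np.flatnonzero(Y[:-1] != Y[1:]) + 1

# First label of each run, paired with its block of rows
labels = np.concatenate(([Y[0]], Y[inds]))
data = dict(zip(labels, np.split(X, inds, axis=0)))

for label, group in data.items():
    print(label, group.shape)
```

Each value in data is a contiguous block of the sorted X, so no further masking is needed per group.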
You can plot with a loop, but much more efficiently now:
for label, group in data.items():
    plt.scatter(group[:, 0], group[:, 2], label=label)
# Each scatter call already carries its label, so legend needs no arguments
plt.legend()