File: episodes/04-vectorSpace.md
Lines changed: 0 additions & 46 deletions
@@ -69,13 +69,6 @@ corpus = np.array([[1,10],[8,8],[2,2],[2,2]])
 print(corpus)
 ```
 
-```txt
-[[ 1 10]
-[ 8 8]
-[ 2 2]
-[ 2 2]]
-```
-
 ### Graphing our model
 
 We don't just have to think of our words as columns. We can also think of them as dimensions, and the values as coordinates for each document.
@@ -87,11 +80,6 @@ corpusT = np.transpose(corpus)
 print(corpusT)
 ```
 
-```txt
-[[ 1 8 2 2]
-[10 8 2 2]]
-```
-
 ```python
 X = corpusT[0]
 Y = corpusT[1]
@@ -154,10 +142,6 @@ origin = np.zeros([1,4])
 print(origin)
 ```
 
-```txt
-[[0. 0. 0. 0.]]
-```
-
 ```python
 # draw our vectors
 plt.quiver(origin, origin, X, Y, color=mycolors, angles='xy', scale_units='xy', scale=1)
@@ -166,8 +150,6 @@ plt.ylim(0, 12)
 plt.show()
 ```
 
-{alt='png'}
-
 Document B and document D are headed in exactly the same direction, which matches our intuition that both documents are in some way similar to each other, even though they differ in length.
 
 #### Cosine Similarity
@@ -183,13 +165,6 @@ from sklearn.metrics.pairwise import cosine_similarity as cs
 cs(corpus, D)
 ```
 
-```txt
-array([[0.7739573],
-[1. ],
-[1. ],
-[1. ]])
-```
-
 Both B and D are considered similar by this metric. Cosine similarity is used by many models as a measure of similarity between documents and words.
 
 ### Generalizing over more dimensions
@@ -218,13 +193,6 @@ corpus = np.hstack((corpus, np.zeros((4,2))))
 print(corpus)
 ```
 
-```txt
-[[ 1. 10. 0. 0.]
-[ 8. 8. 0. 0.]
-[ 2. 2. 0. 0.]
-[ 2. 2. 0. 0.]]
-```
-
 ```python
 E = np.array([[0,2,1,1]])
 F = np.array([[2,2,1,1]])
@@ -234,27 +202,13 @@ corpus = np.vstack((corpus, E))
 print(corpus)
 ```
 
-```txt
-[[ 1. 10. 0. 0.]
-[ 8. 8. 0. 0.]
-[ 2. 2. 0. 0.]
-[ 2. 2. 0. 0.]
-[ 0. 2. 1. 1.]]
-```
 
 What do you think the most similar document is to document F?
 
 ```python
 cs(corpus, F)
 ```
 
-```txt
-array([[0.69224845],
-[0.89442719],
-[0.89442719],
-[0.89442719],
-[0.77459667]])
-```
 
 This new document seems most similar to documents B, C, and D.
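
Since this change removes the printed outputs from the lesson, a reader may want a way to check the similarity scores for document F by hand. The following is a minimal plain-Python sketch of the standard cosine similarity formula (dot product divided by the product of the vector lengths). It is not the scikit-learn implementation, and the helper name `cosine_similarity` and the row labels A to E are assumptions made for illustration, taken from the order of the rows in `corpus`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Rows of the four-dimensional corpus from the lesson.
# The labels A-E are assumed to follow the row order.
corpus = {
    "A": [1, 10, 0, 0],
    "B": [8, 8, 0, 0],
    "C": [2, 2, 0, 0],
    "D": [2, 2, 0, 0],
    "E": [0, 2, 1, 1],
}
F = [2, 2, 1, 1]

# Compare every document in the corpus against document F.
scores = {name: cosine_similarity(vec, F) for name, vec in corpus.items()}
for name, score in scores.items():
    print(f"{name}: {score:.8f}")
```

The printed scores should agree, up to rounding, with the values in the removed `cs(corpus, F)` output. B, C, and D tie because they point in the same direction, and cosine similarity ignores vector length.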