
Commit 4663a10

add errata
1 parent 6a20626 commit 4663a10

2 files changed (+13, -1 lines)


README.md (+1, -1)

@@ -1,4 +1,4 @@
-# *Machine Learning and AI Beyond the Basics* Book
+# *Machine Learning Q and AI Beyond the Basics* Book

errata/README.md (new file, +12 lines)

# Errata

#### Chapter 8

The following sentence in Chapter 8

> Transformers are easy to parallelize because they take a fixed-length sequence of word or image tokens as input.

is misleading because we only work with fixed-size sequences during pretraining, finetuning, and batched inference, that is, when we collect multiple sequences in a batch. A better explanation could be the following:

> Like other deep learning architectures, transformers facilitate parallelization in batch training by handling sequences of word or image tokens. Although they can process variable-length sequences, in practice, sequences are often padded or truncated to fixed lengths for efficient parallel computation across multiple sequences.
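
To illustrate the padding and truncation described in the corrected explanation, here is a minimal sketch in PyTorch; it is not part of the book or the errata file, and the token IDs and the fixed length of 6 are made-up values for demonstration:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three tokenized sequences of different lengths (hypothetical token IDs)
sequences = [
    torch.tensor([101, 7592, 2088, 102]),
    torch.tensor([101, 2023, 2003, 1037, 2936, 6251, 102]),
    torch.tensor([101, 2460, 102]),
]

max_len = 6  # fixed length chosen for this batch

# Pad shorter sequences with 0 (a common padding token ID) ...
padded = pad_sequence(sequences, batch_first=True, padding_value=0)

# ... and truncate anything longer than the fixed length
batch = padded[:, :max_len]

# The attention mask marks real tokens (1) vs. padding (0) so the model
# can ignore padded positions during parallel computation over the batch
attention_mask = (batch != 0).long()

print(batch.shape)      # torch.Size([3, 6])
print(attention_mask)
```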
