Sequence Encoder Summary
2 min read · Aug 20, 2022
- Take in a sequence of n words and map each word to a vector (a word embedding); the word embedding captures the meaning of the word.
- Positional embeddings (also d-dimensional) are added to account for word order, since word embeddings on their own capture neither the order of the words nor the context of the surrounding words.
- Adding the positional embedding to the word embedding gives a single vector per word that encodes both the word's meaning (from the word embedding) and its position in the sequence (from the positional embedding).
- Each word vector is then used as a Query (Q), Key (K) and Value (V) and passed through an attention network, which takes the context of the surrounding words into account.
- At this stage, each vector carries the word's meaning (via the word embedding), the context of the surrounding words (via the attention network) and the word's position (via the positional embedding).
- A skip connection routes the original embedded word vectors around the attention network and adds them to its output, so the original embeddings are not lost.
- A feedforward network is then applied to the word vectors. The network helps to regularize the vectors and, via the tanh activation function, restricts the outputs to the range -1 to 1. A minimal sketch of these steps follows this list.
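Below is a minimal sketch (not the article's code) of the embedding steps: token ids are mapped to word embeddings, learned positional embeddings are added, and the sum forms the input vectors. The vocabulary size, sequence length and model dimension are illustrative assumptions.

```python
# Sketch of the embedding steps (assumed sizes; the embedding layers are
# randomly initialized here, whereas in practice they are learned in training).
import torch
import torch.nn as nn

vocab_size, max_len, d = 10_000, 128, 512           # assumed vocabulary, max length, model dim

word_emb = nn.Embedding(vocab_size, d)               # meaning of each word
pos_emb = nn.Embedding(max_len, d)                   # position of each word

token_ids = torch.randint(0, vocab_size, (1, 16))    # a batch of one 16-word sequence
positions = torch.arange(token_ids.size(1)).unsqueeze(0)

# word meaning + word position, shape (1, 16, d)
x = word_emb(token_ids) + pos_emb(positions)
```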
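Continuing the sketch, a single encoder block applies self-attention (each word vector supplying Q, K and V), a skip connection around the attention network, and a feedforward network. Following the summary above, a tanh activation is used in the feedforward layer (the original Transformer uses ReLU, but the structure is the same); the layer sizes are illustrative.

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Self-attention + skip connection + feedforward, as described above."""

    def __init__(self, d=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d, d_ff), nn.Tanh(), nn.Linear(d_ff, d))

    def forward(self, x):
        # each word vector supplies the query, key and value (self-attention)
        attn_out, _ = self.attn(x, x, x)
        x = x + attn_out              # skip connection: add the original vectors back
        return self.ff(x)             # feedforward network applied to each word vector

y = EncoderBlock()(x)                 # x from the embedding sketch above; shape is preserved
```

Note that a full Transformer encoder also applies layer normalization and a second skip connection around the feedforward network, which the summary above does not cover.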