Description
- This implementation is the same as Transformers.Bert with a few tiny embedding tweaks.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE (implemented in BPE.jl) as the tokenizer (same as GPT-2) and a different pre-training scheme.
- RoBERTa doesn't have token_type_ids, so you don't need to indicate which token belongs to which segment. Just separate your segments with the separator token (</s>); see the sketch after this list.
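As a rough illustration, here is a minimal sketch in Julia of how a segment pair would be laid out for RoBERTa without token_type_ids, using only the `<s>`/`</s>` special tokens. The `tokenize` and `roberta_pair` names are hypothetical placeholders, not the actual Transformers.jl or BPE.jl API, and the whitespace tokenizer stands in for the byte-level BPE.

```julia
# Hypothetical sketch: RoBERTa-style input for a segment pair.
# No token_type_ids — segments are simply joined with the </s> separator,
# i.e. <s> segment1 </s> </s> segment2 </s>.

# stand-in for the byte-level BPE tokenizer (would be BPE.jl in practice)
tokenize(text::AbstractString) = split(text)

# build the token sequence for a pair of segments
function roberta_pair(seg1::AbstractString, seg2::AbstractString)
    vcat(["<s>"], tokenize(seg1), ["</s>", "</s>"], tokenize(seg2), ["</s>"])
end

roberta_pair("first segment", "second segment")
# ["<s>", "first", "segment", "</s>", "</s>", "second", "segment", "</s>"]
```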
We can also wrap CamemBERT (the French version of BERT) around RoBERTa.