
Implementation of RoBERTa on top of Transformers and Flux #15

Open
@tejasvaidhyadev

Description

  • This implementation is the same as Transformers.Bert with a few tiny embedding tweaks.
  • RoBERTa has the same architecture as BERT, but uses a byte-level BPE tokenizer (implemented in BPE.jl, the same as GPT-2) and a different pre-training scheme.
  • RoBERTa doesn’t have token_type_ids, so you don’t need to indicate which token belongs to which segment; just separate your segments with the separator token </s> (see the sketch after this list).
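
A minimal sketch of the input-preparation difference, assuming two hypothetical pre-tokenized segments. It deliberately avoids the Transformers.jl and BPE.jl APIs and just uses plain vectors, to show that RoBERTa replaces BERT's segment-id vector with separator tokens:

```julia
# Hypothetical byte-level BPE pieces; "Ġ" marks a leading space, as in the GPT-2 tokenizer.
segment_a = ["Hello", "Ġworld"]
segment_b = ["ĠHow", "Ġare", "Ġyou", "?"]

# BERT-style input: [CLS] A [SEP] B [SEP], plus a parallel segment-id vector
# (the token_type_ids) telling the model which segment each token belongs to.
bert_tokens   = vcat(["[CLS]"], segment_a, ["[SEP]"], segment_b, ["[SEP]"])
bert_segments = vcat(fill(1, length(segment_a) + 2), fill(2, length(segment_b) + 1))

# RoBERTa-style input: <s> A </s></s> B </s>, with no segment ids at all;
# the separator tokens are the only segment boundary the model sees.
roberta_tokens = vcat(["<s>"], segment_a, ["</s>", "</s>"], segment_b, ["</s>"])
```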

We can also wrap CamemBERT (a French model that uses the RoBERTa architecture) around this RoBERTa implementation.
