
License/Source Materials for BERT checkpoint files/vocab/settings #477

Closed
@ylockerman


Hi,

It seems that the BERT benchmark requires a number of ancillary files in addition to the Wikipedia data (e.g., model checkpoints, vocab file, settings file) to reproduce the closed benchmark. However, I can't find any definitive source for the license of these files, nor can I find the provenance of the checkpoint (i.e., what data it was trained on).

It would be very helpful if this information were available so we could evaluate any legal risk of running the benchmark.

Thank you

P.S. My assumption is that the model was trained on Wikipedia and that the rest of the files are licensed under either CC or Apache 2.0. However, I could not find this documented anywhere, and the license file in the Google Drive folder is ambiguous about whether it covers those files.
