|
| 1 | +# Quickstart |
| 2 | + |
| 3 | +Before you install *bbw*, try our short tutorial on Binder:
| 4 | +[Launch the bbw tutorial on Binder](https://mybinder.org/v2/gh/UB-Mannheim/bbw/main?filepath=bbw.ipynb)
| 5 | + |
| 6 | +## Basic usage of the main functions |
| 7 | + |
| 8 | +To test the main functions, import *bbw* in Python: |
| 9 | +```python |
| 10 | +from bbw import bbw |
| 11 | +``` |
| 12 | + |
| 13 | +### annotate() |
| 14 | +*The easiest way* to annotate the dataframe `Y=bbw.pd.DataFrame([['0','1'],['Mannheim','Rhine']])` is: |
| 15 | +```python |
| 16 | +[web_table, url_table, label_table, cpa, cea, cta] = bbw.annotate(Y) |
| 17 | +``` |
| 18 | +It returns a list of six dataframes. The first three contain the annotations as HTML links, URLs and labels of the Wikidata entities, respectively. Each of them has two more rows than `Y`; these extra rows hold the annotations for types and properties. The last three dataframes contain the annotations in the format required by the [SemTab2020](https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020) challenge.
| 19 | + |
| 20 | +### preprocessing(), contextual_matching() & postprocessing() |
| 21 | +*The fastest way* to annotate the dataframe Y is: |
| 22 | +```python |
| 23 | +[cpa_list, cea_list, nomatch] = bbw.contextual_matching(bbw.preprocessing(Y)) |
| 24 | +[cpa, cea, cta] = bbw.postprocessing(cpa_list, cea_list) |
| 25 | +``` |
| 26 | +The dataframes `cpa`, `cea` and `cta` contain the annotations in the [SemTab2020](https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020) format. The list `nomatch` contains the labels that could not be matched. The lists `cpa_list` and `cea_list` hold the unprocessed, possibly non-unique annotations.
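
The two-step pipeline above can be wrapped in a small helper. This is only a minimal sketch and not part of *bbw*: the name `annotate_fast` and the dictionary keys are hypothetical, and it assumes nothing beyond the three calls and return shapes shown above. The module is passed in as an argument so the helper can be tried out without network access.

```python
def annotate_fast(bbw_module, table):
    """Run preprocessing, contextual matching and postprocessing in sequence.

    `bbw_module` is expected to be the object imported with `from bbw import bbw`.
    `annotate_fast` itself is a hypothetical convenience wrapper.
    """
    # Step 1: preprocess the table, then run contextual matching on it.
    cpa_list, cea_list, nomatch = bbw_module.contextual_matching(
        bbw_module.preprocessing(table)
    )
    # Step 2: postprocess the raw, possibly non-unique annotation lists.
    cpa, cea, cta = bbw_module.postprocessing(cpa_list, cea_list)
    # Keep the unmatched labels next to the final annotations.
    return {"cpa": cpa, "cea": cea, "cta": cta, "nomatch": nomatch}
```

With *bbw* installed, `result = annotate_fast(bbw, Y)` would give access to the unmatched labels as `result["nomatch"]`.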
| 27 | + |
| 28 | +## GUI (graphical user interface) |
| 29 | + |
| 30 | +If you need to annotate only one table, use the simple GUI: |
| 31 | +```shell |
| 32 | +streamlit run bbw_gui.py |
| 33 | +``` |
| 34 | + |
| 35 | +Open http://localhost:8501 in your browser and choose a CSV file. The annotation process starts automatically and outputs the six tables returned by the `annotate()` function.
| 36 | + |
| 37 | +You can test the GUI (without SearX support) on Binder:
| 38 | +[Launch the bbw GUI on Binder](https://mybinder.org/v2/gh/UB-Mannheim/bbw/main?urlpath=proxy/8501/)
| 39 | + |
| 40 | +## CLI (command line tool) |
| 41 | + |
| 42 | +If you need to annotate a few tables, use the CLI-tool: |
| 43 | +```shell |
| 44 | +python3 bbw_cli.py --amount 100 --offset 0 |
| 45 | +``` |
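
To cover more tables than fit in one run, the same command can be generated for consecutive batches. The sketch below is illustrative only: it assumes that `--amount` is the number of tables per run and `--offset` is the index of the first table, which this page does not confirm. The helper name `cli_batches` is hypothetical. It only builds the command strings and does not run them.

```python
import shlex


def cli_batches(total, amount, script="bbw_cli.py"):
    """Return one command line per batch of `amount` tables (assumed semantics)."""
    commands = []
    # Assumption: --offset counts tables, so it advances by `amount` per batch.
    for offset in range(0, total, amount):
        args = ["python3", script, "--amount", str(amount), "--offset", str(offset)]
        commands.append(shlex.join(args))
    return commands


for line in cli_batches(total=250, amount=100):
    print(line)
```

How the CLI behaves when fewer than `--amount` tables remain in the last batch is not covered here and should be checked against `bbw_cli.py` itself.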
| 46 | +## Fast annotations with GNU parallel |
| 47 | + |
| 48 | +If you need to annotate hundreds or thousands of tables, use the script with GNU parallel: |
| 49 | +```shell |
| 50 | +./bbw_parallel.py |
| 51 | +``` |