Commit 0dae544

add docs
1 parent 678dfa3 commit 0dae544

3 files changed

+87
-0
lines changed

docs/index.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
# bbw docs
2+
3+
*bbw* is a semantic annotator "*b*oosted *b*y *w*iki" for linking tabular data without metadata to a [Wikibase](https://wikiba.se) instance (e.g., [Wikidata](https://www.wikidata.org)) via contextual matching and meta-lookup (metasearch).
4+
5+
* Annotates tabular data with the entities, types and properties in [Wikidata](https://www.wikidata.org).
6+
* Easy to use: `bbw.annotate()`.
7+
* Resolves even tricky spelling mistakes via meta-lookup through the [SearX](https://github.com/searx/searx) metasearch engine.
8+
* Matches against up-to-date values in [Wikidata](https://www.wikidata.org) without requiring dump files.
9+
* Ranked in third place at [SemTab2020](https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020).
10+
11+
## Installation
12+
13+
You can use pip to install *bbw*:
14+
```
pip install bbw
```
17+
18+
The latest version can be installed directly from GitHub:
19+
```
pip install git+https://github.com/UB-Mannheim/bbw
```
22+
23+
Also install [SearX](https://github.com/searx/searx), because *bbw* performs its meta-lookups through it.
24+
```shell
export PORT=80
docker pull searx/searx
docker run --rm -d -v ${PWD}/searx:/etc/searx -p $PORT:8080 -e BASE_URL=http://localhost:$PORT/ searx/searx
```
29+
SearX is running on http://localhost:80. *bbw* sends GET requests to it.
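
As a rough check that the container is reachable, the GET request can be sketched with the standard library. The `/search` path and the `q` and `format=json` parameters follow the SearX search API; whether *bbw* issues exactly this request is an assumption, and `build_search_url` and `searx_is_up` are hypothetical helper names, not part of *bbw*. JSON output may also need to be enabled in the SearX settings.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

SEARX_URL = "http://localhost:80"  # matches PORT=80 in the docker command


def build_search_url(query, base_url=SEARX_URL):
    """Build a SearX search URL (hypothetical helper, not part of bbw)."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def searx_is_up(base_url=SEARX_URL, timeout=5):
    """Return True if a GET request to the SearX instance succeeds."""
    url = build_search_url("Mannheim", base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `searx_is_up()` before a long annotation run gives an early hint that the meta-lookup step would fail.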

docs/quickstart.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
# Quickstart
2+
3+
Before you install *bbw*, check out our short tutorial on Binder:
4+
[![badge](https://img.shields.io/badge/tutorial-binder-579ACA.svg?logo=)](https://mybinder.org/v2/gh/UB-Mannheim/bbw/main?filepath=bbw.ipynb)
5+
6+
## Basic usage of the main functions
7+
8+
To test the main functions, import *bbw* in Python:
9+
```python
from bbw import bbw
```
12+
13+
### annotate()
14+
*The easiest way* to annotate the dataframe `Y=bbw.pd.DataFrame([['0','1'],['Mannheim','Rhine']])` is:
15+
```python
[web_table, url_table, label_table, cpa, cea, cta] = bbw.annotate(Y)
```
18+
It returns a list of six dataframes. The first three dataframes contain the annotations as HTML links, URLs and labels of the entities in Wikidata, respectively. These dataframes have two more rows than `Y`; the two extra rows contain the annotations for types and properties. The last three dataframes contain the annotations in the format required by the [SemTab2020](https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020) challenge.
19+
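
Since the six tables come back as a plain list, it can be convenient to address them by name. The following is a minimal sketch of such a wrapper; `Annotations` and `as_annotations` are hypothetical names introduced here and are not part of *bbw*. The field order simply mirrors the unpacking shown above.

```python
from collections import namedtuple

# Field names mirror the unpacking order used with bbw.annotate();
# the wrapper itself is a hypothetical convenience, not part of bbw.
Annotations = namedtuple(
    "Annotations",
    ["web_table", "url_table", "label_table", "cpa", "cea", "cta"],
)


def as_annotations(result):
    """Wrap the six-element list returned by bbw.annotate() in a named tuple."""
    if len(result) != 6:
        raise ValueError(f"expected 6 tables, got {len(result)}")
    return Annotations(*result)
```

With it, `as_annotations(bbw.annotate(Y)).cea` reads the CEA table by name instead of by position.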
20+
### preprocessing(), contextual_matching() & postprocessing()
21+
*The fastest way* to annotate the dataframe Y is:
22+
```python
[cpa_list, cea_list, nomatch] = bbw.contextual_matching(bbw.preprocessing(Y))
[cpa, cea, cta] = bbw.postprocessing(cpa_list, cea_list)
```
26+
The dataframes `cpa`, `cea` and `cta` contain the annotations in the [SemTab2020](https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020) format. The list `nomatch` contains the labels that could not be matched. The unprocessed and possibly non-unique annotations are in the lists `cpa_list` and `cea_list`.
27+
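
The three steps can also be wrapped in one function so that the unmatched labels are returned alongside the final tables. This is only a sketch: `annotate_fast` is a hypothetical name, and passing the module in as an argument is a choice made here so the pipeline can be tried with a stand-in object. It assumes the three functions return what the snippet above unpacks.

```python
def annotate_fast(table, bbw_module):
    """Chain preprocessing, contextual matching and postprocessing.

    `bbw_module` is expected to be the object obtained via
    `from bbw import bbw`; it is passed in explicitly (an assumption of
    this sketch) so that a stub can be used for testing.
    """
    prepared = bbw_module.preprocessing(table)
    cpa_list, cea_list, nomatch = bbw_module.contextual_matching(prepared)
    cpa, cea, cta = bbw_module.postprocessing(cpa_list, cea_list)
    # Return the unmatched labels too, so callers can inspect them.
    return cpa, cea, cta, nomatch
```

A call would then look like `cpa, cea, cta, nomatch = annotate_fast(Y, bbw)`.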
28+
## GUI (graphical user interface)
29+
30+
If you need to annotate only one table, use the simple GUI:
31+
```shell
streamlit run bbw_gui.py
```
34+
35+
Open the browser at http://localhost:8501 and choose a CSV file. The annotation process starts automatically and outputs the six tables returned by the `annotate` function.
36+
37+
You can test the GUI (without SearX support) at:
38+
[![badge](https://img.shields.io/badge/GUI-binder-579ACA.svg?logo=)](https://mybinder.org/v2/gh/UB-Mannheim/bbw/main?urlpath=proxy/8501/)
39+
40+
## CLI (command line tool)
41+
42+
If you need to annotate a few tables, use the CLI tool:
43+
```shell
python3 bbw_cli.py --amount 100 --offset 0
```
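
The text does not spell out the two flags. Assuming that `--amount` is the number of tables to process and `--offset` is the index of the first table (both are assumptions about `bbw_cli.py`, not confirmed here), consecutive batches could be generated as in the sketch below. `batch_arguments` and `cli_commands` are hypothetical helper names.

```python
def batch_arguments(total_tables, batch_size):
    """Yield (amount, offset) pairs that together cover `total_tables` tables.

    Assumes --offset is a zero-based start index and --amount a count.
    """
    for offset in range(0, total_tables, batch_size):
        amount = min(batch_size, total_tables - offset)
        yield amount, offset


def cli_commands(total_tables, batch_size=100):
    """Build one argument list per batch, suitable for subprocess.run()."""
    return [
        ["python3", "bbw_cli.py", "--amount", str(amount), "--offset", str(offset)]
        for amount, offset in batch_arguments(total_tables, batch_size)
    ]
```

Building argument lists rather than shell strings avoids quoting issues if the commands are later passed to `subprocess.run()`.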
46+
## Fast annotations with GNU parallel
47+
48+
If you need to annotate hundreds or thousands of tables, use the script with GNU parallel:
49+
```shell
./bbw_parallel.py
```

mkdocs.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
site_name: bbw
2+
site_url: https://ub-mannheim.github.io/bbw/
3+
site_description: bbw documentation
4+
nav:
5+
- Home: index.md
6+
- Quickstart: quickstart.md
7+
theme: readthedocs
