Commit 3731e50

Authored by michaelnchin, Austin Kline, Dave Bechberger, and krlawrence
openCypher Query and Visualization Support (#153)
* copy from github v2.0.3
* port dbechbe@ working version, add syntax highlighting, add %%oc which calls into %%opencypher
* opencypher iam auth
* bolt support
* rebase from github, refactor opencypher
* get tests working for opencypher endpoint
* Updated data processing code to make edge ids static so that the load is idempotent
* rebase from 2.1.2
* port dbechbe@ working version, add syntax highlighting, add %%oc which calls into %%opencypher; get tests working for opencypher endpoint; pull in client and add OC methods
* Initial versions of notebooks updated for Neptune GA
* Updated Neptune ML notebooks, utils, and pretrained models config
* add support for modeltransform commands in %neptune_ml
* Updated OC widget to handle new JSON format
* Updated ML notebooks with feedback from Annupriya
* Added back in missing init file for the Gremlin Network
* Added support for openCypher syntax highlighting
* Added missing init files and updated files that incorrectly referenced SPARQL instead of OC
* WIP - Adding visualization to OC
* WIP - Initial rough visualization of OC results
* WIP - updated to handle group vars passed in as json
* Rebase on v2.1.3 and changes due for v2.1.4
* Resolve remaining merge conflicts from v2.1.2 rebase
* Added comments and cleaned up code for initial OC visualization
* Revert unintended changes to Gremlin tests
* WIP - Adding visualization to OC
* Cleaned up merge conflicts after merge from akline/OC
* Fixed additional merge conflicts
* Finally fixed merge conflicts from akline/OC branch
* Copied code to set label display and label length
* Fix Sparql tab widgets being displayed incorrectly, some PEP8 fixes
* Changed the seed command to use 'Property Graph/RDF' as the data models instead of 'Gremlin/SPARQL' in order to support OC release
* Removed tmp file used for building
* Added opencypher support for bulk load
* Cleaned up last few merge conflicts
* PEP8 fixes
* More PEP8 fixes
* Update notebooks unit test with the new notebook paths
* Fixed issue with seed command as well as default grouping not working correctly
* Fixed issue where parsed lists of dictionaries were not remaining ordered
* Updated Notebooks to refer to new seed command
* Add '-de' param to Gremlin magic for specifying edge labels
* Fix bug in adding dict type edges to graph, rearrange recent tests
* Initial upload of new notebooks
* Additional cleanup/tweaks
* Add variable injection decorator to OC magics
* Fixed casing on seed command labels for consistency
* Introduce new features via text and tweak examples
* Add --edge-display-property to OC magic for specifying edge labels
* Update OC notebooks hints sections with -de param
* Additional improvements to intro section
* Additional examples and prose
* Initial updates for the README - more needed
* Additional README updates - more needed
* Update URL for openCypher
* Initial upload of sample OC images
* Add link to OC sample image
* Add another example using the -d hint
* Update Gremlin sample image to show color
* Tweak examples to use more color
* Add a colorful graph image to the README
* Additional pointers to notebooks
* Additional updates
* Fix bug where Gremlin node tooltips were not being changed when using the -d option
* Additional examples that showcase new features
* Rename some variables
* More variable renaming
* Additional small improvements
* Add an example showing how to sample airports
* Minor tweak to random sample example
* Making use of verbs consistent
* Verb consistency and clean up graphics reset
* Fix incorrect option
* Verb consistency
* Improved a couple of examples
* Add visualization support for elementMap() Gremlin step
* Remove Direction.BOTH check
* Remove merged redundancies
* Updated ML notebooks based on feedback from Ankit
* Additional discussion of elementMap usage
* Update Visualization-Grouping-Coloring-Gremlin notebook with elementMap
* [lakelvin@] Refactor %load form display to fix some descriptions being cut off
* Rename Gremlin Grouping-Coloring sample notebook
* Minor changes and rename files
* Add examples of -d and -de without a map
* Fix typo
* Fix -d option not working in OC queries for string format values
* Clean up debug statement
* Fix OC metadata results count metric
* Update ChangeLog for OC Release
* Add ML updates to ChangeLog
* Remove identity graph seed files
* Remove extra chars from notebooks
* Pin neo4j version
* Styling fixes
* More styling fixes
* Update notebook directory validation unit test

Co-authored-by: Austin Kline
Co-authored-by: Dave Bechberger
Co-authored-by: Michael Chin
Co-authored-by: Kelvin Lawrence
1 parent 0a9d24b commit 3731e50

83 files changed, +8565 -983 lines changed

.gitignore

+4 -1
@@ -25,4 +25,7 @@ src/graph_notebook/widgets/lib/
 # npm
 node_modules/
 node_modules/.package-lock.json
-src/graph_notebook/widgets/package-lock.json
+src/graph_notebook/widgets/package-lock.json
+blazegraph.jnl
+rules.log
+*.env

ChangeLog.md

+45 -24
@@ -3,47 +3,68 @@
 Starting with v1.31.6, this file will contain a record of major features and updates made in each release of graph-notebook.

 ## Upcoming
-- Add visualization support for elementMap Gremlin step ([Link to PR](https://github.com/aws/graph-notebook/pull/140))
-- Support additional customization of edge node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/132))
-- Include index operations metrics in metadata results tab for Gremlin Profile queries([Link to PR](https://github.com/aws/graph-notebook/pull/150))
-- Update SPARQL EPL seed dataset file ([Link to PR](https://github.com/aws/graph-notebook/pull/134))
-- Update documentation on using `%%graph_notebook_config` with an IAM enabled Neptune cluster ([Link to PR](https://github.com/aws/graph-notebook/pull/136))
-- Fix improper handling of Blazegraph status response ([Link to PR](https://github.com/aws/graph-notebook/pull/137))
-- Fix Gremlin node tooltips being displayed incorrectly ([Link to PR](https://github.com/aws/graph-notebook/pull/139))
-- Fix bug in using Gremlin explain/profile with large result sets ([Link to PR](https://github.com/aws/graph-notebook/pull/141))
-- Pin RDFLib version ([Link to PR](https://github.com/aws/graph-notebook/pull/151))
+
+**openCypher Support**:
+
+With the release of support for the openCypher query language in Amazon Neptune's lab mode, graph-notebook can now be used to execute and visualize openCypher queries with any compatible graph database.
+
+Two new magic commands have been added:
+- `%%oc`/`%%opencypher`
+- `%%oc_status`/`%%opencypher_status`
+
+These openCypher magic commands inherit the majority of the query and visualization customization features that are already available in the Gremlin and SPARQL magics.
+
+For more detailed information and examples of how you can execute and visualize openCypher queries through graph-notebook, please refer to the new `Air-Routes-openCypher` and `EPL-openCypher` sample notebooks.
+
+**Other major updates**:
+- Added visualization support for elementMap Gremlin step ([Link to PR](https://github.com/aws/graph-notebook/pull/140))
+- Added support for additional customization of edge node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/132))
+- Refactored %load form display code for flexibility; fixes some descriptions being cut off
+- Updated Neptune ML notebooks, utils, and pretrained models config
+- Added support for `modeltransform` commands in `%neptune_ml`
+
+**Minor updates**:
+- Included index operations metrics in metadata results tab for Gremlin Profile queries([Link to PR](https://github.com/aws/graph-notebook/pull/150))
+- Updated SPARQL EPL seed dataset file ([Link to PR](https://github.com/aws/graph-notebook/pull/134))
+- Updated documentation on using `%%graph_notebook_config` with an IAM enabled Neptune cluster ([Link to PR](https://github.com/aws/graph-notebook/pull/136))
+
+**Bugfixes**:
+- Fixed improper handling of Blazegraph status response ([Link to PR](https://github.com/aws/graph-notebook/pull/137))
+- Fixed Gremlin node tooltips being displayed incorrectly ([Link to PR](https://github.com/aws/graph-notebook/pull/139))
+- Fixed bug in using Gremlin explain/profile with large result sets ([Link to PR](https://github.com/aws/graph-notebook/pull/141))
+- Pinned RDFLib version ([Link to PR](https://github.com/aws/graph-notebook/pull/151))

 ## Release 2.1.4 (June 27, 2021)
-- Support for additional customization of graph node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/127))
+- Added support for additional customization of graph node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/127))

 ## Release 2.1.3 (June 18, 2021)
-- Support dictionary value access in variable injection([Link to PR](https://github.com/aws/graph-notebook/pull/126))
+- Added support for dictionary value access in variable injection([Link to PR](https://github.com/aws/graph-notebook/pull/126))

 ## Release 2.1.2 (May 10, 2021)

-- Pin gremlinpython to `<3.5.*` ([Link to PR](https://github.com/aws/graph-notebook/pull/123))
-- Add support for notebook variables in Sparql/Gremlin magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/113))
-- Add support for grouping by different properties per label in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/115))
-- Fix missing Boto3 dependency in setup.py ([Link to PR](https://github.com/aws/graph-notebook/pull/118))
-- Update %load execution time to HH:MM:SS format if over a minute ([Link to PR](https://github.com/aws/graph-notebook/pull/121))
+- Pinned gremlinpython to `<3.5.*` ([Link to PR](https://github.com/aws/graph-notebook/pull/123))
+- Added support for notebook variables in Sparql/Gremlin magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/113))
+- Added support for grouping by different properties per label in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/115))
+- Fixed missing Boto3 dependency in setup.py ([Link to PR](https://github.com/aws/graph-notebook/pull/118))
+- Updated %load execution time to HH:MM:SS format if over a minute ([Link to PR](https://github.com/aws/graph-notebook/pull/121))

 ## Release 2.1.1 (April 22, 2021)

-- Fix bug in `%neptune_ml export ...` logic where the iam setting for the exporter endpoint wasn't getting picked up properly
+- Fixed bug in `%neptune_ml export ...` logic where the iam setting for the exporter endpoint wasn't getting picked up properly

 ## Release 2.1.0 (April 15, 2021)

-- Add support for Mode, queueRequest, and Dependencies parameters when running %load command ([Link to PR](https://github.com/aws/graph-notebook/pull/91))
-- Add support for list and dict as map keys in Python Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/100))
-- Refactor modules that call to Neptune or other SPARQL/Gremlin endpoints to use a unified client object ([Link to PR](https://github.com/aws/graph-notebook/pull/104))
+- Added support for Mode, queueRequest, and Dependencies parameters when running %load command ([Link to PR](https://github.com/aws/graph-notebook/pull/91))
+- Added support for list and dict as map keys in Python Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/100))
+- Refactored modules that call to Neptune or other SPARQL/Gremlin endpoints to use a unified client object ([Link to PR](https://github.com/aws/graph-notebook/pull/104))
 - Added an additional notebook under [02-Visualization](src/graph_notebook/notebooks/02-Visualization) demonstrating how to use the visualzation grouping and coloring options in Gremlin. ([Link to PR](https://github.com/aws/graph-notebook/pull/107))
-- Add metadata output tab for magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/108))
+- Added metadata output tab for magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/108))

 ## Release 2.0.12 (Mar 25, 2021)

-- Add default parameters for `get_load_status` ([Link to PR](https://github.com/aws/graph-notebook/pull/96))
-- Add ipython as a dependency in `setup.py` ([Link to PR](https://github.com/aws/graph-notebook/pull/95))
-- Add parameters in `load_status` for `details`, `errors`, `page`, and `errorsPerPage` ([Link to PR](https://github.com/aws/graph-notebook/pull/88))
+- Added default parameters for `get_load_status` ([Link to PR](https://github.com/aws/graph-notebook/pull/96))
+- Added ipython as a dependency in `setup.py` ([Link to PR](https://github.com/aws/graph-notebook/pull/95))
+- Added parameters in `load_status` for `details`, `errors`, `page`, and `errorsPerPage` ([Link to PR](https://github.com/aws/graph-notebook/pull/88))

 ## Release 2.0.10 (Mar 18, 2021)

README.md

+27 -10
@@ -1,19 +1,26 @@
-## Graph Notebook: easily query and visualize graphs
+## Graph Notebook: easily query and visualize graphs
+
+The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Using this open-source Python package, you can connect to any graph database that supports the [Apache TinkerPop](https://tinkerpop.apache.org/), [openCypher](https://github.com/opencypher/openCypher) or the [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/) graph models. These databases could be running locally on your desktop or in the cloud. Graph databases can be used to explore a variety of use cases including [knowledge graphs](https://aws.amazon.com/neptune/knowledge-graphs-on-aws/) and [identity graphs](https://aws.amazon.com/neptune/identity-graphs-on-aws/).
+
+![A colorful graph picture](./images/ColorfulGraph.png)

-The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Using this open-source Python package, you can connect to any graph database that supports the [Apache TinkerPop](https://tinkerpop.apache.org/) or the [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/) graph model. These databases could be running locally on your desktop or in the cloud. Graph databases can be used to explore a variety of use cases including [knowledge graphs](https://aws.amazon.com/neptune/knowledge-graphs-on-aws/) and [identity graphs](https://aws.amazon.com/neptune/identity-graphs-on-aws/).

 ### Visualizing Gremlin queries:

 ![Gremlin query and graph](./images/GremlinQueryGraph.png)

+### Visualizing openCypher queries
+
+![openCypher query and graph](./images/OCQueryGraph.png)
+
 ### Visualizing SPARQL queries:

 ![SPARL query and graph](./images/SPARQLQueryGraph.png)

 Instructions for connecting to the following graph databases:

 | Endpoint | Graph model | Query language |
-| :-----------------------------: | :---------------------: | :-----------------: |
+| :-----------------------------: | :---------------------: | :-----------------: |
 |[Gremlin Server](#gremlin-server)| property graph | Gremlin |
 | [Blazegraph](#blazegraph) | RDF | SPARQL |
 |[Amazon Neptune](#amazon-neptune)| property graph or RDF | Gremlin or SPARQL |
@@ -25,7 +32,9 @@ We encourage others to contribute configurations they find useful. There is an [
 #### Notebook cell 'magic' extensions in the IPython 3 kernel
 `%%sparql` - Executes a SPARQL query against your configured database endpoint.

-`%%gremlin` - Executes a Gremlin query against your database using web sockets. The results are similar to what the Gremlin console would return.
+`%%gremlin` - Executes a Gremlin query against your database using web sockets. The results are similar to those a Gremlin console would return.
+
+`%%opencypher` or `%%oc` Executes an openCypher query against your database.

 `%%graph_notebook_config` - Sets the executing notebook's database configuration to the JSON payload provided in the cell body.

@@ -41,18 +50,20 @@ We encourage others to contribute configurations they find useful. There is an [

 `%sparql_status` - Obtain the status of SPARQL queries. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/sparql-api-status.html)

+`%opencypher_status` or `%oc_status` - Obtain the status of openCypher queries.
+
 `%load` - Generate a form to submit a bulk loader job. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html)

 `%load_ids` - Get ids of bulk load jobs. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-status-examples.html)

 `%load_status` - Get the status of a provided `load_id`. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-status-examples.html)

-`%neptune_ml` - Set of commands to integrate with NeptuneML functionality. You can find a set of tutorial notebooks [here](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/04-Machine-Learning).
+`%neptune_ml` - Set of commands to integrate with NeptuneML functionality. You can find a set of tutorial notebooks [here](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/04-Machine-Learning).
 [Documentation](https://aws.amazon.com/neptune/machine-learning/)

 `%status` - Check the Health Status of the configured host endpoint. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-status.html)

-`%seed` - Provides a form to add data to your graph without the use of a bulk loader. both SPARQL and Gremlin have an airport routes dataset.
+`%seed` - Provides a form to add data to your graph without the use of a bulk loader. Supports both RDF and Property Graph data models.

 `%graph_notebook_config` - Returns a JSON payload that contains connection information for your host.

@@ -64,6 +75,13 @@ We encourage others to contribute configurations they find useful. There is an [

 **TIP** :point_right: You can list all the magics installed in the Python 3 kernel using the `%lsmagic` command.

+**TIP** :point_right: Many of the magic commands support a `--help` option in order to provide additional information.
+
+## Example notebooks
+This project includes many example Jupyter notebooks. It is recommended to explore them. All of the commands and features supported by `graph-notebook` are explained in detail with examples within the sample notebooks. You can find them [here](./src/graph_notebook/notebooks/). As this project has evolved, many new features have been added. If you are already familiar with graph-notebook but want a quick summary of new features added, a good place to start is the Air-Routes notebooks in the [02-Visualization](./src/graph_notebook/notebooks/02-Visualization) folder.
+
+## Keeping track of new features
+It is recommended to check the [ChangeLog.md](ChangeLog.md) file periodically to keep up to date as new features are added.

 ## Prerequisites

@@ -74,7 +92,6 @@ You will need:
 * [Tornado](https://pypi.org/project/tornado/) 4.5.3
 * A graph database that provides a SPARQL 1.1 Endpoint or a Gremlin Server

-
 ## Installation

 ```
@@ -102,7 +119,7 @@ jupyter notebook ~/notebook/destination/dir

 ## Connecting to a graph database

-### Gremlin Server
+### Gremlin Server

 In a new cell in the Jupyter notebook, change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, and `ssl`. For a local Gremlin server (HTTP or WebSockets), you can use the following command:

@@ -154,7 +171,7 @@ You can also make use of namespaces for Blazegraph by specifying the path `graph
 }
 ```

-This will result in the url `localhost:9999/blazegraph/namespace/foo/sparql` being used when executing any `%%sparql` magic commands.
+This will result in the url `localhost:9999/blazegraph/namespace/foo/sparql` being used when executing any `%%sparql` magic commands.

 To setup a new local Blazegraph database for use with the graph notebook, check out the [Quick Start](https://github.com/blazegraph/database/wiki/Quick_Start) from Blazegraph.

@@ -175,7 +192,7 @@ Change the configuration using `%%graph_notebook_config` and modify the defaults
 ```
 To setup a new Amazon Neptune cluster, check out the [AWS documentation](https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-launch.html).

-When connecting the graph notebook to Neptune, make sure you have a network setup to communicate to the VPC that Neptune runs on. If not, you can follow [this guide](https://github.com/aws/graph-notebook/tree/main/additional-databases/neptune).
+When connecting the graph notebook to Neptune, make sure you have a network setup to communicate to the VPC that Neptune runs on. If not, you can follow [this guide](https://github.com/aws/graph-notebook/tree/main/additional-databases/neptune).

 ## Authentication (Amazon Neptune)

+18
@@ -0,0 +1,18 @@
+## Connecting graph notebook to Blazegraph SPARQL Endpoint
+
+The official SPARQL endpoint for DBPedia is available from https://dbpedia.org/sparql and is based on a Virtuoso engine.
+
+It is possible to connect to this endpoint using the following configuration:
+
+```
+%%graph_notebook_config
+{
+"host": "dbpedia.org",
+"port": 443,
+"auth_mode": "DEFAULT",
+"iam_credentials_provider_type": "ROLE",
+"load_from_s3_arn": "",
+"ssl": true,
+"aws_region": ""
+}
+```
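
As a rough illustration of how the `host`, `port`, and `ssl` fields in the configuration above relate to the endpoint URL quoted in the new file, here is a minimal Python sketch. The helper `sparql_endpoint_url` and its `path` parameter are hypothetical names introduced for this example; the sketch does not reproduce graph-notebook's own URL-building code, and the default path `sparql` is an assumption taken from the DBPedia URL.

```python
import json

# The cell body from the configuration above, as a JSON string.
CONFIG_JSON = """
{
  "host": "dbpedia.org",
  "port": 443,
  "auth_mode": "DEFAULT",
  "iam_credentials_provider_type": "ROLE",
  "load_from_s3_arn": "",
  "ssl": true,
  "aws_region": ""
}
"""


def sparql_endpoint_url(config: dict, path: str = "sparql") -> str:
    """Hypothetical helper: build an endpoint URL from host, port, and ssl.

    The default path "sparql" is an assumption based on the DBPedia URL
    quoted above; it is not read from graph-notebook's source.
    """
    scheme = "https" if config["ssl"] else "http"
    return f"{scheme}://{config['host']}:{config['port']}/{path}"


config = json.loads(CONFIG_JSON)
url = sparql_endpoint_url(config)
print(url)
```

Passing a different `path`, such as the Blazegraph namespace path mentioned in the README, yields the corresponding namespace URL under the same assumptions.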

images/ColorfulGraph.png

128 KB

images/GremlinQueryGraph.png

70.4 KB

images/OCQueryGraph.png

234 KB

requirements.txt

+7 -3
@@ -7,11 +7,15 @@ notebook==5.7.10
 ipywidgets==7.5.1
 jupyter-contrib-nbextensions
 widgetsnbextension
-gremlinpython
+gremlinpython<=3.4.*
 requests==2.24.0
 ipython==7.16.1
+neo4j==4.2.1
+rdflib~=5.0.0
+traitlets~=4.3.3
+setuptools~=40.6.2

 # requirements for testing
-boto3==1.15.15
-botocore==1.18.18
+botocore~=1.18.18
+boto3~=1.15.15
 pytest==6.2.2
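
Several pins in this hunk move from `==` to `~=`, the PEP 440 "compatible release" operator: `rdflib~=5.0.0` accepts any `5.0.x` at or above `5.0.0` but not `5.1.0`. The sketch below illustrates that rule for plain numeric versions only. The function names are hypothetical, and real tooling (pip, the `packaging` library) also handles pre-releases, post-releases, and epochs, which this example ignores.

```python
def parse(version: str) -> tuple:
    """Split a plain dotted version such as "5.0.3" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(candidate: str, pinned: str) -> bool:
    """Return True if `candidate` satisfies `~=pinned`.

    `~=X.Y.Z` means: at least X.Y.Z, and sharing the X.Y prefix.
    Only plain numeric versions are handled in this sketch.
    """
    cand, pin = parse(candidate), parse(pinned)
    if len(pin) < 2:
        raise ValueError("~= needs at least two version segments")
    # Pad the candidate with zeros so "5.0" compares like "5.0.0".
    cand = cand + (0,) * (len(pin) - len(cand))
    prefix = pin[:-1]
    return cand >= pin and cand[:len(prefix)] == prefix


print(is_compatible_release("5.0.3", "5.0.0"))
print(is_compatible_release("5.1.0", "5.0.0"))
```

Under this rule, `botocore~=1.18.18` admits later `1.18.x` patch releases, whereas the previous `botocore==1.18.18` admitted exactly one version.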

setup.py

+1
@@ -81,6 +81,7 @@ def get_version():
 'botocore>=1.19.37',
 'boto3>=1.17.58',
 'ipython>=7.16.1',
+'neo4j==4.3.2',
 'rdflib==5.0.0'
 ],
 package_data={

src/graph_notebook/magics/completers/graph_completer.py

+2 -1
@@ -14,7 +14,8 @@
 'GRAPH',
 'FILTER',
 'ASK',
-'DESCRIBE']
+'DESCRIBE',
+'UNLOAD']
 GREMLIN_OPTIONS = [
 '.toString',
 '.tx',
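
To show what adding `'UNLOAD'` to a completion keyword list could mean in practice, here is a small prefix-matching sketch in Python. It is an assumption-laden illustration: `SPARQL_KEYWORDS` and `complete` are hypothetical names, the list contains only the entries visible in the hunk above, and the real completer in `graph_completer.py` may match and register completions differently.

```python
# Partial keyword list: only the entries visible in the diff above.
# The variable name is hypothetical; the real list's name is not shown.
SPARQL_KEYWORDS = ['GRAPH', 'FILTER', 'ASK', 'DESCRIBE', 'UNLOAD']


def complete(prefix: str, options: list) -> list:
    """Return the options that start with `prefix`, ignoring case."""
    wanted = prefix.upper()
    return [opt for opt in options if opt.upper().startswith(wanted)]


print(complete('un', SPARQL_KEYWORDS))
print(complete('D', SPARQL_KEYWORDS))
```

With the new entry in the list, a prefix such as `un` now has a match, where before this commit it would have had none among the entries shown.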
