feat: proposed Dataset API changes #3060

Draft: wants to merge 10 commits into base: main (showing changes from 7 commits)

2 changes: 1 addition & 1 deletion .readthedocs.yaml
@@ -1,7 +1,7 @@
---
# https://docs.readthedocs.io/en/stable/config-file/v2.html
version: 2
# NOTE: not builing epub because epub does not know how to handle .ico files
# NOTE: not building epub because epub does not know how to handle .ico files
# which results in a warning which causes the build to fail due to
# `sphinx.fail_on_warning`
# https://github.com/sphinx-doc/sphinx/issues/10350
28 changes: 27 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,30 @@
## 2025-01-17 RELEASE 7.1.3

A fix-up release that re-adds support for Python 3.8 after it was accidentally
removed in Release 7.1.2.

This release cherry-picks many of the additions made in 7.1.2 on top of 7.1.1 but
leaves out typing changes that are not compatible with Python 3.8.

Also not carried over from 7.1.2 is the change from Poetry 1.x to 2.0.

Included are PRs such as _Defined Namespace warnings fix_, _sort longturtle
blank nodes_, _deterministic longturtle serialisation_ and _Dataset documentation
improvements_.

For the full list of included PRs, see the preparatory PR:
<https://github.com/RDFLib/rdflib/pull/3036>.

## 2025-01-10 RELEASE 7.1.2

A minor release that inadvertently removed support for Python 3.8. This release
has now been deleted.

All the improved features initially made available in this release that were
compatible with Python 3.8 have been preserved in the 7.1.3 release. The main
additions to 7.1.2 not preserved in 7.1.3 are updated type hints.

## 2024-10-17 RELEASE 7.1.1

This minor release removes the dependency on some only Python packages, in particular
@@ -31,7 +58,6 @@ Merged PRs:
* 2024-10-23 - build(deps-dev): bump ruff from 0.6.9 to 0.7.0
[PR #2942](https://github.com/RDFLib/rdflib/pull/2942)


## 2024-10-17 RELEASE 7.1.0

This minor release incorporates just over 100 substantive PRs - interesting
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -69,7 +69,7 @@ authors:
- family-names: "Stuart"
given-names: "Veyndan"
title: "RDFLib"
version: 7.1.1
date-released: 2024-10-28
version: 7.1.3
date-released: 2025-01-18
url: "https://github.com/RDFLib/rdflib"
doi: 10.5281/zenodo.6845245
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
BSD 3-Clause License

Copyright (c) 2002-2024, RDFLib Team
Copyright (c) 2002-2025, RDFLib Team
All rights reserved.

Redistribution and use in source and binary forms, with or without
20 changes: 10 additions & 10 deletions README.md
@@ -18,9 +18,10 @@ RDFLib

RDFLib is a pure Python package for working with [RDF](http://www.w3.org/RDF/). RDFLib contains most things you need to work with RDF, including:

* parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, Trig and JSON-LD
* parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, Trig, JSON-LD and even HexTuples
* a Graph interface which can be backed by any one of a number of Store implementations
* store implementations for in-memory, persistent on disk (Berkeley DB) and remote SPARQL endpoints
* Store implementations for in-memory, persistent on disk (Berkeley DB) and remote SPARQL endpoints
* additional Stores can be supplied via plugins
* a SPARQL 1.1 implementation - supporting SPARQL 1.1 Queries and Update statements
* SPARQL function extension mechanisms

@@ -29,10 +30,8 @@ The RDFlib community maintains many RDF-related Python code repositories with di

* [rdflib](https://github.com/RDFLib/rdflib) - the RDFLib core
* [sparqlwrapper](https://github.com/RDFLib/sparqlwrapper) - a simple Python wrapper around a SPARQL service to remotely execute your queries
* [pyLODE](https://github.com/RDFLib/pyLODE) - An OWL ontology documentation tool using Python and templating, based on LODE.
* [pyrdfa3](https://github.com/RDFLib/pyrdfa3) - RDFa 1.1 distiller/parser library: can extract RDFa 1.1/1.0 from (X)HTML, SVG, or XML in general.
* [pymicrodata](https://github.com/RDFLib/pymicrodata) - A module to extract RDF from an HTML5 page annotated with microdata.
* [pySHACL](https://github.com/RDFLib/pySHACL) - A pure Python module which allows for the validation of RDF graphs against SHACL graphs.
* [pyLODE](https://github.com/RDFLib/pyLODE) - An OWL ontology documentation tool using Python and templating, based on LODE
* [pySHACL](https://github.com/RDFLib/pySHACL) - A pure Python module which allows for the validation of RDF graphs against SHACL graphs
* [OWL-RL](https://github.com/RDFLib/OWL-RL) - A simple implementation of the OWL2 RL Profile which expands the graph with all possible triples that OWL RL defines.

Please see the list for all packages/repositories here:
@@ -43,8 +42,11 @@ Help with maintenance of all of the RDFLib family of packages is always welcome

## Versions & Releases

* `main` branch in this repository is the unstable release
* `7.1.1` current stable release, bugfixes to 7.1.0
* `main` branch in this repository is the current unstable release
* `7.1.3` current stable release, small improvements to 7.1.1
* `7.1.2` previously deleted release
* `7.1.1` previous stable release
* see <https://github.com/RDFLib/rdflib/releases/tag/7.1.1>
* `7.0.0` previous stable release, supports Python 3.8.1+ only.
* see [Releases](https://github.com/RDFLib/rdflib/releases)
* `6.x.y` supports Python 3.7+ only. Many improvements over 5.0.0
@@ -68,8 +70,6 @@ Some features of RDFLib require optional dependencies which may be installed usi
Alternatively manually download the package from the Python Package
Index (PyPI) at https://pypi.python.org/pypi/rdflib

The current version of RDFLib is 7.1.1, see the ``CHANGELOG.md`` file for what's new in this release.

### Installation of the current main branch (for developers)

With *pip* you can also install rdflib from the git repository with one of the following options:
6 changes: 5 additions & 1 deletion admin/get_merged_prs.py
@@ -23,7 +23,11 @@
print(f"Getting {url}")
with urllib.request.urlopen(url) as response:
    response_text = response.read()
    link_headers = response.info()["link"].split(",") if response.info()["link"] is not None else None
    link_headers = (
        response.info()["link"].split(",")
        if response.info()["link"] is not None
        else None
    )

json_data = json.loads(response_text)
ITEMS.extend(json_data["items"])
155 changes: 155 additions & 0 deletions dataset_api.md
@@ -0,0 +1,155 @@
Incorporate the changes proposed by Martynas, with the exception of graphs(), which would now return a dictionary mapping graph names (URIRef or BNode) to Graph objects (as the graph's identifier would be removed).

```
add add_named_graph(uri: IdentifiedNode, graph: Graph) method
add has_named_graph(uri: IdentifiedNode) method
add remove_named_graph(uri: IdentifiedNode) method
add replace_named_graph(uri: IdentifiedNode, graph: Graph) method
add graphs() method as an alias for contexts()
add default_graph property as an alias for default_context
add get_named_graph as an alias for get_graph
deprecate graph(graph) method
deprecate remove_graph(graph) method
deprecate contexts() method
Using IdentifiedNode as a super-interface for URIRef and BNode (since both are allowed as graph names in RDF 1.1).
```
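
The named-graph methods listed above could behave roughly as in the following sketch. It is a toy model, not rdflib code: `ToyDataset` is a hypothetical class, plain strings stand in for `URIRef`/`BNode` names, and sets of tuples stand in for `Graph` objects. Whether `add_named_graph` should merge into an existing graph or reject a duplicate name is not settled by the proposal; merging is assumed here purely for illustration.

```python
# Toy model only: strings stand in for graph names, sets of 3-tuples for graphs.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class ToyDataset:
    """Hypothetical stand-in for the proposed Dataset interface."""

    def __init__(self) -> None:
        self.default_graph: Set[Triple] = set()
        self._named: Dict[str, Set[Triple]] = {}

    def add_named_graph(self, uri: str, graph: Set[Triple]) -> None:
        # Assumption: adding under an existing name merges the triples.
        self._named.setdefault(uri, set()).update(graph)

    def has_named_graph(self, uri: str) -> bool:
        return uri in self._named

    def remove_named_graph(self, uri: str) -> None:
        # Assumption: removing an unknown name is a silent no-op.
        self._named.pop(uri, None)

    def replace_named_graph(self, uri: str, graph: Set[Triple]) -> None:
        # Overwrites whatever was stored under this name.
        self._named[uri] = set(graph)

    def graphs(self) -> Dict[str, Set[Triple]]:
        # Names map to graphs; the graph itself carries no identifier.
        return dict(self._named)


d = ToyDataset()
name = "http://example.com/graph-B"
d.add_named_graph(name, {("s-b", "p-b", "Triple B")})
d.add_named_graph(name, {("s-c", "p-c", "Triple C")})
merged_size = len(d.graphs()[name])

d.replace_named_graph(name, {("s-d", "p-d", "Triple D")})
replaced_size = len(d.graphs()[name])

d.remove_named_graph(name)
```

Keeping the name outside the graph object is what lets `graphs()` return a plain mapping; the same graph value could then be stored under more than one name.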

Make the following enhancements to the triples, quads, and subject/predicate/object APIs.

Major changes:

* P1. Remove `default_union` attribute and make the Dataset inclusive.
* P2. Remove the Default Graph URI ("urn:x-rdflib:default").
* P3. Remove Graph class's "identifier" attribute to align with the W3C spec, impacting Dataset methods which use the Graph class.
* P4. Make the graphs() method of Dataset return a dictionary of named graph names to Graph objects.

Enhancements:

* P5. Support passing of iterables of Terms to triples, quads, and related methods, similar to the triples_choices method.
* P6. Default the triples method to iterate with `(None, None, None)`.
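
One way to read P5 and P6 together is that each slot of a pattern may be `None` (wildcard), a single term, or an iterable of acceptable terms, and that omitting the pattern matches everything. The sketch below illustrates that reading on toy data. It is not rdflib code: `QUADS`, `_matches` and `quads` are hypothetical names, and strings stand in for RDF terms.

```python
# Toy illustration of term-or-iterable pattern slots; not rdflib code.
from typing import Iterable, Iterator, Optional, Tuple, Union

Term = str
Slot = Union[None, Term, Iterable[Term]]
Quad = Tuple[Term, Term, Term, Optional[Term]]

# The fourth element is None for the default graph, as proposed in the thread.
QUADS = [
    ("subject-a", "predicate-a", "Triple A", None),
    ("subject-b", "predicate-b", "Triple B", "graph-B"),
]


def _matches(slot: Slot, value: Optional[Term]) -> bool:
    if slot is None:  # wildcard
        return True
    if isinstance(slot, str):  # a single term
        return slot == value
    return value in set(slot)  # an iterable of acceptable terms


def quads(
    pattern: Tuple[Slot, Slot, Slot, Slot] = (None, None, None, None)
) -> Iterator[Quad]:
    # An omitted pattern matches every quad (the P6-style default).
    for quad in QUADS:
        if all(_matches(slot, value) for slot, value in zip(pattern, quad)):
            yield quad


everything = list(quads())
sliced = list(quads((None, ["predicate-a", "rdfs:label"], None, None)))
```

Because a bare string is itself iterable, the single-term check has to come before the iterable check; a real implementation would presumably test for the term base class instead.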

With all of the above changes, including those changes proposed by Martynas, here are some examples:

```python
from rdflib import Dataset, Graph, URIRef, Literal
from rdflib.namespace import RDFS

# ============================================
# Adding Data to the Dataset
# ============================================

# Initialize the dataset
d = Dataset()

# Add a single triple to the Default Graph, and a single triple to a Named Graph
g1 = Graph()
g1.add(
    (
        URIRef("http://example.com/subject-a"),
        URIRef("http://example.com/predicate-a"),
        Literal("Triple A"),
    )
)
d.add_graph(g1)

# Add a Graph to a Named Graph in the Dataset.
g2 = Graph()
g2.add(
    (
        URIRef("http://example.com/subject-b"),
        URIRef("http://example.com/predicate-b"),
        Literal("Triple B"),
    )
)
d.add_named_graph(uri=URIRef("http://example.com/graph-B"), graph=g2)

# ============================================
# Iterate over the entire Dataset returning triples
# ============================================

for triple in d.triples():
    print(triple)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'))
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'))

# ============================================
# Iterate over the entire Dataset returning quads
# ============================================

for quad in d.quads():
    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'), None)
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# Get the Default graph
# ============================================

dg = d.default_graph # same as current default_context

# ============================================
# Iterate on triples in the Default Graph only
# ============================================

for triple in d.triples(graph="default"):
    # Review comment: I question the usefulness of this. Why not simply
    # d.default_graph.triples()?
    #
    # Reply (Contributor Author): Providing "default_graph" as a convenience
    # necessarily means there will be more than one way to iterate over the
    # triples. There is no functional change from the current classes here,
    # just name changes: you can already call Dataset.triples(context=) and
    # you can also call Dataset.default_context.triples().
    print(triple)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'))

# ============================================
# Access quads in Named Graphs only
# ============================================

for quad in d.quads(graph="named"):
    # Review comment: Shouldn't this be equivalent to simply d.quads()?
    # Since the default graph does not produce quads.
    #
    # Review comment: Or is the graph element of the default graph None?
    #
    # Reply (Contributor Author): Yes, the proposal is to have the "graph" of
    # triples in the default graph set to None.
    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# Equivalent to iterating over graphs()
# ============================================

for ng_name, ng_object in d.graphs().items():
    for quad in d.quads(graph=ng_name):
        print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# Access triples in the Default Graph and specified Named Graphs.
# ============================================

for triple in d.triples(graph=["default", URIRef("http://example.com/graph-B")]):
    # Review comment: d.triples() doesn't really make sense? There should be
    # Graph.triples() and Dataset.quads() only?
    #
    # Reply (Contributor Author): I'm comfortable with it. SPARQL queries in
    # triplestores where named graphs are used frequently omit the graph,
    # only having basic graph patterns, and we understand this to be across
    # all graphs?
    #
    # Review comment: Union graph is an extension feature though, not a
    # feature of an RDF dataset.
    print(triple)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'))
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'))

# ============================================
# Access quads in the Default Graph and specified Named Graphs.
# ============================================

for quad in d.quads(graph=["default", URIRef("http://example.com/graph-B")]):
    # Review comment:
    #   for quad in (q for q in d.quads() if q[3] in (None, URIRef("http://example.com/graph-B"))):
    # Not much longer really.
    #
    # Reply (Contributor Author, @recalcitrantsupplant, Feb 16, 2025): Yes, I
    # think this is the point to get a broader consensus on. The way I see it,
    # if including the graph parameter:
    # Pros:
    #   * can restrict the "named" and "default" enums to only be used in the
    #     graph= attribute, and not in the quads methods.
    #   * can separate concerns a bit better, similar to how dataset clauses
    #     are used in SPARQL. E.g. set up an instance with graph= to restrict
    #     the scope to certain named graphs, then at runtime graphs can be
    #     passed in using quads.
    #   * provides a convenient, clean interface for what is a common pattern
    #     (for me at least!)
    # Cons:
    #   * two ways to do the same thing, as you've pointed out.
    #
    # Review comment: Maybe I'm too used to Jena, where Dataset is used via
    # getDefaultModel and getNamedModel, but I don't really see myself needing
    # the new parameters 🤷‍♂️
    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'), None)
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# "Slice" the dataset on specified predicates. Same can be done on subjects, objects, graphs
# ============================================

filter_preds = [URIRef("http://example.com/predicate-a"), RDFS.label]
for quad in d.quads((None, filter_preds, None, None)):
    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'), None)

# ============================================
# Serialize the Dataset in a quads format.
# ============================================

print(d.serialize(format="nquads"))
# Output:
# <http://example.com/subject-a> <http://example.com/predicate-a> "Triple A" .
# <http://example.com/subject-b> <http://example.com/predicate-b> "Triple B" <http://example.com/graph-B> .
```
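
The review thread weighs a `graph=` selector against filtering the output of `quads()` directly. The sketch below compares the two spellings on toy data, under the proposal's convention that quads from the default graph carry `None` as their graph element. It is not rdflib code: `in_scope` is a hypothetical helper, and plain strings stand in for terms, so the `"default"` and `"named"` keywords would clash with graphs literally named that way (a clash the real proposal avoids because names are `URIRef` or `BNode`).

```python
# Toy comparison of a graph= selector with a plain filter; not rdflib code.
GRAPH_B = "http://example.com/graph-B"
GRAPH_C = "http://example.com/graph-C"

QUADS = [
    ("subject-a", "predicate-a", "Triple A", None),  # default graph
    ("subject-b", "predicate-b", "Triple B", GRAPH_B),
    ("subject-c", "predicate-c", "Triple C", GRAPH_C),
]


def in_scope(name, graph=None):
    """Return True if a quad whose graph element is `name` is selected."""
    if graph is None:  # no restriction
        return True
    if graph == "default":
        return name is None
    if graph == "named":
        return name is not None
    if isinstance(graph, str):  # a single graph name
        return name == graph
    # A list may mix the keywords with graph names.
    return any(in_scope(name, item) for item in graph)


# Spelling 1: the selector discussed in the proposal.
via_parameter = [q for q in QUADS if in_scope(q[3], ["default", GRAPH_B])]

# Spelling 2: the reviewer's suggestion, filtering on the fourth element.
via_filter = [q for q in QUADS if q[3] in (None, GRAPH_B)]

default_only = [q for q in QUADS if in_scope(q[3], "default")]
named_only = [q for q in QUADS if in_scope(q[3], "named")]
```

On this data both spellings select the same quads, which is the reviewer's point; the selector mainly earns its keep when the scope is fixed once and reused across many calls.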
2 changes: 1 addition & 1 deletion devtools/requirements-poetry.in
@@ -1,3 +1,3 @@
# Fixing this here as readthedocs can't use the compiled requirements-poetry.txt
# due to conflicts.
poetry==1.8.4
poetry==1.8.5
4 changes: 1 addition & 3 deletions docker/latest/requirements.in
@@ -1,6 +1,4 @@
# This file is used for building a docker image of the latest rdflib release. It
# will be updated by dependabot when new releases are made.
rdflib==7.1.0
rdflib==7.1.3
html5rdf==1.2.0
# html5lib-modern is required to allow the Dockerfile to build on with pre-RDFLib-7.1.1 releases.
html5lib-modern==1.2.0
10 changes: 4 additions & 6 deletions docker/latest/requirements.txt
@@ -1,16 +1,14 @@
#
# This file is autogenerated by pip-compile with Python 3.12
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
# pip-compile docker/latest/requirements.in
#
html5rdf==1.2
# via
# -r docker/latest/requirements.in
# rdflib
html5lib-modern==1.2
# via -r docker/latest/requirements.in
isodate==0.7.2
# via rdflib
pyparsing==3.0.9
# via rdflib
rdflib==7.1.0
rdflib==7.1.3
# via -r docker/latest/requirements.in
14 changes: 11 additions & 3 deletions docs/apidocs/examples.rst
@@ -3,10 +3,18 @@ examples Package

These examples all live in ``./examples`` in the source-distribution of RDFLib.

:mod:`~examples.conjunctive_graphs` Module
------------------------------------------
:mod:`~examples.datasets` Module
--------------------------------

.. automodule:: examples.datasets
:members:
:undoc-members:
:show-inheritance:

:mod:`~examples.jsonld_serialization` Module
--------------------------------------------

.. automodule:: examples.conjunctive_graphs
.. automodule:: examples.jsonld_serialization
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -81,7 +81,7 @@

# General information about the project.
project = "rdflib"
copyright = "2009 - 2024, RDFLib Team"
copyright = "2002 - 2025, RDFLib Team"

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the