
Add Cyclic Group Representation (CGR) class #182


Open · wants to merge 12 commits into main

Conversation

@Zeldax64 (Contributor) commented May 28, 2025

Introduce a new VSA class. Solves #181.

Description

CGR is very similar to MCR, but differs in bundling. This pull request introduces:

  • A new BaseMCRTensor parent class, shared by MCRTensor and CGRTensor.
  • CGRTensor, which uses a different bundling operation than MCRTensor (a toy sketch of the difference follows this list).
  • Tests for CGR. Since CGR diverges from MCR only in bundling, just a custom bundling test is required.
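
For intuition, here is a toy sketch of how the two bundling styles can differ. The concrete operations below are illustrative assumptions, not torchhd's actual implementations: binding as elementwise modular addition, an MCR-style bundling via phasor summation, and a CGR-style bundling via an elementwise majority vote.

    import torch

    # Toy illustration only; the operations below are assumptions,
    # not torchhd's actual MCR/CGR code.
    # Elements of MCR/CGR hypervectors are integers in [0, block_size).
    block_size = 4
    a = torch.tensor([0, 1, 2, 3, 1])
    b = torch.tensor([1, 1, 3, 3, 2])

    # Binding (shared by both models): elementwise addition modulo block_size.
    bound = (a + b) % block_size

    # Assumed MCR-style bundling: map elements to unit phasors on the complex
    # circle, sum them, and quantize the resulting angle back to an integer.
    angles = 2 * torch.pi * torch.stack((a, b)).float() / block_size
    phasor_sum = torch.exp(1j * angles).sum(dim=0)
    step = 2 * torch.pi / block_size
    mcr_bundle = torch.round(torch.angle(phasor_sum) / step).long() % block_size

    # Assumed CGR-style bundling: elementwise majority vote (mode).
    cgr_bundle = torch.stack((a, b)).mode(dim=0).values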

Additionally, this pull request fixes a bug in MCRTensor's __torch_function__() implementation. This function parses args to find MCRTensors and ensures they all share the same block_size:

    block_sizes = set(a.block_size for a in args if hasattr(a, "block_size"))
    if len(block_sizes) != 1:
        raise RuntimeError(
            f"Call to {func} must contain exactly one block size, got {list(block_sizes)}"
        )

However, this parsing treats args as a flat 1D container, so the following snippet breaks:

    import torch
    from torchhd import embeddings

    id = embeddings.Random(4, 10, vsa='MCR', block_size=4).weight
    t = torch.stack((id[0], id[1]))

In this minimal example, torch.stack() calls __torch_function__ with a nested tuple (the stacked tensors are wrapped in an inner tuple), which the flat parsing cannot handle.
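
One way to handle this is to walk nested containers recursively. The sketch below is not necessarily the exact code merged in this PR (the helper name _collect_block_sizes is hypothetical); it assumes args and func are the arguments received by __torch_function__(), and raises ValueError since a later commit in this PR switches to it:

    def _collect_block_sizes(args):
        # Recursively gather block_size from tensors, including those nested
        # inside tuples/lists such as the one torch.stack() passes.
        sizes = set()
        for a in args:
            if isinstance(a, (tuple, list)):
                sizes |= _collect_block_sizes(a)
            elif hasattr(a, "block_size"):
                sizes.add(a.block_size)
        return sizes

    block_sizes = _collect_block_sizes(args)
    if len(block_sizes) != 1:
        raise ValueError(
            f"Call to {func} must contain exactly one block size, got {list(block_sizes)}"
        )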

Checklist

  • I added/updated documentation for the changes.
  • I have thoroughly tested the changes.

The only remaining problem is in the documentation. Due to the addition of BaseMCRTensor, the documentation of MCR and CGR is split, requiring users to read two manual pages. Also, I wasn't able to add the new CGRTensor and BaseMCRTensor classes to the proper table of contents.

Zeldax64 added 6 commits May 27, 2025 10:11

  • Add a new base class to support future implementations of variations of MCR proposals.
  • Fix a bug when args is a collection of collections instead of a plain tuple: the old args parsing was unable to find block_size in nested structures containing BaseMCRTensor. For instance, let `a` and `b` be BaseMCRTensor variables; calling `torch.stack((a, b))` results in an error, as the `args` received in `__torch_function__()` is a nested tuple.
  • Add a new VSA class named Cyclic Group Representation (CGR). This class is similar to MCR, but differs in bundling.
  • Allow its usage in level and circular embeddings, as done with MCRTensor.
  • Ensure both inputs have the same shape.
  • Implement a custom bundling test: CGR should behave almost the same as MCR, but diverges in bundling.
@mikeheddes (Member) left a comment


Thank you for opening this PR, this would be a great addition to the library. Your implementation looks good, I just added some minor comments. Good catch on the __torch_function__ bug, btw.

@mikeheddes (Member)

It would also be good to add a link to the documentation page of the Cyclic Group Representation to the README. Note that the link will not work now but will work as soon as this PR gets merged (assuming you add the documentation page).

@mikeheddes (Member)

I can think of two approaches to keeping the documentation for each VSATensor on one page. One is to not have the BaseMCRTensor class and simply duplicate the implementation for both the MCRTensor and the CGRTensor. The second is to implement all the methods that need documentation on both child classes simply by calling super, like so:

    def multibind(self) -> "MCRTensor":
        """Bind multiple hypervectors"""
        return super().multibind()

This will add the method and docstring to the documentation page. Then all you have to do to include the CGRTensor in the documentation is to add it to docs/torchhd.rst. I would probably go for the second option.
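
For reference, and assuming docs/torchhd.rst lists the tensor classes in an autosummary block (the file's exact layout may differ), the addition might look like:

    .. autosummary::
        :toctree: generated/

        CGRTensor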

Zeldax64 added 5 commits May 29, 2025 12:50

  • Shorter syntax and more readable.
  • No need to check block size in CGR/MCR functions, as __torch_function__() in BaseMCRTensor already checks it.
  • Raise ValueError instead of RuntimeError.
  • Include CGRTensor in the README and in the built docs.
@Zeldax64 (Contributor, Author)

I've just pushed some commits. Regarding your comments:

Parameter checking

The use of assertions and RuntimeError stems from the current MCRTensor implementation. I just copied and pasted the code into BaseMCRTensor.

Do self and other require the same block_size for the similarity to make sense? If so, we should add the same error handling as in the bind method. However, it might be that the error handling in __torch_function__ already takes care of this, in which case there is no need for error handling in here or in bind

CGR and MCR operations (bind, bundle, and similarity) must always be performed on hypervectors sharing the same block_size. After reading your comments, I took a deeper look into the __torch_function__ calls, and I believe it is safe to remove the type checking from the bind() and bundle() functions, as __torch_function__ already checks the parameters (if anything, more often than necessary).

In fact, there might be a slight performance cost, since __torch_function__ is called for every PyTorch operation, including ones such as __get__ or shape that don't require block_size checking.
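
One possible mitigation, purely a sketch and not something this PR implements, is to only build and compare the set of block sizes when at least two hypervectors are present, so single-tensor calls stay cheap:

    # Hypothetical early exit inside __torch_function__ (not part of this PR):
    # comparing block sizes is only meaningful when two or more hypervectors
    # appear in args, so skip the set construction otherwise.
    tensors = [a for a in args if hasattr(a, "block_size")]
    if len(tensors) > 1:
        block_sizes = set(t.block_size for t in tensors)
        if len(block_sizes) != 1:
            raise ValueError(
                f"Call to {func} must contain exactly one block size, "
                f"got {list(block_sizes)}"
            )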

Documentation

Then all you have to do to include the CGRTensor in the documentation is to add it to docs/torchhd.rst. I would probably go for the second option.

After experimenting with this option, I feel that repeating function signatures isn't the way to go either, as the code is no longer DRY. Maybe we should let users check two doc pages and link CGR and MCR to BaseMCRTensor?

@mikeheddes (Member)

mikeheddes commented Jun 2, 2025

Thank you for the edits.

After experimenting with this option, I feel that repeating function signatures isn't the way to go either, as the code is no longer DRY. Maybe we should let users check two doc pages and link CGR and MCR to BaseMCRTensor?

When weighing user experience against code style, I would suggest we lean towards the better user experience over coding best practices. Said plainly, users don't care whether our code is DRY, but they do get annoyed if they have to read the documentation of one class in two places. So I would suggest we implement all the methods that need documentation on the child classes and have them call super.
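
Applied to CGRTensor, the suggested pattern would look roughly like this sketch (assuming multibundle is one of the methods that needs documentation; the docstring is placeholder text):

    class CGRTensor(BaseMCRTensor):
        def multibundle(self) -> "CGRTensor":
            """Bundle multiple hypervectors."""
            return super().multibundle()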
