Specifying class names for files when using generalize-tsvs #158

Open
ptgolden opened this issue Apr 23, 2025 · 0 comments
Labels
enhancement New feature or request

ptgolden commented Apr 23, 2025

We're trying to create a workflow using Schema Automator that requires as little intervention as possible after generating a schema from multiple TSVs. An issue we've run into is that we can't specify the class names derived from the different files.

When running schemauto generalize-tsvs filea.tsv fileb.tsv filec.tsv, the names of the resulting classes are derived from the individual filenames. That derivation happens here:

for file in files:
    c = os.path.splitext(os.path.basename(file))[0]
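
To make the effect of that derivation concrete, here is a minimal standalone sketch. The helper name derive_class_name is hypothetical and not part of Schema Automator; it only wraps the expression quoted above.

```python
import os


def derive_class_name(path: str) -> str:
    """Return the file's basename with its final extension removed."""
    return os.path.splitext(os.path.basename(path))[0]


# Directory components are dropped, and only the last extension is stripped.
paths = ["data/filea.tsv", "fileb.tsv", "x/filec.tsv.gz"]
print([derive_class_name(p) for p in paths])
# → ['filea', 'fileb', 'filec.tsv']
```

Since the class name depends only on the basename, the only way to influence it from outside is to change the filename itself.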

I wrote a small function to replace CSVDataGeneralizer.convert_multiple that lets me set class names explicitly (and also set some metadata such as id, name, and description, which are not otherwise configurable).

This works fine, except that I can't infer foreign keys this way, because CSVDataGeneralizer.infer_linkages derives class names from filenames in the same way:

for file in files:
    c = os.path.splitext(os.path.basename(file))[0]

Should it be possible to explicitly specify class names here? I'm not sure what a CLI flag would look like that allows this. Maybe something like schemauto generalize-tsvs --class ClassA=filea.tsv --class ClassB=fileb.tsv --class ClassC=filec.tsv.
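
As a rough illustration of how such a flag could be interpreted, here is a sketch that parses repeated NAME=PATH values and falls back to the filename stem when no name is given. This is not the schemauto CLI: it uses argparse purely for demonstration, and parse_class_specs, class_name_for, and the --class option itself are hypothetical.

```python
import argparse
import os


def parse_class_specs(specs):
    """Turn ['ClassA=filea.tsv', ...] into {'filea.tsv': 'ClassA', ...}."""
    mapping = {}
    for spec in specs:
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected NAME=PATH, got {spec!r}")
        mapping[path] = name
    return mapping


def class_name_for(path, overrides):
    """Use the explicit name if one was given, else the filename stem."""
    default = os.path.splitext(os.path.basename(path))[0]
    return overrides.get(path, default)


# Hypothetical parser: --class may be repeated; bare files keep default names.
parser = argparse.ArgumentParser(prog="generalize-tsvs-sketch")
parser.add_argument("--class", dest="class_specs", action="append",
                    default=[], metavar="NAME=PATH")
parser.add_argument("files", nargs="*")

args = parser.parse_args(
    ["--class", "ClassA=filea.tsv", "--class", "ClassB=fileb.tsv", "filec.tsv"]
)
overrides = parse_class_specs(args.class_specs)
all_files = list(overrides) + args.files
print({f: class_name_for(f, overrides) for f in all_files})
# → {'filea.tsv': 'ClassA', 'fileb.tsv': 'ClassB', 'filec.tsv': 'filec'}
```

If both convert_multiple and infer_linkages looked names up through a single function like class_name_for, the two code paths would agree on class names by construction.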

For the time being, I can run the built-in convert_multiple function and then replace any values in the resulting schema in code. (EDIT: That was a poor idea. The better workaround is to create a symbolic link to each file, named after the desired class.)
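
A minimal sketch of the symlink workaround, assuming a POSIX-like system where os.symlink is permitted. The helper link_with_class_names and the example paths are hypothetical; the resulting link paths would then be passed to generalize-tsvs in place of the original files.

```python
import os
import tempfile


def link_with_class_names(class_to_file, link_dir):
    """Create <link_dir>/<ClassName><ext> symlinks pointing at the source files."""
    links = []
    for class_name, src in class_to_file.items():
        ext = os.path.splitext(src)[1]  # keep the original extension, e.g. ".tsv"
        link = os.path.join(link_dir, class_name + ext)
        os.symlink(os.path.abspath(src), link)
        links.append(link)
    return links


# Demonstration with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "filea.tsv")
    with open(src, "w") as fh:
        fh.write("id\tname\n1\tx\n")
    link_dir = os.path.join(tmp, "links")
    os.mkdir(link_dir)
    links = link_with_class_names({"ClassA": src}, link_dir)
    print([os.path.basename(p) for p in links])
    # → ['ClassA.tsv']
```

Because both convert_multiple and infer_linkages derive names from the basename, the linked files yield the desired class names in both places without any change to the library.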

ptgolden added the enhancement (New feature or request) label Apr 23, 2025