Reading relatively big contexts #162

Open
TareqAlbeeshG opened this issue Jan 15, 2025 · 2 comments
Comments

TareqAlbeeshG commented Jan 15, 2025

I have a relatively big context (8124 x 119), stored in Burmeister format.
Reading it takes forever every time. What would be the fastest format to store such a context in, so that it reads faster?

@tomhanika
Owner

Dear Tareq,

I am sorry to hear that. Could you provide some more details about your workflow?

Given the shape of the context, I assume you are loading a scaled version of the mushroom data set.

Using the read-context method with the Burmeister representation, the data set should be available in less than a second:

```clojure
conexp.analysis> (time (context-size (read-context "mushroom.ctx")))
"Elapsed time: 258.607234 msecs"
;; => [8124 119 0.1932773109243697]
```

I therefore suspect that you are loading the data set through the GUI. I just tested this and can confirm that it takes about 30 seconds. However, most of that time is spent rendering the data in the GUI, not reading the file. I am not sure whether this can be remedied soon.
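To illustrate why parsing alone is cheap, here is a minimal Python sketch that reads a Burmeister-style file in a single pass and times it. It is independent of conexp-clj and is not its implementation. The function name `read_burmeister` is hypothetical, and the layout it expects (a `B` line, a name line, the two counts, a blank line, the object names, the attribute names, then one row of `X`/`.` per object) is an assumption of this sketch.

```python
import io
import time


def read_burmeister(stream):
    """Parse a Burmeister-style context from a text stream.

    Assumed layout (an assumption of this sketch): line "B", a name line,
    the object count, the attribute count, a blank line, the object names,
    the attribute names, then one row of 'X'/'.' characters per object.
    """
    lines = [line.rstrip("\r\n") for line in stream]
    if not lines or lines[0].strip() != "B":
        raise ValueError("not a Burmeister file: first line must be 'B'")
    n_objects = int(lines[2])
    n_attributes = int(lines[3])
    start = 5  # skip the blank line that follows the two counts
    objects = lines[start:start + n_objects]
    attributes = lines[start + n_objects:start + n_objects + n_attributes]
    rows = lines[start + n_objects + n_attributes:][:n_objects]
    # Collect one (object, attribute) pair per cross in the table.
    incidence = {
        (objects[i], attributes[j])
        for i, row in enumerate(rows)
        for j, cell in enumerate(row[:n_attributes])
        if cell in "xX"
    }
    return objects, attributes, incidence


# Tiny made-up context: 2 objects, 3 attributes, 4 crosses.
sample = "B\n\n2\n3\n\ng1\ng2\na\nb\nc\nX.X\n.XX\n"
t0 = time.perf_counter()
objects, attributes, incidence = read_burmeister(io.StringIO(sample))
elapsed_ms = (time.perf_counter() - t0) * 1000
print(len(objects), len(attributes), len(incidence), f"{elapsed_ms:.3f} ms")
```

The work is proportional to the number of table cells, so timing a reader like this on its own, without any display step, is one way to check whether the file format or the rendering is the slow part.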

Best regards

Tom

itchy2385 commented Feb 25, 2025

Dear Tom,

I am also running into a timeout when using the REST API context endpoint: REST-API#context-JSON. The problem seems to be that the incidence data array grows very quickly as the context gets denser.
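A rough Python sketch of that growth, for illustration only: the number of incidence pairs is objects × attributes × density, and a body that lists every pair grows with it. The JSON layout used here (a single `"incidence"` key holding `[object, attribute]` pairs) and the labels `g0`, `m0`, ... are hypothetical, not the REST API's actual schema. The first call uses the size and density reported for the mushroom context earlier in this thread.

```python
import json


def incidence_count(n_objects, n_attributes, density):
    """Number of crosses in a context of the given size and density."""
    return round(n_objects * n_attributes * density)


def json_pairs_size(n_objects, n_attributes, density):
    """Byte length of a hypothetical JSON body that lists incidence pairs.

    The crosses are placed row by row; only their number matters here.
    """
    n_pairs = incidence_count(n_objects, n_attributes, density)
    pairs = [
        [f"g{k // n_attributes}", f"m{k % n_attributes}"]
        for k in range(n_pairs)
    ]
    return len(json.dumps({"incidence": pairs}).encode("utf-8"))


# Size and density from the context-size output above: 8124 * 23 crosses.
print(incidence_count(8124, 119, 0.1932773109243697))  # 186852

# Same context size, increasing density: the body grows with the pair count.
for density in (0.05, 0.2, 0.5):
    print(density, json_pairs_size(200, 50, density))
```

Under these assumptions, two contexts of the same dimensions can produce request bodies of very different length, which would be consistent with only the denser data set hitting the timeout.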

To reproduce the problem, compare the internal read request with the API request for the following two data sets. The first is seasoningplanner_de.cxt, a publicly accessible artificial data set. The second is clothing.cxt.txt (please rename it to *.cxt), a more realistic data set of the same size but with a denser incidence structure.
Reading the second data set via the REST-API-json endpoint produces the following request body, which results in a timeout exception:
body.json
