Support SQL? #341
Comments
I have played with this quite a bit and my current thinking is written down here: tozd/node-reactive-postgres#7 (What would be needed to make this efficient.)
@mitar can we not use a subset of functions from an SQL ORM to extract the fields that are going to be changed from an SQL statement?
Which SQL ORM? Of course, you can always limit the SQL language and support only a subset. But I think that even with just joins and some aggregations (like GROUP BY) you will already run into complications. Once you have support for joins and aggregations, the rest is easy.
@mitar I understand what you are referring to: how to "follow those changes" for reactive graphs. That is a hard problem, and it has been solved in publish-composite. Can we employ the same strategy, but without actual joins in SQL?
I mean, I think it really depends on what type of support for SQL you want to provide.

BTW, properly supporting "following changes" is a known problem in the SQL community and is called "incremental materialized view maintenance". There are research papers on this topic providing a lot of useful information. So it is not impossible; one just has to sit down and implement the logic for how changes pass through.

You can use triggers to detect any change on the base tables and then propagate those changes into redis oplog (or just use LISTEN/NOTIFY, which PostgreSQL already supports, so there is no need for Redis at all). But then you have to compute how those changes influence reactive queries. If you see those reactive queries as materialized views (only materialized on the client side), then you just have to "incrementally maintain" them, in contrast with re-running the whole query.
So for MySQL there is this package by @vlasky. I checked it but could not figure out whether it propagates changes or just reruns the query. If it is based on numtel's package, and if numtel's package for MySQL is similar to the one for PostgreSQL, then it reruns. And then my package is faster. :-) At least for PostgreSQL it is.
With the MySQL package, you provide your own custom trigger functions for each reactive query. These are evaluated for every event in the binlog. If the trigger function returns true for a given binlog event, the package re-runs the SQL query, subject to the minimum result fetch interval minInterval having elapsed. The data published to the client is a key-based diff of the latest query result against the last triggered query result. The key can be derived from a single column, a combination of columns, or the row index, based on the options you choose.
OK, so this is what I mentioned here as an option:
Also:
Incremental materialized view maintenance is better here, because it computes directly how to update the results, without having to re-run the whole query.
I updated my previous comment to mention the throttling effect of the minimum result fetch interval setting minInterval on triggering a reactive MySQL query.

In my unreleased experimental version of vlasky:mysql, one can instantiate a new object called a TableCache, which is an in-memory copy of a specified MySQL table in its entirety, i.e. initialised using SELECT * FROM the_table. MySQL binlog events are then used to keep that in-memory copy up to date without having to re-query the database.

If I understood @mitar correctly, TableCache would be a highly simplified form of the "incremental materialized view maintenance" he described, but I can't imagine how one would make a comprehensive solution that supports queries with JOINs, aggregate functions, and nested queries without going to the great effort of implementing what is essentially a JavaScript MySQL engine. Such a complex engine would then consume additional CPU time and memory in Node.js.

To date, we have not found re-querying MySQL to be such a big deal. vlasky:mysql uses prepared statements to reduce execution-planning overhead and connection pooling to leverage the power of a MySQL database server with multiple CPU cores, and we always ensure our queries are engineered to use indexes efficiently.
It is not a MySQL engine. It is just a dirty-flag propagation computation: you only have to compute which rows are changed by which operations. It can be done by compiling a query into a corresponding trigger, which then runs on every change and outputs which rows should be refetched.

But yes, there is a reason why only commercial SQL products offer incremental materialized view maintenance: it is hard to implement. They offer it because it is a performance advantage.

I would claim that your approach of keeping an in-memory copy is closest to a JavaScript MySQL engine. :-) Especially if you wanted to properly support queries doing joins across those in-memory copies. I do something similar with reactive-postgresql, but instead of keeping a copy in memory in JavaScript, I use a temporary table in PostgreSQL and then use triggers to update it. So there is no need for the oplog; you just use triggers. SQL databases have those.
Let's discuss