Skip to content

Optimize binary operators with equal children even if side effect #7460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
29 changes: 16 additions & 13 deletions src/passes/OptimizeInstructions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -856,13 +856,11 @@ struct OptimizeInstructions
if (auto* ret = replaceZeroBitsWithZero(curr)) {
return replaceCurrent(ret);
}
// finally, try more expensive operations on the curr in
// the case that they have no side effects
if (!effects(curr->left).hasSideEffects()) {
if (ExpressionAnalyzer::equal(curr->left, curr->right)) {
if (auto* ret = optimizeBinaryWithEqualEffectlessChildren(curr)) {
return replaceCurrent(ret);
}
// finally, try more expensive operations on the curr
// regardless of whether they have side effects or not.
if (areConsecutiveInputsEqual(curr->left, curr->right)) {
if (auto* ret = optimizeBinaryWithEqualChildren(curr)) {
return replaceCurrent(ret);
}
}

Expand Down Expand Up @@ -5177,16 +5175,18 @@ struct OptimizeInstructions
return nullptr;
}

// given a binary expression with equal children and no side effects in
// either, we can fold various things
Expression* optimizeBinaryWithEqualEffectlessChildren(Binary* binary) {
// given a binary expression with equal children, we can fold various things
// regardless of side effects.
Expression* optimizeBinaryWithEqualChildren(Binary* binary) {
// TODO add: perhaps worth doing 2*x if x is quite large?
switch (binary->op) {
case SubInt32:
case XorInt32:
case SubInt64:
case XorInt64:
return LiteralUtils::makeZero(binary->left->type, *getModule());
return getDroppedChildrenAndAppend(
binary->left,
LiteralUtils::makeZero(binary->left->type, *getModule()));
case NeInt32:
case LtSInt32:
case LtUInt32:
Expand All @@ -5197,7 +5197,8 @@ struct OptimizeInstructions
case LtUInt64:
case GtSInt64:
case GtUInt64:
return LiteralUtils::makeZero(Type::i32, *getModule());
return getDroppedChildrenAndAppend(
binary->left, LiteralUtils::makeZero(Type::i32, *getModule()));
case AndInt32:
case OrInt32:
case AndInt64:
Expand All @@ -5213,7 +5214,9 @@ struct OptimizeInstructions
case LeUInt64:
case GeSInt64:
case GeUInt64:
return LiteralUtils::makeFromInt32(1, Type::i32, *getModule());
return getDroppedChildrenAndAppend(
binary->left,
LiteralUtils::makeFromInt32(1, Type::i32, *getModule()));
default:
return nullptr;
}
Expand Down
56 changes: 9 additions & 47 deletions test/lit/ctor-eval/return_call.wast
Original file line number Diff line number Diff line change
Expand Up @@ -448,40 +448,15 @@
;; CHECK-NEXT: )
(module
;; Return call to self with different params, then stop evaluating.
;; CHECK: (type $0 (func (param i32)))
;; CHECK: (type $0 (func))

;; CHECK: (type $1 (func))
;; CHECK: (type $1 (func (param i32)))

;; CHECK: (import "env" "import" (func $import (type $1)))
;; CHECK: (import "env" "import" (func $import (type $0)))
(import "env" "import" (func $import))

;; CHECK: (global $g (mut i32) (i32.const 42))
(global $g (mut i32) (i32.const 0))

;; CHECK: (export "test" (func $test_2))

;; CHECK: (func $test (type $0) (param $0 i32)
;; CHECK-NEXT: (global.set $g
;; CHECK-NEXT: (local.get $0)
;; CHECK-NEXT: )
;; CHECK-NEXT: (if
;; CHECK-NEXT: (i32.eq
;; CHECK-NEXT: (local.get $0)
;; CHECK-NEXT: (i32.const 42)
;; CHECK-NEXT: )
;; CHECK-NEXT: (then
;; CHECK-NEXT: (call $import)
;; CHECK-NEXT: )
;; CHECK-NEXT: (else
;; CHECK-NEXT: (return_call $test
;; CHECK-NEXT: (i32.add
;; CHECK-NEXT: (local.get $0)
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization looks wrong - perhaps related to the issue with binary in my other comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this as soon as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code and tests as you suggested. However, it still the same for the test/lit/ctor-eval/return_call.wast. Thus this looks-wrong optimization is not related to the issue you mentioned (already fixed now). Is there any way to make it clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to debug it is to see which part of this PR causes that change. Try doing just one part of the PR and seeing if it happens. Sort of like bisection. Once we know the specific part of the PR, figuring it out may be simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After debugging, the issue was caused by the logic handling the case EqInt32:

      case EqInt32:    // Issue Caused Here
      ... fallthrough...
        return getDroppedChildrenAndAppend(
          binary, LiteralUtils::makeFromInt32(1, Type::i32, *getModule()));

In more detail, the issue lies in the logic dealing with the EqInt32 (side_effect, pure).
The following is the debugging process code for EqInt32, please focus on line6-11:

  1 case EqInt32: {
  2 if (effects(binary->left).hasSideEffects()) {
  3   if (effects(binary->right).hasSideEffects()) {
  4     return getDroppedChildrenAndAppend(
  5       binary, LiteralUtils::makeFromInt32(1, Type::i32, *getModule()));
  6   } else {
  7     // This part causes that issue!
  8     // If we replace it to `return nullptr`, that's ok
  9     return getDroppedChildrenAndAppend(
 10       binary->left, LiteralUtils::makeFromInt32(1, Type::i32, *getModule()));
 11   }
 12 } else { 
 13   return getDroppedChildrenAndAppend(
 14     binary->right, LiteralUtils::makeFromInt32(1, Type::i32, *getModule()));
 15 }
 16 return LiteralUtils::makeFromInt32(1, Type::i32, *getModule());
 17 };

The logic in the else block (lines6-11) appears to be the cause of the issue. Replacing it with return nullptr resolves the problem, and everything works as expected. However, I’m unclear that why the pattern EqInt32 (side_effect, pure) specifically causes failures in the wasm2js checks.

In my understanding, the code logic

return getDroppedChildrenAndAppend(
  binary, LiteralUtils::makeFromInt32(1, Type::i32, *getModule()));

is able to handle all the other cases, such as EqInt64. Why doesn’t it work for EqInt32 (side_effect, pure)?

Is it my PR that is wrong, or does the wasm2js check really need to be updated? Could you please check it carefully?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's wrong. Your code looks right.

Perhaps see if the optimizations in ctor-eval matter,

// Do some useful optimizations after the evalling
{
PassRunner passRunner(&wasm);
passRunner.add("memory-packing"); // we flattened it, so re-optimize
// TODO: just do -Os for the one function
passRunner.add("remove-unused-names");
passRunner.add("dce");
passRunner.add("merge-blocks");
passRunner.add("vacuum");
passRunner.add("remove-unused-module-elements");
passRunner.run();
}
}

I would check if removing them makes a difference.

(func $test (export "test") (param i32)
(global.set $g
(local.get 0)
Expand All @@ -506,24 +481,11 @@
)
)

;; CHECK: (func $test_2 (type $0) (param $0 i32)
;; CHECK-NEXT: (if
;; CHECK-NEXT: (i32.eq
;; CHECK-NEXT: (local.tee $0
;; CHECK-NEXT: (i32.const 42)
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.const 42)
;; CHECK-NEXT: )
;; CHECK-NEXT: (then
;; CHECK-NEXT: (call $import)
;; CHECK-NEXT: )
;; CHECK-NEXT: (else
;; CHECK-NEXT: (return_call $test
;; CHECK-NEXT: (i32.add
;; CHECK-NEXT: (local.get $0)
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK: (export "test" (func $test_2))

;; CHECK: (func $test_2 (type $1) (param $0 i32)
;; CHECK-NEXT: (local.set $0
;; CHECK-NEXT: (i32.const 42)
;; CHECK-NEXT: )
;; CHECK-NEXT: (call $import)
;; CHECK-NEXT: )
38 changes: 20 additions & 18 deletions test/lit/passes/optimize-instructions-ignore-traps.wast
Original file line number Diff line number Diff line change
Expand Up @@ -563,9 +563,11 @@
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.const 3)
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.ne
;; CHECK-NEXT: (local.tee $1
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: (block (result i32)
;; CHECK-NEXT: (drop
;; CHECK-NEXT: (local.tee $1
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: )
Expand Down Expand Up @@ -618,26 +620,26 @@
;; CHECK: (func $invalidate-conditionalizeExpensiveOnBitwise-ok (type $0) (param $0 i32) (param $1 i32) (result i32)
;; CHECK-NEXT: (if
;; CHECK-NEXT: (i32.eqz
;; CHECK-NEXT: (if (result i32)
;; CHECK-NEXT: (local.tee $1
;; CHECK-NEXT: (i32.and
;; CHECK-NEXT: (block (result i32)
;; CHECK-NEXT: (drop
;; CHECK-NEXT: (local.tee $1
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: )
;; CHECK-NEXT: (then
;; CHECK-NEXT: (i32.lt_u
;; CHECK-NEXT: (i32.and
;; CHECK-NEXT: (i32.extend8_s
;; CHECK-NEXT: (i32.sub
;; CHECK-NEXT: (local.get $0)
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.lt_u
;; CHECK-NEXT: (i32.and
;; CHECK-NEXT: (i32.extend8_s
;; CHECK-NEXT: (i32.sub
;; CHECK-NEXT: (local.get $0)
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.const 255)
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.const 3)
;; CHECK-NEXT: (i32.const 255)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: (else
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: (i32.const 3)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: )
Expand Down
31 changes: 16 additions & 15 deletions test/lit/passes/optimize-instructions-mvp.wast
Original file line number Diff line number Diff line change
Expand Up @@ -10541,11 +10541,13 @@
)
;; CHECK: (func $add-sub-zero-reorder-2 (param $temp i32) (result i32)
;; CHECK-NEXT: (i32.add
;; CHECK-NEXT: (i32.sub
;; CHECK-NEXT: (local.tee $temp
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: (block (result i32)
;; CHECK-NEXT: (drop
;; CHECK-NEXT: (local.tee $temp
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: (local.get $temp)
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: )
;; CHECK-NEXT: (i32.const 2)
;; CHECK-NEXT: )
Expand All @@ -10556,8 +10558,8 @@
(local.tee $temp ;; in this order, the tee already comes first, so all is good for the optimization
(i32.const 1)
)
(i32.sub
(i32.const 0)
(i32.sub ;; replace optimized sub with a const zero because the operations are identical
(i32.const 0) ;; while preserving the side effect
(local.get $temp)
)
)
Expand Down Expand Up @@ -13874,11 +13876,8 @@
;; CHECK-NEXT: (drop
;; CHECK-NEXT: (i32.or
;; CHECK-NEXT: (local.get $x)
;; CHECK-NEXT: (i32.or
;; CHECK-NEXT: (local.tee $x
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: (local.get $x)
;; CHECK-NEXT: (local.tee $x
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: )
Expand All @@ -13896,11 +13895,13 @@
;; CHECK-NEXT: (drop
;; CHECK-NEXT: (i32.xor
;; CHECK-NEXT: (local.get $x)
;; CHECK-NEXT: (i32.xor
;; CHECK-NEXT: (local.tee $x
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: (block (result i32)
;; CHECK-NEXT: (drop
;; CHECK-NEXT: (local.tee $x
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: (local.get $x)
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: )
;; CHECK-NEXT: )
;; CHECK-NEXT: )
Expand Down
Loading