Fold shaders doing "a * b + c" on integers from the pattern generated by
Nvidia's GL compiler.
On a somewhat complex compute shader it reduces the code size by 16
instructions from 2 matches on Turing GPUs.
On Intel as extracted from KHR_pipeline_executable_properties:
Before the optimization:
```
Instruction Count: 2057
Basic Block Count: 45
Scratch Memory Size: 14752
Spill Count: 232
Fill Count: 261
SEND Count: 610
Cycle Count: 11325
```
After the optimization:
```
Instruction Count: 2046
Basic Block Count: 44
Scratch Memory Size: 13728
Spill Count: 219
Fill Count: 268
SEND Count: 604
Cycle Count: 11367
```