Why did the compiler do that - the one with VRP and integer size

Here’s a simple program - it right shifts a signed int by 9 bits, truncates it to an unsigned char, and returns 1 or 0 after comparing the result with 0x74.

$ cat test.c
extern int e;
int main() {
unsigned char result = e >> 9; // result = 0x74;

if ((int)result != 0x74)
  return 1;
return 0;

Let’s compile this for the avr target, choosing to optimize for size.

$ avr-gcc -mmcu=avr51 -Os test.c -S

Well, this can’t be that complicated - in the generated code, we wll probably see code for right shifting followed by a compare and branch.

$ cat test.s
        ldi r24,lo8(1)
        ldi r25,0

All the code does is return 1 unconditionally - no comparison to be seen anywhere. That’s all the code there is.

Wat? Why did the compiler do that?

Let’s dig in deeper. Is it the frontend/middle-end or the backend that is doing this? Let’s try dumping output from the final pass of the middle-end.

$ avr-gcc -mmcu=avr51 -Os test.c -S -fdump-tree-optimized
$ cat test.c.219t.optimized 

;; Function main (main, funcdef_no=0, decl_uid=1494, cgraph_uid=0, symbol_order=0) (executed once)

main ()
  <bb 2>:
  return 1;


No code for the if statement there, so that rules out backend passes and the avr target code. How do we find out which of the many front-end/middle-end passes did this?

Binary search (or git-bisect) style analysis is how. Let’s dump output from all the tree passes.

$ avr-gcc -mmcu=avr51 -Os test.c -S -fdump-tree-all
$ ls
test.c                           test.c.034t.esra               test.c.094t.forwprop2   test.c.117t.isolate-paths  test.c.172t.thread3
test.c.001t.tu                   test.c.035t.ealias             test.c.095t.objsz2      test.c.118t.phicprop1      test.c.173t.dom3
test.c.002t.class                test.c.036t.fre1               test.c.096t.alias       test.c.119t.dse2           test.c.175t.thread4
test.c.003t.original             test.c.037t.evrp               test.c.097t.retslot     test.c.120t.reassoc1       test.c.176t.vrp2
test.c.004t.gimple               test.c.038t.mergephi1          test.c.098t.fre3        test.c.121t.dce3           test.c.177t.phicprop2
test.c.006t.omplower             test.c.039t.dse1               test.c.099t.mergephi2   test.c.122t.forwprop3      test.c.178t.dse3
test.c.007t.lower                test.c.040t.cddce1             test.c.100t.thread1     test.c.123t.phiopt2        test.c.179t.cddce2
test.c.010t.eh                   test.c.041t.eipa_sra           test.c.101t.vrp1        test.c.124t.ccp3           test.c.180t.forwprop4
test.c.011t.cfg                  test.c.042t.tailr1             test.c.103t.dce2        test.c.125t.sincos         test.c.181t.phiopt3
test.c.012t.ompexp               test.c.043t.switchconv         test.c.104t.stdarg      test.c.126t.bswap          test.c.182t.fab1
test.c.018t.fixup_cfg1           test.c.045t.profile_estimate   test.c.105t.cdce        test.c.127t.laddress       test.c.183t.widening_mul
test.c.019t.ssa                  test.c.046t.local-pure-const1  test.c.107t.copyprop1   test.c.128t.lim2           test.c.184t.tailc
test.c.021t.nothrow              test.c.047t.fnsplit            test.c.108t.ifcombine   test.c.129t.crited1        test.c.185t.dce7
test.c.026t.fixup_cfg3           test.c.048t.release_ssa        test.c.109t.mergephi3   test.c.130t.pre            test.c.186t.crited2
test.c.027t.inline_param1        test.c.049t.inline_param2      test.c.110t.phiopt1     test.c.131t.sink           test.c.188t.uncprop1
test.c.028t.einline              test.c.086t.fixup_cfg4         test.c.111t.tailr2      test.c.135t.dce4           test.c.189t.local-pure-const2
test.c.029t.early_optimizations  test.c.088t.oaccdevlow         test.c.112t.ch2         test.c.136t.fix_loops      test.c.218t.nrv
test.c.030t.objsz1               test.c.090t.ccp2               test.c.113t.cplxlower1  test.c.162t.no_loop        test.c.219t.optimized
test.c.031t.ccp1                 test.c.091t.cunrolli           test.c.114t.sra         test.c.165t.veclower21     test.c.300t.statistics
test.c.032t.forwprop1            test.c.092t.backprop           test.c.115t.thread2     test.c.168t.reassoc2       testresults
test.c.033t.ethread              test.c.093t.phiprop            test.c.116t.dom2        test.c.169t.slsr           test.s

Let’s pick an early pass (test.c.003t.original) dump and see if it has code for the if statement.

$ cat *.original

  unsigned char result = (unsigned char) (e >> 9);

    unsigned char result = (unsigned char) (e >> 9);
  if (result != 116)
      return 1;
  return 0;
return 0;

It clearly is there, so let’s pick a later pass that runs half way done the line, say test.c.116t.dom2, because 220 divided by 2 is 116 in hexadecimal. Just kidding!

$ cat *.dom2

main ()
  unsigned char result;

  <bb 2>:
  return 1;


Gone now, so let’s pick one that runs half way between between 003t.original and 116t.dom2. Rinse and repeat, and eventually we find that test.c.101t.vrp1 removes it.

$ cat *.vrp1

Value ranges after VRP:

e.0_1: VARYING
_2: [-64, 63]
_3: [1, 1]
result_5: ~[64, 191]

Folding predicate result_5 != 116 to 1
Removing basic block 3
Merging blocks 2 and 4
main ()
  unsigned char result;
  int e.0_1;
  int _2;

  <bb 2>:
  e.0_1 = e;
  _2 = e.0_1 >> 9;
  result_5 = (unsigned char) _2;
  return 1;


VRP (value range propagation) pass generates this dump, and you can clearly see “Folding predicate result_5 != 116 to 1” there. It somehow figured out the comparison will always succeed and optimized it away.

But why did it do that?

Again, the dump explains why - VRP also says

result_5: ~[64, 191]

VRP figured out result_5 cannot take values between 64, 191. The tilde represents an anti-range; as the name implies, it’s the range of values a variable cannot take. 0x74 (116 decimal) is in the anti-range, so VRP rightly concluded that the if statement is always true.

How did the VRP pass deduce the anti-range though?

Well, we could dig into the code (gcc/gcc/tree-vrp.c), or we could try to reason it out ourselves. Let’s try the latter - if we’re stuck, we can always fall back to the source.

Let’s see, the code rights shifts a signed integer (e) by 9 bits and assigns it to and unsigned char (result). A signed integer is 16 bits on the AVR. If you right shift that by 9 bits, you have to end up with one of the below patterns for the integer, because of the way sign extension works.

non-negative int  0000 0000 00xx xxxx
negative int      1111 1111 11xx xxxx

If you had a non-negative integer, the top 9 bits will be zeros, because the MSB would have been zero before shifting. For a negative integer, the MSB would have been one, so the top 9 bits will end up as ones. Ok, we’ve made some headway - what could we deduce further?

Well, the C code truncates the integer to an unsigned char before promoting it back as an int, so we really only need to look at the lower 8 bits.

non-negative int   00xx xxxx
negative int       11xx xxxx

If it must be one those two bit patterns, then it cannot be the bit patterns 01xx xxxx and 10xx xxxx, for any combination of 0’s and 1’s for x. Let’s try substituting all zeros and all ones to find out what range those bit patterns represent.

0100 0000 = 0x40 = 64
1011 1111 = 0xBF = 191

Bingo - that is exactly the same range that the VRP pass deduced. The VRP pass is right, that variable can never hold the value 0x74. so it is perfectly ok to fold the conditional and eventually delete the if statement.

Whew, now we know why the compiler did that!

All this came from a testcase (gcc/gcc/testsuite/gcc.dg/torture/pr69941.c) in the gcc testsuite that fails only for the avr target.

$ cat 
/* { dg-do run } */

int a = 0;
int b = 0;
int c = 0;
int e = 0;
int f = 0;
int *g = &e;

int fn1() { return b ? a : b; }

int main() {
  int h = fn1() <= 0x8000000000000000ULL; // h = 1;

  int k = f; // k = 0;

  long i = h ? k : k / h; // i = 0;

  long l = (unsigned short)(i - 0x1800); // l = 0xe800

  i = l ? l : c; // i = 0xe800;

  *g = i; // *g = 0xe800; e = 0xe800;

  unsigned char result = e >> 9; // result = 0x74;

  if ((int)result != 0x74)
    __builtin_abort ();
  return 0;

The test compiles just fine. When run, the code calls abort, gcc testsuite speak for execution failure. We now know why - the last few lines of the testcase are the same as the code we were looking.

The code generated for that part looks like ths.

        lds r30,g
        lds r31,g+1
        std Z+1,r25
        st Z,r24
        call abort

Five lines down from .L9, there’s the unconditional call to abort.

Simple to fix though, all we have to do is ensure the integers involved are more than 16 bits wide. That is what this commit does.

comments powered by Disqus