I compared two GMP-ECM 6.3 builds under Linux, one compiled with GMP 5.0.1 and the other with GMP 4.1.4.

I got several strange results. Overall, GMP 5.0.1 is 5-15% faster, but with B1=11e6, for some input sizes (I tested 100-300 digits), 4.1.4 was faster. Some examples follow.


1. C121 from near-repdigits
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=334640802
Step 1 took 36869ms
Step 2 took 19737ms
GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2340904304
Step 1 took 35097ms
Step 2 took 33626ms
Step 2 is significantly slower with GMP 5.0.1.
2. C156 from aliquot seq 283752:i7004
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=4153245810
Step 1 took 55526ms
Step 2 took 26975ms
GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2955949299
Step 1 took 57614ms
Step 2 took 39257ms
Again, step 2 is much slower with GMP 5.0.1.
3. C209 from near-repdigits
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2560444052
Step 1 took 75055ms
Step 2 took 36402ms
GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=3908589128
Step 1 took 76103ms
Step 2 took 46634ms
Step 2 with GMP 5.0.1 is about 10 s slower.
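To put the B1=11e6 step 2 gap in numbers, here is a small Python sketch that recomputes the ratios from the timings above. It only does arithmetic on the posted values; the dictionary layout and the `compare` helper are my own illustration, not part of GMP-ECM.

```python
# B1=11e6 timings (ms) copied from the runs above: (step1, step2) per build.
runs = {
    "C121": {"4.1.4": (36869, 19737), "5.0.1": (35097, 33626)},
    "C156": {"4.1.4": (55526, 26975), "5.0.1": (57614, 39257)},
    "C209": {"4.1.4": (75055, 36402), "5.0.1": (76103, 46634)},
}

def compare(old, new):
    """Return (step 2 ratio new/old, step 2 difference in ms, total ratio new/old)."""
    s2_ratio = new[1] / old[1]
    s2_diff = new[1] - old[1]
    total_ratio = sum(new) / sum(old)
    return s2_ratio, s2_diff, total_ratio

results = {}
for name, t in runs.items():
    results[name] = compare(t["4.1.4"], t["5.0.1"])
    r, d, tot = results[name]
    print(f"{name}: step 2 x{r:.2f} ({d:+d} ms), step 1+2 x{tot:.2f}")
```

Step 1 is within a few percent either way, so nearly all of the per-curve difference comes from step 2, where the GMP 5.0.1 build is roughly 1.3x to 1.7x slower on these three inputs.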
With B1=3e6 everything is OK: 5.0.1 is slightly faster than 4.1.4.
1. C121
Step 1 took 9562ms
Step 2 took 4803ms
vs.
Step 1 took 10009ms
Step 2 took 6219ms
2. C156
Step 1 took 15440ms
Step 2 took 6315ms
vs.
Step 1 took 15102ms
Step 2 took 8532ms
3. C209
Step 1 took 20846ms
Step 2 took 8188ms
vs.
Step 1 took 20306ms
Step 2 took 11598ms
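The same arithmetic can be applied to the B1=3e6 timings. The listings above are not labeled by build, so this sketch only compares the second listed run against the first; the names `runs_3e6` and `ratios` are illustrative. With these numbers, step 1 times agree within about 5%, while the step 2 times differ by roughly 30-40%.

```python
# B1=3e6 timings (ms) as posted: (step1, step2) for the first and second listing.
runs_3e6 = {
    "C121": ((9562, 4803), (10009, 6219)),
    "C156": ((15440, 6315), (15102, 8532)),
    "C209": ((20846, 8188), (20306, 11598)),
}

ratios = {}
for name, (first, second) in runs_3e6.items():
    # A ratio > 1 means the second listed run took longer for that step.
    s1 = second[0] / first[0]
    s2 = second[1] / first[1]
    ratios[name] = (s1, s2)
    print(f"{name}: step 1 x{s1:.3f}, step 2 x{s2:.3f}")
```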

I repeated the tests 10 times and always got the same results. What's wrong?

Compile options: --enable-openmp --with-gmp=/usr/local/ --enable-shellcmd --enable-sse2 --enable-asm-redc

Test system: Xeon E5620 @ 2.40GHz, CentOS 5.5 x86_64, kernel 2.6.18