This is the mail archive of the mailing list for the Cygwin project.

default alignments restricted to 32 bits?

I've been experimenting in the hope of clearing up some of the confusion 
about alignments.  I'm running the cygwin 19990819 snapshot, but I don't 
think that makes a difference.  Here are my observations:

Cygwin does not provide a way to get 64-bit data alignment, except by Fortran 
COMMON blocks or the equivalent.  Some commercial compilers share this 
"feature."  I believe some say that the Microsoft ABI forbids any attempt to 
improve on this, although this presents a significant performance problem.  
128-bit alignment is required to get satisfactory performance with C long 
double, but proposals to support any such storage in g77 have been shot down 
for now.
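
For anyone who wants to check the data alignment they actually get, here is a 
minimal sketch in C.  The helper names misalignment() and align_up() are 
hypothetical, not part of cygwin or gcc.  The second helper shows the usual 
workaround of over-allocating a buffer and rounding the pointer up by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes by which p sits past the previous multiple of boundary.
   boundary must be a power of two: 8 for 64-bit, 16 for 128-bit. */
static size_t misalignment(const void *p, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (size_t)((uintptr_t)p & (boundary - 1));
}

/* Round p up to the next multiple of boundary.  The caller must have
   allocated at least boundary - 1 spare bytes beyond what it needs. */
static void *align_up(void *p, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    uintptr_t a = (uintptr_t)p;
    return (void *)((a + boundary - 1) & ~(uintptr_t)(boundary - 1));
}
```

Calling misalignment(&x, 8) on a static or automatic double reports 0 when the 
variable is 64-bit aligned and 4 when it has only the 32-bit alignment 
discussed above.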

The combination of gcc-2.95.1 and cygwin binutils did not configure code 
alignments properly.  Installing a recent binutils snapshot on cygwin and 
re-configuring 2.95.1 produces the expected .p2align 4,,7 scheme in the code 
generated by 2.95.1, but generates .align 16 with 2.96.  Either is an 
improvement over the code I obtained with the cygwin binutils.  However, 
since cygwin does not support 128-bit alignments, neither takes full effect 
at link time.  On my test cases, there is a net performance deficit of 5% 
associated with improper code alignments, using the p2align 4,,7 code and 
comparing with its performance on the same P II under linux.  Individual 
loops are affected by up to 30%, but some of what can be gained by changing 
alignment in one place is nearly always lost somewhere else.  The effect 
seems not so large when running on a P III Xeon, but the Xeon box doesn't 
have a decent timer for linux. I use the QueryPerformance..() in NT as I 
normally do in W95, as clock() is not useful on that box.  The results I get 
under NT, W2K, and W95 are consistent, given that all possible background 
processes have been shut off.
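
To make the .p2align 4,,7 scheme concrete: the GNU assembler pads to the next 
16-byte boundary only when that takes 7 bytes or fewer, and otherwise emits no 
padding at all.  The following C sketch reproduces that arithmetic; the 
function name p2align_padding() is hypothetical and the code is only an 
illustration, not anything taken from gcc or binutils.

```c
#include <assert.h>
#include <stdint.h>

/* Padding bytes that ".p2align pow2,,max_skip" would emit at address addr.
   If reaching the boundary needs more than max_skip bytes, the directive
   is skipped and no padding is emitted. */
static unsigned p2align_padding(uintptr_t addr, unsigned pow2, unsigned max_skip)
{
    assert(pow2 < 32);
    uintptr_t boundary = (uintptr_t)1 << pow2;
    unsigned pad = (unsigned)((boundary - (addr & (boundary - 1))) & (boundary - 1));
    return pad <= max_skip ? pad : 0;
}
```

The assembler computes this padding relative to the start of the section.  If 
the linker then places the section on only a 4- or 8-byte boundary, the final 
address need not be a multiple of 16, which is how I understand the alignment 
failing to take full effect at link time.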

There is one aspect to .p2align which has been acknowledged as a bug in gcc, 
which is that the p2align instruction is not placed at the top of the loop 
body for those loops which have 1 to 4 stack adjustment instructions above 
the point where the loop is entered the first time.  This produces a 
significant performance hit when it causes a loop to occupy an extra cache 
line.  I have corrected this by editing the .s in each case, in order to be 
able to isolate the differences leading to the conclusions I have stated.
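
The cache line effect can be estimated with simple arithmetic.  The sketch 
below counts how many cache lines a loop body touches, given its start address 
and its length in bytes.  The function name lines_spanned() is hypothetical, 
and the 32-byte line size used in the note after it is an assumption about the 
P II that should be adjusted for other processors.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines touched by length bytes of code starting at start.
   line_size must be a power of two. */
static unsigned lines_spanned(uintptr_t start, unsigned length, unsigned line_size)
{
    assert(length > 0);
    assert(line_size > 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t first = start / line_size;                /* line of first byte */
    uintptr_t last = (start + length - 1) / line_size;  /* line of last byte  */
    return (unsigned)(last - first + 1);
}
```

With an assumed 32-byte line, a 30-byte loop starting at 0x1000 fits in one 
line, while the same loop starting 4 bytes later at 0x1004 spills into a 
second one.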

I have also supplied some of my own math functions in order to eliminate 
differences caused by the different libraries in cygwin (newlib) and linux 
(glibc-2.1).  To me, this is somewhat of a sore point, that all the common 
libraries continue to carry various deficiencies in the math functions 
(mainly performance problems as far as newlib is concerned).  I note that 
certain commercial compilers provide their own math libraries, not because 
theirs are better (they aren't necessarily), but because they isolate them 
from changes in operating environment.

Is there any possibility of cygwin addressing the problems associated with 
lack of 64- or 128-bit alignment, or is this simply one of the performance 
deficits we must accept?

