Tiled memory
root
root@jacob.remcomp.fr
Fri Mar 14 16:38:00 GMT 1997
Discussions in this group are really boring, and limit themselves
to some obscure bugs in bash or so. Let's talk about something else.
Something new for a change.
I am adding MMX support to lcc-win32.
As you may know, MMX introduces SIMD parallelism to the x86
architecture. Besides the obvious benefit of 8-byte memory moves
and other goodies, the parallelism of the new instruction set will
be a challenge for compiler writers.
I will try to introduce the concept of a 'tiled' vector, using a
special datatype. These vectors will be handled in parallel by the
compiler, i.e. if you declare
_tiled int vector1[1024],vector2[1024],vector3[1024];
you will be able to write something like:
vector3 = vector1+vector2;
and the compiler will add those vectors 2 elements at a time. The
dimensions must match, of course, and be known at compile time.
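To make the idea concrete without an MMX compiler, here is a minimal sketch in portable C. The names swar_add, tiled_add_int and the HI* masks are hypothetical, not part of lcc-win32. Two 32-bit lanes are packed into one 64-bit word and added so that no carry crosses a lane boundary, which is the effect a packed add such as PADDD has inside an MMX register. Unsigned lanes are used so that wraparound is well defined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Top bit of every lane: the same swar_add serves 2x32, 4x16 or 8x8 lanes. */
#define HI32 0x8000000080000000ULL
#define HI16 0x8000800080008000ULL
#define HI8  0x8080808080808080ULL

/* Lane-wise wraparound add of two 64-bit words. */
static uint64_t swar_add(uint64_t a, uint64_t b, uint64_t hi)
{
    uint64_t low = (a & ~hi) + (b & ~hi);   /* carries stay inside each lane */
    return low ^ ((a ^ b) & hi);            /* put back each lane's top bit */
}

/* Hypothetical equivalent of  vector3 = vector1 + vector2  for a _tiled int:
   two 32-bit elements are packed into one 64-bit word per step.
   n must be even. */
static void tiled_add_int(const uint32_t *v1, const uint32_t *v2,
                          uint32_t *v3, size_t n)
{
    for (size_t i = 0; i < n; i += 2) {
        uint64_t a = v1[i] | ((uint64_t)v1[i + 1] << 32);
        uint64_t b = v2[i] | ((uint64_t)v2[i + 1] << 32);
        uint64_t s = swar_add(a, b, HI32);  /* 2 adds in one operation */
        v3[i]     = (uint32_t)s;
        v3[i + 1] = (uint32_t)(s >> 32);
    }
}
```

Only the mask changes for the _tiled short case (HI16, 4 lanes) and the _tiled char case (HI8, 8 lanes).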
If you declare:
_tiled short vector1[2048],vector2[2048];
You will add the 16-bit numbers 4 at a time. With byte
operations the number goes to 8 operations in parallel. You will
be able to obtain a vector of bits by comparing two strings 8 bytes
at a time (using a _tiled char).
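A sketch of the string comparison in portable C, assuming the result is wanted as one bit per byte (bit i set when the i-th bytes are equal). The helper names load8 and eq_mask8 are hypothetical. MMX itself would use PCMPEQB, which leaves 0xFF or 0x00 in each byte; here the zero-byte test and the gathering of the bits are done with ordinary 64-bit arithmetic.

```c
#include <stdint.h>

/* Load 8 bytes so that s[i] lands in bits 8*i..8*i+7 on any host. */
static uint64_t load8(const unsigned char *s)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)s[i] << (8 * i);
    return w;
}

/* Compare two 8-byte chunks; bit i of the result is 1 when s[i] == t[i]. */
static unsigned eq_mask8(const unsigned char *s, const unsigned char *t)
{
    const uint64_t lo7 = 0x7F7F7F7F7F7F7F7FULL;
    uint64_t x = load8(s) ^ load8(t);     /* equal bytes become 0 */
    uint64_t y = ((x & lo7) + lo7) | x;   /* top bit set in every nonzero byte */
    uint64_t z = ~y & ~lo7;               /* top bit set in every zero byte */
    /* Move the top bit of byte i to bit 56+i, then shift down: bit i <- byte i. */
    return (unsigned)(((z >> 7) * 0x0102040810204080ULL) >> 56);
}
```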
Another new concept is saturation arithmetic. Using the
_saturated keyword, additions, subtractions, etc. will be done using
saturation arithmetic instead of normal wraparound. For instance
_saturated unsigned char a = 150,b = 150,c;
c = a + b;
'c' now contains 255, instead of 44 (300 - 256) as it does now.
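A minimal sketch of the semantics, one byte at a time (the function names are hypothetical). MMX provides both flavours in hardware: PADDUSB saturates unsigned bytes to 0..255 and PADDSB saturates signed bytes to -128..127.

```c
/* Unsigned saturating add, the PADDUSB behaviour for one byte. */
static unsigned char add_sat_u8(unsigned char a, unsigned char b)
{
    unsigned s = (unsigned)a + b;         /* at most 510, cannot overflow */
    return s > 255 ? 255 : (unsigned char)s;
}

/* Signed saturating add, the PADDSB behaviour for one byte. */
static signed char add_sat_s8(signed char a, signed char b)
{
    int s = a + b;                        /* range -256..254 */
    if (s > 127)  s = 127;
    if (s < -128) s = -128;
    return (signed char)s;
}

/* Ordinary C behaviour for comparison: the sum wraps modulo 256. */
static unsigned char add_wrap_u8(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```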
These keywords can be combined, of course.
Special variables will give you direct access to the MMX registers.
_mmx0 to _mmx7 denote the MMX registers, which are 64 bits wide. These
registers, aliased to the FPU registers, are NOT organized as a stack
and can be addressed individually. The datatype can be described in C as:
/* On x86 (little-endian) the member at the lowest address holds the
   least significant bits, so each struct lists the low part first. */
typedef union {
    struct {
        int low_0_31;
        int high_32_63;
    } int32;
    struct {
        short low_0_15;
        short low_16_31;
        short high_32_47;
        short high_48_63;
    } int16;
    struct {
        char low_0_7;
        char low_8_15;
        char low_16_23;
        char low_24_31;
        char high_32_39;
        char high_40_47;
        char high_48_55;
        char high_56_63;
    } int8;
} _mmxData;
Each byte, short and int must be addressable on its own, so that
the pack/unpack operations can be controlled.
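A portable sketch of one such unpack (the helper names byte_at and unpack_low_bytes are hypothetical). PUNPCKLBW interleaves the four low-order bytes of its two operands into four 16-bit words, first operand's byte first. Bytes are numbered from the least significant end, which matches the little-endian layout of x86 registers and memory; unpacking against a zero operand is a common way to widen bytes to words.

```c
#include <stdint.h>

/* Byte k of a 64-bit value, counting from the least significant end. */
static unsigned byte_at(uint64_t v, int k)
{
    return (unsigned)(v >> (8 * k)) & 0xFFu;
}

/* Sketch of PUNPCKLBW: result word k = (b's byte k << 8) | a's byte k. */
static uint64_t unpack_low_bytes(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int k = 0; k < 4; k++) {
        r |= (uint64_t)byte_at(a, k) << (16 * k);      /* low byte of word k */
        r |= (uint64_t)byte_at(b, k) << (16 * k + 8);  /* high byte of word k */
    }
    return r;
}
```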
To come back to parallelism, I will borrow many concepts from APL,
the once famous but now forgotten programming language. I will
introduce the vector operations as an extension of the normal operations,
along with many of the APL goodies like the inner product, the outer
product, the reduce (+/ operator), etc. For instance:
int sum = +/ vector;
This will add the vector in parallel 2/4/8 elements at a time. The
algorithm should be something like:
_tiled int vector[16];
_mmx0 = 0;
_mmx0 += vector[0] + vector[8];
_mmx0 += vector[1] + vector[9];
.....
_mmx0 += vector[7] + vector[15];
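The packed accumulator can be emulated in portable C. This sketch assumes a _tiled unsigned short vector (4 lanes per 64-bit word) and uses hypothetical names; each lane wraps modulo 2^16 as PADDW does, and the lanes are added together only at the end.

```c
#include <stddef.h>
#include <stdint.h>

#define HI16 0x8000800080008000ULL  /* top bit of each 16-bit lane */

/* Lane-wise wraparound add of four 16-bit lanes (PADDW behaviour). */
static uint64_t add_4x16(uint64_t a, uint64_t b)
{
    return ((a & ~HI16) + (b & ~HI16)) ^ ((a ^ b) & HI16);
}

/* Hypothetical sketch of  sum = +/ vector : 4 elements are packed into
   one 64-bit word per step and added into a single packed accumulator.
   n must be a multiple of 4. */
static unsigned long reduce_plus_u16(const uint16_t *v, size_t n)
{
    uint64_t acc = 0;                        /* plays the role of _mmx0 */
    for (size_t i = 0; i < n; i += 4) {
        uint64_t tile = 0;
        for (int k = 0; k < 4; k++)
            tile |= (uint64_t)v[i + k] << (16 * k);
        acc = add_4x16(acc, tile);           /* 4 adds at once */
    }
    unsigned long sum = 0;
    for (int k = 0; k < 4; k++)              /* final horizontal add */
        sum += (uint16_t)(acc >> (16 * k));
    return sum;
}
```

Because each lane is only 16 bits wide, a long vector can overflow a lane well before the total would overflow; a real implementation would have to widen the lanes (or saturate) for such vectors.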
To maximize the pipeline effect, we can use:
_mmx0 = _mmx1 = _mmx2 ... = 0;
_mmx0 += vector[0] + vector[8];
_mmx1 += vector[1] + vector[9];
...
etc.
The 8 MMX registers are then added together into _mmx0 at the
end of the operation. Since the 8 accumulators do not depend on
each other, this allows, in theory, an 8-stage pipeline.
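The same trick works in ordinary scalar code. A sketch with eight independent accumulators standing in for _mmx0.._mmx7 (the function name is hypothetical): consecutive additions no longer wait on each other, and the partial sums are combined only at the end.

```c
#include <stddef.h>

/* n must be a multiple of 8. */
static long reduce_plus_8acc(const int *v, size_t n)
{
    long a0 = 0, a1 = 0, a2 = 0, a3 = 0, a4 = 0, a5 = 0, a6 = 0, a7 = 0;
    for (size_t i = 0; i < n; i += 8) {   /* 8 independent dependency chains */
        a0 += v[i];     a1 += v[i + 1];
        a2 += v[i + 2]; a3 += v[i + 3];
        a4 += v[i + 4]; a5 += v[i + 5];
        a6 += v[i + 6]; a7 += v[i + 7];
    }
    /* Combine pairwise into the first accumulator, like adding into _mmx0. */
    a0 += a1; a2 += a3; a4 += a5; a6 += a7;
    a0 += a2; a4 += a6;
    a0 += a4;
    return a0;
}
```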
Similar to the reduce operator, we have the +\ (scan)
operator.
Suppose we have
_tiled int vector2[] = { 1, 2, 3, 4, 5 };
_tiled int vector1[5];
Then
vector1 = +\vector2;
gives:
1 3 6 10 15
(0+1) (0+1+2) (0+1+2+3) (0+1+2+3+4) (0+1+2+3+4+5)
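One way to compute +\ in parallel steps is the log-step (Hillis-Steele) scan, sketched here in C with a hypothetical scan_plus function; this is a standard technique, not necessarily what the compiler would generate. It takes about log2(n) rounds, and every update inside a round is independent of the others, at the cost of more additions than the n-1 of a sequential loop.

```c
#include <stddef.h>

/* Hypothetical sketch of  out = +\ in  (APL scan). */
static void scan_plus(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    /* Round with step 'offset': out[i] += out[i - offset] for all i >= offset.
       Walking i downwards means each read still sees the previous round's value. */
    for (size_t offset = 1; offset < n; offset *= 2)
        for (size_t i = n - 1; i >= offset; i--)
            out[i] += out[i - offset];
}
```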
---------------------------------------------------------------
Well, I will stop here. I am wasting bandwidth that would be
better used discussing /groff/termcap/vi/bash/ls/less/old.
P.S. I still see mail about 'less'. It still exists somehow, and so
does termcap, even though there have been no terminals around for ages...
What is 'less'?
Its goal is to display a text file, isn't it?
Imagine this:
Several years ago, Xerox researchers (who else?) published the
results of experimenting with a graphical control that presented
text to the user as a ROLL. You rolled text slowly into view. The
eye has been trained by millions of years of evolution to see
objects in 3 dimensions, so text that rolled from the back left of
the screen to the center and on to the right gave the eye cues that
eased the recognition of text.
A control that does that would be easy to write using the 3D
graphics libraries that are everywhere...
Yes but how about the termcap file for that??? :-)
Have fun guys, and stop bashing bash!
--
Jacob Navia Logiciels/Informatique
41 rue Maurice Ravel Tel 01 48.23.51.44
93430 Villetaneuse Fax 01 48.23.95.39
France
-
For help on using this list, send a message to
"gnu-win32-request@cygnus.com" with one line of text: "help".