This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
Re: fast additive copy method
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-help at sourceware dot org
- Date: Tue, 12 Aug 2014 10:47:28 -0300
- Subject: Re: fast additive copy method
- References: <1407703436 dot 8772 dot 15 dot camel at localhost dot localdomain> <c263f13528734a6d9b3c4c1051434f47 at BN1PR05MB262 dot namprd05 dot prod dot outlook dot com> <1407799611 dot 4848 dot 9 dot camel at debian>
On 11-08-2014 20:26, Joël Krähemann wrote:
> On Mon, 2014-08-11 at 11:04 +0000, Kilian, Jens wrote:
>>> -----Original Message-----
>>> From: Joël Krähemann [mailto:weedlight@gmail.com]
>>> Sent: Sunday, 10 August, 2014 22:44
>>> To: Carlos O'Donell
>>> Cc: libc-help@sourceware.org
>>> Subject: Re: fast additive copy method
>> [...]
>>
>>> Hi, I'm writing a soft synth, so audio buffers are repeatedly copied
>>> in RAM. The function ags_audio_signal_copy_buffer_to_buffer()
>>> should be optimized.
>> First, you seem to be adding (short) ints with wraparound (0x7fff + 1 -> -0x8000). For audio signals a saturating addition (0x7fff + 1 -> 0x7fff) may be more appropriate.
>> Second, you want to look into whether your compiler supports vectorized operations such as MMX/SSE, either via autovectorization or special intrinsic functions (which are less portable).
>>
>> Hope this helps,
>>
>> Jens.
> I'm using gcc. What file do I need to include in order to get __m128i
> type on debian GNU/Linux?
>
> signed short s1[64] __attribute__((aligned(128)));
> signed short s2[64] __attribute__((aligned(128)));
>
> size = (guint) ceil((float) size / 64.0);
>
> for(; 0 < size; size--){
>   signed short *offset;
>   guint i;
>
>   offset = destination;
>
>   /* gather 64 strided samples of each buffer into contiguous blocks */
>   for(i = 0; i < 64; i++){
>     s1[i] = *destination;
>     destination += dchannels;
>   }
>
>   for(i = 0; i < 64; i++){
>     s2[i] = *source;
>     source += schannels;
>   }
>
>   /* 8 signed 16-bit lanes per __m128i, so 64 samples take 8 steps */
>   for(i = 0; i < 64; i += 8){
>     __m128i a = _mm_load_si128((__m128i *) (s1 + i));
>     __m128i b = _mm_load_si128((__m128i *) (s2 + i));
>
>     /* signed saturating add: 0x7fff + 1 stays 0x7fff */
>     _mm_store_si128((__m128i *) (s1 + i), _mm_adds_epi16(a, b));
>   }
>
>   /* scatter the sums back to the strided destination */
>   destination = offset;
>
>   for(i = 0; i < 64; i++){
>     *destination = s1[i];
>     destination += dchannels;
>   }
> }
>
>
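Try '#include <xmmintrin.h>'. With gcc that header pulls in <emmintrin.h>, the SSE2 header that actually declares __m128i and the _mm_*_si128 intrinsics, so including <emmintrin.h> directly is the more portable choice.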
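
For the addition itself, here is a minimal sketch of a signed saturating add with SSE2 intrinsics. It is only an illustration under some assumptions: the buffers are contiguous (no channel stride), the sample count is a multiple of 8, and the helper name add_saturated_s16 is made up for this example and is not part of your code or of any library. It uses unaligned loads and stores, so the buffers need no special alignment. On 32-bit x86 you would also need to build with -msse2; on x86-64 SSE2 is enabled by default.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128, _mm_adds_epi16 */

/* Hypothetical helper (not from the thread): dst[i] += src[i] with signed
 * 16-bit saturation. Assumes contiguous buffers and n a multiple of 8. */
static void add_saturated_s16(short *dst, const short *src, size_t n)
{
  size_t i;

  assert(n % 8 == 0);

  for(i = 0; i < n; i += 8){
    /* unaligned loads, so the caller's buffers need no special alignment */
    __m128i a = _mm_loadu_si128((const __m128i *) (dst + i));
    __m128i b = _mm_loadu_si128((const __m128i *) (src + i));

    /* _mm_adds_epi16 clamps each lane to [-0x8000, 0x7fff] */
    _mm_storeu_si128((__m128i *) (dst + i), _mm_adds_epi16(a, b));
  }
}
```

Calling add_saturated_s16(dst, src, 64) would mix one 64-sample block. Interleaved channels would still need the gather/scatter step from your loop before and after the call.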