This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: fast additive copy method

From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
To: libc-help at sourceware dot org
Date: Tue, 12 Aug 2014 10:47:28 -0300
Subject: Re: fast additive copy method
Authentication-results: sourceware.org; auth=none
References: <1407703436 dot 8772 dot 15 dot camel at localhost dot localdomain> <c263f13528734a6d9b3c4c1051434f47 at BN1PR05MB262 dot namprd05 dot prod dot outlook dot com> <1407799611 dot 4848 dot 9 dot camel at debian>

On 11-08-2014 20:26, Joël Krähemann wrote:
> On Mon, 2014-08-11 at 11:04 +0000, Kilian, Jens wrote:
>>> -----Original Message-----
>>> From: Joël Krähemann [mailto:weedlight@gmail.com]
>>> Sent: Sunday, 10 August, 2014 22:44
>>> To: Carlos O'Donell
>>> Cc: libc-help@sourceware.org
>>> Subject: Re: fast additive copy method
>> [...]
>>
>>> Hi, I'm doing a soft synth therefore in RAM is copied audio buffers in a
>>> repining way. The function ags_audio_signal_copy_buffer_to_buffer()
>>> should be optimized.
>> First, you seem to be adding (short) ints with wraparound (0x7fff + 1 -> -0x8000).  For audio signals a saturating addition (0x7fff + 1 -> 0x7fff) may be more appropriate.
>> Second, you want to look into whether your compiler supports vectorized operations, aka. MMX/SSE/etc.; either via autovectoring or special intrinsic functions (which are less portable).
>>
>> Hope this helps,
>>
>> 	Jens.
> I'm using gcc. What file do I need to include in order to get __m128i
> type on debian GNU/Linux?
>
>   signed short s1[64] __attribute__((aligned(128)));
>   signed short s2[64] __attribute__((aligned(128)));
>
>   size = (guint) ceil((float) size / 64.0);
>
>   for(; 0 < size; size--){
>     __m128i *a;
>     __m128i *b;
>     signed short *offset;
>     guint i;
>
>     offset = destination;
>
>     for(i = 64; i > 0; i--){
>       *s1++ = *destination;
>       destination += dchannels;
>     }
>
>     for(i = 64; i > 0; i--){
>       *s2++ = *source;
>       source += schannels;
>     }
>
>     a = _mm_load_si128((__m128i *) s1);
>     b = _mm_load_si128((__m128i *) s2);
>
>     _mm_store_si128(s1, _mm_adds_epu16(a, b));
>     destination = offset;
>
>     for(i = 64; i > 0; i--){
>       destination = *s1++;
>       destination += dchannels;
>     }
>   }
>
>
Try '#include <xmmintrin.h>'.

References:
- Re: fast additive copy method
  - From: Joël Krähemann
- RE: fast additive copy method
  - From: Kilian, Jens
- Re: fast additive copy method
  - From: Joël Krähemann

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]