This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Malloc improvements


Hi DJ,

> I have a trace here that's 360 Gb, which is 24 Gb after conversion.
> 
> Although I am switching to a binary raw file, with a separate utility
> for converting it to ASCII.  The size and time considerations were
> significant.

Looks good. Is the plan to ship around the *.out or *.wl files? I
reran a trace of omnetpp from SPECint2006 with the new binary format:

23G	mtrace.out.77514
842M	mtrace.out.77514.xz

3.2G	test.wl
841M	test.wl.xz

There isn't much difference in size after compression, but it took ages
to compress the *.out file. Not surprisingly, the *.wl file compressed
much faster.

> It would be interesting to rerun that with my new converter (in case
> the old one is overly pessimistic about synchronizing), but in
> general, every time a pointer passes "ownership" from one thread to
> another, the simulator puts in a set of calls to synchronize the two
> threads (the sync_w and sync_r commands in trace_run.c).  If you can
> come up with a faster way of doing it, or a way to reduce the number
> of times it's needed, I'm all ears, but I'm not that worried about it
> - the purpose of the simulator is to capture the application's
> malloc/free pattern "good enough" to benchmark the glibc calls in a
> way that "represents" the application's needs.  In the future, we'll
> be able to make performance changes to malloc's code with a good
> understanding of how it impacts a wide range of applications.

I was thinking about single-threaded traces; perhaps we could avoid all
the locking in that case. In my tests, avoiding the locking makes the
omnetpp trace run about 4x faster on POWER8.

As well as the locking, the memory initialisation loops were showing
up in profiles. Is there a reason for encoding the offset in
free_wipe()? If not, we can just use memset(), which is much faster.

Anton
--

When initialising memory, use memset() instead of an open-coded loop.

diff --git a/malloc/trace2wl.cc b/malloc/trace2wl.cc
index aa53fb3..f3d60b5 100644
--- a/malloc/trace2wl.cc
+++ b/malloc/trace2wl.cc
@@ -156,13 +184,11 @@ static void
 wmem (volatile void *ptr, int count)
 {
   char *p = (char *)ptr;
-  int i;
 
   if (!p)
     return;
 
-  for (i=0; i<count; i++)
-    p[i] = 0x11;
+  memset(p, 0x11, count);
 }
 #define xwmem(a,b)
 
@@ -186,14 +212,7 @@ static void free_wipe (size_t idx)
   if (cp == NULL)
     return;
   size_t sz = sizes[idx];
-  size_t i;
-  for (i=0; i<sz; i++)
-    {
-      if (i % 8 == 1)
-	cp[i] = i / 8;
-      else
-	cp[i] = 0x22;
-    }
+  memset(cp, 0x22, sz);
 }
 
 static void *


