This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[PATCH] manual: Update alloca and variable length array documentation


I noticed this while reviewing Paul Eggert's string function changes.

The restriction about using alloca from parameter lists has been gone
since basically forever (there is no distinction at the GIMPLE level in
GCC because of the way SSA works).  I'm not sure if the malloc-based
alloca emulation was ever part of glibc, and certainly it would not have
worked for some uses of alloca inside the dynamic linker.
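
For illustration, the size check that the new text in Disadvantages of
Alloca asks for could be sketched roughly as follows.  This is not part
of the patch: open2_bounded and OPEN2_ALLOCA_LIMIT are hypothetical
names, the 4096 cutoff is just the arbitrary figure from the patch text,
and the code assumes a glibc system that provides <alloca.h>.

```c
#include <alloca.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cutoff; the patch text only says "say, 4096".  */
#define OPEN2_ALLOCA_LIMIT 4096

/* Hypothetical variant of the manual's open2 example: concatenate
   STR1 and STR2 and open the resulting file name.  Small buffers
   come from alloca, larger ones fall back to malloc.  */
int
open2_bounded (const char *str1, const char *str2, int flags, int mode)
{
  size_t len1 = strlen (str1);
  size_t len2 = strlen (str2);
  size_t size = len1 + len2 + 1;
  int on_heap = size > OPEN2_ALLOCA_LIMIT;
  char *name;

  if (on_heap)
    {
      name = malloc (size);
      if (name == NULL)
        return -1;              /* malloc has set errno.  */
    }
  else
    name = alloca (size);       /* Bounded by OPEN2_ALLOCA_LIMIT.  */

  memcpy (name, str1, len1);
  memcpy (name + len1, str2, len2 + 1);   /* Copies the terminator.  */

  int desc = open (name, flags, mode);

  if (on_heap)
    {
      /* Preserve the errno value from open across free.  */
      int saved_errno = errno;
      free (name);
      errno = saved_errno;
    }
  return desc;
}
```

Names no larger than the limit stay on the stack; anything larger goes
through malloc, so an oversized request fails with an ordinary error
instead of overrunning the stack.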

2016-01-04  Florian Weimer  <fweimer@redhat.com>

	* manual/memory.texi (Variable Size Automatic): Document
	interaction between alloca and variable length arrays.  Mention
	function inlining.  Remove obsolete warning about alloca in
	function parameter lists.
	(Advantages of Alloca): Note that alloca is async-signal-safe.
	Mention C++ exceptions and lack of length checking in open2.
	(Disadvantages of Alloca): Clarify consequences of the lack of
	error checking.  Do not mention the nonexistent alloca emulation.
	(GNU C Variable-Size Arrays): Switch terminology from GNU C
	variable-sized arrays to ISO C variable length arrays.  Mention
	security aspect and aliasing violations.  Clarify loop behavior.
	Remove NB, now part of the alloca documentation.

	* manual/string.texi (Copying Strings and Arrays): Add warning
	about alloca and length checking to strdupa.  Drop restriction to
	GNU CC.
	(Truncating Strings): Add warning to strndupa.  Drop restriction
	to GNU CC.

diff --git a/manual/memory.texi b/manual/memory.texi
index 700555e..9dbd95f 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -2745,10 +2745,24 @@ The function @code{alloca} supports a kind of half-dynamic allocation in
 which blocks are allocated dynamically but freed automatically.
 
 Allocating a block with @code{alloca} is an explicit action; you can
-allocate as many blocks as you wish, and compute the size at run time.  But
-all the blocks are freed when you exit the function that @code{alloca} was
-called from, just as if they were automatic variables declared in that
-function.  There is no way to free the space explicitly.
+allocate as many blocks as you wish, and compute the size at run time.
+Memory allocated this way is freed automatically, at some point after
+the scope which contains the @code{alloca} call is left:
+
+@itemize @bullet
+@item
+@cindex variable length arrays
+If the scope calling @code{alloca} contains a variable length array, or
+is nested in such a scope, then the object allocated with @code{alloca}
+is deallocated when the closest enclosing scope which defines a
+variable length array is left.
+
+@item
+If no enclosing scope with a variable length array exists, the allocated
+object is deallocated when the function is exited, either normally or
+abnormally (for example, by throwing a C++ exception).  The lifetime of
+such objects is not extended by function inlining.
+@end itemize
 
 The prototype for @code{alloca} is in @file{stdlib.h}.  This function is
 a BSD extension.
@@ -2762,21 +2776,11 @@ The return value of @code{alloca} is the address of a block of @var{size}
 bytes of memory, allocated in the stack frame of the calling function.
 @end deftypefun
 
-Do not use @code{alloca} inside the arguments of a function call---you
-will get unpredictable results, because the stack space for the
-@code{alloca} would appear on the stack in the middle of the space for
-the function arguments.  An example of what to avoid is @code{foo (x,
-alloca (4), y)}.
-@c This might get fixed in future versions of GCC, but that won't make
-@c it safe with compilers generally.
-
 @menu
 * Alloca Example::              Example of using @code{alloca}.
 * Advantages of Alloca::        Reasons to use @code{alloca}.
 * Disadvantages of Alloca::     Reasons to avoid @code{alloca}.
-* GNU C Variable-Size Arrays::  Only in GNU C, here is an alternative
-				 method of allocating dynamically and
-				 freeing automatically.
+* GNU C Variable-Size Arrays::  On-stack dynamic allocation in ISO C.
 @end menu
 
 @node Alloca Example
@@ -2834,6 +2838,14 @@ block, space used for any size block can be reused for any other size.
 @code{alloca} does not cause memory fragmentation.
 
 @item
+@cindex mmap
+The @code{alloca} function can be safely called from a signal handler.
+But signal handlers may run with little stack space available, so
+it is unclear how much memory can be safely allocated with @code{alloca}.
+This means that robust code may have to use @code{mmap} instead.
+@xref{Memory-mapped I/O}.
+
+@item
 @cindex longjmp
 Nonlocal exits done with @code{longjmp} (@pxref{Non-Local Exits})
 automatically free the space allocated with @code{alloca} when they exit
@@ -2865,7 +2877,13 @@ freed even when an error occurs, with no special effort required.
 By contrast, the previous definition of @code{open2} (which uses
 @code{malloc} and @code{free}) would develop a memory leak if it were
 changed in this way.  Even if you are willing to make more changes to
-fix it, there is no easy way to do so.
+fix it, there is no easy way to do so (except to switch to C++ and
+exceptions).
+
+Note that the @code{open2} example with @code{alloca} is incorrect if
+@code{str1} and @code{str2} can be very long strings because
+@code{alloca} does not fail gracefully if too many bytes are
+requested (see below).
 @end itemize
 
 @node Disadvantages of Alloca
@@ -2879,22 +2897,38 @@ These are the disadvantages of @code{alloca} in comparison with
 @itemize @bullet
 @item
 If you try to allocate more memory than the machine can provide, you
-don't get a clean error message.  Instead you get a fatal signal like
-the one you would get from an infinite recursion; probably a
-segmentation violation (@pxref{Program Error Signals}).
+don't get a clean error message.  Instead, you end up with undefined
+behavior.  In many cases, the program will just crash (which can still
+result in a denial-of-service vulnerability), but sometimes, it is
+possible to abuse an unbounded @code{alloca} to cause other security
+vulnerabilities such as information disclosure or arbitrary code
+execution.
 
 @item
 Some @nongnusystems{} fail to support @code{alloca}, so it is less
-portable.  However, a slower emulation of @code{alloca} written in C
-is available for use on systems with this deficiency.
+portable.
 @end itemize
 
+Due to lack of error checking, security-sensitive code must ensure that
+no large objects are allocated with @code{alloca}.  In general this
+means that the size argument is checked against an arbitrary limit (say,
+4096), and if it is exceeded, either an error is returned or the code
+falls back to @code{malloc}.
+
+Extra care is required when @code{alloca} is called from a function
+called recursively or from within a loop.  In this case, depending on
+the depth of the recursion or the loop iteration count, even small
+allocation sizes can exhaust the stack and trigger undefined behavior.
+To a lesser degree, this problem also exists with callback functions.
+
+@c Node name preserved for backwards compatibility; the correct
+@c terminology is ``variable length array''.
 @node GNU C Variable-Size Arrays
-@subsubsection GNU C Variable-Size Arrays
-@cindex variable-sized arrays
+@subsubsection ISO C Variable Length Arrays
+@cindex variable length arrays
 
-In GNU C, you can replace most uses of @code{alloca} with an array of
-variable size.  Here is how @code{open2} would look then:
+In ISO C, you can replace most uses of @code{alloca} with an array of
+variable length.  Here is how @code{open2} would look then:
 
 @smallexample
 int open2 (char *str1, char *str2, int flags, int mode)
@@ -2905,26 +2939,39 @@ int open2 (char *str1, char *str2, int flags, int mode)
 @}
 @end smallexample
 
+Compared to @code{malloc}, variable length arrays share the same
+advantages and disadvantages as @code{alloca}.  In particular, there is
+no error checking (and security vulnerabilities can result from large
+allocation requests), and some @nongnusystems{} do not support variable
+length arrays because they only support earlier versions of ISO C which
+do not include variable length arrays.
+
+The variable length array version of @code{open2}, as shown above, still
+suffers from the same problem as the @code{alloca}-based variant: it
+does not check that the strings are short enough to avoid the undefined
+behavior that results from large allocation requests.
+
 But @code{alloca} is not always equivalent to a variable-sized array, for
 several reasons:
 
 @itemize @bullet
 @item
-A variable size array's space is freed at the end of the scope of the
-name of the array.  The space allocated with @code{alloca}
-remains until the end of the function.
+Memory returned by @code{alloca} is untyped.  A variable length array
+always has a specific type (even if it is an array of characters), and
+using it with another type can introduce aliasing violations into the
+program.
 
 @item
-It is possible to use @code{alloca} within a loop, allocating an
-additional block on each iteration.  This is impossible with
-variable-sized arrays.
+A variable length array is deallocated at the end of the scope of the
+name of the array.  The space allocated with @code{alloca} remains until
+the end of the function.
 @end itemize
 
-@strong{NB:} If you mix use of @code{alloca} and variable-sized arrays
-within one function, exiting a scope in which a variable-sized array was
-declared frees all blocks allocated with @code{alloca} during the
-execution of that scope.
-
+The second difference is most pronounced in loops: With @code{alloca},
+the allocated object can be referenced from later iterations and after
+the loop body has been exited.  But a loop with a variable length array
+can execute an arbitrary number of times, without exhausting the
+available stack, as long as the individual arrays are short enough.
 
 @node Resizing the Data Segment
 @section Resizing the Data Segment
diff --git a/manual/string.texi b/manual/string.texi
index 016fd0b..8f7f5d1 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -643,24 +643,17 @@ The behavior of @code{wcpcpy} is undefined if the strings overlap.
 This macro is similar to @code{strdup} but allocates the new string
 using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
 Automatic}).  This means of course the returned string has the same
-limitations as any block of memory allocated using @code{alloca}.
+limitations as any block of memory allocated using @code{alloca}, and
+@code{strdupa} can introduce security vulnerabilities due to the lack of
+failure checking.
 
-For obvious reasons @code{strdupa} is implemented only as a macro;
-you cannot get the address of this function.  Despite this limitation
-it is a useful function.  The following code shows a situation where
-using @code{malloc} would be a lot more expensive.
+For obvious reasons @code{strdupa} is implemented only as a macro; you
+cannot get the address of this function.  The following code shows an
+example of its use:
 
 @smallexample
 @include strdupa.c.texi
 @end smallexample
-
-Please note that calling @code{strtok} using @var{path} directly is
-invalid.  It is also not allowed to call @code{strdupa} in the argument
-list of @code{strtok} since @code{strdupa} uses @code{alloca}
-(@pxref{Variable Size Automatic}) can interfere with the parameter
-passing.
-
-This function is only available if GNU CC is used.
 @end deftypefn
 
 @comment string.h
@@ -958,16 +951,13 @@ processing text.
 This function is similar to @code{strndup} but like @code{strdupa} it
 allocates the new string using @code{alloca} @pxref{Variable Size
 Automatic}.  The same advantages and limitations of @code{strdupa} are
-valid for @code{strndupa}, too.
+valid for @code{strndupa}.  In particular, @code{strndupa} can introduce
+security vulnerabilities due to the lack of error checking.
 
 This function is implemented only as a macro, just like @code{strdupa}.
-Just as @code{strdupa} this macro also must not be used inside the
-parameter list in a function call.
 
 As noted below, this function is generally a poor choice for
 processing text.
-
-@code{strndupa} is only available if GNU CC is used.
 @end deftypefn
 
 @comment string.h
-- 
2.4.3

