This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
- From: Florian Weimer <fweimer at redhat dot com>
- To: Roland McGrath <roland at hack dot frob dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 12 Apr 2016 16:01:00 +0200
- Subject: Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
- Authentication-results: sourceware.org; auth=none
- References: <57068276 dot 9000503 at redhat dot com> <20160408184422 dot A74BE2C3B21 at topped-with-meat dot com> <570BD76D dot 5070907 at redhat dot com> <20160411211902 dot 8E34F2C3B22 at topped-with-meat dot com>
On 04/11/2016 11:19 PM, Roland McGrath wrote:
I'm trying to get clarification if getdents on Linux can ever return
zero d_ino values. It's going to be difficult to work this out due to
the VFS layer. My hunch is that inode 0 is more likely to mean I can't
tell you the inode number right now here , and not please skip this entry .
I don't know where your hunch comes from, but I don't believe it for a
second. The d_ino==0 convention has been universal in Unix since before
Linux existed. Any filesystem that returns d_ino==0 entries for getdents
will have those entries completely ignored by userland, so I doubt anyone
has implemented a filesystem that way.
Non-GNU userland would work, so it could be a file system restricted to
embedded applications.
+An implementation of @code{gl_readdir} should initialize the
+@code{struct dirent} object to zero, up to the @code{d_name} member.
+The @code{d_ino} member must be set to a non-zero value, otherwise
+@code{glob} may ignore the returned directory entry.
This wording is awkward. It seems to contradict itself, as it first
implies that d_ino should be zero and then says it must be nonzero. I'm
not convinced this "zero the object" advice is the way to go anyway. If
there is struct padding in struct dirent, it doesn't matter that it be
zero. If it has d_namlen, it (now) doesn't matter what it contains at all.
Perhaps it's better to say just which fields glob examines and that the
protocol requires only that those fields be set.
Regardless of what we settle on as best practice, I think we want a little
code example here that demonstrates it.
Right. I realized that d_type and d_ino are unconditionally available
in glibc, so I got rid of the memset and spelled out all relevant
members explicitly. I added the mkdirent example function in the new
version of the patch.
@code{glob} examines the @code{struct dirent} object only immediately after
each call to your @code{gl_readdir} callback function and does not store
the pointer returned. That pointer is not expected to remain valid after a
subsequent call to the @code{gl_readdir} or @code{gl_closedir} callback. A
common way to implement these callbacks it to reuse the same buffer each
time the @code{gl_readdir} function is called and free the buffer in the
@code{gl_closedir} function.
I expanded a bit on that in the new version of the patch, too. I hope
the table nesting isn't too awkward. To me, it looks okay in Info, HTML
and PDF.
Florian
glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
Previously, application code had to set up the d_namlen member if
the target supported it, involving conditional compilation. After
this change, glob will use the length of the string in d_name instead
of d_namlen to determine the file name length. All glibc targets
provide the d_type and d_ino members, and setting the as needed for
gl_readdir is straightforward.
Changing the behavior with regards to d_ino is left to a future
cleanup.
2016-04-12 Florian Weimer <fweimer@redhat.com>
glob: Simplify and document the interface for the GLOB_ALTDIRFUNC
callback function gl_readdir.
* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove.
(CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy.
(glob_in_dir): Remove len. Use strdup instead of malloc and
memcpy to copy the name.
* manual/pattern.texi (Calling Glob): Document requirements for
implementations of the gl_readdir callback function.
* manual/examples/mkdirent.c: New example.
* posix/bug-glob2.c (my_readdir): Set d_ino to 1 unconditionally,
per the manual guidance.
* posix/tst-gnuglob.c (my_readdir): Likewise.
diff --git a/manual/examples/mkdirent.c b/manual/examples/mkdirent.c
new file mode 100644
index 0000000..eea794d
--- /dev/null
+++ b/manual/examples/mkdirent.c
@@ -0,0 +1,42 @@
+/* Example for creating a struct dirent object for use with glob.
+ Copyright (C) 2016 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License
+ as published by the Free Software Foundation; either version 2
+ of the License, or (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <dirent.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+
+struct dirent *
+mkdirent (const char *name)
+{
+ size_t dirent_size = offsetof (struct dirent, d_name) + 1;
+ size_t name_length = strlen (name);
+ size_t total_size = dirent_size + name_length;
+ if (total_size < dirent_size)
+ {
+ errno = ENOMEM;
+ return NULL;
+ }
+ struct dirent *result = malloc (total_size);
+ if (result == NULL)
+ return NULL;
+ result->d_type = DT_UNKNOWN;
+ result->d_ino = 1; /* Do not skip this entry. */
+ memcpy (result->d_name, name, name_length + 1);
+ return result;
+}
diff --git a/manual/pattern.texi b/manual/pattern.texi
index d1b9275..565e7eb 100644
--- a/manual/pattern.texi
+++ b/manual/pattern.texi
@@ -237,7 +237,44 @@ function used to read the contents of a directory. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{struct dirent *(*) (void *)}}.
-This is a GNU extension.
+An implementation of @code{gl_readdir} needs to initialize the following
+members of the @code{struct dirent} object:
+
+@table @code
+@item d_type
+This member should be set to the file type of the entry if it is known.
+Otherwise, the value @code{DT_UNKNOWN} can be used. The @code{glob}
+function may use the specified file type to avoid callbacks in cases
+where the file type indicates that the data is not required.
+
+@item d_ino
+This member needs to be non-zero, otherwise @code{glob} may skip the
+current entry and call the @code{gl_readdir} callback function again to
+retrieve another entry.
+
+@item d_name
+This member must be set to the name of the entry. It must be
+null-terminated.
+@end table
+
+The example below shows how to allocate a @code{struct dirent} object
+containing a given name.
+
+@smallexample
+@include mkdirent.c.texi
+@end smallexample
+
+The @code{glob} function reads the @code{struct dirent} members listed
+above and makes a copy of the file name in the @code{d_name} member
+immediately after the @code{gl_readdir} callback function returns.
+Future invocations of any of the callback functions may dealloacte or
+reuse the buffer. It is the responsibility of the caller of the
+@code{glob} function to allocate and deallocate the buffer, around the
+call to @code{glob} or using the callback functions. For example, an
+application could allocate the buffer in the @code{gl_readdir} callback
+function, and deallocate it in the @code{gl_closedir} callback function.
+
+The @code{gl_readdir} member is a GNU extension.
@item gl_opendir
The address of an alternative implementation of the @code{opendir}
diff --git a/posix/bug-glob2.c b/posix/bug-glob2.c
index ddf5ec9..0fdc5d0 100644
--- a/posix/bug-glob2.c
+++ b/posix/bug-glob2.c
@@ -193,7 +193,7 @@ my_readdir (void *gdir)
return NULL;
}
- dir->d.d_ino = dir->idx;
+ dir->d.d_ino = 1; /* glob should not skip this entry. */
#ifdef _DIRENT_HAVE_D_TYPE
dir->d.d_type = filesystem[dir->idx].type;
diff --git a/posix/glob.c b/posix/glob.c
index 0c04c3c..9ae76ac 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -57,10 +57,8 @@
#if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
# include <dirent.h>
-# define NAMLEN(dirent) strlen((dirent)->d_name)
#else
# define dirent direct
-# define NAMLEN(dirent) (dirent)->d_namlen
# ifdef HAVE_SYS_NDIR_H
# include <sys/ndir.h>
# endif
@@ -76,12 +74,6 @@
#endif
-/* In GNU systems, <dirent.h> defines this macro for us. */
-#ifdef _D_NAMLEN
-# undef NAMLEN
-# define NAMLEN(d) _D_NAMLEN(d)
-#endif
-
/* When used in the GNU libc the symbol _DIRENT_HAVE_D_TYPE is available
if the `d_type' member for `struct dirent' is available.
HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB. */
@@ -105,12 +97,6 @@
/* If the system has the `struct dirent64' type we use it internally. */
#if defined _LIBC && !defined COMPILE_GLOB64
-# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
-# define CONVERT_D_NAMLEN(d64, d32)
-# else
-# define CONVERT_D_NAMLEN(d64, d32) \
- (d64)->d_namlen = (d32)->d_namlen;
-# endif
# if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
# define CONVERT_D_INO(d64, d32)
@@ -127,8 +113,7 @@
# endif
# define CONVERT_DIRENT_DIRENT64(d64, d32) \
- memcpy ((d64)->d_name, (d32)->d_name, NAMLEN (d32) + 1); \
- CONVERT_D_NAMLEN (d64, d32) \
+ strcpy ((d64)->d_name, (d32)->d_name); \
CONVERT_D_INO (d64, d32) \
CONVERT_D_TYPE (d64, d32)
#endif
@@ -1554,7 +1539,6 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
while (1)
{
const char *name;
- size_t len;
#if defined _LIBC && !defined COMPILE_GLOB64
struct dirent64 *d;
union
@@ -1622,12 +1606,10 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
names = newnames;
cur = 0;
}
- len = NAMLEN (d);
- names->name[cur] = (char *) malloc (len + 1);
+ names->name[cur] = strdup (d->d_name);
if (names->name[cur] == NULL)
goto memory_error;
- *((char *) mempcpy (names->name[cur++], name, len))
- = '\0';
+ ++cur;
++nfound;
}
}
diff --git a/posix/tst-gnuglob.c b/posix/tst-gnuglob.c
index 992b997..b7b859b 100644
--- a/posix/tst-gnuglob.c
+++ b/posix/tst-gnuglob.c
@@ -211,7 +211,7 @@ my_readdir (void *gdir)
return NULL;
}
- dir->d.d_ino = dir->idx;
+ dir->d.d_ino = 1; /* glob should not skip this entry. */
#ifdef _DIRENT_HAVE_D_TYPE
dir->d.d_type = filesystem[dir->idx].type;