This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Corinna: (I apologize that this got so long, but strptime() is a bit messy.) I've taken a look at this, and have several thoughts. The biggest is that the entire strptime really needs to be re-written as it comes nowhere close to doing all that is called for. (It does not check ranges, it does not support the field widths for %C or %Y, it does not restrict the lengths read, the E and O modifiers are lacking given that locale support has generally been added, etc.) It does, however, provide full basic functionality. That major comment aside, as I don't wish to do major surgery, here are some more practical thoughts. 1) POSIX (2008) says "It is unspecified whether multiple calls to strptime() using the same tm structure will update the current contents of the structure or overwrite all contents of the structure. Conforming applications should make a single call to strptime() with a format and all data needed to completely specify the date and time being converted." Given that applications are required to give a single call with all data needed, and making an assumption that we don't at this time want to go so far as to check for and enforce it, I propose that the whole struct tm be initialized at the start of the call. It is permitted. Junk should not pass through. That is, if a user wants strptime() to do part of the work and they'll edit a little afterwards, fine--but they must edit the struct tm afterwards and not before. (That is, the example given should not be able to influence the results. Especially since automatic variables are not initialized and can--and do--have random trash in them.) The only drawback to this is that it could change behavior slightly--possibly breaking applications that are not using it in a proper manner (as defined by the standard's usage guidelines). 2) The first_day() function has another issue other than maybe just taking a long time to run. It assumes that the year is not before 1970, and gives incorrect results if it is. POSIX explicitly speaks of %y being able to specify 1969, so that, at least, should come out right. Trying to make it completely general is problematic due to 1752, or 1753, or one of the other many years that different countries went from Julian to Gregorian. I propose that it refuse to work before a certain year to avoid the mess. The two that come to mind are 1900 (tm_year value of 0), and 1753. Since England went Gregorian in 1752 and September was robbed of 11 days, 1753 is the first "safe" year. England was almost the last to adopt, thus this covers most countries. I chose 1753 in the edit that I've made. (Trivia: I just noticed that this was written orignally in Sweden. They made the change 1 year after England, so we could use 1754 in their honor, I suppose.) 3) Yes, present behavior is incorrect, and U, V, and W processing need to be delayed as you suggest. But this introduces a potential complication. This is because U, W, and V should be mutually exclusive. The present implementation lets them override each other, with the last one in the format taking precedence. I preserved that behavior. (I was tempted to error if more than 1 were set, but POSIX doesn't comment on that. It is explicit about scanning the format from left to right.) 4) Code has some non-POSIX formats: k, l, V, and Z. (At least, not in 2008, and not mentioned in the notes as having been removed.) k is an alias for H, l is an alias for I, V is similar to U and W, and Z is ignored. Personally, I'd be inclined to remove all of them. Thoughts? 5) The %y specifier does not account for POSIX qualifying the assumed century with the clause "When format contains neither a C conversion specifier nor a Y conversion specifier". (This would be a bit messy to do, so I passed for now. I did put a "FIXME" note in.) 6) strptime() manpage information is missing. Attached is a patch that addresses the first 3 items. It addresses #2 and #3 by re-writing first_day()--which also addresses the possible long run-time observation. (It is valid for the user to give 0x6FFFFFFF as a year (via %Y), actually, even if it makes no sense. But at least now it will take the same amount of time as a sane year.) #1 was addressed by zeroing the whole struct tm before any other processing, except for tm_isdst, which is set to -1 (indicating that DST status is unknown). If there is general agreement about deleting the non-standard formats mentioned in #4, I can either amend the patch or make another one. I have tested the new first_day() function but not the other changes. Craig -----Original Message----- From: newlib-owner@sourceware.org [mailto:newlib-owner@sourceware.org] On Behalf Of Corinna Vinschen Sent: Monday, February 14, 2011 8:06 AM To: newlib@sourceware.org Subject: Re: strptime() could hangs for hours On Feb 14 11:41, Aleksandr Platonov wrote: > Hi, > strptime() function could hangs if format string contains characters > which set date by week number (U,W,V). > Example: > struct tm tm; > tm.tm_year = 0x6FFFFFFF; > strptime("52", "%U", &tm); > > This happens because first_day() function call hangs for hours if big > year value is passed to it. > We could not assume that tm_year field of tm structure has correct value > at processing U,V,W characters in strptime(), so first_day() function > parameter could have any value. > In BSD libc characters U,V and W are ignored (only range check is > performed). > Is this behaviour of strptime() in newlib correct? No, it's not. Probably the right thing to do would be to store the information that U,V,W is present together with the given week number, and to calculate the information when the string has been completely scanned and we only have a valid year, but no day or month. Or, as on BSD, ignore the input and just check validity. Does anybody want to provide a patch? Corinna -- Corinna Vinschen Cygwin Project Co-Leader Red Hat
Attachment:
ChangeLog.txt
Description: ChangeLog.txt
Attachment:
strptime.patch.txt
Description: strptime.patch.txt
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |