Bogus assumption prevents d2u/u2d/conv/etal working on mixed files.

David Fritz zeroxdf@att.net
Mon Apr 5 04:02:00 GMT 2004


You guys are missing the point.  Charles Wilson mentioned a side effect of the 
code at issue in the original post and suggested that it was valuable.

Personally, I don't care whether they attempt to detect binary files or not.  My 
point was (and is) this: *if detection of binary files is desirable*, then why 
not implement it in a more robust manner and inform the user, rather than 
silently skipping "binary" files?
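
To make the suggestion concrete, here is a minimal sketch of an explicit test of the kind GNU grep uses (look for a '\0' byte), combined with a visible message instead of a silent skip. It is an illustration only, not the actual d2u/u2d source: the names looks_binary, d2u_bytes, d2u_file and PROBE_SIZE are hypothetical, and the force parameter merely stands in for the --force switch proposed elsewhere in this thread.

```python
import sys

PROBE_SIZE = 8192  # assumption: how many leading bytes to inspect


def looks_binary(data: bytes) -> bool:
    """Treat data as binary if a NUL byte appears in the probed prefix."""
    return b"\0" in data[:PROBE_SIZE]


def d2u_bytes(data: bytes) -> bytes:
    """Replace every CRLF pair with a bare LF."""
    return data.replace(b"\r\n", b"\n")


def d2u_file(path: str, force: bool = False) -> bool:
    """Convert path in place; return False (with a message) if it was skipped."""
    with open(path, "rb") as f:
        data = f.read()
    if not force and looks_binary(data):
        # Tell the user what happened instead of skipping silently.
        print(f"{path}: skipping binary file", file=sys.stderr)
        return False
    with open(path, "wb") as f:
        f.write(d2u_bytes(data))
    return True
```

Because the test does not depend on how the first line ends, a text file with mixed line endings would still be converted, while a file containing NUL bytes would be reported and left alone unless forced.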


Hannu E K Nevalainen wrote:

>>From: David Fritz
>>Sent: Sunday, April 04, 2004 6:46 AM
> 
> 
>>Charles Wilson wrote:
>>[...]
>>
>>>  (2) it's an attempt to prevent users from permanently scrogging binary
>>>files.  See: d2u, on a binary file, is an irreversible operation.  So,
>>>if you do "d2u *" you'll probably kill something deep inside some binary
>>>file, and you can't fix it -- unless some minimal safeguards are in place.
>>>
>>>  u2d MAY be reversible -- IF there were no pre-existing \r\n
>>>combinations in the file to begin with -- so when (OMG-fixit-)d2u is
>>>run, obviously the first '\n' is preceded by a (newly-added) '\r\n', so
>>>the prog merrily replaces ALL '\r\n' with '\n'...which MAY fix your
>>>oops, but maybe not.
>>>
>>>
>>>So, with the current code, if you snarf the first "line" -- all chars
>>>until the first '\n' -- if it's a binary file the odds are pretty low
>>>that the immediately-preceding character is a '\r' -- so d2u as
>>>currently coded will bail out, and no harm is done.
>>>
>>>It doesn't work so well in the other direction -- by the same logic
>>>above, you'll almost never bail out early if you run 'u2d' on a binary
>>>file -- but if you immediately do a 'd2u' you MIGHT be able to recover.)
>>>
>>
>>[...]
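
The first-line check described in the quoted text can be sketched as follows. This is a reading of the description above, not the real d2u code, and the function name d2u_first_line_check is hypothetical. It shows why the side effect is a weak safeguard: a text file whose first line happens to end in a bare '\n' is skipped just like a binary file.

```python
def d2u_first_line_check(data: bytes) -> bytes:
    """Convert CRLF to LF only if the first line ends in CRLF; else bail out."""
    first_lf = data.find(b"\n")
    # No newline at all, or the first '\n' is not preceded by '\r': bail out.
    if first_lf <= 0 or data[first_lf - 1:first_lf] != b"\r":
        return data  # file left untouched, and the user is told nothing
    return data.replace(b"\r\n", b"\n")
```

With this check, b"a\r\nb\r\n" is converted, but the mixed input b"a\nb\r\nc\r\n" comes back unchanged, which is the behaviour the subject line complains about.
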
>>
>>If detection of binary files is desirable, why not use an explicit test with a
>>more robust methodology?  GNU grep detects binary files by looking for a '\0'
>>byte.  Such a test could be used by both d2u and u2d; they could bail out with a
>>message like "skipping binary file".
>>
>>Cheers
> 
> 
> A more "foolproof" (? does such a thing exist) test would be to disallow
> using d2u/u2d on anything in directories found in $PATH. But then that one
> has its disadvantages too, but less so IMO.
> 
>  I find all this "safety" related stuff to be a PITA at times. Any kind of test
> is prone to fail in some instances; in others it is just a cause for
> confusion -> a lot of bug-hunting - for so little gain.
> 
>  How about running d2u/u2d, say, on a regedit 5 file (i.e., mostly ASCII, but
> due to the encoding every other byte is a NUL)?
>  Would that be considered "legal"? IMO it should, a fast and easy way to
> strip the garbage - to create a file that can be used with normal tools.
> 

Huh?  u2d/d2u will not strip the "garbage".  For that, use iconv, as in:

$ iconv -f UTF-16LE -t UTF-8 < in > out
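
For illustration, the same point in a short Python sketch: replacing the byte pair CR LF does nothing to UTF-16LE data, because there the line ending is the four bytes 0D 00 0A 00, whereas decoding and re-encoding does the job the iconv command above does. The sample header text and the function names are assumptions made for the example.

```python
def d2u_bytes(data: bytes) -> bytes:
    """Byte-level CRLF -> LF replacement, as a d2u-style tool would do."""
    return data.replace(b"\r\n", b"\n")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16LE and re-encode as UTF-8 (what the iconv call does)."""
    return data.decode("utf-16-le").encode("utf-8")


# Sample text only; a real file may also begin with a byte order mark.
text = "Windows Registry Editor Version 5.00\r\n"
utf16 = text.encode("utf-16-le")

stripped = d2u_bytes(utf16)         # NUL bytes and line endings all still there
converted = utf16le_to_utf8(utf16)  # plain UTF-8, no NUL bytes
```

Once the data is UTF-8, the CRLF line ending is an ordinary two-byte sequence again, so d2u can be applied afterwards if Unix line endings are wanted.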


>  IMO: stay away from all of these safety thingies, or at _LEAST_ allow them to
> be bypassed, e.g. with --force. I will be using that switch all the time.
> 
>  There are a lot of these foolhardy "traps" one can fall into; e.g:
> $ cd /;rm -rf *
> are you gonna find a "safety" hatch for that too?
> 
> 
>  Noo... Please, remove all of these safety checks.
> There must be some kind of user sanity presupposition. Or else the tools
> soon will be crippled to a state where they are unusable for normal work.
> 
>  Make Backups, Not War!  -> MBNW!  ;-P
> 

OLOCA?

[...]

Cheers

