Re: Near-FMTEYEWTK instructorial on ties, handles, and methods (was: How to tell whether readline got an error or EOF)

From: Tom Christiansen
Date: August 3, 2008 10:16
Subject: Re: Near-FMTEYEWTK instructorial on ties, handles, and methods (was: How to tell whether readline got an error or EOF)
Message ID: 32526.1217783756@chthon

>Tom Christiansen <tchrist <at> perl.com> writes:

>> For my part, in implementing a class that I expected folks to call
>> tie *GLOB .... and hence invoke TIEHANDLE on, for that I would
>> specifically document precisely *which* operations were supported.

> I will do that - but it leaves the question of which operations
> *should* be supported.  If you write a tied filehandle class, what is
> the reasonable minimum interface that it should provide to be worthy
> of the name?  I had assumed I just need to implement every filehandle
> method in perltie(1), but it appears that is neither necessary nor
> sufficient to give something that acts reasonably close to a real
> filehandle.

The perltie manpage states, with white space liberally inserted to
clarify what it's saying and numbers at the left for its 3 pieces:

    A class implementing a tied filehandle should define the
    following methods:  

(1)                     TIEHANDLE, 

(2)                     at least one of PRINT, 
                                        PRINTF, 
                                        WRITE, 
                                        READLINE, 
                                        GETC, 
                                        READ, 

(3)                     and possibly    CLOSE,
                                        UNTIE[,]
                                    and DESTROY.

That means a minimal tied-handle class can have just two methods:
TIEHANDLE and one of the listed I/O functions.  Notice that my
off-the-cuff Tie::Randy class did that.
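
To make the two-method minimum concrete, here is a small sketch.  It is
not the Tie::Randy class mentioned above; the class name Tie::Countdown
and its countdown behaviour are invented purely for illustration.  It
defines only TIEHANDLE and READLINE.

```perl
use strict;
use warnings;

# Hypothetical class name, chosen for illustration only.
package Tie::Countdown;

# TIEHANDLE: constructor invoked by tie(); returns the object behind the handle.
sub TIEHANDLE {
    my ($class, $start) = @_;
    my $count = $start;
    return bless \$count, $class;
}

# READLINE: the single I/O method; returns undef once the count runs out.
sub READLINE {
    my ($self) = @_;
    return undef if $$self <= 0;
    return $$self-- . "\n";
}

package main;

tie *COUNT, 'Tie::Countdown', 3;

my @lines;
while (defined(my $line = <COUNT>)) {
    push @lines, $line;
}
print @lines;
```

Reading with <COUNT> works because READLINE exists.  Any operation the
class does not implement, such as eof(COUNT) or print COUNT ..., should
fail at run time, since perl looks for the corresponding method (EOF,
PRINT) and does not find it.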

>> Anything beyond that is asking for... something beyond that.

> If I understand you correctly, you are saying that there is no way in
> general to tell whether getc() got EOF or an error.  You can find out
> if the filehandle is a real one, but a tied filehandle is under no
> obligation to provide this interface.

I believe that to be correct.  You do realize, don't you, how one 
really determines eof()?  It's not a syscall; or rather, not 
one by that name.  As perlfunc reports of eof:

    Returns 1 if the next read on FILEHANDLE will return end of
    file, or if FILEHANDLE is not open.  FILEHANDLE may be an
    expression whose value gives the real filehandle.  (Note that
    this function actually reads a character and then C<ungetc>s
    it, so isn't very useful in an interactive context.)  Do not
    read from a terminal file (or call C<eof(FILEHANDLE)> on it)
    after end-of-file is reached.  File types such as terminals may
    lose the end-of-file condition if you do.
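
For a real (untied) filehandle, telling the two cases apart might look
like the sketch below.  It assumes the error method from IO::Handle and
the eof builtin; the helper name why_undef and the in-memory test data
are hypothetical.  A tied handle is under no obligation to support
either call.

```perl
use strict;
use warnings;
use IO::Handle;

# why_undef() is a hypothetical helper name used only for this sketch.
sub why_undef {
    my ($fh) = @_;
    return 'error' if $fh->error;   # error flag is set on the stream
    return 'eof'   if eof($fh);     # no more data to read
    return 'unknown';
}

# An in-memory file holding a single character stands in for real input.
my $data = "x";
open(my $fh, '<', \$data) or die "open: $!";

my $first  = getc($fh);    # the one available character
my $second = getc($fh);    # undef: nothing left to read
my $reason = defined $second ? 'data' : why_undef($fh);
print "second getc: $reason\n";
```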

On many types of streams, including at the very least pipes, sockets, and
when you're trailing a growing logfile, the *only* way to know whether you
can get a character from that stream is to try to do so.  So eof() must
make the attempt, record whether it got something, and then push the
char back onto the stream it just came from.  Of course, buffering is nearly
always an issue here, too.  The C stdio docs for ungetc() read:

         The ungetc() function pushes the character c (converted to an
         unsigned char) back onto the input stream pointed to by stream.
         The pushed-backed characters will be returned by subsequent reads
         on the stream (in reverse order).  A successful intervening call,
         using the same stream, to one of the file positioning functions
         (fseek(3), fsetpos(3), or rewind(3)) will discard the pushed back
         characters.

         One character of push-back is guaranteed, but as long as there is
         sufficient memory, an effectively infinite amount of pushback is
         allowed.

         If a character is successfully pushed-back, the end-of-file
         indicator for the stream is cleared.

    RETURN VALUES
         The ungetc() function returns the character pushed-back
         after the conversion, or EOF if the operation fails.  If
         the value of the argument c character equals EOF, the
         operation will fail and the stream will remain unchanged.

Curiously, Perl has a getc builtin, but no ungetc (despite its mention of
the same in the eof() description, probably therefore referring to the C
function).  It is, however, supported as a method on an IO::Handle.

       $io->ungetc ( ORD )
           Pushes a character with the given ordinal value back
           onto the given handle's input stream.  Only one
           character of pushback per handle is guaranteed.

You'll also find ungetc mentioned in perlclib and perlapio.
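
The read-then-push-back idea that eof() relies on can be sketched with
getc and IO::Handle's ungetc.  This is only an illustration of the
technique, not how perl implements eof internally; the helper name
at_eof is hypothetical, and the sketch assumes single-byte characters
on a handle with no encoding layer.

```perl
use strict;
use warnings;
use IO::Handle;

# at_eof() is a hypothetical helper, not part of Perl or IO::Handle.
sub at_eof {
    my ($fh) = @_;
    my $ch = getc($fh);
    return 1 unless defined $ch;   # nothing came back: EOF (or an error)
    $fh->ungetc(ord $ch);          # push the character back for the next read
    return 0;
}

# An in-memory file stands in for real input.
my $data = "ab";
open(my $fh, '<', \$data) or die "open: $!";

my @seen;
until (at_eof($fh)) {
    push @seen, getc($fh);
}
print "read: @seen\n";
```

Note that the helper cannot tell EOF from a read error on its own,
which is the same limitation discussed above.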

Here's something curious:

    % perl -MIO::Handle -E 'open(FH, "/dev/null")||die; FH->ungetc(65); while ($ch = getc(FH)) { say "got $ch, ord ", ord($ch) }'
    got A, ord 65

    % perl -MIO::Handle -E 'open(FH, "/dev/null")||die; FH->ungetc(660); while ($ch = getc(FH)) { say "got $ch, ord ", ord($ch) }'
    got , ord 148

    % perl -MIO::Handle -E 'open(FH, "/dev/null")||die; binmode(FH,":utf8")||die;FH->ungetc(660); while ($ch = getc(FH)) { say "got $ch, ord ", ord($ch) }'
    Malformed UTF-8 character (unexpected continuation byte 0x94, with no preceding start byte) in ord at -e line 1.
    Wide character in print at -e line 1.
    got , ord 0

    % perl -MIO::Handle -E 'open(FH, "/dev/null")||die; binmode(FH,":utf8")||die;FH->ungetc(65); while ($ch = getc(FH)) { say "got $ch, ord ", ord($ch) }'
    got A, ord 65

    % perl -MIO::Handle -E 'open(FH, "/dev/null")||die; binmode(FH,":utf8")||die;FH->ungetc(300); while ($ch = getc(FH)) { say "got $ch, ord ", ord($ch) }'
    got ,, ord 44

Hm... :-)

--tom
