perl.vmsperl | Postings from November 2010

Re: Problem with VMS Carriage return carriage control files in 5.10 and 5.12

From: Craig A. Berry
Date: November 4, 2010 17:38
Subject: Re: Problem with VMS Carriage return carriage control files in 5.10 and 5.12
Message ID: 5AFA3D8F-8800-4602-B881-46E661008147@mac.com

On Apr 21, 2010, at 5:29 PM, Martin.Zinser@deutsche-boerse.com wrote:

> If you open a text file with Carriage return carriage control for output
> (based off an existing file) and populate the new file with longer
> records, at some point gratuitous line breaks are added to the file.

Finally getting back to this after six months, and I think I have a
solution.  To review: when you use the Perl "open" operator, it calls
into Perl's own buffered I/O layer named "perlio", which sits on top of
another layer called "unixio", which in turn is implemented in terms of
the C run-time library (CRTL) read() and write() functions.  This
arrangement was new in about 5.6 but became the default in 5.10, and
that's where we started seeing the problem Martin describes on VMS.

The problem is that while the perlio layer is buffered, the unixio
layer is not.  When the buffer in the perlio layer fills up, it
triggers a flush to the lower layer.  That flush causes a write() in
the unixio layer, which goes all the way to disk.  If you are writing
to a record-oriented file, you'll likely introduce an extra record
boundary, unless you had the extreme good fortune to hit the end of a
line at the same moment you hit the end of the buffer.  Part of the
problem is that the buffer in the perlio layer is hard-wired to 4K.
With a larger buffer you would typically see fewer extra records, but
you would still see them.
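To make the failure mode concrete, here is a small C simulation under loudly simplified assumptions: the buffer is a hypothetical 16 bytes rather than perlio's 4K, and the "record-oriented file" is modeled as an array in which every newline ends a record and the end of every lower-level write ends one too.  None of the names (buffered_write, lower_write, and so on) belong to perlio or the CRTL; they only mimic the shape of a buffered layer sitting on an unbuffered one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for the illustration; perlio's real buffer is 4K. */
#define BUF_SIZE    16
#define MAX_RECORDS 32
#define MAX_REC_LEN 64

/* Simplified model of a record-oriented file: the records written so far. */
static char records[MAX_RECORDS][MAX_REC_LEN];
static int nrecords = 0;

static void emit_record(const char *p, size_t len)
{
    assert(nrecords < MAX_RECORDS && len < MAX_REC_LEN);
    memcpy(records[nrecords], p, len);
    records[nrecords][len] = '\0';
    nrecords++;
}

/* "unixio"-like lower layer (hypothetical model): unbuffered, so every
 * call goes straight to the file.  Each '\n' ends a record, and so does
 * the end of the call, which is how a partial line becomes a record. */
static void lower_write(const char *data, size_t n)
{
    size_t start = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] == '\n') {
            emit_record(data + start, i - start);
            start = i + 1;
        }
    }
    if (start < n)
        emit_record(data + start, n - start);  /* partial line: extra boundary */
}

/* "perlio"-like upper layer: a fixed-size buffer flushed only when full. */
static char buf[BUF_SIZE];
static size_t buflen = 0;

void buffered_flush(void)
{
    if (buflen > 0) {
        lower_write(buf, buflen);
        buflen = 0;
    }
}

void buffered_write(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (buflen == BUF_SIZE)
            buffered_flush();   /* buffer full: flush, even mid-line */
        buf[buflen++] = s[i];
    }
}
```

Writing "alpha line\n", "beta line\n" and "gamma line\n" through buffered_write() and then calling buffered_flush() leaves four records in this model, "alpha line", "beta ", "line" and "gamma line": the full-buffer flush landed in the middle of the second line and split it in two.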

It turns out the perlio layer has some knobs and switches on it, and
one of them is a "line buffering" option.  If this option is enabled,
the flush to the lower layer happens whenever a newline character
appears in the data.  As long as your lines are shorter than the
buffer, you write them out whole, which empties the buffer in the
upper layer, making room for more data, and everything is peachy.
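Using the same simplified model (a hypothetical 16-byte buffer, and a lower layer where every newline and every write call ends a record), a line-buffered upper layer also flushes right after each newline.  This is only a sketch of the idea behind perlio's line-buffering option (the PERLIO_F_LINEBUF flag set in the patch), not its actual implementation; all function names here are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for the illustration; perlio's real buffer is 4K. */
#define BUF_SIZE    16
#define MAX_RECORDS 32
#define MAX_REC_LEN 64

static char records[MAX_RECORDS][MAX_REC_LEN];
static int nrecords = 0;

static void emit_record(const char *p, size_t len)
{
    assert(nrecords < MAX_RECORDS && len < MAX_REC_LEN);
    memcpy(records[nrecords], p, len);
    records[nrecords][len] = '\0';
    nrecords++;
}

/* Same simplified lower layer: '\n' and the end of each call end a record. */
static void lower_write(const char *data, size_t n)
{
    size_t start = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] == '\n') {
            emit_record(data + start, i - start);
            start = i + 1;
        }
    }
    if (start < n)
        emit_record(data + start, n - start);
}

static char buf[BUF_SIZE];
static size_t buflen = 0;

static void linebuf_flush(void)
{
    if (buflen > 0) {
        lower_write(buf, buflen);
        buflen = 0;
    }
}

/* Line-buffered upper layer: flush after every newline, and still flush
 * when the buffer is full (so an over-long line is still split). */
void linebuf_write(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (buflen == BUF_SIZE)
            linebuf_flush();
        buf[buflen++] = s[i];
        if (s[i] == '\n')
            linebuf_flush();    /* whole line goes down in one write */
    }
}

/* Start over with an empty file and an empty buffer. */
void reset_file(void)
{
    nrecords = 0;
    buflen = 0;
}
```

With lines no longer than the buffer, each line becomes exactly one record.  A 27-character line plus its newline still overflows the 16-byte buffer and comes out as "this line is lon" and "ger than 16", which is the "as long as your lines are shorter than the buffer" caveat in action.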

So, where and how to enable this line buffering?  Here's my proposed  
patch:

--- perlio.c;-0 2010-10-21 07:58:15 -0500
+++ perlio.c    2010-11-02 21:32:41 -0500
@@ -3758,6 +3758,22 @@ PerlIOBuf_open(pTHX_ PerlIO_funcs *self,
                  */
                 PerlLIO_setmode(fd, O_BINARY);
  #endif
+#ifdef VMS
+#include <rms.h>
+               /* Enable line buffering with record-oriented regular files
+                * so we don't introduce an extraneous record boundary when
+                * the buffer fills up.
+                */
+               if (PerlIOBase(f)->flags & PERLIO_F_CANWRITE) {
+                   Stat_t st;
+                   if (PerlLIO_fstat(fd, &st) == 0
+                       && S_ISREG(st.st_mode)
+                       && (st.st_fab_rfm == FAB$C_VAR
+                           || st.st_fab_rfm == FAB$C_VFC)) {
+                       PerlIOBase(f)->flags |= PERLIO_F_LINEBUF;
+                   }
+               }
+#endif
             }
         }
      }

[end]


This is right after the perlio layer has called down to the unixio
layer to get the file open.  We have an fd, so if the handle is open
for writing we can do an fstat() on it and retrieve the record format
(st_fab_rfm) from the VMS-specific bits of the stat structure.  Then I
check that it's a regular file (not a device, such as a mailbox, that
may need to carry binary data) and that the record format is either
variable (FAB$C_VAR) or variable with fixed control (FAB$C_VFC).  If
these conditions are met, I enable the line buffering option
(PERLIO_F_LINEBUF) on that filehandle.
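The decision the patch makes can be pulled out as a pure predicate, which makes the three conditions easy to see and to test in isolation.  This is a portable sketch, not VMS code: the enum values are hypothetical stand-ins for the RMS FAB$C_* record-format constants (the patch gets the real ones via <rms.h>), and the two booleans stand in for the PERLIO_F_CANWRITE flag and the S_ISREG() test on the fstat() result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the RMS record formats; on VMS the real
 * FAB$C_* constants come from the RMS headers, not from this enum. */
enum rec_format {
    RFM_UDF,    /* undefined */
    RFM_FIX,    /* fixed length */
    RFM_VAR,    /* variable length (FAB$C_VAR) */
    RFM_VFC,    /* variable with fixed control (FAB$C_VFC) */
    RFM_STM,    /* stream */
    RFM_STMLF,  /* stream, LF-terminated */
    RFM_STMCR   /* stream, CR-terminated */
};

/* Mirrors the patch's three tests.  A failed fstat() should be passed
 * as is_regular_file == false, matching the patch's "== 0" check. */
bool want_line_buffering(bool can_write, bool is_regular_file,
                         enum rec_format rfm)
{
    if (!can_write)
        return false;       /* only output handles are affected */
    if (!is_regular_file)
        return false;       /* e.g. mailboxes, which may carry binary data */
    return rfm == RFM_VAR || rfm == RFM_VFC;
}

/* A few spot checks of the policy. */
void check_line_buffering_policy(void)
{
    assert(want_line_buffering(true, true, RFM_VAR));
    assert(want_line_buffering(true, true, RFM_VFC));
    assert(!want_line_buffering(false, true, RFM_VAR));
    assert(!want_line_buffering(true, false, RFM_VFC));
    assert(!want_line_buffering(true, true, RFM_STMLF));
}
```

Keeping the policy separate from the perlio plumbing like this is just one way to frame it; it mainly makes the question at the end of this message (which formats are safe to line-buffer) a matter of editing one return expression.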

I have tested this and it works for situations similar to Martin's  
original report, and it does not introduce any new test failures in  
the test suite.  But what situations, if any, does this break?  I'm  
assuming that if the record format is FAB$C_VAR or FAB$C_VFC, the  
records will never contain binary data with embedded newlines.  Is  
that true?   What other assumptions am I making that I shouldn't?

________________________________________
Craig A. Berry
mailto:craigberry@mac.com

"... getting out of a sonnet is much more
  difficult than getting in."
                  Brad Leithauser



