
Re: Topaz and Regular Expressions

Ed Peschko
October 20, 1999 10:22
On Tue, Oct 19, 1999 at 09:37:28PM -0400, Ken Fox wrote:
>Ed Peschko writes:
>> embed perl in C++ but it comes at the expense of embedding an interpreter.
>I wish there were a direct interface into regex matching (and other ops
>for that matter), but I don't see a problem with embedding itself. I
>think it's an incredibly difficult challenge to optimize ops without
>assuming that a perl interpreter is available. For example, would you
>want the regex engine to abstract the concept of evaluating a
>replacement string? That might hurt performance/maintainability quite a
>bit. Is it worth it?

I don't quite understand - when you say 'evaluation', are you referring to 
something like:

$line =~ s"(.*)" $1 x 2"ge;

or

$parenmatch =
        qr{ \(
                    (?p{ $parenmatch })

where there are 'perl bits' embedded in the regular expression? If so, I'd say
give people the opportunity to disable them - by #ifdef if necessary - and
provide the direct interface without that functionality.

The importance of all this? Well, I've been thinking a bit about the whole
'conquer the world' thing. I agree with Chip - there are basically two types
of languages: systems languages and applications languages. Perl is
relatively good as an application language but it isn't much as a system
language.

So if we want perl to grow into the system arena, making perl a viable API would
help quite a bit. The perl API probably would not make its way into
time-sensitive applications like kernels, but it definitely would into large
mission-critical systems (like business systems software). We do it already at
our company, but in limited scope because of the interpreter issue and thread
safety.

>> Whereas some of us would like to say:
>> 	while (a.regmatch("H(.*?)(?=H|$)", "sg"))
>This interface looks pretty, but it is a total pain to optimize. It
>is possible to special case for (const char *) and lookup pre-built
>regexps from a cache -- but that leads to surprises when a programmer
>changes from a constant to a variable and suddenly the performance
>drops by an order.

I'm not sure what you mean here, either. Are you talking about interpolation,
about stuff like:

a.regmatch("H(.*)$variable", "sg")

If so, I think that the C++ regmatch should handle char strings and *only*
char strings, and not do any interpolation whatsoever (well, except for internal
interpolation for things like $1). Instead I'd think that interpolation could
be handled by something like:

Scalar a("HELLO WORLD");

Scalar b("This is $a");      // no interpolation happens here
b.Interpolate("a", a);       // This is HELLO WORLD

i.e., that the interpolation is done explicitly by the user.
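
To make the idea concrete, here is a minimal sketch of explicit interpolation
in C++. The Scalar class, its Interpolate method, and the str() accessor are
hypothetical stand-ins written for illustration; they are not Topaz's classes,
and the "$name" substitution is a naive textual replace (it would also rewrite
the prefix of a longer name such as $ab).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the Scalar type discussed above; not Topaz code.
class Scalar {
public:
    // Constructing a Scalar never interpolates: "$a" stays as literal text.
    explicit Scalar(std::string text) : text_(std::move(text)) {}

    // Replace every occurrence of "$name" with the text held by value.
    // Assumption: a plain textual replace is good enough for the sketch.
    void Interpolate(const std::string& name, const Scalar& value) {
        assert(!name.empty());  // "$" on its own is not a variable
        const std::string needle = "$" + name;
        std::string::size_type pos = 0;
        while ((pos = text_.find(needle, pos)) != std::string::npos) {
            text_.replace(pos, needle.size(), value.text_);
            pos += value.text_.size();  // continue after the inserted text
        }
    }

    const std::string& str() const { return text_; }

private:
    std::string text_;
};
```

With this shape, a pattern string handed to regmatch is never rewritten behind
the caller's back; any substitution cost shows up as a visible Interpolate call.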

>I'd much rather encourage user-visible regexp objects. IMHO C++ can be
>used to hide a great deal of complexity, but when things that look fast
>run extremely slowly, the hiding gets in the way. (This is one of the
>reasons Linux developers oppose C++ -- it's hard to "get a feel" for
>the way code will run by just looking at it.)

Again, 'user visible' is a bit of an abstraction... Could you provide a bit 
more in the way of syntax or example?
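
For what it's worth, one guess at what a 'user visible' regexp object could
look like is sketched below. It uses std::regex from the C++ standard library
as a stand-in engine, not perl's; the Regexp class and its AllMatches method
are hypothetical names made up for this sketch. The point is only that the
compile step happens once, in the constructor, where the programmer can see it.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper around a compiled pattern; not a Topaz or perl API.
class Regexp {
public:
    // The expensive compilation happens here, once, at a visible call site.
    explicit Regexp(const std::string& pattern) : re_(pattern) {}

    // Collect the first capture group of every match found in subject.
    std::vector<std::string> AllMatches(const std::string& subject) const {
        assert(re_.mark_count() >= 1);  // the pattern needs a capture group
        std::vector<std::string> out;
        for (std::sregex_iterator it(subject.begin(), subject.end(), re_), end;
             it != end; ++it) {
            out.push_back((*it)[1].str());
        }
        return out;
    }

private:
    std::regex re_;  // compiled form, reused by every call to AllMatches
};
```

A caller would build the object outside any loop, e.g.
Regexp re("H(.*?)(?=H|$)"), and then reuse re for each string, so switching
the pattern from a constant to a variable does not silently change the cost of
the matching calls.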

In any case, what I'm proposing is pretty much the same as what MS is trying
to do with 'cool' (horrid name): provide a useful layer of abstraction for
datatypes while keeping the language already in place (i.e., C++). The theory
is that users can migrate to the new API with a lot less pain than if they
had to write in a new language.

>I hope that we will be able to run ops directly without bouncing
>through the trampoline code. Is that what you mean? I rather think
>that an API layer is a very good idea -- I just don't want the API
>to be the current perl_call_sv() interface...

Well, I think that a good API would come out of Topaz if the conscious intent
to make a good API - sans interpreter - were there during implementation, and
if it were banged against during development. So perhaps 'we' (the collective
we of C++ users) should be thrashing against it to see how useful it currently
is for development, with the lexer in the back of our minds.

