perl.perl5.porters | Postings from February 2007

Re: Future Perl development

From: demerphq
Date: February 27, 2007 13:16
Subject: Re: Future Perl development
Message ID: 9b18b3110702271315l27afd6fh75ab64726d35f5f6@mail.gmail.com
On 2/27/07, Nicholas Clark <nick@ccl4.org> wrote:
> [and y'all thought that this thread was dead. Well, no, it's undead. :-)]
> 3: I'm not sure that there are really sufficient tasks in the perl todo to
>    keep someone busy for the other 3 days a week (for that long) even if
>    full time funding were to appear.

I think perltodo could be beefed up. I know that in my area of interest
there is quite a bit of work that could be done. Here's an
off-the-top-of-my-head list, in no particular order:

1. Improve unicode charclass representation
2. Split match results out of the regexp structure
3. rework compilation so that further and more aggressive
optimisations are possible
4. trifurcate the match routine into three: octet-octet/utf8-utf8/octet-utf8
5. split up study_chunk into multiple passes that do different things (see 3)
6. rework exact nodes so they store octet and utf8 representations
when necessary
7. rework trie code so that it doesn't use a hash for handling utf8
charclass lookups
8. rework trie code so that it doesn't use a compressed table
representation when the table is sufficiently small.
9. rework Aho-Corasick logic so it doesn't get invoked when the string
being matched is very short.
10. implement reverse regops
11. change the way variable binding in regexps is done
12. fix the way (?{...}) (??{...}) interact with PL_curpm
13. introduce ways to handle safe signals during long matches
14. introduce resource limits on regexps so that they can consume only
a certain amount of memory or run for only a certain number of regops
15. add limited DFA capabilities for degenerate type patterns.
16. clean up the source code so that all regexp-related code is grouped
together in clearly identifiable groups.
17. make $REGERROR and $REGMARK dynamically scoped
18. rework the initialization stage of regmatch so that initialization
is only done when necessary
19. improve the way regexp.extflags are ordered and processed so that
accessing the flags is more efficient.
20. enhance pos code so that we store both character and byte offsets
when matching a utf8 string.
21. cache charclass properties for localized matches.
22. come up with a way of mapping capture buffers into the perl 6
nested capturing scheme and introduce a way to access them in that
way.
23. implement char class set operations
24. super-linear cache for recursive regexps.

That's just what I can come up with off the top of my head.

Also in the general codebase (again not in any particular order):

1. mg_find is heavily overused throughout the code. Many routines that
I've looked into across the code base do repeated searches through
the magic linked list within the same routine. There must be a way to make
this more efficient.
2. swash code for Unicode is currently half written in Perl; the other
half could probably be converted to C as well.
3. look into improving codepaths for Unicode logic. Many times we don't
get to the core routine without going through multiple layers of
wrapper routines.
4. rework as much of the internal logic that uses char * as part of
its interface to take some kind of lightweight string structure that
knows its encoding and length.

Anyway, just some ideas.

cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


