On 2/27/07, Nicholas Clark <nick@ccl4.org> wrote:
> [and y'all thought that this thread was dead. Well, no, it's undead. :-)]

> 3: I'm not sure that there are really sufficient tasks in the perl todo to
> keep someone busy for the other 3 days a week (for that long) even if
> full time funding were to appear.

I think perltodo could be beefed up. I know in my area of interest
there is quite a bit of work that could be done. Here's an off the top
of my head list, in no particular order.

1. Improve unicode charclass representation
2. Split match results out of the regexp structure
3. rework compilation so that further and more aggressive
   optimisations are possible
4. trifurcate the match routine into 3:
   octet-octet/utf8-utf8/octet-utf8
5. split up study_chunk into multiple passes that do different things
   (see 3)
6. rework exact nodes so they store octet and utf8 representations
   when necessary
7. rework trie code so that it doesn't use a hash for handling utf8
   charclass lookups
8. rework trie code so that it doesn't use a compressed table
   representation when the table is sufficiently small.
9. rework aho-corasick logic so it doesn't get invoked when the string
   being matched is very short.
10. implement reverse regops
11. change the way variable binding in regexps is done
12. fix the way (?{...}) (??{...}) interact with PL_curpm
13. introduce ways to handle safe signals during long matches
14. introduce resource limits on regexps so that they can consume only
    a certain amount of memory or run for a certain number of regops
15. add limited DFA capabilities for degenerate type patterns.
16. clean up source code so that all regexp related code is grouped
    together in clearly identifiable groups.
17. make $REGERROR and $REGMARK dynamically scoped
18. rework the initialization stage of regmatch so that initialization
    is only done when necessary
19. improve the way regexp.extflags are ordered and processed so that
    accessing the flags is more efficient.
20. enhance pos code so that we store both character and byte offsets
    when matching a utf8 string.
21. cache charclass properties for localized matches.
22. come up with a way of mapping capture buffers into the perl 6
    nested capturing scheme and introduce a way to access them in that
    way.
23. implement char class set operations
24. super-linear cache for recursive regexps.

That's just what I can come up with off the top of my head.

Also in the general codebase (again not in any particular order):

1. mg_find is heavily overused throughout the code. Many routines that
   I've looked into throughout the code base do repeated searches
   through the magic linked list in the same routine. There must be a
   way to make this more efficient.
2. swash code for unicode is currently half-perl, the other half could
   probably be converted.
3. look into improving codepaths for unicode logic. Many times we
   don't get to the core routine without going through multiple layers
   of wrapper routines
4. rework as much of the internal logic that uses char * as part of
   its interface to take some kind of lightweight string structure
   that knows its encoding and length.

Anyway, just some ideas.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"