Registry for proposed new regex modifiers; was: Optimizing qr/STRING/msixpodual for amusement

From: Karl Williamson
Date: February 21, 2011 09:08
Subject: Registry for proposed new regex modifiers; was: Optimizing qr/STRING/msixpodual for amusement
Message ID: 4D629BA4.90401@khwilliamson.com

On 02/21/2011 09:48 AM, Tom Christiansen wrote:
>> Using an extended English word list, I get:
>
>> 14-letter "aadeeilmoprsux", no dups:     "superomedial"      (leaving "ax")
>> 14-letter "aadeeilmoprsux", dups:        "pseudoparallelism" (leaving "x")
>> 15-letter "aadeegilmoprsux", no dups:    "superomedial"      (leaving "agx")
>> 15-letter "aadeegilmoprsux", dups:       "megalosauridae"
>>                                           "pseudoparallelism"
>>                                           "pseudoparaplegia"
>> 16-letter "aacdeegilmoprsux", no dups:   "ampelidaceous"     (leaving "grx")
>>                                           "parmeliaceous"     (leaving "dgx")
>> 16-letter "aacdeegilmoprsux", dups:      "ampelidaceous"
>>                                           "cardiomegalies"
>>                                           "megalosauridae"
>>                                           "paradoxicalsleep"
>>                                           "parmeliaceous"
>>                                           "pseudoacademical"
>>                                           "pseudoparallelism"
>>                                           "pseudoparaplegia"
>>                                           "pseudosacrilegious"
>
> Oh!  We're trying for longest words not highest scores?
>
> If you change my sort from
>
>
>         $b->{SCORE} <=>        $a->{SCORE}
>                     ||
>  length $a->{LEFT}  <=> length $b->{LEFT}
>                     ||
>  length $b->{KEY}   <=> length $a->{KEY}
>                     ||
>         $a->{KEY}   cmp        $b->{KEY}
>                     ||
>         $a->{RANK}  <=>        $b->{RANK}
>
>
> around to this:
>
>  length $b->{KEY}   <=> length $a->{KEY}
>                     ||
>         $b->{SCORE} <=>        $a->{SCORE}
>                     ||
>  length $a->{LEFT}  <=> length $b->{LEFT}
>                     ||
>         $a->{KEY}   cmp        $b->{KEY}
>                     ||
>         $a->{RANK}  <=>        $b->{RANK}
>
>
> Then you get these:
>
>      13 saluspopulisupremalex  /d              ‖ salus populi suprema lex (esto) [n.]
>      11 dollarimperialism      /xue            dollar imperialism ← dollar
>      10 salledespasperdus      /mixo           salle des pas perdus ← salle
>      10 middleeardisease       /xpou           › middle ear disease ← middle
>      10 pseudoplasmodium       /xare           pseudo‐plasˈmodium ← pseudo‐
>       9 parallelepipedal       /msxou          parallelepipedal [adjs.] ← parallelepiped
>       9 peerlessprimrose       /xduaa          peerless primrose ← primrose
>       9 primrosepeerless       /xduaa          ˈprimrose ˈpeerless [n.]
>       7 surprisesurprise       /mxodaal        › surprise, surprise ← surprise
>      12 pullorumdisease        /xa             pullorum disease ← pullorum
>      11 palladiousoxide        /mre            › palladious oxide ← palladious
>      11 russellsparadox        /mie            › Russell’s paradox ← paradox
>      11 sailorspleasure        /mxd            sailor’s pleasure ← sailor
>      10 semiellipsoidal        /xuar           semi‐ellipˈsoidal [a.] ← ˌsemi‐eˈllipse
>       9 plasmalemmasome        /ixdur          plasmaˈlemmasome [n.]
>       9 spuriouspareira        /mxdle          › Spurious Pareira ← pareira
>      11 parallelmedium         /sxo            › parallel‐medium ← parallel
>      11 pseudoallelism         /xar            pseudoaˈllelism ← pseudoallele
>      10 parallelopiped         /msxu           × parallelopiped [n.] → parallelepiped
>      10 primordialsoup         /xaee           › primordial soup ← soup
>      10 pseudoperidium         /xaal           ‖ pseudopeˈridium ← pseudo‐
>      10 pseudospermium         /xaal           ‖ pseudoˈspermium [a.] ← pseudo‐
>      10 purpurasimplex         /odae           › purpura simplex ← purpura
>      10 soldierssupper         /mxaa           soldier’s supper ← soldier
>       9 disimperialism         /xouae          disimperialism [n.]
>       9 parallelepiped         /msxou          parallelepiped [n.]
>
> I can't seem to break the 16-letters barrier without allowing non-letters
> in terms. Well, ok, maybe "pseudoplasmodium", since the hyphen is optional.
> Now I wonder whether I'm actually scoring these the way the rest of you
> guys are. :(
>
> I think I'm afraid to look up what pullorum disease might be.
>
> --tom
>
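
The reordering Tom describes only changes which comparison comes first
in the chain. A minimal sketch of the two orderings follows; the field
names (KEY, SCORE, LEFT, RANK) come from the quoted sort, but the three
records, their RANK values, and the sub names by_score and by_length
are made up for illustration and are not taken from his program.

```perl
use strict;
use warnings;

# Hypothetical records: KEY is the word, SCORE its score, LEFT the
# modifier letters left over, RANK an arbitrary tie-breaker.
my @found = (
    { KEY => 'parallelepiped',    SCORE => 9,  LEFT => 'msxou', RANK => 1 },
    { KEY => 'pullorumdisease',   SCORE => 12, LEFT => 'xa',    RANK => 2 },
    { KEY => 'dollarimperialism', SCORE => 11, LEFT => 'xue',   RANK => 3 },
);

# Original ordering: highest score first, later keys only break ties.
sub by_score {
           $b->{SCORE} <=>        $a->{SCORE}
    || length $a->{LEFT} <=> length $b->{LEFT}
    || length $b->{KEY}  <=> length $a->{KEY}
    ||        $a->{KEY}  cmp        $b->{KEY}
    ||        $a->{RANK} <=>        $b->{RANK};
}

# Reordered: longest word first, score demoted to the second key.
sub by_length {
       length $b->{KEY}  <=> length $a->{KEY}
    ||        $b->{SCORE} <=>       $a->{SCORE}
    || length $a->{LEFT} <=> length $b->{LEFT}
    ||        $a->{KEY}  cmp        $b->{KEY}
    ||        $a->{RANK} <=>        $b->{RANK};
}

my @by_score  = map { $_->{KEY} } sort by_score  @found;
my @by_length = map { $_->{KEY} } sort by_length @found;

print "by score:  @by_score\n";
print "by length: @by_length\n";
```

With these sample records the score ordering puts "pullorumdisease"
first, while the length ordering puts the 17-letter
"dollarimperialism" first.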

Note that I intend to fix things so that multiple charset modifiers on 
one pattern are disallowed, as they already are in the infix form (I 
didn't have time before the freeze).
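
To see why combining charset modifiers makes little sense, here is a
small sketch, assuming a perl new enough to accept /a and /u (5.14 or
later). The same one-character string gets opposite answers from \w
depending on which charset modifier is in effect, so a pattern carrying
both would have no single meaning. The variable names are only for
illustration.

```perl
use strict;
use warnings;

# U+00E9 LATIN SMALL LETTER E WITH ACUTE as a one-character string.
my $e_acute = "\xE9";

# /a restricts \w to ASCII word characters.
my $under_a = $e_acute =~ /\w/a ? 1 : 0;

# /u applies Unicode rules, under which U+00E9 is a word character.
my $under_u = $e_acute =~ /\w/u ? 1 : 0;

print "under /a: $under_a, under /u: $under_u\n";
```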

Note also that there are a number of enhancements under consideration 
for future Perls that will affect all this.

Here are the ones I can think of at the moment:

/w to change the meaning of \b to Unicode's word boundary algorithm.
/k to change /i to match under compatibility decomposition.
/? to change /i to match under canonical decomposition.  I've thought 
about /i, /ii, and /iii for these, but I don't think that's a good idea.
/? to change /i to mean simple case folding, or make simple folding the 
default and have /f mean full.
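
For readers unfamiliar with the two kinds of decomposition mentioned
above, here is a rough sketch using the core Unicode::Normalize module.
It does not implement any of the proposed modifiers; it only shows what
canonical (NFD) and compatibility (NFKD) decomposition do to a string,
and how normalizing by hand before an ordinary /i match approximates
what a decomposition-aware /i might offer. That approximation is an
assumption for illustration, not the planned design.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFKD);

my $e_acute  = "\x{E9}";     # precomposed small e with acute
my $fi_lig   = "\x{FB01}";   # LATIN SMALL LIGATURE FI
my $cap_e_ac = "\x{C9}";     # precomposed capital E with acute

# Canonical decomposition splits e-acute into "e" plus U+0301,
# but leaves the ligature alone.
my $nfd_splits_accent = NFD($e_acute) eq "e\x{301}" ? 1 : 0;
my $nfd_keeps_lig     = NFD($fi_lig)  eq $fi_lig    ? 1 : 0;

# Compatibility decomposition also breaks the ligature into "fi".
my $nfkd_splits_lig   = NFKD($fi_lig) eq "fi"       ? 1 : 0;

# A plain /i match does not see precomposed and decomposed forms as
# equal; decomposing the target by hand first makes them comparable.
my $raw_match = $cap_e_ac      =~ /^e\x{301}$/i ? 1 : 0;
my $nfd_match = NFD($cap_e_ac) =~ /^e\x{301}$/i ? 1 : 0;

print "NFD splits accent: $nfd_splits_accent\n";
print "NFD keeps ligature: $nfd_keeps_lig\n";
print "NFKD splits ligature: $nfkd_splits_lig\n";
print "raw /i match: $raw_match, after NFD: $nfd_match\n";
```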
