Front page | perl.perl6.internals |
Postings from January 2002
Re: on parrot strings
From:
Bryan C. Warnock
Date:
January 21, 2002 14:48
Subject:
Re: on parrot strings
Message ID:
20020121224848.LIPF10932.femail35.sdc1.sfba.home.com@there
On Monday 21 January 2002 17:11, Russ Allbery wrote:
> No, pretty much all of the time. There are differences between proper
> nouns and common nouns, but those are differences routinely quashed as a
> typesetting decision; if you write both proper nouns and common nouns in
> all caps as part of a headline, the lack of distinction is not considered
> a misspelling. Similarly, if you capitalize the common noun because it
> occurs at the beginning of the sentence, that doesn't transform its
> meaning.
That doesn't mitigate the fact that they are different words. Sure, English
is forgiving, as it's filled with heteronyms and homographs. But it's all
moot because regexes are character-oriented, not word-oriented.
Given that they're character-oriented, we only need to provide character
transformations between upper, lower, and title case. But is that the
dividing line?
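To make the character-level case transformations concrete, here is a minimal sketch in Python (not Parrot or Perl code). The helper name case_variants is hypothetical, and it leans on Python's built-in Unicode case mappings rather than anything specified in this thread.

```python
import re

def case_variants(ch):
    """Return the set of upper, lower, and title case forms of one character.

    Hypothetical helper for illustration; it uses Python's own Unicode
    case mappings, not any Parrot string API.
    """
    return {ch.lower(), ch.upper(), ch.title()}

# For most letters, title case and upper case coincide.
ascii_forms = case_variants("a")

# U+01C6 (the dz-with-caron digraph) has three distinct case forms:
# U+01C6 lower, U+01C4 upper, U+01C5 title.
digraph_forms = case_variants("\u01c6")

# A case-insensitive regex applies this kind of mapping per character;
# it knows nothing about words.
pattern = re.compile("resume", re.IGNORECASE)
matches_upper = pattern.search("RESUME") is not None
```

Nothing here distinguishes proper nouns from common nouns; the mapping only ever sees one character at a time.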
>
> Whereas adding or removing an accent is always considered a misspelling,
> at least in some languages. It's like adding or removing random letters
> from the word.
No, it's substituting letters in a word. It's adding or removing random
characters from the string representation of the word.
>
> re'sume' and resume are two different words. It so happens that in
> English re'sume' is a variant spelling for one meaning of resume. I don't
> believe that regexes should try to automatically pick up variant
> spellings. Should the regex /aerie/ match /eyrie/? That makes as much
> sense as a search for /resume/ matching /re'sume'/.
Variant spellings imply word-oriented searches. We're talking about
character-oriented transformations, and the question is whether or not
there's enough justification - which I feel won't come from grammatical
rationales, but from the 7-bit ASCII storage of words with accents - to
provide a transformation from a base letter with accents to just the base
letter.
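For illustration, one way such an accent-to-base-letter transformation could look, sketched in Python. This assumes Unicode strings and canonical decomposition (NFD), which is a swapped-in technique and not something proposed in this thread; strip_accents is a hypothetical name.

```python
import re
import unicodedata

def strip_accents(text):
    """Map each accented letter to its base letter.

    Assumption: the text is Unicode, so NFD normalization can split an
    accented letter into a base letter plus combining marks, and the
    marks can then be dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

accented = "r\u00e9sum\u00e9"  # r, e-acute, s, u, m, e-acute

# Without the transformation, the plain pattern does not match.
plain_match = re.search("resume", accented)

# With the transformation applied first, it does.
folded_match = re.search("resume", strip_accents(accented))
```

The transformation is per character: it has no idea whether the folded result is a variant spelling or a different word, which is exactly the point under dispute.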
Do you feel that altering accented letters to better represent them within
the facilities provided isn't done, or is wrong? I'm not sure what
you're typing as your example word, and whether or not it's getting munged
in the meantime, but "résumé" (r, e accent, s, u, m, e accent) is coming
across as "re'sume'" (r, e, apostrophe, s, u, m, e, apostrophe). (The incoming
message was encoded ISO-8859-1, so presumably it should have preserved
character 233, which is what I'm sending out.)
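A small sketch of the encoding point, in Python and using only its standard codecs: in ISO-8859-1 the accented e is the single byte 233, while 7-bit ASCII has no slot for it at all, so something has to give.

```python
word = "r\u00e9sum\u00e9"  # r, e-acute, s, u, m, e-acute

# ISO-8859-1 stores each character as one byte; e-acute is byte 233.
latin1_bytes = list(word.encode("iso-8859-1"))

# 7-bit ASCII cannot represent e-acute. With errors="replace" the
# codec substitutes "?" instead of raising UnicodeEncodeError.
ascii_bytes = word.encode("ascii", errors="replace")

# Decoding the ISO-8859-1 bytes again recovers the original word.
round_trip = bytes(latin1_bytes).decode("iso-8859-1")
```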
This isn't a ridiculous question. Personally, I don't think that we should.
The facilities are quickly coming into place to be able to do proper
character encodings, and I think that we should lead from the front and
encourage folks to be proper - not only in their searches, but in their text
production.
--
Bryan C. Warnock
bwarnock@capita.com