develooper Front page | perl.pep | Postings from August 2016

Re: Email::Address::XS

From: Ricardo Signes
Date: August 2, 2016 22:36
Subject: Re: Email::Address::XS
Message ID: 20160802223611.GA24987@debian

* pali@cpan.org [2016-08-02T17:03:07]
> I can imagine, that people could be confused about header_str meaning. 
> It has suffix _str and I would expect it needs (Unicode) string, not 
> object... Name "header" is better as it does not say it needs string.

People will want to be able to pass non-ASCII strings in as subject, meaning
that header is not suitable for the "one true list of fields."  Passing in a
pre-encoded value is pretty sure to be the exception, not the rule.

In other words, I think this would be more sensible:

  header_str => [
    Foo => raw_mime($header_raw),
    Bar => "Text string to be encoded",
    Baz => $message_id_object,
  ],

The alternative, using header, is:

  header => [
    Foo => $header_raw,
    Bar => mime_encode("Text string to be encoded"),
    Baz => $message_id_object,
  ],

Of course, there's no reason that both header and header_str can't accept these
objects, and the user can pick whichever is more convenient, right?  The
difference between header and header_str becomes only the behavior for plain
strings.
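
A minimal sketch of that rule, not Email::MIME's actual implementation. The names My::RawMime and header_value_bytes are hypothetical, raw_mime is modeled on the helper used in the example above, and it is an assumption that every header object exposes an as_mime_string method. Objects are handled the same way under both parameters; only plain strings differ.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Scalar::Util qw(blessed);

# Hypothetical wrapper marking a value as already MIME-encoded bytes.
package My::RawMime {
    sub new            { my ($class, $bytes) = @_; bless { bytes => $bytes }, $class }
    sub as_mime_string { $_[0]{bytes} }
}

# Hypothetical helper, mirroring the raw_mime() call in the example above.
sub raw_mime { My::RawMime->new($_[0]) }

# Resolve one header value to the bytes that would be written out.
# $mode is 'header' (plain strings used verbatim) or
# 'header_str' (plain strings are text, MIME-encoded by Encode).
sub header_value_bytes {
    my ($mode, $value) = @_;

    # Objects behave identically in both modes.
    return $value->as_mime_string
        if blessed($value) && $value->can('as_mime_string');

    return $value if $mode eq 'header';
    return encode('MIME-Header', $value);
}
```

With this, raw_mime($bytes) passes through unchanged under either mode, while a plain string is encoded only under 'header_str'.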

> > * if you know exactly octets you, the user, want in the header field,
> > use "header", but this is likely rare
> 
> Do you mean $email->header_raw_set()?
> 
> I think it is not rare to encode header (to MIME) externally and then 
> pass ASCII 7bit string to $email. At least I see this usage for From 
> header (in previous version of Email::MIME encoding of From/To/Cc 
> headers was totally broken).

I mean both "header" in the initializer and header_set and header_raw_set,
which are equivalent.

> > unchanged are probably in error at least insofar as they let you put
> > non-7-bit-clean data in your headers.  This should probably be
> > fatal:
> > 
> >   header_str => [ Date => "\N{SMILING FACE WITH HORNS}" ]
> 
> Here is problem: Should Email::MIME understand meaning of email headers?

I think its level of understanding is roughly appropriate, although imperfect.
It's meant to prevent you passing in a string of addresses that are naively
correct but actually need encoding.

It's better if people use something structured for headers where this is
complex, though.

> Here we see that header_str does not say (or specify) which string must 
> be specified as parameter. Unicode string? Arbitrary 8bit string? 7bit 
> ASCII string? Or ASCII subset visible characters?

It says, in the docs for create:

    This method creates a new MIME part. The "header_str" parameter is a
    list of headers pairs to include in the message. The value for each pair
    is expected to be a text string that will be MIME-encoded as needed. A
    similar "header" parameter can be provided in addition to or instead of
    "header_str". Its values will be used verbatim.

*text string*, not byte string.

> I think we should unify API for it. And ideally describe into 
> documentation how to correctly use it.

Agreed.

> That /mostly/ with special exceptions for Message-Id or Date is wrong.

I don't think I agree.  I think that the behavior on address list headers is
useful.  Ideally, people use methods to produce objects for structured headers.
email_addr_list(....) for example.  The current behavior is roughly saying:
bare strings for these headers are implicitly parsed into objects that then
encode things.  That's roughly how the message list headers are implemented.
That the Date field is bogus is unfortunate.  I imagine that really there are
only about 3 things to worry about:

  * mailbox and mailbox list
  * fields that do not allow encoded words (and so must be 7-bit clean)
  * fields that are sequences of words

If people know how to produce the already-encoded form, they can do so already.
If they don't, but know what the decoded string would look like, the current
system can continue to improve over time.

In other words: if you say "I have this structured data and it isn't yet
encoded, please encode it for me," we need to understand it exactly enough to
know how to encode it, so this behavior is necessary if header_str is going to
work for structured fields.
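
To make the three categories above concrete, here is a rough sketch. The %FIELD_KIND table and the encode_field function are hypothetical and are not how Email::MIME::Encode is written. The mailbox-list branch assumes the caller already has [ display name, address ] pairs, and it ignores the quoting of special characters in display names.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical policy table; a real one would cover many more fields.
my %FIELD_KIND = (
    'date'       => 'ascii_only',     # no encoded words allowed
    'message-id' => 'ascii_only',
    'from'       => 'mailbox_list',
    'to'         => 'mailbox_list',
    'cc'         => 'mailbox_list',
);

# Encode text only when it contains something outside printable ASCII.
sub encode_if_needed {
    my ($text) = @_;
    return $text =~ /[^\x20-\x7E]/ ? encode('MIME-Header', $text) : $text;
}

sub encode_field {
    my ($name, $value) = @_;
    my $kind = $FIELD_KIND{ lc $name } // 'words';

    if ($kind eq 'ascii_only') {
        die "$name cannot carry non-ASCII text\n"
            if $value =~ /[^\x20-\x7E]/;
        return $value;
    }

    if ($kind eq 'mailbox_list') {
        # Only the display name is encoded; the address stays as-is.
        return join ', ', map {
            my ($phrase, $addr) = @$_;
            encode_if_needed($phrase) . " <$addr>";
        } @$value;
    }

    # Everything else is treated as a sequence of words.
    return encode_if_needed($value);
}
```

Under these assumptions, a non-ASCII Date value dies instead of being silently encoded, while a From display name is encoded and its address is left alone.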

> 1) Function name say what it accept

I am not very swayed by this.  Users can be surprised once for a brief moment
when they see [ header_str => [ From => $object ] ] and then they know forever.
On the other hand, having multiple sets of headers to write is annoying every
time.

> 2) No problem with meaning which type of string is accepted (subset 
> ASCII, ASCII or Unicode as described above)

This is already unambiguous.  _str forms always expect character strings.

> 3) Possible performance optimization (less objects are created)

How?

> And there is another problem still not solved. From $email object it is 
> needed also to read From/To/Cc headers and user (caller) of Email::MIME 
> module is sometimes interested in de-composited addresses objects (e.g. 
> when want to parse each email address in CC field) and sometimes 
> interested only in one string representation (e.g. want to write header 
> to STDOUT)...
> 
> With explicit $email->header_str() $email->header_addr() and also 
> $email->header_grps() calls user get type which wants. I cannot imagine 
> without 3 different calls how to achieve it.

Here is the first idea that comes to mind:

->header_str always returns a text string.

->header_raw always returns a byte string.

Pardon the arbitrary name, but:

  $email->header_frob($field);

Read only, always returns an object that can ->as_mime_string.  For fields that
were set without an object, it returns an unstructured just-in-time proxy.
Headers set with "raw" return the same kind of object I proposed above for
passing a raw header into header_str.  Headers set with header_str get the kind
of thing that mime_encode() returns.  Possibly/probably if you have set the
From header with header_str, you get the object currently being produced, just
for brief use, in Email::MIME::Encode.
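
A toy model of that read accessor, under stated assumptions: My::Email, My::Header::Raw and My::Header::Text are hypothetical stand-ins, and header_frob is the placeholder name from above. None of this is an existing Email::MIME API. Each setter wraps a plain string in a just-in-time proxy, and objects are stored untouched.

```perl
use strict;
use warnings;
use Encode ();

# Proxy for bytes that were stored verbatim.
package My::Header::Raw {
    sub new            { bless { bytes => $_[1] }, $_[0] }
    sub as_mime_string { $_[0]{bytes} }
}

# Proxy for a text string, MIME-encoded on demand.
package My::Header::Text {
    sub new            { bless { text => $_[1] }, $_[0] }
    sub as_mime_string { Encode::encode('MIME-Header', $_[0]{text}) }
}

# Minimal header store; a stand-in for the real message object.
package My::Email {
    use Scalar::Util qw(blessed);

    sub new { bless { field => {} }, $_[0] }

    # Objects are kept as given; plain strings get the supplied proxy.
    sub _set {
        my ($self, $name, $value, $proxy_class) = @_;
        $self->{field}{ lc $name } =
            blessed($value) ? $value : $proxy_class->new($value);
        return;
    }

    sub header_raw_set { $_[0]->_set($_[1], $_[2], 'My::Header::Raw') }
    sub header_str_set { $_[0]->_set($_[1], $_[2], 'My::Header::Text') }

    # Read only: always an object that can ->as_mime_string.
    sub header_frob { $_[0]{field}{ lc $_[1] } }
}
```

However a header was set, the caller gets back something that answers as_mime_string, so code writing the message out does not need to know which setter was used.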

> But if you still prefer that there should be only one function which 
> accept both objects and strings, lets define its name, how should it act 
> on different types of strings + header names. And also how user of 
> Email::MIME can receive for arbitrary header Unicode string value...

I believe I'm happy with my suggestion above that both header and header_str
can work with objects, with the difference being the behavior on plain old
strings.

I realize I have expanded it in the course of this email.  Do you think it is
unworkable in some way?

-- 
rjbs
