On 2021-08-07 4:46 a.m., Dave Mitchell wrote:
> On Fri, Aug 06, 2021 at 11:51:01PM -0700, Darren Duncan wrote:
>> Perhaps a reasonable design would be that if a file contains a UTF-8
>> declaration anywhere in it, the entire file is treated as such, both the
>> part above and the part below that declaration. And if multiple conflicting
>> encoding declarations exist for the current file, that is an error.
>
> How could that possibly work? If the 'use utf8' appears halfway through
> the source file, does that retrospectively invalidate everything parsed so
> far?

It would, if the parser had been treating everything parsed so far as something other than UTF-8. As said in my post, the parser would restart at the beginning of the file and treat it as UTF-8.

But the restart would only be needed if the parser had seen any high-bit bytes so far. If it keeps track of that and has seen none, it knows it has only seen ASCII, which is also valid UTF-8, and it can skip the restart.

-- Darren Duncan
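
To make the restart rule above concrete, here is a rough sketch in Python rather than in the Perl parser's own C. Everything in it is an assumption made for illustration: the function name decode_source, the simplistic line-based regex for spotting 'use utf8', and Latin-1 as the default interpretation of undeclared bytes. It is not how perl actually tokenizes source, only a model of the "track high bits, restart only if needed" idea.

```python
import re
from typing import Tuple

# Hypothetical, simplified detector: a line starting with "use utf8;".
# Real Perl parsing of pragmas is far more involved than a regex.
UTF8_DECL = re.compile(rb"^\s*use\s+utf8\s*;")


def decode_source(raw: bytes) -> Tuple[str, bool]:
    """Return (decoded_text, restart_needed) for a source file.

    restart_needed is True only when a UTF-8 declaration is found
    after at least one byte with the high bit set was already seen.
    """
    seen_high_bit = False
    for line in raw.splitlines(keepends=True):
        if UTF8_DECL.match(line):
            # The declaration applies to the whole file. If only ASCII
            # was seen so far, the earlier work is already valid UTF-8
            # and a real parser could keep it; otherwise it must reparse
            # from the top. Decoding all of `raw` models both outcomes.
            return raw.decode("utf-8"), seen_high_bit
        if any(byte >= 0x80 for byte in line):
            seen_high_bit = True
    # No declaration anywhere: assume a single-byte encoding (assumption).
    return raw.decode("latin-1"), False


if __name__ == "__main__":
    early = "my $s = 'é';\nuse utf8;\n".encode("utf-8")
    late = "my $x = 1;\nuse utf8;\nmy $s = 'é';\n".encode("utf-8")
    print(decode_source(early)[1])  # high-bit bytes precede the declaration
    print(decode_source(late)[1])   # only ASCII precedes the declaration
```

The only state the first pass has to carry is the single flag seen_high_bit, which is what keeps the common case (ASCII before the declaration) free of any reparse. Detecting multiple conflicting declarations is left out of the sketch.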