I'm almost ready to submit my proposed changes to the uc(), lcfirst(), etc. functions for code review, but I have several more questions. These functions all live in pp.c.

First, within a "use bytes" scope these functions currently treat the data as strict ASCII and change the case accordingly. Someone suggested earlier that this is a bug: that this mode is really for binary data only, and that the case should not change at all in it. What should I do?

Second, there are a couple of cases where a string has to be converted to UTF-8. bytes_to_utf8() assumes the worst case, namely that the new string will occupy 2n+1 bytes, and allocates a new scalar of that size. The code in these functions, by contrast, checks on every pass through the character-processing loop whether more space is needed, and if so grows the scalar by just that amount. (The latter happens only for Unicode case changes, where the worst case may be more than 2n.) Which precedent would it be preferable for me to follow when the worst case is 2n?

Third, ucfirst() and lcfirst() are implemented as a single function that branches at the crucial moment to do the upper- or lower-casing and then comes back together. Comments in the code ask whether the same should be done for lc() and uc(). There are now several differences between those two, but the vast majority of the routines is identical. Should I do the combining or leave it alone?

Finally, it would be trivial to change ucfirst() and lcfirst() so that, if handed a UTF-8 string whose first character (the only one being operated on) is in the strict ASCII range, they look up its case change in a compiled-in table instead of going out to the filesystem as they must for the general case. The extra expense when the character isn't ASCII is one extra comparison, but when it is, the savings are considerable. Shall I make this change?
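For what it's worth, here is a minimal standalone sketch of the fast path I have in mind. The names (ascii_toupper, first_char_to_upper) are made up for illustration, not the actual pp.c identifiers, and the general Unicode fallback is elided:

```c
#include <assert.h>
#include <string.h>

/* Illustrative compiled-in case table for the strict ASCII range,
 * consulted before falling back to the general Unicode lookup. */
static unsigned char ascii_toupper[128];

static void init_table(void)
{
    for (int i = 0; i < 128; i++)
        ascii_toupper[i] = (i >= 'a' && i <= 'z')
                               ? (unsigned char)(i - ('a' - 'A'))
                               : (unsigned char)i;
}

/* Upper-case just the first character of a UTF-8 string in place. */
static void first_char_to_upper(unsigned char *s)
{
    if (s[0] < 0x80) {   /* the one extra comparison on the slow path */
        s[0] = ascii_toupper[s[0]];
        return;
    }
    /* ... otherwise fall through to the general (filesystem-backed)
     * case lookup, elided here ... */
}
```

Since any byte below 0x80 in well-formed UTF-8 is a complete character, the single range check is the only cost added to the general case.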
An extension would be to do this for characters in the 128-255 range as well, but that would require more extensive code changes and extra tests, so I don't think it is worth doing.
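To make the allocation question above concrete, here is a standalone sketch of the two growth strategies for the case where the worst-case expansion is exactly 2n (one byte of Latin-1 becoming at most two bytes of UTF-8). The function names are invented for illustration and deliberately simplified relative to the real bytes_to_utf8():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Strategy 1 (bytes_to_utf8() precedent): allocate the 2n+1 worst
 * case up front, so the loop never has to grow the buffer. */
static unsigned char *to_utf8_prealloc(const unsigned char *src, size_t n,
                                       size_t *outlen)
{
    unsigned char *dst = malloc(2 * n + 1);
    size_t d = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0x80) {
            dst[d++] = src[i];
        } else {                       /* 0x80-0xFF -> two UTF-8 bytes */
            dst[d++] = 0xC0 | (src[i] >> 6);
            dst[d++] = 0x80 | (src[i] & 0x3F);
        }
    }
    dst[d] = '\0';
    *outlen = d;
    return dst;
}

/* Strategy 2 (the in-loop precedent): start at n+1 and grow only when
 * the next character actually needs more room. */
static unsigned char *to_utf8_grow(const unsigned char *src, size_t n,
                                   size_t *outlen)
{
    size_t cap = n + 1, d = 0;
    unsigned char *dst = malloc(cap);
    for (size_t i = 0; i < n; i++) {
        size_t need = (src[i] < 0x80) ? 1 : 2;
        if (d + need + 1 > cap) {      /* grow by just what is needed */
            cap = d + need + 1;
            dst = realloc(dst, cap);
        }
        if (need == 1) {
            dst[d++] = src[i];
        } else {
            dst[d++] = 0xC0 | (src[i] >> 6);
            dst[d++] = 0x80 | (src[i] & 0x3F);
        }
    }
    dst[d] = '\0';
    *outlen = d;
    return dst;
}
```

The trade-off: strategy 1 may over-allocate by nearly n bytes for mostly-ASCII input, while strategy 2's grow-by-exactly-what's-needed can, in the worst case, trigger a realloc per high-bit character.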