2008-05-24
by earl, 4192 days ago
Kragen Sitaker and Aristotle Pagaltzis on strlen performance with UTF-8: "GCC is better at writing x86 assembly than I (Kragen) am. Aristotle is better at writing x86 assembly than GCC is. [The] penalty for counting [..] or iterating over the characters of a UTF-8 string [..] is very small."

I think the conclusion that "indexing into" a UTF-8 string also carries only a small penalty is the result of an editorial mistake in Kragen's write-up, considering that in the introduction he quotes Aristotle with "All you lose with a variable-width encoding is direct random access to arbitrary indices in the string."
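
To make the distinction concrete, here is a minimal sketch in C. It is not the code from Kragen's write-up or Aristotle's assembly; the function names utf8_count and utf8_at are made up for illustration, and both assume the input is valid, NUL-terminated UTF-8. Counting is a single pass that skips continuation bytes (those of the form 10xxxxxx), but so is finding the i-th character, which is exactly the random access you give up.

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a valid, NUL-terminated
 * UTF-8 string. Every byte that is not a continuation byte (10xxxxxx)
 * starts a new code point. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: return a pointer to the first byte of code point
 * number `index` (0-based), or NULL if the string has fewer code points.
 * Unlike s[index] in a fixed-width encoding, this has to scan from the
 * start, so it costs time proportional to the byte offset of the result. */
const char *utf8_at(const char *s, size_t index)
{
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            if (index == 0)
                return s;
            index--;
        }
    }
    return NULL;
}
```

Iterating over all characters in order stays cheap because each step only moves forward a few bytes; it is only jumping straight to an arbitrary index that turns into a linear scan.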

Otherwise, a very fine write-up. I just love this style of exposition, with a clearly-defined hypothesis, working code and proper performance measurements. Just like early CACM, and I think that's the way a lot of computer science (still) should be presented.