Unicode character matching in LaTeX macros

TeX - LaTeX Asked by Nobody-Knows-I-am-a-Dog on December 2, 2020

I have a problem getting a macro working with UTF-8 characters. I was able to boil down my problem to the following non-working minimal example:

\documentclass{beamer}
\def\ta#1#2{BA}
\begin{document}
\begin{frame}

\ta1 ä
\end{frame}
\end{document}

This produces an Invalid UTF-8 byte "A4 error. My file is UTF-8 encoded.

Looking at the UTF-8 table, it is obvious where the A4 comes from: it is the second byte of the umlaut ä, which is encoded as the two bytes C3 A4.
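
To spell out the split, here is a sketch of how pdfTeX presumably sees the call. The <"C3> and <"A4> notation in the comments is only shorthand for a single input byte, not TeX syntax.

```latex
% ä is U+00E4; its UTF-8 encoding is the two bytes "C3 "A4.
% pdfTeX reads the file byte by byte, so the call
%   \ta1 ä
% arrives as the tokens  \ta  1  <space>  <"C3>  <"A4>
\def\ta#1#2{BA}
%   #1 <- 1
%   #2 <- <"C3>   (a space before an undelimited argument is skipped)
% The continuation byte <"A4> is left behind without its lead byte,
% which is what triggers: Invalid UTF-8 byte "A4
```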

However, I have no idea how to fix this.

How would I properly design a macro that is supposed to pick up individual letters without splitting UTF-8 characters into several bytes?

Update: I learned a lot here, but I am still stuck on my actual use case, which is shown in this example:

\documentclass{article}
\usepackage{tabto}

\makeatletter
\def\tb#1#2{\def\ca{#1}\expandafter\tba#2}

\def\tba#1{%
\ifx#1+\tabto{\dimexpr\ca cm + 0.5cm}%
\else\ifx#1-\tabto{\dimexpr\ca cm - 0.5cm}%
\else\tabto{\dimexpr\ca cm}%
\ifx\UTFviii@two@octets#1\expandafter\tbaa\fi%
\ifx\UTFviii@three@octets#1\expandafter\tbab\fi%
\ifx\UTFviii@four@octets#1\expandafter\tbac\fi%
\fi\fi%
\relax#1%
}

\def\tbaa#1#2{#1#2}
\def\tbab#1#2#3{#1#2#3}
\def\tbac#1#2#3#4{#1#2#3#4}

\begin{document}

VOR\tb3- Ü

VOR \tb3 Ü

VOR \tb3+ Ü
\end{document}

I cannot get rid of the printed + and - in the + and - cases. Whatever I try produces yet another UTF-8 encoding error.

One Answer

You need to brace the arguments to keep each character together (your \ta macro works unchanged with braced arguments, although I add a \typeout here for debugging). Alternatively, you need to inspect the first byte and then collect as many bytes as that character's UTF-8 encoding contains (\tb here).

The log output is as follows (the second test character has a four-byte UTF-8 encoding):

1,ä
macro:->1,macro:->\UTFviii@two@octets ä
2,?
macro:->2,macro:->\UTFviii@four@octets ?

It comes from this document:

\documentclass{beamer}
\def\ta#1#2{%
\typeout{\detokenize{#1},\detokenize{#2}}%
BA}

\makeatletter
\def\tb#1#2{%
\def\ca{#1}\expandafter\tba#2}
\def\tba#1{%
\ifx\UTFviii@two@octets#1\expandafter\tbaa\fi
\ifx\UTFviii@three@octets#1\expandafter\tbab\fi
\ifx\UTFviii@four@octets#1\expandafter\tbac\fi
\relax#1}
\def\tbaa#1\relax#2#3#4{%
\def\cb{#2#3#4}%
\typeout{\meaning\ca,\meaning\cb}%
BA}
\def\tbab#1\relax#2#3#4#5{%
\def\cb{#2#3#4#5}%
\typeout{\meaning\ca,\meaning\cb}%
BA}
\def\tbac#1\relax#2#3#4#5#6{%
\def\cb{#2#3#4#5#6}%
\typeout{\meaning\ca,\meaning\cb}%
BA}

\begin{document}
\begin{frame}

\ta{1}{ä}

\tb 1 ä

\ta{2}{?}

\tb 2 ?


\end{frame}
\end{document}
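
For the + and - variants in the question's update, one possible direction (a sketch, not part of the code above) is to avoid grabbing the letter at all. The macro takes the number, peeks at the next token with the kernel's \@ifnextchar, consumes an optional sign, and leaves the following UTF-8 character in the input stream, so its bytes are never separated. The helper name \tb@shift is made up for this illustration, and the sketch assumes \tabto accepts a \dimexpr length, as the question's code already does.

```latex
\documentclass{article}
\usepackage{tabto}

\makeatletter
% \tb<number> optionally followed by + or -.
% Only the number and the sign are read as arguments; the letter
% that follows is typeset normally and is never split into bytes.
\def\tb#1{%
  \@ifnextchar+{\tb@shift{#1}}{%
    \@ifnextchar-{\tb@shift{#1}}{\tabto{#1cm}}}}
% Hypothetical helper: #1 = position in cm, #2 = the sign (+ or -).
\def\tb@shift#1#2{\tabto{\dimexpr#1cm #2 0.5cm\relax}}
\makeatother

\begin{document}

VOR\tb3- Ü

VOR \tb3 Ü

VOR \tb3+ Ü
\end{document}
```

Note that \@ifnextchar skips spaces while looking ahead, so \tb3 + would also be read as the + variant. A multi-digit position needs braces, as in \tb{12}+.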

Answered by David Carlisle on December 2, 2020
