
What are the benefits of writing vector inner products as $\langle u, v\rangle$ as opposed to $u^T v$?

MathOverflow Asked on November 3, 2021

In a lot of computational math and operations research, such as algorithm design for optimization problems and the like, authors like to use $$\langle \cdot, \cdot \rangle$$ as opposed to $$(\cdot)^T (\cdot)$$

even when the space is clearly Euclidean and the operation is clearly the dot product. What is the benefit or advantage of doing so? Is it so that the notation generalizes nicely to other spaces?

Update: Thank you for all the great answers! Will take a while to process…

10 Answers

Lots of great answers so far, but I'll add another (hopefully at least good) answer: the notation $v^T u$ makes it somewhat difficult to speak of collections of bilinear pairings depending on a parameter. Typical examples:

  • "Let $\langle \cdot, \cdot \rangle_i$ be a finite set of inner products on a vector space $V$"
  • "Let $\langle \cdot, \cdot \rangle_p$, $p \in M$, be a Riemannian metric on a manifold $M$"
  • "Let $\langle \cdot, \cdot \rangle_t$ be a continuously varying family of inner products on a Hilbert space $H$"

These are all difficult to express using the transpose notation. The closest you can get is to write, for instance, $v^T A_i u$ where $A_i$ is a family of matrices, but particularly when one is speaking of continuously varying families of inner products you run into all sorts of difficult issues with coordinate systems, and it becomes very difficult to keep things straight.
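For what it's worth, the matrix workaround described here can be sketched numerically. Below is a minimal illustration of a parameter-indexed family $v^T A_i u$ in fixed coordinates; the names `A`, `inner`, `u`, `v` are all mine, not standard:

```python
import numpy as np

# A hypothetical finite family of inner products <.,.>_i on R^2, each
# represented (in fixed coordinates) by a symmetric positive definite matrix A_i.
A = [np.eye(2),
     np.array([[2.0, 1.0],
               [1.0, 2.0]])]

def inner(i, u, v):
    """<u, v>_i = u^T A_i v."""
    return u @ A[i] @ v

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(inner(0, u, v))  # 0.0 -- orthogonal in the standard inner product
print(inner(1, u, v))  # 1.0 -- but not in <.,.>_1
```

The coordinate-dependence the answer warns about is visible here: the matrices $A_i$ only represent the family relative to one fixed basis.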

Answered by Paul Siegel on November 3, 2021

I consider the distinction quite important. There are two separate operations that look superficially like each other but are in fact different.

First, the abstract description. If $V$ is an abstract vector space and $V^*$ is its dual, then there is the natural evaluation operation of $v \in V$ and $\theta \in V^*$, which is commonly written as $$ \langle\theta,v\rangle = \langle v,\theta\rangle $$ No inner product is needed here. If you choose a basis $(e_1, \dots, e_n)$ of $V$ and use the corresponding dual basis $(\eta^1, \dots, \eta^n)$ of $V^*$ and write $v = v^ie_i$ and $\theta = \theta_i\eta^i$, then $$ \langle\theta,v\rangle = \theta_iv^i. $$ The distinction between up and down indices indicates whether the object is a vector or a dual vector ($1$-form).

If $V$ has an inner product and $(e_1, \dots, e_n)$ is an orthonormal basis, then given two vectors $v = v^ie_i, w = w^ie_i \in V$, we have $$ v\cdot w = v^iw^i $$ Notice that here both indices are up. There is a similar formula for the dot product of two dual vectors. Here, the formula only works if the basis is orthonormal.

How does this look in terms of row and column vectors? My personal convention, a common one, is the following:

  1. When writing the components of a matrix as $A^i_j$, I view the superscript as the row index and the subscript as the column index.
  2. I view a vector $v in V$ as a column vector, which is why its coefficients are superscripts (and the basis elements are labeled using subscripts).
  3. This means that a dual vector $theta$ is a row vector, which is why its coefficients are subscripts.
  4. With these conventions $$ \langle \theta,v\rangle = \theta v, $$ where the right side is matrix multiplication. The catch here is that the dual vector has to be the left factor and the vector the right factor. To avoid this inconsistency, I always write either $\langle \theta,v\rangle$ or $\theta_iv^i = v^i\theta_i$. Again, note that these formulas hold for any basis of $V$.
  5. If $V$ has an inner product and $v, w$ are written with respect to an orthonormal basis, then indeed $$ v\cdot w = v^Tw = v^iw^i $$ You can, in fact, lower (or raise) all of the indices and have an implicit sum for any pair of repeated indices. This is, in fact, what Chern would do.

ASIDE: I gotta say that having such precisely defined conventions is crucial to my ability to do nontrivial calculations with vectors and tensors. When I was a graduate student, my PhD advisor, Phillip Griffiths, once asked me, "Have you developed your own notation yet?" I also have to acknowledge that my notation is either exactly or based closely on Robert Bryant's notation.

Answered by Deane Yang on November 3, 2021

Maybe it's worth mentioning that the computer language APL has a "generalized" inner product where you can use any two functions of two arguments (i.e., "dyadic functions" in APL terms) to form an inner product. Thus, for example, ordinary inner product is written as "A+.xB", which can apply to two arrays A, B of any dimension whatsoever (vectors, matrices, three-dimensional arrays, etc.), provided that the last dimension of A matches the first dimension of B.

Thus, for example, A^.=B represents string matching of A against B, Ax.*B evaluates a number given its prime divisors A and prime factorization exponents B, etc.
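For readers without APL at hand, the f.g construction can be imitated in a few lines of Python. The helper `inner` below is my own simplified, vector-only sketch, not APL's full semantics: combine corresponding elements with one function, then fold the results with another.

```python
from functools import reduce
import operator

def inner(f, g, A, B):
    """A 'generalized inner product' in the spirit of APL's f.g operator:
    combine corresponding elements with g, then fold the results with f.
    (Hypothetical helper, restricted to flat sequences for simplicity.)"""
    return reduce(f, (g(a, b) for a, b in zip(A, B)))

# Ordinary inner product, APL's A+.xB:
print(inner(operator.add, operator.mul, [1, 2, 3], [4, 5, 6]))  # 32

# APL's Ax.*B: rebuild a number from prime divisors A and exponents B.
print(inner(operator.mul, operator.pow, [2, 3, 5], [3, 1, 2]))  # 2^3 * 3 * 5^2 = 600

# APL's A^.=B: "do all corresponding elements match?"
print(inner(operator.and_, operator.eq, "cat", "cat"))  # True
```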

The authors of APL, Iverson and Falkoff, cared intensely about notation and tried to find the most general interpretation of every new item they added to the language.

Answered by Jeffrey Shallit on November 3, 2021

Mathematical notation in a given mathematical field $X$ is basically a correspondence $$ \mathrm{Notation}: \{ \hbox{well-formed expressions}\} \to \{ \hbox{abstract objects in } X \}$$ between mathematical expressions (or statements) on the written page (or blackboard, electronic document, etc.) and the mathematical objects (or concepts and ideas) in the heads of ourselves, our collaborators, and our audience. A good notation should make this correspondence $\mathrm{Notation}$ (and its inverse) as close to a (natural) isomorphism as possible. Thus, for instance, the following properties are desirable (though not mandatory):

  1. (Unambiguity) Every well-formed expression in the notation should have a unique mathematical interpretation in $X$. (Related to this, one should strive to minimize the possible confusion between an interpretation of an expression using the given notation $\mathrm{Notation}$, and the interpretation using a popular competing notation $\widetilde{\mathrm{Notation}}$.)
  2. (Expressiveness) Conversely, every mathematical concept or object in $X$ should be describable in at least one way using the notation.
  3. (Preservation of quality, I) Every "natural" concept in $X$ should be easily expressible using the notation.
  4. (Preservation of quality, II) Every "unnatural" concept in $X$ should be difficult to express using the notation. [In particular, it is possible for a notational system to be too expressive to be suitable for a given application domain.] Contrapositively, expressions that look clean and natural in the notation system ought to correspond to natural objects or concepts in $X$.
  5. (Error correction/detection) Typos in a well-formed expression should create an expression that is easily corrected (or at least detected) to recover the original intended meaning (or a small perturbation thereof).
  6. (Suggestiveness, I) Concepts that are "similar" in $X$ should have similar expressions in the notation, and conversely.
  7. (Suggestiveness, II) The calculus of formal manipulation in $\mathrm{Notation}$ should resemble the calculus of formal manipulation in other notational systems $\widetilde{\mathrm{Notation}}$ that mathematicians in $X$ are already familiar with.
  8. (Transformation) "Natural" transformation of mathematical concepts in $X$ (e.g., change of coordinates, or associativity of multiplication) should correspond to "natural" manipulation of their symbolic counterparts in the notation; similarly, application of standard results in $X$ should correspond to a clean and powerful calculus in the notational system. [In particularly good notation, the converse is also true: formal manipulation in the notation in a "natural" fashion can lead to discovering new ways to "naturally" transform the mathematical objects themselves.]
  9. etc.

To evaluate these sorts of qualities, one has to look at the entire field $X$ as a whole; the quality of notation cannot be evaluated in a purely pointwise fashion by inspecting the notation $\mathrm{Notation}^{-1}(C)$ used for a single mathematical concept $C$ in $X$. In particular, it is perfectly permissible to have many different notations $\mathrm{Notation}_1^{-1}(C), \mathrm{Notation}_2^{-1}(C), \dots$ for a single concept $C$, each designed for use in a different field $X_1, X_2, \dots$ of mathematics. (In some cases, such as with the metrics of quality in desiderata 1 and 7, it is not even enough to look at the entire notational system $\mathrm{Notation}$; one must also consider its relationship with the other notational systems $\widetilde{\mathrm{Notation}}$ that are currently in popular use in the mathematical community, in order to assess the suitability of use of that notational system.)

Returning to the specific example of expressing the concept $C$ of a scalar quantity $c$ being equal to the inner product of two vectors $u, v$ in a standard vector space ${\bf R}^n$, there are not just two notations commonly used to capture $C$, but in fact over a dozen (including several mentioned in other answers):

  1. Pedestrian notation: $c = \sum_{i=1}^n u_i v_i$ (or $c = u_1 v_1 + \dots + u_n v_n$).
  2. Euclidean notation: $c = u \cdot v$ (or $c = \vec{u} \cdot \vec{v}$ or $c = \mathbf{u} \cdot \mathbf{v}$).
  3. Hilbert space notation: $c = \langle u, v \rangle$ (or $c = (u,v)$).
  4. Riemannian geometry notation: $c = \eta(u,v)$, where $\eta$ is the Euclidean metric form (also $c = u \neg (\eta \cdot v)$ or $c = \iota_u (\eta \cdot v)$; one can also use $\eta(-,v)$ in place of $\eta \cdot v$. Alternative names for the Euclidean metric include $\delta$ and $g$).
  5. Musical notation: $c = u_\flat(v)$ (or $c = u^\flat(v)$).
  6. Matrix notation: $c = u^T v$ (or $c = \mathrm{tr}(vu^T)$ or $c = u^* v$ or $c = u^\dagger v$).
  7. Bra-ket notation: $c = \langle u| v\rangle$.
  8. Einstein notation, I (without matching superscript/subscript requirement): $c = u_i v_i$ (or $c=u^iv^i$, if vector components are denoted using superscripts).
  9. Einstein notation, II (with matching superscript/subscript requirement): $c = \eta_{ij} u^i v^j$.
  10. Einstein notation, III (with matching superscript/subscript requirement and also implicit raising and lowering operators): $c = u^i v_i$ (or $c = u_i v^i$ or $c = \eta_{ij} u^i v^j$).
  11. Penrose abstract index notation: $c = u^\alpha v_\alpha$ (or $c = u_\alpha v^\alpha$ or $c = \eta_{\alpha \beta} u^\alpha v^\beta$). [In the absence of derivatives this is nearly identical to Einstein notation III, but distinctions between the two notational systems become more apparent in the presence of covariant derivatives ($\nabla_\alpha$ in Penrose notation, or a combination of $\partial_i$ and Christoffel symbols in Einstein notation).]
  12. Hodge notation: $c = \mathrm{det}(u \wedge *v)$ (or $u \wedge *v = c \omega$, with $\omega$ the volume form). [Here we are implicitly interpreting $u,v$ as covectors rather than vectors.]
  13. Geometric algebra notation: $c = \frac{1}{2} \{u,v\}$, where $\{u,v\} := uv+vu$ is the anticommutator.
  14. Clifford algebra notation: $uv + vu = 2c1$.
  15. Measure theory notation: $c = \int_{\{1,\dots,n\}} u(i) v(i)\, d\#(i)$, where $d\#$ denotes counting measure.
  16. Probabilistic notation: $c = n {\mathbb E} u_{\bf i} v_{\bf i}$, where ${\bf i}$ is drawn uniformly at random from $\{1,\dots,n\}$.
  17. Trigonometric notation: $c = |u| |v| \cos \angle(u,v)$.
  18. Graphical notations such as Penrose graphical notation, which would use something like $\displaystyle c =\bigcap_{u v}$ to capture this relation.
  19. etc.
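Several of these notations can be checked against one another numerically. A quick sketch (variable names are mine, not standard) comparing the pedestrian, matrix, trace, and probabilistic forms on a small example:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])
n = len(u)

pedestrian = sum(u[i] * v[i] for i in range(n))   # sum_i u_i v_i
matrix = u.T @ v                                  # u^T v (for 1-D arrays, @ gives the scalar)
trace = np.trace(np.outer(v, u))                  # tr(v u^T)
prob = n * np.mean(u * v)                         # n * E[u_i v_i], i uniform on {1,...,n}

print(pedestrian, matrix, trace, prob)  # all equal 11.0
```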

It is not a coincidence that there is a lot of overlap and similarity between all these notational systems; again, see desiderata 1 and 7.

Each of these notations is tailored to a different mathematical domain of application. For instance:

  • Matrix notation would be suitable for situations in which many other matrix operations and expressions are in use (e.g., the rank one operators $vu^T$).
  • Riemannian or abstract index notation would be suitable in situations in which linear or nonlinear changes of variable are frequently made.
  • Hilbert space notation would be suitable if one intends to eventually generalize one's calculations to other Hilbert spaces, including infinite dimensional ones.
  • Euclidean notation would be suitable in contexts in which other Euclidean operations (e.g., cross product) are also in frequent use.
  • Einstein and Penrose abstract index notations are suitable in contexts in which higher rank tensors are heavily involved. Einstein I is more suited for Euclidean applications or other situations in which one does not need to make heavy use of covariant operations, otherwise Einstein III or Penrose is preferable (and the latter particularly desirable if covariant derivatives are involved). Einstein II is suitable for situations in which one wishes to make the dependence on the metric explicit.
  • Clifford algebra notation is suitable when working over fields of arbitrary characteristic, in particular if one wishes to allow characteristic 2.

And so on and so forth. There is no unique "best" choice of notation to use for this concept; it depends on the intended context and application domain. For instance, matrix notation would be unsuitable if one does not want the reader to accidentally confuse the scalar product $u^T v$ with the rank one operator $vu^T$, Hilbert space notation would be unsuitable if one frequently wished to perform coordinatewise operations (e.g., Hadamard product) on the vectors and matrices/linear transformations used in the analysis, and so forth.

(See also Section 2 of Thurston's "Proof and progress in mathematics", in which the notion of derivative is deconstructed in a fashion somewhat similar to the way the notion of inner product is here.)

ADDED LATER: One should also distinguish between the "one-time costs" of a notation (e.g., the difficulty of learning the notation and avoiding standard pitfalls with that notation, or the amount of mathematical argument needed to verify that the notation is well-defined and compatible with other existing notations), with the "recurring costs" that are incurred with each use of the notation. The desiderata listed above are primarily concerned with lowering the "recurring costs", but the "one-time costs" are also a significant consideration if one is only using the mathematics from the given field $X$ on a casual basis rather than a full-time one. In particular, it can make sense to offer "simplified" notational systems to casual users of, say, linear algebra even if there are more "natural" notational systems (scoring more highly on the desiderata listed above) that become more desirable to switch to if one intends to use linear algebra heavily on a regular basis.

Answered by Terry Tao on November 3, 2021

One advantage of $\langle \cdot, \cdot \rangle$ is that you don't have to worry about changes in basis.

Suppose we have a coordinate system $\alpha$ in which our (real) inner product space is explicitly Euclidean, and an alternative coordinate system $\beta$. A vector $v$ is expressed in the two coordinate systems as, respectively, the column vectors $[v]_\alpha$ and $[v]_\beta$. Let $P$ denote the change of basis matrix

$$ [v]_\beta = P [v]_\alpha $$

The inner product, which in coordinate system $\alpha$ is $\langle v, v\rangle = [v]_{\alpha}^T [v]_{\alpha}$, is certainly not in general $[v]_\beta^T[v]_\beta$ in the second coordinate system. (It is only so if $P$ is orthogonal.)
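This basis-dependence is easy to demonstrate numerically; a small sketch, with the matrices $P$ and $Q$ chosen arbitrarily for illustration:

```python
import numpy as np

v_alpha = np.array([1.0, 2.0])

# A non-orthogonal change of basis: [v]_beta = P [v]_alpha
P = np.array([[2.0, 1.0],
              [0.0, 1.0]])
v_beta = P @ v_alpha

print(v_alpha @ v_alpha)  # 5.0
print(v_beta @ v_beta)    # 20.0 -- the naive formula v^T v is basis-dependent

# An orthogonal change of basis does preserve it:
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
w = Q @ v_alpha
print(np.isclose(w @ w, v_alpha @ v_alpha))  # True
```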


That said: given any Hilbert space $V$, by the Riesz representation theorem there exists an (anti-)isomorphism from $V$ to its dual space $V^*$. You can certainly choose to call this mapping $v \mapsto v^*$ (in Riemannian geometry contexts this is more usually denoted using the musical isomorphism notation $\flat$ and $\sharp$), and I don't think in this case there are reasons to prefer one to the other. But a major caveat if you do things this way is that unless you are working in an orthonormal basis, you cannot associate $v \mapsto v^*$ with the "conjugate transpose" operation on matrices.

Answered by Willie Wong on November 3, 2021

The family $F$ of (real) quadratic polynomials is a vector space isomorphic to the vector space $\mathbb{R}^3.$ One way to make $F$ an inner product space is to define $\langle f, g \rangle =\int_a^b f(t)g(t)\,dt$ for some fixed interval $[a,b].$ Instead of quadratic polynomials one might consider all polynomials or all bounded integrable functions. One could also define the inner product as $\langle f, g \rangle =\int_a^b f(t)g(t)\mu(t)\,dt$ for some weight function $\mu.$ There isn’t a natural role for transposes here.
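A quick numerical sketch of this inner product (the helper `inner` is my name, not standard), computing the integral exactly via polynomial antiderivatives:

```python
from numpy.polynomial import Polynomial

def inner(f, g, a=0.0, b=1.0):
    """<f, g> = integral_a^b f(t) g(t) dt, exact for polynomials."""
    F = (f * g).integ()  # antiderivative of the product
    return F(b) - F(a)

f = Polynomial([0, 1])     # f(t) = t
g = Polynomial([1, 0, 1])  # g(t) = 1 + t^2

print(inner(f, g))  # integral_0^1 (t + t^3) dt = 1/2 + 1/4 = 0.75
```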

Answered by Aaron Meyerowitz on November 3, 2021

One huge advantage, to my mind, of the bracket notation is that it admits 'blanks'. So one can specify the notation for an inner product as $\langle\,,\rangle$, and given $\langle\,,\rangle : V \times V \rightarrow K$, one can define elements of the dual space $V^\star$ by $\langle u , - \rangle$ and $\langle -, v \rangle$. (In the complex case one of these is only conjugate linear.)

More subjective I know, but on notational grounds I far prefer to write $\langle Au, v \rangle = \langle u, A^\dagger v \rangle$ for the adjoint map than $(Au)^t v = u^t (A^tv)$. The former also emphasises that the construction is basis independent. It generalises far better to Hilbert spaces and other spaces with a non-degenerate bilinear form (not necessarily an inner product).

I'll also note that physicists, and more recently anyone working in quantum computing, have taken the 'bra-ket' formulation to the extreme, and use it to present quite intricate eigenvector calculations in a succinct way. For example, here is the Hadamard transform in bra-ket notation:

$$ \frac{| 0 \rangle + |1 \rangle}{\sqrt{2}} \langle 0 | + \frac{| 0 \rangle - |1\rangle}{\sqrt{2}} \langle 1 |. $$

To get the general Hadamard transform on $n$ qubits, just take the $n$th tensor power: this is compatible with the various implicit identifications of vectors and elements of the dual space.
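As a sanity check, the bra-ket expression above can be transcribed literally into outer products (real case, so bras are plain transposes; the variable names are mine):

```python
import numpy as np

ket0 = np.array([[1.0], [0.0]])
ket1 = np.array([[0.0], [1.0]])
bra = lambda k: k.T  # real case: <x| is the transpose of |x>

# H = (|0> + |1>)/sqrt(2) <0| + (|0> - |1>)/sqrt(2) <1|
H = (ket0 + ket1) / np.sqrt(2) @ bra(ket0) + (ket0 - ket1) / np.sqrt(2) @ bra(ket1)
print(H)  # the familiar (1/sqrt(2)) [[1, 1], [1, -1]]

# The Hadamard transform on 2 qubits is the tensor (Kronecker) square:
H2 = np.kron(H, H)
print(np.allclose(H2 @ H2, np.eye(4)))  # True: H (x) H is its own inverse
```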

Finally, may I issue a plea for everyone to use $\langle u ,v \rangle$, with the LaTeX \langle and \rangle, rather than the barbaric $<u,v>$.

Answered by Mark Wildon on November 3, 2021

Inner product is defined axiomatically, as a function from $V\times V\to k$, where $k$ is a field and $V$ is a $k$-vector space, satisfying the three well-known axioms. The usual notation is $(x,y)$. So when you want to say anything about an arbitrary inner product, you use this notation (or some similar one). $(x,y)=x^*y$ is just one example of an inner product on the space $\mathbb{C}^n$. There are other examples on the same space, $(x,y)=x^*Ay$ where $A$ is an arbitrary Hermitian positive definite matrix, and there are dot products on other vector spaces.

Answered by Alexandre Eremenko on November 3, 2021

I do not see a compelling argument for $\langle \cdot, \cdot \rangle$ over $(\cdot)^T(\cdot)$, or, better, $(\cdot)^*(\cdot)$, so that the star operator can be generalized to other more complicated settings (complex vectors, Hilbert spaces with a dual operation).

Let me summarize the arguments in the comments:

  • emphasizes vectors as geometric objects: not clear why $u^*v$ is less geometric.
  • free space for a superscript: I agree, that is an argument in favor of $\langle \cdot, \cdot \rangle$. In a setting where I need many superscripts, I would probably favor that notation.
  • emphasizes bilinearity: disagree. In the complex case, it makes a lot less clear why one of these two arguments is not like the other and implies a conjugation, and it does not make clear which one it is: is $\langle \lambda u,v \rangle$ equal to $\lambda\langle u,v \rangle$ or to $\overline{\lambda}\langle u,v \rangle$? Is there a way to recall it other than remembering it?
  • Leaves room for an operator and gives a clear interpretation of adjointness: I find $(Au)^*v=u^*A^*v = u^*(A^*v)$ equally clear, and it relies only on manipulations that are well ingrained in the minds of mathematicians.
  • Gives an interpretation for the linear functional $\langle u, \cdot \rangle$: but what is $u^*$ or $u^T$ if not a representation of that same linear functional?

An advantage of the $u^*v$ notation, in my view, is that it makes clear that some properties are just a consequence of associativity. Consider for instance the orthogonal projection onto the space orthogonal to $u$

$$Pv = (I-uu^*)v = v - u(u^*v).$$

If one writes it as $v - \langle v,u \rangle u$ (especially by putting the scalar on the left as is customary), it is less clear that it is equivalent to applying the linear operator $I-uu^*$ to the vector $v$. Also, the notation generalizes nicely to repeated projections $$ (I-u_1u_1^* - u_2u_2^*)v = (I - \begin{bmatrix}u_1 & u_2\end{bmatrix}\begin{bmatrix}u_1^* \\ u_2^*\end{bmatrix})v = (I - UU^*)v. $$
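Numerically the two readings agree by associativity, of course; a short sketch (variable names mine) with a unit vector $u$, including the block-matrix form for repeated projections:

```python
import numpy as np

u = np.array([[1.0], [2.0], [2.0]]) / 3.0  # unit column vector
v = np.array([[5.0], [1.0], [0.0]])

# Two readings of the same formula, equal by associativity:
Pv1 = v - u @ (u.T @ v)           # v - u (u^* v)
Pv2 = (np.eye(3) - u @ u.T) @ v   # (I - u u^*) v
print(np.allclose(Pv1, Pv2))      # True
print((u.T @ Pv1).item())         # ~0: the result is orthogonal to u

# Repeated projections via a single block matrix U = [u1 u2]:
u1 = np.array([[1.0], [0.0], [0.0]])
u2 = np.array([[0.0], [1.0], [0.0]])
U = np.hstack([u1, u2])
print(np.allclose((np.eye(3) - U @ U.T) @ v,
                  v - u1 @ (u1.T @ v) - u2 @ (u2.T @ v)))  # True
```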

A disadvantage, of course, is working with spaces of matrices, where transposes already have another meaning; for instance, working with the trace scalar product $\langle A,B \rangle := \operatorname{Tr}(A^TB)$ one really needs the $\langle A,B \rangle$ notation.

Answered by Federico Poloni on November 3, 2021

This is to expand on my comment in response to Federico Poloni:

$\langle u,v\rangle$ is explicitly a number, whereas $u^Tv$ is a 1 by 1 matrix :).

While it is true that there is a canonical isomorphism between the two, how do you write the expansion of $u$ in an orthonormal basis $\{v_i\}$? Something like $$ u=\sum_i u^Tv_i v_i $$ feels uncomfortable as, if you view everything as matrices, the dimensions do not allow for multiplication. So, I would at least feel a need to insert parentheses, $$ u=\sum_i (u^Tv_i) v_i, $$ to indicate that the canonical isomorphism is applied. But that is still vague-ish while already cancelling any typographical advantages of $u^Tv$.
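A tiny numerical sketch of the expansion, with the parenthesised scalars making the dimensions work (names mine):

```python
import numpy as np

# An orthonormal basis of R^2 and a vector to expand:
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)
u = np.array([3.0, 1.0])

# u = sum_i (u^T v_i) v_i -- each (u @ v_i) is a scalar coefficient:
u_rebuilt = (u @ v1) * v1 + (u @ v2) * v2
print(np.allclose(u_rebuilt, u))  # True
```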

(I do also share the sentiment that the basis-dependent language is inferior and should be avoided when possible.)

Answered by Kostya_I on November 3, 2021
