Simple explanation of sliding-window and wNAF methods of elliptic curve point multiplication

Question

I'm trying to understand the implementation of elliptic curve point multiplication.  I can easily understand the naive double-and-add algorithm, but I'm struggling to find a good explanation / example of the window method, or indeed of the wNAF method.
The canonical reference seems to be this Wikipedia page, in which is outlined all the various algorithms on a high level.
Does anyone know of any accessible explanations / examples of either the window method / wNAF method for point multiplication?  Alternatively, can anyone easily explain how these algorithms work?

To give you an example of how I'm confused by the Wikipedia article, the windowed method is listed as:
Q ← 0
  for i from m to 0 do
      Q ← point_double_repeat(Q, w)
      if di > 0 then
          Q ← point_add(Q, diP) # using pre-computed value of diP
  return Q

But it references the "point_double_repeat(Q, w)". method call.  But what is this method? What does this actually do?
With regards to the "w-ary non-adjacent form (wNAF) method", the algorithm seems simple to follow, but it's hard to be confident that I understand it properly without a simple example.
I understand that this is probably quite rudimentary to many people on this forum, but I would nonetheless greatly appreciate any help.

Update 10 Aug:
Reading Sam's answer below makes sense, and I was able to implement the wNAF method, but I'm still having trouble understanding how to implement the window method.
This is how I've tried to understand it:
Lets start with an example of a normal double-and-add for the scalar 2329.
2329 in binary is 100100011001, so going from right to left:
1   0   0   1   0   0   0   1   1   0   0   1
DBL DBL DBL DBL DBL DBL DBL DBL DBL DBL DBL START
ADD         ADD             ADD ADD         ADD
2048        256             16  8           1

As you can see there are 12 doubles and five additions.  Now I'm going to try to convert this to a window method so that there are less additions. Lets use a window size of 4:
1   0   0   1       0   0   0   1       1   0   0   1
DBL DBL DBL DBL     DBL DBL DBL DBL     DBL DBL DBL START
2048   +    256                 16      8     +     1
---------------     ---------------     ---------------
      2304       +        16         +        9

This means only three additions, a reduction of 40%, excellent.
Now for the algorithm itself, as per the description in Wikipedia:

One selects a window size $w$ and computes all $2^w$ values of $dP$ for $d = 0,1,2,...2^w - 1$. The algorithm now uses the representation $d = d_0 + 2^wd_1 + 2^{2w}d_2 + ... + 2^{mw}d_m$

In the example of using the scalar 2329 with a window size of 4, this means pre-computing a collection of 16 points, ($0 ... 2^4 - 1$ or $0 ... 15$).
For the scalar value, the algorithm then uses the representation $(9 cdot 2^0) + (1 cdot 2^4) + (9 cdot 2^8)$. Simplifying: $(9 + 16 + 2304)$.  So in the algorithm below, this means that $d1 = 9, d2 = 16, d3 = 2304$, and "m" is 3.
This is the window method of the algorithm as per Wikipedia:
Q ← 0
  for i from m to 0 do
      Q ← point_double_repeat(Q, w)
      if di > 0 then
          Q ← point_add(Q, diP) # using pre-computed value of diP
  return Q

In this case, when it comes to adding the pre-computed point $d_iP$ There are three iterations, in which $d_i$ is either $9P, 16P, or 2304P$.  But our pre-computes are $0,1,2,...2^w-1$.  In order words, there is no "$9P$" in our pre-computes.  There is only $P,2P,4P,8P,16P,32P,64P....$
Obviously I have misunderstood either the algorithm itself or how I am supposed to compute the pre-computes that algorithm uses, either way I'm at a bit of a loss.

Sam Jaques · Answer

Suppose for simplicity we have an 4-bit key $k= (k_0,k_1,k_2,k_3)$, where each $k_i$ is a bit. To compute $kP$, we find
$$k_0P + k_1(2P)+k_2(4P)+k_3(8P)$$
But we could instead write $k=(k_{01},k_{23})$, where we separate it into 2-bit windows. Then we can write
$$ kP = (k_0+2k_1)P + (k_2+2k_3)(4P) = k_{01}P + k_{23}(4P)$$
Then one way to actually compute this is to compute $k_{23}P$, then double it twice to get $4(k_{23}P)=k_{23}(4P)$. Then we add $k_{01}P$ to this.
Extending the pattern, this is the same as double-and-add, except (1) we're doubling multiple times between each add; (2) in the add step, we don't just add $P$ or $0$, but a value in ${0,P,2P,3P}$.
For a window size of $w$, we split the key into $(k_0,dots, k_{n/w})$, where each $k_i$ has $w$ bits. We start with $Q=0$, then repeat for $i=0$ to $i=n/w$:

$Q leftarrow 2^w Q$
$Qleftarrow Q + 2^{k_i}P$

In regular double-and-add for an $n$-bit key, we need $n$ doublings and $n$ additions. With this windowed method, we still need $n$ doublings, but only $n/w$ additions, plus whatever it costs to compute each value of $2^{k_i}P$.
If you just compute $2^{k_i}P$ with regular double-and-add, then there is no point to this windowed method -- it will cost more! But if you pre-compute a table of all the values of ${0,P,dots, (2^w - 1)P}$, then you can look up the value in the table, which (depending on your cost model) is probably much cheaper. But the size of this table is exponential in $w$, so you save a factor of $w$ additions at the expense of an exponential amount of precomputation.
I haven't heard of the sliding window before, but it looks like it's exactly the same except you start your pre-computed table at $2^{w-1}P$, and then you don't bother adding a multiple of $P$ if the leading bit $k_i$ is 0 -- you just double $Q$ until the leading bit is 1, which shifts all the windows by the number of doublings.
Wikipedia says "In effect, there is little reason to use the windowed method over this approach, except that the former can be implemented in constant time" which to me seems like saying "there is little reason to use an elevator to get to the bottom floor of a building, instead of jumping out of the window, except that the former can be done without breaking your legs".
Edit: For wNAF, the main idea is that if you precompute ${0,P,dots, (2^w-1)P}$, then you have ${-(2^w-1)P,dots,-P,0,P,dots,(2^w-1)P}$ almost for free, because you can just flip the sign of the $y$-coordinate (in Weierstrass form at least).
If you're careful about how you represent a number, then you should be able  to do something almost identical to windowed multiplication, except you use these negative multiples of $P$ and thus use fewer additions with the same amount of precomputation.
I am fairly sure (but not certain!) that you can represent a number $k$ by $(k_0,0,k_1,0,dots, 0, k_{n/w})$, where each $k_i$ is in ${-2^{w-1},dots,2^{w-1} -1 }$. That is, by using negative values, you can fit a 0 between each word (hence "non-adjacent form").
What the wNAF algorithm on Wikipedia is doing is similar to the sliding window: Rather than double exactly $w$ times between each addition, it checks whether the remaining value is even, and if it does, it does another doubling before the addition. This ensures it only ever adds odd multiples of $P$, which saves half the precomputation cost.
An important paragraph is:

One property of the NAF is that we are guaranteed that every non-zero element ${displaystyle scriptstyle d_{i}}$ is followed by at least ${displaystyle scriptstyle w,-,1}$ additional zeroes. This is because the algorithm clears out the lower $scriptstyle w$ bits of $scriptstyle d $ with every subtraction of the output of the mods function.

EDIT: It looks like the issue is that $0,1,2,dots, 2^w-1$ is ambiguous notation. The pre-computed values are actually $0,1,2,3,4,dots, 2^w-1$. That is, they are just incremented by 1 instead of doubling each time.
In your example, this means the precomputed values are ${P,2P,3P,4P,5P,6P,7P,8P,9P,10P,11P,12P,13P,14P,15P}$ (since $w=4$). Then the $d_i$ would actually be $d_1=9$, $d2=1$, and $d_3=9$ (that is, exclude the powers of 2), and so $d_iP$ is in the table for all $i$.
When you start the algorithm, $m=3$ and $Q=0$, and you add $d_3P=9P$ to $Q$ to get $Q=9P$. $9P$ should be in your table. Then you move to the next iteration of the loop, and double $Q$ for $w$ iterations. Since $w=4$, this means we get
$$(2^4)Q = (9cdot 2^4)P$$
Then $m=2$, and you add $d_2P = 1P$, to get $Q=(9cdot 2^4 + 1)P$. Then in the next iteration of the loop, double $Q$ another four times to get:
$$(2^4)Q = (2^4)(9cdot 2^4 + 1)P=(9cdot 2^8 + 1cdot 2^4)P$$
Finally, $m=1$, and and so we add $d_1P=9P$ to $Q$ to get $(9cdot 2^8 + 1cdot 2^4 + 9)P = 2329P$.

Simple explanation of sliding-window and wNAF methods of elliptic curve point multiplication

One Answer

Add your own answers!

Ask a Question