Hi Sjors
Thanks for the feedback!

> The first step is for the Coordinator to generate a TOKEN, presumably using its own entropy. But IIUC anyone who intercepts that token can decrypt any future step in the setup process. This suggests a chicken-egg problem where you need some pre-existing secure communications channel.

The exchange of the TOKEN is frequently mistaken for a chicken-and-egg problem, but it is not one.

To understand why this isn't chicken-and-egg, and why the TOKEN actually adds value, compare the amount of communication needed to exchange the TOKEN with the amount needed to gather the data for creating the multisig wallet (with or without the TOKEN):

1) The TOKEN itself is a single piece of data, 64 or 96 bits long. It is small enough to be easily exchanged (even memorized) and entered into various devices. It requires only a single round of communication, but can protect as many rounds of communication as needed.

2) The data needed to create the multisig wallet, on the other hand, is considerably more involved:
(a) Each Signer needs to share its XPUB, which cannot be memorized.
(b) The XPUBs also come with their own metadata.
(c) The creation of the wallet requires at least two rounds of communication: the Signers need to voluntarily share their XPUBs first, and only then can a Coordinator combine the XPUBs into a single multisig script and pass the configuration back to the Signers. (Note that without a Coordinator, you'll need O(N^2) rounds of communication.)
(d) Because Signers are typically offline cold storage, the paths between the Signers, and between the Signers and the Coordinator, likely involve multiple hops through various media, such as an insecure USB connection. This is how most multisig solutions are currently implemented. It means the XPUBs and the multisig configuration are vulnerable to leaking and/or modification.

Note that (d) is especially problematic for remote multisig setups: the more remote the parties, the more potential hops along the way, and the more opportunities for leaks or tampering.
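
To put rough numbers on point (c), here is a small Python sketch that counts messages under a deliberately simplified model. It assumes that without a Coordinator every Signer sends its XPUB directly to every other Signer, and that with a Coordinator each Signer sends one message up and receives one message back. The function names are mine, for illustration only, and are not from the spec.

```python
def messages_without_coordinator(n: int) -> int:
    """Every Signer sends its XPUB directly to each of the other n - 1 Signers."""
    return n * (n - 1)


def messages_with_coordinator(n: int) -> int:
    """Round 1: each Signer sends its XPUB to the Coordinator (n messages).
    Round 2: the Coordinator returns the final configuration to each Signer (n messages)."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 5, 15):
        print(n, messages_without_coordinator(n), messages_with_coordinator(n))
```

Under this model the two approaches tie at N = 3, and the gap then grows quadratically: 20 vs 10 messages for N = 5, and 210 vs 30 for N = 15.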

So you can see that the TOKEN ultimately reduces the problem of sharing a large amount of sensitive data back and forth, to the sharing of a single, small piece of data upfront. An added advantage of this approach is that if the parties fail to establish a shared TOKEN, the scheme fails with no harm done.
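
As a rough illustration of that reduction, here is a minimal Python sketch of how a small shared TOKEN can be stretched into a symmetric key that protects every later round. The helper names, the placeholder password, and the PBKDF2 parameters below are assumptions chosen for the example; please refer to the spec for the actual derivation.

```python
import hashlib
import secrets


def generate_token(bits: int = 64) -> str:
    """Return a fresh random TOKEN of 64 or 96 bits, hex-encoded."""
    if bits not in (64, 96):
        raise ValueError("TOKEN is expected to be 64 or 96 bits")
    return secrets.token_bytes(bits // 8).hex()


def derive_key(token_hex: str) -> bytes:
    """Stretch the TOKEN into a 256-bit key.

    Assumption: PBKDF2-HMAC-SHA512 with a fixed placeholder password and
    2048 iterations. These values are illustrative, not quoted from the spec.
    """
    return hashlib.pbkdf2_hmac(
        "sha512",
        b"illustrative-password",   # hypothetical constant
        bytes.fromhex(token_hex),   # the TOKEN acts as the salt
        2048,
        dklen=32,
    )


if __name__ == "__main__":
    token = generate_token(64)
    print("TOKEN:", token)  # 16 hex characters, short enough to type by hand
    print("key:  ", derive_key(token).hex())
```

Every party that knows the TOKEN derives the same key, so only the TOKEN has to travel over the trusted path; the XPUBs and the final configuration can then travel encrypted over the untrusted hops.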

The Coordinator, on the other hand, adds value by solving the O(N^2) communication problem. Some minimal amount of trust in the Coordinator is needed, but this can be greatly mitigated in a number of ways that we have defined in the spec, such as:
* Signers must check that their XPUBs are included in the final descriptor
* Signers must display to the user the multisig configuration: M/N, relative position(s) of XPUBs, etc.
* Signers must display the full descriptor upon user request for manual inspection - this one is important because it means that the new scheme cannot be worse than the status quo.
* Signers are recommended to display a preview of the first receive address(es).
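
The first two checks could look roughly like the Python sketch below. It is a simplification that assumes a single `multi(...)` or `sortedmulti(...)` expression and pulls out XPUBs with a regular expression rather than a real descriptor parser; `inspect_descriptor` and the placeholder keys in the usage example are hypothetical names, not part of the spec.

```python
import re

# Matches multi(M,KEY,...) or sortedmulti(M,KEY,...) and captures M and the key list.
_MULTI = re.compile(r"(?:sorted)?multi\((\d+),([^)]*)\)")
# Loose match for an extended public key (xpub, tpub, Zpub, ...); no checksum validation.
_XPUB = re.compile(r"\b[a-zA-Z]pub[1-9A-HJ-NP-Za-km-z]+")


def inspect_descriptor(descriptor: str, own_xpub: str) -> dict:
    """Check that own_xpub is in the descriptor and return what to show the user."""
    match = _MULTI.search(descriptor)
    if match is None:
        raise ValueError("not a multisig descriptor this sketch understands")
    threshold = int(match.group(1))
    xpubs = _XPUB.findall(match.group(2))
    if own_xpub not in xpubs:
        raise ValueError("own XPUB is missing from the descriptor; abort setup")
    if not 1 <= threshold <= len(xpubs):
        raise ValueError("threshold M is out of range for N keys")
    # Position is 1-based, as it would be displayed to the user.
    return {"m": threshold, "n": len(xpubs), "position": xpubs.index(own_xpub) + 1}


if __name__ == "__main__":
    # Placeholder keys, not real XPUBs.
    desc = (
        "wsh(sortedmulti(2,"
        "[aaaaaaaa/48'/0'/0'/2']xpubAAA/0/*,"
        "[bbbbbbbb/48'/0'/0'/2']xpubBBB/0/*,"
        "[cccccccc/48'/0'/0'/2']xpubCCC/0/*))"
    )
    print(inspect_descriptor(desc, "xpubBBB"))
```

A Signer that does not find its own XPUB simply refuses to continue, which is what keeps a misbehaving Coordinator from silently swapping keys.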

All in all, the Coordinator's role helps ease the setup process, while its ability to pull off any shenanigans is greatly limited.

Best,
Hugo