The ERC20 Short Address Attack Explained

GOLEM's tech team published a nice find today: a vulnerability in any service that takes user-supplied withdrawal addresses and passes them, unvalidated, to a token transfer function.

From their post at https://blog.golemproject.net/how-to-find-10m-by-just-reading-blockchain-6ae9d39fcd95:

The service preparing the data for token transfers assumed that users will input 20-byte long addresses, but the length of the addresses was not actually checked. 

Edit: I originally mixed up bytes and nibbles. Thanks to Noel Marsk for the careful reading!

Short(en)ing An Address

The server taking in user data accepted an Ethereum address shorter than 20 bytes. A normal Ethereum address is 20 bytes, written as 40 hex digits, for example 0x1234567890123456789012345678901234567800.

What if you leave off those two trailing zeros (one byte equal to 0, written as two hex digits)? The Ethereum VM will supply the missing byte, but at the end of the packed function arguments, and that's a very bad place to add extra zeros.

Every byte after the point where the short address was given gets shifted over by one, and that includes the number of tokens to transfer.

How to Construct This Attack

Let's say I have 1,000 tokens and would like 256,000 -- what do I do?

  1. Generate an Ethereum address whose last byte is 0 (it ends in "00" in hex). Ethereum addresses are effectively random, so on average this takes 256 tries: almost no time at all.
  2. Find an exchange wallet with 256,000 tokens.
  3. Send 1,000 tokens to this exchange wallet, crediting my account internally (off chain) with 1,000.
  4. Request a withdrawal of 1,000 tokens to my generated address. Critically, I leave off the trailing "00" byte.
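Step 1 is cheap because each address byte is effectively uniform, so any given try ends in a zero byte with probability 1/256. The sketch below simulates the search with random 20-byte strings standing in for real key generation. That is an assumption: deriving a real address needs secp256k1 and Keccak-256, which Python's standard library does not provide. The function name `grind_trailing_zero` is hypothetical.

```python
import os


def grind_trailing_zero(max_tries: int = 100_000) -> tuple[bytes, int]:
    """Draw random 20-byte stand-in 'addresses' until one ends in a zero byte.

    os.urandom replaces real keypair -> address derivation here.
    Returns the matching address and the number of tries it took.
    """
    for tries in range(1, max_tries + 1):
        candidate = os.urandom(20)
        if candidate[-1] == 0:
            return candidate, tries
    raise RuntimeError("no address with a trailing zero byte found")


address, tries = grind_trailing_zero()
print(address.hex(), tries)
```

The number of tries follows a geometric distribution with p = 1/256, so averaged over many runs it comes out close to 256.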

What happens then? Read the Golem blog for a clear explanation, but in brief: if the server does not validate the address, it will "pack" everything together and shift the amount, the final argument, over by one byte. The result is 67 bytes of call data for the transfer function when 68 are needed.
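To make the byte count concrete, here is a minimal Python model of two ways a server might build the call data. The naive encoder is a hypothetical reconstruction of the bug, assuming the server prepends a fixed 12 bytes of zero padding and then appends whatever address bytes the user supplied, instead of left-padding the address to a full 32-byte word. `a9059cbb` is the standard ERC20 selector for `transfer(address,uint256)`; both function names are illustrative.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def naive_transfer_calldata(address_hex: str, amount: int) -> bytes:
    """Buggy sketch: assumes the address is 20 bytes and never checks."""
    address = bytes.fromhex(address_hex.removeprefix("0x"))
    return TRANSFER_SELECTOR + bytes(12) + address + amount.to_bytes(32, "big")


def safe_transfer_calldata(address_hex: str, amount: int) -> bytes:
    """Refuses anything that is not exactly 20 bytes, then pads to 32."""
    address = bytes.fromhex(address_hex.removeprefix("0x"))
    if len(address) != 20:
        raise ValueError(f"address must be 20 bytes, got {len(address)}")
    return TRANSFER_SELECTOR + address.rjust(32, b"\x00") + amount.to_bytes(32, "big")


full = "0x1234567890123456789012345678901234567800"
short = full[:-2]  # the user "forgot" the trailing zero byte

print(len(naive_transfer_calldata(full, 1000)))   # 68
print(len(naive_transfer_calldata(short, 1000)))  # 67
```

For a full 20-byte address both encoders produce identical bytes; only the naive one silently accepts the 19-byte input (`str.removeprefix` needs Python 3.9 or later).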

All of these arguments are passed under the hood in the msg.data portion of a call. msg.data has three components: the function selector (the first four bytes of the Keccak-256 hash of the function's signature), then the two arguments, address and amount. In ERC20, amount is a uint256, so for any realistic balance it has lots of leading zeros.
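Because the layout is fixed-width, a well-formed `transfer` call can be sliced by offset: bytes 0 to 4 are the selector (`a9059cbb` for `transfer(address,uint256)`), bytes 4 to 36 hold the address left-padded with 12 zero bytes, and bytes 36 to 68 hold the amount as a big-endian uint256. A minimal sketch; `decode_transfer` is a hypothetical helper, not part of any library.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def decode_transfer(calldata: bytes) -> tuple[str, int]:
    """Split well-formed transfer() call data into (address, amount)."""
    if calldata[:4] != TRANSFER_SELECTOR:
        raise ValueError("not a transfer(address,uint256) call")
    address_word = calldata[4:36]   # 12 zero bytes + 20 address bytes
    amount_word = calldata[36:68]   # uint256, big-endian
    return "0x" + address_word[12:].hex(), int.from_bytes(amount_word, "big")


example = (
    TRANSFER_SELECTOR
    + bytes.fromhex("1234567890123456789012345678901234567800").rjust(32, b"\x00")
    + (1000).to_bytes(32, "big")
)
print(len(example))              # 68
print(decode_transfer(example))  # ('0x1234567890123456789012345678901234567800', 1000)
```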

What happens in this attack is that one byte of leading zeros is taken from the amount and given to the shortened address. Because the byte the attacker dropped was itself a zero, this restores exactly the address we started with, so tokens sent there are still under the attacker's control.

When the parser reaches the end of the data, it has an underflow: there aren't enough bytes left to make a uint256, so it pads the end with zeros and calls it a day. That multiplies your amount by 1<<8, or 256, and, crucially, it happens after the exchange has checked your balance on its internal ledger.
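The whole effect can be reproduced by modelling how the EVM reads the arguments: `CALLDATALOAD` always returns a 32-byte word, and bytes past the end of the call data read as zero. This sketch assumes a server that packs a fixed 12 bytes of zero padding followed by the raw user-supplied address; the helper names are hypothetical.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def calldataload(calldata: bytes, offset: int) -> bytes:
    """Model of CALLDATALOAD: 32 bytes from offset, zero-filled past the end."""
    return calldata[offset:offset + 32].ljust(32, b"\x00")


def evm_decode_transfer(calldata: bytes) -> tuple[str, int]:
    """Read both arguments the way the EVM would, padding included."""
    address_word = calldataload(calldata, 4)
    amount_word = calldataload(calldata, 36)
    return "0x" + address_word[12:].hex(), int.from_bytes(amount_word, "big")


# Attacker's address ends in a zero byte; the trailing "00" is left off.
short_address = bytes.fromhex("12345678901234567890123456789012345678")  # 19 bytes
calldata = TRANSFER_SELECTOR + bytes(12) + short_address + (1000).to_bytes(32, "big")

address, amount = evm_decode_transfer(calldata)
print(len(calldata))  # 67
print(address)        # 0x1234567890123456789012345678901234567800
print(amount)         # 256000
```

The address word borrows the amount's leading zero byte, and the amount word ends with one byte of padding, which is exactly the shift by 256 described above.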

You could probably even get some plausible deniability if you needed it: "Oops, I just copied it over and missed the 0, sorry!"

Who Is To Blame?

This is likely to end in some finger-pointing. The Ethereum VM returns 0s for any data requested by CALLDATALOAD beyond the end of the call data. On the one hand, this is sane. On the other hand, arguably it should complain. However, I would bet a few GOLEM tokens that making it complain would break some functionality elsewhere in the Ethereum blockchain.
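The difference between the EVM's lenient reads and the stricter "complain" alternative fits in a few lines. This is a Python model only; `strict_calldataload` is a hypothetical design, not something the EVM offers.

```python
def lenient_calldataload(calldata: bytes, offset: int) -> bytes:
    """What the EVM does: missing bytes silently read as zero."""
    return calldata[offset:offset + 32].ljust(32, b"\x00")


def strict_calldataload(calldata: bytes, offset: int) -> bytes:
    """Hypothetical alternative: refuse to read past the end of the data."""
    if offset + 32 > len(calldata):
        raise ValueError(
            f"calldata underflow: need {offset + 32} bytes, have {len(calldata)}"
        )
    return calldata[offset:offset + 32]


short_calldata = bytes(range(67))  # any 67-byte payload, e.g. transfer() with a short address
padded = lenient_calldataload(short_calldata, 36)  # 31 real bytes + 1 zero byte
```

Calling `strict_calldataload(short_calldata, 36)` raises `ValueError` instead of quietly returning a shifted amount.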

This bug is compounded by Ethereum's decision not to add an in-band checksum to addresses. Bitcoin, for instance, appends a checksum to every address so any client can validate that the address is correct. This was a good idea in 2009, and it's a good idea now. In this particular case, every step of the pipeline sees a valid-looking address, yet the address as the user offered it was not valid, and right now there is no way to detect that.
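For comparison, Bitcoin's Base58Check format appends the first four bytes of a double SHA-256 of the versioned payload, so a client can reject a truncated address before doing anything with it. The sketch works on raw bytes and skips the Base58 text encoding itself; the 0x00 version byte and the sample 20-byte hash are illustration values only.

```python
import hashlib


def checksum(payload: bytes) -> bytes:
    """Base58Check checksum: first 4 bytes of SHA-256(SHA-256(payload))."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def with_checksum(payload: bytes) -> bytes:
    return payload + checksum(payload)


def is_valid(data: bytes) -> bool:
    """Split off the trailing 4 checksum bytes and recompute them."""
    if len(data) < 5:
        return False
    payload, check = data[:-4], data[-4:]
    return checksum(payload) == check


# Version byte 0x00 plus a 20-byte hash ending in a zero byte (illustrative values).
payload = b"\x00" + bytes.fromhex("1234567890123456789012345678901234567800")
address = with_checksum(payload)
print(is_valid(address))  # True

# Drop the trailing zero byte of the payload but keep the old checksum.
truncated = address[:-5] + address[-4:]
print(is_valid(truncated))  # expected False, barring a 1-in-2**32 checksum collision
```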

Ethereum does have a checksum standard for addresses (EIP-55, which encodes the checksum in the capitalization of the hex letters), but it applies only to the text form of an address, and it is not universally used. For instance, as of this writing Bitfinex does not show capitalized, checksummed deposit addresses in its user interface. And once the address is cast into a number or address type in your code, you can no longer check it.

Either way, there are a few things that can be done.

Fixes

From hard to easy:

  1. Ethereum could create a way to check address validity in-band, using only the address itself. There is no simple way to do this right now that I can see.
  2. Ethereum could complain about data underflows more vigorously, up to and including throwing if it encounters one.
  3. Any code you write that deals with user input could check that a full 20 bytes has been offered for an address, and refuse to proceed otherwise.
  4. Your transfer function could check that the length of msg.data (msg.data.length in Solidity) is exactly right: 68 bytes, that is 4 for the selector plus 32 for each of the two arguments. This nice idea was suggested by redditor izqui9, who has written up sample code that can decorate any function.
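Fixes 3 and 4 are easy to sketch. The Python below models both off-chain: `validate_withdrawal_address` is the kind of check a withdrawal service could run on user input (fix 3), and `has_exact_transfer_length` mirrors the 68-byte length test (fix 4), which in a contract would compare `msg.data.length` in Solidity. Both names are hypothetical, and izqui9's original code is not reproduced here.

```python
import re

# "0x" followed by exactly 40 hex digits, i.e. 20 bytes.
ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")

TRANSFER_CALLDATA_LEN = 4 + 32 + 32  # selector + address word + amount word


def validate_withdrawal_address(user_input: str) -> bytes:
    """Fix 3: reject anything that is not a full 20-byte hex address."""
    candidate = user_input.strip()
    if not ADDRESS_RE.fullmatch(candidate):
        raise ValueError(f"not a 20-byte address: {candidate!r}")
    return bytes.fromhex(candidate[2:])


def has_exact_transfer_length(calldata: bytes) -> bool:
    """Fix 4: a transfer(address,uint256) payload must be exactly 68 bytes."""
    return len(calldata) == TRANSFER_CALLDATA_LEN


print(len(validate_withdrawal_address("0x1234567890123456789012345678901234567800")))  # 20
print(has_exact_transfer_length(bytes(67)))  # False
```

With the trailing "00" left off, `validate_withdrawal_address("0x12345678901234567890123456789012345678")` raises `ValueError`, because 38 hex digits are only 19 bytes.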

New Alchemy

I'm the Managing Director of New Alchemy, a company dedicated to the token ecosystem. We provide full stack services for companies wishing to issue tokens, from ICO support to technology and strategy to security auditing. Say hello! We'd love to talk about your project: hello@newalchemy.io.

Peter Vessenes