This page has a decent explanation of how multi-input, multi-output (MIMO) works:
http://www.multicap.be/en/technology/mimo-and-spatial-streams
When a wireless device is described as 3x3, that is usually shorthand for the full form 3x3:3, where the number after the colon is the number of spatial streams. The first number is how many transmit (tx) "chains" are supported, while the second is the number of receive (rx) "chains". A Wi-Fi device must have at least as many tx chains as the number of spatial streams it sends, and at least as many rx chains as the number it receives.
When sending, the data is split into the largest number of spatial streams supported by both sides of the connection, and those streams are transmitted across all of the available tx chains. Receiving is the same thing seen from the other end: the transmitting side splits the data into that same number of streams and the receiving side recombines them. For a receiver to tell apart two spatial streams transmitted on the same frequency, the signal from each stream must travel a different distance so that they arrive at slightly different times, which the receiver sees as a difference in phase. Separate tx/rx chains hooked up to antennas that are spaced apart from each other help make sure this happens, and an indoor signal also tends to take many paths around a room before reaching its destination, which adds further phase differences.
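To make the notation and the "largest number supported by both sides" rule concrete, here is a minimal Python sketch. The names MimoConfig, parse_mimo and negotiated_streams are made up for illustration and are not part of any Wi-Fi driver or API. It also assumes that a short label like 3x3 implies as many streams as the smaller chain count.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MimoConfig:
    tx_chains: int
    rx_chains: int
    streams: int


def parse_mimo(label: str) -> MimoConfig:
    """Parse 'TxR' or 'TxR:S' (hypothetical helper, for illustration only)."""
    match = re.fullmatch(r"(\d+)x(\d+)(?::(\d+))?", label.strip())
    if match is None:
        raise ValueError(f"not a MIMO label: {label!r}")
    tx, rx = int(match.group(1)), int(match.group(2))
    # Assumption: the short form "3x3" means as many streams as the
    # smaller of the two chain counts.
    streams = int(match.group(3)) if match.group(3) else min(tx, rx)
    # A device cannot handle more streams than it has chains to carry them.
    if streams > max(tx, rx):
        raise ValueError(f"{label!r}: more streams than chains")
    return MimoConfig(tx, rx, streams)


def negotiated_streams(sender: MimoConfig, receiver: MimoConfig) -> int:
    """Streams used in one direction: the most that both ends can handle."""
    return min(sender.streams, sender.tx_chains,
               receiver.streams, receiver.rx_chains)


router = parse_mimo("4x4:4")
client = parse_mimo("2x2:2")
print(negotiated_streams(router, client))  # limited by the 2x2:2 client
```

The sketch only models the stream count. It says nothing about how well each stream is received, which is where extra chains can still help.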
If one side of the wireless connection supports more spatial streams than the other, this can still improve performance even though the connection is limited to the lower of the two numbers. This is because the additional tx/rx chains can improve the signal strength of the connection for a better quality link. This is why 4x4:4 routers tend to outperform 3x3:3 routers in SNB tests even though the client is always 2x2:2.
The multi-user MIMO (MU-MIMO) feature found in Wave 2 802.11ac devices seeks to allow a single wireless router/access point with a large number of spatial streams to divide its attention simultaneously across multiple clients. For example, a 4x4:4 access point could split into two simultaneous 2x2:2 connections with two 2x2:2 clients. A 4x4:4 access point without MU-MIMO would have to stop talking to one 2x2:2 client in order to talk to a second 2x2:2 client.
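The 4x4:4 example above can be sketched as a toy scheduler in Python. The function schedule_rounds is a made-up name, and the first-fit grouping is only an assumption for illustration: real access points pick MU-MIMO groups based on channel feedback, which this ignores. Each "round" is one stretch of airtime, and the numbers are spatial streams given to each client.

```python
def schedule_rounds(ap_streams, client_streams, mu_mimo):
    """Group clients into transmit rounds (toy model, not a real scheduler).

    ap_streams:     spatial streams the access point supports
    client_streams: dict of client name -> streams that client supports
    mu_mimo:        whether the AP may serve several clients per round
    """
    rounds = []
    for name, streams in client_streams.items():
        usable = min(streams, ap_streams)
        if mu_mimo:
            # First-fit: join the first round that still has spare streams.
            for rnd in rounds:
                if sum(rnd.values()) + usable <= ap_streams:
                    rnd[name] = usable
                    break
            else:
                rounds.append({name: usable})
        else:
            # Without MU-MIMO every client needs a round of its own.
            rounds.append({name: usable})
    return rounds


clients = {"laptop": 2, "phone": 2}
print(schedule_rounds(4, clients, mu_mimo=True))
print(schedule_rounds(4, clients, mu_mimo=False))
```

With MU-MIMO both 2x2:2 clients fit into a single round. Without it the same two clients take two rounds, and half of the access point's streams sit unused in each one.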
Smarter people please correct me where necessary!
This article looks like it would give a better explanation than I have:
https://djw.cs.washington.edu/papers/mimo_for_dummies.pdf