OK - I'm not saying what I hear in these tests (I'm not sure how good my hearing is), but see what you make of them:
First test file (because it was relatively simple) is a 20-sec blast of alternating 2-sec tones: one tone is the fundamental (440Hz) plus the third and fifth harmonic contributions that would build a square wave if the series were continued to infinity; the other tone has the same harmonics at the same power, but the third harmonic is phase-shifted 180 deg so the waveform looks more like an approximation to a triangle. If you honestly can't tell them apart, then the conclusion is that, at least within the parameters of this test, the human ear-brain combination is not sensitive to the relative phases of harmonics.
https://dl.dropbox.com/s/wrvzjyhfwqu8kla/Test1.zip
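For anyone who wants to rebuild something like the first test rather than download it, here is a rough sketch of the two tones. It is not the script used to make the file: the 44.1kHz sample rate, the function name harmonic_tone and the 1, 1/3, 1/5 amplitudes (the usual square-wave series) are my assumptions.

```python
import numpy as np

SAMPLE_RATE = 44100  # assumed; the sample rate of the original file is not stated


def harmonic_tone(f0=440.0, duration=2.0, flip_third=False, rate=SAMPLE_RATE):
    """Fundamental plus 3rd and 5th harmonics at square-wave amplitudes (1, 1/3, 1/5).

    flip_third=True shifts the 3rd harmonic by 180 degrees, which for a
    sine is simply a change of sign. The power in each harmonic is unchanged.
    """
    t = np.arange(int(duration * rate)) / rate
    sign3 = -1.0 if flip_third else 1.0
    return (np.sin(2 * np.pi * f0 * t)
            + sign3 * np.sin(2 * np.pi * 3 * f0 * t) / 3
            + np.sin(2 * np.pi * 5 * f0 * t) / 5)


square_like = harmonic_tone(flip_third=False)
triangle_like = harmonic_tone(flip_third=True)

# Ten alternating 2-second tones make up a 20-second signal.
test1 = np.concatenate([square_like, triangle_like] * 5)
```

The two tones have identical harmonic powers, but the flipped version has a noticeably higher peak, so the signal would need scaling to fit within full scale before being written to a WAV file.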
Second test file: tests 2a, 2b, 2c contained within one zip. Test2a.wav is three bursts of 440Hz tone (as per above) where the left and right channels are in phase and the only variation is in the envelope defining the amplitude of the burst over time. The first burst delays the envelope on the right channel compared with the left by 0.3ms (equivalent to a path difference of about 4 inches), second burst is in step, and the third burst delays the left envelope by 0.3ms.
Test2b.wav uses the same tones, but keeps the envelopes in step and varies the phase of the tones within the envelopes: in the first burst the phase is delayed by 0.3ms in the right channel, in the second there is zero delay, and in the third it is delayed by 0.3ms in the left channel.
Test2c.wav keeps the phases and the envelopes in step, and only varies the relative amplitudes by +0.5dB, 0dB, -0.5dB.
https://dl.dropbox.com/s/r1jg8sg5jxguh6t/Test2.zip
If you perceive the stereo image to pan from left to right in any or all of these samples, it demonstrates that a single cue on its own is enough to fool the ear-brain combo into creating the image. Obviously, a real source would have a combination of all three - the envelope and the phase would reach the ears at different times, and the relative amplitudes would be different too.
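The three cues can be isolated in code along these lines. This is only a sketch under assumptions of mine: a pure 440Hz sine, a trapezoidal envelope, 0.5-sec bursts, a 44.1kHz sample rate, 343 m/s for the speed of sound, and the level change applied to the right channel. None of those details, nor the function names, come from the actual files.

```python
import numpy as np

RATE = 44100            # assumed sample rate
F0 = 440.0              # tone frequency from the post
DELAY = 0.3e-3          # 0.3 ms, from the post
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 C)

# Path difference equivalent to the delay, in inches (about 4).
path_inches = SPEED_OF_SOUND * DELAY / 0.0254


def envelope(t, length, ramp=0.05):
    """Trapezoid (assumed shape): linear fade in and out, zero outside [0, length]."""
    rise = np.clip(t / ramp, 0.0, 1.0)
    fall = np.clip((length - t) / ramp, 0.0, 1.0)
    return np.minimum(rise, fall)


def burst(env_delay=0.0, phase_delay=0.0, gain_db=0.0, length=0.5):
    """One channel of one burst, with each cue adjustable independently."""
    t = np.arange(int(round((length + 0.01) * RATE))) / RATE
    carrier = np.sin(2 * np.pi * F0 * (t - phase_delay))
    return 10 ** (gain_db / 20) * envelope(t - env_delay, length) * carrier


def stereo(left, right):
    """Stack two channels, each described by keyword arguments for burst()."""
    return np.column_stack([burst(**left), burst(**right)])


def three_bursts(key, amount):
    """Cue applied to the right channel, then to neither, then to the left."""
    return np.concatenate([stereo({}, {key: amount}),
                           stereo({}, {}),
                           stereo({key: amount}, {})])


test2a = three_bursts("env_delay", DELAY)    # envelope timing only
test2b = three_bursts("phase_delay", DELAY)  # carrier phase only
# Level only: right channel at +0.5 dB, 0 dB, -0.5 dB relative to the left.
test2c = np.concatenate([stereo({}, {"gain_db": g}) for g in (0.5, 0.0, -0.5)])
```

At 440Hz a 0.3ms delay is about 0.13 of a cycle, so the phase version shifts one channel by roughly 48 degrees.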
Only binaural recordings (intended for listening through headphones) can accommodate all three. Stereo recordings for playback through a pair of speakers will have cross-talk between the two ears, and typical recordings made by simply mixing multiple mono mic sources into a left-right sound field will only be using relative amplitude to provide the stereo image.
For completeness, here's the "Black Hole's Noise" I mentioned above - pure sine tones with octave separations are constantly increasing in pitch, but with a frequency-dependent amplitude envelope so that they fade in at the low frequency end and fade out at high frequencies. This results in a tone that seems to keep increasing in pitch indefinitely.
https://dl.dropbox.com/s/9x5c6l3a0ccc4t9/Ipcres.zip
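This kind of endlessly rising tone is usually called a Shepard-Risset glissando, and a minimal version can be sketched as below. The number of octaves, the 27.5Hz bottom frequency, the 5 seconds per octave, the raised-cosine amplitude curve and the name rising_tone are all my own choices for illustration, not the settings used for the file.

```python
import numpy as np

RATE = 44100  # assumed sample rate


def rising_tone(duration=10.0, n_octaves=8, f_low=27.5, secs_per_octave=5.0,
                rate=RATE):
    """Sum of sines one octave apart, all gliding upwards together.

    Each component wraps from the top of the range back to the bottom,
    and its amplitude is zero at both ends, so the wrap is inaudible.
    """
    t = np.arange(int(duration * rate)) / rate
    out = np.zeros_like(t)
    for k in range(n_octaves):
        # Position of this component in octaves above f_low, wrapping around.
        pos = (k + t / secs_per_octave) % n_octaves
        freq = f_low * 2.0 ** pos
        # Integrate the instantaneous frequency to get a continuous phase.
        phase = 2 * np.pi * np.cumsum(freq) / rate
        # Raised cosine: silent at the extremes, loudest mid-range.
        amp = 0.5 - 0.5 * np.cos(2 * np.pi * pos / n_octaves)
        out += amp * np.sin(phase)
    return out / n_octaves


tone = rising_tone()
```

Because the set of frequencies present repeats exactly every secs_per_octave seconds, the signal can be looped for as long as you like while the pitch still appears to climb.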