Having recently upgraded to a GeForce GTX 580, I was amazed at the 30x speedup. I have been using a variation on Alex Krizhevsky’s cuda-convnet for a number of Kaggle competitions. The next step for me will be to add another card and use Alex’s updated cuda-convnet2, which supports multiple GPUs. I like the idea of scaling up rather than scaling out. There is a good paper on this by Adam Coates et al.
There are a number of sites with good information on choosing a GPU for machine learning. The first one I came across was at FastML, called “Running things on a GPU”. More recently I came across another page with good advice, “Which GPU(s) to Get for Deep Learning”.
Here is a link to a benchmark of a number of the open source convnet implementations: https://github.com/soumith/convnet-benchmarks
Udacity has a good introduction to CUDA programming called “Intro to Parallel Programming”.