Thanks, but optimisation is not really the point. The total processing time is not just the seconds, minutes, or hours that the processor runs; it is also the time spent coding in the first place. If one is only going to run a program and get the output once, there is no point spending many hours coding to shave off a few seconds of run time.
Also, the optimisations are not going to be much use if they rely on features that are not available on the target processor. These fast sieves do not run division tests - they do what we would have done at school: draw up a grid and cross off every second number, then every third, then every fifth... Producing my output using this method would require an array of (roughly) 30,000,000 elements, which might be tricky to implement on the Humax.
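For anyone who has not met it, the school-grid method looks roughly like this in Python. This is only my own minimal sketch of the basic idea, not the code of any of the fast sieves mentioned, and the function name `sieve` is made up for illustration. It keeps one flag per candidate number, which is where the array size comes from.

```python
def sieve(limit):
    """Return all primes <= limit by crossing off multiples (no division tests)."""
    if limit < 2:
        return []
    # One flag per number 0..limit; 1 means "not crossed off yet".
    # Assumption: one byte per flag, so the array grows linearly with limit.
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross off every p-th number, starting at p*p
            # (smaller multiples were already crossed off by smaller primes).
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = 0
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]


print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

With one byte per flag as above, a grid of 30,000,000 elements comes to about 30 MB before anything else is stored, which is the part that might not fit on a small box.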
Besides, the point is to torture the processor - not make life easy for it!