MULPS — Packed Single-Precision Floating-Point Multiply
I started looking at assembly to see how Single Instruction, Multiple Data (SIMD) works, and ended up with PowerPC’s AltiVec and Intel’s SSE. So far I have successfully optimized C++ source code using SIMD techniques.
Next I will look into SSE4.1, because I am expecting to receive a new E8400 CPU, which supports it.
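To make the title instruction concrete: MULPS multiplies four packed single-precision floats lane by lane in one instruction. Below is a minimal sketch of reaching it from C++ through the `_mm_mul_ps` intrinsic from `<xmmintrin.h>`. The helper name `multiply4` and the `HAVE_SSE` macro are my own, made up for illustration, and the scalar fallback is only there so the snippet still builds on a non-SSE target. Whether the compiler actually emits MULPS (rather than, say, the VEX-encoded VMULPS) depends on the compiler and its flags.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps
#define HAVE_SSE 1      // hypothetical macro, used only inside this sketch
#endif

// Hypothetical helper: out[i] = a[i] * b[i] for i = 0..3.
// All three pointers must refer to at least four floats.
void multiply4(const float* a, const float* b, float* out) {
#ifdef HAVE_SSE
    // Unaligned loads, so the caller does not need 16-byte aligned arrays.
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    // One packed multiply over all four lanes.
    __m128 prod = _mm_mul_ps(va, vb);
    _mm_storeu_ps(out, prod);
#else
    // Scalar fallback for targets without SSE; same result, no SIMD.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] * b[i];
    }
#endif
}
```

Calling `multiply4(a, b, out)` with three four-element `float` arrays fills `out` with the element-wise products. I used the unaligned load and store here to keep the example simple; with 16-byte aligned data, `_mm_load_ps` and `_mm_store_ps` would be the aligned counterparts.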