The PMSP architecture attempts to exploit both coarse-grain and fine-grain parallelism in a single-processor environment. A 2×2 PMSP (2 thread slots, 2 instructions per slot per cycle) is capable of achieving a fourfold speedup over a conventional processor. We have proposed an instruction scheduling strategy, developed a performance prediction model, and completed the architecture description. We are now designing the detailed circuits of the 2×2 PMSP, based on an embedded RISC processor that we designed for a graphics system.
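
The fourfold figure matches the peak issue bandwidth of the configuration described above: 2 thread slots times 2 instructions per slot per cycle gives 4 instructions per cycle. The following minimal sketch illustrates that arithmetic only. It assumes the conventional baseline issues one instruction per cycle (an assumption, not stated in the text), and the function names are hypothetical. It is not the performance prediction model mentioned above.

```python
def peak_issue_width(thread_slots: int, instr_per_slot: int) -> int:
    """Upper bound on instructions issued per cycle across all thread slots."""
    if thread_slots < 1 or instr_per_slot < 1:
        raise ValueError("both parameters must be positive integers")
    return thread_slots * instr_per_slot


def ideal_speedup(thread_slots: int, instr_per_slot: int,
                  baseline_ipc: float = 1.0) -> float:
    """Ideal speedup over a baseline with the given instructions per cycle.

    baseline_ipc=1.0 is an assumed single-issue scalar processor. Real
    speedup would be lower whenever a slot cannot be filled in a cycle.
    """
    if baseline_ipc <= 0:
        raise ValueError("baseline_ipc must be positive")
    return peak_issue_width(thread_slots, instr_per_slot) / baseline_ipc


if __name__ == "__main__":
    # 2 thread slots, 2 instructions per slot per cycle
    print(ideal_speedup(2, 2))
```

This is a bound under ideal conditions; dependences, resource conflicts, and thread availability would reduce the achieved rate.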