For example, the i7-4800MQ and the i5-4200H are very different CPUs. Every model would need hundreds of engineers to design. Besides, Intel must have several hundred sets of production equipment to produce every single model of them. But this would cost a lot. How does Intel manage to do this?
What's different between them?
Cherry picking, really. Do all 4 cores work after fabrication? If not, then it has to be an i3 or a non-quad mobile part. How much of the manufactured cache works? It can only be sold as an i7 if enough of the cache works, but if less than that works it may still be sellable as an i5 or i3. If it runs fast at low voltage, it goes as a laptop processor. If it consumes little power at a higher voltage and can run fast, it goes as a server processor. Otherwise, as long as it stays under consumer TDP requirements, it gets sold as a desktop processor. Exactly which model each one is sold as then depends more finely on speed/power requirements.
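
To make the sorting concrete, here is a minimal sketch of that decision chain in Python. Everything numeric in it is made up for illustration: the `DieTest` fields, the threshold constants, and the `"reject"` fallback are my assumptions, not Intel's actual test criteria.

```python
from dataclasses import dataclass


@dataclass
class DieTest:
    """Post-fabrication measurements for one die (hypothetical fields)."""
    working_cores: int
    working_cache_mb: float
    max_ghz: float        # highest stable clock
    volts_at_max: float   # voltage needed to reach that clock
    watts_at_max: float   # power drawn at that clock


# Hypothetical thresholds, chosen only to make the example run.
I7_MIN_CACHE_MB = 8.0
I5_MIN_CACHE_MB = 6.0
FAST_GHZ = 3.5
LOW_VOLTAGE = 1.0
SERVER_MAX_WATTS = 65.0
DESKTOP_TDP_WATTS = 84.0


def brand(die: DieTest) -> str:
    """Pick the brand tier from how many cores and how much cache work."""
    if die.working_cores < 4:
        return "i3"  # or a non-quad mobile part
    if die.working_cache_mb >= I7_MIN_CACHE_MB:
        return "i7"
    if die.working_cache_mb >= I5_MIN_CACHE_MB:
        return "i5"
    return "i3"


def segment(die: DieTest) -> str:
    """Pick the market segment from speed, voltage, and power."""
    fast = die.max_ghz >= FAST_GHZ
    if fast and die.volts_at_max <= LOW_VOLTAGE:
        return "laptop"
    if fast and die.watts_at_max <= SERVER_MAX_WATTS:
        return "server"
    if die.watts_at_max <= DESKTOP_TDP_WATTS:
        return "desktop"
    return "reject"  # assumption: over consumer TDP means unsellable
```

With these made-up numbers, `DieTest(4, 8.0, 3.6, 0.9, 40.0)` comes out as an `"i7"` for the `"laptop"` segment, while the same die with only 2 working cores drops to `"i3"`.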
And it turns out that by futzing with critical dimensions very slightly in the process (modify etch time for more fin height, slightly move focus on lithography to shorten / lengthen channels, slightly modify deposition times or rates to make gates thicker/thinner, etc.) you can make the same silicon design work better under different constraints -- push leakage current up or down, switching speed up or down, and other parameters. It's all a huge game of tradeoffs, and even on the same wafer there's going to be quite a bit of variability. For the process I worked under, the center tends to be low power / high speed, the "donut" around it tends to be high power / high speed, and the edge tends to be low power / low speed. But you can play with that by modifying the process. What we ended up with was a bit of a compromise to maximize yield.
As far as your comment goes: "Intel must have several hundred sets of production equipment to produce every model of them"
That's quite false. When they design a family of products, they force them all to conform to the same process so they go through exactly all the same processing steps -- all the same transistors, all the same thin film layers, everything all the same -- and the only difference is which photomasks to use. For my work on p126x (exact name hidden due to NDA), we ran Cougar Point (6-series, for Sandy Bridge processors) and Panther Point (7-series for Ivy Bridge) chipsets mostly, but we also ran Patsburg (Z-6 series for Sandy Bridge E) and a few other low-volume products.
As long as the process is compatible, many different products can use the same equipment. So they only need a new set of equipment for every process, not for every product.
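
If it helps, the counting argument can be sketched in a few lines of Python: group products by the process they run on, and the number of groups is the number of equipment sets. The three chipsets and `p126x` are the ones I mentioned above; the fourth entry and the helper name `equipment_sets` are hypothetical, added only so there is a second process to count.

```python
from collections import defaultdict

# Product -> process it is built on.
PRODUCT_PROCESS = {
    "Cougar Point": "p126x",
    "Panther Point": "p126x",
    "Patsburg": "p126x",
    "hypothetical product": "hypothetical other process",  # illustration only
}


def equipment_sets(product_to_process: dict[str, str]) -> dict[str, list[str]]:
    """Group products by process; one equipment set is needed per group."""
    by_process = defaultdict(list)
    for product, process in product_to_process.items():
        by_process[process].append(product)
    return dict(by_process)


sets = equipment_sets(PRODUCT_PROCESS)
print(f"{len(PRODUCT_PROCESS)} products, {len(sets)} equipment sets")
```

Four products, two equipment sets: adding another product to an existing process changes the photomasks, not the count of equipment sets.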
Why do it that way?
Because it's cheaper, and they can still offer a variety of products to maximize teh munny making.