
Intel CPUs have design and security flaw

William Parcher

'Kernel memory leaking' Intel processor design flaw forces Linux, Windows redesign

Other OSes will need an update, performance hits loom


The Register said:
A fundamental design flaw in Intel's processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug.

Programmers are scrambling to overhaul the open-source Linux kernel's virtual memory system. Meanwhile, Microsoft is expected to publicly introduce the necessary changes to its Windows operating system in an upcoming Patch Tuesday: these changes were seeded to beta testers running fast-ring Windows Insider builds in November and December.

Crucially, these updates to both Linux and Windows will incur a performance hit on Intel products. The effects are still being benchmarked, however we're looking at a ballpark figure of five to 30 per cent slow down, depending on the task and the processor model. More recent Intel chips have features – such as PCID – to reduce the performance hit. Your mileage may vary.

Similar operating systems, such as Apple's 64-bit macOS, will also need to be updated – the flaw is in the Intel x86-64 hardware, and it appears a microcode update can't address it. It has to be fixed in software at the OS level, or go buy a new processor without the design blunder...


https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw
 
If you own shares in Intel... SELL SELL SELL

That isn't even a joke; this is very serious, and ultimately unresolvable in hardware (it apparently can't be fixed with a microcode update). This may not seriously affect gamers playing GPU-bound games, but as a business software developer working with many CPU-bound processes, I am dreading seeing what the Windows patch does to our software performance.

I'll wait to see the benchmarks of others and my own, but this could significantly hurt Intel's business. It won't be clueless home PC owners that get pissed off, but businesses like mine.
 
Reading on, this sounds like a joke to me

"The fix is to separate the kernel's memory completely from user processes using what's called Kernel Page Table Isolation, or KPTI. At one point, Forcefully Unmap Complete Kernel With Interrupt Trampolines, aka ****WIT, was mulled by the Linux kernel team, giving you an idea of how annoying this has been for the developers."
 
Linux developers aren't always the most professional. The issue isn't a joke; the kernel developers simply made a joke acronym.

edit:
Here are some initial benchmarks of the Linux kernel changes: https://www.phoronix.com/scan.php?page=article&item=linux-415-x86pti&num=2

On the one hand, I was clearly wrong about there being much of an impact on CPU-bound processes, which is good. On the other hand, there is clearly a large impact on I/O- and memory-bound processes. Now that I understand the underlying issue better, this makes sense. It is system calls to the kernel that will take longer after these patches, so basically any process that makes lots of system calls will experience the worst performance degradation.
 
So I thought I would add some information about why these changes to Windows and Linux will cause a large performance hit.

TLDR: The fix involves clearing a hardware cache on the CPU whenever the OS kernel is invoked. CPU caches are much faster than accessing your RAM, so each invocation of the OS kernel is going to take significantly more time than it did prior to these patches.

To explain this, first I need to explain virtual memory. Basically, virtual memory is a way of letting every running application in a system pretend it has the entire addressable memory space to itself. So say Program A has data stored in physical memory addresses 1113, 1114, and 1115, while Program B has data stored in physical memory addresses 4322, 4323, and 4324. With virtual memory, each program might "see" those addresses as addresses 10, 11, and 12. Thus there is no need for them to worry about where stuff is actually located in physical memory; they only need to know where things are located in their virtual address space.

On the CPU there is a module called the Translation Lookaside Buffer, or TLB. Its job is to act as a cache of mappings between virtual addresses and real physical addresses. The operating system maintains its own map of virtual addresses to physical addresses, but the TLB is there so that an expensive memory lookup doesn't need to take place every time the CPU needs to translate a virtual address into its actual physical address.
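
To make that a bit more concrete, here's a toy C sketch of the idea: a tiny software "TLB" sitting in front of a slow page-table lookup. All the numbers and the fake page-table walk are made up purely for illustration; the real TLB is hardware on the CPU and works at page granularity, but the hit/miss behaviour is the point.

Code:
#include <stdio.h>

#define TLB_SLOTS 4

struct tlb_entry {
    unsigned long virt;
    unsigned long phys;
    int valid;
};

static struct tlb_entry tlb[TLB_SLOTS];

/* Stand-in for the slow, OS-maintained page table: a made-up fixed offset. */
static unsigned long page_table_walk(unsigned long virt)
{
    return virt + 0x4000;
}

static unsigned long translate(unsigned long virt)
{
    int slot = virt % TLB_SLOTS;

    if (tlb[slot].valid && tlb[slot].virt == virt) {
        printf("virt %lu: TLB hit  -> phys %lu\n", virt, tlb[slot].phys);
        return tlb[slot].phys;
    }

    /* TLB miss: do the slow walk, then cache the result. */
    unsigned long phys = page_table_walk(virt);
    tlb[slot] = (struct tlb_entry){ .virt = virt, .phys = phys, .valid = 1 };
    printf("virt %lu: TLB miss -> phys %lu (now cached)\n", virt, phys);
    return phys;
}

int main(void)
{
    translate(10);  /* miss: falls back to the page-table walk */
    translate(10);  /* hit: answered from the toy TLB */
    translate(11);  /* miss */
    return 0;
}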

Next, to oversimplify, you need to understand that the CPU can be in one of two modes: user mode or kernel mode. User mode is what your applications run in, while things like the OS kernel and your device drivers run in kernel mode. While in user mode, certain CPU functions and memory addresses cannot be accessed. In order to access the functionality of the kernel, an application makes system calls: special functions exposed by the kernel that switch the CPU into kernel mode, perform kernel operations like sending data across the network or accessing the disk, then switch back to user mode and return to the user process.
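
If you want to see what that looks like from the application side, here's a minimal Linux-only C sketch using the raw syscall() entry point (other OSes expose the same idea through different wrappers); each call below is one user -> kernel -> user round trip.

Code:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    /* Each of these traps into the kernel and back out again. */
    long pid = syscall(SYS_getpid);

    const char msg[] = "hello from user mode\n";
    syscall(SYS_write, 1, msg, sizeof msg - 1);   /* write(2) via the raw entry point */

    printf("getpid() returned %ld\n", pid);       /* printf itself ends in write(2) too */
    return 0;
}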

Now, what happens in Windows and Linux is that the kernel memory space is mapped into every user process's virtual memory. Because of the kernel mode/user mode access restrictions, the user mode process can't read the portions of virtual memory occupied by the kernel, but they are there. The main benefit of this is that whenever a user->kernel or kernel->user switch occurs, the page mappings don't need to be swapped out, and even more importantly the kernel page mappings are usually present in the CPU's Translation Lookaside Buffer. CPUs are fast and main memory access is comparatively quite slow, so you want these CPU caches like the TLB to be used as much as possible.

Now to the Intel CPU issue. I don't know all the details, but it seems to come down to one particular piece of CPU functionality: speculative execution. Your CPU doesn't execute instruction A, wait until it is done, then execute instruction B, and so on. Processing each instruction takes time, so your CPU pipelines the instructions. If each instruction has, say, 5 phases before it is done executing, then instruction A finishes phase 1 and starts phase 2 while instruction B starts phase 1; then instruction A finishes phase 2 and starts phase 3, instruction B finishes phase 1 and starts phase 2, and instruction C enters phase 1, and so on. This is a pipelined architecture, where the processing of instructions works a lot like an industrial assembly line.

Now imagine instruction B is a conditional branch instruction. The next instruction that should be processed after B depends on the final result of B, so how do you pipeline it? Well, the CPU tries to predict which way the branch will go, and just starts speculatively executing the instructions down that execution path. If it turns out the branch went the other way, it has to discard all that work and start over, but if it guessed right, it already has a bunch of instructions well on the way to completion.
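
You can actually watch branch prediction doing its job with a rough C benchmark: run the same conditional loop over random data and then over sorted data. Once the data is sorted the branch becomes predictable, speculation almost always guesses right, and the loop gets noticeably faster. This is only a sketch; compile without aggressive optimization (e.g. -O1), since at higher levels the compiler may replace the branch with a branchless instruction and flatten the difference.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      (1 << 20)
#define PASSES 50

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Sum every element >= 128; the conditional branch is the interesting part. */
static double timed_sum(const int *data, long long *out_sum)
{
    clock_t start = clock();
    long long sum = 0;

    for (int pass = 0; pass < PASSES; pass++)
        for (int i = 0; i < N; i++)
            if (data[i] >= 128)
                sum += data[i];

    *out_sum = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    int *data = malloc(N * sizeof *data);
    long long sum;

    if (!data)
        return 1;

    srand(42);
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;          /* random: the branch is unpredictable */

    double t = timed_sum(data, &sum);
    printf("unsorted: %.2f s (sum %lld)\n", t, sum);

    qsort(data, N, sizeof *data, cmp_int);   /* sorted: the branch becomes predictable */
    t = timed_sum(data, &sum);
    printf("sorted:   %.2f s (sum %lld)\n", t, sum);

    free(data);
    return 0;
}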

So what is the issue? It seems to be that Intel CPUs don't enforce the ring 0 kernel mode protections when accessing the Translation Lookaside Buffer during speculative execution. So if your user mode code tries to access kernel virtual address 10 while being executed speculatively, rather than the CPU throwing a fault, the TLB will give you the physical memory address associated with it. This lets exploit writers effectively map where stuff is located in the kernel. I'm not sure if this bug allows them to actually read the kernel memory, or if it is just that the TLB will resolve the physical address.
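
Attacks in this family have to observe the effect of the speculative access somehow, and the usual way is cache timing: you time a memory access, and a fast access tells you the line was already in the CPU cache. Here is a minimal, non-exploit C sketch of just that timing primitive, entirely my own illustration: it only times accesses to the program's own buffer, never touches kernel memory, and uses x86-only GCC/Clang intrinsics (__rdtscp, _mm_clflush).

Code:
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

static uint8_t buffer[4096];

/* Time a single read of buffer[index] in TSC ticks. */
static uint64_t time_access(size_t index)
{
    unsigned int aux;
    volatile uint8_t *p = &buffer[index];
    uint64_t start, end, value;

    _mm_mfence();
    start = __rdtscp(&aux);
    value = *p;                      /* the access being timed */
    end = __rdtscp(&aux);
    _mm_mfence();

    (void)value;
    return end - start;
}

int main(void)
{
    buffer[64] = 1;                                   /* pull the line into the cache */
    printf("cached access:  %llu ticks\n",
           (unsigned long long)time_access(64));

    _mm_clflush(&buffer[64]);                         /* evict the line */
    _mm_mfence();
    printf("flushed access: %llu ticks\n",
           (unsigned long long)time_access(64));

    return 0;
}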

What is the solution, and why does it suck? Like I said before, caches like the TLB are important because memory access is slow compared to the CPU. The solution to the problem is to force a full context switch whenever a kernel system call is made. You remove the kernel from the virtual address space of the user process, and every time a system call happens you have to flush the TLB and load a new page table for the kernel. These are comparatively slow memory accesses that significantly increase the time taken by each system call. Any process that makes many system calls will be very adversely affected by this. Overall system performance will be hampered as well, however, since hardware interrupts (such as, say, receiving a network packet) will also force the context switch to the kernel. Hell, pressing a key on your keyboard causes a hardware interrupt, although I don't think any human could ever type fast enough to actually see a measurable difference in performance due to keyboard interrupts.
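
If you want to measure this cost on your own box, a crude microbenchmark of raw system call overhead is easy to write. This is only a Linux sketch timing a cheap syscall in a loop (it goes through syscall() directly because glibc caches getpid()); absolute numbers vary wildly between machines and kernels, so only compare the same machine before and after the patch.

Code:
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#define ITERATIONS 1000000

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERATIONS; i++)
        syscall(SYS_getpid);          /* one user -> kernel -> user round trip */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;

    printf("%d getpid() syscalls in %.3f s (%.0f ns per call)\n",
           ITERATIONS, elapsed, elapsed * 1e9 / ITERATIONS);
    return 0;
}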
 
So I thought I would add some information about why these changes to Windows and Linux will cause a large performance hit.
<snip great stuff>

Wanted to thank you for this and your other posts on this topic.

I was getting a bit depressed since I just spent >$2000.00 on a new home office workstation 11 months ago. Based only on the Register article, I was beginning to think my choice of an i7-6700 over an AMD CPU (which I normally would have chosen, but it lagged in performance vs. the i7) had turned against me. Seeing the benchmark for FFmpeg in the article you linked, I'm comfortable that I will likely not feel any actual performance decrease, because that benchmark most closely corresponds to my most intensive computing tasks.
 
Small sample size and all, but I have 5 computers running Windows 10 on the Insider Preview fast ring, which have all been patched to fix this. None of them have experienced anything close to a 30% reduction. In fact, I haven't noticed any reduction at all, so I ran some benchmarks on the 2 I have access to right now. On one, no reduction at all; on the other, a 1% reduction from October.

Also, according to the article, "macOS has been patched to counter the chip design blunder since version 10.13.2, according to operating system kernel expert Alex Ionescu". While I'm sure some systems will experience some sort of degradation, I imagine most users will experience no noticeable change. Windows users should find out for sure this Patch Tuesday.
 
Could I ask which benchmarks you ran? I myself have been collecting a series of pre-patch benchmarks to check against a few of my systems post-patch. My main concern is really the numbers I've been seeing from Linux database benchmarks.
 
No problem! I use Geekbench and SANDRA. Well, to be completely honest, I also use Holomark 2, but that's not really related to this topic. Anyway, I hope I'm not sounding like I'm trying to pretend I KNOW what the result of this will be. Just that in my experience, nothing happened. When Fast Ringers were patched a month or so ago, there was no outcry of a performance drop, and when macOS 10.13.2 was released, there was no massive outcry. This could of course be an incorrect conclusion to draw, but it seems logical to me...

Oh, I don't really use any Linux databases, so I know less than squat about how those will be affected!
 
Thanks, the best benchmarks for me will of course be performance metrics on my stress test app servers, running my actual applications. I agree that this will not be a big deal for most consumers, but I still worry it could be a big deal for business applications like mine which ultimately, through layer after layer of APIs, make extensive use of system calls to the kernel. Most commonly used consumer applications, including games, are not particularly system call intensive.
 
Also, according to the article, "macOS has been patched to counter the chip design blunder since version 10.13.2, according to operating system kernel expert Alex Ionescu". While I'm sure some systems will experience some sort of degradation, I imagine most users will experience no noticeable change.
It's reported that there is more Apple patching coming.

AppleInsider said:
After a public disclosure of a security flaw with nearly every Intel processor produced for the last 15 years, concern grew that a fix may take up to 30 percent of the processing power away from a system. But Apple appears to have at least partially fixed the problem with December's macOS 10.13.2, and more fixes appear to be coming in 10.13.3. ...


http://iphone.appleinsider.com/arti...fix-in-macos-for-kpti-intel-cpu-security-flaw
 
Hmmm.... I think I might actually have taken a performance hit after I updated my system after quite a long time.

I'm currently running Linux Mint Cinnamon 64-bit (kernel is 4.4.0-104-generic, whatever that means) with a fairly recent upgrade to an Intel i7-7700K (4.2 GHz x 4 cores) and 32 gigs of DDR3 RAM; I have nearly 8 terabytes of storage space, spread across five physical hard drives (one of which, my OS drive, is an SSD, the rest being merely storage), and I have two monitors running off of an NVIDIA GeForce GTX 1050 Ti 5 gig GDDR5 video card.

With the new mobo and new case, as well as several replacement drives, it set me back close to two grand all told. Funny to me is that I'm not particularly a heavy gamer, nor do I do much other than surf the net, watch movies, and download porn interesting and engaging materials.

I just love never having to shut my machine down for any reason, so I routinely have ten or so applications running simultaneously.

Anywhooo.... I did the upgrade to my software packages as well as the !-marked kernel upgrades, and after that I seemed to suffer some serious slowdowns.
 
It could just be happenstance. Given the workload you describe, along with how beefy your system is, it is unlikely you would experience very noticeable effects. But you can test it; I believe there are flags you can set to turn the new patch off.

Another thing to keep in mind is that you have certainly received other changes, both kernel and non-kernel that could affect your performance. A recent OS update absolutely killed my Android phone, not because of any kernel change, but because my cell provider decided my OS update should include a ton of absolute junk being installed and running in the background.
 
"Beefy"? lol I love that. Yeah, my box is pretty much overkill for anything rational. I'm, like, in the stratosphere maaaannnn....

But yes, you could be right on that. At first my mouse started running reeeeeaaaaalllllyyyyy ssssslllloooowwwwwlllllyyyyyy.... I'd be mousing my ass off and the cursor would just... take its sweet little time and mosey on over to where I was feverishly trying to move it. It could easily take 30 seconds to go from one monitor to the other.

Then I discovered that was happening because my mouse's light sensor was pretty dirty. So... yeah. Felt kinda dumb there.

But it's my music player program Clementine that seems to be hitching a lot more than it used to, even when I have rebooted and not run my usual load of applications.

But thank you, I will look more into maybe setting some flags and then run my own bench tests just to see.
 
Excuse my ignorance, but am I right in assuming that this flaw only applies to Intel CPUs in 64-bit systems, and that 32-bit systems are completely unaffected?

If so, the only 64-bit system I have is my HP laptop, which I suspect uses an AMD CPU, not an Intel one.
 
