Questions tagged [x86]

x86 is an architecture derived from the Intel 8086 CPU. The x86 family includes the 32-bit IA-32 and 64-bit x86-64 architectures, as well as legacy 16-bit architectures. Questions about the latter should be tagged [x86-16] and/or [emu8086]. Use the [x86-64] tag if your question is specific to 64-bit x86-64. For the x86 FPU, use the tag [x87]. For SSE1/2/3/4 / AVX*, also use [sse] and any of [avx] / [avx2] / [avx512] that apply.

2430 votes
10 answers
258k views

Why are elementwise additions much faster in separate loops than in a combined loop?

Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } ...
Johannes Gerer
1621 votes
11 answers
194k views

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. ...
gexicide
935 votes
11 answers
178k views

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement an identical brute-force approach for testing the Collatz conjecture. The assembly solution was assembled with: ...
rosghub
863 votes
17 answers
866k views

What's the purpose of the LEA instruction?

For me, it just seems like a funky MOV. What's its purpose and when should I use it?
user200557
370 votes
16 answers
204k views

How can I determine if a .NET assembly was built for x86 or x64?

I've got an arbitrary list of .NET assemblies. I need to programmatically check if each DLL was built for x86 (as opposed to x64 or Any CPU). Is this possible?
Judah Gabriel Himango
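
One low-level way to approach the check asked about above is to read the Machine field from the file's PE/COFF header. The following is a minimal sketch, not the accepted answer's method: `pe_machine` and `make_fake_pe` are hypothetical helper names, and the synthetic header exists only so the example runs without a real DLL. Note the limitation: an Any CPU assembly typically carries the I386 machine value as well, so telling "x86 only" apart from "Any CPU" also requires the CLR header flags, which this sketch does not read.

```python
import struct

# Machine values from the PE/COFF header that matter for this question.
MACHINE_NAMES = {0x014C: "x86 (I386)", 0x8664: "x64 (AMD64)"}


def pe_machine(data: bytes) -> str:
    """Return a label for the Machine field of a PE image held in `data`."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The COFF header follows the signature; Machine is its first field.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")


def make_fake_pe(machine: int) -> bytes:
    """Build a tiny synthetic header for demonstration; not a loadable image."""
    header = bytearray(0x80)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)  # PE signature lives at 0x40
    header[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", header, 0x44, machine)
    return bytes(header)


print(pe_machine(make_fake_pe(0x8664)))
```

For a real file, the same function could be called on the bytes read from the DLL, for example `pe_machine(open(path, "rb").read())`, where `path` is whatever assembly is being inspected.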
344 votes
4 answers
49k views

Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs

I've been racking my brain for a week trying to complete this assignment and I'm hoping someone here can lead me toward the right path. Let me start with the instructor's instructions: Your ...
Cowmoogun
317 votes
12 answers
288k views

How to compile Tensorflow with SSE4.2 and AVX instructions?

This is the message received from running a script to check if Tensorflow is working: I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally I ...
GabrielChu
309 votes
11 answers
69k views

What does multicore assembly language look like?

Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX" register, etc. With modern CPUs that have 4 ...
Paul Hollingsworth
308 votes
4 answers
126k views

How to run a program without an operating system?

How do you run a program all by itself without an operating system running? Can you create assembly programs that the computer can load and run at startup, e.g. boot the computer from a flash drive ...
user2320609
281 votes
6 answers
253k views

What is exactly the base pointer and stack pointer? To what do they point?

Using this example from Wikipedia, in which DrawSquare() calls DrawLine(), (Note that this diagram has high addresses at the bottom and low addresses at the top.) Could anyone explain to me what ...
devoured elysium
279 votes
3 answers
95k views

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to ...
BeeOnRope
264 votes
5 answers
280k views

How does the ARM architecture differ from x86? [closed]

Is the x86 Architecture specially designed to work with a keyboard while ARM expects to be mobile? What are the key differences between the two?
user1922878
253 votes
3 answers
49k views

How much of ‘What Every Programmer Should Know About Memory’ is still valid?

I am wondering how much of Ulrich Drepper's What Every Programmer Should Know About Memory from 2007 is still valid. Also I could not find a newer version than 1.0 or an errata. (Also in PDF form on ...
Framester
217 votes
10 answers
233k views

What is the difference between Trap and Interrupt?

What is the difference between Trap and Interrupt? If the terminology is different for different systems, then what do they mean on x86?
David
212 votes
5 answers
263k views

The point of test %eax %eax [duplicate]

Possible Duplicate: x86 Assembly - ‘testl’ eax against eax? I'm very very new to assembly language programming, and I'm currently trying to read the assembly language generated from a binary. I'...
pauliwago
190 votes
3 answers
35k views

Why does GCC generate such radically different assembly for nearly the same C code?

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences): fast_trunc_one, C: int fast_trunc_one(...
orlp
190 votes
4 answers
28k views

What happens when a computer program runs?

I know the general theory but I can't fit in the details. I know that a program resides in the secondary memory of a computer. Once the program begins execution it is entirely copied to the RAM. Then ...
gaijinco
184 votes
1 answer
82k views

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax
balajimc55
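
One measurable difference between the three instructions in the question above is their encoded size. The sketch below lists one common encoding of each for 32-bit or 64-bit mode and compares the byte counts; the `ENCODINGS` table and `encoding_sizes` helper are illustrative names, assemblers may emit other equivalent encodings, and code size alone says nothing definitive about cycle counts on any particular CPU.

```python
# One common machine-code encoding of each instruction (32- or 64-bit mode).
ENCODINGS = {
    "xorl %eax, %eax": bytes([0x31, 0xC0]),                    # opcode + ModRM
    "mov $0, %eax":    bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),  # opcode + imm32
    "andl $0, %eax":   bytes([0x83, 0xE0, 0x00]),              # opcode + ModRM + imm8
}


def encoding_sizes() -> dict:
    """Map each instruction to the length of its encoding in bytes."""
    return {insn: len(code) for insn, code in ENCODINGS.items()}


# Print the instructions from shortest to longest encoding.
for insn, size in sorted(encoding_sizes().items(), key=lambda kv: kv[1]):
    print(f"{insn:<16} {size} bytes")
```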
182 votes
12 answers
240k views

What is the difference between MOV and LEA?

I would like to know what the difference between these instructions is: MOV AX, [TABLE-ADDR] and LEA AX, [TABLE-ADDR]
naveen
182 votes
3 answers
116k views

How do you use gcc to generate assembly code in Intel syntax?

The gcc -S option will generate assembly code in AT&T syntax. Is there a way to generate files in Intel syntax? Or is there a way to convert between the two?
hyperlogic
173 votes
4 answers
46k views

Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

In the x86-64 Tour of Intel Manuals, I read Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register. The Intel documentation (...
Nubok
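
The behaviour quoted in the question above can be modelled in a few lines. This is only a sketch of the register-write semantics, not CPU code: `write_reg` is a hypothetical helper, the register is represented as a plain integer, and the high-byte registers (AH and friends) are deliberately left out.

```python
MASK64 = (1 << 64) - 1


def write_reg(old_value: int, value: int, width: int) -> int:
    """Return the 64-bit register contents after writing its low `width` bits."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # A 32-bit write zero-extends into the full 64-bit register.
        return value & 0xFFFFFFFF
    if width in (8, 16):
        # Low 8- and 16-bit writes leave the remaining upper bits unchanged.
        low_mask = (1 << width) - 1
        return (old_value & ~low_mask & MASK64) | (value & low_mask)
    raise ValueError("width must be 8, 16, 32, or 64")


rax = 0xFFFF_FFFF_FFFF_FFFF
after_mov_eax = write_reg(rax, 0x1234_5678, 32)  # models a write to EAX
after_mov_ax = write_reg(rax, 0x1234, 16)        # models a write to AX

print(hex(after_mov_eax))
print(hex(after_mov_ax))
```

The contrast between the two results is the point of the question: the 32-bit write discards the old upper half, while the 16-bit write merges into it.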
172 votes
5 answers
94k views

Header files for x86 SIMD intrinsics

Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.
fredoverflow
172 votes
4 answers
10k views

An expensive jump with GCC 5.4.0

I had a function which looked like this (showing only the important part): double CompareShifted(const std::vector<uint16_t>& l, const std::vector<uint16_t> &curr, int shift, int ...
Jakub Jůza
170 votes
9 answers
170k views

"No such file or directory" error when executing a binary

I was installing a binary Linux application on Ubuntu 9.10 x86_64. The app shipped with an old version of gzip (1.2.4), that was compiled for a much older kernel: $ file gzip gzip: ELF 32-bit LSB ...
Lorin Hochstein
160 votes
3 answers
192k views

What does `dword ptr` mean?

Could someone explain what this means? (Intel Syntax, x86, Windows) and dword ptr [ebp-4], 0
小太郎
158 votes
3 answers
51k views

What is the meaning of "non temporal" memory accesses in x86

This is a somewhat low-level question. In x86 assembly there are two SSE instructions: MOVDQA xmmi, m128 and MOVNTDQA xmmi, m128 The IA-32 Software Developer's Manual says that the NT in ...
Nathan Fellman
153 votes
6 answers
105k views

What is the purpose of XORing a register with itself? [duplicate]

xor eax, eax will always set eax to zero, right? So, why does MSVC++ sometimes put it in my executable's code? Is it more efficient than mov eax, 0? 012B1002 in al,dx 012B1003 push ...
devoured elysium
152 votes
5 answers
262k views

Purpose of ESI & EDI registers?

What is the actual purpose and use of the EDI & ESI registers in assembler? I know they are used for string operations for one thing. Can someone also give an example?
Tony The Lion
147 votes
7 answers
43k views

How does this milw0rm heap spraying exploit work?

I usually do not have difficulty reading JavaScript code, but for this one I can't figure out the logic. The code is from an exploit that was published 4 days ago. You can find it at milw0rm. ...
Patrick Desjardins
144 votes
5 answers
436k views

What is the function of the push / pop instructions used on registers in x86 assembly?

When reading about assembler I often come across people writing that they push a certain register of the processor and pop it again later to restore its previous state. How can you push a register? ...
Ars emble
137 votes
6 answers
17k views

Why does integer overflow on x86 with GCC cause an infinite loop?

The following code goes into an infinite loop on GCC: #include <iostream> using namespace std; int main(){ int i = 0x10000000; int c = 0; do{ c++; i += i; ...
Mysticial
135 votes
6 answers
119k views

What is the "FS"/"GS" register intended for?

So I know what the following registers and their uses are supposed to be: CS = Code Segment (used for IP) DS = Data Segment (used for MOV) ES = Destination Segment (used for MOVS, etc.) SS = Stack ...
user541686
131 votes
9 answers
178k views

What does "int 0x80" mean in assembly code?

Can someone explain what the following assembly code does? int 0x80
Josh Curren
131 votes
3 answers
44k views

CPU Privilege Rings: Why rings 1 and 2 aren't used?

A couple of questions regarding the x86 CPU privilege rings: Why aren't rings 1 and 2 used by most operating systems? Is it just to maintain code compatibility with other architectures, or is there a ...
user541686
129 votes
8 answers
93k views

`testl` eax against eax?

I am trying to understand some assembly. The assembly is as follows; I am interested in the testl line: 000319df 8b4508 movl 0x08(%ebp), %eax 000319e2 8b4004 movl 0x04(%eax), %eax ...
maxpenguin
126 votes
11 answers
133k views

Floating point vs integer calculations on modern hardware

I am doing some performance-critical work in C++, and we are currently using integer calculations for problems that are inherently floating point because "it's faster". This causes a whole ...
maxpenguin
123 votes
3 answers
197k views

Difference between JE/JNE and JZ/JNZ

In x86 assembly code, are JE and JNE exactly the same as JZ and JNZ?
Daniel Hanrahan
122 votes
9 answers
249k views

How to write hello world in assembly under Windows?

I wanted to write something basic in assembly under Windows. I'm using NASM, but I can't get anything working. How do I write and compile a hello world program without the help of C functions on ...
feiroox
122 votes
19 answers
159k views

System.BadImageFormatException: Could not load file or assembly (from installutil.exe)

I am trying to install a Windows service using InstallUtil.exe and am getting the error message System.BadImageFormatException: Could not load file or assembly '{xxx.exe}' or one of its ...
Epaga
116 votes
11 answers
428k views

How to install ia32-libs in Ubuntu 14.04 LTS (Trusty Tahr)

I installed Ubuntu 14.04 (Trusty Tahr) yesterday. Everything seems OK. But when I tried to compile some C code, I encountered the following error. The error seems to be due to the OS lacking the 32-bit ...
andycoder
113 votes
2 answers
42k views

How does x86 paging work?

This question is meant to fill the vacuum of good free information on the subject. I believe that a good answer will fit into one big SO answer or at least in a few answers. The main goal is to give ...
Ciro Santilli OurBigBook.com
112 votes
6 answers
51k views

Why is SSE scalar sqrt(x) slower than rsqrt(x) * x?

I've been profiling some of our core math on an Intel Core Duo, and while looking at various approaches to square root I've noticed something odd: using the SSE scalar operations, it is faster to take ...
Crashworks
112 votes
10 answers
40k views

Why is x86 ugly? Why is it considered inferior when compared to others? [closed]

I've been reading some SO archives and encountered statements against the x86 architecture. Why do we need different CPU architecture for server & mini/mainframe & mixed-core? says "PC ...
claws
105 votes
3 answers
130k views

Using gdb to single-step assembly code outside specified executable causes error "cannot find bounds of current function"

I'm outside gdb's target executable and I don't even have a stack that corresponds to that target. I want to single-step anyway, so that I can verify what's going on in my assembly code, because I'm ...
Paul
103 votes
8 answers
104k views

What are IN & OUT instructions in x86 used for?

I've encountered these two instructions, IN & OUT, while reading the "Understanding Linux Kernel" book. I've looked up the reference manual. 5.1.9 I/O Instructions These instructions move data ...
claws
103 votes
5 answers
59k views

What is the purpose of the EBP frame pointer register?

I'm a beginner in assembly language and have noticed that the x86 code emitted by compilers usually keeps the frame pointer around even in release/optimized mode when it could use the EBP register for ...
dsimcha
101 votes
7 answers
83k views

How do I disassemble raw 16-bit x86 machine code?

I'd like to disassemble the MBR (first 512 bytes) of a bootable x86 disk that I have. I have copied the MBR to a file using dd if=/dev/my-device of=mbr bs=512 count=1 Any suggestions for a Linux ...
sigjuice
101 votes
7 answers
53k views

Why does Intel hide internal RISC core in their processors?

Starting with the Pentium Pro (P6 microarchitecture), Intel redesigned its microprocessors and used an internal RISC core under the old CISC instructions. Since Pentium Pro all CISC instructions are divided ...
Goofy
100 votes
6 answers
30k views

Enhanced REP MOVSB for memcpy

I would like to use enhanced REP MOVSB (ERMSB) to get a high bandwidth for a custom memcpy. ERMSB was introduced with the Ivy Bridge microarchitecture. See the section "Enhanced REP MOVSB and ...
Z boson
97 votes
7 answers
44k views

Limitations of Intel Assembly Syntax Compared to AT&T [closed]

To me, Intel syntax is much easier to read. If I go traipsing through assembly forest concentrating only on Intel syntax, will I miss anything? Is there any reason I would want to switch to AT&T (...
oevna
