Questions tagged [assembly]

Assembly language questions. Please tag the processor and/or the instruction set you are using, as well as the assembler. A valid set of tags looks like this: [assembly] [x86] [gnu-assembler] or [att]. Use the [.net-assembly] tag for .NET assemblies, [cil] for .NET assembly language, and [java-bytecode-asm] for Java bytecode.

2298 votes
12 answers

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and ...
xis
  • 24.4k
1766 votes
16 answers

Is < faster than <=?

Is if (a < 901) faster than if (a <= 900)? Not exactly as in this simple example, but there are slight performance changes in complex loop code. I suppose this has something to do with generated ...
Vinícius
  • 15.5k
1621 votes
11 answers

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. ...
gexicide
  • 38.7k
935 votes
11 answers

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement an identical brute-force approach for testing the Collatz conjecture. The assembly solution was assembled with: ...
rosghub
  • 8,954
863 votes
17 answers

What's the purpose of the LEA instruction?

For me, it just seems like a funky MOV. What's its purpose and when should I use it?
user200557
  • 8,819
697 votes
4 answers

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE ...
user1059432
  • 7,538
527 votes
17 answers

How do you get assembler output from C/C++ source in GCC?

How does one do this? If I want to analyze how something is getting compiled, how would I get the emitted assembly code?
Doug T.
  • 64.4k
502 votes
40 answers

When is assembly faster than C? [closed]

One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that will be more performant than writing that code in a higher-level language, C in particular. ...
319 votes
7 answers

Why does this code execute more slowly after strength-reducing multiplications to loop-carried additions?

I was reading Agner Fog's optimization manuals, and I came across this example: double data[LEN]; void compute() { const double A = 1.1, B = 2.2, C = 3.3; int i; for(i=0; i<LEN; i++) {...
ttsiodras
  • 10.7k
315 votes
11 answers

Using GCC to produce readable assembly?

I was wondering how to use GCC on my C source file to dump a mnemonic version of the machine code so I could see what my code was being compiled into. You can do this with Java but I haven't been able ...
James
  • 3,692
314 votes
16 answers

Is it possible to "decompile" a Windows .exe? Or at least view the Assembly?

A friend of mine downloaded some malware from Facebook, and I'm curious to see what it does without infecting myself. I know that you can't really decompile an .exe, but can I at least view it in ...
swilliams
  • 48.1k
309 votes
11 answers

What does multicore assembly language look like?

Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX register", etc. With modern CPUs that have 4 ...
Paul Hollingsworth
308 votes
4 answers

How to run a program without an operating system?

How do you run a program all by itself without an operating system running? Can you create assembly programs that the computer can load and run at startup, e.g. boot the computer from a flash drive ...
user2320609
  • 2,079
286 votes
5 answers

Why does Java switch on contiguous ints appear to run faster with added cases?

I am working on some Java code which needs to be highly optimized as it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying ...
Andrew Bissell
282 votes
12 answers

Is 'switch' faster than 'if'?

Is a switch statement actually faster than an if statement? I ran the code below on Visual Studio 2010's x64 C++ compiler with the /Ox flag: #include <stdlib.h> #include <stdio.h> #include ...
user541686
  • 205k
281 votes
6 answers

What is exactly the base pointer and stack pointer? To what do they point?

Using this example coming from Wikipedia, in which DrawSquare() calls DrawLine(), (Note that this diagram has high addresses at the bottom and low addresses at the top.) Could anyone explain to me what ...
devoured elysium
279 votes
10 answers

Assembly code vs Machine code vs Object code?

What is the difference between object code, machine code and assembly code? Can you give a visual example of their difference?
mmcdole
  • 91.5k
279 votes
3 answers

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to ...
BeeOnRope
  • 60.7k
268 votes
5 answers

Why does GCC use multiplication by a strange number in implementing integer division?

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C: File division.c #include <stdlib.h> #include <stdio.h> int ...
qiubit
  • 4,708
233 votes
4 answers

Why would introducing useless MOV instructions speed up a tight loop in x86_64 assembly?

Background: While optimizing some Pascal code with embedded assembly language, I noticed an unnecessary MOV instruction, and removed it. To my surprise, removing the un-necessary instruction caused ...
tangentstorm
  • 7,193
228 votes
8 answers

Show current assembly instruction in GDB

I'm doing some assembly-level debugging in GDB. Is there a way to get GDB to show me the current assembly instruction in the same way that it shows the current source line? The default output after ...
JSBձոգչ
  • 40.7k
227 votes
25 answers

Protecting executable from reverse engineering?

I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior myself in my code; however the current protocol I've been ...
graphitemaster
216 votes
32 answers

Why aren't programs written in Assembly more often? [closed]

It seems to be a mainstream opinion that assembly programming takes longer and is more difficult to program in than a higher level language such as C. Therefore it seems to be recommended or assumed ...
212 votes
5 answers

The point of test %eax %eax [duplicate]

Possible Duplicate: x86 Assembly - ‘testl’ eax against eax? I'm very very new to assembly language programming, and I'm currently trying to read the assembly language generated from a binary. I'...
pauliwago
  • 6,373
201 votes
21 answers

Is inline assembly language slower than native C++ code?

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that adds two arrays of size 2000, 100000 times. Here's the code: #define TIMES 100000 void calcuC(...
user957121
  • 2,946
196 votes
4 answers

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

The following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux:
claws
  • 52.3k
190 votes
3 answers

Why does GCC generate such radically different assembly for nearly the same C code?

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences): fast_trunc_one, C: int fast_trunc_one(...
orlp
  • 113k
185 votes
13 answers

Can num++ be atomic for 'int num'?

In general, for int num, num++ (or ++num), as a read-modify-write operation, is not atomic. But I often see compilers, for example GCC, generate the following code for it (try here): void f() { int ...
Leo Heinsaar
  • 3,907
184 votes
1 answer

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax
balajimc55
  • 2,202
182 votes
12 answers

What is the difference between MOV and LEA?

I would like to know what the difference between these instructions is: MOV AX, [TABLE-ADDR] and LEA AX, [TABLE-ADDR]
naveen
  • 53.5k
182 votes
3 answers

How do you use gcc to generate assembly code in Intel syntax?

The gcc -S option will generate assembly code in AT&T syntax. Is there a way to generate files in Intel syntax? Or is there a way to convert between the two?
hyperlogic
  • 7,555
179 votes
1 answer

Why do ARM chips have an instruction with Javascript in the name (FJCVTZS)?

FJCVTZS is "Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero". It is supported in Arm v8.3-A chips and later. Which is odd, because you don't expect to see ...
Tim Smith
  • 1,714
173 votes
4 answers

Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

In the x86-64 Tour of Intel Manuals, I read: Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register. The Intel documentation (...
Nubok
  • 3,512
160 votes
3 answers

What does `dword ptr` mean?

Could someone explain what this means? (Intel Syntax, x86, Windows) and dword ptr [ebp-4], 0
小太郎
  • 5,520
158 votes
3 answers

What is the meaning of "non temporal" memory accesses in x86

This is a somewhat low-level question. In x86 assembly there are two SSE instructions: MOVDQA xmmi, m128 and MOVNTDQA xmmi, m128 The IA-32 Software Developer's Manual says that the NT in ...
Nathan Fellman
157 votes
14 answers

How can I see the assembly code for a C++ program?

How can I see the assembly code for a C++ program? What are the popular tools to do this?
Geek
  • 23.1k
154 votes
13 answers

How are everyday machines programmed? [closed]

How are everyday machines (not so much computers and mobile devices as appliances, digital watches, etc) programmed? What kind of code goes into the programming of a Coca-Cola vending machine? How ...
153 votes
6 answers

What is the purpose of XORing a register with itself? [duplicate]

xor eax, eax will always set eax to zero, right? So, why does MSVC++ sometimes put it in my executable's code? Is it more efficient than mov eax, 0? 012B1002 in al,dx 012B1003 push ...
devoured elysium
152 votes
5 answers

Purpose of ESI & EDI registers?

What is the actual purpose and use of the EDI & ESI registers in assembler? I know they are used for string operations for one thing. Can someone also give an example?
Tony The Lion
147 votes
7 answers

How does this milw0rm heap spraying exploit work?

I usually have no difficulty reading JavaScript code, but for this one I can't figure out the logic. The code is from an exploit that was published 4 days ago. You can find it at milw0rm. ...
Patrick Desjardins
144 votes
5 answers

What is the function of the push / pop instructions used on registers in x86 assembly?

When reading about assembler I often come across people writing that they push a certain register of the processor and pop it again later to restore its previous state. How can you push a register? ...
Ars emble
  • 1,459
141 votes
2 answers

What is the purpose of the RBP register in x86_64 assembler?

So I'm trying to learn a little bit of assembly, because I need it for Computer Architecture class. I wrote a few programs, like printing the Fibonacci sequence. I recognized that whenever I write a ...
140 votes
3 answers

How can one see content of stack with GDB?

I am new to GDB, so I have some questions: How can I look at the contents of the stack? Example: to see the contents of a register, I type info registers. For the stack, what should it be? How can I see the ...
140 votes
11 answers

How to view the assembly behind the code using Visual C++?

I was reading another question pertaining to the efficiency of two lines of code, and the OP said that he looked at the assembly behind the code and both lines were identical in assembly. Digression ...
137 votes
4 answers

Why does Windows64 use a different calling convention from all other OSes on x86-64?

AMD has an ABI specification that describes the calling convention to use on x86-64. All OSes follow it, except for Windows, which has its own x86-64 calling convention. Why? Does anyone know the ...
JanKanis
  • 6,354
136 votes
4 answers

What are CFI directives in Gnu Assembler (GAS) used for?

There seems to be a .CFI directive after every line, and there is a wide variety of these, e.g. .cfi_startproc, .cfi_endproc, etc. More here. .file "temp.c" .text .globl main ...
claws
  • 52.3k
135 votes
6 answers

What is the "FS"/"GS" register intended for?

So I know what the following registers and their uses are supposed to be: CS = Code Segment (used for IP) DS = Data Segment (used for MOV) ES = Destination Segment (used for MOVS, etc.) SS = Stack ...
user541686
  • 205k
133 votes
10 answers

How to disassemble a binary executable in Linux to get the assembly code?

I was told to use a disassembler. Does gcc have anything built in? What is the easiest way to do this?
Syntax_Error
  • 5,974
133 votes
3 answers

Possible GCC bug when returning struct from a function

I believe I found a bug in GCC while implementing O'Neill's PCG PRNG. (Initial code on Godbolt's Compiler Explorer) After multiplying oldstate by MULTIPLIER, (result stored in rdi), GCC doesn't add ...
vitorhnn
  • 1,043
131 votes
9 answers

What does "int 0x80" mean in assembly code?

Can someone explain what the following assembly code does? int 0x80
Josh Curren
  • 10.2k
