
I’m hoping to record a video showing where I’m up to with my trip down memory lane.
Elsewhere there are things that we all miss, yet it takes just one to notice...

Okay, so far I’ve figured out that the first six parameters of a function call are passed in six registers, and that RAX holds the return value. The C program below declares the function and calls it:
#include <stdio.h>
#include <stdlib.h>
int asmparams(int, int, int, int, int, int);
int main(void) {
int ret = asmparams(1, 2, 3, 4, 5, 6);
printf("return:%d\n", ret);
return EXIT_SUCCESS;
}
The C side now expects my assembler function to take six arguments and return a value. In 64-bit Unix function calls the first six integer arguments arrive in RDI, RSI, RDX, RCX, R8 and R9, in that order, and the assembler below reads them from there.
section .text
global asmparams:function
extern printf
printnum:
push rbp ; save rbp; the six pushes (48 bytes) also keep rsp 16-byte aligned at the printf call
push rsi
push rdx
push rcx
push r8
push r9
mov rsi,rax
mov rdi,pf_msg
xor rax,rax
call printf
pop r9
pop r8
pop rcx
pop rdx
pop rsi
pop rbp ; stack frame
ret
asmparams:
mov rax, rdi
call printnum
mov rax, rsi
call printnum
mov rax, rdx
call printnum
mov rax, rcx
call printnum
mov rax, r8
call printnum
mov rax, r9
call printnum
mov rax, -1 ; return value
ret
section .data
pf_msg db "%08x",10,0
When I’ve got some time to explore how the ‘Prologue’ and ‘Epilogue’ (the beginning and ending of a function) work, I’ll post about that. At least now parameters can be passed to assembler functions.
Things have changed a lot since the 680x0 days. Back then you could use a single instruction (MOVEM) to push or pop multiple registers.
Tonight I set myself a mission of getting any IDE to build assembler files alongside my C/C++ code.
Find it here: Eclipse Oxygen and NASM
I tried NetBeans at first but the IDE isn’t good at all, actually quite useless for setting up NASM. I spent about half an hour trying this out and eventually gave up. Apparently there are plugins out there, but I wasn’t going to try them out as they were not part of the NetBeans official plugins.
So I looked into Eclipse, and initially I was put off because there was nothing in the official plugins. A few searches later I had it: NASM support is already built into the CDT plugin for Eclipse. I only had to make a minor alteration to get it working.
Now I can have a C/C++ project that will also automatically compile and link my assembler source files.
Here’s me thinking I was going to be stuck over the weekend finding this out. Took an hour. What’s next to move on to? Oh yeah, my project. he he…
I started off wanting to know the CPU cycles and possible cache misses of SSE SIMD instructions, but was kind of mind-blown by what SIMD can actually do. There’s also a hell of a lot of SIMD stuff that even the best compilers cannot produce from plain C or C++ code. An example is getting the sign bit of each value in a SIMD register into EAX, which is damn handy for some math.
SSE is only the baseline, because there’s now SSE4, I believe (just checked and there is; behind the times I am). Being able to do multiple math operations on a single register has tickled my interest for a long time. Tonight I decided to put some proper research into it. Until I got sidetracked by ARM NEON.
Anyway, that’s beside the point, and before I go on to the ARM stuff: over the next few nights at least, I’m going to be testing out more assembler programming, this time using SIMD instructions and possibly implementing a noise algorithm. Later on, not over the next few nights, I will look at using this experience for 3D matrix calcs.
But… Then I looked deeper into the ARM NEON…
What I found with NEON is that it is kind of like a hyper-threaded architecture. The CPU will run 2 NEON instructions per cycle, but during CPU downtime (stalls, waits, etc.). I need to dig deeper to understand that, but it does sound very much like the way hyper-threading works on the Intel Core processors. Still good.
Another thing I like about ARM assembler is that it is just so awesome. In standard assembler, you load registers, multiply one register by 2/4/8, add them, and the last instruction grabs the result. In ARM, you load the base registers, and in one instruction you can apply offsets and bit manipulation to get the address and store the result. Crazy.
I’ll come back to the ARM stuff later on. For now, I’ll be focusing on x86_64 and all the SIMD stuff. Over the next few days I’ll run some tests and hopefully post some test code. That is, if I get something running.
So, I had a perfectly working PRNG in Java like this:
public class WLPRNG {
long seed;
public WLPRNG(long seed) { this.seed = seed; }
public int nextInt() {
long result = seed + 0x123defca;
result = Long.rotateLeft(result, 19);
result += 0xbead6789;
result *= 0x1234567c;
int temp = (int)result;
result ^= 0x5ecdab73;
result = Long.rotateLeft(result, 48);
if (temp % 4 == 0) result *= 0x87650027;
result += 13;
seed = result;
return (int)result;
}
public byte nextByte() {
return (byte)nextInt();
}
}
And I thought I’d test out the Assembler version like this:
; random number generator to be used in crypto transmission
; of sensitive data over the internet
; WLGfx 2017-Nov-19
section .text
global main
extern printf
srand: mov [seed],rax ; set random seed
ret
arand: push rbx ; rbx is callee-saved in the 64-bit Unix ABI, so preserve it
mov rax,[seed] ; get seed
mov rbx,qword 0x023defca321acfed
add rax,rbx ; add 64 bit value
rol rax,19 ; rotate bits
mov rbx,qword 0xbead6789
add rax,rbx ; another add
imul rax,qword 0x1234567c ; a multiply this time
mov rbx,rax ; copy into rbx
xor rax,qword 0x5ecdab73 ; flip some bits
rol rax,48 ; rotate bits again
mov rcx,rax ; copy to rcx
and rax,0x3 ; mask and test with 0
jnz .notz ; 25% chance of other ops
mov rax,rbx
add rcx,rax
mov rbx,qword 0x87650027
imul rax,rbx
jmp .cont
.notz mov rax,rcx ; back into rax
.cont mov [seed],rax ; store into seed
and rax,0xff ; return byte value only
pop rbx ; restore rbx
ret
main: mov rax,9 ; set seed
call srand
mov dword[lc],10 ; set loop counter
.loop call arand ; get random byte
push rbp ; stack frame
mov rsi,rax ; random number
mov rdi,pf_msg ; format string
xor rax,rax ; 0
call printf ; call printf
pop rbp ; stack frame
sub dword[lc],1 ; dec loop counter
jnz .loop
ret
section .data
seed dq 0,0,0,0 ; random seed value 64 bit
lc dd 10 ; loop counter
pf_msg db "Number: 0x%02x",10,0
Using the build script:
#!/bin/bash
nasm -f elf64 random.asm
gcc -o random random.o
Gives a sample output of:
~/dev/asm/tests $ ./buildrand.sh
~/dev/asm/tests $ ./random
Number: 0xe0
Number: 0x5b
Number: 0xca
Number: 0x7c
Number: 0xfc
Number: 0x2d
Number: 0x79
Number: 0xa5
Number: 0x62
Number: 0x7f
All I need to do now is to be able to link directly to C and C++ code. I’m currently reading up on threading in assembler, but it looks like the standard pthreads are just the same really.
There’s lots of potential for using assembler.
So, I wanted to start playing about with assembler again. Mainly so I could use it for data encryption over the internet. Here’s a simple sample of printing 64 bit numbers as hex.
section .text
global main
extern printf
; use printf to print 64 bit hex string
_test: push rbp
mov rsi,0x1234567890abcdef
mov rdi,pf_msg
xor rax,rax
call printf
pop rbp
ret
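main: push rbx ; rbx is callee-saved; the push also 16-byte aligns the stack for printf in _test
call _test
mov edx,len ; number of bytes to write
mov ecx,msg ; buffer address
mov ebx,1 ; file descriptor 1 = stdout
mov eax,4 ; sys_write in the 32-bit int 0x80 interface
int 128
;mov eax,1
;int 128
pop rbx
xor rax,rax
ret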
section .data
msg db "Hello world!",10
len equ $ - msg
; some testing stuff
pf_msg db "Register = %016llx", 10, 0
I set up a simple script to build the executable.
#!/bin/bash
nasm -f elf64 test.asm
gcc -o test test.o
And the output is just…
Register = 1234567890abcdef
Hello world!