assembler

TCP C++ server upgrades

A quick snippet from tonight’s diary entry:

20:30 – I’m still reading up on some stuff about non-blocking TCP server programming and so far the best examples come from http://developerweb.net/viewtopic.php?id=2933. Ideally what I want is NOT to have a shed load of threads running, which I’ll be using a thread manager for, but a way of quickly determining if a connection is valid before shutting it down.

By the looks of the examples,  a continuous loop could be checking upto 1024 connections at any time before sending them off to another thread. This method has caught my attention big time because I should be able to setup some kind of timeout very quickly to close the connection if it doesn’t respond to the initial connection request. And I can still serve 100 to 1000 devices without any problems at all.

The same single thread can be constantly checking for connections and readability. It’s the readability part that matters the most here because it will be timed very precisely. Once a connection passes the readability test by passing the initial CSPRNG test, then it can go to a thread handler. Otherwise it gets closed. All of this a single core/thread can easily handle.

I will be studying this a lot further as this looks to be the way forward for a tcp server.

And this is just from my crappy laptop at home.

Basically, the new software upgrades to the server will be 1000 times more efficient than the current Java version. Although once in the cloud, the Java version can easily serve over 500 devices because of the bandwidth increase.

The updated C++ version will increase that significantly as well as handle dodgy connections better. Once a connection has passed the CSPRNG test then the TIMEOUT increases a smidgen.

I’ve already got the CSPRNG functions running and tested and written in assembler code to reduce that bottleneck too.

Until next time… WLGfx…

x86_64 assembler first steps function parameters

Okay, so far I’ve figured that there are six registers used for the first 6 parameters used in a function call. The below c program sets up the function which also uses RAX as the return value:

#include 
#include 

int asmparams(int, int, int, int, int, int);

int main(void) {
	int ret = asmparams(1, 2, 3, 4, 5, 6);
	printf("return:%d\n", ret);

	return EXIT_SUCCESS;
}

This now expects my assembler function to take 6 arguments and a return value. Below is how those first 6 arguments are used in 64 bit Unix function calls to assembler.

		section	.text

global	asmparams:function

extern	printf

printnum:
		push    rbp                             ; stack frame? x64

		push	rsi
		push	rdx
		push	rcx
		push	r8
		push	r9

        mov     rsi,rax
        mov     rdi,pf_msg 
        xor     rax,rax     
        call    printf  

        pop		r9
        pop		r8
        pop		rcx
        pop		rdx
        pop		rsi

        pop     rbp                             ; stack frame

        ret

asmparams:
		mov		rax, rdi
		call	printnum

		mov		rax, rsi
		call	printnum

		mov		rax, rdx
		call	printnum

		mov		rax, rcx
		call 	printnum

		mov		rax, r8
		call	printnum

		mov		rax, r9
		call	printnum

		mov		rax, -1		; return value
		ret

		section	.data

pf_msg	db		"%08x",10,0

When I’ve got some time to explore how the ‘Prologue’ and ‘Epilogue’, or beginning and ending of a function, I post about that. At least now parameters can be passed to assembler functions.

Things have changed a lot since the 680×0 days. Then you could use a single push/pop instruction to push/pop multiple registers.

ARM assembler 32 bit with 64 bit variables

Although I won’t be using ARM CPU’s for my main project I started to think about my x86_64 random number generator I wrote in a previous blog post and also included in a page for adding assembler code to Eclipse Neon.

There is now 32 bit and 64 bit ARM CPU’s out there in the mobile world, but mostly they consist of 32 bit ARM’s. What I really am better off doing in that case is to use a C function and grab the disassembly  and work from that.

Other uses for assembler on mobile devices can easily be upgraded using assembler code, such as image manipulation. Faster math routines, such as calculating 3D meshes and vertex and face normals. I won’t be studying much of ARM assembler because I’m working with portable x86_64 boards.

And, when I was doing 24/7 assembler on the Atari ST and the Amiga 1200, there was 0.01% need for 64 bit values then.

The only reason at the moment for needing 64 bit variables is for the random number generation so that cryptic code can be written. I’m not using ARM CPU’s so carry on with x86_64.

I was hoping to try out some other assembler stuff tonight, especially using Qt because I can access bitmap data directly. I’ll jump onto that tomorrow night instead. I’ve got the assembler bug again.

Eclipse Oxygen and NASM

Tonight I set myself a mission of getting any IDE to build assembler files alongside my C/C++ code.

Find it here: Eclipse Oxygen and NASM

I tried NetBeans at first but the IDE isn’t good at all, actually quite useless for setting up NASM. I spent about half an hour trying this out and eventually gave up. Apparently there are plugins out there, but I wasn’t going to try them out as they were not part of the NetBeans official plugins.

So I look into Eclipse and initially I was put off because there was nothing in the official plugins. A few searches later and I got it. It was already built in to the CDT plugin for eclipse. I only had to make a minor alteration to get it working.

Now I can have a C/C++ project that will also automatically compile and link my assembler source files.

Here’s me thinking I was going to be stuck over the weekend finding this out. Took an hour. What’s next to move on to? Oh yeah, my project. he he…

SSE SIMD and ARM NEON research

I started off wanting to know the CPU cycles and possible cache misses from SSE SIMD instructions, but was kind of mind blown at what SIMD can actually do. There’s also a hell of a lot that the best compilers cannot do with C or C++ code with SIMD stuff. An example is getting the sign bits from each value in a SIMD register in EAX, which is damn handy for some math.

The SSE SIMD stuff is only the base line because there’s now SSE4 I believe. (just checked and it is. behind times I am) Being able to multiple math operations on a single register has tickled my interest for a long time. Tonight I decided to put some proper research into it. Until I got sidetracked with ARM NEON.

Anyway, besides the point and before I go onto the ARM stuff. Over the next few nights at least, I’m going to be testing out more assembler programming. This time using SIMD instructions and possibly being able to use maybe a noise algorithm. Later on in time, not over the next few nights, I will look at using this experience for 3d matrix calcs.

But… Then I looked deeper into the ARM NEON…

What I found with the NEON is that it is kind of like a hyper-threaded architecture. The cpu will run 2 NEON instructions per cycle but during CPU down things (stalls, waits, etc). I need to get deeper to undertsand that but it does sound very much like the way the hyper-threading works on the intel core processors. Still good.

Another thing I did like about the ARM assembler language is it just just so awesome. In standard assembler, you load registers, multiply one register by 2/4/8, add them and the last instruction grabs the result. In ARM, you load the basic registers,  and in one instruction you can offsets and bit manipulate to get the address and store the result. Crazy.

I’ll come back to ARM stuff later one. For now, I’ll be focusing on x86_64 stuff and all the SIMD stuff. Over the next few days I’ll run some test and hopefully post some test code. That is if I get something running.