Wednesday, 16 April 2014

The Final Countdown

So it's Wednesday the 16th of April, and for some reason we still have snow. Quite an odd day, but one on which I will be posting what is likely my last markable blog post for SPO600. As much as I'd like to say the package is going to be built, that is unlikely.

I've updated my previous rough draft code a bit, once I found my mistake with the random register values. For example, the atomic add now looks like:

           static __inline__ void atomic_add(int i, atomic_t *v)
           {
                     __asm__ __volatile__(
                                   SMP_LOCK "add  %1, %0, %0"
                                   :"=m" (v->counter)
                                   :"ir" (i), "m" (v->counter));    /* this needs fixing/porting to proper aarch64 */
           }
    
No need to provide actual registers since their own code is just using whichever ones are available to them.

I've also managed to complete the port of subtract, subtract and test, increment, increment and test, decrement, and decrement and test, which I've maybe inconveniently added below. It wasn't too much to change really. I just hope "=m" and "=qm" are still valid in aarch64 assembler, as I was not able to find replacements for them. The C-style comments are for blog readability and are not actually in the code.

//subtract
static __inline__ void atomic_sub(int i, atomic_t *v)
{
    __asm__ __volatile__(
            SMP_LOCK "sub %2, %1"
            : "=m" (v->counter)
            :"ir" (i), "m" (v->counter));
}   
        
//Subtract and test
static __inline__ int atomic_sub_and_test(int i, atomic_t *v)
{
    unsigned char c;

    __asm__ __volatile__(
            SMP_LOCK "sub %2, %0; beq %1"
            :"=m" (v->counter), "=qm"(c)
            : "ir" (i), "m" (v->counter) : "memory");
    return c;
}

//Increment
static __inline__ void atomic_inc(atomic_t *v)
{
    __asm__ __volatile__(
           SMP_LOCK "add %0"
           :"=m" (v->counter)
           :"m" (v->counter));
}

//decrement
static __inline__ void atomic_dec(atomic_t *v)
{
            __asm__ __volatile__(
                    SMP_LOCK "sub %0"
                       :"=m" (v->counter)
                       :"m" (v->counter));
}

//Decrement and test
static __inline__ int atomic_dec_and_test(atomic_t *v)
{
        unsigned char c;

            __asm__ __volatile__(
                      SMP_LOCK "sub %0; beq %1"
                     :"=m" (v->counter), "=qm" (c)
                     :"m" (v->counter) : "memory");
        return c != 0;
}

//increment and test
static __inline__ int atomic_inc_and_test(atomic_t *v)
{
        unsigned char c;

            __asm__ __volatile__(
                     SMP_LOCK "add %0; beq %1"
                         :"=m" (v->counter), "=qm" (c)
                       :"m" (v->counter) : "memory");
        return c != 0;
}

//Check to see if addition results in negative

static __inline__ int atomic_add_negative(int i, atomic_t *v)
{
        unsigned char c;

        __asm__ __volatile__(
               SMP_LOCK "add %2,%0; bne %1"
                       :"=m" (v->counter), "=qm" (c)
                       :"ir" (i), "m" (v->counter) : "memory");
        return c;
}
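
For comparison, here is a minimal sketch of the same operations written with the GCC/Clang `__atomic` builtins instead of inline assembly. This is not the SooperLooper code, just an illustration that assumes GCC or Clang and reuses the atomic_t layout from the draft. The builtins let the compiler pick the instructions for each architecture, which sidesteps a few things the draft above runs into: aarch64 has no "lock" prefix, its arithmetic instructions work on registers rather than memory operands, and the "q" constraint is an x86 machine constraint.

```c
/* Sketch only: portable versions of the draft operations (GCC/Clang assumed). */
typedef struct { volatile int counter; } atomic_t;

/* Atomically add i to the counter. */
static inline void atomic_add(int i, atomic_t *v)
{
    __atomic_fetch_add(&v->counter, i, __ATOMIC_SEQ_CST);
}

/* Atomically subtract i from the counter. */
static inline void atomic_sub(int i, atomic_t *v)
{
    __atomic_fetch_sub(&v->counter, i, __ATOMIC_SEQ_CST);
}

/* Subtract i; return 1 if the new value is zero, otherwise 0. */
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
    return __atomic_sub_fetch(&v->counter, i, __ATOMIC_SEQ_CST) == 0;
}

/* Increment; return 1 if the new value is zero, otherwise 0. */
static inline int atomic_inc_and_test(atomic_t *v)
{
    return __atomic_add_fetch(&v->counter, 1, __ATOMIC_SEQ_CST) == 0;
}

/* Decrement; return 1 if the new value is zero, otherwise 0. */
static inline int atomic_dec_and_test(atomic_t *v)
{
    return __atomic_sub_fetch(&v->counter, 1, __ATOMIC_SEQ_CST) == 0;
}

/* Add i; return 1 if the new value is negative, otherwise 0. */
static inline int atomic_add_negative(int i, atomic_t *v)
{
    return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST) < 0;
}
```

The "and test" variants use the builtins that return the new value, so the zero/negative check happens in plain C rather than with a flag-reading instruction.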

What remains of atomic.h is a mask clear and set, so I needed to track down the proper aarch64 versions of the logical 'andl' and 'orl'. Thanks to the ARM PDF file we grabbed ages ago for class, those have made themselves evident. I am not sure they even require a port, since the code comments say they are x86 specific. Better safe than sorry, I say (and say only for this particular instance).

It's also not immediately clear whether exclusive OR rather than inclusive OR needs to be used, which is another issue, but I would think the comments would have mentioned it if it wasn't inclusive. Inclusive it is (and I'm firing off an email to the devs just to be sure).

//Mask code:
#define atomic_clear_mask(mask, addr) \
__asm__ __volatile__( \
SMP_LOCK "AND %0,%1" \
: : "r" (~(mask)),"m" (*addr) : "memory")

#define atomic_set_mask(mask, addr) \
__asm__ __volatile__(SMP_LOCK "ORR %0,%1" \
: : "r" (mask),"m" (*addr) : "memory")
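
On the inclusive/exclusive question: x86 'orl' is an inclusive OR (exclusive would be 'xorl'), so 'ORR' is the matching aarch64 mnemonic and 'AND' matches 'andl'. As a cross-check, here is a small sketch of the two macros using the GCC/Clang `__atomic` builtins rather than assembly. It assumes addr points at an unsigned int, which is my assumption and not something taken from the SooperLooper source.

```c
/* Sketch only: mask helpers without inline asm (GCC/Clang assumed).
 * addr is assumed to point at an unsigned int. */

/* Atomically clear every bit of *addr that is set in mask. */
#define atomic_clear_mask(mask, addr) \
    ((void)__atomic_fetch_and((addr), ~(mask), __ATOMIC_SEQ_CST))

/* Atomically set every bit of *addr that is set in mask (inclusive OR). */
#define atomic_set_mask(mask, addr) \
    ((void)__atomic_fetch_or((addr), (mask), __ATOMIC_SEQ_CST))
```

For example, clearing mask 0x05 from 0x0F leaves 0x0A, and then setting mask 0x30 gives 0x3A.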


So for the purposes of SPO600, I do believe that's about all she wrote.
I've ported the atomic.h code for aarch64. The other asm is in dependency files, which I obviously have no control over, but all of those have aarch64/noarch versions for arm64, either out in the wild or just not in the yum repositories (I'm looking at you, fftw3). Then there's that odd "you're missing these tools" problem when I try to run ./autoregen.sh.

I will likely still continue to work on this package over the summer with the community on my own time, fix any issues with my code, and perhaps even submit it. Sadly, that can't be taken into account for marking purposes, but moral victories have value too.

For For The Win 3 or how I got fftw3 installed on qemu

So, as I mentioned in an early blog post, fftw3 is required for Rubberband to operate and install smoothly on Linux, but v3 does not exist on the yum repository. What is a person to do!?

Well, I went ahead and grabbed the aarch64 compatible rpm source from rpmfind (as well as the fftw3-libs long and single sources), and ftp'd the files into my directory on Ireland in both x86 and arm64.

So while I cannot definitively say "yes" to fftw3 working on arm64 at the moment (all of the files installed properly at the very least), all the issues with x86 are out of the way, and the one hanging thread of a dependency on arm64 is but a tar unpack/install away from also being as such. Again, I went to work porting code, since that seemed of more pressing interest to the sake of this blog/course.

Of course, running ./autoregen.sh still gives me this:
----------------------------------------------------------------------
Checking basic compilation tools ...

    pkg-config: found.
    autoconf: found.
    aclocal: found.
    automake: found.
    libtool: found.
    gettext: found.
You do not have autopoint correctly installed. You cannot build SooperLooper without this tool.

No matter how many times I've gone back and made sure those particular files/packages are installed on arm64.

The code is more important...

Saturday, 12 April 2014

Code Snippet

Since I won't have much time after today, before Wednesday to get a lot of coding in (exams come first), I thought I'd at least post the bits of aarch64 assembler code that I've managed to template out on my machine.

It's not too much, and I think I might need to edit a line or two of it so far, but it's coming along. I guess this is like my rough draft:

#if defined(__aarch64__) || defined(__arm64__)

#ifndef __ARCH_AARCH64_ATOMIC__
#define __ARCH_AARCH64_ATOMIC__

#ifdef CONFIG_SMP
#define SMP_LOCK "lock ; "
#else
#define SMP_LOCK ""
#endif

typedef struct {volatile int counter;} atomic_t;

#define ATOMIC_INIT(i) { (i) }

#define atomic_read(v)  ((v)->counter)

#define atomic_set(v,i)  (((v)->counter) = (i))

static __inline__ void atomic_add(int i, atomic_t *v)
{
    __asm__ __volatile__(
           SMP_LOCK "add  x1, x0, x0"
            :"=m" (v->counter)
            :"ir" (i), "m" (v->counter));    /* this needs fixing/porting to proper aarch64 */
}   

static __inline__ void atomic_sub(int i, atomic_t *v)
{
    __asm__ __volatile__(
            SMP_LOCK "sub x2, x1"
            : "=m" (v->counter)
            :"ir" (i), "m" (v->counter));
}

Friday, 11 April 2014

Looking before I leap

In the interest of getting anything done, as communication between myself and the devs has slowed, I'm going to attempt something outrageous, which is this: I will at least begin transcribing the x86/i386 assembly code into an aarch64 version, without implementing it directly in the program itself, as that is still likely a bit far off in terms of viability.

First atomic.h, and then down the rabbit hole I go for any other assembly relating to it (of which there is some).

More on this to come, with samples of code or the entirety of what I write.

Sunday, 6 April 2014

March Roundup

So it seems that a dependency issue with sooperlooper may prevent it at the moment from being build-able on aarch64. Dammit fftw3, why do you and Rubberband have to have such a close relationship? It doesn't appear to exist in the yum repository, so I'll have to scrounge around for it somewhere else.

Anyways, soldiering on in spite of that issue. Still working out bugs in the x86 build, the fixes for which the upstream team have been adding back into their own code repository. I guess even if I end up doing rather poorly in a course designed to produce working aarch64 code, I may end up helping get sooperlooper updated from its previous state to one that is slightly more up to date. Still hoping to jump into the atomic.h (which, oddly enough, is where mediatomb led me for a month+). This time there's no arm64 code and, from the look of it, no fallbacks either. Should be fun.

With any luck, Chris will allow me to have access to australia/ireland after the class ends so I can continue to work on the package, as I certainly wish to. Maybe even work on others if possible in my spare time (Who am I, and what have I done with the other me?)

Sooper [dooper] looper! draft

So I've picked SooperLooper to work with, mainly because a) I got a response from a member of the community (the main developer, I believe), and b) said developer (Jesse) has been quite helpful in solving issues with building the x86 version. In this case, I had to rewrite a couple of lines of code, both relating to the wxWidgets dependency package. Code examples below.

This particular error:
gui_app.cpp:308:18: error: invalid conversion from ‘const char*’ to ‘wxChar {aka wchar_t}’ [-fpermissive]
if (_host == "127.0.0.1" && _never_spawn) {

Needed to be altered ever so slightly to:

if (_host == wxT("127.0.0.1") && _never_spawn) {

Then:
gui_app.cpp: In member function ‘virtual bool SooperLooperGui::GuiApp::OnInit()’:
gui_app.cpp:250:38: error: ‘SetAppDisplayName’ was not declared in this scope
SetAppDisplayName(wxT("SooperLooper"));
Needed to be changed to:
#if wxCHECK_VERSION(2,9,0)
SetAppDisplayName(wxT("SooperLooper"));
#endif
simply because wxWidgets has new features that old versions do not support.

I went straight from contacting Jesse to trying to build, without a thorough look at the code, so it may be a disaster (again), but I'm going to find out quicker if it builds on arm64 now and jump to libmad if that is the case.

Friday, 4 April 2014

A humorous aside

So we've seen that systemd spits out errors a lot, in our case, with the Foundation Model.
Well, it seems the spat that Chris mentioned today between kernel devs and systemd devs has taken its next obvious step.

A systemd developer suspended by Tux's daddy himself, Linus Torvalds.

Looks like it's put up or shut up time.

Just thought some might find this funny.

Friday, 28 March 2014

A view to code... Hopefully

So, it would seem the libmad community is all but dead, the last release of the package itself being a decade ago. It seems to have plenty of assembly to work with, though that's from at best a cursory glance. I haven't tried to build it on x86 or arm64 yet. That will be today's task.

Sooperlooper has a more active community, in fact the dev got back to me rather quickly, but the news was  that "SooperLooper doesn't use any x86 specific code so it won't be the holdup. The 3rd party dependencies should be OK as well, as far as I know!"

I wasn't able to dig into the code too far, so I don't know if that means what it meant for mediatomb (aka this will build already), or if that means it should be pretty easy to work over and run on aarch64 with a few tweaks. Again, today. They both have the same amount of code in them, more or less, so it's going to be whichever I can dig into quickest, and since sooperlooper is still active, that may help a lot.

This is going to be an emergency landing, with a real bumpy ride getting there.

Wednesday, 26 March 2014

Swing and a miss - strikes left

First attempt at a different package from the list: I got a hold of someone from the community, and the files that have the ASM aren't compiled, so there's no need to change the asm or port it to aarch64.
Well, so much for that. Frysk is way too much to tackle in the amount of time left. Libmad and/or sooperlooper, however, look more promising: few files, not a lot of asm to switch over. I just need to hear back from either community, as sooperlooper doesn't build on x86 (missing headers), which will make aarch64 nearly impossible, and libmad seems to have died as a thing nearly a decade ago.

Still it's mostly atomics in Sooperlooper and some performance in libmad. Not too hard to at least port.
Still trekking forward. Much thanks to Chris for floating the goal posts on this one yesterday morning.

Tuesday, 25 March 2014

There are worse things than death..

And it turns out, having your package compile due to C-fall backs after you finally manage to get your aarch64 environment built is one of those things. Now I scramble to get another package from the list, and this time I'm contacting the community first, if only to see if there's anything I can conceivably work with in the short amount of time I have left.

Frysk, SooperLooper, and Ardour are all packages still free on the list, and I'm looking into them as quickly as I can. Joining mailing lists and trying test builds on Ireland will be the quickest methods, after doing a short prelim grep for asm in the code.

Ugh... Not a good day.

Friday, 21 March 2014

Death of a Laptop

So Chris, taking some class time (this class must have been scheduled as a 'Intro To', only way to explain some of it) has solved at least a few of my issues, and a few from around the class.

It has become all too clear to me, Chris, and everyone else in the room that this laptop is not capable of the heavy lifting. So I now have access to australia and can do aarch64 testing/work/etc on there, and can leave both Chris Markieta and Chris Tyler alone (so long as the Foundation model is installed on there already). I don't want to sftp a 13-16GB image file onto it from anywhere unless it's over a crossover cable and I'm sitting at the machine itself.

So with that, today I will do my damndest to build mediatomb on arm64 and test the crap out of it. Well, in so far as I can get the dependencies installed. Oi.

I even got this hunk of junk to display on the monitors in class, albeit I had to actually look at the monitors on the wall to see what I was doing because the screen is too small to display to both properly. Oh, and we found an ML language in one of the packages. To that student, I say good luck.

Tuesday, 18 March 2014

And now something completely unemulated

So part of this week's blog post is meant to deal with installing and running a different ARM64 emulator and testing the time and performance of our packages (at a point farther along than I am, to be honest). And while 13GB isn't something I normally throw around for just any file, for the Foundation emulator I'll make an exception.

Everything went swimmingly, until of course, like all things in life, it no longer did. In this case the emulator ended up not running.

As I've never used this particular piece of software before, I have no idea why those devices aren't being found and the Foundation binary isn't being executed. I think I'll be firing off an email to Chris about this in the meantime, unless someone, anyone out there on the World Wide Web can point to my stupidity and say "that's where you went wrong", which I would love at this point.

Sunday, 9 March 2014

Assembler in aarch64

So it seems like the only actual code required to change is rather small. Thanks to Nick for giving me a hint in the right direction there. Would likely have rewritten the entire header file had he not. Forest for the trees. Leave the output/input/clobbering alone.

So yeah, the first bit is: __asm__ __volatile__ ("incl %0". In essence, increment "at->x" by one, discard the value currently in memory (the "m" operand), and replace it with the incremented value. The "cc" clobber just shows that the condition codes are going to be changed by the end of the function.
Increment register 0
In aarch64, this should simply be an 'ADD %0' or possibly 'inc %0'

The second line, as shown here:
decrement register 0, set byte e to one, otherwise zero
This one is a little more complex, but not too much really. It should be 'SUB %0,' since it's decrementing the at->x value, but I'm still looking for an aarch64 variation on 'sete %1'. Shouldn't take too long really.
Inline assembly is a pain in the butt to find specific answers for sometimes.
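
For what it's worth, here is a rough sketch of how the increment and the decrement-and-test could look on aarch64. It is not the mediatomb code: the mt_atomic_t layout is assumed from the at->x usage above, and the pattern is the usual load-exclusive/store-exclusive retry loop, since aarch64 cannot increment a memory operand in one instruction the way 'incl' does. The closest aarch64 relative of 'sete' is 'cset' (set a register to 1 if a condition holds), but in this sketch the zero test is simply done in C on the value that was stored. Other compilers and architectures fall back to the GCC/Clang `__atomic` builtins.

```c
/* Sketch only. mt_atomic_t is assumed to wrap a single int named x. */
typedef struct { volatile int x; } mt_atomic_t;

static inline void atomic_inc(mt_atomic_t *at)
{
#if defined(__aarch64__)
    int value;            /* scratch: loaded and incremented value */
    unsigned int failed;  /* non-zero if the exclusive store failed */
    __asm__ __volatile__(
        "1: ldaxr %w0, %2\n"       /* load-acquire exclusive        */
        "   add   %w0, %w0, #1\n"  /* increment in a register       */
        "   stlxr %w1, %w0, %2\n"  /* store-release exclusive       */
        "   cbnz  %w1, 1b\n"       /* retry if another CPU got in   */
        : "=&r" (value), "=&r" (failed), "+Q" (at->x)
        :
        : "memory");
#else
    __atomic_fetch_add(&at->x, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Decrement; return 1 if the new value is zero, otherwise 0. */
static inline int atomic_dec_and_test(mt_atomic_t *at)
{
#if defined(__aarch64__)
    int value;
    unsigned int failed;
    __asm__ __volatile__(
        "1: ldaxr %w0, %2\n"
        "   sub   %w0, %w0, #1\n"
        "   stlxr %w1, %w0, %2\n"
        "   cbnz  %w1, 1b\n"
        : "=&r" (value), "=&r" (failed), "+Q" (at->x)
        :
        : "memory");
    return value == 0;
#else
    return __atomic_sub_fetch(&at->x, 1, __ATOMIC_SEQ_CST) == 0;
#endif
}
```

The output/input/clobber lists do change here compared with the x86 version, because the aarch64 loop needs two scratch registers and a "Q" (single base register) memory operand.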

Monday, 3 March 2014

SPO600 Roadmap - Mediatomb, Lightspark, sanity

Who knew? I most definitely did not. After running into seemingly every possible problem with mediatomb that I could (./configure errors, make errors), I decided to leave it on the back burner for a day and try to get lightspark to build on x86, especially after Chris and I had talked about it for nearly an hour. That went even worse. It seems the code is calling a header that is not up to date with where ffmpeg installs certain header files, and I've yet to hear back from the dev team. That feeling in the pit of my stomach grows ever worse.

I've spent the past hour googling for various libav build dependencies (the rather large list of dependencies on their own wiki requires some google-fu). Searching 'fedora yum install <package name>' comes in REALLY handy for finding out actual package names when yum search fails on its own. Still, no luck: libav causes headaches for a few other packages, but nothing I found helps my situation with lightspark any. Lovely. So hey, here's a thought, try mediatomb again. I mean, I can't feel any worse.

So I rebuild the configuration file. Yep, same old terminal output. Now to the usual fail of make... or so I thought! Somehow in the crazed state getting all the package dependencies for Lightspark, I managed to solve my mediatomb problem (me thinks ffmpeg was involved) and make ran! Not only did make uh make, but it also installed! I nearly jumped out of my chair. This excitement was tempered somewhat knowing I may have annoyed the crap out of the one developer involved with maintaining the code who I had been talking to and brought up the previous error with.

So to recap where I am:

mediatomb - works on x86 no problem (well, now). I'm going to continue talking with the community about what may be required for ARM64 implementation aside from converting/porting a few lines of x86 atomics to C or aarch64 assembly respectively. With a little more research into inline assembly, this shouldn't be too difficult a task to tackle and I look forward to jumping into the guts of the code vi editor in hand. I could have a test up by next week if all goes well.

As far as Lightspark goes, I've reached out to the community about both ARM64 and that make error that's preventing me from completing a successful build (and maintaining stress levels). But I'm unsure of where that sits. I've looked into another package or two, but I'll hold off a day or so and give the mailing list a chance to get back to me. Another possibility has been brought to my attention by classmate Nick, that being the possibility of working on a more difficult package (a 2 on the scale), which Chris has apparently given a tentative thumbs up to. If Lightspark falls through, I would gladly jump at such an opportunity to share in the pain that is Assembly coding and Linux package building.

--note --
I'll add in a picture or two of mediatomb in action/configuring or lightspark failing soon. 2am is too late to be swapping images around on this blog.





Friday, 21 February 2014

SPO600 - And now the package picking begins

So it seems I've landed myself lightspark, a free, open source Flash alternative and mediatomb, a plug and play media streaming app.

Media Tomb, which did not take an arm (ahem..) and a leg to get the source files for, uses assembly in two places that I can see, and they don't seem to be used elsewhere in the program at all (or at least outside of documentation).

The assembly is inline and looks a bit like this:
#ifdef ATOMIC_X86_SMP
    #ifdef ATOMIC_X86
        #error ATOMIC_X86_SMP and ATOMIC_X86 are defined at the same time!
    #endif
    #define ASM_LOCK "lock; "
#endif

#ifdef ATOMIC_X86
    #define ASM_LOCK
#endif

#if defined(ATOMIC_X86_SMP) || defined(ATOMIC_X86)
    #define ATOMIC_DEFINED
    static inline void atomic_inc(mt_atomic_t *at)
    {
        __asm__ __volatile__(
            ASM_LOCK "incl %0"
            :"=m" (at->x)
            :"m" (at->x)
            :"cc"
        );
    }

If I'm reading this correctly, and there is a big chance I'm not, it's saying 'use this assembly only if you're using an x86 processor', otherwise, don't (more or less).

Lightspark, however, has a lot more x86 assembly, which works with video files. Chris and I looked at this and determined that it had nearly 120 lines of the stuff, and not only that, it uses NASM, an x86-only assembler. However, it also appears that it should be rather straightforward to at least attempt to port it to aarch64 assembly, if not try to get some C variations in there. Or at least see if there are C fallbacks for it (I don't have the Lightspark code in front of me, so excuse the lack of examples for it).

Media Tomb should be a lot less work in terms of practical coding, Lightspark will likely take more time, but we'll see if the work is doable in the timeframe for this project.

I will be honest, I chose these two, for now at least, because they were both 0 on our list, and I didn't want to stray too far from my comfort zone. Lightspark is a Flash alternative, and that is ALWAYS a plus in my mind. The less Adobe I can put on a machine of any kind, the better. And a small media server app that isn't XBMC is good too, since somehow that thing has become all-pervasive. If Lightspark ends up being too much work, I may end up passing on it and moving on to a new package.

Saturday, 8 February 2014

SPO600 Assembler in Linux Packages lab 4

As part of group four, I ended up looking at Ogre and another group member looked into NSPR, discussing amongst ourselves the questions required to present to the rest of the class (that honour went to NSPR).

Ogre is an open source, multiplatform 3d rendering engine, used to make 3d applications (though I'd think it's most often used for game creation, where I first came across it) more so than any other type of application.

Most of the assembly code was spread across a small number of different source file directories, all located in the 'Main' Ogre directory, OgreMain in the 'src'. The assembler code seemed to be split into two separate uses. One was used for CPUID, in other words, to determine the type of CPU a particular user is running on when executing code, to determine which instruction set to follow. The other use seemed to be for atomics, and locking registers/the stack under specific circumstance, of which I couldn't 100% suss out from its use in the code (and my lack of experience with reading real world code on a regular basis). However, most of the time it still seemed to relate back to which set of CPU instructions were to be used. Oddly enough, within the code itself, the Assembler used for atomics made reference to its inclusion for only slight performance gains over C++ variations of the same coding.

One example of assembler's use in Ogre (found in OgreMain/src/nedmalloc/malloc.c.h.ppc):

/* place args to cmpxchgl in locals to evade oddities in some gccs */
int cmp = 0;
int val = 1;
int ret;
__asm__ __volatile__ ("lock; cmpxchgl %1, %2"
: "=a" (ret)
: "r" (val), "m" (*(lp)), "0"(cmp)
: "memory", "cc");

It's not 100% apparent whether the Assembler in Ogre was written specifically for it or taken from an existing library, though some of the comments included in-line with the code lead me to believe it was a little of both. From those same comments and the syntax used, the assembly code is meant for x86_64, and for most of it, more architecture-agnostic versions (C++ specifically) exist, with slight performance loss. If the team is putting those kinds of comments in their own source code, I have to imagine that yes, this version of Ogre could relatively easily be ported to and built for aarch64, with very little difference from the current x86_64 version.

Here is one of the comments on Assembly use in Ogre:

"USE_BUILTIN_FFS default: 0 (i.e., not used)
Causes malloc to use the builtin ffs() function to compute indices.
Some compilers may recognize and intrinsify ffs to be faster than the
supplied C version. Also, the case of x86 using gcc is special-cased
to an asm instruction, so is already as fast as it can be, and so
this setting has no effect. Similarly for Win32 under recent MS compilers.
(On most x86s, the asm version is only slightly faster than the C version.)"

The other, NSPR, seems to have only a small amount of Assembly, used for atomics in a small number of files, which could probably be rewritten in C/C++, especially since the Netscape Portable Runtime is meant to be a platform-neutral API for web and server based applications (though it could be used for anything, I suppose). However, speed might play a bigger role for NSPR than it would for the CPUID function in Ogre. It was hard to tell whether the Assembler was hand written or taken from a library. My guess: library code. It could probably be built on aarch64 without too much trouble at the moment, even though the Assembly code here is for x86. I didn't access NSPR directly, as our group simply split the responsibility for the packages to one apiece, and as such, I do not have any code snippets for it.

Saturday, 1 February 2014

OOP344 Macros Workshop

So it seems swapping ints and doubles using macros or simple, me-defined functions doesn't really change too much in terms of processing time. There is of course the possibility that I screwed up in the macro version and forgot to add something to it to speed things up, but for the most part, both versions of the code process in about the same amount of time, 5 seconds and change.

my hand written function's processing time for 1 billion swaps
There we can see, nearly six seconds for swapping integers.
Macro time to swap 1 billion times
It makes sense, to me at least, that using a #define macro to do the swapping instead of writing the entire function out in long form wouldn't change the times too drastically.

I mean, a #define is only going to be pasted into the main (in this case) when called, and nothing more.
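
Since the screenshots of the code don't show up here, this is a small sketch of the kind of comparison described above. It is not the workshop code: the SWAP macro, swap_int and the two timing helpers are hypothetical names for illustration. The macro is pasted in at each use, while the function is small enough that an optimizing compiler will usually inline it, which is one plausible reason the two timings end up so close.

```c
#include <time.h>

/* Macro version: the text is pasted in wherever SWAP is used. */
#define SWAP(type, a, b) \
    do { type tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Function version of the same swap. */
static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Run n macro swaps on *a and *b; return elapsed CPU time in seconds. */
static double time_macro_swaps(long n, int *a, int *b)
{
    clock_t start = clock();
    for (long k = 0; k < n; k++)
        SWAP(int, *a, *b);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run n function swaps on *a and *b; return elapsed CPU time in seconds. */
static double time_function_swaps(long n, int *a, int *b)
{
    clock_t start = clock();
    for (long k = 0; k < n; k++)
        swap_int(a, b);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling each helper with n set to one billion would mirror the test above. With optimization turned on the compiler may shrink or remove the loops entirely, so timings are only comparable when both versions are built with the same flags.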




It is a little comforting that the times seem to make more sense - with double's taking a slight amount longer to process than the int's, which was the reverse of my function. I do of course leave the door open that I'm completely wrong in my coding. It was also weird to go back and write code in C style after a full semester of C++ changing everything to classes. I much prefer classes but didn't feel something this size required it.

Macro code

My function







Farewell to 64bit Arms - Assembly, Loops, 64 Arms Part 2

So we had finished ten loops. Now we moved on to 30 for both x86 and aarch64.
From there, we had to move on to a loop of 30 while also splitting loop iterations greater than 9 into two separate digits to be inserted into the byte and printed, and (whoops, unbeknownst to us at the time) then remove leading zeros (e.g. print out "9" instead of "09").

This required a few changes, none of them difficult, to both the x86 and aarch64 code.
Divide the loop iteration by 10. The remainder is the 1's column: convert it to ASCII (by adding '0' as before) and write it to the low half of the specified register. The quotient is the 10's column: compare the register holding it with (in our case) hex 0. If it is equal, we used je (jump if equal) to skip writing to the 10's column. If the 10's column is not 0, convert it to ASCII and move it to the right byte index in our string message.

       cmp $0x00, %rax /* if the 10's column is 0 */
       je skip_10s /* don't write it */
       ...
       skip_10s:

      mov $len, %rdx /*print out the string with the new byte*/
      mov $msg, %rsi
      mov $1, %rdi
      mov $1, %rax
      syscall 
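
The same digit logic, written out in C as a sketch rather than as our lab code (format_loop_line is a hypothetical helper, and the "Loop " prefix is taken from the output shown in Part 1). The division gives the 10's column, the remainder gives the 1's column, and the 10's digit is only written when it is not zero.

```c
#include <stddef.h>

/* Writes "Loop N\n" for 0 <= i <= 99 into out (at least 9 bytes)
 * and returns the number of characters written before the terminator. */
static size_t format_loop_line(char *out, int i)
{
    const char prefix[] = "Loop ";
    size_t n = 0;

    for (; prefix[n] != '\0'; n++)
        out[n] = prefix[n];

    int tens = i / 10;  /* quotient: what div (x86_64) or udiv (aarch64) gives */
    int ones = i % 10;  /* remainder: rdx after div, or msub on aarch64 */

    if (tens != 0)                      /* skip the leading zero */
        out[n++] = (char)('0' + tens);  /* convert to ASCII */
    out[n++] = (char)('0' + ones);
    out[n++] = '\n';
    out[n] = '\0';
    return n;
}
```

For i = 9 this produces "Loop 9" plus a newline (7 characters), and for i = 27 it produces "Loop 27" plus a newline (8 characters).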

From there on the code is mostly the same, and by this point, making 'skip_10s' its own little function was the most work aside from keeping track of the safe registers remaining (we always seem to get caught up on the little stuff). The aarch64 version was, as before, done second and was mostly about converting between syntax. Why, when Assembly is this old, can newer versions of it not use syntax from existing variations? I mean really.

As for which of the two I prefer: x86_64, I feel, is more straightforward, relatively speaking. I mean, opcodes are only 3 characters long, and often (regardless of platform syntax) have several different meanings attached to them. However, the aarch64 syntax does make keeping track of registers so much easier, with each simply having a numeric value/name from rzr to 30, and subdividing them into groups that are relatively easy to remember (and simply changing letter denominations for different memory lengths). With that said, msub is one of those 'love to hate' opcodes that will take time to warm to after this experience. It was, with the help of Starter Kit, relatively easy to write the code compared to some other languages (C, C++), but it isn't quite at the code-in-English level of, say, COBOL. With more practical practice, I can see myself quite enjoying writing/working with Assembler (of either variety, for this class's purposes).

Here is a link to the first iteration of the code (10 iteration loop x86), and here is aarch64 version.
The finalized versions are x86 here and aarch64 here





Tuesday, 28 January 2014

SPO600 - Assembly, Loops, 64 Arms Part 1

For this SPO600 lab, we were tasked with writing (or at least combining and building upon) a loop in both x86 and aarch64 that would run 10, then 30 times, printing out a small string of text that ended with the instance of the loop (e.g. 1-10/30) and a newline character. When the loop was lengthened to 30 iterations, iterations 10 and above were to be split in two, dividing by 10, and placed into the "Loop" string as two separate digits when printed.


As Chris had already written a 'hello world' program, our group (myself, Nick, and Yoav) went about copying the hello world .s code into the loop code and modifying it to print with each iteration of the loop. A harder task than it initially appeared, and that was before we tried to conquer aarch64.

To say this was difficult does not do the stumbling blocks we encountered justice. Our initial troubles were with the .data section of the code, trying to find proper, efficient mathematical way to insert the resulting loop digit bytes into the proper string index locations. That one line took more time to figure out than most of the loop.

Finding the right equation took longer than expected.
When that was solved to our satisfaction, we compiled and received a segmentation fault at (what was then) line 12 for our troubles. This was the index variable (aka where our loop iteration variable was being stored). Enclosing 'index' in parentheses was the way to go (thank you Starter Kit!), but we still had faults.

After much trial and error going it on our own, gdb showed us that there was a weird value being stored in the r12 register. So, despite copying the code from Chris' 'hello world' program earlier, we had made a mistake somewhere in our code additions and changes (like assuming all registers will work equally in certain conditions). In a rather Hail Mary move, we switched registers from %rcx to %rdx and the fault was corrected.

%rcx does not like. Who knew?






 Thus we had it printing, but not in the right format:
Loop 0Loop 1Loop 2Loop 3Loop 4Loop5Loop 6Loop 7Loop 8Loop 9Loop10.

If the index and rdx conundrum were a challenge, this... this was kryptonite to our morale (and by far the thing we spent the most time working on). Eventually, I came to the conclusion that the newline character wasn't being used because it was being erased/overwritten by the data we were adding to the string. It turns out, whoops, storing an entire 64-bit register into our string was much more than we needed. Nick, in a moment of programming clarity, switched from the full 64 bits to only taking the low byte of the register (r13b instead of r13) when moving the value into 'index'. Now we had a loop printing out 10 times.  There was much rejoicing.
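Here is a rough C picture of why the newline vanished. It is a sketch under assumptions: the "Loop #\n" layout, DIGIT_OFFSET, and both function names are invented for illustration, and memcpy of a 64-bit value stands in for the full-register store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical position of the digit in a "Loop #\n" style message. */
#define DIGIT_OFFSET ((size_t)5)

/* Like storing the whole 64-bit register: 8 bytes land in the string,
   so the characters after the digit (newline included) get overwritten. */
void store_full_register(char *buf, size_t cap, uint64_t index)
{
    uint64_t value = index + '0';
    assert(DIGIT_OFFSET + sizeof value <= cap);
    memcpy(buf + DIGIT_OFFSET, &value, sizeof value);
}

/* Like storing only the low byte (r13b): exactly one character changes. */
void store_low_byte(char *buf, size_t cap, uint64_t index)
{
    assert(DIGIT_OFFSET < cap);
    buf[DIGIT_OFFSET] = (char)(index + '0');
}
```

With a 16-byte buffer holding "Loop #\n", store_full_register wipes out the byte at offset 6 where the newline lived, while store_low_byte leaves "Loop 3\n" intact.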

We don't need no stinkin' 64bits registry!
So now we had one half of one third of the lab done. On to the aarch64 conversion, which itself wasn't too problematic: simply switch most of the registers and variables/values to the opposite operand order of x86_64.

I mean, not too problematic outside of finding a replacement in aarch64 Assembler for storing 'index', since we no longer had the syntax of 'r_b' to work with. This necessitated more trial and error (add, str) until we used 'strb', because we like to inflict mental pain on ourselves during the learning process.

However, despite a 'hello world' example, and there being three of us, none of us noticed that we needed to change how our 'msg' was stored in a register. So we were stuck with odd segmentation faults, and a loop that ran ten times but displayed "qemu: Unsupported syscall: 0" each time, which is pretty close to what we wanted. A letter or two off really. It wasn't until later that Nick realized our mistake and swapped the 'mov' we had been trying with an 'adr', thus at least solving the first part of the lab.

loop:
mov x27, x28
add x27, x27, 0x30 /* convert to ascii */
--> adr x26, msg

I will post the rest (30 iteration loop, remove leading zero, my thoughts on Assembler) in a second post just to keep things a decent length.

Sunday, 26 January 2014

OOP344 FieldLen Not Zero

Between trying to figure out void pointers and the assignment, I'm quite a bit lost in C++. Partly because we've not discussed enough in class, partly because the scope of the assignment is way beyond anything I've done up to this point.
After going over what the function is meant to do, I started writing some pseudo code/code in comments for the display function.

Something along these lines:

//if fieldLen != 0, blank the rest of the line
//if(fieldLen != 0)
//{
// for(int i = 0; i < fieldLen; i++)
// {
// cio::console.setCharacter('\0');
// cio::console.setPosition(row, col + i);
// cio::console.drawCharacter();
// }
//}

//If the cursor is in the last column position in a row, move it down a single row underneath the final
//character
//if(there aren't any columns left)
// cio::console.setPosition(row + 1, col);

I get the feeling the fieldLen is closer to correct than not, but still don't understand what I'm trying to do well enough to compile and check.

Edit is still blank at this point. Want to finish more of display first.

Tuesday, 21 January 2014

Hello World and the Compiler

SPO600 lab 2 instructions can be found here.

To begin, write a basic ‘hello world’-esque program in C. You know, a printf("hello world\n") sort of thing. After writing that program, we were to add different options to the compiler or objdump as required for each test case.

TL;DR - compilers are power tools, not to be underestimated. Respect them, or they will retaliate.

Test 4 with additional integer arguments
As part of the group tasked with Test 4, we were to add additional arguments to the code (simple integers 1 to 10) and use the objdump command with a few additional options to examine the binary file created by the compile. When additional arguments are added to the basic ‘hello world’ program, the compiler includes additional mov’s in the form of move-longs (movl), one for each individual argument (ten more in our case). That the integers were not moved into a register, but straight to the stack, is part of the infinite wisdom of the gcc compiler. The movl’s to the %esp are the equivalent of stack pushes, as apparently our additional argument (18 digits long) was too large to push all the values into a register.

When the additional argument was changed from a simple integer to a long integer, as seen in the screenshots below, it appears as if the compiler set aside two separate memory locations for the long integer instead of one and, in an act of Little Endian vs Big Endian, listed the sections out of order (with regard to hex address). All the other arguments' hex addresses/values decreased by 4 as the value of each argument decreased (starting from nine, down to one), like normal.
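One plausible reading of the "two separate memory locations", assuming the build was 32-bit (the movl's to %esp suggest so, but that is a guess): a 64-bit value does not fit in a single 32-bit slot, so it gets stored as two 32-bit words. The helper name split_u64 below is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Splits a 64-bit value into the two 32-bit words that a 32-bit build
   would have to store separately. */
void split_u64(uint64_t value, uint32_t *low, uint32_t *high)
{
    assert(low != 0 && high != 0);
    *low = (uint32_t)(value & 0xFFFFFFFFu);  /* least significant word */
    *high = (uint32_t)(value >> 32);         /* most significant word */
}
```

On a little-endian machine the low word sits at the lower address, which may be why the two halves looked out of order in the dump.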

Test 4 with ridiculous long integer argument
For the next trick, Test 1: adding ‘-static’ when compiling changes the size dramatically, from eight kilobytes without it to over 800 kilobytes with it. The biggest change in the main function is the change of the callq from printf@plt without -static to IO_printf with it. printf@plt is essentially a pointer to the shared library's printf, and the program simply follows that pointer to the required code when printf is called. With -static, on the other hand, the linker takes the library code used by the program and actually adds it to the executable, so the call to IO_printf skips the pointer and takes a more direct route to where printf is held. This also changes the order of the functions, with many more added. <main> is now nowhere near the end of the disassembly, with all the library functions coming after it.

With -fno-builtin removed from the compile as part of Test 2, the compiler uses its own built-in (ahem) optimizations to make the generated code more efficient than it would be normally (with -fno-builtin enabled for compilation). In this case, the callq doesn't call printf, instead calling puts, which is a more direct way of showing output on screen than printf, which has to jump through other hoops (like parsing the format string) before printing to the screen. The executable size is also slightly smaller. It also removes a mov of 0x0 into %eax (so a 0 is no longer being moved into the eax register), and changes the nopw to %cs:0x0(%rax,%rax,1) from 0x0(%rax,%rax,1), which as far as I can tell is only a change in the length of the No Operation padding, not in what the program does.

--Note, on my laptop, even though I'm doing all the processing on the Ireland server, there was no change in the object dump between enabling and disabling -fno-builtin. I had another student watch me compile and objdump both versions, with no noticeable difference. My initial thought was that my laptop's CPU being as old as it is (and 32-bit as well) may be the cause of these issues.

Test 3 asked to have the -g option disabled during compilation. Without -g, the debug sections are removed from the ELF file, as is any extra debugging information provided for each section of the file. Without the debug section headers, there is a lot less code (relatively speaking), and as such the program itself is about a kilobyte smaller than with -g enabled (8k without compared to 9k with). There is a lot more debugging information than one might think for such a small, simple program like "Hello world\n".

Test 5 required moving the printf into a separate function on its own, which causes the string and integer additions to be loaded outside the main. Instead, the compiler pushes the main straight to the stack, calls the printf function, does all instructions/work to be done with the printf function, finishes with that function, and retrieves the original information from the stack with pop %rbp. The printf function is right below the main.


The last test, Test 6, called for replacing the -O0 in the compile with -O3, which increases optimization from nothing to [shall we say] level 3. Doing so removes the push and pop instructions; in other words, nothing is being put on the stack for later use. -O3 simply moves the code/values directly into memory, uses xor to set the value in the eax register to 0 (whereas -O0 moved a 0 value into that register to achieve the same goal), and instead of the compiler using callq on address 400410 <printf@plt> it uses jmpq 400410 <printf@plt>, which means it jumps directly to printf rather than calling it and coming back (a tail call), so printf returns straight to whatever called main. The no operation instruction (nop) that follows the jump is just padding at the end of the function.



Sunday, 19 January 2014

OOP344 Exercise 1

Okay, I'll admit it. Using the command prompt/bash shell, even for the Windows version of GitHub, is preferable to the GUI version. Intuitive the GUI is not. At the very least, the exercise showed me just how rusty I can be with nearly a month of time between writing any programs in C++. Only the most obvious errors stood out at first (adding the '+1' when dynamically allocating memory for a new object, for instance). And then there is Git, and well, Git, like all Linux-related software that isn't an Android front-end for a phone, is not about ease of use, but function. Maybe I'll try TortoiseGit and see how different that is.

Now on to the labs and reading, woo!!

Wednesday, 15 January 2014

SPO600 Lab1 - Licenses

SPO600 Lab 1 - Linux Package Licensing

SPO600's first lab looks at two different Linux packages and the license agreements involved with open source software.

First up is Audacious, a Linux audio player which uses the GPL3 license. The license is, from their website (and within the application itself), almost impossible to find on its own; Google search to the rescue, it seems. The project uses a Redmine-based bug tracking system to track and submit patches/etc. Most patch implementations take anywhere from a few hours to a few months, depending on the size and severity of the problem. From the look of things, each one involves the person submitting the bug/patch and a project manager, with occasional bugs involving up to 3 or 4 people discussing an issue, but always the same project manager in the end. It seems that such a small team doesn’t require much more than a simple bug tracking system, limiting the team members responsible for implementing/upstreaming changes to repositories and keeping them in direct contact with their community. This of course also has the knock-on effect that bug fixes may not always be timely, since it appears that one or two people may be doing that whole process.

Another package, x11-common, is part of the X Window System for Linux/Unix distributions and uses the Mozilla-based Bugzilla bug tracking system. It uses a variation of the MIT license (the X11 License). As the entire X Window System is run by an educational non-profit foundation with a dedicated board of directors and other contributors, the team’s involvement is much less direct than a smaller team like Audacious. As such, it makes sense that they’d use a Bugzilla tracking system, which is feature rich but decentralized, as the x11-common package is but one part of the larger X Window project. A larger team means more people are able to look at and fix bugs in a more timely manner, but it also makes things a bit more faceless, unless one is intimately involved with the project.

Saturday, 11 January 2014

A new blog for a new course [two even!], OOP344 and SPO600. It'll probably be just blank text for a while, until I feel like mocking up templates.

Until then, lots of programming talk. C++, Assembler, and so forth in ugly plain text.