Tuesday, 28 January 2014

SPO600 - Assembly, Loops, 64 Arms Part 1

For this SPO600 lab, we were tasked with writing (or at least combining and building upon) a loop in both x86 and aarch64 that would run 10, then 30 times, printing out a small string of text that ended with the number of the current iteration (e.g. 1-10/30) and a newline character. When the loop was lengthened to 30 iterations, iterations 10 and above were to be split in two by dividing by 10, and placed into the "Loop" string as two separate digits when printed.


As Chris had already written a 'hello world' program, our group (myself, Nick, and Yoav) went about copying the hello world .s code into the loop code and modifying it to print with each iteration of the loop. A harder task than it initially appeared, and that was before we tried to conquer aarch64.

To say this was difficult does not do the stumbling blocks we encountered justice. Our initial troubles were with the .data section of the code, trying to find a proper, efficient mathematical way to insert the resulting loop digit bytes into the proper string index locations. That one line took more time to figure out than most of the loop.

Finding the right equation took longer than expected.
When that was solved to our satisfaction, we compiled and received a segmentation fault at (what was then) line 12 for our troubles. This was the line with the index variable (aka where our loop iteration variable was being stored). Enclosing 'index' in parentheses was the way to go (thank you Starter Kit!), but we still had faults.

After much trial and error going it on our own, gdb showed us that there was a weird value being stored in the r12 register. So, despite copying the code from Chris' 'hello world' program earlier, we had made a mistake somewhere in our code additions and changes (like assuming all registers will work equally in certain conditions). In a rather Hail Mary move, we switched registers from %rcx to %rdx and the fault was corrected.

%rcx does not like. Who knew?






Thus we had it printing, but not in the right format:
Loop 0Loop 1Loop 2Loop 3Loop 4Loop5Loop 6Loop 7Loop 8Loop 9Loop10.

If the index and rdx conundrum were a challenge, this... this was kryptonite to our morale (and by far the thing we spent the most time working on). Eventually, I came to the conclusion that the newline character wasn't being printed because it was being erased/overwritten by the data we were adding to the string. It turns out, whoops, writing an entire 64-bit register into our string was much more than we needed. Nick, in a moment of programming clarity, switched from the full 64-bit register to only the low byte of the register (r13b instead of r13) when moving the value into 'index'. Now we had a loop printing out 10 times. There was much rejoicing.

We don't need no stinkin' 64-bit register!
So now we had one half of one third of the lab done. On to the aarch64 conversion, which itself wasn't too problematic: simply switch most of the registers and variables/values to the opposite of x86_64.

I mean, not too problematic outside of finding a replacement in aarch64 Assembler for storing 'index', since we no longer had the syntax of 'r_b' to work with. This necessitated more trial and error (add, str) until we used 'strb', because we like to inflict mental pain on ourselves during the learning process.

However, despite having a 'hello world' example, and there being three of us, none of us noticed that we needed to change how the address of 'msg' was loaded into a register. So we were stuck with an odd failure: the loop ran and printed ten times, but what it displayed was "qemu: Unsupported syscall: 0" ten times, which is pretty close to what we wanted. A letter or two off really. It wasn't until later that Nick realized our mistake and swapped the 'mov' we had been trying with an 'adr', thus at least solving the first part of the lab.

loop:
mov x27, x28
add x27, x27, 0x30 /* convert to ascii */
--> adr x26, msg

I will post the rest (30 iteration loop, remove leading zero, my thoughts on Assembler) in a second post just to keep things a decent length.

Sunday, 26 January 2014

OOP344 FieldLen Not Zero

Between trying to figure out void pointers and the assignment, I'm quite a bit lost in C++. Partly because we've not discussed enough in class, partly because the scope of the assignment is way beyond anything I've done up to this point.
After going over what the function is meant to do, I started writing some pseudo code/code in comments for the display function.

Something along these lines:

//if fieldLen != 0, blank the rest of the line
//if(fieldLen != 0)
//{
// for(int i = 0; i < fieldLen; i++)
// {
// cio::console.setCharacter('\0');
// cio::console.setPosition(row, col + i);
// cio::console.drawCharacter();
// }
//}

//If the cursor is in the last column position in a row, move it down a single row underneath the final
//character
//if(there aren't any columns left)
// cio::console.setPosition(row + 1, col);

I get the feeling the fieldLen is closer to correct than not, but still don't understand what I'm trying to do well enough to compile and check.

Edit is still blank at this point. Want to finish more of display first.

Tuesday, 21 January 2014

Hello World and the Compiler

SPO600 lab 2 instructions can be found here.

To begin, write a basic 'hello world'-esque program in C. You know, a printf("hello world\n") sort of thing. After writing that program, we were to add different options to the compiler or objdump as required for each test case.

TL;DR - compilers are power tools, not to be underestimated. Respect them, or they will retaliate.

Test 4 with additional integer arguments
As part of the group tasked with Test 4, we were to add additional arguments to the code (simple integers 1 to 10) and use the objdump command with a few additional options to examine the binary file created by the compile. When additional arguments are added to the basic 'hello world' program, the compiler includes additional movs in the form of move-longs (movl), one for each individual argument (ten more in our case). That the integers were not moved into a register, but straight to the stack, is part of the infinite wisdom of the gcc compiler. The movl to the stack pointer (%esp) is the equivalent of a stack push, as apparently our additional argument (18 digits long) was too large for all the values to be passed in registers.

When the additional argument was changed from a simple integer to a long integer, as seen in the screenshots below, it appears that the compiler set aside two separate memory locations for the long integer instead of one and, in an act of Little Endian vs Big Endian, listed the two halves out of order (with regard to hex address). The hex addresses of all the other arguments decreased by 4 as the value of each argument decreased (starting from nine, down to one), like normal.

Test 4 with ridiculous long integer argument
For the next trick, Test 1: adding '-static' when compiling changes the size dramatically, from about eight kilobytes without -static to over 800 kilobytes with it. The biggest change in the main function is that the callq goes from printf@plt without -static to IO_printf with it. printf@plt is essentially a pointer to the printf code in the shared library, and the program simply follows that pointer to the required code when printf is called. With -static, on the other hand, the code of the library functions used by the program is actually added to the executable itself, which changes the order of the functions and adds many more of them. The call to IO_printf skips the pointer and takes a more direct route to where printf is held. <main> is now nowhere near the end of the disassembly, with all the library functions coming after it.

With -fno-builtin removed from the compile as part of Test 2, the compiler uses its own built-in (ahem) optimizations to make the generated code more efficient than it would be normally (with -fno-builtin enabled for compilation). In this case, the callq doesn't call printf, instead calling puts, which is a more direct way of showing output on screen than printf, which has to jump through other hoops before printing to the screen. The executable size is also slightly smaller. It also removes a mov $0x0,%eax (so a 0 is no longer being moved into the eax register), and changes the nopw from 0x0(%rax,%rax,1) to %cs:0x0(%rax,%rax,1), which as far as I can tell is just a different-length no-op used as padding rather than a change in what the program does.

--Note, on my laptop, even though I'm doing all the processing on the Ireland server, there was no change in the object dump between enabling and disabling -fno-builtin. I had another student watch me compile and objdump both versions, with no noticeable difference. My initial thought was that my laptop's CPU, being as old as it is (and 32-bit as well), may be the cause of these issues.

Test 3 asked to have the -g option disabled during compilation. Without -g, the debug sections are removed from the ELF file, as is any extra debugging information provided for each section of the file. Without the debug section headers, there is a lot less in the file (relatively speaking), and as such the program itself is about a kilobyte smaller than with -g enabled (8k without compared to 9k with). There is a lot more debugging information than one might think for such a small, simple program like "Hello world\n".

Test 5 required moving the printf into a separate function of its own, which causes the string and integer arguments to be loaded outside of main. Instead, main's state is pushed straight onto the stack, the new function is called, all the instructions/work for the printf are done there, and once that function finishes the original information is retrieved from the stack with pop %rbp. The new function sits right below main.


The last test, Test 6, called for replacing the -O0 in the compile with -O3, which increases optimization from nothing to [shall we say] level 3. Doing so removes the push and pop instructions; in other words, nothing is being put on the stack for later use. -O3 simply moves the values directly into registers, uses xor to set the eax register to 0 (whereas -O0 moved a 0 value into that register to achieve the same goal), and instead of using callq on address 400410 <printf@plt> it uses jmpq 400410 <printf@plt>, which means it jumps directly to the code at that address (in this case, the printf that puts our line of text on screen) without coming back to main. After the jump comes a no operation instruction (nop), which more or less marks the end of main.



Sunday, 19 January 2014

OOP344 Exercise 1

Okay, I'll admit it. Using the command prompt/bash shell, even for the Windows version of GitHub, is preferable to the GUI version. Intuitive the GUI is not. At the very least, the exercise showed me just how rusty I can be after nearly a month without writing any programs in C++. Only the most obvious errors stood out at first (adding the '+1' when dynamically allocating memory for a new object, for instance). And then there is Git, and well, Git, like all Linux-related software that isn't an Android front-end for a phone, is not about ease of use, but function. Maybe I'll try TortoiseGit and see how different that is.

Now on to the labs and reading, woo!!

Wednesday, 15 January 2014

SPO600 Lab1 - Licenses

SPO600 Lab 1 - Linux Package Licensing

SPO600's first lab looks at two different Linux packages and the license agreements involved with open source software.

First, Audacious, a Linux audio player, uses the GPL3 license, which is almost impossible to find on its own from their website (or within the application itself); Google search to the rescue, it seems. The project uses a Redmine-based bug tracking system to track and submit patches/etc. Most patch implementations take anywhere from a few hours to a few months, depending on the size and severity of the problem. From the look of things, the process involves the person submitting the bug/patch and a project manager, with occasional bugs involving up to 3 or 4 people discussing an issue, but always the same project manager in the end. It seems that such a small team doesn’t require much more than a simple bug tracking system, which limits the number of team members responsible for implementing/upstreaming changes to repositories and keeps them in direct contact with their community. This of course also has the knock-on effect that bug fixes may not always be timely, since it appears that one or two people may be doing that whole process.

Another package, x11-common, is part of the X Window System for Linux/Unix distributions and uses the Mozilla-based Bugzilla bug tracking system. It uses a variation of the MIT license (the X11 License). As the entire X Window System is run by an educational non-profit foundation with a dedicated board of directors, and other contributors, the team’s involvement is much less direct than that of a smaller team like Audacious. As such, it makes sense that they’d use a Bugzilla tracking system, which is feature rich but decentralized, as the x11-common package is but one part of the larger X Window System project. A larger team means more people are able to look at and fix bugs in a more timely manner, but also makes things a bit more faceless, unless one is intimately involved with the project.

Saturday, 11 January 2014

A new blog for a new course [two even!], OOP344 and SPO600. It'll probably be just blank text for a while, until I feel like mocking up templates.

Until then, lots of programming talk. C++, Assembler, and so forth in ugly plain text.