Monday 19 August 2013

Assembly Language

  -----------------------------
  -----------------------------


A look at the SPARC assembly code of a sample C program:
-------------------------------------------------------


Consider the following C program:

#include<stdio.h>
#include<math.h>

double a[100];

main()
{
  int i;
  double sum;

  for(i=0, sum=0.0; i<100; i++)
  {
       a[i] = sqrt(a[i]);
       sum += a[i];
  } 
  printf("sum = %4.2f\n", sum);
}


The optimized assembly code of this program on a SPARC machine (obtained
using the CC compiler with the optimization flag -O) is given below. 
A three-operand SPARC instruction has the format
         add %g1, %g2, %g3
where %g1 and %g2 are the source registers and %g3 is the destination 
register.
The code can be understood by noting the following features of the 
SPARC architecture and instruction set. 

Register Window:
----------------
SPARC is a RISC machine. An important feature of SPARC is register windowing.
When a program is running, it has access to 32 32-bit processor registers: 
eight global registers (R0-R7) plus 24 registers (R8-R31) that belong to the
current register window. Eight of the window registers are the `IN' registers
(i0-i7, R24-R31). When a function is called, these registers hold the arguments
passed by its caller. Another eight are the local registers (R16-R23), scratch
registers that can be used for anything while the function executes. The
remaining eight are the `OUT' registers (o0-o7, R8-R15), which the function uses
to pass arguments to the functions that it calls.

                                                                           
                                                                           
         R0  +-------+                 R0  +-------+
             |       |                     |       |
             | global|                     | global|
             |  reg. |                     | reg.  |
             |       |                     |       |
         R7  +-------+                 R7  +-------+
                                                                           
             +-------+                     +-------+
             |       |                     |       |
             |       |                     |       |
             |       |                     |       |
             |       |                     +-------+ <--- CWP
             |       |                 R8  |       |
             |       |                     |  out  |
             |       |                     |  reg. |
             |       |                     |       |
             |       |                R15  +-------+
             |       |                R16  |       |
             |       |                     | local |
             |       |                     |  reg. |
             |       |                     |       |
          R8 +-------+ <--- CWP       R23  +-------+
             |   o0  |                R24  |  i0   |
             |   ::  |                     |  ::   |
             |   o5  |                     |  i5   |
             |   sp  |                     |  fp   |
             |  temp |                R31  |  lr   |    (link reg)
        R15  +-------+                     +-------+
        R16  |       |                     |       |
             | local |                     |       |
             |  reg. |                     |       |
             |       |                     |       |
        R23  +-------+                     +-------+
        R24  |       |                     |       |
             |   in  |                     |       |
             |  reg. |                     |       |
             |       |                     |       |
        R31  +-------+                     +-------+
             |       |                     |       |

When one function calls another, the callee can choose to execute a SAVE 
instruction. This instruction decrements the current window pointer (CWP), 
which points to the current register window, shifting to the next window. 
The caller's OUT registers then become the callee's IN registers, and 
the callee gets a new set of local and OUT registers for its own use.
Only the pointer changes, so the registers and the return address do not 
need to be stored on a stack.
Similarly, the RESTORE instruction (executed upon return from a function)
increments the CWP, switching back to the caller's register window. 

A few special registers in the SPARC register window are:
 o6 in caller / i6 in callee: sp in the caller, fp in the callee;
 o7 in caller / i7 in callee: return address register 
                            (a.k.a. link register) 

SPARC's delayed branches:
------------------------

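SPARC handles branching in an interesting way. A delayed branch means that 
the instruction following a branch instruction (the instruction in the "delay
slot") is executed, irrespective of whether the branch is taken or not, while
the processor prepares to transfer control to the destination. Delayed
branching is implemented in order to avoid "pipeline stalls" (wasted cycles)
due to branching. (See the discussion on Control Hazards in Pipelining, to come
in a future class/lecture notes.) When a conditional branch carries the annul
suffix ",a" (as the bl at the end of the loop in the listing below does), the
delay-slot instruction is executed only if the branch is taken.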

-------------------------------------------------------------------------------

The program is organized into the 'text' and 'data' sections corresponding to
the code and global data of the program. The data section in turn consists of 
the initialized global data and the bss section for uninitialized global data.

The sethi instruction:
---------------------
This instruction moves the most significant 22 bits of a constant into a 
specified register; the low-order 10 bits of the register are cleared (set 
to 0). Since every SPARC instruction is 32 bits long, a 32-bit
address cannot be loaded into a register with a single instruction. The
approach used in SPARC is to build a 32-bit address using two instructions:
sethi places the upper 22 bits (written %hi(addr)) in a register, and a second
instruction (add or or) then supplies the least significant 10 bits
(written %lo(addr)).

The ld instruction: 
------------------
This instruction loads a word from memory into a specified register. The first 
operand specifies a memory address and the second operand (the destination) 
specifies the register. The ldd (load double) instruction loads 
two consecutive words into the register pair denoted by the second operand. 

The store instruction:
---------------------
This instruction (st) stores a word into a memory location. The first operand
is a register containing the word, and the second operand specifies the memory
location. The std instruction stores a double word from a register pair. 

-----------------------------------------------------------------------------

.section ".text", #alloc, #execinstr
.file  "t1.c"

.section ".data1", #alloc, #write
.align 4

.L147:
.ascii "sum = %4.2f\n\000"              ;; format string used in printf

.section ".bss", #alloc, #write
.common a,800,8                         ;; alloc. for array a (100 x 8)

.section ".text", #alloc, #execinstr

/* 00000       0 */   .align 8
!
!   CONSTANT POOL
!
  .L_const_seg_900000101:
/* 000000 0 */   .word 0, 0                  ;; the double constant 0.0 (initial value of sum)
/* 0x0008 0 */   .align 4

!
!  SUBROUTINE main
!
! OFFSET SOURCE LINE LABEL INSTRUCTION

                   .global main
                 main:

/* 000000      */        save %sp, -112, %sp  ;; allocate a 112-byte stack frame 
           ;;  for the function (space for local 
           ;;  variables) and switch to a new register window.

! FILE t1.c

! 1 !#include<stdio.h>
! 2 !#include<math.h>
! 4 !double a[100];
! 6 !main()
! 7 !{
! 8 !    int i;
! 9 !    double sum;
! 11 !    for(i=0, sum=0.0; i<100; i++)

/* 0x0004   */             sethi          %hi(.L_const_seg_900000101), %g2
/* 0x0008   */             or             %g0, 0, %i1
/* 0x000c   */             ldd            [%g2+%lo (.L_const_seg_900000101)], %f30
/* 0x0010   */             sethi          %hi(a), %g2


! 12 !    {
! 13 !        a[i] = sqrt(a[i]);


/* 0x0014   13  */         ld             [%g2+%lo(a)], %o0   !volatile
/* 0x0018   11  */         add            %g2, %lo(a), %i0

 .L900000110:
/* 0x001c   13  */         ld             [%i0+4], %o1  !volatile
! 14 !        sum += a[i];

/* 0x0020   14  */         add            %i1, 1, %i1
/* 0x0024   13  */         call           sqrt           ;; params = %o0 %o1 
                                                         ;; Result = %f0


/* 0x0028       */         std            %f30, [%sp + 96]      
/* 0x002c       */         st             %f0, [%i0]     ;; volatile
/* 0x0030   14  */         cmp            %i1, 100
/* 0x0034   13  */         st             %f1, [%i0 + 4] ;; volatile
/* 0x0038   14  */         ldd            [%sp + 96], %f30
/* 0x003c       */         ld             [%i0], %f6     ;; volatile
/* 0x0040       */         ld             [%i0+4], %f7   ;; volatile
/* 0x0044       */         add            %i0, 8, %i0
/* 0x0048       */         faddd          %f30, %f6, %f30
/* 0x004c       */         bl,a           .L900000110
/* 0x0050       */         ld             [%i0], %o0     ;; volatile

        .L77000005:

! 15 !    }
! 16 !    printf("sum = %4.2f\n", sum);

/* 0x0054   16  */         st             %f30, [%sp + 108]
/* 0x0058       */         sethi          %hi(.L147), %g2
/* 0x005c       */         st             %f31, [%sp + 104]
/* 0x0060       */         add            %g2, %lo(.L147), %i0
/* 0x0064       */         ld             [%sp + 108], %i1
/* 0x0068       */         ld             [%sp + 104], %i2
/* 0x006c       */         call           printf       ;; params = %i0 %i1 %i2 
                                                       ;; Result = ! (tail call)

/* 0x0070       */         restore        %g0, %g0, %g0
/* 0x0074    0  */         .type          main, 2
/* 0x0074       */         .size          main, (.-main)
/* 0x0074    0  */         .global        _fsr_init_value
/* 0x0074       */         _fsr_init_value =  0
 
