Georg-Johann Lay <address@hidden> wrote:
> The algo is rather slow because it always iterates over all
> digits, i.e. it won't run faster for small numbers.
> Have fun!
> Code size is ~140 bytes.
Well, it's bigger (140 > 124), slower, and doesn't handle sizes *other*
than 64 bits, so that's not terribly useful.
I think you could shrink it a bit, replacing these 16 instructions of messy
digit output code (why are you looping incrementing DIGIT2 when you know it
is never more than 1?):
clr DIGIT2
1:
inc DIGIT2
subi DIGIT, 10
brcc 1b
brts 2f
;; T = 0 is the first round. Output the high digit if it's not '0'.
set
subi DIGIT2, 1-'0'
;; Initialize nonZero. We only output digits if we saw a digit != '0'.
mov nonZero, DIGIT2
cpi nonZero, '0'
breq 2f
st X+, DIGIT2
2:
;; Output digits except the highest (except that for 10^19).
subi DIGIT, -10-'0'
or nonZero, DIGIT
;; We only output digits if we saw a digit != '0', i.e. strip leading '0's.
cpi nonZero, '0'
breq 3f
st X+, DIGIT
With these 9 instructions:
cpi DIGIT, 10 ;; First "digit" can be as high as 18
brcs 2f
ldi nonZero, '1' ;; '1' is non-zero, which is perfect
st X+, nonZero
subi DIGIT, 10
2:
or nonZero, DIGIT
breq 3f ;; Don't print leading zeros
subi DIGIT, -'0' ;; DIGIT is already 0..9 here, so only add '0'
st X+, DIGIT
3:
With this, you can also delete the leading clt. It eliminates DIGIT2,
but unfortunately that doesn't save a spill. You also have to adapt
the final "lone zero" printing code to print if nonZero == 0, but that's
the same size.
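
As a sanity check on the digit output change, here is a rough C model of both
sequences. It is only a sketch: the function names are made up, nonZero is
assumed to be cleared before the first round of the shorter version, and the
first round is assumed to see a value of 0..18 with later rounds seeing 0..9.
Note that DIGIT is in the range 0..9 at label 2 in the shorter version, so the
model adds '0' alone at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the 16-instruction sequence: T flag plus ASCII-valued nonZero. */
size_t emit_original(const unsigned char *d, size_t n, char *out)
{
    size_t len = 0;
    int t = 0;                       /* T flag, cleared by the leading clt */
    unsigned char non_zero = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char digit = d[i], digit2 = 0;
        int borrow;
        assert(digit <= 18);
        do {                         /* 1: inc DIGIT2 $ subi DIGIT,10 $ brcc 1b */
            digit2++;
            borrow = digit < 10;
            digit = (unsigned char)(digit - 10);
        } while (!borrow);
        if (!t) {                    /* brts 2f not taken: first round */
            t = 1;                   /* set */
            digit2 = (unsigned char)(digit2 - (1 - '0'));
            non_zero = digit2;
            if (non_zero != '0')
                out[len++] = (char)digit2;
        }
        digit = (unsigned char)(digit - (-10 - '0'));
        non_zero |= digit;
        if (non_zero != '0')
            out[len++] = (char)digit;
    }
    out[len] = '\0';
    return len;
}

/* Model of the 9-instruction sequence: nonZero is a plain OR of raw digits. */
size_t emit_suggested(const unsigned char *d, size_t n, char *out)
{
    size_t len = 0;
    unsigned char non_zero = 0;      /* assumption: cleared before the loop */
    for (size_t i = 0; i < n; i++) {
        unsigned char digit = d[i];
        assert(digit <= 18);
        if (digit >= 10) {           /* cpi DIGIT,10 $ brcs 2f */
            non_zero = '1';
            out[len++] = (char)non_zero;
            digit = (unsigned char)(digit - 10);
        }
        non_zero |= digit;           /* or sets Z when nothing non-zero yet */
        if (non_zero != 0)
            out[len++] = (char)(digit + '0');
    }
    out[len] = '\0';
    return len;
}

/* Compare both models over every 3-round input with a first value of 0..18. */
int digit_outputs_agree(void)
{
    char a[8], b[8];
    for (unsigned first = 0; first <= 18; first++)
        for (unsigned second = 0; second <= 9; second++)
            for (unsigned third = 0; third <= 9; third++) {
                const unsigned char d[3] = { (unsigned char)first,
                                             (unsigned char)second,
                                             (unsigned char)third };
                emit_original(d, 3, a);
                emit_suggested(d, 3, b);
                if (strcmp(a, b) != 0)
                    return 0;
            }
    return 1;
}
```

Both models leave the buffer empty for an all-zero input, which is the case
the final "lone zero" code has to catch.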
Also, this is just silly:
dec Count
cpse Count, Zero
rjmp .Loop
"dec" sets the zero flag, so that can just be "dec Count $ brne .Loop".
And finally, your multiply loop is wasting two instructions:
mul A0,Ten $ mov A0,r0 $ add A0,Cy $ mov Cy,r1 $ adc Cy,Zero
mov __tmp_reg__,A0
mov A0,A1 $ mov A1,A2 $ mov A2,A3 $ mov A3,A4
mov A4,A5 $ mov A5,A6 $ mov A6,A7 $ mov A7,__tmp_reg__
"mov A0,r0" and "mov __tmp_reg__,A0" are cancelling each other out
and should both be deleted (with the "A0 += Cy" adjusted to add to r0,
of course). Just make it:
mul A0,Ten $ add r0,Cy $ adc r1,Zero $ mov Cy,r1
mov A0,A1 $ mov A1,A2 $ mov A2,A3 $ mov A3,A4
mov A4,A5 $ mov A5,A6 $ mov A6,A7 $ mov A7,r0
That saves 22 bytes (14 in the digit output, 2 for the clt, 2 for the
cpse, 4 in the multiply loop), leaving it at 118 bytes, 6 bytes smaller
than mine. Nice to have available!
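
For anyone who wants to convince themselves that the shortened multiply step
is equivalent, here is a small C model of one loop round in both forms. It is
a sketch under some assumptions: A0 is taken to be the least significant byte,
Cy is taken to start at 0 before the eight rounds, __tmp_reg__ is r0, and the
function names are invented for the model. Under those assumptions, eight
rounds multiply the whole 64-bit value by ten and leave the overflow in Cy.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*step_fn)(uint8_t a[8], uint8_t *cy);

/* One round as originally written, with the two redundant moves. */
void step_original(uint8_t a[8], uint8_t *cy)
{
    unsigned prod = a[0] * 10u;                 /* mul A0,Ten -> r1:r0 */
    uint8_t r0 = (uint8_t)prod;
    uint8_t r1 = (uint8_t)(prod >> 8);
    unsigned sum;
    a[0] = r0;                                  /* mov A0,r0 */
    sum = (unsigned)a[0] + *cy;                 /* add A0,Cy */
    a[0] = (uint8_t)sum;
    *cy = (uint8_t)(r1 + (sum >> 8));           /* mov Cy,r1 $ adc Cy,Zero */
    r0 = a[0];                                  /* mov __tmp_reg__,A0 */
    for (int i = 0; i < 7; i++)                 /* mov A0,A1 ... mov A6,A7 */
        a[i] = a[i + 1];
    a[7] = r0;                                  /* mov A7,__tmp_reg__ */
}

/* One round with the sum kept in r0 and the carry folded into r1. */
void step_suggested(uint8_t a[8], uint8_t *cy)
{
    unsigned prod = a[0] * 10u;                 /* mul A0,Ten -> r1:r0 */
    unsigned sum = (prod & 0xFFu) + *cy;        /* add r0,Cy */
    uint8_t r0 = (uint8_t)sum;
    uint8_t r1 = (uint8_t)((prod >> 8) + (sum >> 8)); /* adc r1,Zero */
    *cy = r1;                                   /* mov Cy,r1 */
    for (int i = 0; i < 7; i++)                 /* mov A0,A1 ... mov A6,A7 */
        a[i] = a[i + 1];
    a[7] = r0;                                  /* mov A7,r0 */
}

/* Run eight rounds over a 64-bit value; Cy is assumed to start at 0. */
uint64_t mul10_bytewise(uint64_t x, uint8_t *carry_out, step_fn step)
{
    uint8_t a[8], cy = 0;
    assert(carry_out && step);
    for (int i = 0; i < 8; i++)
        a[i] = (uint8_t)(x >> (8 * i));
    for (int round = 0; round < 8; round++)
        step(a, &cy);
    x = 0;
    for (int i = 0; i < 8; i++)
        x |= (uint64_t)a[i] << (8 * i);
    *carry_out = cy;
    return x;
}
```

The intermediate value never overflows a byte pair: 255 * 10 + 10 is 2560, so
the carry byte stays at 10 or below from one round to the next.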