If you try translating C into Z80, you'll see that Z80 index registers and stack don't behave quite as you expect. So, let us begin with
Arrays
Suppose you have a standard C construction
```c
int c[10];
for (int i=0; i<10; i++)
    c[i]=0;
```
Your compiler is pretty much required to use a 16-bit value for i. So, you have &c somewhere, maybe even in your index register, so let us have IX=&c. However, operations with index registers only allow constant offsets, which are single signed bytes. So you do not have an instruction to read from (IX + 16-bit value in a register). Thus, you would end up using things like
```
ld ix,c_addr    ; the array address
ld de,(i_addr)  ; the counter value
add ix,de
ld a,0
ld (ix+0),a     ; 14+20+15+7+19 = 75t (per byte)
```
Most compilers will output code that is pretty close to what I wrote. Actually, experienced Z80 programmers know - IX and IY are hopeless for most operations with memory - they are far too slow and awkward. A good compiler writer would probably make his/her compiler do something like
```
ld hl,c_addr    ; the array address
ld de,(i_addr)  ; the counter value
add hl,de
ld a,0
ld (hl),a       ; 10+20+11+7+7 = 55t (per byte)
```
which is about 27% faster without breaking a sweat. Nevertheless, this is far from great Z80 code, even though I made my i variable static to make my - and the compiler's - life easier!
A good Z80 programmer would simply write the equivalent loop as
```
        ld hl,c_addr
        ld b,10
        xor a
loop:   ld (hl),a
        inc hl
        djnz loop
```
The actual full loop takes (7+6+13)*10-5 = 255 t-states, i.e. about 25.5 t-states per byte. And this is really not optimized code; this is the kind of code one writes where optimization does not matter. One can do partial unrolling, or one can make sure that array c does not cross a 256-byte boundary and replace INC HL with INC L. The fastest filling is actually done using the stack. In other words, the Z80 does not fit the C paradigm.
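For completeness, a sketch of the stack-based fill alluded to above (hypothetical labels; interrupts must be disabled, because an interrupt would push its return address into the buffer being filled):

```
        di                  ; an interrupt would push into our buffer
        ld (saved_sp),sp    ; save the real stack pointer
        ld sp,buffer_end    ; PUSH pre-decrements, so start past the top
        ld hl,0             ; fill value, written two bytes at a time
        ld b,128            ; 128 pushes * 2 bytes = 256 bytes
fill:   push hl             ; 11t for two bytes
        djnz fill           ; 11+13 = 24t per 2 bytes ~ 12t/byte
        ld sp,(saved_sp)    ; restore the stack
        ei
```

Fully unrolling the PUSHes removes the DJNZ overhead and brings the average down towards 5.5 t-states per byte, which is where the sub-10t figures for memory filling come from.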
Of course, one can write a similar loop in C (using a pointer instead of an array, and a countdown loop instead of counting up), which would increase the chances of it being translated into decent Z80 code. However, this would not be you writing regular C code; this would be you working around the limitations of C when it is meant to be translated into Z80.
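For illustration, a sketch of what such Z80-friendly C might look like (a hypothetical rewrite of the fill loop; the array is made a byte array here to match the assembly versions above):

```c
unsigned char c[10];

/* Fill c[] by walking a pointer and counting down to zero, the shape
   that at least gives a compiler a chance of mapping the counter onto
   B and the exit test onto DJNZ. */
void fill_down(void)
{
    unsigned char *p = c;
    unsigned char n = 10;
    do {
        *p++ = 0;
    } while (--n != 0);
}
```
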
Let me give you another example.
Local variables.
Raffzahn is correct when he says that one does not have to use the stack for local variables. But there must be a stack of some kind if you want recursive functions. So let us try to do it the PC way, via the stack. How do you implement a call to something like
```c
int inc(int x) { return x+1; }
```
Suppose even that current value for x is in one of your registers, say HL. So, you'd have something like
```
push hl
call addr_inc
...
```
How do we actually recover the address (and value) of x? It is stored at SP+2. However, we have to be careful with SP, because we want to return to the calling program, so maybe we do something like
```
addr_inc:
        ld hl,2
        add hl,sp
        ld e,(hl)
        inc hl
        ld d,(hl)   ; 10+11+7+6+7 = 41t
```
Now we have x in DE. You can see how much work this was.
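To complete the picture, here is one way the rest of the callee and the caller-side cleanup might look (a sketch only; it assumes the result is returned in HL, which is just one possible convention):

```
addr_inc:
        ld hl,2
        add hl,sp       ; HL = address of x on the stack
        ld e,(hl)
        inc hl
        ld d,(hl)       ; DE = x
        ex de,hl
        inc hl          ; HL = x+1, the return value
        ret

        ; caller side:
        push hl         ; pass x
        call addr_inc
        pop bc          ; discard the argument (clobbers BC)
```

Every call thus pays for the push, the stack-relative argument fetch, and the cleanup, which is exactly the overhead the answer is pointing at.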
So, when people complain about C compilers for Z80, they do not mean it would not be possible to do. It is something else entirely. In any kind of programming, there are patterns, some are good, some are not so good. My point is, a lot of things that C does are simply bad patterns from the point of view of Z80 coding. One simply does not do things on Z80 that C pretty much requires you to be fluent at.
answered Mar 28 '18 at 19:45
- IDK about Z80, but if the compiler uses 16-bit for such `i` values then it's a garbage compiler. Most modern compilers for 8-bit microcontrollers know to optimize for those cases when you don't take `i`'s address – phuclv Mar 29 '18 at 4:30
- Re, "Your compiler is pretty much required to use 16-bit value for i." Simply not true. Any modern compiler would be smart enough to know that the values of `i` in your example all fall in the range 0..9, and any modern compiler would be smart enough to allocate whatever register was the most appropriate to hold those values and use them as array indices. The only question is whether any compiler exists with that much smarts and the ability to target the Z80. – Solomon Slow Mar 29 '18 at 14:13
- Compilers already know how to turn array-indexing into pointer-increments, and do so to save a register and to reduce the size of the instruction on x86 (where an index takes an extra byte). Also other advantages, like not breaking micro-fusion on Sandybridge-family, or being able to use the port-7 AGU on Haswell for stores. It's entirely reasonable to expect a compiler to make a loop like your `inc hl`/`djnz loop` for this case where the trip-count is a compile-time constant. Somewhat reasonable otherwise. – Peter Cordes Mar 30 '18 at 3:51
- @phuclv "Most modern compilers for 8-bit microcontrollers know to optimize for those cases when you don't take i's address" -- modern 8-bit microcontrollers typically have somewhere between 32 and 128 general-purpose registers. The Z80 has 6(ish), and 2 of those basically have to be reserved for use as a pointer for almost all nontrivial code. This gives compilers for those architectures a lot more scope to optimize. – Jules Jun 19 '18 at 21:40
The main reason for "historic" CPUs' (non?)-suitability for C programs is the lack of any capability to form an address from more than one register without going through the ALU.

Most modern CPUs can use base + index + offset register addressing modes to address complex data structures like arrays and structures. The Z80 needs to painstakingly go through its 4-bit ALU to add an offset and an index to a base register like HL - most modern CPUs use separate address-calculation units for the various addressing modes.
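To make that concrete, the address arithmetic behind an everyday C expression like `a[i].y` is base + i*size + field-offset. A CPU with base+index+offset (or scaled-index) addressing folds all of this into the load itself; the Z80 must compute it step by step through the ALU. A small illustrative sketch (names invented for the example):

```c
#include <stddef.h>
#include <stdint.h>

struct point { int16_t x, y; };

/* The address a base+index+offset addressing mode computes in one go:
   &a[i].y == (byte*)a + i*sizeof(struct point) + offsetof(struct point, y) */
uint8_t *addr_of_y(struct point *a, unsigned i)
{
    return (uint8_t *)a + i * sizeof(struct point)
                        + offsetof(struct point, y);
}
```
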
Another reason is the lack of real multipurpose registers - you simply cannot do everything with every register on the Z80. Its raw register count is somewhat impressive, but using the alternate register set is probably too complicated for a compiler, and thus the possible choice of registers for a compiler is limited. This is even more true of the 6502, which has even fewer registers.
Yet another downside: you can't get a decently modern C compiler for the Z80 - clang and GCC with their aggressive optimizers don't bother with such old CPUs, and hobbyist efforts are just not that sophisticated. Even if you could, GCC and clang concentrate on optimizing for code locality, something a CPU without a cache can't even benefit from, but which really boosts a modern CPU.
I personally don't think (even non-optimal) compilers would be useless for old CPUs - there is always a lot of stuff in a program that isn't fun to do anyhow and is just tedious to write in assembler (and after all, the only reason why we would still do this is fun, isn't it?). So I tend to write the boring, non-time-critical parts of a program in C, and the other, "fun" parts in assembly. The best of both worlds.
- @LưuVĩnhPhúc What do you consider `LDRLS x,[r1,r0,LSL #2]` then (ARM)? – tofro Mar 29 '18 at 5:14
- I'm not familiar with the ARM ISA, but "Even though the ARM is a RISC architecture, it does not strictly follow the RISC principles as does the MIPS... In addition, it provides a large number of addressing modes and uses a somewhat complex instruction format" – phuclv Mar 29 '18 at 8:55
- ARM is not really a RISC ISA. It's somewhat RISCy, or shares some of their features, like fixed-width instructions (except Thumb2...), but an ISA with an instruction that does anywhere from 1 to 16 loads or stores depending on bits in a bit-field in the instruction is not a RISC. (I'm talking about ARM's `push {r4, r5, r6, ..., lr}` aka STMDB and the corresponding `pop` instruction.) The load/store-multiple instructions are microcoded because they're too complex and do a variable amount of work. – Peter Cordes Mar 30 '18 at 3:22
Quite often people don't know how to use the compilers, or don't fully understand the consequences of the code they write. There is optimization going on in the z80 c compilers, but it's not as complete as, say, gcc's. And I often see people fail to turn up the optimization when they compile.
There is an example here in introspec's post that I am not allowed to comment on due to reputation points:
```c
char i,data[10];
void main(void) {
    for (i=0; i<10; i++)
        data[i]=0;
}
```
There are lots of problems with this code that he is not considering. By declaring i as char, he's possibly making it signed (that is the compiler's discretion). That means that in comparisons the 8-bit quantity may be sign-extended before being compared, because unless you specify otherwise in the code, the c compiler may promote operands to int before doing those comparisons. And by making it global, he makes sure the compiler cannot hold the for-loop index in a register inside the loop.
There are two c compilers in z88dk. One is sccz80, which is the most advanced iteration of Ron Cain's original compiler from the late 70s; it's mostly C90 now. This compiler is not an optimizing compiler - its intention is to generate small code instead. So you will see many compiler primitives being carried out as subroutine calls. The idea behind it is that z88dk provides a substantial c library that is written entirely in asm, so the c compiler is intended to produce glue code while the execution time is spent in hand-written assembler.
The other c compiler is a fork of sdcc called zsdcc. This one has been improved on and produces better & smaller code than sdcc itself does. sdcc is an optimizing compiler but it tends to produce larger code than sccz80 and overuses the z80's index registers. The version in z88dk, zsdcc, fixes many of these sorts of issues and now produces comparable code size to sccz80 when the --opt-code-size switch is used.
This is what I get for the above when I compile using sccz80:
```
zcc +zx -vn -a -clib=new test.c
```
(the -O3 switch is for code size reduction but I prefer the default -O2 most of the time)
```
._main
    ld hl,0 ;const
    ld a,l
    ld (_i),a
    jp i_4
.i_2
    ld hl,_i
    call l_gchar
    inc hl
    ld a,l
    ld (_i),a
    dec hl
.i_4
    ld hl,_i
    call l_gchar
    ld de,10 ;const
    ex de,hl
    call l_lt
    jp nc,i_3
    ld hl,_data
    push hl
    ld hl,_i
    call l_gchar
    pop de
    add hl,de
    ld (hl),#(0 % 256)
    ld l,(hl)
    ld h,0
    jp i_2
.i_3
    ret
```
Here you see the subroutine calls for compiler primitives and the fact the compiler is forced to use memory to hold the for-loop index. "l_lt" is a signed comparison.
A zsdcc compile with optimization turned up:
```
zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node 200000 test.c
```
```
_main:
    ld hl,_i
    ld (hl),0x00
l_main_00102:
    ld hl,(_i)
    ld h,0x00
    ld bc,_data
    add hl,bc
    xor a,a
    ld (hl),a
    ld hl,_i
    ld a,(hl)
    inc a
    ld (hl),a
    sub a,0x0a
    jr C,l_main_00102
    ret
```
By default char is unsigned in zsdcc, and it notices that the comparison "i<10" can be done in 8 bits. C rules say both sides should be promoted to int, but it's ok not to do that if the compiler can figure out that the comparison can be equivalently done another way. When you don't specify that your chars are unsigned, this promotion can lead to insertion of sign-extension code.
If I now make the char explicitly unsigned and declare i inside the for-loop:
```c
unsigned char data[10];
void main(void) {
    for (unsigned char i=0; i<10; i++)
        data[i]=0;
}
```
sccz80 does this:
```
zcc +zx -vn -a -clib=new test.c
```
```
._main
    dec sp
    pop hl
    ld l,#(0 % 256)
    push hl
    jp i_4
.i_2
    ld hl,0 ;const
    add hl,sp
    inc (hl)
.i_4
    ld hl,0 ;const
    add hl,sp
    ld a,(hl)
    cp #(10 % 256)
    jp nc,i_3
    ld de,_data
    ld hl,2-2 ;const
    add hl,sp
    ld l,(hl)
    ld h,0
    add hl,de
    ld (hl),#(0 % 256 % 256)
    ld l,(hl)
    ld h,0
    jp i_2
.i_3
    inc sp
    ret
```
The comparison is now 8-bit and no subroutine calls are used. However, sccz80 cannot put the index i into a register - it does not carry enough information to do that so it instead makes it a stack variable.
The same for zsdcc:
```
zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node 200000 test.c
```
```
_main:
    ld bc,_data+0
    ld e,0x00
l_main_00103:
    ld a, e
    sub a,0x0a
    ret NC
    ld l,e
    ld h,0x00
    add hl, bc
    ld (hl),0x00
    inc e
    jr l_main_00103
```
Comparisons are unsigned and 8-bit. The for loop variable is kept in register E.
What about if we walk the array instead of indexing it?
```c
unsigned char data[10];
void main(void) {
    for (unsigned char *p = data; p != data+10; ++p)
        *p = 0;
}
```
```
zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node 200000 test.c
```
```
_main:
    ld bc,_data
l_main_00103:
    ld a, c
    sub a,+((_data+0x000a) & 0xFF)
    jr NZ,l_main_00116
    ld a, b
    sub a,+((_data+0x000a) / 256)
    jr Z,l_main_00105
l_main_00116:
    xor a, a
    ld (bc), a
    inc bc
    jr l_main_00103
l_main_00105:
    ret
```
The pointer is held in BC, the end condition is a 16-bit comparison and the result is the main loop takes about the same amount of time.
Then the question is why isn't this done with a memset?
```c
#include <string.h>
unsigned char data[10];
void main(void) {
    memset(data, 0, 10);
}
```
```
zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node 200000 test.c
```
```
_main:
    ld b,0x0a
    ld hl,_data
l_main_00103:
    ld (hl),0x00
    inc hl
    djnz l_main_00103
    ret
```
For larger transfers this becomes an inlined ldir.
In general the c compilers cannot currently generate the common z80 cisc instructions ldir, cpir, djnz, etc, though they do in certain circumstances, as shown above. They are also not able to use the exx set. However, the substantial c library that comes with z88dk does make full use of the z80 architecture, so anyone using the library will benefit from asm-level performance (sdcc's own library is written in c, so it is not at the same performance level). Beginner c programmers are usually not using the library, because they're not familiar with it, and that's on top of making performance mistakes when they don't understand how the c maps to the underlying processor.
The c compilers are not able to do everything, however they're not helpless either. To get the best code out, you have to understand the consequences of the kind of c code you write and not just throw something together.
- Lovely answer! I'd like to add here that I made my variable global specifically to conform to the recommendations on the z88dk website: item 2 at z88dk.org/wiki/doku.php?id=optimization I am not using memset intentionally, because there is no ready-made memset for every small loop that you write, so it is the generic behaviour of the compiler on small loops that concerns me. – introspec Mar 31 '18 at 8:45
- "And by making it global, he makes sure the compiler cannot hold the for-loop index in a register inside the loop." Again this is purely a limitation of compilers that don't know how to optimize well. It's not `volatile`, and the compiler can prove the stores into `data[]` don't alias it (because it's also a global array, not a pointer, and the compiler knows that two globals don't overlap each other). So the compiler is allowed to sink the stores to the counter out of the loop and do one store of `10` after the loop. The "as-if" rule allows compile-time reordering of loads/stores. – Peter Cordes Mar 31 '18 at 19:54
- But well spotted, that is a seriously bad way to write code that makes life difficult for compilers. It's disappointing (but not too surprising considering their age) that real Z80 compilers can't do that optimization, or turn simple array indexing into pointer increments. `gcc` could turn the loop into a `memset` call and/or inline known-good memset code :P – Peter Cordes Mar 31 '18 at 19:56
1
– introspec Apr 1 '18 at 10:41
|
Simple answers one easily gets to this question are The Z80 Sucks and C Sucks - depending on which side someone is on. While both are, of course, untrue (*1), there are real issues. Major arguments for both sides are that
- C is at its core tied to a PDP-11(ish) CPU architecture, and the Z80 isn't one.
- The Z80 is a rather special CPU, created with a focus on maximizing capabilities, not beauty.
- C is a language without a runtime, or at best a very minimal one (*2).
All these points are linked. As the question mentioned, C implies a simple and rather symmetric pointer model, which originated in what the PDP-11 offered. This includes the direct conversion of pointers to memory addresses, which in turn allowed skipping the creation of a more sophisticated data model and using pointers to realize functions that would otherwise be handled by some language runtime.
Now the Z80 is (like its predecessor, the 8080) quite able to perform everything needed. Due to its (inherited) structure of a single memory pointer it does, however, need to replace a single (PDP-11 based) C-operation with several machine instructions. So far not a real issue. Except, when an assembly programmer looks at the result, he immediately sees Z80 specific ways to improve the result - like holding two pointers and exchanging HL/DE when needed. That's hard to 'understand' for a C compiler, as it is based on semantics - the knowledge 'why' something is done - not just being told 'how' it's done.
It is not strictly a C problem, but an issue with all high-level languages. They compile best to a simple, symmetric CPU model with a set of equal resources, offering exactly the operations the abstraction layer needs. The higher the language's abstraction, the better the underlying 'CPU' level can perform. That's why the UCSD P-Code System performed so well across many platforms: the offering of its virtual CPU was exactly what a compiler wants. Despite being an interpreter at its core, performance was, on many machines, comparable to native code generated from the same language source. The reason for this platform optimization lies within the interpreter. Here, each rather abstract function gets performed by optimized routines. A string move might have the same invocation (due to the P-Code) across all platforms, but its implementation is CPU-specific, using all the advantages the specific CPU offers - like the mentioned trick on a 6502 of working with 8-bit pointer registers and only adjusting the memory base pointer every 256th cycle. Operating at a greater abstraction in a language allows the compiler and/or runtime to employ greater optimization than fixing low-level detail within the source code.
C, in turn, exaggerates this by being tied to very specific low-level operations and using them all over, in every application source, mostly without an intermediate runtime layer. In this respect, C is far less a high-level language than others, and far more prone to CPU-specific issues.
Learning from History
Looking back (*3), the last 30 years show two developments that bridge the problems of less-than-'simple' CPUs and too-simple languages. The 8086 family is not only an important, but eventually the best, example of changes in CPUs, as it was not a simple CPU at first. Sure, compared to the Z80, it is much more powerful and symmetric - still, not as simple as C assumes it to be.
Over time, the x86 got not only instruction-set additions such as scaling factors, moving array-indexing calculations into microcode, but the whole CPU got redesigned so that instruction sequences are analyzed, reordered and reformed to make C-like operations perform better. Bottom line: the 8086 became more PDP-11ish. One way to close the gap.
At the same time, the C standard's development worked hard to define a common set of data types and functions thereon, which the compiler can now use to get a glimpse of the why instead of the how. These source statements (may) no longer be directly translated into function calls, but be used by the compiler to generate different, more specialized, target-optimized code. In the end, a way to make C a bit more high-level than originally intended.
What's the Lesson for Z80 Users?
Well, one might not use C at all :) (*4)
Another, more practical, way is to go the same path that standard C is doing: Use more task-specific high-level functions and optimize them (in assembly) for the Z80.
The last would be to optimize existing C compilers for the Z80 to generate a more CPU-embracing code structure. For example with different ways of parameter passing depending on functions' use and so on.
BTW: The 6502's short call stack is often cited here, but there is no relation to C. C doesn't require the usage of the return stack for parameters. It can as well be a separate parameter stack. In fact, strictly speaking, C doesn't require a stack at all.
C does require a way of bookkeeping for nested calls, some way of parameter passing (with undefined length) and a way to handle local variables. How this is done is up to the compiler (or its creator). Using some hardware stack is one (simple) way, but not necessarily the best with a given CPU.
*1 - As a 6502 and Assembly guy I do feel deep down they are not false :))
*2 - No, the C-LIB isn't a runtime as part of the language: it is a collection of standard functions, itself (almost) completely written in C, and compiled/linked at compile time.
*3 - Looking back is rather rare in IT, but we are Retrocomputing - we not only play nostalgia but also try to learn from history, don't we?
*4 - A serious choice could be Ada. Due to its declarative nature, code generation can be much better optimized for individual CPUs. After all, it was one of the main goals of Ada's development to be able to produce good code not only for mainframes but also for little bastards like an 8048. There have been several dedicated Z80 compilers during the 1980s; the most prominent may be RR Software's Janus/Ada 83. While no longer mentioned, there was also a Z80 version.
answered Mar 28 '18 at 17:38
- okay but not ADA, Ada. It's a noun, not initials. – Jean-François Fabre Mar 29 '18 at 8:02
- The Z80 sucks a bit and C sucks a bit, but contemporary C compilers sucked a lot. Yesterday I tried compiling a simple C program with Hisoft C on a Spectrum +3. What a pain! And the code sucked. A much better compiler could be developed, but it would take a lot more effort (and be less enjoyable) than just continuing to code in assembler. – Bruce Abbott Mar 29 '18 at 19:33
- @JdeBP I think you are trying to argue with today's compiler technology and philosophy against stuff from 20, 30 years ago. The use of compiler intrinsics for standard features like `memcpy` et al., for example, only started seriously about 10, 15 years ago. So, for today's gcc or clang, you are absolutely right. For a HISOFT C compiler in 1985, quite not so. – tofro Mar 31 '18 at 12:55
The Motorola 6809 is probably the only legacy CPU of the '80s which is well suited to a C compiler, thanks to several features that were advanced for the time:

- orthogonal instruction set
- rich addressing modes
- hardware multiplier, to quickly compute addresses
- position-independent code
This kind of CPU (and the improved 6309) can be found in some home computers (Vectrex, Tandy CoCo, Thomson, ...) and in a lot of embedded systems.
- @Chenmunka that would be a strange argument, as C wasn't an important language back then. Even less a reason to make a CPU fit it. But yes, the 6809 was (much like the 8086) especially designed with high-level languages producing linkable modularized code in mind. – Raffzahn Mar 29 '18 at 20:19
- This could answer the question with a little re-wording. These are features that the 6809 had that made it well suited to C, but what features does the Z80 not have that make it not well suited? – wizzwizz4 ♦ Mar 31 '18 at 8:10
- @ChrisStratton: Perhaps he meant the one 8-bit CPU of that era. Microchip has added some features to some of their 8-bit line in an effort to make them compiler-friendly, though IMHO they made some significant missteps in their design. – supercat Mar 31 '18 at 23:41
Well, I personally find it annoying reading so many comments here about what modern compilers can and cannot easily do. It is terrible what wishful thinking does to your brain. OK. Let me show why people who still remember how to code Z80 hate C compilers. This is a trivial C code that I was hoping to compile:
```c
int i,data[10];
main() {
    for (i=0; i<10; i++)
        data[i]=0;
}
```
This is the Z88DK output using `zcc -O3 -a trivial.c`:
```
._main
    ld hl,0 ;const  ; i=0
    ld (_i),hl
    jp i_5
.i_3
    ld hl,(_i)      ; i++
    inc hl
    ld (_i),hl
    dec hl
.i_5
    ld hl,(_i)      ; if i>=10 GOTO i_4
    ld de,10 ;const
    ex de,hl
    call l_lt
    jp nc,i_4
    ld hl,_data     ; HL = data + i
    push hl
    ld hl,(_i)
    add hl,hl
    pop de
    add hl,de
    ld de,0 ;const  ; (HL) = DE
    ex de,hl
    call l_pint
    jp i_3
.i_4
    ret
```
I am not counting t-states and not including the code for the case when `i` and `data[10]` are declared as `char`, because I do not have a goal to embarrass the compiler authors.
OK, maybe SDCC can do better? At least it can deal with char data type in a sane way. So we create
```c
char i,data[10];
main() {
    for (i=0; i<10; i++)
        data[i]=0;
}
```
and SDCC compiles it using `sdcc -mz80 --opt-code-speed` into
```
;trivial.c:21: for (i=0; i<10; i++)
    ld hl,#_i + 0
    ld (hl), #0x00
    ld bc,#_data+0
00102$:
;trivial.c:22: data[i]=0;
    ld hl,(_i)
    ld h,#0x00
    add hl,bc
    ld (hl),#0x00
;trivial.c:21: for (i=0; i<10; i++)
    ld iy,#_i
    inc 0 (iy)
    ld a,0 (iy)
    sub a, #0x0a
    jr C,00102$
```
So, the addition of char to pointer is done in 16 bits, the index registers are used for some unknown reason, but otherwise this at least begins to look like an assembly program. So, if I ignore the preamble and just count t-states per iteration of the main loop from `00102$`:
16+7+11+10 + 14+23+19+7+12 = 119 t-states per byte
As a comparison, this is what a relatively inefficient assembly version may look like (I wrote this very closely to what my C for-loop implies, so that the compiler at least has a chance of getting it right):
```
        ld hl,data_addr
        ld a,0
loop:   ld (hl),0
        inc hl
        inc a
        cp 10
        jr nz,loop      ; 10+6+4+7+12 = 39t
```
If the counter is allowed to go in the opposite direction, a similar loop in my other answer to this question does the job in 25.5 t-states per byte. The fastest Z80 code for memory filling can average below 10 t-states per byte, but this is not an exercise in memory filling; it is a simple test of what some trivially simple code tends to be compiled into.
So, this is my brutally honest answer to your question why people like myself say that C compilers for Z80 produce poor code: BECAUSE THEY DO.
answered Mar 29 '18 at 20:10
- Just to finish off the thought, presumably if you were writing it yourself you'd store a zero byte then `LDIR` the rest? Without being explicit, it's not likely to be clear to everyone why 119 is a bad number. – Tommy Mar 29 '18 at 20:12
- Actually, it's not too worth getting involved in whether a C compiler should use `LDIR` here, because I think the answer is likely to be: it should, but you should use `memset` or some other overly-specific take on the example when the point is clear as is. But I just meant: to the casual reader, coming along and reading this answer, you assert that the generated code is awful — and I'm not disputing that — but it might be more convincing if you showed non-awful code for comparison. That's all. No dispute as to information and data stated. – Tommy Mar 29 '18 at 20:23
- @introspec: my comments on other answers saying what modern compilers (e.g. for x86) can do were making the same point that you are here. Efficient compilation would be possible given a smart optimizing compiler, so the terrible code-gen from real Z80 compilers is more a result of massive missed optimizations, not of C being inherently impossible to compile efficiently (although C source with multiple pointers used at once would be a problem!) – Peter Cordes Mar 30 '18 at 15:55
- e.g. a Z80 backend for modern gcc or LLVM could do a lot better cross-compiling from a powerful computer (if anyone put in the amount of development time it would take to find target-specific optimizations, too), vs. real historical Z80 compilers. Writing an optimizing compiler is a huge challenge / amount of work. My point was always that compilers could do whatever optimizations (and do for x86 / ARM / whatever), not that any good Z80 compilers exist or could be made easily. – Peter Cordes Mar 30 '18 at 15:58
While the Z80 is definitely an 8-bit processor rather than a 16-bit one, the instruction set makes some operations easier with 16-bit values than with 8-bit values. For example, something like `a=b+c+d;`, with all variables being 16-bit types with static duration, could be realized as:
```
ld hl,(_b)
ld de,(_c)
add hl,de
ld de,(_d)
add hl,de
ld (_a),hl
```
but trying to do it as 8 bits would require a different approach:
```
ld a,(_b)
ld hl,_c
add a,(hl)
ld hl,_d
add a,(hl)
ld (_a),a
```
It's possible to generate efficient code if all operations use 8-bit math or if all use 16-bit math, but 8-bit and 16-bit operations require totally different approaches, and trying to combine them gets awkward (e.g. if `b` and `c` were 16-bit values, but `d` was an 8-bit one, the most efficient way to add `d` would be to load it and the following byte into DE, then clear D, and then add DE to HL). If a compiler wants to try to handle 8-bit math efficiently, it will have to use code generation logic that's very different from what's needed for 16-bit math, and a lot of compiler writers aren't going to want to massively increase the size of their code generator for that.
- Interesting elements, but the answer requires proofreading. A typo misses a closing parenthesis in the first code paragraph. In the second paragraph, `mov` is not a Z80 asm keyword, and there's no addition at all, so that code can't do what it's supposed to do. Can you clarify? Thanks. – Stéphane Gourichon Sep 17 '18 at 9:16
- @StéphaneGourichon: Does that make more sense? – supercat Sep 17 '18 at 14:46
- That's better, yet it addresses only 2 of the 3 items in my comment: `mov` is not part of the usual Z80 ASM syntax. The second paragraph of code still does not make sense. Only one Z80 ADD operation cannot add b+c+d. – Stéphane Gourichon Sep 17 '18 at 15:04
- @StéphaneGourichon: Incidentally, after writing the answer above, I discovered that while the Z80 has some 8-bit and even 16-bit internal data paths, its primary ALU is only 4 bits. An instruction like `INC HL` uses a 16-bit limited-purpose ALU which takes two cycles to perform an operation, but `INC HL` takes six cycles because that ALU gets used twice during each instruction fetch (once to increment PC, and once to increment R), requiring that the two cycles actually performing the operation get added to that. – supercat Sep 17 '18 at 20:12
- @StéphaneGourichon: Something like `INC A` actually requires using the four-bit ALU twice, but it's faster than `INC HL` because both operations can be done at the same time as the 16-bit ALU is being used to increment PC and R. – supercat Sep 17 '18 at 20:13
The answer to this question is bound to be opinion-based anyway, and would best be written by a specialist who has designed a Z80 C compiler. I will give it a try though.
I used the MSX-C compiler, made by ASCII together with Microsoft, back in the old '80s-'90s days; the platform was MSX. I do not recall if it used the stack to pass arguments; however, it would be logical, given that the compiler can use IX and IY, assigning them to the stack pointer and addressing arguments by byte through (IX+n). I am fairly sure the Turbo C 2.0 for PC XT/AT that I used back in the '90s did the same using register BP.
One remarkable thing I recall from using MSX-C was that its output was not Z80 code, but 8080 code. Most probably the compiler was originally designed for the 8080 and then just ported to Z80, and thus it was not aware of the IX and IY registers.
Regarding the (IX+n) and (IY+n) instructions: n is a signed byte, thus you can address -128 to +127 from the base of the index register. Note that n must be a constant; still, changing it at run time is possible when the code is in RAM, by patching the displacement byte of the executable code - another level of optimization which most probably was not considered in those old days.
So what are the reasons C fits badly
My personal opinion:
- For old compiler software developed back in the day, compiler developers were focusing on (1) the reliability of the compiler's job and (2) the speed of compilation, while also keeping in mind that (3) the register set is not big enough to allow much optimization.
- New compiler software must either be developed by real enthusiasts who are also experts in compilers (which is, to my knowledge, a special field in computing), or there must be commercial interest (questionable whether that is possible these days).
In general I would like to see an example. MSX-C did its job in four steps (yes, four!):
- CF.COM parsed the C code, creating an intermediate output file;
- CG.COM was the "code generator", which generated an assembly-language text file;
- M80.COM created the .REL object file, which was then
- linked by L80 with other object code (e.g. libraries).
There are pros and cons to this architecture, and there are also historical reasons for it. CF and CG are about 30-40KB each, so you cannot "merge" them into one executable because it would simply not fit into RAM (not to mention the work area); M80 used human-readable assembly text files, so the programmer had an opportunity to look at the assembly code, get an idea of what the real executable would look like and what s/he could do to improve it, or inject his/her own assembler routines at the linking stage.