PISC – Glaring Deficiencies

In the source article for my PISC project, “A Minimal TTL Processor for Architecture Exploration” by Bradford J. Rodriguez, Brad wrote the following:

Glaring Deficiencies

Many weaknesses of the PISC become evident after a short period of use, including:

a) no conditional branch microinstruction — an important need [6];
b) no provision for literal values in the microinstruction;
c) no ALU logic for multiply, divide, and right shift;
d) no logic for decoding of macroinstructions;
e) no provision for interrupts;
f) sparse coding of the ALU function select; and
g) two clocks required per microinstruction.

I have been successful in correcting a good number of these “deficiencies”, in many cases by using Brad’s own suggested solution from the original article. So, as a starting point for documenting the changes I made between the original PISC 1.0a circuit and my PISC 1.0c derivative, I’ll outline which of these have been solved and which remain “Glaring”. I’ll also offer some of my thoughts on the need for each of these features.

Glaring Deficiency “A” – No conditional branch micro-instruction := SOLVED

Yeah, this is a biggie. That said, I’ll quickly point out that I have discovered it is not totally impossible to write a useful program without such an instruction. It turns out that some very early computers built just after WWII didn’t have one either, so the early pioneers of computer programming resorted to self-modifying code, where the program would overwrite the jump address in store. Just the thought of coding like that makes my head hurt!

So I provided some circuitry that inhibits the write-back of the ALU output to the Register File depending on the state of the selected flag (EQ or CY). This essentially sets up the conditions required for an instruction like this:-

MOV      R7, $0200     IF CY

Translated, this copies address $0200 into register R7, which is used as the Program Counter (PC), but only if the CY flag is set. If the CY flag is not set, the move operation (really a copy) is inhibited. The result is a conditional jump.
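The write-back inhibit can be sketched as a behavioural model in Python. This is not the actual circuit: the register list, the `flags` dictionary and the `conditional_mov` helper are hypothetical names chosen for illustration. The point it shows is that the move itself always happens, and only the write into the register file depends on the selected flag.

```python
# Behavioural sketch of a conditional MOV: the value is always produced,
# but the write-back to the register file is inhibited unless the selected
# flag is set. Names are illustrative, not taken from the PISC schematic.

PC = 7  # R7 serves as the program counter


def conditional_mov(regs, flags, dest, value, cond=None):
    """Write value into regs[dest] only if cond is None or flags[cond] is true."""
    write_enable = cond is None or flags[cond]
    if write_enable:
        regs[dest] = value & 0xFFFF  # registers are 16 bits wide
    return write_enable


regs = [0] * 8
regs[PC] = 0x0100

# MOV R7, $0200 IF CY with CY clear: write inhibited, execution falls through.
conditional_mov(regs, {"EQ": False, "CY": False}, PC, 0x0200, "CY")
print(hex(regs[PC]))  # 0x100

# Same instruction with CY set: R7 is loaded, so the jump is taken.
conditional_mov(regs, {"EQ": False, "CY": True}, PC, 0x0200, "CY")
print(hex(regs[PC]))  # 0x200
```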

Circuit details to follow.

Glaring Deficiency “B” – No provision for literal values in the micro-instruction := REMAINS (do we care?)

I started off thinking that this was a pretty important issue, and also that it should be pretty easy to solve. After spending a lot of time considering various ways to add this functionality, I eventually came to the conclusion that there probably wasn’t much point.

Any circuit that I could dream up was overly expensive in terms of complexity and chip count, and I had already added one too many ICs to Brad’s otherwise elegant design. But the real kicker was that I could not find a sensible way to do this in a single execute cycle. It would need at least two.

Now here’s the thing. A stock standard PISC v1.0a can already load a 16 bit literal value into a register in a single execute cycle. The only dubious aspect is that the literal needs an entire word of memory, stored immediately after the load instruction itself. Granted, most of the literals programmers use are apparently quite small, so it would be a more efficient use of memory if we could load, say, 8 bits (0-255) of data alongside an 8 bit instruction in a one word opcode. But memory in my PISC really isn’t in short supply (128K ROM and 128K RAM, in 16 bit words), so the hardware complexity required to add the feature just didn’t seem worth the effort.

Since making this decision I’ve done a fair bit of coding and kept a concerned eye on the resulting object code sizes. Yep, they are most likely larger than the same project done in Z80 or 6502 assembler, but not orders of magnitude larger. So I stopped worrying about it.

Later I started coding in PL/0+, which generates byte-code (which should probably be called word-code in the context of PISC). Because my PL/0+ word-code is interpreted by a virtual machine, I can encode small literals into a single word along with an instruction, and the code density of the PL/0+ object files is very high. The trade-off is reduced execution speed in exchange for increased code density. Problem, if there ever was one, solved.
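For illustration only, here is one way a word-code VM could pack a small literal alongside an instruction in a single 16 bit word. The layout (opcode in the high byte, literal in the low byte), the `PUSH_LIT` opcode value and the helper names are all assumptions. The actual PL/0+ encoding is not described here.

```python
# Hypothetical word-code layout: high byte = opcode, low byte = 8-bit literal.
# This is NOT the documented PL/0+ format, just an illustration of the idea.

LIT_MAX = 0xFF


def encode(opcode, literal=0):
    """Pack an 8-bit opcode and an 8-bit literal into one 16-bit word."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= literal <= LIT_MAX:
        raise ValueError("literal must fit in 8 bits; use a two-word form instead")
    return (opcode << 8) | literal


def decode(word):
    """Split a 16-bit word back into (opcode, literal)."""
    return (word >> 8) & 0xFF, word & 0xFF


def words_needed(literal):
    """One word for small literals, opcode word plus literal word otherwise."""
    return 1 if 0 <= literal <= LIT_MAX else 2


PUSH_LIT = 0x12  # made-up opcode number
w = encode(PUSH_LIT, 42)
print(hex(w), decode(w))  # 0x122a (18, 42)
```

Under this assumed layout, any literal from 0 to 255 costs one word instead of two, which is where the code-density gain comes from.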

Glaring Deficiency “C” – No ALU logic for multiply, divide, and right shift := SOLVED (Partially)

I have added a simple Shift Register board in the data path between the A-Bus and the “A” input to the ALU. This is just a small bunch of 74ALS245 buffers wired up to select one of four possible outcomes:

  1. Data path normal.
  2. Data is logically shifted right one bit.
  3. Data is arithmetically shifted right (a variant of case 2 requiring no additional buffers).
  4. High byte swapped for low byte.

So I have solved the issue of the missing right shift. The high-for-low byte swap in one cycle makes coding for those pesky ASCII bytes so much easier, and it is the only hardware nod my PISC gives to byte-sized chunks of data.
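The four data paths through the shifter board can be modelled on 16 bit values as below. It is a behavioural sketch rather than a wiring diagram, and the mode names are hypothetical. In the model, the arithmetic shift differs from the logical shift only in where bit 15 comes from. That fits the description of case 3 as a variant of case 2.

```python
MASK = 0xFFFF


def shifter(value, mode):
    """Model of the four paths through the A-bus shifter board (16-bit data)."""
    value &= MASK
    if mode == "pass":   # data path normal
        return value
    if mode == "lsr":    # logical shift right: a 0 enters bit 15
        return value >> 1
    if mode == "asr":    # arithmetic shift right: bit 15 (sign) is kept
        return (value >> 1) | (value & 0x8000)
    if mode == "swap":   # high byte exchanged with low byte
        return ((value << 8) | (value >> 8)) & MASK
    raise ValueError(f"unknown mode {mode!r}")


for m in ("pass", "lsr", "asr", "swap"):
    print(m, hex(shifter(0x8421, m)))
# pass 0x8421
# lsr 0x4210
# asr 0xc210
# swap 0x2184
```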

As for the missing Multiply and Divide instructions? Well, I seem to remember that the Z80 and 6502 didn’t have any either. In fact I don’t think Intel gave us these until the 8086! So I just coded my own routines 🙂
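The original routines are not reproduced here. On a machine with only add, shift and bit-test operations, though, the classic approach is shift-and-add multiplication and restoring (shift-and-subtract) division. The following is a generic Python sketch of those two algorithms on unsigned 16 bit operands, not a transcription of the PISC code.

```python
def mul16(a, b):
    """Unsigned 16x16 -> 32-bit shift-and-add multiply.

    Uses only operations a minimal CPU has: add, shift left (a value added
    to itself), logical shift right, and a test of the low bit.
    """
    a &= 0xFFFF
    b &= 0xFFFF
    product = 0
    while b:
        if b & 1:          # low bit of the multiplier set?
            product += a   # ...then add the shifted multiplicand
        a += a             # shift multiplicand left by one (a + a)
        b >>= 1            # shift multiplier right by one
    return product & 0xFFFFFFFF


def divmod16(dividend, divisor):
    """Unsigned 16-bit restoring division: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    quotient = remainder = 0
    for bit in range(15, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:   # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(mul16(1234, 56))       # 69104
print(divmod16(1000, 7))     # (142, 6)
```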

Shift Register Circuit coming soon…

Glaring Deficiency “D” – No logic for decoding of macro-instructions := REMAINS (at the hardware level)

Yes, PISC has no micro-code and no micro-code sequencer. This keeps the hardware solution so miraculously, elegantly simple. So for those instructions where it would be really nice to have a “macro-instruction” I just coded this into the Assembler. So the assembler now offloads the complexity of sequencing several instructions together.

It turns out there aren’t that many instructions where I needed to do this. CALL, RET, PUSH and POP come to mind quickly, but after that I’m struggling to think of another example. To give a quick illustration of what I’m talking about, if I were to code:

RET

My PISC Assembler would generate something like:

00A3: ACEF 328: RET { rdd r5,r4 }
00A4: 2080 + inc r4
00A5: 349A + jmp r4

These are the three instructions required for a return:

  1. Read the word at the address held in the R5 stack pointer into R4, then post-decrement R5.
    (The stack grows upwards in PISC.)
  2. Increment R4 so that it points to the location just after the original CALL instruction.
  3. Jump to the address held in R4.
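The RET expansion can be modelled step by step. The model also shows how the clock count adds up: three instructions, each needing one Fetch and one Execute clock. The memory contents and stack address below are made up. What exactly CALL pushes (and therefore why the increment is needed) is assumed rather than taken from the listing.

```python
# Model of the three-instruction RET expansion:
#   rdd r5,r4   read word at [R5] into R4, then post-decrement R5
#   inc r4      step past the word saved by CALL (assumed CALL behaviour)
#   jmp r4      copy R4 into the program counter R7

CLOCKS_PER_INSTRUCTION = 2  # one Fetch clock + one Execute clock


def ret(mem, regs):
    """Apply the RET sequence to mem/regs and return the clocks used."""
    clocks = 0
    regs[4] = mem[regs[5]]                # rdd r5,r4 (read...)
    regs[5] = (regs[5] - 1) & 0xFFFF      # ...post-decrement; stack grows upwards
    clocks += CLOCKS_PER_INSTRUCTION
    regs[4] = (regs[4] + 1) & 0xFFFF      # inc r4
    clocks += CLOCKS_PER_INSTRUCTION
    regs[7] = regs[4]                     # jmp r4: R7 is the PC
    clocks += CLOCKS_PER_INSTRUCTION
    return clocks


mem = {0x8010: 0x0041}   # hypothetical stack slot holding a saved address
regs = [0] * 8
regs[5] = 0x8010         # R5 = stack pointer, pointing at the top entry
print(ret(mem, regs), hex(regs[7]), hex(regs[5]))  # 6 0x42 0x800f
```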

Of course, all this would normally take place inside the CPU, but it would be a similar process in silicon and would likely take a similar number of cycles. In fact, when I quickly checked the number of clock cycles required by my CALL and RET instructions, they seemed more or less on par with a 6502 and significantly better than a Z80. By way of a quick example, here are the clock cycles required by the return instruction on each of these three architectures:

Arch.   Mnemonic   Cycles
PISC    RET        6      (includes the Fetch cycles)
Z80     RET        10
6502    RTS        6

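(The Z80 needs more cycles mainly because each of its machine cycles spans several clocks: a RET is a 4 clock opcode fetch followed by two 3 clock stack reads. Its 4 bit internal ALU plays little part here.)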

Of course, we have once again traded hardware simplicity for increased code size. But writing the assembler source code is no more difficult, and the resulting object code is just as fast in terms of clock cycles (or faster).

Bottom line? So long as one has sufficient memory it would seem that not having the microcode baked into silicon is not such a bad thing and it sure makes it easy to change macro-instruction definitions.

Glaring Deficiency “E” – No provision for interrupts := SOLVED

Solved by implementing one of Brad’s own suggestions from the original article. Additional hardware allows the I/O cards on the 8 bit expansion bus to raise a single shared interrupt. On detection of the interrupt, the machine switches from using R7 as the Program Counter to using R6.

This means that R6 has to be pre-loaded with the memory address of the Interrupt Service Routine (ISR) before interrupts are enabled. Once the ISR has run its course, R6 needs to be positioned back at the start of the ISR, ready for the next interrupt. Then, on completion, we switch back to using R7. Interrupt priority is determined by the slot number the card is installed in.
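Here is a minimal model of the R7/R6 program counter switch. It assumes a simplified `end_isr` step that both rewinds R6 and hands control back to R7. Flag saving, slot priority and the enable/clear logic are left out, and the class and method names are hypothetical.

```python
class PiscPcModel:
    """Hypothetical model of R7/R6 program-counter switching on interrupt."""

    def __init__(self, isr_start):
        self.regs = [0] * 8
        self.isr_start = isr_start
        self.regs[6] = isr_start   # R6 pre-loaded with the ISR address
        self.in_isr = False
        self.irq_enabled = True

    @property
    def pc_reg(self):
        """Index of the register currently acting as the program counter."""
        return 6 if self.in_isr else 7

    def fetch(self):
        """Return the fetch address and advance whichever register is the PC."""
        addr = self.regs[self.pc_reg]
        self.regs[self.pc_reg] = (addr + 1) & 0xFFFF
        return addr

    def raise_irq(self):
        if self.irq_enabled and not self.in_isr:
            self.in_isr = True     # R6 becomes the PC; R7 is left untouched

    def end_isr(self):
        self.regs[6] = self.isr_start  # rewind R6 for the next interrupt
        self.in_isr = False            # R7 resumes as the PC


cpu = PiscPcModel(isr_start=0x0100)
cpu.regs[7] = 0x2000
cpu.fetch()
cpu.fetch()                  # main program at 0x2000, 0x2001
cpu.raise_irq()
print(hex(cpu.fetch()))      # 0x100 (first ISR word)
cpu.end_isr()
print(hex(cpu.fetch()))      # 0x2002 (main program resumes)
```

Because R7 is never touched while R6 is the PC, the interrupted program's return address is preserved automatically. The CPU flags, by contrast, are not preserved this way, which is why saving them needs separate care.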

Sounds simple? It wasn’t! In fact this was the hardest part of the entire project. At several dark moments I almost gave up in despair of ever getting it working. After all, my Monitor program was working just fine without darn interrupts. Does a single-user, non-multitasking machine really need them?

It was quite a challenge working out how to save the CPU flags. Everything, both hardware and software, needed to be spot on before it would ‘fly’. I spent much time debugging hardware when in fact the particular problem I was chasing was a software bug. We got there in the end and I don’t regret the time spent. It really is quite nice to press Ctrl-C and have the machine jump back into the BIOS (without having to poll the keyboard). The things we take for granted when using PCs!

Glaring Deficiency “F” – Sparse coding of the ALU function select := SOLVED (sort of)

I took Brad’s own suggestion of adding a single 74138, which provides eight supplementary control signals. I use the 3 bits in the control word that normally set the D-Reg (Data Bus Register) value as the select inputs to the 74138, and I inhibit both memory and register file read/write operations for this special “Control” function cycle.

I encode this “Control” function into the instruction by setting both MRD (Memory Read) and MWR (Memory Write) at the same time, a combination that would normally be quite illogical (if not damaging).

So I have not changed the actual ALU encoding as such, but I have managed to squeeze in 8 additional functions. They are:-

  1. Logical Shift Right
  2. Swap high byte for low byte
  3. Invert the Flags on the next cycle
  4. Memory Banking
  5. IRQ Enable/Disable
  6. IRQ Clear
  7. Arithmetic Shift Right
  8. Halt
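The Control cycle decode can be sketched as follows. Two things are assumptions. First, the D-Reg select value 0 to 7 is taken to map onto the functions in the order listed above. Second, MRD and MWR are represented as simple "asserted" booleans rather than active-low signals. The 74138 is modelled with its enables asserted, giving one active-low output per select value.

```python
# Order assumed to follow the article's list (select value 0..7).
CONTROL_FUNCTIONS = [
    "Logical Shift Right",
    "Swap high byte for low byte",
    "Invert the Flags on the next cycle",
    "Memory Banking",
    "IRQ Enable/Disable",
    "IRQ Clear",
    "Arithmetic Shift Right",
    "Halt",
]


def decode_74138(select):
    """One-hot, active-low outputs Y0/..Y7/ of a 74138 with its enables asserted."""
    return [0 if i == select else 1 for i in range(8)]


def decode_cycle(mrd, mwr, dreg_field):
    """mrd/mwr are True when asserted. Returns the control function, or None.

    In a Control cycle the real hardware also inhibits memory and
    register-file reads/writes; that part is not modelled here.
    """
    if mrd and mwr:                       # "illegal" combination reused as Control
        outputs = decode_74138(dreg_field & 0b111)
        return CONTROL_FUNCTIONS[outputs.index(0)]
    return None                           # ordinary cycle: D-Reg field used normally


print(decode_cycle(True, True, 0b111))    # Halt
print(decode_cycle(True, False, 0b111))   # None
```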

Glaring Deficiency “G” – Two clocks required per micro-instruction := REMAINS

PISC runs in two cycles: Fetch and Execute. In the Fetch cycle a fixed, hardware-encoded instruction is executed, which loads the next program instruction from memory into a special Instruction Register. On the next clock cycle, the Execute cycle, the instruction held in the Instruction Register is executed.

So 50% of the time PISC is not executing your program code but a fetch instruction to get your next program instruction. Not very efficient 🙁  But it sure is simple! 🙂
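The cost of the two-phase cycle is easy to put in numbers. Every program instruction takes two clocks, so a 5 MHz PISC executes at most 2.5 million instructions per second. The tiny loop below is a hypothetical model of that Fetch/Execute rhythm, not a description of the real sequencing logic.

```python
def instructions_per_second(clock_hz, clocks_per_instruction=2):
    """Fetch + Execute: every program instruction costs two clocks."""
    return clock_hz / clocks_per_instruction


def fetch_execute(program, steps):
    """Two-phase loop: 'ir' is the Instruction Register, 'pc' stands in for R7."""
    pc, ir, clocks, trace = 0, None, 0, []
    for _ in range(steps):
        ir = program[pc]     # Fetch clock: fixed hardware load into IR
        pc += 1
        clocks += 1
        trace.append(ir)     # Execute clock: run whatever is in IR (here just recorded)
        clocks += 1
    return trace, clocks


print(instructions_per_second(5_000_000))  # 2500000.0
```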

And besides, no machine of mine is going to have this “delayed branch” NOP instruction nonsense littered all throughout otherwise perfectly clean and logical code 😉

PISC – Addendum and Erratum

My original memory I/O board PISC 1.0a

During the build process of my PISC I discovered a few issues with the original circuit diagram, which can be found here: https://bradrodriguez.com/papers/pisc.pdf

I’ll now document these discoveries.

One should perhaps mention (again) that this is largely of academic interest only, as duplicating this project would require two key integrated circuits, namely 4 x 74181 ALUs and 8 x 74172 register file chips. Sadly, both have been out of production for so long that they are nearly impossible to source.

But for the sake of completeness…

1. The 74LS541 used as the 8 bit parallel input port on the Memory board (page three) is drawn in reverse. The data output pins (Y1-Y8) of the ‘541 should face inwards, connected to the internal data bus, and not outwards as shown.

2. On page one of the circuit diagram set, the “ALU” circuit shows the “A=B” pull-up resistor to VCC (designated only as R?) as 10K ohms. In practice I found that this value worked properly while clocking the circuit with a test signal in the sub-kilohertz range, but as soon as the clock was raised to 5 MHz the “A=B” signal could not be latched reliably. Reducing this resistor to 4.7K ohms solved the problem. This issue could be build-specific, of course.
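One plausible explanation for the resistor issue is the RC rise time of the A=B line. On the 74181 the A=B output is open-collector, so the low-to-high edge comes only from the pull-up resistor charging whatever capacitance hangs on that node. The sketch below assumes a 50 pF load, which is a guess rather than a measurement. It also ignores the ALU's own propagation delay, which eats into the same clock period.

```python
import math

V_CC = 5.0
V_IH = 2.0        # minimum TTL input-high voltage
C_LOAD = 50e-12   # ASSUMED capacitance on the A=B node; not measured


def rise_time_to_vih(r_pullup, c_load=C_LOAD):
    """Time (s) for an RC pull-up to charge from 0 V to V_IH (ignores V_OL)."""
    return r_pullup * c_load * math.log(V_CC / (V_CC - V_IH))


clock_period = 1 / 5e6   # 200 ns at 5 MHz
for r in (10e3, 4.7e3):
    t = rise_time_to_vih(r)
    print(f"{r/1e3:.1f}k: {t*1e9:.0f} ns to reach V_IH "
          f"(clock period {clock_period*1e9:.0f} ns)")
# 10.0k: 255 ns to reach V_IH (clock period 200 ns)
# 4.7k: 120 ns to reach V_IH (clock period 200 ns)
```

With these assumed numbers, the 10K pull-up cannot even reach a valid logic high within one 5 MHz clock period, while 4.7K roughly halves the rise time. That is consistent with what was observed, though the real margin depends on the actual board.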

3. The legacy 2651 “Programmable Communications Interface” IC used for the serial port is just not fast enough, at least not when the system is clocked at 5 MHz. At 5 MHz we have a clock period of 200ns. My 2651 datasheet shows that after the 2651 CE/ pin is asserted, the “Data Delay Time for Read (tDD)” is 250ns worst case. So we would be somewhat “over-clocking” the 2651 even if we had the entire 200ns.

We don’t. The SIO/ select line to the 2651 CE/ pin is gated via a 74F139 using the CLK signal, which means that SIO/ is only active for the second half of the clock cycle, in other words for just 100ns. The 2651 cannot operate successfully inside this 100ns window.

So just how fast can the 2651 actually run?

By replacing the master crystal with a pulse generator I was able to determine experimentally that my particular 2651 sample would run up to about 2.89 MHz before it started dropping characters. This equates to a clock period of 346.02ns, or a 2651 CE/ pulse width of about 173ns. So while still “over-clocking” according to the datasheet, it looks like we could get away with 200ns. My short term fix was therefore simply to halve the clock speed. Changing the master crystal from 10 MHz to 5 MHz resulted in a 2.5 MHz system clock with a 400ns period, half of which is 200ns, and that kept the 2651 entirely happy.
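The timing argument above reduces to a little arithmetic. Because CE/ is gated with CLK, the 2651 sees only half of each system clock period. This Python sketch uses the 250ns worst-case tDD and the 173ns experimental figure quoted above. The helper name is hypothetical.

```python
T_DD_MAX_NS = 250      # 2651 read data delay, worst case (datasheet figure quoted above)
MEASURED_OK_NS = 173   # CE/ width at which this particular 2651 sample still worked


def ce_window_ns(system_clock_hz):
    """CE/ is gated by CLK, so it is active for half of each clock period."""
    period_ns = 1e9 / system_clock_hz
    return period_ns / 2


for label, f in [("10 MHz crystal -> 5 MHz", 5e6),
                 ("measured limit", 2.89e6),
                 ("5 MHz crystal -> 2.5 MHz", 2.5e6)]:
    w = ce_window_ns(f)
    print(f"{label}: CE/ window {w:.0f} ns, "
          f"meets datasheet: {w >= T_DD_MAX_NS}, "
          f"meets measured limit: {w >= MEASURED_OK_NS}")
```

With the 2.5 MHz clock the window is 200ns. That is still short of the datasheet's 250ns, but above the roughly 173ns at which this particular 2651 started dropping characters.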

My PISC ran this way for well over a year, rock solid, never missing a beat. More recently I created a Wait State board which inserts a single wait state when accessing I/O devices on the 8 bit expansion bus. Now it runs at the full 5 MHz, apart from the wait states inserted during I/O.

4. While we are still on the subject of serial ports and the 2651, there is a conceptual problem with the way the 2651 serial port has been interfaced. In the original PISC 1.0a schematic, the deliberately simplified device selection scheme makes the 2651 chip enable signal CE/ active whenever address lines A15 and A14 are low and high respectively. This means that the serial port will respond to any address in an entire 16K block, from $4000 to $7FFF. That in itself is not the issue.

The problem is that PISC uses the Address Bus, otherwise known as the A-Register Bus, not only for memory addressing but also for all internal ALU computations. So any ALU operation that happens to set A-Register bits B15 low and B14 high will trigger the CE/ line of the 2651. Now you might expect this to result in a bus contention clash. It does not. The PISC Memory I/O board is tri-state buffered with two 74ALS245s, and data from the Memory I/O board is only routed onto the internal system bus when MRD/ is asserted, which does not happen during an ALU operation. So there is no bus contention issue. But the 2651 is still activated for a Read operation.

Why?

The 2651 is controlled by a chip enable, CE/, and a single RD/WR control pin. The chip has no separate Read and Write strobes; the level on that one pin selects the direction of the transfer. The circuit shows this pin connected to the MWR signal (high on write), so this arrangement effectively places the 2651 in Read mode by default. Taken together, this means the 2651 drops any character held in its single character receive buffer each time an ALU operation happens to set A-Reg B15 low and B14 high.

The character is placed on the Memory I/O card’s local data bus by the 2651 and then simply ‘lost’. It is a most unfortunate state of affairs, and one that somewhat hampers software development of file transfer protocols 🙂

The solution to this conundrum?

It was fairly trivial to rearrange the circuit so that the CE/ signal to the 2651 is only enabled when the PISC is genuinely performing either a Read (MRD/) or a Write (MWR/) operation, and this is exactly what I did.
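The difference between the original and the corrected select logic can be expressed as two Boolean functions. This is a logic-level sketch, not the actual gate arrangement used on the board. Here `mrd` and `mwr` mean the read and write strobes are asserted (the real MRD/ and MWR/ are active low), and the function names are hypothetical.

```python
def sio_ce_original(a_bus, mrd, mwr):
    """Original decode: address only (A15 = 0, A14 = 1), strobes ignored.

    Fires on ANY A-bus value in $4000-$7FFF, including ALU results that
    never touch memory.
    """
    return (a_bus & 0xC000) == 0x4000


def sio_ce_fixed(a_bus, mrd, mwr):
    """Corrected decode: same address range, but only during a real read or write."""
    return (a_bus & 0xC000) == 0x4000 and (mrd or mwr)


# An ALU operation whose result happens to be $5A5A: no MRD, no MWR.
alu_result = 0x5A5A
print(sio_ce_original(alu_result, mrd=False, mwr=False))  # True: stray 2651 read
print(sio_ce_fixed(alu_result, mrd=False, mwr=False))     # False
```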

PISC – A Minimal TTL Processor for Architecture Exploration.

PISC – Pathetic Instruction Set Computer

Some time back now (nearly two years, if recollection serves) I was web surfing for information about that most arcane of computer programming languages, “Forth”. As so often happens when surfing, I stumbled across something only loosely related but of particular interest, to me at least.

It was a paper by Bradford J. Rodriguez entitled “A Minimal TTL Processor for Architecture Exploration”: a design for a complete computer, albeit one with limited capabilities, built from approximately 20 odd TTL chips but with NO conventional microprocessor or CPU. The TTL chips themselves form the actual CPU.

Having built several home brew kit computers, mostly with Zilog Z80 CPUs, I found this paper fascinating. The target audience was clearly intended to be educators in Computer Science. But while the paper was very concise, it was accompanied by a set of three beautifully presented circuit diagrams. So I read and re-read the paper while studying the circuits, and eventually the whole picture started to make sense. I thought that I could probably build this.

Small problem. Two key ingredients, the 74181 Arithmetic Logic Unit (ALU) chips and the 74172 Register File chip, were both now so extinct as to be classified as “unobtainium”. Undeterred, I went looking for a source for both anyway, and to my surprise I found it wasn’t that hard. Perhaps I got lucky, as one of my preferred electrical parts recyclers here in Australia miraculously had enough stock of both chips to commence construction.

So we set about building PISC. The “Pathetic Instruction Set Computer”.

Much more information is coming soon. For now, however, I’ll sign off with links to three Vimeo videos I created showing PISC at various stages.

PISC running its first complex program.

PISC board overview

PISC Monitor Program Demo