PISC – Conditional Jumps

The first of the “Glaring Deficiencies” in need of a solution was the missing conditional branch (jump). This is how I went about solving it. Firstly a review of how PISC encodes an instruction:-

                    IR 15:0
   MRD    MWR    ARG    DRG    LTS/   CY-IN   ALU
   1      0      111    xxx    1      11      0-0000

A sixteen bit instruction where:-

MRD   = Memory/IO Read

MWR   = Memory/IO Write

ARG   = Address Bus Register Select x 3 bits (ALU A Input)

DRG   = Data Bus Register Select x 3 bits (ALU B Input)

LTS/  = Latch Status (Both CY & EQ) when low

CY-IN = Carry In Signal Select x 2 bits

ALU   = ALU Function x 5 bits

The actual values above are taken from Brad’s original paper and are in fact the hard coded Fetch Instruction. The data bus register select is shown as “xxx” for “don’t care” because during the Fetch Cycle the normal write to Register File (port C-in) is inhibited and the data bus contents are instead latched into the “Instruction Register”. If one looks to the actual circuit diagram for PISC 1.0a we can see that this register select happens to be wired all high (R7). So the actual Fetch Instruction is:-

10111111 11100000 or $BFE0

This instruction translates to:-

  1. Read memory (destination is the Instruction Latch during fetch cycle).
  2. Address Register set to R7 (the Program Counter).
  3. Do NOT latch the status bits (high)
  4. CY-IN Select = ’11’ which is a numerical one “1” indicated by a logic low.
  5. The five 74181 ALU Function Bits correspond to “Arithmetic Operation F=A+CY”.

So the result is that the register R7 the Program Counter is used as the address to read and is then post incremented because the CY-IN is hard selected for a “1”.
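As a cross-check, the field values above can be packed into a 16 bit word. This is only a minimal Python sketch and not part of any PISC tooling: the function name `pack_instruction` and its argument names are hypothetical, and the bit positions are an assumption read off the field widths listed above (MRD at IR:15 down to the ALU function at IR:4 to IR:0).

```python
def pack_instruction(mrd, mwr, arg, drg, lts, cy_in, alu):
    """Pack PISC instruction fields into a 16 bit word (IR 15:0).

    Assumed layout (derived from the field widths, not from the schematic):
    MRD=15, MWR=14, ARG=13:11, DRG=10:8, LTS/=7, CY-IN=6:5, ALU=4:0.
    """
    assert mrd in (0, 1) and mwr in (0, 1) and lts in (0, 1)
    assert 0 <= arg <= 7 and 0 <= drg <= 7
    assert 0 <= cy_in <= 3 and 0 <= alu <= 31
    return ((mrd << 15) | (mwr << 14) | (arg << 11) | (drg << 8)
            | (lts << 7) | (cy_in << 5) | alu)


# PISC 1.0a fetch: read memory, address from R7, DRG wired high (R7),
# status not latched, CY-IN select '11', ALU function 0-0000 (F = A + CY).
fetch_1_0a = pack_instruction(mrd=1, mwr=0, arg=0b111, drg=0b111,
                              lts=1, cy_in=0b11, alu=0b00000)
print(f"${fetch_1_0a:04X}")  # prints $BFE0
```

The result agrees with the $BFE0 value quoted above.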

Probably also worth showing the Carry-In select truth table for the original PISC 1.0a at this point:-

00 = CY
01 = EQ
10 = "0" (logic level high)
11 = "1" (logic level low)

Now that all this is behind us let’s get down to the details of the modification. From Brad’s original paper he observes that the 74181 can be selected for either a Logic or Arithmetic operation. This is controlled by IR:4 which I separated out above as the leading “0-” in the fetch instruction to make it stand out.

Brad goes on to make the important observation that if IR:4 is set high for a logic operation then the CY-IN selection bits IR:6 and IR:5 are irrelevant because the Carry-In is not used for logic operations. So this is where we shall implement our conditional branch.

We are going to arrange it such that if a carry-in *is* specified for a logic operation (which notionally makes no sense, because why would you bother if it is going to be ignored?) then this triggers a conditional operation. The “truth” of the selected CY-IN is used as a signal which will either inhibit or allow the write back to the Register File from the ALU output.

But for this to all work nicely we first need to re-order the selection table for the CY-IN select. So in my PISC 1.0c version this select table looks like this (obviously at this point any code Brad wrote for his 1.0a version is not going to run on mine ;-).

00 = "1" (logic level low)
01 = CY
10 = EQ
11 = "0" (logic level high)

Also note that this change plays with the order of the “0” and “1” as well. Which means that we must change the hard coded Fetch Instruction such that it is now:

10111111 10000000 or $BF80

The re-ordering of the “1” and “0” gives us some very important additional functionality which I’ll get to shortly, but it was not strictly required for conditional jumps.

So this now gives us an instruction table subset for bits IR:6, IR:5 and IR:4 that looks like this:

00-1 = Valid Logic Function write to Register File enabled WEA/ = Low
01-1 = Conditional Write to Register File dependent on CY status bit
10-1 = Conditional Write to Register File dependent on EQ status bit
11-1 = Write back to Register File is forcibly inhibited WEA/ = High
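
The table above can be expressed as a small decision function. The following Python sketch is a behavioural model only, not a description of the actual three gate circuit. The function name and the boolean flag arguments are hypothetical, the flags are treated as "set" or "not set" regardless of the logic level used on the board, and it is assumed that arithmetic operations (IR:4 low) always write back.

```python
def register_write_enabled(ir6, ir5, ir4, cy_set, eq_set):
    """Return True when the ALU result is written back to the Register File.

    cy_set / eq_set are True when the flag is set; the electrical polarity
    on the real board is deliberately abstracted away in this sketch.
    """
    if ir4 == 0:
        # Arithmetic operation: IR:6 and IR:5 are a genuine carry-in
        # select, so the write back is assumed to be always enabled.
        return True
    select = (ir6 << 1) | ir5
    if select == 0b00:
        return True      # 00-1: plain logic function, write enabled
    if select == 0b01:
        return cy_set    # 01-1: write only if CY is set
    if select == 0b10:
        return eq_set    # 10-1: write only if EQ is set
    return False         # 11-1: write back forcibly inhibited


# A write conditional on EQ: performed only when EQ is set.
print(register_write_enabled(1, 0, 1, cy_set=False, eq_set=True))   # True
print(register_write_enabled(1, 0, 1, cy_set=False, eq_set=False))  # False
```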

The last combination in the table “11-1” is the reason for swapping the order of the “0” and “1” around. This arrangement enables us to programmatically stop PISC from writing the result of an ALU Logic Operation back to the A-Register. When would you wish to do this? When you are performing a logic operation and you are only interested in setting the Flags (EQ & CY). But *do not* wish to destroy the contents of your A-Register while doing so.

Normally PISC 1.0a will always write the result of the ALU logic function back into the register specified for the Address Bus or ALU “A” input (same thing). There is no way of stopping this. Now there is, so an instruction like this:

CMP   R0,R4

(which is “Compare registers R0 and R4 and set the EQ flag if equal”) can be crafted such that the contents of register R0 are not overwritten by the operation. Quite a useful feature! Sadly this only works for ALU Logic operations and not for Arithmetic operations. Oh well, I’ll take those features I can get cheaply.

So what do the actual PISC Assembler instructions look like? We can now craft an instruction like this:

MOV   R7,R4    IF EQ

Translated: Copy the contents of R4 to R7 (R7 being the Program Counter) but only if the EQ flag is set “1” (logic level low). If the copy is successful then a jump to the location in R4 happens. If the copy is inhibited then the next instruction is executed. Conditional branch.

Of course my PISC Assembler allows you to normally code this instruction as something like:


Jump if Equal to address LOOP.

So how expensive is the modification? Three gates (besides the re-wiring mentioned above). Looks like this:

PISC – Glaring Deficiencies

In the source article for my PISC project “A Minimal TTL Processor for Architecture Exploration”, by Bradford J. Rodriguez. Brad wrote the following:

Glaring Deficiencies

Many weaknesses of the PISC become evident after a short period of use, including:

a) no conditional branch microinstruction — an important need [6];
b) no provision for literal values in the microinstruction;
c) no ALU logic for multiply, divide, and right shift;
d) no logic for decoding of macroinstructions;
e) no provision for interrupts;
f) sparse coding of the ALU function select; and
g) two clocks required per microinstruction.

I have been successful in correcting a good number of these “deficiencies”, in many cases by using Brad’s suggestion for a solution from the original article. So as a starting point towards documenting the changes I made between the original PISC 1.0a circuit and my PISC 1.0c derivative I’ll outline which of these have been solved and those that remain as “Glaring”. I’ll also provide some of my thoughts on the need for each of these features.

Glaring Deficiency “A” – No conditional branch micro-instruction. := SOLVED

Yeah, this is a biggie. Although I’ll quickly point out I have discovered that it is not totally impossible to write a useful program without such an instruction. Turns out that some very early computers built just after WWII didn’t have such an instruction either. So the early pioneers of computer programming resorted to self-modifying code, where the program would overwrite the jump address in store. Just the thought of coding like that makes my head hurt!

So I provided some circuitry that would inhibit the write back of the ALU output to the Register File dependent on the state of the selected flag (EQ or CY). This essentially sets up the conditions required for an instruction like this:-

MOV      R7, $0200     IF CY

Which translated: copies address $0200 to Register R7, which is used as the Program Counter (PC) but only if the CY flag is set. If the CY flag is not set then the move operation (really a copy) is inhibited. Conditional jump.

The solution is described in “PISC – Conditional Jumps”.

Glaring Deficiency “B” – No provision for literal values in the micro-instruction := REMAINS (do we care?)

I started off thinking that this was a pretty important issue, and also that it should be pretty easy to solve. After spending a lot of time considering various ways to add this functionality, I eventually came to the conclusion that there probably isn’t much point.

Any circuit that I could dream up was overly expensive in terms of complexity and chip count. And I had already added one too many IC’s to Brad’s otherwise elegant design. But the real kicker was that I could not find a way to sensibly do this in a single execute cycle. It would need at least two.

Now here’s the thing. PISC v1.0a stock standard can already load a 16 bit literal value into a register in a single execute cycle. The only thing dubious about this is that it needs an entire word in memory to store the literal following the actual load instruction. Granted most of the literals programmers use are apparently quite small. So it would be a more efficient use of available memory if we could load-up say 8 bits (0-255) of data alongside an 8 bit instruction for a one word opcode. But memory in my PISC really isn’t in short supply (128K ROM, 128K RAM in 16 bit words). So the hardware complexity required to add the feature just didn’t seem worth the effort.

Since making this decision I’ve done a fair bit of coding and kept a concerned eye on the resulting object code sizes. Yep, they are most likely larger than the same project done in Z80 or 6502 Assembler. But not orders of magnitude larger. So I stopped worrying about it.

Later I started coding in PL/0+ which generates byte-code. Which should probably be called Word-code in the context of PISC. Since my PL/0+ word-code is being interpreted by a virtual machine this allows me to encode small literals into a single word along with an instruction. The code density of the PL/0+ object files is very high. The trade off being a reduction in execution speed for an increase in code density. Problem, if there ever was one, solved.

Glaring Deficiency “C” – No ALU logic for multiply, divide, and right shift := SOLVED (Partially)

I have added a simple Shift Register board in the data path between A-Bus and the “A” input to the ALU. This is just a small bunch of 74ALS245 buffers switched around to create one of four possible outcomes:

  1. Data path normal
  2. Data is Logical Shifted Right one bit
  3. Data is Arithmetic Shifted Right (a variant of case 2 requiring no additional buffers).
  4. High Byte for Low Byte swap.

So I have solved the issue of the missing right shift. The high for low byte swap in one cycle makes coding for those pesky ASCII bytes so much easier. And is the only hardware nod my PISC gives to byte sized chunks of data.
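
For readers who want to see what the three non-trivial data paths do to a 16 bit word, here is a minimal Python sketch. It models only the resulting bit patterns, not the 74ALS245 buffer arrangement, and the function names are hypothetical.

```python
MASK16 = 0xFFFF  # PISC data paths are 16 bits wide


def logical_shift_right(word):
    """Shift right one bit; a zero enters at bit 15."""
    return (word & MASK16) >> 1


def arithmetic_shift_right(word):
    """Shift right one bit; bit 15 (the sign) is copied back into bit 15."""
    word &= MASK16
    return (word >> 1) | (word & 0x8000)


def swap_bytes(word):
    """Exchange the high byte and the low byte."""
    word &= MASK16
    return ((word << 8) | (word >> 8)) & MASK16


print(f"${logical_shift_right(0x8002):04X}")     # prints $4001
print(f"${arithmetic_shift_right(0x8002):04X}")  # prints $C001
print(f"${swap_bytes(0x1234):04X}")              # prints $3412
```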

As for the missing Multiply and Divide instructions? Well I seem to remember that the Z80 and 6502 didn’t have any either. In fact I don’t think Intel gave us this until the 8086! So I just coded my own routines 🙂

Shift Register Circuit coming soon…

Glaring Deficiency “D” – No logic for decoding of macro-instructions := REMAINS (at the hardware level)

Yes, PISC has no micro-code and no micro-code sequencer. This keeps the hardware solution so miraculously, elegantly simple. So for those instructions where it would be really nice to have a “macro-instruction” I just coded this into the Assembler. So the assembler now offloads the complexity of sequencing several instructions together.

Turns out there aren’t that many instructions where I needed to do this. CALL, RET, PUSH and POP come to mind quickly but after that I’m struggling to think of another example. To give a quick illustration of what I’m talking about, if I was to code:

RET

My PISC Assembler would generate something like:

00A3: ACEF 328: RET { rdd r5,r4 }
00A4: 2080 + inc r4
00A5: 349A + jmp r4

The three instructions required for a Return instruction.

  1. Read with post decrement the R5 stack pointer address contents into R4.
    (Stack grows upwards in PISC)
  2. Increment R4 so that it points to the location just after the original CALL instruction.
  3. Jump to the Address held in R4.
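
The three steps above can be modelled in a few lines of Python. This is a sketch under stated assumptions: registers are a plain list, memory is a dictionary, the function name `ret_sequence` and the example addresses are hypothetical, and each instruction is charged two clocks (one Fetch, one Execute). The Program Counter advancing during the fetches is ignored because the final jump overwrites R7 anyway.

```python
def ret_sequence(regs, memory):
    """Model the three instructions emitted for RET; return clocks used."""
    clocks = 0

    # rdd r5,r4: read the word addressed by R5 into R4, then decrement R5.
    regs[4] = memory[regs[5]]
    regs[5] = (regs[5] - 1) & 0xFFFF
    clocks += 2  # fetch + execute

    # inc r4: step past the location saved by the CALL.
    regs[4] = (regs[4] + 1) & 0xFFFF
    clocks += 2

    # jmp r4: copy R4 into R7, the Program Counter.
    regs[7] = regs[4]
    clocks += 2

    return clocks


regs = [0] * 8
regs[5] = 0x8001              # hypothetical stack pointer value
memory = {0x8001: 0x0123}     # hypothetical saved address on the stack
clocks = ret_sequence(regs, memory)
print(clocks, f"${regs[7]:04X}", f"${regs[5]:04X}")  # prints 6 $0124 $8000
```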

Of course all this would normally take place inside the CPU. But it would be a similar process in silicon and would likely take a similar number of cycles. In fact when I quickly checked the number of clock cycles required by my CALL and RET instructions, they seemed more or less on par with a 6502 and significantly better than a Z80. By way of quick example here are the number of clock cycles required for the return instruction for each of these three architectures:

Arch.  Mnemonic  Cycles

PISC   RET       6      (includes the Fetch cycles)
Z80    RET       10
6502   RTS       6

(The Z80 likely needs more cycles as internally it only has a 4 bit ALU).

Of course we have once again traded hardware simplicity for increased code size. But writing the Assembler source code is no more difficult and the resulting object code in terms of clock cycles is just as fast (or faster).

Bottom line? So long as one has sufficient memory it would seem that not having the microcode baked into silicon is not such a bad thing and it sure makes it easy to change macro-instruction definitions.

Glaring Deficiency “E” – No provision for interrupts := SOLVED

Solved by implementation of one of Brad’s own suggestions in the original article. Additional hardware allows the I/O cards in the 8 bit expansion bus to raise a single shared interrupt. On detection of the Interrupt the machine switches from using R7 as the Program Counter to R6.

This means that R6 has to be pre-loaded with the memory address of the Interrupt Service Routine before interrupts are enabled. Once the ISR (Interrupt Service Routine) has run its course you need R6 positioned back again at the start of the ISR ready for the next cycle. Then on completion we switch back to using R7 again. Interrupt priority is supported by the slot number the card is installed in.

Sounds simple? It wasn’t! In fact this was the hardest part of the entire project. At several dark moments I almost gave up in despair of ever getting it working. After all, my Monitor program was working just fine without darn Interrupts. Does a single user, non multitasking machine really need this?

It was quite a challenge working out how to save the CPU flags. Everything, both hardware and software, needed to be spot on before it would ‘fly’. I spent much time debugging hardware when in fact the particular problem I was chasing was a software bug. We got there in the end and I don’t regret the time spent. It really is quite nice to press Ctrl-C and have the machine jump back into the BIOS (without having to poll the keyboard). The things we take for granted when using PC’s!

Glaring Deficiency “F” – Sparse coding of the ALU function := SOLVED (sort of)

I took Brad’s own suggestion of adding a single 74138 that provides eight supplementary control signals. I use the 3 bits in the control word that normally set the D-Reg (Data Bus Register) value as the selection input to the 74138. I inhibit both memory and register file read/write operations for this special “Control” function cycle.

I encode this “Control” function into the instruction by setting both MRD (Memory Read) and MWR (Memory Write) at the same time. Which would normally be quite illogical (if not damaging).

So I have not changed the actual ALU encoding as such. But I have managed to squeeze in 8 additional functions and they are:-

  1. Logical Shift Right
  2. Swap high byte for low byte
  3. Invert the Flags on the next cycle
  4. Memory Banking
  5. IRQ Enable/Disable
  6. IRQ Clear
  7. Arithmetic Shift Right
  8. Halt
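
The detection of a “Control” cycle can be sketched as follows. This is a hedged Python illustration only: the function name is hypothetical, and the bit positions (MRD at IR:15, MWR at IR:14, D-Reg at IR:10 to IR:8) are an assumption based on the field order of the instruction word. Which 74138 output drives which of the eight functions is not stated here, so the sketch returns only the raw select value.

```python
MRD_BIT = 1 << 15  # assumed position of Memory Read
MWR_BIT = 1 << 14  # assumed position of Memory Write


def decode_control(word):
    """Return the 74138 select value (0-7) for a Control cycle, else None."""
    if (word & MRD_BIT) and (word & MWR_BIT):
        # Both MRD and MWR set: the three D-Reg bits select a control line.
        return (word >> 8) & 0b111
    return None


print(decode_control(0xC500))  # prints 5 (MRD and MWR set, D-Reg = 101)
print(decode_control(0xBF80))  # prints None (an ordinary memory read)
```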

Glaring Deficiency “G” – Two clocks required per micro-instruction := REMAINS

PISC runs two cycles. Fetch and Execute. In the Fetch clock cycle a fixed hardware encoded instruction is executed which loads the next program instruction from memory into a special Instruction Register. On the next clock cycle, the Execute cycle the instruction held in the Instruction Register is executed.

So 50% of the time PISC is not executing your program code but a fetch instruction to get your next program instruction. Not very efficient 🙁  But it sure is simple! 🙂

And besides, no machine of mine is going to have this “delayed branch” NOP instruction nonsense littered all throughout otherwise perfectly clean and logical code 😉

PISC – Addendum and Erratum

My original memory I/O board PISC 1.0a


During the build process of my PISC I did discover a few issues with the original circuit diagram as found here: https://bradrodriguez.com/papers/pisc.pdf

I’ll now document these discoveries.

One should perhaps mention (again) that this is largely of academic interest only, as duplication of this project would require two key Integrated Circuits, namely 4 x 74181 ALUs and 8 x 74172 Register File chips. Sadly both are now so long out of production that they are nearly impossible to source.

But for the sake of completeness…


1. The 74LS541 being used as the 8 bit Parallel input port as shown on the Memory board (page three) is depicted in reverse. The data output pins (Y1-Y8) from the ‘541 chip should face inwards, connected to the internal data bus, and not outwards as shown.


2. On page one of the circuit diagram set, the “ALU” circuit shows the “A=B” pull-up resistor to VCC (designated only as R?) as being 10K ohms. In practice I found that this value worked properly while clocking the circuit with a test signal in the sub Kilo-Hertz range. But as soon as the clock was lifted to 5 MHz the “A=B” signal could not be latched reliably. Reducing this resistor to 4.7K ohms solved the problem. This issue could be build specific of course.


3. The legacy 2651 IC used for the serial port, the “Programmable Communications Interface”, is just not fast enough. At least when the system is clocked at 5 MHz it isn’t. At 5 MHz we have a clock period of 200ns. My 2651 datasheet shows that after the 2651 CE/ pin is asserted the “Data Delay Time for Read (tDD)” is to be 250ns worst case. So we would be somewhat “over-clocking” the 2651 even if we had the entire 200ns.

We don’t. The SIO/ select line to the 2651 CE/ pin is gated via a 74F139 using the CLK signal. This means that SIO/ is only active for the second half of the clock cycle or 100ns in other words. The 2651 cannot operate successfully inside this 100ns window.

So just how fast can the 2651 actually run?

By replacing the master crystal with a pulse generator I was able to determine experimentally that my particular 2651 sample would run to about 2.89 MHz before it started dropping characters. This equates to a clock period of 346.02ns or a 2651 CE/ pulse width of about 173ns. So while still “over-clocking” according to the datasheet it looks like we could get away with 200ns. So my short term fix was to simply halve the clock speed. Changing the master crystal from 10 MHz to 5 MHz resulted in a 2.5 MHz system clock with a 400ns period. Half of which was 200ns which kept the 2651 entirely happy.
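
The timing figures quoted above follow from simple arithmetic, reproduced here as a small Python sketch. The function names are hypothetical, and the only assumption is the one already stated in the text: the CE/ pulse lasts for half of one system clock period.

```python
T_DD_NS = 250.0  # worst case Data Delay Time for Read, as quoted above


def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a system clock given in MHz."""
    return 1000.0 / freq_mhz


def ce_pulse_ns(freq_mhz):
    """Width of the 2651 CE/ pulse: half of one clock period."""
    return clock_period_ns(freq_mhz) / 2.0


for freq in (5.0, 2.89, 2.5):
    period = clock_period_ns(freq)
    pulse = ce_pulse_ns(freq)
    # Shortfall is how far the CE/ pulse falls below the datasheet tDD.
    print(f"{freq:.2f} MHz: period {period:.2f} ns, "
          f"CE/ pulse {pulse:.2f} ns, short of tDD by {T_DD_NS - pulse:.2f} ns")
```

At every one of these clock speeds the CE/ pulse is shorter than the 250ns datasheet figure, which is why even the reliable 2.5 MHz setting still counts as “over-clocking” the 2651.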

My PISC ran this way for well over a year, rock solid, never missing a beat. More recently I created a Wait State board which inserts a single wait state when accessing I/O devices in the 8 bit expansion bus. This allowed me to try running at Brad’s original 5 MHz design specification. At 5 MHz it gives a good illusion of running. But every so often I’ll get a ‘glitch’ which hangs the machine. So in the interim I have de-rated the system to run at 4 MHz minus any wait states during I/O. At 4 MHz it is once again reliable.

4. While still on the subject of serial ports and the 2651, there is a conceptual problem with the way the 2651 serial port has been interfaced. In the original PISC 1.0a schematic, the deliberately simplified device selection scheme has the 2651 chip enable signal CE/ active whenever address lines A15 and A14 are set low and high respectively. This means that the serial port will respond to any address in an entire 16K block from $4000 to $7FFF. This in itself is not the issue.

The problem is that PISC uses the Address Bus, otherwise known as the A-Register Bus, not only for memory addressing but also for all internal ALU computations. So any ALU operation that just happens to accidentally set the A-Register bits B15 low and B14 high will trigger the CE/ line of the 2651. Now you might expect this to result in a bus contention clash. It does not. The PISC Memory I/O board is tri-state buffered with two 74ALS245’s. Data from the Memory I/O board is only routed to the internal system bus when MRD/ is asserted. Something that does not happen during an ALU operation. So there is no bus contention issue. But the 2651 is still activated for a Read operation.


The 2651 is controlled by a chip enable CE/ and a single RD/ WR control pin. The 2651 chip has no separate control signals for Read and Write. It is a single pin acting as a toggle. The circuit shows this pin connected to the MWR signal (high on write). So this arrangement effectively places the 2651 in Read mode by default. All this added together results in the 2651 dropping any character stored in its single character receive buffer each time an ALU operation happens to set A-Reg B15 low and B14 high.

This character is placed on the Memory I/O card’s local data bus by the 2651 and then simply ‘lost’. A most unfortunate state of affairs that somewhat hampers software development of file transfer protocols 🙂

The solution to this conundrum?

It was fairly trivial to re-arrange the circuit so that the CE/ signal to the 2651 is only enabled when the PISC is truly doing either a Read MRD/ or Write MWR/ operation and this is exactly what I did.
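
The difference between the two decoding schemes can be illustrated with a short Python sketch. It is a behavioural model only and says nothing about the gates actually used; the function names are hypothetical, and the `mrd` and `mwr` arguments are simply True when a real read or write is in progress.

```python
def ce_original(address):
    """PISC 1.0a decode: CE/ asserted whenever A15 is low and A14 is high."""
    return (address & 0xC000) == 0x4000


def ce_fixed(address, mrd, mwr):
    """Modified decode: also require a genuine read or write operation."""
    return ce_original(address) and bool(mrd or mwr)


# An ALU result that merely happens to fall in $4000-$7FFF:
alu_value = 0x5A5A
print(ce_original(alu_value))                      # True: 2651 wrongly selected
print(ce_fixed(alu_value, mrd=False, mwr=False))   # False: ALU cycle ignored
print(ce_fixed(alu_value, mrd=True, mwr=False))    # True: real read
```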