SBCL Internals

The pages on this CLiki-driven site can be edited by anybody at any time. No warranty of any kind can therefore be made; any implied warranties of merchantability or fitness for a particular purpose are expressly disclaimed

pseudo-atomic is used to make a piece of code look as if it were executed atomically. While the pseudo-atomic (PA) flag is set, any interrupt that arrives inside the pseudo-atomic block is recognized but deferred until the end of the block: the signal handler sees that the PA flag is set, sets the pseudo-atomic-interrupted flag, and returns.

Finally, at the end of the pseudo-atomic block we test whether we were interrupted and, if so, deal with the pending signal; we then turn off both the pseudo-atomic and the pseudo-atomic-interrupted flags. How this is done is target-dependent:

x86

On the x86, these flags are actual symbols, *pseudo-atomic-atomic* and *pseudo-atomic-interrupted*, that are set to 0 or 1 depending on the state. In a multi-threaded implementation, we use the thread-local values of these symbols.
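As a rough illustration of the protocol just described, here is a minimal C model of the two flags. It is only a sketch: the variable and function names are hypothetical stand-ins for the Lisp symbols, real signals are replaced by a plain function call, and nothing here is taken from the SBCL runtime.

```c
/* Hypothetical stand-ins for *pseudo-atomic-atomic* and
   *pseudo-atomic-interrupted*; each holds 0 or 1. */
int pseudo_atomic_atomic = 0;
int pseudo_atomic_interrupted = 0;
int handled_signals = 0;            /* counts signals actually processed */

void handle_now(void) { handled_signals++; }

/* Models signal arrival: defer the work if we are inside a PA block. */
void on_signal(void) {
    if (pseudo_atomic_atomic) {
        pseudo_atomic_interrupted = 1;  /* remember it and return at once */
        return;
    }
    handle_now();
}

void pa_begin(void) { pseudo_atomic_atomic = 1; }

/* End of the block: run the deferred work, then clear both flags. */
void pa_end(void) {
    if (pseudo_atomic_interrupted)
        handle_now();
    pseudo_atomic_atomic = 0;
    pseudo_atomic_interrupted = 0;
}
```

A signal that arrives between pa_begin() and pa_end() is therefore counted only once pa_end() runs, while one that arrives outside a block is counted immediately.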

On other ports we use spare bits in registers:

SPARC

We use bits 0 and 2 of the alloc-tn register, which aren't normally used (because all allocations are aligned on 8-byte boundaries).
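The spare-bit idea can be sketched in C as follows. This is an illustration under assumptions: the text only says that bits 0 and 2 are used, so which bit plays which role is a guess here, and all the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed roles for the two spare bits; the text does not say which is which. */
#define PA_ATOMIC_BIT      ((uintptr_t)1 << 2)
#define PA_INTERRUPTED_BIT ((uintptr_t)1 << 0)
#define PA_FLAG_MASK       (PA_ATOMIC_BIT | PA_INTERRUPTED_BIT)

uintptr_t pa_set_atomic(uintptr_t alloc)      { return alloc | PA_ATOMIC_BIT; }
uintptr_t pa_set_interrupted(uintptr_t alloc) { return alloc | PA_INTERRUPTED_BIT; }

bool pa_is_atomic(uintptr_t alloc)      { return (alloc & PA_ATOMIC_BIT) != 0; }
bool pa_is_interrupted(uintptr_t alloc) { return (alloc & PA_INTERRUPTED_BIT) != 0; }

/* Mask the flag bits off to recover the real, aligned allocation pointer. */
uintptr_t pa_real_pointer(uintptr_t alloc) { return alloc & ~PA_FLAG_MASK; }
```

Because the pointer itself never has these bits set, the flags can be added and removed without losing any address information.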

Alpha

Alpha uses reg_ALLOC as SPARC does (though bits 0 and 1, not 0 and 2), and the test at the end of the pseudo-atomic block to see whether a signal is pending is:

        /* Were we interrupted? */
        subq    reg_ALLOC,1,reg_ALLOC
        stl     reg_ZERO,0(reg_ALLOC)

If we were interrupted, reg_ALLOC will now contain a misaligned value, so when we store through it we expect an unaligned-access trap, which the kernel catches and delivers to us as a SIGBUS. We catch this in sigbus_handler and process the pending signal. In theory.

In practice, although the above probably works fine on OSF/1, there's another wrinkle when it comes to Linux: Linux doesn't trap on unaligned accesses, instead fixing them up in the kernel and carrying on. So, on Linux/Alpha the 'interrupted-p' bit is not bit 1 but bit 63, and when we store 0 to the address in reg_ALLOC we get SIGSEGV because there isn't any memory mapped there. That's why sigsegv_handler() tests for pseudo-atomic-interrupted as well as for need-to-do-gc.
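The address arithmetic behind both variants can be checked with a small C sketch. The assumptions: bit 0 is taken to be the atomic flag (the text does not say which of bits 0 and 1 is which), stl is treated as a 4-byte store, and the function and macro names are made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define ALPHA_PA_ATOMIC            ((uint64_t)1)        /* bit 0 (assumed role) */
#define ALPHA_PA_INTERRUPTED_OSF   ((uint64_t)1 << 1)   /* bit 1 */
#define ALPHA_PA_INTERRUPTED_LINUX ((uint64_t)1 << 63)  /* bit 63 */

/* Models "subq reg_ALLOC,1,reg_ALLOC": the address the closing store uses. */
uint64_t alpha_pa_exit_address(uint64_t reg_alloc) { return reg_alloc - 1; }

/* A 4-byte store is unaligned when either of the low two bits is set. */
bool alpha_store_is_unaligned(uint64_t addr) { return (addr & 3) != 0; }

/* Linux variant: the fault relies on nothing being mapped at addresses
   with bit 63 set, as the text describes. */
bool alpha_linux_store_faults(uint64_t addr) { return (addr >> 63) != 0; }
```

With only the atomic bit set, the subtraction yields an ordinary aligned address and the store is harmless; with either interrupted bit set, the store hits an address that traps.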

PPC

For PPC, CMUCL actually uses two registers: the atomic flag is bit 2 of $ALLOC, and the interrupted flag is the high bit in $NL3. At the end of the section we add $NL3 to $ALLOC, then trap if $ALLOC is negative. I (Daniel Barlow) am not yet sure why this is necessary.
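The add-then-test step can be modelled in C like this. It is a sketch only: it assumes 32-bit registers and an allocation pointer below 2^31, uses unsigned arithmetic so that the addition wraps without undefined behaviour, and the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define PPC_PA_ATOMIC      ((uint32_t)1 << 2)   /* bit 2 of $ALLOC */
#define PPC_PA_INTERRUPTED ((uint32_t)1 << 31)  /* high bit of $NL3 */

/* Returns true when the end-of-section check would trap. */
bool ppc_pa_exit_traps(uint32_t alloc, uint32_t nl3) {
    uint32_t sum = alloc + nl3;        /* models adding $NL3 to $ALLOC */
    return (sum & 0x80000000u) != 0;   /* "negative" means the sign bit is set */
}
```

If $NL3 is zero the sum is just $ALLOC and no trap occurs; if its high bit is set, the sum looks negative and the trap fires.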

MIPS

When doing the MIPS port, Christophe Rhodes ran into GC problems that turned out to be pseudo-atomic related. First, let's look at what PA ended up looking like on MIPS. (The CMUCL implementation of PA for MIPS used integer overflow, which gives SIGFPE; however, modern MIPS machines have lazy FPE delivery, and I didn't fancy trying to work around that.)

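(inst li ,flag-tn ,extra)         ; load EXTRA into FLAG (vestigial)
(inst addu alloc-tn 1)            ; set the PA flag: low bit of reg_ALLOC
;; PSEUDO-ATOMIC body goes here
(inst bgez ,flag-tn label)        ; FLAG not negative: skip the BREAK
(inst addu alloc-tn (1- ,extra))  ; branch delay slot: always executed
(inst break 16)                   ; interrupted: trap into C
(emit-label label)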

How does this work? Pseudo-atomic blocks take a parameter, EXTRA, to add to the allocation pointer at the end of the pseudo-atomic block. The first instruction loads this into a temporary, FLAG (it doesn't have to, it turns out; this is vestigial from an earlier version and may disappear at any time).

The second instruction sets the pseudo-atomic-active flag. If the process receives a signal while the low bit of reg_ALLOC is set, it will defer (via some C-level code) handling of the signal until the end of the PA block, as described above; on the MIPS, it indicates that an interrupt needs to be processed by setting the high bit of the FLAG TN to 1.

We then execute the PA body. Usually, this involves loading a word with a widetag and a length, and storing it at the address pointed to by reg_ALLOC-1. The reason that this has to be done "atomically" is that if it is not, the GC routines will see dangling pointers and die with some nasty message.

The cleanup starts by jumping to the end if the FLAG TN is not negative. The EXTRA is added to reg_ALLOC even if the branch is taken, as the addition instruction is in the branch delay slot. However, if the FLAG was negative, then we hit the BREAK 16 instruction, and get catapulted back into C-land where the pending interrupts and/or GCs are done.
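The whole sequence can be traced with a small C model. This is a sketch under assumptions: registers are taken to be 32 bits wide, "negative" is modelled as the high bit of an unsigned value, and the struct and function names are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t alloc;  /* models reg_ALLOC */
    uint32_t flag;   /* models the FLAG TN */
} mips_pa_state;

/* li flag-tn, extra ; addu alloc-tn, 1 */
void mips_pa_enter(mips_pa_state *s, uint32_t extra) {
    s->flag = extra;
    s->alloc += 1;            /* low bit set: pseudo-atomic is active */
}

/* What the C-level signal code does when it defers a signal. */
void mips_pa_mark_interrupted(mips_pa_state *s) {
    s->flag |= 0x80000000u;   /* high bit set: FLAG now reads as negative */
}

/* Cleanup; returns true when BREAK 16 would be executed. */
bool mips_pa_exit(mips_pa_state *s, uint32_t extra) {
    bool interrupted = (s->flag & 0x80000000u) != 0;  /* bgez falls through */
    s->alloc += extra - 1;    /* delay slot: runs whether or not we branch */
    return interrupted;
}
```

Note that the allocation pointer ends up advanced by exactly EXTRA in both cases, since the 1 added on entry is cancelled by the (1- EXTRA) added in the delay slot.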

Well, that's the theory. In practice, though, there was a problem with this definition: GC tended to lose just after pseudo-atomic sections. The reason for this turned out to be that the stores in the body of the PA were made too close to the break instruction; sometimes, the data were being incorrectly cached. So the actual implementation of PA on the MIPS has several NOPs inserted at the head of the cleanup...

pseudo-atomic API questions

The above discussion of the implementation of PA on the various platforms is nice, but we should also document the API for PA. Currently, the arch-specific p-a calls look as if they were designed to be called from interrupt handlers: they take an interrupt context that is used to locate the registers that are then frobbed for the various p-a flags. With the introduction of the gencgc (and maybe other places?) we now call p-a routines from non-interrupt-handler code, and this code needs to set the various flags without the luxury of the interrupt context. It looks like some assembly snippets are going to be required to do this. Should we extend the arch-specific API with other functions to be called from non-interrupt contexts? Or should we use the existing functions with a magic NULL interrupt context that tells them to use the asm approach, as appropriate?

After some #lisp discussion with Gabor Melis, we have come to the conclusion that one set of p-a interfaces is enough and that they will take an optional interrupt context for getting access to registers in an interrupt handler, otherwise they will either use assembly snippets to frob the current registers or, in the case of x86 and x86-64, the global p-a symbols.

I'd still like to make some changes to the interface, however. The current interface is:

extern boolean arch_pseudo_atomic_atomic(os_context_t*);
extern void arch_set_pseudo_atomic_interrupted(os_context_t*);

I think something like this would be better:

extern boolean arch_pseudo_atomic_is_atomic(os_context_t*);
extern void arch_pseudo_atomic_set_atomic(os_context_t*);
extern void arch_pseudo_atomic_clear_atomic(os_context_t*);

extern boolean arch_pseudo_atomic_is_interrupted(os_context_t*);
extern void arch_pseudo_atomic_set_interrupted(os_context_t*);
extern void arch_pseudo_atomic_clear_interrupted(os_context_t*);
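To make the proposal concrete, here is one way the six functions might look on a port that keeps the flags in global variables, as x86 does. It is purely a sketch: the boolean and os_context_t definitions are stand-ins for the runtime's own, the two static variables are hypothetical, and the context argument is ignored because this style of port never needs to look at saved registers. A register-based port would instead read the context when it is non-NULL.

```c
#include <stddef.h>

typedef int boolean;                     /* stand-in for the runtime's type */
typedef struct os_context os_context_t;  /* opaque here; the real one is OS-specific */

static int pa_atomic = 0;                /* hypothetical global flags */
static int pa_interrupted = 0;

boolean arch_pseudo_atomic_is_atomic(os_context_t *context) {
    (void)context;                       /* unused when the flags are globals */
    return pa_atomic != 0;
}
void arch_pseudo_atomic_set_atomic(os_context_t *context) {
    (void)context;
    pa_atomic = 1;
}
void arch_pseudo_atomic_clear_atomic(os_context_t *context) {
    (void)context;
    pa_atomic = 0;
}

boolean arch_pseudo_atomic_is_interrupted(os_context_t *context) {
    (void)context;
    return pa_interrupted != 0;
}
void arch_pseudo_atomic_set_interrupted(os_context_t *context) {
    (void)context;
    pa_interrupted = 1;
}
void arch_pseudo_atomic_clear_interrupted(os_context_t *context) {
    (void)context;
    pa_interrupted = 0;
}
```

Code outside an interrupt handler would then call, for example, arch_pseudo_atomic_set_atomic(NULL), while a handler would pass the context it was given.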

Complaints, suggestions?


