Thinking Outside the Python GIL
Published on 17.05.2025
Python has this thing called the Global Interpreter Lock (GIL), which is just a mutex that allows only one thread to execute Python bytecode at a time. It mainly exists because the CPython VM's internals, most notably reference counting, are not thread-safe.

// cpython/Include/object.h

#ifndef Py_GIL_DISABLED
struct _object {
// ...
    union {
#if SIZEOF_VOID_P > 4
        PY_INT64_T ob_refcnt_full; 
        struct {
#  if PY_BIG_ENDIAN
            uint16_t ob_flags;
            uint16_t ob_overflow;
            uint32_t ob_refcnt;
#  else
            uint32_t ob_refcnt;
            uint16_t ob_overflow;
            uint16_t ob_flags;
#  endif
        };
#else
        Py_ssize_t ob_refcnt;
#endif
    };
// ...
    PyTypeObject *ob_type;
};
#else
// ...
#endif
The `ob_refcnt` field is the object's reference count, which the garbage collector relies on. Without the GIL it would end up in a messy state: if two threads incremented the same object's count at once, the non-atomic read-modify-write would lose updates and corrupt the count.

But when the GIL is disabled (the `Py_GIL_DISABLED` build), there's a block of code that hints at the approach they're taking.

struct _object {
    // ob_tid stores the thread id (or zero). 
    // It is also used by the GC and the
    // trashcan mechanism as a linked list pointer 
    // and by the GC to store the
    // computed "gc_refs" refcount.
    uintptr_t ob_tid;
    uint16_t ob_flags;
    PyMutex ob_mutex;           // per-object lock
    uint8_t ob_gc_bits;         // gc-related state
    uint32_t ob_ref_local;      // local reference count
    Py_ssize_t ob_ref_shared;   // shared (atomic) reference count
    PyTypeObject *ob_type;
};
There are fields holding information like the owning thread id, a local reference count, and a shared (atomic) reference count.

Regardless, the GIL exists today, and there are some ways to get around it, like multiprocessing or native C extensions. But this time we'll look at a slightly different approach: let's keep the GIL but move the abstraction layer lower.

How does running multiple Python interpreters with shared state in one program sound to you?

I just learned today that apparently you can spawn multiple interpreters in the same process. Each one gets its own globals, builtins, eval frames and so on.

// cpython/Include/internal/pycore_interp_structs.h
struct _is {
    //...

    // Dictionary of the sys module
    PyObject *sysdict;

    // Dictionary of the builtins module
    PyObject *builtins;

    struct _import_state imports;

    /* The per-interpreter GIL, which might not be used. */
    struct _gil_runtime_state _gil;

    // ...
};
Here's how you can leverage multiple interpreters.

#include <Python.h>
#include <stdio.h>

void run_in_interpreter(const char* code) {
    PyThreadState *main_tstate = PyThreadState_Get();

    PyThreadState *new_tstate = Py_NewInterpreter();  // creates a new interpreter and makes its thread state current
    if (!new_tstate) {
        fprintf(stderr, "Failed to create interpreter\n");
        return;
    }

    PyRun_SimpleString(code);

    Py_EndInterpreter(new_tstate);  // cleanup
    PyThreadState_Swap(main_tstate);  // switch back
}

int main(int argc, char *argv[]) {
    Py_Initialize();

    run_in_interpreter("print('Hello from interpreter 1')");
    run_in_interpreter("print('Hello from interpreter 2')");

    Py_Finalize();
    return 0;
}

Hello from interpreter 1
Hello from interpreter 2
Simple and straightforward. But what about shared states?

Well,
1. Allocate some memory on the heap.
2. Share this heap pointer with both the interpreters.

Since we're sharing pointers from the C world into the Python world, this can be a little finicky, or perhaps dangerous. A safer solution is to wrap the raw pointer in a `PyCapsule`, which gives you a named, checked way to pass C pointers through Python objects and can also carry a destructor for cleanup.

#include <Python.h>
#include <stdio.h>

int shared_counter = 42;

void run_in_interpreter(const char* name) {
    PyThreadState *main_tstate = PyThreadState_Get();
    PyThreadState *new_tstate = Py_NewInterpreter();
    if (!new_tstate) {
        fprintf(stderr, "Failed to create interpreter: %s\n", name);
        return;
    }

    PyThreadState_Swap(new_tstate);

    void *ptr = &shared_counter;

    // Wrap the raw pointer in a named capsule and expose it to this
    // interpreter's __main__ globals
    PyObject *capsule = PyCapsule_New(ptr, "shared.counter", NULL);
    PyObject *main = PyImport_AddModule("__main__");
    PyObject *globals = PyModule_GetDict(main);
    PyDict_SetItemString(globals, "counter_capsule", capsule);
    Py_DECREF(capsule);

    char code[1024];
    snprintf(code, sizeof(code),
        "import ctypes\n"
        "ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p\n"
        "ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]\n"
        "addr = ctypes.pythonapi.PyCapsule_GetPointer(counter_capsule, b'shared.counter')\n"
        "ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_int))\n"
        "print('[%s] shared counter before =', ptr.contents.value)\n"
        "ptr.contents.value += 1\n"
        "print('[%s] shared counter after  =', ptr.contents.value)\n",
        name, name);

    PyRun_SimpleString(code);

    Py_EndInterpreter(new_tstate);
    PyThreadState_Swap(main_tstate);
}

int main() {
    Py_Initialize();

    run_in_interpreter("interp1");
    run_in_interpreter("interp2");

    printf("[main] final shared counter = %d\n", shared_counter);

    Py_Finalize();
    return 0;
}
[interp1] shared counter before = 42
[interp1] shared counter after  = 43
[interp2] shared counter before = 43
[interp2] shared counter after  = 44
[main] final shared counter = 44
Here we're using `ctypes` to unwrap the capsule, cast the raw address to a typed pointer, and interact with it directly. It can look like a neat solution, but I still think you need a really good reason to play with C pointers inside the Python world. Memory bugs are wild out here; be careful.

Wanna talk more about Python internals? DM me - @pwnfunction.