This page is a work in progress. Put everything related to multi-threading here: threads, synchronization, multi-core support, etc.
The Nintendo 3DS offers support for threading through use of SVC calls.
- 1 Processes
- 2 Threads
- 2.1 Usage
- 2.2 Core affinity
- 2.3 Debug
- 3 Synchronization
- 3.1 Mutex (normal)
- 3.2 Critical Section (light-weight mutex)
- 3.3 CriticalSection::Initialize
- 3.4 Semaphore
- 3.5 Light Semaphore ?
- 3.6 Event
- 3.7 Light Event
- 3.8 Address Arbiters
Allocates memory for a process according to the given CodeSetInfo contents and copies the segment data from the given memory locations to the allocated memory.
Sets up a process using the segments managed by the given CodeSet handle.
Sets up the main process thread and appends it to the scheduler queue.
The argc, argv, and envp fields from the given StartupInfo structure are ignored.
All addresses are virtual addresses in the address space of the process to be created. All sizes are given in pages of 0x1000 bytes.
|u16||Unknown, this is written to field 0x5A of KCodeSet|
|u32||RW addr (.data + .bss)|
|u32||RW size (.data + .bss)|
|u32||Total .text pages|
|u32||Total .rodata pages|
|u32||Total RW pages (.data + .bss)|
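Because all sizes are expressed in 0x1000-byte pages, a byte count has to be converted before it is used as a size field. The following is a minimal sketch of that conversion. The names CODESET_PAGE_SIZE and bytes_to_pages are hypothetical, and rounding a partial page up to a whole page is an assumption that the text above does not state.

```c
#include <stdint.h>

/* Page size used for all size fields, as stated in the text. */
#define CODESET_PAGE_SIZE UINT32_C(0x1000) /* hypothetical name */

/* Convert a size in bytes to a count of 0x1000-byte pages.
 * ASSUMPTION: a partially used page counts as a whole page. */
static uint32_t bytes_to_pages(uint32_t size_bytes)
{
    uint32_t pages = size_bytes / CODESET_PAGE_SIZE;
    if (size_bytes % CODESET_PAGE_SIZE != 0)
        pages += 1; /* round up without risking overflow */
    return pages;
}
```

For example, under this assumption a 0x1001-byte segment would occupy two pages.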
For Kernel implementation details, see KThread.
Though it is possible to run multi-threaded programs, running those on different cores is not possible "as-is". One core is always dedicated to the OS, hence you will never get 100% of both cores.
Using CloseHandle() with a KThread handle will terminate the specified thread only if the reference count reaches 0.
Lower priority values give the thread higher priority. For userland apps, priorities between 0x18 and 0x3F are allowed. The priority of the app's main thread seems to be 0x30.
The thread scheduler is cooperative. Therefore, if a thread takes up all the CPU time (for example, if it enters an endless loop), all the other threads that run on the same CPU core won't get a chance to run. The main way of yielding to another thread is using an address arbiter.
svc : 0x08
Result CreateThread(Handle* thread, func entrypoint, u32 arg, u32 stacktop, s32 threadpriority, s32 processorid);
R0=s32 threadpriority R1=func entrypoint R2=u32 arg R3=u32 stacktop R4=s32 processorid
Result result=R0 Handle* thread=R1
Creates a new thread in the current process which will begin execution at the given entrypoint. The SP CPU register will be initialized to stacktop, while r0 will be initialized to the given arg.
The input addresses used for Entrypoint_Param and StackTop are normally the same, but they may be chosen arbitrarily. For the main thread, Entrypoint_Param is 0.
The stacktop must be aligned to 0x8 bytes. If it is not, the ARM11 kernel clears the low 3 bits of the stacktop address.
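To avoid having the kernel silently adjust the address, the caller can align the stack top itself. This is a small sketch of the same low-3-bit clearing described above. The helper names align_stack_top and stack_top_for are hypothetical and not part of any SDK.

```c
#include <stdint.h>

/* Clear the low 3 bits so the address is a multiple of 0x8,
 * which is the adjustment the text attributes to the kernel. */
static uint32_t align_stack_top(uint32_t stacktop)
{
    return stacktop & ~UINT32_C(0x7);
}

/* Hypothetical helper: stack top for a stack buffer that starts
 * at `base` and is `size` bytes long (the stack grows downwards,
 * so the top is the end of the buffer, rounded down). */
static uint32_t stack_top_for(uint32_t base, uint32_t size)
{
    return align_stack_top(base + size);
}
```

Rounding down rather than up keeps the resulting stack top inside the buffer.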
The processorid parameter specifies which processor the thread can run on. Non-negative values correspond to a specific CPU. (e.g. 0 for the Appcore and 1 for the Syscore on Old3DS) Special value -1 means all CPUs, and -2 means the default CPU for the process (Read from the Exheader, usually 0 for applications, 1 for system services). Games usually create threads using -2.
The thread priority value must be in the range 0x0..0x3F. Otherwise, error 0xE0E01BFD is returned.
With the Old3DS kernel, the s32 processorid must be <=2 (for the processorid validation check in the kernel). With the New3DS kernel, the processorid validation check requires processorid to be less than or equal to <total cores(MPCore "SCU Configuration Register" CPU number value + 1)>, and a number of additional constraints apply: When processorid==0x2 and the process is not a BASE mem-region process, exheader kernel-flags bitmask 0x2000 must be set (otherwise error 0xD9001BEA is returned). When processorid==0x3 and the process is not a BASE mem-region process, error 0xD9001BEA is returned. These are the only restriction checks done by the kernel for processorid.
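The checks described above can be sketched as plain C predicates. This is only a model of the text, not kernel code: the macro and function names are hypothetical, 0 is used here to mean success, and the basic processorid range check is left out because the text does not give its error code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Error values quoted in the text; the macro names are hypothetical. */
#define ERR_BAD_PRIORITY   UINT32_C(0xE0E01BFD)
#define ERR_CORE_FORBIDDEN UINT32_C(0xD9001BEA)
/* Exheader kernel-flags bit from the text; the name is hypothetical. */
#define KFLAG_CORE2_ACCESS UINT32_C(0x2000)

/* Priority must be in the range 0x0..0x3F. */
static uint32_t check_priority(int32_t priority)
{
    if (priority < 0x0 || priority > 0x3F)
        return ERR_BAD_PRIORITY;
    return 0;
}

/* Additional New3DS constraints for processes outside the
 * BASE memory region, as listed in the text. */
static uint32_t check_new3ds_core(int32_t processorid,
                                  bool is_base_region,
                                  uint32_t kernel_flags)
{
    if (is_base_region)
        return 0; /* the extra constraints only apply to non-BASE processes */
    if (processorid == 2 && (kernel_flags & KFLAG_CORE2_ACCESS) == 0)
        return ERR_CORE_FORBIDDEN;
    if (processorid == 3)
        return ERR_CORE_FORBIDDEN;
    return 0;
}
```

Note that the 0x18..0x3F range mentioned earlier for userland apps is a separate, narrower restriction and is not modelled here.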
svc : 0x09
svc : 0x0A
void SleepThread(s64 nanoseconds);
svc : 0x0B
Result GetThreadPriority(s32* priority, Handle thread);
.global svcGetThreadPriority
.type svcGetThreadPriority, %function
svcGetThreadPriority:
    str r0, [sp, #-0x4]!
    svc 0x0B
    ldr r3, [sp], #4
    str r1, [r3]
    bx lr
svc : 0x0C
Result SetThreadPriority(Handle thread, s32 priority);
svc : 0x34
Result OpenThread(Handle* thread, Handle process, u32 threadId);
svc : 0x36
Result GetProcessIdOfThread(u32* processId, Handle thread);
svc : 0x37
Result GetThreadId(u32* threadId, Handle thread);
Details: It seems that only the thread itself or one of its parents can get the ID. Calling this on the handle of a sibling or parent seems to always yield the ID 0.
svc : 0x2C
Result GetThreadInfo(s64* out, Handle thread, ThreadInfoType type);
Details: This request always returns an error when called; it only checks whether the handle is a thread handle or not. Hence, it returns 0xD8E007ED (BAD_ENUM) if the handle is a thread handle, and 0xD8E007F7 (BAD_HANDLE) if it isn't.
svc : 0x3B
Result GetThreadContext(ThreadContext* context, Handle thread);
The cores are numbered from 0 to 1 for Old 3DS and 0 to 3 for the new 3DS.
svc : 0x0D
Result GetThreadAffinityMask(u8* affinitymask, Handle thread, s32 processorcount);
svc : 0x0E
Result SetThreadAffinityMask(Handle thread, u8* affinitymask, s32 processorcount);
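As an illustration of how an affinity mask might be assembled for these calls, here is a minimal sketch. It rests on an assumption that this page does not confirm: that bit n of the mask stands for core n. The helper name affinity_bit is hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION (not stated on this page): bit n of the affinity
 * mask corresponds to core n. Returns the mask bit for `core`,
 * or 0 if `core` is outside 0..processorcount-1. */
static uint8_t affinity_bit(int32_t core, int32_t processorcount)
{
    if (core < 0 || core >= processorcount || core >= 8)
        return 0; /* out of range, or does not fit in a u8 mask */
    return (uint8_t)(1u << core);
}
```

Under that assumption, a mask allowing cores 0 and 3 on a four-core system would be affinity_bit(0, 4) | affinity_bit(3, 4).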
svc : 0x0F
Result GetThreadIdealProcessor(s32* processorid, Handle thread);
svc : 0x10
You are not able to use the system core (core1) by default. You first have to assign the amount of time dedicated to the system. The value is a percentage: the higher it is, the more the system core will be available for your application.
For example if you set this value to 25%, it means that your application will be able to use 25% of the system core at most, even if you never issue system calls.
If you set the value to a non-zero value, you will not be able to set it back to 0%. Keep in mind that if your application is heavily dependent on the system, setting a high value for your application might yield poorer performance than if you had set a low value.
Synchronization can be performed via WaitSynchronization on any handle whose object derives from KSynchronizationObject. The semantic meaning of the call depends on the particular object type referred to by the given handle:
- KClientPort: ???
- KClientSession: ???
- KDebug: ???
- KDmaObject: ???
- KEvent: Waits until the event is signaled
- KInterruptEvent: ???
- KMutex: Acquires a lock on the mutex (blocks until this succeeds)
- KProcess: Waits until the process exits
- KSemaphore: Consumes a value from the semaphore count if possible; otherwise continues to wait
- KServerPort: Waits for a new client connection, upon which svcAcceptSession is ready to be called
- KServerSession: Waits for an IPC command to be submitted to the server process
- KThread: Waits until the thread terminates
- KTimer: ???
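The KSemaphore entry in the list above can be pictured with a small non-blocking model. This is only an illustration of the described behaviour, not the kernel's data structure: the type model_semaphore and the function model_semaphore_try_wait are hypothetical, and a real wait would block instead of returning false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a semaphore: just its current count. */
typedef struct {
    int32_t count;
} model_semaphore;

/* Returns true if a wait would complete right away (one unit is
 * consumed from the count), false if the caller would keep waiting. */
static bool model_semaphore_try_wait(model_semaphore *sem)
{
    if (sem->count > 0) {
        sem->count -= 1; /* consume one value from the count */
        return true;
    }
    return false; /* nothing available: the wait continues */
}
```

With an initial count of 2, the first two waits in this model succeed and the third would have to wait until the semaphore is released.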
Most synchronization primitives seem to have both a "normal" and a "light-weight" version.
For Kernel implementation details, see KMutex
/!\ It seems that the mutex will not be available once the thread that created it is destroyed
Critical Section (light-weight mutex)
Same thread ownership as a mutex ?
Light Semaphore ?
Does it exist ?
Address arbiters are a low-level primitive to implement synchronization based on a counter stored at some user-specified virtual memory address. Address arbiters are used to put the current thread to sleep until the counter is signaled. Both tasks (putting a thread to sleep and signaling the counter) are implemented in ArbitrateAddress.
Address arbiters are implemented by KAddressArbiter.
Result CreateAddressArbiter(Handle* arbiter)
Creates an address arbiter handle for use with ArbitrateAddress.
Result ArbitrateAddress(Handle arbiter, u32 addr, ArbitrationType type, s32 value, s64 nanoseconds)
If type is SIGNAL, the ArbitrateAddress call will resume up to value of the threads waiting on addr using an arbiter, starting with the highest-priority threads. If value is negative, all of these threads are released. nanoseconds remains unused in this mode.
The other modes are used to (conditionally) put the current thread to sleep based on the memory word at virtual address addr until another thread signals that address using ArbitrateAddress with the type SIGNAL. WAIT_IF_LESS_THAN will put the current thread to sleep if that word is smaller than value. DECREMENT_AND_WAIT_IF_LESS_THAN will furthermore decrement the memory value before the comparison. WAIT_IF_LESS_THAN_TIMEOUT and DECREMENT_AND_WAIT_IF_LESS_THAN_TIMEOUT will do the same as their counterparts, but will have thread execution resume if the interval given by nanoseconds elapses without addr being signaled.
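The arbitration rules described above can be summarised as a small decision model. This sketch follows the wording of the description literally (including decrementing before the comparison) and does not claim to match the kernel's actual implementation. All names are hypothetical, the enum values are illustrative and not the kernel's numeric values, and timeouts are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the arbitration types named in the text.
 * The numeric values here are NOT meant to match the kernel's. */
typedef enum {
    MODEL_SIGNAL,
    MODEL_WAIT_IF_LESS_THAN,
    MODEL_DECREMENT_AND_WAIT_IF_LESS_THAN,
    MODEL_WAIT_IF_LESS_THAN_TIMEOUT,
    MODEL_DECREMENT_AND_WAIT_IF_LESS_THAN_TIMEOUT
} model_arbitration_type;

/* Decide whether the calling thread would be put to sleep.
 * `word` points at the memory word located at `addr`. */
static bool model_should_sleep(model_arbitration_type type,
                               int32_t *word, int32_t value)
{
    switch (type) {
    case MODEL_DECREMENT_AND_WAIT_IF_LESS_THAN:
    case MODEL_DECREMENT_AND_WAIT_IF_LESS_THAN_TIMEOUT:
        *word -= 1;           /* decrement first, as the text describes */
        return *word < value;
    case MODEL_WAIT_IF_LESS_THAN:
    case MODEL_WAIT_IF_LESS_THAN_TIMEOUT:
        return *word < value; /* sleep only if the word is smaller */
    default:
        return false;         /* SIGNAL never puts the caller to sleep */
    }
}

/* Number of waiting threads a SIGNAL call would resume:
 * up to `value` of them, or all of them if `value` is negative. */
static int32_t model_signal_count(int32_t waiting, int32_t value)
{
    if (value < 0 || value > waiting)
        return waiting;
    return value;
}
```

In this model the order in which waiters are resumed (highest priority first) is left to the caller, since only the count is computed.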
|Address arbitration type||Value|