SeaMonkey and SecurityExceptions related to localStorage

08. November 2013, admin - General

So I was having a problem with certain sites using localStorage, like this one. The error messages varied (and the linked one just plain didn't work), but the JavaScript console produced a "SecurityError" with the message being only "The operation is insecure.", as if localStorage was disabled, which it definitely was not.


Digging into the source code, I found the function DOMStorage::CanUseStorage (in mozilla/dom/src/storage/DOMStorage.cpp). The logic there performs various checks, like looking at the dom.storage.enabled config variable and asking whether the permission manager denies the access. This last part effectively checks the cookie policy setting (not so surprising, thinking about it). When nothing site-specific is set up, the default settings get checked, and if the default setting is to deny cookies, allow cookies or only allow session cookies, all is fine. The only setting that leads to really surprising behaviour is when you have set up SeaMonkey to ask about cookies. When accessing localStorage, that setting leads to the access being denied unconditionally, instead of asking for permission.
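
For reference, here is roughly the decision flow as I understand it, as a simplified C++ sketch. The type, member and enum names are mine, not the actual identifiers from DOMStorage.cpp:

    // Simplified sketch of the decision flow; the type, member and enum names
    // here are illustrative, not the actual identifiers used in DOMStorage.cpp.
    enum class CookiePolicy { Allow, Deny, SessionOnly, Ask };

    struct StoragePrefs {
      bool         domStorageEnabled    = true;   // dom.storage.enabled
      bool         hasSitePermission    = false;  // site-specific entry in the permission manager
      bool         sitePermissionAllows = false;
      CookiePolicy defaultCookiePolicy  = CookiePolicy::Allow;
    };

    bool CanUseStorage(const StoragePrefs& prefs)
    {
      if (!prefs.domStorageEnabled)          // global kill switch
        return false;

      if (prefs.hasSitePermission)           // a site-specific permission wins
        return prefs.sitePermissionAllows;

      switch (prefs.defaultCookiePolicy) {   // otherwise the cookie default applies
        case CookiePolicy::Allow:       return true;
        case CookiePolicy::SessionOnly: return true;   // works, storage is session-scoped
        case CookiePolicy::Deny:        return false;  // expected
        case CookiePolicy::Ask:         return false;  // the surprise: denied, no prompt
      }
      return false;
    }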

I know this now, but it seems to be a rather underdocumented side effect of that particular setting.


Status of new DSP LLE JIT in Dolphin-Emu

04. June 2013, Pierre Willenbrock - Dolphin-Emu

So, some people may have been aware that I am reworking the DSP LLE JIT of Dolphin-Emu. My main goal is to enable better optimizations during recompilation. To accomplish that, the original JIT has been split into a parser, an optimizer and an emitter, using a graph structure describing the instruction flow in between.


But first, let me step back a bit and explain the mess I just wrote (if you are already familiar with the acronyms used above, you can skip this paragraph). Dolphin-Emu is an emulator that emulates the Nintendo GameCube and Wii hardware. On these consoles, the Digital Signal Processor (DSP) is usually used to produce and reprocess audio signals. The DSP can be instructed to load a program (also called microcode or ucode) from main system memory. There are only a few different programs in use. In emulation, there are two classes of emulation strategies: High Level Emulation (HLE) and Low Level Emulation (LLE). HLE recognizes programs or parts of programs and replaces them with native functions, while LLE uses the program itself to replicate the behavior of the real processor. With LLE, the programs can either be interpreted sequentially, one DSP instruction at a time, or recompiled from the DSP instruction set to the host machine's instruction set, in part or in whole. If program parts are only recompiled when they are needed, they are recompiled Just In Time (JIT).

The DSP instruction set has many specialized instructions, like special instructions to form loops, and instructions that (1) store a value from a register to memory, (2) process the contents of registers into other registers and (3) load a value from memory into a register, all in the same instruction. Loops are used quite a lot, and leaving the recompiled code and finding the place to re-enter seems to be costly.

The original (and still only shipping) DSP LLE JIT recompiler goes through the code one instruction at a time, until it either hits a branch instruction or the maximum block size. There are some optimizations happening already, like keeping guest registers in host registers for as long as possible, but these need to work with the linear processing of the DSP instructions.

This way of doing business prevents some useful optimizations, like globally determining the value of certain registers. It is better to have a whole program block in some kind of intermediate representation, so the properties of instructions can be easily determined and the instruction flow easily manipulated. For this, the split into a parser, optimizer and emitter seems logical.
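
To give an idea of what such an intermediate representation can look like, here is a minimal sketch of a graph node. The field names are made up for this post, not the ones used in the actual branch:

    #include <cstdint>
    #include <vector>

    // Hypothetical shape of one node in the instruction-flow graph; the real
    // branch uses its own types, this only illustrates the idea.
    struct IRNode {
      enum class Op { LoadGuestReg, Exec, StoreGuestReg, Branch, CheckInterrupt, CheckLoop };

      Op                  op;
      uint16_t            guestAddress = 0;    // DSP address this node was parsed from
      std::vector<size_t> inputs;              // graph indices of the nodes producing our operands
      std::vector<size_t> successors;          // where control flow continues (two entries for branches)
      bool                valueKnown = false;  // set by constant propagation
      uint16_t            constValue = 0;
    };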


The parser's job is similar to that of the original recompiler. But instead of creating an instruction stream for use by the host processor, it builds a graph of the instruction flow, containing all the information necessary for emitting it. It also inserts some special instructions used to check for interrupts and loop ends.
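
Roughly, the parsing loop then looks something like the following sketch (ignoring the splitting of multi-action instructions into parallel parts; decodeOne() only stands in for the real DSP decoder):

    // Sketch of the parsing loop, reusing the IRNode from above.
    struct Decoded {
      uint16_t size;      // instruction length in words
      bool     isBranch;
      bool     endsLoop;
    };

    Decoded decodeOne(uint16_t pc)
    {
      // Placeholder: the real DSP decoder lives in Dolphin-Emu.
      (void)pc;
      return {1, false, false};
    }

    std::vector<IRNode> parseBlock(uint16_t start, size_t maxNodes)
    {
      std::vector<IRNode> graph;
      uint16_t pc = start;

      while (graph.size() < maxNodes) {
        // Interrupt checks are explicit nodes, so later passes can treat
        // them like any other instruction.
        graph.push_back({IRNode::Op::CheckInterrupt, pc});

        Decoded inst = decodeOne(pc);
        graph.push_back({IRNode::Op::Exec, pc});       // the instruction itself

        if (inst.endsLoop)                             // loop ends get a check node, too
          graph.push_back({IRNode::Op::CheckLoop, pc});

        if (inst.isBranch)                             // stop the block at branches
          break;
        pc += inst.size;
      }
      return graph;
    }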

The emitter just takes the "graph" (at this point, it has to be a linear chain of instructions, except for branches) and emits the host instruction stream for it. It handles branching, recombining and looping instruction flows mostly by itself (the actual branch still happens in the instruction emitter).
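
A sketch of that walk over the finished chain, reusing the node type from above; HostEmitter is just a stand-in for the real low-level x86 emitter:

    // Sketch of the emitter walk over the linearized chain. HostEmitter only
    // marks where host code would be produced.
    struct HostEmitter {
      void loadGuestReg(const IRNode&)  {}  // mov host_reg, [guest_state + offset]
      void storeGuestReg(const IRNode&) {}  // mov [guest_state + offset], host_reg
      void exec(const IRNode&)          {}  // the instruction's own arithmetic
      void checkInterrupt()             {}  // test pending-interrupt flag, maybe leave the block
      void checkLoop(const IRNode&)     {}  // decrement loop counter, jump back if needed
      void branch(const IRNode&)        {}  // jump to one of the successors
    };

    void emitChain(const std::vector<IRNode>& chain, HostEmitter& emit)
    {
      // The chain is linear at this point; only Branch nodes fork the control
      // flow, and the walk wires up their targets itself.
      for (const IRNode& n : chain) {
        switch (n.op) {
          case IRNode::Op::LoadGuestReg:   emit.loadGuestReg(n);  break;
          case IRNode::Op::StoreGuestReg:  emit.storeGuestReg(n); break;
          case IRNode::Op::Exec:           emit.exec(n);          break;
          case IRNode::Op::CheckInterrupt: emit.checkInterrupt(); break;
          case IRNode::Op::CheckLoop:      emit.checkLoop(n);     break;
          case IRNode::Op::Branch:         emit.branch(n);        break;
        }
      }
    }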

The optimization step is where all the interesting stuff happens. There are three things it is required to do before the emitter can do its job:

  • The instructions must be broken down into their constituents, that is, each instruction accessing a guest register gets split into guest loading instructions, executing instructions and guest storing instructions (see the sketch after this list). The multi-action instructions mentioned above already get split up into parallel graph parts during parsing.
  • The instructions must be linearized, leaving only boring linear graph segments (with the exception of branching).
  • Host registers must be assigned to the instructions in the graph.
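
To illustrate the first of these steps: a single guest instruction that reads and writes guest registers becomes a short run of explicit micro-ops. The MicroOp type here is made up for illustration, but the shape matches what ends up in the graph:

    #include <vector>

    // Made-up MicroOp type, just to show the shape of the decomposition.
    struct MicroOp {
      enum class Kind { LoadGuest, Exec, StoreGuest } kind;
      int guestReg;   // which guest register is loaded/stored (-1 for Exec)
    };

    // One guest instruction reading registers `reads` and writing registers
    // `writes` becomes explicit load, exec and store micro-ops.
    std::vector<MicroOp> expandInstruction(const std::vector<int>& reads,
                                           const std::vector<int>& writes)
    {
      std::vector<MicroOp> ops;
      for (int r : reads)                        // guest loads first...
        ops.push_back({MicroOp::Kind::LoadGuest, r});
      ops.push_back({MicroOp::Kind::Exec, -1});  // ...then the actual operation...
      for (int w : writes)                       // ...then the guest stores
        ops.push_back({MicroOp::Kind::StoreGuest, w});
      return ops;
    }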

Currently, these steps are implemented, along with some simple constant propagation mechanisms. Performance is the same as or better than with the original JIT, although host register lifetime is still limited to single instructions, i.e. the guest registers used get loaded and stored for every instruction.
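
The constant propagation itself is nothing fancy yet; the idea, sketched with the node type from above (evaluate() is only a placeholder for the per-instruction arithmetic):

    // Sketch of the constant propagation pass. Nodes appear after their
    // inputs in the graph vector, so a single forward pass is enough here.
    uint16_t evaluate(const IRNode& node, const std::vector<uint16_t>& inputValues)
    {
      (void)node;
      return inputValues.empty() ? 0 : inputValues[0];  // stand-in for the real arithmetic
    }

    void propagateConstants(std::vector<IRNode>& graph)
    {
      for (IRNode& node : graph) {
        if (node.op != IRNode::Op::Exec)
          continue;

        std::vector<uint16_t> values;
        bool allKnown = true;
        for (size_t in : node.inputs) {
          if (!graph[in].valueKnown) { allKnown = false; break; }
          values.push_back(graph[in].constValue);
        }
        if (allKnown) {                  // result is known at compile time
          node.constValue = evaluate(node, values);
          node.valueKnown = true;        // later passes can drop the runtime work
        }
      }
    }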


GBEmu gets HQX upscaling

06. January 2011, admin - gbemu

The latest update of GBEmu (that is, the GBEmu-module-io branch) includes a new upscaler, based on the concept of the hqnx upscalers, implemented in GLSL.

The images show the quality differences between the hqnx, bilinear and nearest-neighbor upscalers. When looking closely, one can see that my implementation is not bug-free yet, showing gradients where there should be solid areas.

These filters analyse the similarity of the 8 neighbors to the center pixel, as well as the similarity of the pixel-pairs on the four secondary diagonals. Using a threshold, each of these similarities is reduced to a single bit of information, resulting in 12 bits of similarity information.
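
For the curious, this is the kind of computation the shader does to build those 12 bits, written here as CPU-side C++ for readability. The similarity metric and the exact pairing of the secondary diagonals are simplified guesses, not copied from my shader:

    #include <cmath>
    #include <cstdint>

    // CPU-side illustration of building the 12-bit similarity code. The 3x3
    // neighbourhood is indexed like this (4 is the center pixel):
    //   0 1 2
    //   3 4 5
    //   6 7 8
    static bool similar(float a, float b, float threshold = 0.1f)
    {
      return std::fabs(a - b) < threshold;   // placeholder metric on luminance
    }

    uint16_t similarityCode(const float px[9])
    {
      uint16_t code = 0;

      // Bits 0..7: each of the 8 neighbours compared against the center.
      static const int neighbours[8] = {0, 1, 2, 3, 5, 6, 7, 8};
      for (int i = 0; i < 8; ++i)
        if (similar(px[neighbours[i]], px[4]))
          code |= 1u << i;

      // Bits 8..11: the pixel pairs on the four secondary diagonals
      // (top/left, top/right, bottom/left, bottom/right).
      static const int pairs[4][2] = {{1, 3}, {1, 5}, {7, 3}, {7, 5}};
      for (int i = 0; i < 4; ++i)
        if (similar(px[pairs[i][0]], px[pairs[i][1]]))
          code |= 1u << (8 + i);

      return code;
    }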

The original hqnx upscalers use a switch block with code for each of these similarity codes. (To be exact, the switch only covers the 8 bits from the comparisons with the center pixel; the secondary diagonals are handled as conditionals inside the case blocks.) Each of these code blocks creates the needed output pixels directly from the 9 input pixels.

My implementation, on the other hand, uses small factor maps encoding the scaling factors for each of the four neighbor pixels in each given quadrant. The four factors are encoded in the RGBA components of a texture. Each map is 16 by 16 pixels in size, and there are 64 by 64 factor maps in the final texture, giving a modest 1024 by 1024 RGBA texture.
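
Turning a 12-bit code into coordinates inside that texture then comes down to picking the right 16 by 16 tile. A sketch, assuming the tiles are laid out row by row:

    #include <cstdint>

    // Coordinates inside the 1024x1024 factor texture (a 64x64 grid of 16x16
    // texel maps). Row-major tile layout is an assumption made for this sketch.
    struct TexCoord { float u, v; };

    TexCoord factorMapCoord(uint16_t code, float fracX, float fracY)
    {
      const float tileSize = 16.0f / 1024.0f;        // one factor map in texture space
      float tileU = static_cast<float>(code % 64);   // tile column (low 6 bits)
      float tileV = static_cast<float>(code / 64);   // tile row (high 6 bits)

      // fracX/fracY in [0,1) are the position inside the source pixel, i.e.
      // where in the 16x16 factor map the output pixel falls.
      return { (tileU + fracX) * tileSize,
               (tileV + fracY) * tileSize };
    }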

The 12 bits of similarity information extracted from the original image are put into yet another texture of the same size as the original. Since the shader needs to blend two frames together (to enable flicker-free transparency effects, for example in Zelda 4), this index map contains the indexes for both the current and the last frame. That way, each component in the index map carries 6 bits of similarity information, which is then directly used to address the correct factor map in the factor texture.
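
A sketch of the packing, written as C++ for readability; which 6-bit half ends up in which channel is an implementation detail:

    #include <cstdint>

    // Packing the 12-bit codes of the current and the previous frame into one
    // RGBA8 texel of the index map, 6 bits per component. The two halves then
    // directly give the tile column and row in the factor texture.
    struct IndexTexel { uint8_t r, g, b, a; };

    IndexTexel packIndex(uint16_t currentCode, uint16_t lastCode)
    {
      IndexTexel t;
      t.r = currentCode        & 0x3F;  // tile column, current frame
      t.g = (currentCode >> 6) & 0x3F;  // tile row, current frame
      t.b = lastCode           & 0x3F;  // tile column, previous frame
      t.a = (lastCode >> 6)    & 0x3F;  // tile row, previous frame
      return t;
    }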

Since the factor map is undersampled at lower magnifications, the factor texture is kept in 5 resolutions. That way, the antialiasing of lines is preserved.

While implementing the fragment shader, I noticed that the Mesa GLSL parser is a lot more forgiving of small coding mistakes than its NVIDIA counterpart. For example, the NVIDIA parser complains when one uses "2" in places where floats are expected, forcing the more correct "2.0" (or a cast). On the other hand, the Mesa parser seems to be unable to parse the fractional part of floating point numbers (maybe only in my locale), so I needed to work around this by using "1.0/2.0" for "0.5".