3Dmigoto now open-source...
NBA 2K17 crashes with 3Dmigoto 1.0.1. I also tried 1.2.36 (it doesn't have the skip_dxgi_device either), and it crashes. Then I tried 1.2.56, which has skip_dxgi_device, so I used it to not crash, but I don't have the OSD and hotkeys like the newest version.

:(

Edit: by the way, the 3Dmigoto DX10 build doesn't seem to be using iniparams constants correctly. It's as if they are always 0. The log shows this when pressing hotkeys (default value is 0, and I loaded iniparams like "float4 iniparams = IniParams.Load(0);"):

User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 1.0
User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 0.00
User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 0.20
User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 0.40
User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 0.60
User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 0.80
User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 1.0
User key activation -->
Override transition: 250ms transition_type: cosine x: 0.00 -> 0.00


Changing the default value to 1, for example, makes it still work as 0.

Other weird stuff with x=1:

User key deactivation <--
Override x: 1.0 -> 1.0
User key activation -->
Override transition: 250ms transition_type: cosine x: 1.0 -> 0.20
User key deactivation <--
Override x: 1.0 -> 1.0
User key activation -->
Override transition: 250ms transition_type: cosine x: 1.0 -> 0.20
User key deactivation <--
Override x: 1.0 -> 1.0
User key activation -->
Override transition: 250ms transition_type: cosine x: 1.0 -> 0.20
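For anyone reading those log lines: each one names a start value, a target value, a duration and a transition type. Below is a minimal sketch of what a 250ms cosine transition from one x value to another could look like. It assumes "cosine" means a half-cosine ease between the two values; the function name and the exact formula are my own illustration, not taken from the 3Dmigoto source.

```python
import math

def cosine_transition(start, target, elapsed_ms, duration_ms=250.0):
    """Ease from start to target along half a cosine wave (assumed shape)."""
    # Once the duration has elapsed (or is zero), the override sits at its target.
    if duration_ms <= 0 or elapsed_ms >= duration_ms:
        return target
    progress = max(elapsed_ms, 0.0) / duration_ms        # linear 0..1
    eased = (1.0 - math.cos(progress * math.pi)) / 2.0   # slow, fast, slow
    return start + (target - start) * eased

if __name__ == "__main__":
    # Sample the "x: 0.00 -> 0.20" transition from the log at a few points.
    for t in (0, 62.5, 125, 187.5, 250):
        print(f"{t:6.1f} ms -> x = {cosine_transition(0.0, 0.2, t):.4f}")
```

If the shader really received these values, x would glide from the old value to the new one over a quarter of a second instead of jumping, which is why a constant 0 in the shader points at the DX10 build and not at the key configuration.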

Email for PayPal donations: masterotakusuko@gmail.com
CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: Gainward Phoenix 1080 GLH
Monitor: Asus PG278QR
Speakers: Logitech Z506

Posted 08/25/2017 04:45 PM   
Another bad thing about the DX10 3Dmigoto build: it looks like shaders in ShaderFixes don't override the original ones unless they are edited and reloaded at least once after booting the game (this for every shader). Even cache_shaders=1 doesn't fix this. So if I release a fix for DMC4SE, I think I'll have to leave hunting enabled (at least it doesn't have an OSD that can annoy the end user) and make a bat file that edits and saves all shaders. People would have to execute it after booting the game and then press F10. Mmmm, annoying...

This is what I get for having to use such an old build :p. I'm starting to debate if this is worth it or not for just HUD depth and a more or less fixed water shader.
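The "edit and save all shaders" helper could be sketched roughly like this, in Python instead of a bat file. Big assumption: that bumping each file's modification time counts as an edit for the F10 reload. I haven't confirmed that for the DX10 build, so treat it as something to try. The function name and the `*_replace.txt` pattern are illustrative.

```python
import os
import time
from pathlib import Path

def touch_shader_fixes(folder="ShaderFixes", pattern="*_replace.txt"):
    """Set the modification time of every matching shader file to now."""
    now = time.time()
    touched = []
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: a newer timestamp is enough for the reload to pick the file up.
        os.utime(path, (now, now))
        touched.append(path.name)
    return touched

if __name__ == "__main__":
    names = touch_shader_fixes()
    print(f"Touched {len(names)} shader file(s); now press F10 in game.")
```

It would be run from the game folder after booting the game, followed by F10, same as the bat file idea.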

Posted 08/25/2017 06:39 PM   
Is it possible to define an array in the ini and then use it in the shader to get a specific array element by calling that array using an integer argument provided by the shader?

Explanation in more detail:
The game provides a unique number for each stage to the shader in one of the constant buffers. I would like to use it as an array index to get the value specified in my custom array defined in the ini, if it's possible of course.

Edit: Seems like I'm overthinking it a bit. I can just use an include file with that array defined, can't I?

Edit2: How should I define the arrays to be the most efficient? As a simple static const array or as a cbuffer?

I should have mentioned that I have about 10 arrays of 54 static floats each to load in a certain shader. Should I be worried about performance or some shader limitations?

Here is an example of one of my arrays:
static const float skyOffset[54] =
{
0.02, //finland.finland.rally.day
0.02, //finland.finland.rally.day.alt
0.02, //finland.finland.rally.night
0.02, //finland.finland.rally.wet
0.02, //finland.finland.trail.day
0.02, //finland.finland.trail.day.alt
0.02, //finland.finland.trail.night
0.02, //finland.finland.trail.wet
0.0, //france.montecarlo.rally.day
0.0, //france.montecarlo.rally.day.alt
0.0, //france.montecarlo.rally.night
0.0, //frontend
0.0, //japan.shibuya
0.08, //kenya.kenya.rally.day
0.08, //kenya.kenya.rally.day.alt
0.08, //kenya.kenya.rally.night
0.08, //kenya.kenya.trail.day
0.08, //kenya.kenya.trail.day.alt
0.08, //kenya.kenya.trail.night
0.0, //london.battersea.day
0.0, //london.battersea.night
0.0, //london.dirt3.night.test
0.0, //london.dirt3.night.test.night
0.0, //material.testbed
0.0, //monaco.monaco.day
0.0, //monaco.monaco.day.alt
0.0, //monaco.monaco.night
0.04, //norway.norway.rally.day
0.04, //norway.norway.rally.day.alt
0.04, //norway.norway.rally.night
0.04, //norway.norway.rally.snowing
0.04, //norway.norway.trail.day
0.04, //norway.norway.trail.day.alt
0.04, //norway.norway.trail.night
0.04, //norway.norway.trail.snowing
0.0, //tutorials.battersea
0.0, //uk.battersea
0.08, //usa.aspen.day
0.08, //usa.aspen.day.alt
0.08, //usa.aspen.night
0.08, //usa.aspen.snowing
0.0, //usa.lacoliseum.day
0.0, //usa.lacoliseum.day.alt
0.06, //usa.michigan.rally.day
0.06, //usa.michigan.rally.day.alt
0.06, //usa.michigan.rally.night
0.06, //usa.michigan.rally.wet
0.06, //usa.michigan.trail.day
0.06, //usa.michigan.trail.day.alt
0.06, //usa.michigan.trail.night
0.06, //usa.michigan.trail.wet
0.04, //usa.smelter.day
0.04, //usa.smelter.day.alt
0.04, //usa.smelter.wet
};

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 09/05/2017 10:09 AM   
Let me google that for you...

https://msdn.microsoft.com/en-us/library/windows/desktop/bb509581(v=vs.85).aspx:
"In Shader Model 4, shader constants are stored in one or more buffer resources in memory. They can be organized into two types of buffers: constant buffers (cbuffers) and texture buffers (tbuffers). Constant buffers are optimized for constant-variable usage, which is characterized by lower-latency access and more frequent update from the CPU. For this reason, additional size, layout, and access restrictions apply to these resources. Texture buffers are accessed like textures and perform better for arbitrarily indexed data. Regardless of which type of resource you use, there is no limit to the number of constant buffers or texture buffers an application can create."

MSDN says arbitrarily indexed data performs better in a texture buffer (also, from experience this compiles faster than indexing into a constant buffer, which can be significant if the offset calculation is non-trivial).

"A constant buffer is a specialized buffer resource that is accessed like a buffer. Each constant buffer can hold up to 4096 vectors; each vector contains up to four 32-bit values. You can bind up to 14 constant buffers per pipeline stage (2 additional slots are reserved for internal use)."

"A texture buffer is a specialized buffer resource that is accessed like a texture. Texture access (as compared with buffer access) can have better performance for arbitrarily indexed data. You can bind up to 128 texture buffers per pipeline stage."
If you declare it statically like you posted above it will actually use a special "immediate constant buffer" embedded in the shader, which AFAIK has the same limitations as any other constant buffer, and the buffers you describe will easily fit within that... In fact, it will look something like this (I copied your buffer ten times and added just enough logic in the shader to make sure the compiler didn't optimise anything out. I also had to change the first digit in each of the arrays otherwise the compiler was smart enough to notice that they were identical and only included one):

dcl_immediateConstantBuffer { { 0.020000, 1.020000, 2.020000, 3.020000},
{ 0.020000, 1.020000, 2.020000, 3.020000},
{ 0.020000, 1.020000, 2.020000, 3.020000},
{ 0.020000, 1.020000, 2.020000, 3.020000},
{ 0.020000, 1.020000, 2.020000, 3.020000},
{ 0.020000, 1.020000, 2.020000, 3.020000},
{ 0.020000, 1.020000, 2.020000, 3.020000},
{ 0.020000, 1.020000, 2.020000, 3.020000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0.080000, 1.080000, 2.080000, 3.080000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0, 1.000000, 2.000000, 3.000000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.060000, 1.060000, 2.060000, 3.060000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 0.040000, 1.040000, 2.040000, 3.040000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.020000, 5.020000, 6.020000, 7.020000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.080000, 5.080000, 6.080000, 7.080000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.000000, 5.000000, 6.000000, 7.000000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.060000, 5.060000, 6.060000, 7.060000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 4.040000, 5.040000, 6.040000, 7.040000},
{ 8.020000, 9.020000, 0, 0},
{ 8.020000, 9.020000, 0, 0},
{ 8.020000, 9.020000, 0, 0},
{ 8.020000, 9.020000, 0, 0},
{ 8.020000, 9.020000, 0, 0},
{ 8.020000, 9.020000, 0, 0},
{ 8.020000, 9.020000, 0, 0},
{ 8.020000, 9.020000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.080000, 9.080000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.000000, 9.000000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.060000, 9.060000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0},
{ 8.040000, 9.040000, 0, 0} }
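Reading the dump above: the compiler appears to pack four arrays into the x, y, z and w components of each group of 54 vectors, so ten arrays take three groups, with the last group only using x and y. Here is a small sketch of that mapping, inferred purely from this one dump. The names `icb_location` and `icb_vector_count` are made up for illustration, and the compiler is free to pack differently in other shaders.

```python
ELEMENTS = 54   # entries per array (one per stage)
ARRAYS = 10     # number of float arrays in the test shader

def icb_location(array_index, element_index):
    """Return (vector_index, component) of one array element in the packed buffer."""
    if not 0 <= array_index < ARRAYS or not 0 <= element_index < ELEMENTS:
        raise IndexError("array or element index out of range")
    group, lane = divmod(array_index, 4)   # four arrays share each float4
    return group * ELEMENTS + element_index, "xyzw"[lane]

def icb_vector_count():
    """Total float4 vectors needed for all arrays."""
    groups = -(-ARRAYS // 4)               # ceiling division
    return groups * ELEMENTS

if __name__ == "__main__":
    print(icb_location(0, 0), icb_location(9, 53), icb_vector_count())
```

That comes to 162 vectors for all ten arrays, which is a small fraction of the 4096 vectors quoted from MSDN.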

Frankly, unless you have a performance issue I'd just go with that for simplicity. If you do need to load it as a texture or constant buffer you can do that with 3DMigoto, but you will have to construct a binary file containing the values encoded as 32bit little endian floats (I'd like to add a way to write these as text in the d3dx.ini, but that is a low priority and would probably only support limited use cases), then do something like this in the d3dx.ini:

[ResourceSkyOffsetArray]
type = Buffer
filename = ShaderFixes/SkyOffset.buf
format = R32_FLOAT

Then bind that in a constant buffer or texture slot (stick to one or the other - if you try to bind it by reference to both types you will end up with bad bind flags, and binding one using a full copy is going to kill any performance gain you might get by using this) as you would any other resource.

(format is technically optional and you can use it as a constant buffer without that, but you need either a format or stride to use it as a texture, and stride is mostly intended for type=StructuredBuffer. There's also type=ByteAddressBuffer, which in principle shouldn't need either, but actually that one is probably broken.)
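Building the binary file is only a few lines in a scripting language. Here is a minimal sketch that writes the skyOffset values posted earlier as tightly packed 32-bit little-endian floats, which matches binding the resource in a texture slot with format = R32_FLOAT. The helper name `write_float_buffer` is made up, and the list is just the 54 values from the array above written as runs of repeated values.

```python
import struct

# The 54 skyOffset values from the earlier post, grouped by location.
SKY_OFFSET = (
    [0.02] * 8 +   # finland
    [0.0] * 5 +    # france, frontend, japan
    [0.08] * 6 +   # kenya
    [0.0] * 8 +    # london, material.testbed, monaco
    [0.04] * 8 +   # norway
    [0.0] * 2 +    # tutorials, uk
    [0.08] * 4 +   # usa.aspen
    [0.0] * 2 +    # usa.lacoliseum
    [0.06] * 8 +   # usa.michigan
    [0.04] * 3     # usa.smelter
)

def write_float_buffer(path, values):
    """Write values as packed 32-bit little-endian floats; return bytes written."""
    data = struct.pack(f"<{len(values)}f", *values)   # "<" forces little-endian
    with open(path, "wb") as f:
        f.write(data)
    return len(data)

if __name__ == "__main__":
    size = write_float_buffer("SkyOffset.buf", SKY_OFFSET)
    print(f"Wrote {size} bytes; copy SkyOffset.buf into ShaderFixes.")
```

One caveat if the file is meant for a constant buffer slot instead: HLSL gives every element of a float array in a cbuffer its own 16-byte register, so the tightly packed layout here would not line up with a plain `float skyOffset[54]` declaration, and the file would need padding to match.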

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 09/05/2017 04:50 PM   
And here again, DSS with a comprehensive answer. I know my question mutated from a 3DM-related one into a general HLSL one, so I deserved an LMGTFY :) I will stick with what I've got, thank you.

Posted 09/06/2017 08:32 AM   
Does 3DMigoto in its current form allow writing to IniParams from a pixel shader? If not, can I create a RWTexture1D buffer and read from and write to it in the same pixel shader?

Here is a concept that doesn't work. DSS mentioned a while ago that RWTextures are untested in pixel shaders, but some time has passed and maybe something has changed that I'm not aware of.

[ResourceTrackID]

[ShaderOverride-KeyValue]
hash = bdc94f312b8472a7
ps-u1 = ref ResourceTrackID
ps-t101 = ResourceTrackID
post ps-u1 = null
post ps-t101 = null


//disabled vignette shader

cbuffer _Globals : register(b0)
{
  float keyValue : packoffset(c48.w);
}


// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);

RWTexture1D<float4> RWMyIni : register(u1);
Texture1D<float4> MyIni : register(t101);

void main(
  float4 v0 : SV_Position0,
  linear centroid float4 v1 : TEXCOORD0,
  linear centroid float4 v2 : TEXCOORD1,
  out float4 o0 : SV_Target0)
{
  if (MyIni.Load(0) != 0) RWMyIni[0] = keyValue;
  o0.x = MyIni.Load(0); //just for debug, MyIni is being used in other shaders
  return;
}



A bit more explanation:
I use keyValue as an index into an array of configuration parameters for each in-game track. The problem is that for some camera angles this value is 0, because I take it from the vignetteAmount parameter in the game config files, which I set to a unique integer for each track. I need to make sure that 3DM remembers the last non-zero value, which I feed to other shaders.
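
For reference, the "remember the last non-zero value" idea only needs the write side of the UAV. Below is a minimal sketch of that latch, not a confirmed working fix. It assumes 3Dmigoto really binds ResourceTrackID to ps-u1 for this shader (untested in pixel shaders, as noted above), and the colour written to o0 is a placeholder. Unlike the concept shader, it tests keyValue itself rather than the stored value.

```hlsl
// Sketch only: latch the last non-zero keyValue into a UAV.
// Assumption: ResourceTrackID is bound as ps-u1 by the [ShaderOverride] section.
cbuffer _Globals : register(b0)
{
  float keyValue : packoffset(c48.w);
}

RWTexture1D<float4> RWMyIni : register(u1);

void main(
  float4 v0 : SV_Position0,
  linear centroid float4 v1 : TEXCOORD0,
  linear centroid float4 v2 : TEXCOORD1,
  out float4 o0 : SV_Target0)
{
  // Overwrite the stored value only when the game supplies a non-zero index.
  // A zero leaves whatever an earlier draw call wrote.
  if (keyValue != 0)
    RWMyIni[0] = float4(keyValue, 0, 0, 0);

  // Placeholder output (assumption): keep whatever the real shader writes here.
  o0 = float4(0, 0, 0, 0);
}
```

Other shaders would then read the latched value through a Texture1D view of the same resource. As far as I know, Direct3D 11 will not keep one resource bound as both a shader resource and a UAV in the same draw call, and ps_5_0 can only read typed UAVs with single-component 32-bit formats. That would explain why reading MyIni and writing RWMyIni in the same shader does not behave as hoped.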

Posted 09/07/2017 08:32 PM   
I have a problem with One Piece Burning Blood. There are two vertex/pixel shader pairs that are hard to fix. The strangest one is this (unmodified):

429c49f40e31af4e-vs_replace.txt:

//Haloing 1 (and decals).
// ---- Created with 3Dmigoto v1.2.64 on Fri Sep 08 20:28:34 2017

cbuffer _Globals : register(b0)
{
  row_major float4x4 _sp_world_view_projection_matrix : packoffset(c0);
}



// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);


void main(
  float4 v0 : POSITION0,
  float4 v1 : COLOR0,
  float2 v2 : TEXCOORD0,
  out float4 o0 : SV_Position0,
  out float4 o1 : TEXCOORD7,
  out float4 o2 : TEXCOORD0,
  out float4 o3 : TEXCOORD3)
{
  float4 r0;
  uint4 bitmask, uiDest;
  float4 fDest;

  r0.xyzw = _sp_world_view_projection_matrix._m10_m11_m12_m13 * v0.yyyy;
  r0.xyzw = v0.xxxx * _sp_world_view_projection_matrix._m00_m01_m02_m03 + r0.xyzw;
  r0.xyzw = v0.zzzz * _sp_world_view_projection_matrix._m20_m21_m22_m23 + r0.xyzw;
  r0.xyzw = v0.wwww * _sp_world_view_projection_matrix._m30_m31_m32_m33 + r0.xyzw;
  o0.xyzw = r0.xyzw;
  o3.xyzw = r0.xyzw;
  o1.xyzw = v1.xyzw + v1.xyzw;
  o2.xy = v2.xy;
  return;
}

/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
//
// using 3Dmigoto v1.2.64 on Fri Sep 08 20:28:34 2017
//
//
// Buffer Definitions:
//
// cbuffer $Globals
// {
//
// row_major float4x4 _sp_world_view_projection_matrix;// Offset: 0 Size: 64
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// $Globals                          cbuffer      NA          NA    0        1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// POSITION                 0   xyzw        0     NONE   float   xyzw
// COLOR                    0   xyzw        1     NONE   float   xyzw
// TEXCOORD                 0   xy          2     NONE   float   xy
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float   xyzw
// TEXCOORD                 7   xyzw        1     NONE   float   xyzw
// TEXCOORD                 0   xy          2     NONE   float   xy
// TEXCOORD                 3   xyzw        3     NONE   float   xyzw
//
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[4], immediateIndexed
dcl_input v0.xyzw
dcl_input v1.xyzw
dcl_input v2.xy
dcl_output_siv o0.xyzw, position
dcl_output o1.xyzw
dcl_output o2.xy
dcl_output o3.xyzw
dcl_temps 1
mul r0.xyzw, v0.yyyy, cb0[1].xyzw
mad r0.xyzw, v0.xxxx, cb0[0].xyzw, r0.xyzw
mad r0.xyzw, v0.zzzz, cb0[2].xyzw, r0.xyzw
mad r0.xyzw, v0.wwww, cb0[3].xyzw, r0.xyzw
mov o0.xyzw, r0.xyzw
mov o3.xyzw, r0.xyzw
add o1.xyzw, v1.xyzw, v1.xyzw
mov o2.xy, v2.xyxx
ret
// Approximately 9 instruction slots used

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/


14e62becc66862ae-ps_replace.txt:

//Decals (haloing 1).
// ---- Created with 3Dmigoto v1.2.64 on Fri Sep 08 21:13:39 2017

cbuffer _Globals : register(b0)
{
  row_major float4x4 _sp_view_projection_inverse_matrix : packoffset(c0);
  row_major float4x4 _sp_view_projection_texture_matrix : packoffset(c4);
  float4 _sp_screen_coord_offset : packoffset(c8);
}

SamplerState _samp0_samplerstate_s : register(s0);
SamplerState _samp4_samplerstate_s : register(s4);
Texture2D<float4> _samp0 : register(t0);
Texture2D<float4> _samp4 : register(t4);


// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);

#include "matrix.hlsl"

void main(
  float4 v0 : SV_Position0,
  float4 v1 : TEXCOORD7,
  float4 v2 : TEXCOORD0,
  float4 v3 : TEXCOORD3,
  out float4 o0 : SV_Target0)
{
  float4 r0,r1;
  uint4 bitmask, uiDest;
  float4 fDest;

  r0.xy = v3.xy / v3.ww;
  r1.xyzw = _sp_view_projection_inverse_matrix._m10_m11_m12_m13 * r0.yyyy;
  r1.xyzw = r0.xxxx * _sp_view_projection_inverse_matrix._m00_m01_m02_m03 + r1.xyzw;
  r0.xy = r0.xy * _sp_screen_coord_offset.xy + _sp_screen_coord_offset.zw;
  r0.x = _samp4.Sample(_samp4_samplerstate_s, r0.xy).x;
  r0.xyzw = r0.xxxx * _sp_view_projection_inverse_matrix._m20_m21_m22_m23 + r1.xyzw;
  r0.xyzw = _sp_view_projection_inverse_matrix._m30_m31_m32_m33 + r0.xyzw;
  r0.xyzw = r0.xyzw / r0.wwww;
  r1.xy = cmp(r0.xz < float2(-1,-1));
  r1.x = (int)r1.y | (int)r1.x;
  r1.yz = cmp(float2(1,1) < r0.xz);
  r1.x = (int)r1.y | (int)r1.x;
  r1.x = (int)r1.z | (int)r1.x;
  if (r1.x != 0) discard;
  r1.xy = _sp_view_projection_texture_matrix._m10_m12 * r0.yy;
  r1.xy = r0.xx * _sp_view_projection_texture_matrix._m00_m02 + r1.xy;
  r0.xz = r0.zz * _sp_view_projection_texture_matrix._m20_m22 + r1.xy;
  r0.xz = r0.ww * _sp_view_projection_texture_matrix._m30_m32 + r0.xz;
  r0.y = abs(r0.y) * 2 + -1;
  r0.y = saturate(1 + -r0.y);
  r1.w = v1.w * r0.y;
  r0.xyzw = _samp0.Sample(_samp0_samplerstate_s, r0.xz).xyzw;
  r1.xyz = v1.xyz;
  o0.xyzw = saturate(r0.xyzw * r1.xyzw);
  return;
}

/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
//
// using 3Dmigoto v1.2.64 on Fri Sep 08 21:13:39 2017
//
//
// Buffer Definitions:
//
// cbuffer $Globals
// {
//
// row_major float4x4 _sp_view_projection_inverse_matrix;// Offset: 0 Size: 64
// row_major float4x4 _sp_view_projection_texture_matrix;// Offset: 64 Size: 64
// float4 _sp_screen_coord_offset; // Offset: 128 Size: 16
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// _samp0_samplerstate               sampler      NA          NA    0        1
// _samp4_samplerstate               sampler      NA          NA    4        1
// _samp0                            texture  float4          2d    0        1
// _samp4                            texture  float4          2d    4        1
// $Globals                          cbuffer      NA          NA    0        1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float
// TEXCOORD                 7   xyzw        1     NONE   float   xyzw
// TEXCOORD                 0   xy          2     NONE   float
// TEXCOORD                 3   xyzw        3     NONE   float   xy w
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Target                0   xyzw        0   TARGET   float   xyzw
//
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[9], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s4, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t4
dcl_input_ps linear v1.xyzw
dcl_input_ps linear v3.xyw
dcl_output o0.xyzw
dcl_temps 2
div r0.xy, v3.xyxx, v3.wwww
mul r1.xyzw, r0.yyyy, cb0[1].xyzw
mad r1.xyzw, r0.xxxx, cb0[0].xyzw, r1.xyzw
mad r0.xy, r0.xyxx, cb0[8].xyxx, cb0[8].zwzz
sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t4.xyzw, s4
mad r0.xyzw, r0.xxxx, cb0[2].xyzw, r1.xyzw
add r0.xyzw, r0.xyzw, cb0[3].xyzw
div r0.xyzw, r0.xyzw, r0.wwww
lt r1.xy, r0.xzxx, l(-1.000000, -1.000000, 0.000000, 0.000000)
or r1.x, r1.y, r1.x
lt r1.yz, l(0.000000, 1.000000, 1.000000, 0.000000), r0.xxzx
or r1.x, r1.y, r1.x
or r1.x, r1.z, r1.x
discard_nz r1.x
mul r1.xy, r0.yyyy, cb0[5].xzxx
mad r1.xy, r0.xxxx, cb0[4].xzxx, r1.xyxx
mad r0.xz, r0.zzzz, cb0[6].xxzx, r1.xxyx
mad r0.xz, r0.wwww, cb0[7].xxzx, r0.xxzx
mad r0.y, |r0.y|, l(2.000000), l(-1.000000)
add_sat r0.y, -r0.y, l(1.000000)
mul r1.w, r0.y, v1.w
sample_indexable(texture2d)(float,float,float,float) r0.xyzw, r0.xzxx, t0.xyzw, s0
mov r1.xyz, v1.xyzx
mul_sat o0.xyzw, r1.xyzw, r0.xyzw
ret
// Approximately 25 instruction slots used

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/



By default, the ground decal is correct only with 0 convergence, but there's a character silhouette to the right of the character in both eyes that depends on separation. Doing a "r0.x+=stereo.x*stereo.z;" just after r0.xy is defined in the PS puts that silhouette into the correct place at 0 convergence, but the ground decal is moved and incorrect.

Correcting o3 with "o3.x+=stereo.x*(o3.w-stereo.y);" fixes some haloing problems unrelated to the decal, but now the decal is at screen depth...
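
For readers following along, the correction quoted above can be wrapped in a small helper. This is only a sketch of that one formula, not a fix for the decal. The helper name stereo_shift is made up for illustration, and it assumes StereoParams carries separation in x and convergence in y, which is how the quoted line uses it.

```hlsl
// 3Dmigoto declaration, as in the shaders above
Texture2D<float4> StereoParams : register(t125);

// Hypothetical helper (not part of 3Dmigoto): applies the correction quoted
// in the post to a clip-space position.
float4 stereo_shift(float4 pos)
{
  float4 stereo = StereoParams.Load(0); // assumed: x = separation, y = convergence

  // The shift is zero when pos.w equals convergence, grows with depth beyond
  // it, and changes sign for points nearer than convergence.
  pos.x += stereo.x * (pos.w - stereo.y);
  return pos;
}
```

Calling `o3 = stereo_shift(o3);` at the end of the vertex shader is equivalent to the "o3.x+=stereo.x*(o3.w-stereo.y);" line. Because the shift depends on w, a value that is later combined with a depth read from the depth buffer (as the pixel shader does with _samp4) no longer lines up with it unless that step is corrected as well. That is consistent with the decal ending up at the wrong depth.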


The other pair is a bit different but has the same decal issue.

03fa26c97bff244f-vs_replace.txt:

//Decals 2 and haloing.
// ---- Created with 3Dmigoto v1.2.64 on Sat Sep 09 12:06:41 2017

cbuffer _Globals : register(b0)
{
  row_major float4x4 _sp_world_view_projection_matrix : packoffset(c0);
  row_major float4x4 _sp_world_matrix : packoffset(c4);
  float4 _sp_light_direction : packoffset(c8);
}



// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);


void main(
  float4 v0 : POSITION0,
  float4 v1 : COLOR0,
  float2 v2 : TEXCOORD0,
  float3 v3 : NORMAL0,
  float3 v4 : TANGENT0,
  out float4 o0 : SV_Position0,
  out float4 o1 : TEXCOORD7,
  out float4 o2 : TEXCOORD0,
  out float4 o3 : TEXCOORD2,
  out float4 o4 : TEXCOORD3)
{
  float4 r0,r1,r2;
  uint4 bitmask, uiDest;
  float4 fDest;

  r0.xyzw = _sp_world_view_projection_matrix._m10_m11_m12_m13 * v0.yyyy;
  r0.xyzw = v0.xxxx * _sp_world_view_projection_matrix._m00_m01_m02_m03 + r0.xyzw;
  r0.xyzw = v0.zzzz * _sp_world_view_projection_matrix._m20_m21_m22_m23 + r0.xyzw;
  r0.xyzw = v0.wwww * _sp_world_view_projection_matrix._m30_m31_m32_m33 + r0.xyzw;
  o0.xyzw = r0.xyzw;
  o4.xyzw = r0.xyzw;
  o1.xyzw = v1.xyzw + v1.xyzw;
  o2.xy = v2.xy;
  r0.xyz = _sp_world_matrix._m11_m12_m10 * v4.yyy;
  r0.xyz = v4.xxx * _sp_world_matrix._m01_m02_m00 + r0.xyz;
  r0.xyz = v4.zzz * _sp_world_matrix._m21_m22_m20 + r0.xyz;
  r0.w = dot(r0.xyz, r0.xyz);
  r0.w = rsqrt(r0.w);
  r1.xyz = r0.zxy * r0.www;
  o3.x = dot(-_sp_light_direction.xyz, r1.xyz);
  r1.xyz = _sp_world_matrix._m10_m11_m12 * v3.yyy;
  r1.xyz = v3.xxx * _sp_world_matrix._m00_m01_m02 + r1.xyz;
  r1.xyz = v3.zzz * _sp_world_matrix._m20_m21_m22 + r1.xyz;
  r0.w = dot(r1.xyz, r1.xyz);
  r0.w = rsqrt(r0.w);
  r2.xyz = r1.xyz * r0.www;
  o3.z = dot(-_sp_light_direction.xyz, r2.xyz);
  r2.xyz = r1.zxy * r0.xyz;
  r0.xyz = r1.yzx * r0.yzx + -r2.xyz;
  r0.w = dot(r0.xyz, r0.xyz);
  r0.w = rsqrt(r0.w);
  r0.xyz = r0.xyz * r0.www;
  o3.y = dot(-_sp_light_direction.xyz, r0.xyz);
  return;
}

/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
//
// using 3Dmigoto v1.2.64 on Sat Sep 09 12:06:41 2017
//
//
// Buffer Definitions:
//
// cbuffer $Globals
// {
//
// row_major float4x4 _sp_world_view_projection_matrix;// Offset: 0 Size: 64
// row_major float4x4 _sp_world_matrix;// Offset: 64 Size: 64
// float4 _sp_light_direction; // Offset: 128 Size: 16
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// $Globals                          cbuffer      NA          NA    0        1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// POSITION                 0   xyzw        0     NONE   float   xyzw
// COLOR                    0   xyzw        1     NONE   float   xyzw
// TEXCOORD                 0   xy          2     NONE   float   xy
// NORMAL                   0   xyz         3     NONE   float   xyz
// TANGENT                  0   xyz         4     NONE   float   xyz
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float   xyzw
// TEXCOORD                 7   xyzw        1     NONE   float   xyzw
// TEXCOORD                 0   xy          2     NONE   float   xy
// TEXCOORD                 2   xyz         3     NONE   float   xyz
// TEXCOORD                 3   xyzw        4     NONE   float   xyzw
//
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[9], immediateIndexed
dcl_input v0.xyzw
dcl_input v1.xyzw
dcl_input v2.xy
dcl_input v3.xyz
dcl_input v4.xyz
dcl_output_siv o0.xyzw, position
dcl_output o1.xyzw
dcl_output o2.xy
dcl_output o3.xyz
dcl_output o4.xyzw
dcl_temps 3
mul r0.xyzw, v0.yyyy, cb0[1].xyzw
mad r0.xyzw, v0.xxxx, cb0[0].xyzw, r0.xyzw
mad r0.xyzw, v0.zzzz, cb0[2].xyzw, r0.xyzw
mad r0.xyzw, v0.wwww, cb0[3].xyzw, r0.xyzw
mov o0.xyzw, r0.xyzw
mov o4.xyzw, r0.xyzw
add o1.xyzw, v1.xyzw, v1.xyzw
mov o2.xy, v2.xyxx
mul r0.xyz, v4.yyyy, cb0[5].yzxy
mad r0.xyz, v4.xxxx, cb0[4].yzxy, r0.xyzx
mad r0.xyz, v4.zzzz, cb0[6].yzxy, r0.xyzx
dp3 r0.w, r0.xyzx, r0.xyzx
rsq r0.w, r0.w
mul r1.xyz, r0.wwww, r0.zxyz
dp3 o3.x, -cb0[8].xyzx, r1.xyzx
mul r1.xyz, v3.yyyy, cb0[5].xyzx
mad r1.xyz, v3.xxxx, cb0[4].xyzx, r1.xyzx
mad r1.xyz, v3.zzzz, cb0[6].xyzx, r1.xyzx
dp3 r0.w, r1.xyzx, r1.xyzx
rsq r0.w, r0.w
mul r2.xyz, r0.wwww, r1.xyzx
dp3 o3.z, -cb0[8].xyzx, r2.xyzx
mul r2.xyz, r0.xyzx, r1.zxyz
mad r0.xyz, r1.yzxy, r0.yzxy, -r2.xyzx
dp3 r0.w, r0.xyzx, r0.xyzx
rsq r0.w, r0.w
mul r0.xyz, r0.wwww, r0.xyzx
dp3 o3.y, -cb0[8].xyzx, r0.xyzx
ret
// Approximately 29 instruction slots used

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/



1aa9878c95cdef16-ps_replace.txt:

//Decals 2 and haloing PS.
// ---- Created with 3Dmigoto v1.2.64 on Sat Sep 09 12:14:45 2017

cbuffer _Globals : register(b0)
{
row_major float4x4 _sp_view_projection_inverse_matrix : packoffset(c0);
row_major float4x4 _sp_view_projection_texture_matrix : packoffset(c4);
float4 _sp_screen_coord_offset : packoffset(c8);
}

SamplerState _samp0_samplerstate_s : register(s0);
SamplerState _samp1_samplerstate_s : register(s1);
SamplerState _samp4_samplerstate_s : register(s4);
Texture2D<float4> _samp0 : register(t0);
Texture2D<float4> _samp1 : register(t1);
Texture2D<float4> _samp4 : register(t4);


// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);


void main(
float4 v0 : SV_Position0,
float4 v1 : TEXCOORD7,
float4 v2 : TEXCOORD0,
float4 v3 : TEXCOORD2,
float4 v4 : TEXCOORD3,
uint v5 : SV_IsFrontFace0,
out float4 o0 : SV_Target0)
{
float4 r0,r1,r2;
uint4 bitmask, uiDest;
float4 fDest;

r0.xy = v4.xy / v4.ww;
r1.xyzw = _sp_view_projection_inverse_matrix._m10_m11_m12_m13 * r0.yyyy;
r1.xyzw = r0.xxxx * _sp_view_projection_inverse_matrix._m00_m01_m02_m03 + r1.xyzw;
r0.xy = r0.xy * _sp_screen_coord_offset.xy + _sp_screen_coord_offset.zw;
r0.x = _samp4.Sample(_samp4_samplerstate_s, r0.xy).x;
r0.xyzw = r0.xxxx * _sp_view_projection_inverse_matrix._m20_m21_m22_m23 + r1.xyzw;
r0.xyzw = _sp_view_projection_inverse_matrix._m30_m31_m32_m33 + r0.xyzw;
r0.xyzw = r0.xyzw / r0.wwww;
r1.xy = cmp(r0.xz < float2(-1,-1));
r1.x = (int)r1.y | (int)r1.x;
r1.yz = cmp(float2(1,1) < r0.xz);
r1.x = (int)r1.y | (int)r1.x;
r1.x = (int)r1.z | (int)r1.x;
if (r1.x != 0) discard;
r1.xy = _sp_view_projection_texture_matrix._m10_m12 * r0.yy;
r1.xy = r0.xx * _sp_view_projection_texture_matrix._m00_m02 + r1.xy;
r0.xz = r0.zz * _sp_view_projection_texture_matrix._m20_m22 + r1.xy;
r0.xz = r0.ww * _sp_view_projection_texture_matrix._m30_m32 + r0.xz;
r0.y = abs(r0.y) * 2 + -1;
r0.y = saturate(1 + -r0.y);
r1.w = v1.w * r0.y;
r2.xyzw = _samp0.Sample(_samp0_samplerstate_s, r0.xz).xyzw;
r0.xyz = _samp1.Sample(_samp1_samplerstate_s, r0.xz).xyz;
r0.xyz = r0.xyz * float3(2,2,2) + float3(-1,-1,-1);
r1.xyz = v1.xyz;
r1.xyzw = saturate(r2.xyzw * r1.xyzw);
r0.w = 1 + -r1.w;
r2.x = v5.x ? 1 : -1;
r0.xyz = r2.xxx * r0.xyz;
r2.x = dot(r0.xyz, r0.xyz);
r2.x = rsqrt(r2.x);
r0.xyz = r2.xxx * r0.xyz;
r2.x = dot(v3.xyz, v3.xyz);
r2.x = rsqrt(r2.x);
r2.xyz = v3.xyz * r2.xxx;
r0.x = dot(r2.xyz, r0.xyz);
r0.x = max(0, r0.x);
r0.x = r0.x * 0.5 + 0.5;
r0.x = r0.x * r0.x;
r0.xyz = r1.xyz * r0.xxx + float3(0.100000001,0.100000001,0.100000001);
o0.w = r1.w;
r1.x = dot(r0.xyz, float3(0.298999995,0.587000012,0.114));
o0.xyz = r0.xyz;
r0.x = cmp(r1.x < r0.w);
if (r0.x != 0) discard;
return;
}

/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
//
// using 3Dmigoto v1.2.64 on Sat Sep 09 12:14:45 2017
//
//
// Buffer Definitions:
//
// cbuffer $Globals
// {
//
// row_major float4x4 _sp_view_projection_inverse_matrix;// Offset: 0 Size: 64
// row_major float4x4 _sp_view_projection_texture_matrix;// Offset: 64 Size: 64
// float4 _sp_screen_coord_offset; // Offset: 128 Size: 16
//
// }
//
//
// Resource Bindings:
//
// Name Type Format Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// _samp0_samplerstate sampler NA NA 0 1
// _samp1_samplerstate sampler NA NA 1 1
// _samp4_samplerstate sampler NA NA 4 1
// _samp0 texture float4 2d 0 1
// _samp1 texture float4 2d 1 1
// _samp4 texture float4 2d 4 1
// $Globals cbuffer NA NA 0 1
//
//
//
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position 0 xyzw 0 POS float
// TEXCOORD 7 xyzw 1 NONE float xyzw
// TEXCOORD 0 xy 2 NONE float
// TEXCOORD 2 xyz 3 NONE float xyz
// TEXCOORD 3 xyzw 4 NONE float xy w
// SV_IsFrontFace 0 x 5 FFACE uint x
//
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Target 0 xyzw 0 TARGET float xyzw
//
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[9], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s1, mode_default
dcl_sampler s4, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_resource_texture2d (float,float,float,float) t4
dcl_input_ps linear v1.xyzw
dcl_input_ps linear v3.xyz
dcl_input_ps linear v4.xyw
dcl_input_ps_sgv v5.x, is_front_face
dcl_output o0.xyzw
dcl_temps 3
div r0.xy, v4.xyxx, v4.wwww
mul r1.xyzw, r0.yyyy, cb0[1].xyzw
mad r1.xyzw, r0.xxxx, cb0[0].xyzw, r1.xyzw
mad r0.xy, r0.xyxx, cb0[8].xyxx, cb0[8].zwzz
sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t4.xyzw, s4
mad r0.xyzw, r0.xxxx, cb0[2].xyzw, r1.xyzw
add r0.xyzw, r0.xyzw, cb0[3].xyzw
div r0.xyzw, r0.xyzw, r0.wwww
lt r1.xy, r0.xzxx, l(-1.000000, -1.000000, 0.000000, 0.000000)
or r1.x, r1.y, r1.x
lt r1.yz, l(0.000000, 1.000000, 1.000000, 0.000000), r0.xxzx
or r1.x, r1.y, r1.x
or r1.x, r1.z, r1.x
discard_nz r1.x
mul r1.xy, r0.yyyy, cb0[5].xzxx
mad r1.xy, r0.xxxx, cb0[4].xzxx, r1.xyxx
mad r0.xz, r0.zzzz, cb0[6].xxzx, r1.xxyx
mad r0.xz, r0.wwww, cb0[7].xxzx, r0.xxzx
mad r0.y, |r0.y|, l(2.000000), l(-1.000000)
add_sat r0.y, -r0.y, l(1.000000)
mul r1.w, r0.y, v1.w
sample_indexable(texture2d)(float,float,float,float) r2.xyzw, r0.xzxx, t0.xyzw, s0
sample_indexable(texture2d)(float,float,float,float) r0.xyz, r0.xzxx, t1.xyzw, s1
mad r0.xyz, r0.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)
mov r1.xyz, v1.xyzx
mul_sat r1.xyzw, r1.xyzw, r2.xyzw
add r0.w, -r1.w, l(1.000000)
movc r2.x, v5.x, l(1.000000), l(-1.000000)
mul r0.xyz, r0.xyzx, r2.xxxx
dp3 r2.x, r0.xyzx, r0.xyzx
rsq r2.x, r2.x
mul r0.xyz, r0.xyzx, r2.xxxx
dp3 r2.x, v3.xyzx, v3.xyzx
rsq r2.x, r2.x
mul r2.xyz, r2.xxxx, v3.xyzx
dp3 r0.x, r2.xyzx, r0.xyzx
max r0.x, r0.x, l(0.000000)
mad r0.x, r0.x, l(0.500000), l(0.500000)
mul r0.x, r0.x, r0.x
mad r0.xyz, r1.xyzx, r0.xxxx, l(0.100000, 0.100000, 0.100000, 0.000000)
mov o0.w, r1.w
dp3 r1.x, r0.xyzx, l(0.299000, 0.587000, 0.114000, 0.000000)
mov o0.xyz, r0.xyzx
lt r0.x, r1.x, r0.w
discard_nz r0.x
ret
// Approximately 46 instruction slots used

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/


Example of the second pair at 0 convergence: http://u.cubeupload.com/masterotaku/OPBB003.jpg


I've looked at other fixes (Lords of the Fallen, Skyrim SE and Tekken 7), and this is too different to just copy-paste.

Email for PayPal donations: masterotakusuko@gmail.com
CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: Gainward Phoenix 1080 GLH
Monitor: Asus PG278QR
Speakers: Logitech Z506

Posted 09/09/2017 03:46 PM   
radicdotkey said: Does 3DMigoto in its current form allow writing to the IniParams from the pixel shader?
No - it doesn't have the right bind flags to allow that, and we keep a master copy of that on the CPU and update the copy on the GPU as needed without ever reading it back.

If not, can I create a RWTexture1D buffer and read and write to it in the same pixel shader?
Yes, or if it is more appropriate you could also use a RWBuffer or RWStructuredBuffer. RWAppendStructuredBuffer is not fully supported yet (I have a topic branch with a work in progress for that, but we need to change how 3DMigoto handles views to make that useful).

Here is a concept that doesn't work. DSS mentioned a while ago that RWTextures are untested from within the pixel shaders, but it's been some time and maybe something has changed, which I'm not aware of.
Yeah, I tested using a UAV from a pixel shader the other day and it worked fine ;-)

[ResourceTrackID]
3DMigoto doesn't assume any defaults so you need to either specify enough about the resource to synthesize it from scratch, or copy the missing values (either as a full copy or with copy_desc) from another resource. e.g.

[ResourceTrackID]
type = RWTexture1D
width = 1
format = R32G32B32A32_FLOAT
mips = 1
array = 1

Examples of other UAV types you might use (really, these are exactly the same as non-UAV types - 3DMigoto calculates the bind flags to allow these to be used as UAVs automatically based on how they are used, not what type is set to, and the only reason they even have "RW" here is to be consistent with HLSL - only append/consume & counter structured buffers will need additional special handling in 3DMigoto, and when they come it will be as special keywords to use when binding them):

[ResourceRWTexture2D]
type = RWTexture2D
; Has msaa, which should usually be set to 1
width = 1
height = 1
format = R32G32B32A32_FLOAT
mips = 1
array = 1
msaa = 1
msaa_quality = 0

[ResourceRWTexture3D]
type = RWTexture3D
; Has depth, but no array or msaa
width = 128
height = 128
depth = 64
format = R32G32B32A32_FLOAT
mips = 1

[ResourceRWBuffer]
type = RWBuffer
; Buffers have a format and a length:
format = R32G32B32A32_FLOAT
array = 20

[ResourceStructuredBuffer]
type = RWStructuredBuffer
; Structured buffers do not have a format, but you need to indicate the size of
; the structure in bytes, and how many there are:
stride = 32
array = 1
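
For reference, here is a sketch of HLSL declarations that could pair with the example resource sections above. The variable names, the struct layout and the u-register numbers are assumptions for illustration only. The slots have to match whatever is bound in the d3dx.ini (e.g. ps-u1), and in a pixel shader the UAV slots share their numbering with the bound render targets:

```hlsl
// Hypothetical declarations matching the [Resource...] examples.
// Slot numbers are placeholders, not values taken from any real fix.
RWTexture2D<float4> ExampleTex2D : register(u1);
RWTexture3D<float4> ExampleTex3D : register(u2);
RWBuffer<float4>    ExampleBuffer : register(u3);

// stride = 32 in the ini means each element is a 32 byte structure,
// for example two float4 values:
struct example_entry
{
    float4 a; // bytes 0-15
    float4 b; // bytes 16-31
};
RWStructuredBuffer<example_entry> ExampleStructured : register(u4);
```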



[ShaderOverride-KeyValue]
hash = bdc94f312b8472a7
ps-u1 = ref ResourceTrackID
ps-t101 = ResourceTrackID
You're not supposed to do that - you should unbind it from the UAV slot before binding it to a SRV and vice-versa, and should not try using it from both slot types simultaneously (I say "not supposed" to because I've read that there might be some cases where it can be used as a hack to get around certain limitations, but you really need to know what you are doing if you go down that path). You could copy it to the SRV slot to make this work, but that could become expensive if done many times in a frame.
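
A minimal sketch of the alternative, reading and writing through the UAV slot only so that no second SRV binding is needed. It assumes the resource is declared as type = RWStructuredBuffer with stride = 4 and array = 1, and bound with just ps-u1; the names and helper functions are hypothetical. A structured buffer is assumed because under shader model 5.0 a typed UAV such as RWTexture1D<float4> can be written but not read back (typed UAV loads are limited to single-component 32-bit formats):

```hlsl
// Hypothetical: one float that remembers the last non-zero key value.
RWStructuredBuffer<float> TrackID : register(u1);

void remember_track_id(float keyValue)
{
    // Only overwrite the stored value when the game supplies a non-zero one,
    // so shaders that run later still see the last valid value.
    if (keyValue != 0)
        TrackID[0] = keyValue;
}

float last_track_id()
{
    // Read back through the same UAV rather than a separate t-register.
    return TrackID[0];
}
```

Every pixel that runs the shader performs the same write here, so the performance caveat discussed in this post still applies and is worth measuring.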

if (MyIni.Load(0) != 0) RWMyIni[0] = keyValue;
You'll want to measure the performance of that - in Batman Arkham Knight I found that reading from a single location in a shared memory structured buffer was a performance killer, increasing the cost of the tile lighting compute shader from 5fps to 15fps on my laptop and rendering the game unplayable. It wasn't exactly the same situation as this - that was a compute shader accessing a shared memory structured buffer that was not backed by a UAV (g register, not u register), but the key takeaway was that it was just the *READs* from too many thread [groups?] at once that killed performance - no sync instructions, no writes, just reads. I entirely regained that performance loss by firing off my own compute shader beforehand with 8 threads in a single thread group to do the same work (that was where 3DMigoto's CustomShader feature originated from).

Of course, if the trade off is that this saves you from having to fire off a lot of additional custom shaders (with associated pipeline state changes) then it might still be better. You'll only know once you try it and see.

A bit more explanation:
I use the keyValue as an index into an array of configuration parameters for each in-game track. The problem is that this value is 0 for some camera angles, as I use the vignetteAmount parameter from the game config files, which I set to a unique integer for each track. I need to make sure that 3DM remembers the last non-zero value, which I feed to other shaders.

You might be interested in the HUD analysis compute shader I'm using for Dreamfall Chapters that is tracking various state across the frame using a UAV, that will be used to decide what depth I render the HUD at in the following frame:

https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_analyse.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_analyse_clear.hlsl
https://raw.githubusercontent.com/DarkStarSword/3d-fixes-DreamfallChapters/master/d3dx.ini (Search for Resource_HUD_Info_UAV)

There's some more shaders related to the HUD analysis, but they are doing some 3D Vision specific magic (rendering depth values to a Texture2D so that I can use stereo2mono to get the values from both eyes, since the reverse stereo blit doesn't work with buffers, and I wanted something that performed better than copying the entire depth buffer between eyes, which is known to be a bottleneck on SLI).

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 09/09/2017 07:20 PM   
masterotaku said: By default, the ground decal is correct only with 0 convergence
Convergence=0 puts everything at infinity, so generally not interesting.

but there's a character silhouette to the right of the character in both eyes that depends on separation.
Only to the right, not ever to the left? Solve that first - if you can't get it broken on both sides you won't ever get it fixed. Might need StereoFlagsDX10=0x00004008 (the 4 is important, the 8 is for stereo2mono to work in SLI), or might need a render target forced to stereo (frame analysis is your friend) or something else.

Doing a "r0.x+=stereo.x*stereo.z;" just after r0.xy is defined in the PS puts that silhouette into the correct place at 0 convergence, but the ground decal is moved and incorrect.
stereo.z is -1 or +1 depending on the eye - is that what you meant? If you want to check if the shader is even running in both eyes you can do a quick 'if (stereo.z == 1) do something noticeable' to see if that noticeable thing changed in exactly one eye, and 'if (stereo.z == -1) do something noticeable' to see if it happened in the other eye. If only one of them works, the shader is only being run in one eye.
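
A rough sketch of that per-eye test, assuming the usual 3Dmigoto StereoParams declaration on t125 as in the decompiled shaders above. The helper name and the choice of tint colours are made up for illustration:

```hlsl
Texture2D<float4> StereoParams : register(t125);

// Hypothetical debug helper: call it on the shader's output colour.
float4 debug_eye_tint(float4 colour)
{
    float4 stereo = StereoParams.Load(0);

    // stereo.z is +1 in one eye and -1 in the other.
    if (stereo.z == 1)
        colour.r = 1; // force red in one eye
    if (stereo.z == -1)
        colour.b = 1; // force blue in the other eye

    return colour;
}
```

If only one of the two tints ever shows up, the shader is only being run in one eye.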

Correcting o3 with "o3.x+=stereo.x*(o3.w-stereo.y);" fixes some haloing problems unrelated to the decal, but now the decal is at screen depth...
You will almost certainly need that, and screen depth can be a good place to start since these can sometimes be similar to shadows.
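
For reference, the correction quoted above is the usual stereo formula: shift x by separation * (depth - convergence), with the separation in StereoParams x and the convergence in y. A sketch as a helper function (the function name is hypothetical):

```hlsl
// Apply the standard stereo correction to a clip space position.
// stereo.x = separation for the current eye, stereo.y = convergence.
float4 stereo_correct(float4 clip_pos, float4 stereo)
{
    clip_pos.x += stereo.x * (clip_pos.w - stereo.y);
    return clip_pos;
}
```

It only makes sense on a value that is a copy of the clip space position (w holds the depth); applying it to an unrelated output will just move things by an arbitrary amount.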

Example of the second pair at 0 convergence: http://u.cubeupload.com/masterotaku/OPBB003.jpg


tbh I'd be more interested in a screenshot showing a typical convergence to see how it is broken.


Posted 09/09/2017 07:50 PM   
OK, here's a screenshot with a normal convergence level: http://u.cubeupload.com/masterotaku/OPBB004.jpg


And here's another about the first VS/PS pair with the corrected "o3.x": http://u.cubeupload.com/masterotaku/OPBB005.jpg


I have to check the other stuff you said in a while. And by the way, I made a typo. It was actually "r0.x+=stereo.x*-stereo.z;". That moved the gray silhouette to the left.


Posted 09/09/2017 08:35 PM   
Thanks DSS for the explanation and the links. I found very valuable info in your code, especially now that I know how to deal with the missing shadows inside the car cabin caused by the windscreen's depth buffer overriding the inside elements. Now I know I have to render an additional depth buffer with the pixel shader skipped, as you did in the given example, and then do another shadow pass. Sweet!

There is one thing that bothers me though about the immediate constant buffers. Should I be worried about overhead caused by the array indexing of non-padded immediate constant buffer elements when I define an array like this, for example?

static const float param[54][11] = {
//skyOffset,skyScale,skyGamma,waterClarity,treesNewShader,flareThreshold,flareMultiplier,flareBrightness,grassMin,grassMax,grassMultiplier
{0.0, 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0},//unused
{0.1, 1.0 ,0.8 ,0.5 ,1.0 ,0.8 ,4.0 ,2.0 ,0.2 ,3.0 ,1.5},//finland.finland.rally.day
{0.1, 1.0 ,0.8 ,0.5 ,1.0 ,0.8 ,4.0 ,2.0 ,0.2 ,3.0 ,1.5},//finland.finland.rally.day.alt
{0.1, 1.0 ,0.8 ,0.5 ,1.0 ,0.8 ,4.0 ,2.0 ,0.2 ,3.0 ,1.5},//finland.finland.rally.night
...
{0.1, 1.0 ,0.8 ,0.5 ,1.0 ,0.8 ,4.0 ,2.0 ,0.2 ,3.0 ,1.5},//usa.smelter.wet
};

#define pSkyOffset [0]
#define pSkyScale [1]
#define pSkyGamma [2]
#define pWaterClarity [3]
#define pTreesNewShader [4]
#define pFlareThreshold [5]
#define pFlareMultiplier [6]
#define pFlareBrightness [7]
#define pGrassMin [8]
#define pGrassMax [9]
#define pGrassMultiplier [10]


and calling it like this?

float flareMultiplier = param[trackID]pFlareMultiplier;

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 09/10/2017 09:23 AM   
When I use the code below to fix the water wave effects:

#include "matrix.hlsl"

// Common SVPositionToTranslatedWorld fix:
float4 s = StereoParams.Load(0);
matrix TranslatedWorldToClip = MATRIX(cb1, 0);
matrix ClipToTranslatedWorld = MATRIX(cb1, 32);
float4 t = mul(r2.xyz, TranslatedWorldToClip);
t.x -= s.x * (t.w - s.y);
r2.xyz = mul(t, ClipToTranslatedWorld);


The wave effects are fixed correctly, but the water's color has changed (disappeared). Please look at the attached image.

Perhaps the alpha channel's information is missing. The water color used to vary with the depth of the water, but now it has become transparent.

So I used this assembly code in ps.txt instead:

//Fix
//Translate r2.xyz (world) to clip cb1[0,1,2,3]
mul r26.xyzw, r2.yyyy, cb1[1].xyzw
mad r26.xyzw, r2.xxxx, cb1[0].xyzw, r26.xyzw
mad r26.xyzw, r2.zzzz, cb1[2].xyzw, r26.xyzw
add r26.xyzw, r26.xyzw, cb1[3].xyzw
//Fix Clip
add r25.w, r26.w, -r25.y
mul r25.w, r25.x, r25.w
add r26.x, r26.x, -r25.w
//Translate r26.xyz (Clip) to world cb1[32,33,34,35]
mul r2.xyzw, r26.yyyy, cb1[33].xyzw
mad r2.xyzw, r26.xxxx, cb1[32].xyzw, r2.xyzw
mad r2.xyzw, r26.zzzz, cb1[34].xyzw, r2.xyzw
mad r2.xyzw, r26.wwww, cb1[35].xyzw, r2.xyzw

Then the color issue was gone.

Why did this issue happen?
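
One visible difference between the two versions in this post is that the assembly adds cb1[3] unscaled (i.e. it treats the position as having w = 1) and writes all four components of r2, while the HLSL multiplies only r2.xyz and assigns back to r2.xyz. Below is a sketch of HLSL that mirrors the assembly line by line. The function and parameter names are made up, the two matrices are assumed to be built from cb1[0..3] and cb1[32..35] as in the post, and whether this difference is actually the cause of the color change is not established here:

```hlsl
// Hypothetical HLSL equivalent of the assembly fix.
// world_to_clip = rows cb1[0..3], clip_to_world = rows cb1[32..35]
float4 fix_translated_world(float4 pos, float4 stereo,
                            matrix world_to_clip, matrix clip_to_world)
{
    // x*row0 + y*row1 + z*row2 + 1*row3, matching the mul/mad/mad/add sequence
    float4 t = mul(float4(pos.xyz, 1), world_to_clip);

    // Stereo correction in clip space (r25 in the assembly holds StereoParams)
    t.x -= stereo.x * (t.w - stereo.y);

    // Back to translated world space, returning all four components
    return mul(t, clip_to_world);
}
```

It would be used as "r2.xyzw = fix_translated_world(r2, s, TranslatedWorldToClip, ClipToTranslatedWorld);".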

i7 6700K
GTX1080Ti SLI
Asrock Z170 Extreme6
Windows10 64bit
Samsung UHD3dtv HU9000 (Active shutter full 4k3D@30FPS per each eye)

Posted 09/10/2017 10:04 AM   
radicdotkey said:Thanks DSS for the explanation and the links. I found very precious info in your code, especially now I know how to deal with the missing shadows inside the car cabin caused by the windscreens depth buffer overriding the inside elements. Now I know I have to render an additional depth buffer with pixel shader skipped as you did in the given example and then do another shadow pass. Sweet!
Hehe, just be aware that 'depth_enable=true', 'depth_write_mask=all' and 'depth_stencil_state_merge=true' are features that will be available in the next version of 3DMigoto (though I guess you are building from source so already have them ;-), which I plan to release as soon as I've finished playing through DFC - hopefully in the next 24 hours, assuming I don't come across any more showstoppers (every time I think I've solved everything it throws another curve ball at me). I'll be doing a more in-depth write-up about all the new features then, with examples of how I have used them in DFC.

There is one thing that bothers me though about the immediate buffers. Shall I be worried about an overhead caused by the array indexing of non padded immediate buffer elements when I define an array like this for example?
Unless you do it a lot every frame it will most likely be negligible. For example, I had to do some optimisations in Dreamfall Chapters to regain framerate, and I found that reducing the number of instructions and even stripping out entire matrix multiplies from tonnes of shaders made absolutely no difference (maybe 1fps, but it was in the noise). Switching my lighting pattern from HLSL to assembly, on the other hand, regained 5-15fps depending on the location (that had been lost purely due to inefficiencies in the HLSL decompilation and recompilation process), and limiting the number of full buffer copies I performed regained another 5fps, taking me back to within 1fps of the vanilla 3D framerate (volumetric ray marched light shafts still hurt the framerate a little more in fixed 3D than broken 3D due to the extra texture clears and running the relevant shaders in both eyes instead of one).

In other words, if you are having a performance issue it pays to identify where it is coming from and go after the big fish rather than trying to optimise every little thing, especially when that costs development time or makes the code harder to understand. This is true across software engineering - as the saying goes, "Premature optimisation is the root of all evil".

If you want to see what assembly your shader will generate, use fxc from the Windows SDK like:
fxc.exe /T ps_5_0 test.hlsl


In this case it is producing these instructions to calculate the offset into the icb:
imul null, r0.x, v0.x, l(11)
mov o0.xyzw, icb[r0.x + 10].xxxx


Compared to accessing it from a structured buffer:

ld_structured_indexable(structured_buffer, stride=44)(mixed,mixed,mixed,mixed) r0.x, v0.x, l(40), t100.xxxx

One extra instruction is unlikely to hurt performance.
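The constants in the two disassembly snippets above follow directly from the table layout. Here is a minimal sketch of that arithmetic, written in C rather than HLSL so it can be compiled and checked on its own. It assumes 11 floats per track, 32-bit floats, and that the test shader read the last field (index 10); the function and constant names are made up for illustration and are not part of 3DMigoto or fxc.

```c
#include <assert.h>

/* Assumed layout: 11 floats per track entry, 4 bytes per float. */
enum { FIELDS_PER_ENTRY = 11, LAST_FIELD = 10, BYTES_PER_FLOAT = 4 };

/* Immediate constant buffer path: one multiply (imul ... l(11)),
 * then the constant field index is folded into the address (+ 10). */
static unsigned icb_index(unsigned track_id)
{
    return track_id * FIELDS_PER_ENTRY + LAST_FIELD;
}

/* Structured buffer path: the stride and byte offset are baked into
 * the ld_structured instruction, so no separate multiply is emitted. */
static unsigned sb_stride(void)
{
    return FIELDS_PER_ENTRY * BYTES_PER_FLOAT;
}

static unsigned sb_byte_offset(void)
{
    return LAST_FIELD * BYTES_PER_FLOAT;
}

/* Compare the sketch against the numbers seen in the disassembly. */
static void check_against_disassembly(void)
{
    assert(sb_stride() == 44);      /* stride=44 */
    assert(sb_byte_offset() == 40); /* l(40)     */
    assert(icb_index(0) == 10);     /* icb[r0.x + 10] with r0.x = 0 */
}
```

Calling `check_against_disassembly()` from a test or `main` confirms that the stride of 44 and the offset of 40 are just 11 and 10 floats expressed in bytes, so the only difference between the two paths is the one explicit multiply.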



static const float param[54][11] = {
...
};

#define pSkyOffset [0]
...
#define pGrassMultiplier [10]

and calling it like this?

float flareMultiplier = param[trackID]pFlareMultiplier
Performance considerations aside, I'd suggest doing this instead - the "static const" will still put this in the immediate constant buffer, but the code is a little nicer than using macros, and there is no need to specify the array size:

struct param_entry {
float pSkyOffset;
float pSkyScale;
float pSkyGamma;
float pWaterClarity;
float pTreesNewShader;
float pFlareThreshold;
float pFlareMultiplier;
float pFlareBrightness;
float pGrassMin;
float pGrassMax;
float pGrassMultiplier;
};

static const struct param_entry param[] = {
//skyOffset,skyScale,skyGamma,waterClarity,treesNewShader,flareThreshold,flareMultiplier,flareBrightness,grassMin,grassMax,grassMultiplier
{0.0, 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0},//unused
...
{0.1, 1.0 ,0.8 ,0.5 ,1.0 ,0.8 ,4.0 ,2.0 ,0.2 ,3.0 ,1.5},//usa.smelter.wet
};

...

float flareMultiplier = param[trackID].pFlareMultiplier;

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 09/10/2017 12:13 PM   
@dlrjajdlfo the code looks equivalent to me. At a guess you might have hit a decompiler bug unrelated to that.


Posted 09/10/2017 12:23 PM   
@masterotaku

Have you tried StereoFlagsDX10=0x00004000 in the driver profile?

It looks like there is a mono render target somewhere in there you will need to fix. Start with the halo fix - I can pretty much guarantee you will need that*, then look through a frame analysis dump to find out where the offset silhouette first appears and add a textureoverride to force the relevant render target to stereo.

* you can confirm this by removing the discard instructions from the pixel shader and setting it to output a solid colour - that should be in 3D surrounding where the decal was supposed to be. This one is unusual enough that you should probably confirm this.

Then look at the transformations in the pixel shader and see if you can find a way to move it to the correct depth from there.

masterotaku said:OK, here's a screenshot with a normal convergence level: http://u.cubeupload.com/masterotaku/OPBB004.jpg
That is probably the weirdest decal I've seen in an unfixed state.

And here's another about the first VS/PS pair with the corrected "o3.x": http://u.cubeupload.com/masterotaku/OPBB005.jpg
I'm pretty sure I can see what is going on, but can you repost this one with a higher convergence?


Posted 09/10/2017 12:38 PM   