3Dmigoto now open-source...
radicdotkey said: Hey DSS. I've got my include file with configuration working, but because I merged all my 300 shaders, 3DM now takes around 10 seconds to reload, as I skip the timestamp in my build to make it work. Do you have any brilliant idea up your sleeve for how to modify the code so that 3DM only reloads the shaders which include the modified file?
Yeah - we need to implement a custom include handler so we can track what files are included by each shader then modify the reload so it also checks the timestamps on all included files.
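To make the idea concrete, here is a rough Python sketch of the bookkeeping involved. This is not 3DMigoto's code: the function names (`find_includes`, `newest_mtime`, `needs_reload`) are made up for illustration, and it assumes includes are plain `#include "file"` lines resolved relative to the including file. It collects every file a shader pulls in, then treats the shader as stale if the shader or any of its includes is newer than the last load:

```python
import os
import re

# Hypothetical helper names; matches lines such as: #include "config.hlsl"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+["<]([^">]+)[">]', re.MULTILINE)


def find_includes(path, seen=None):
    """Return the set of files transitively included by the shader at path."""
    if seen is None:
        seen = set()
    try:
        with open(path, 'r', errors='replace') as f:
            text = f.read()
    except OSError:
        # A missing include simply contributes nothing
        return seen
    base = os.path.dirname(path)
    for name in INCLUDE_RE.findall(text):
        inc = os.path.normpath(os.path.join(base, name))
        if inc not in seen:
            seen.add(inc)
            find_includes(inc, seen)  # follow nested includes
    return seen


def newest_mtime(path):
    """Newest modification time among the shader and everything it includes."""
    paths = [path] + sorted(find_includes(path))
    return max(os.path.getmtime(p) for p in paths if os.path.exists(p))


def needs_reload(path, last_loaded_mtime):
    """True if the shader or any of its includes changed since the last load."""
    return newest_mtime(path) > last_loaded_mtime
```

With something like this, touching the shared configuration file would only flag the shaders that actually include it, instead of forcing all 300 to be recompiled.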

Update: 3DM allows an external loading of cbuffers as you mentioned before, but does it mean it compiles as an immediate buffer on each shader that links to it? If so it would be awesome to be able to load it in the text format.
I'm not entirely sure I follow what you mean by that, but I think the answer is no? The immediate constant buffer is something created by the shader compiler - it's not something we can link in later. If you assign a buffer as a constant buffer with 3DMigoto it is bound to a b register in the shader.

If you're looking for a way to construct a file suitable to load with 3DMigoto into a constant buffer, just use something like this python script:

#!/usr/bin/env python

import struct

filename = 'test.buf'

# One row of 11 floats per entry
data = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], # unused
    [0.1, 1.0, 0.8, 0.5, 1.0, 0.8, 4.0, 2.0, 0.2, 3.0, 1.5], # finland.finland.rally.day
    [0.1, 1.0, 0.8, 0.5, 1.0, 0.8, 4.0, 2.0, 0.2, 3.0, 1.5], # finland.finland.rally.day.alt
    [0.1, 1.0, 0.8, 0.5, 1.0, 0.8, 4.0, 2.0, 0.2, 3.0, 1.5], # finland.finland.rally.night
    [0.1, 1.0, 0.8, 0.5, 1.0, 0.8, 4.0, 2.0, 0.2, 3.0, 1.5], # usa.smelter.wet
]

if __name__ == '__main__':
    # open() works in both Python 2 and 3; file() only exists in Python 2
    with open(filename, 'wb') as f:
        for entry in data:
            # '<11f' = 11 little-endian 32-bit floats (44 bytes per entry)
            f.write(struct.pack('<11f', *entry))


dss@Gideon ~
$ chmod +x test.py

dss@Gideon ~
$ ./test.py

dss@Gideon ~
$ hexdump.exe -C test.buf
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000020 00 00 00 00 00 00 00 00 00 00 00 00 cd cc cc 3d |...............=|
00000030 00 00 80 3f cd cc 4c 3f 00 00 00 3f 00 00 80 3f |...?..L?...?...?|
00000040 cd cc 4c 3f 00 00 80 40 00 00 00 40 cd cc 4c 3e |..L?...@...@..L>|
00000050 00 00 40 40 00 00 c0 3f cd cc cc 3d 00 00 80 3f |..@@...?...=...?|
00000060 cd cc 4c 3f 00 00 00 3f 00 00 80 3f cd cc 4c 3f |..L?...?...?..L?|
00000070 00 00 80 40 00 00 00 40 cd cc 4c 3e 00 00 40 40 |...@...@..L>..@@|
00000080 00 00 c0 3f cd cc cc 3d 00 00 80 3f cd cc 4c 3f |...?...=...?..L?|
00000090 00 00 00 3f 00 00 80 3f cd cc 4c 3f 00 00 80 40 |...?...?..L?...@|
000000a0 00 00 00 40 cd cc 4c 3e 00 00 40 40 00 00 c0 3f |...@..L>..@@...?|
000000b0 cd cc cc 3d 00 00 80 3f cd cc 4c 3f 00 00 00 3f |...=...?..L?...?|
000000c0 00 00 80 3f cd cc 4c 3f 00 00 80 40 00 00 00 40 |...?..L?...@...@|
000000d0 cd cc 4c 3e 00 00 40 40 00 00 c0 3f |..L>..@@...?|
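If you want to sanity check a .buf file produced by some other tool against the script above, something along these lines should do. It is only a sketch: `pack_entries` and `unpack_entries` are names I made up, and the layout (11 little-endian floats per entry) is simply taken from the script, so adjust it to match your own buffer.

```python
import struct

FLOATS_PER_ENTRY = 11                        # taken from the '<11f' format above
ENTRY_FORMAT = '<%df' % FLOATS_PER_ENTRY     # little-endian 32-bit floats
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)   # 44 bytes per entry


def pack_entries(entries):
    """Pack a list of 11-float rows into the raw bytes written to the .buf file."""
    return b''.join(struct.pack(ENTRY_FORMAT, *entry) for entry in entries)


def unpack_entries(blob):
    """Split raw .buf contents back into tuples of 11 floats."""
    if len(blob) % ENTRY_SIZE:
        raise ValueError('size %d is not a multiple of %d' % (len(blob), ENTRY_SIZE))
    return [struct.unpack_from(ENTRY_FORMAT, blob, offset)
            for offset in range(0, len(blob), ENTRY_SIZE)]


def files_match(path_a, path_b):
    """Compare two .buf files byte for byte."""
    with open(path_a, 'rb') as a, open(path_b, 'rb') as b:
        return a.read() == b.read()
```

Note that the values read back are 32-bit floats, so 0.1 comes back as roughly 0.10000000149 rather than exactly 0.1. That is the `cd cc cc 3d` you can see at offset 0x2c in the hexdump.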


DHR said: But the texture used in the cursor is a bit busted... it's the arrow with a square on the right.
It's a bit stretched because of the TaB mode.

Image

Any ideas what may be wrong?
(i only enabled the run = CustomShaderSoftwareMouse and hide = 1)

I haven't seen anything that looked like that and I'm not sure what would cause it - is that happening in every game? I might need some more details about your environment - what game that is, are you using the 32 or 64bit version of 3DMigoto, resolution, windows version, driver version, mouse cursor scheme ("Windows Default", "Windows Standard" or something else). Is that just supposed to be the regular Windows arrow cursor? Anything else unusual? 4K, DSR? I dunno, I'm just spitballing here. You're using Top + Bottom, so is that with a projector or TV? Is the projector the only active display?

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 09/12/2017 01:22 AM   
I tested the software mouse in 3 games and it only happens in 1 of them. It happens in the 32-bit version of 3DMigoto (Unity game), 1080p resolution, TaB mode with a 3DTV (the only active display), Win7, latest driver, and I use the Windows default cursor.
I'm not using 4K, DSR or upscaling.

The issue is only visual; the software mouse works just fine.

Hope this info helps.

MY WEB

Helix Mod - Making 3D Better

My 3D Screenshot Gallery

Like my fixes? you can donate to Paypal: dhr.donation@gmail.com

Posted 09/12/2017 01:47 AM   
What's the game, and what version of Unity is it (right click on the exe -> properties -> details -> file version)?


Posted 09/12/2017 03:58 AM   
DarkStarSword said:

I'm not entirely sure I follow what you mean by that, but I think the answer is no? The immediate constant buffer is something created by the shader compiler - it's not something we can link in later. If you assign a buffer as a constant buffer with 3DMigoto it is bound to a b register in the shader.

If you're looking for a way to construct a file suitable to load with 3DMigoto into a constant buffer, just use something like this python script:


Don't worry Ian. This morning I realised that I don't follow myself either :)
I wrote a simple editor to edit my buf file with configuration parameters and used your pycode to make sure it produces the same result.

Image

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 09/12/2017 08:11 AM   
DarkStarSword said:What's the game, and what version of Unity is it (right click on the exe -> properties -> details -> file version)?

Sent you a PM.


Posted 09/12/2017 10:43 AM   
Hey, DarkStarSword, I've looked at what you did to the light shafts in Dreamfall. It was quite a task from what I read in the video description. I've had a similar problem in Dirt 3. The shafts are disabled when you run the game in DX11 mode but work fine when you set forcedx9="true" in Documents\My Games\DiRT3\hardware_settings_config.xml
The only sane explanation for this is that they stumbled across similar issues, so they decided to just disable it. Going from DX9 to DX11 in my opinion should be an upgrade, not a downgrade.

I was trying to fix it myself a while ago, and got it almost working, but since some shaft shaders are instanced they get wrong data on the vertex shader input depending on the camera angle.

So... I was wondering if you would fancy a challenge, as you are now equipped with knowhow needed to cope with that kind of problem.

Here is a screenshot I attached to the post on the Codemasters forum, but guess what, no response, not even a single "get lost".

Image

p.s. Sorry for that piss soaked snow. The screenshots are from unmodded game.


Posted 09/12/2017 05:44 PM   
DarkStarSword said: Unless you do it a lot every frame it will most likely be negligible. e.g. I had to do some optimisations in Dreamfall Chapters to regain framerate, and I found that reducing the number of instructions and even stripping out entire matrix multiplies from tonnes of shaders made absolutely no difference (maybe 1fps, but it was in the noise), while switching my lighting pattern from HLSL to assembly regained 5-15fps depending on the location (that had been lost purely due to inefficiencies in the HLSL decompilation and recompilation process), and limiting the number of full buffer copies I performed regained another 5fps, taking me back to within 1fps of the vanilla 3D framerate (volumetric ray marched light shafts still hurt the framerate a little more in fixed 3D than broken 3D due to the extra texture clears and running the relevant shaders in both eyes instead of one).

Just curious if this performance problem was due to the integer to float conversions?

I have worked out a way to avoid these conversions in the Decompiler, but won't be able to get to this for a while.

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 09/15/2017 08:22 PM   
That would be my guess, but I did not get as far as confirming that since switching to assembly was a more attractive option.


Posted 09/15/2017 09:28 PM   
Actually, no that can't be right - I can't see any integer instructions looking at a few of the shaders and they lack headers so the decompiler wouldn't have used integers even if they did.

These were the shaders changed if you wanted to investigate. The assembly patterns in this commit aren't quite equivalent to the HLSL patterns (newer commits improved on it further), but the key was that even if I just used unmodified shaders straight out of ShaderCache I saw a difference in performance:


https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/commit/931ee9e283cf7ee44bb3822b879ed900431b5b47


Posted 09/15/2017 09:39 PM   
Hmmm... is anyone using the (now deprecated) depth_filter? I was just working on some unrelated code and spotted that it appears to inadvertently depend on hunting being enabled - and it looks like it has been broken this way ever since 1.2.6 when we were making some changes to support toggling hunting mode.

While the fix is easy, since I've deprecated it in 1.2.65 and it has been broken for so long I'd rather just remove the feature altogether, especially as no one has complained (but perhaps it was just that no one noticed). Comments?


No, nevermind - it's not broken. We just have two different places we save off the original and I was only looking at one - the other takes care of the non-hunting case.


Posted 09/15/2017 10:09 PM   
DarkStarSword said:Actually, no that can't be right - I can't see any integer instructions looking at a few of the shaders and they lack headers so the decompiler wouldn't have used integers even if they did.

These were the shaders changed if you wanted to investigate. The assembly patterns in this commit aren't quite equivalent to the HLSL patterns (newer commits improved on it further), but the key was that even if I just used unmodified shaders straight out of ShaderCache I saw a difference in performance:

https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/commit/931ee9e283cf7ee44bb3822b879ed900431b5b47
Took a quick look, and there are a couple of shaders that use a two-level nested loop, which might add up to enough performance hit.

e.g. https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/d1d370f028b27b3083eebcaddba762e80f8c9295/ShaderFixes/03f28715f7644b72-ps_replace.txt

That all looks pretty good though, and SM 4.0 is pretty well supported by Decompiler, so I can't really explain that massive slowdown. There are int conversions in this one, but it'd still be hard to see how we'd get that big a hit from a few extra instructions, when we are doing heavier duty things like sampling and dot products.


Hard to say. My best guess here would be some strange bug the Decompiler introduced that caused the loops to run longer than they should. Could still be related to int conversion on the loop index, I've seen at least one case where the loop never exited because FP accuracy was never '=='. Which caused a TDR.
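For anyone wondering how a float loop index can fail to terminate, here is a tiny Python illustration of the general hazard. It is not taken from any decompiled shader, the function names are made up, and Python uses doubles rather than the 32-bit floats a shader would use, but the failure mode is the same: an accumulated value steps right past the target without ever comparing equal to it.

```python
def count_steps_float_equality(step=0.1, target=1.0, max_iterations=1000):
    """Loop until x == target using an accumulated float.

    max_iterations is a safety cap so this demo terminates; a shader has no
    such cap, which is how this kind of loop can end in a TDR.
    """
    x = 0.0
    iterations = 0
    while x != target and iterations < max_iterations:
        x += step          # rounding error accumulates on every addition
        iterations += 1
    return iterations


def count_steps_integer_index(count=10):
    """The same loop driven by an integer index, which always terminates."""
    iterations = 0
    for _ in range(count):
        iterations += 1
    return iterations


if __name__ == '__main__':
    print(count_steps_float_equality())   # hits the safety cap
    print(count_steps_integer_index())    # stops after 10 iterations
```

Ten additions of 0.1 land just below 1.0 and the eleventh lands well above it, so the exact comparison never succeeds and only the cap stops the first loop.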

I have Dreamfall Chapters: FC. If I can get a chance I'll try to bisect the ShaderCache to narrow it down. Besides which I want to see your light rays in action. :->

I'm still thinking those int conversions would be a big win to fix in the Decompiler, but it's hard to justify spending much time on it when ASM will always give exact results.


Posted 09/16/2017 05:48 AM   
Is object culling something that can be controlled by 3DMigoto? I'm aware of a cull switch but I believe it's only for the faces. I'm now working on increasing the shadowmap distance, and I successfully scaled the shadowmap, but there are still objects disappearing in the distance depending on the camera angle.

https://youtu.be/j8UHn9xaTAI

Update: maybe I could modify the view-projection matrix to enlarge the view frustum? Is that even possible?


Posted 09/17/2017 11:46 AM   
If the game has culled an object entirely 3DMigoto won't see it - we're too late in the rendering pipeline to change that, and you would need an engine specific tweak/mod to do something about it.

One exception is that we can potentially do something about occlusion culling, because that works by the game drawing something, then asking the GPU if it drew at least one pixel of the object to decide whether to bother trying to draw it again until the camera has moved, and we can change the result of that query. I don't think that is what is happening from your video, but you never know - do any of the objects flicker on and off as the camera moves (as the game has to draw them every now and then to see if the occlusion result has changed)?

There are two ways that I know of that games implement this. The first is the one CryEngine uses: it renders occlusion data to a 256x128 texture, then reads that back to the CPU, and we can prevent that from happening with the deny_cpu_read feature (which returns a zeroed buffer to the game instead). To find this type of occlusion query, look for MapType:1 in the frame analysis log. MapType:1 only means the CPU is reading *something* from the GPU; it does not by itself confirm that the game is doing occlusion culling, and it would also show up if the game was saving a screenshot to disk, for example. Here is an example from Lichdom Battlemage:

1.002492 PSSetShader(pPixelShader:0x000000007E48AD38, ppClassInstances:0x0000000000000000, NumClassInstances:0) hash=2c3c0257bfdc160f
...
1.002492 VSSetShader(pVertexShader:0x00000000565E9AB8, ppClassInstances:0x0000000000000000, NumClassInstances:0) hash=9a19b17d56db5c4d
...
1.002495 OMSetRenderTargets(NumViews:1, ppRenderTargetViews:0x00000000266AF560, pDepthStencilView:0x0000000000000000)
0: view=0x000000005B154E20 resource=0x000000007804A120 hash=49387529
1.002495 RSSetViewports(NumViewports:1, pViewports:0x00000000266AF540)
...
1.002495 Draw(VertexCount:4, StartVertexLocation:54808)
1.002496 CopySubresourceRegion(pDstResource:0x000000007804DA20, DstSubresource:0, DstX:0, DstY:0, DstZ:0, pSrcResource:0x000000007804A120, SrcSubresource:0, pSrcBox:0x0000000000000000)
Src: resource=0x000000007804A120 hash=49387529
Dst: resource=0x000000007804DA20 hash=3d7dd24e
...
2.002493 Map(pResource:0x000000007804DA20, Subresource:0, MapType:1, MapFlags:0, pMappedResource:0x00000000266AF880) hash=3d7dd24e
2.002493 Unmap(pResource:0x000000007804DA20, Subresource:0) hash=3d7dd24e
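If the log is long, a few lines of Python can pull out the candidates instead of searching by hand. This is only a rough sketch: `find_cpu_reads` is a made-up helper name, and the regular expression assumes every Map line has the same field order as the excerpt above, which I have not checked against other games.

```python
import re

# Assumed line shape, copied from the excerpt above:
#   <time> Map(pResource:0x..., Subresource:N, MapType:N, ...) hash=<hex>
MAP_RE = re.compile(
    r'\bMap\(pResource:(0x[0-9A-Fa-f]+), Subresource:\d+, MapType:(\d+),'
    r'.*?\)\s+hash=([0-9a-f]+)'
)

def find_cpu_reads(lines):
    """Count Map calls with MapType:1 per resource hash.

    Returns a dict mapping the hash (as printed in the log) to the number
    of times the CPU mapped that resource for reading.
    """
    counts = {}
    for line in lines:
        m = MAP_RE.search(line)
        if m and m.group(2) == '1':
            resource_hash = m.group(3)
            counts[resource_hash] = counts.get(resource_hash, 0) + 1
    return counts

# Example use (the path is whatever your frame analysis log is called):
#   with open(path_to_log) as f:
#       for h, n in sorted(find_cpu_reads(f).items()):
#           print(h, n)
```

Every hash it reports is only a candidate. As noted above, a CPU read does not by itself mean occlusion culling.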

The second is to use an occlusion query. To find these, use "analyse_options = log hold" and grab a log that covers multiple frames, then look for query=occlusion in the log, like this from Arkham Knight:

1.000063 VSSetShader(pVertexShader:0x00000000C9FFC158, ppClassInstances:0x0000000000000000, NumClassInstances:0) hash=4641ff3d22219e42
1.000063 PSSetShader(pPixelShader:0x0000000000000000, ppClassInstances:0x0000000000000000, NumClassInstances:0) hash=0000000000000000
...
1.000064 Begin(pAsync:0x00000000CBDBADD0) type=query query=occlusion MiscFlags=0x0
...
1.000064 DrawIndexed(IndexCount:36, StartIndexLocation:0, BaseVertexLocation:0)
1.000065 End(pAsync:0x00000000CBDBADD0) type=query query=occlusion MiscFlags=0x0
...
2.000001 GetData(pAsync:0x00000000CBDBADD0, pData:0x000000001BE043D8, DataSize:8, GetDataFlags:1) = 0 type=query query=occlusion MiscFlags=0x0
data: 0800000000000000

We don't yet have a way to directly interfere with that. For 3D Vision purposes we only need to identify which shaders were used during the query and stereoise them, so that the query works in 3D and gives us the same performance benefit it was intended to give in the first place. You could maybe do something with the shaders it uses, though, or we could add something to fake the result of the query.
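Identifying the shaders used during a query can be scripted as well. The sketch below walks the log, remembers the last vertex and pixel shader hashes that were set, and records the pair for every draw call issued between a Begin and End of an occlusion query. `shaders_in_occlusion_queries` is a hypothetical name, and the patterns assume the line formats shown in the Arkham Knight excerpt.

```python
import re

# Patterns are assumptions based on the log excerpt above.
SHADER_RE = re.compile(r'\b(VS|PS)SetShader\(.*?\)\s+hash=([0-9a-f]+)')
QUERY_RE = re.compile(
    r'\b(Begin|End)\(pAsync:(0x[0-9A-Fa-f]+)\).*\bquery=occlusion\b'
)
DRAW_RE = re.compile(r'\bDraw\w*\(')

def shaders_in_occlusion_queries(lines):
    """Return the set of (vs_hash, ps_hash) pairs that were bound for any
    draw call issued while at least one occlusion query was open."""
    bound = {'VS': None, 'PS': None}   # last shader hash set per stage
    open_queries = set()               # pAsync pointers between Begin/End
    used = set()
    for line in lines:
        m = SHADER_RE.search(line)
        if m:
            bound[m.group(1)] = m.group(2)
            continue
        m = QUERY_RE.search(line)
        if m:
            if m.group(1) == 'Begin':
                open_queries.add(m.group(2))
            else:
                open_queries.discard(m.group(2))
            continue
        if open_queries and DRAW_RE.search(line):
            used.add((bound['VS'], bound['PS']))
    return used
```

It only tracks VS and PS because those are the two stages that appear in the excerpt. A game that uses other shader stages for its occlusion geometry would need more patterns.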

That reminds me - now that I have this awesome swanky frame analysis log and can identify the relevant shaders and render targets I should go back and fix occlusion culling in CryEngine instead of disabling it - I'll probably need stereo2mono and a custom shader to merge the buffers from both eyes, but I believe that it can be done ;-)

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 09/17/2017 07:57 PM   
I found one texture with MapType:1

000012 Map(pResource:0x4077F5B0, Subresource:0, MapType:1, MapFlags:0, pMappedResource:0x0199E5FC) hash=eaddf18d
000012 Unmap(pResource:0x4077F5B0, Subresource:0) hash=eaddf18d


So I tried the config below, but unfortunately nothing changed:

[TextureOverrideCulling]
hash = eaddf18d
deny_cpu_read = 1


I haven't found any line containing "query=occlusion" either.
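For a log where more than one hash turns up with MapType:1, the override sections could be generated rather than typed out. This is a minimal sketch that only emits the same three lines as the section above. The function name and the `TextureOverrideCpuRead` prefix are made up for illustration, and I'm assuming the part of the section name after `TextureOverride` can be chosen freely.

```python
def texture_override_sections(hashes, prefix='TextureOverrideCpuRead'):
    """Build one deny_cpu_read section per resource hash.

    `hashes` is any iterable of hash strings as printed in the frame
    analysis log. Sections are numbered so their names stay unique.
    """
    sections = []
    for index, resource_hash in enumerate(sorted(set(hashes)), start=1):
        sections.append(
            '[%s%d]\nhash = %s\ndeny_cpu_read = 1\n'
            % (prefix, index, resource_hash)
        )
    return '\n'.join(sections)

# Paste the printed text into the ini and test one section at a time.
print(texture_override_sections(['eaddf18d']))
```

Enabling them one at a time makes it easier to tell which read, if any, is the one driving the culling.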

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 09/18/2017 02:45 PM   
I've dug through all the editable game files, but still couldn't find anything responsible for the object culling. It would be a huge loss, as the game looks much nicer with the shadows fixed.

[img]https://s26.postimg.org/vjz7vvgvt/dirt3shadows.png[/img]


Posted 09/20/2017 07:17 PM   