Difference between revisions of "Optimization Guidelines"

Revision as of 19:06, 1 August 2021

Rendering

Blendshapes

Every time a blendshape changes, the vertices have to be retransformed on the entire mesh. If the majority of the mesh is not part of any blendshape, then that performance is wasted. Neos can automatically optimize this with the "Separate parts of mesh unaffected by blendshapes" found under the SkinnedMeshRenderer component. Whether this is worth it or not varies on a case-by-case basis, so you'll have to test your before/after performance while driving blendshapes to be sure.

Materials

Some materials (notably the Fur material) are much more expensive than others.

Alpha blend mode uses the more expensive forward rendering pipeline, and should be avoided where unnecessary. Opaque and Cutout use the less expensive deferred rendering pipeline.

Texture Dimensions

Textures are not required to be square. To avoid wasting VRAM it's better to have a rectangular texture than it is to have a square texture with empty space as padding.
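As a rough illustration of the saving, the sketch below compares a square texture that is half padding with a rectangular one holding the same content. It assumes an uncompressed 4-bytes-per-pixel format and ignores mipmaps and compression; the helper name `texture_bytes` and the sizes are hypothetical, not something Neos exposes.

```python
def texture_bytes(width, height, bytes_per_pixel=4):
    """Approximate memory for one uncompressed texture (assumed RGBA8, no mipmaps)."""
    return width * height * bytes_per_pixel


# Hypothetical case: the content only fills a 2048x1024 region.
square = texture_bytes(2048, 2048)  # square texture, half of it empty padding
rect = texture_bytes(2048, 1024)    # rectangular texture, no padding

saved_mib = (square - rect) / (1024 * 1024)
print(f"square: {square} bytes, rectangular: {rect} bytes, saved: {saved_mib} MiB")
```

Real numbers depend on the texture format and compression in use, but the proportion wasted by padding stays the same.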

Texture Atlasing

If you have a number of different materials of the same type, consider atlasing (combining multiple textures into one larger texture). Even if different parts of your mesh use different settings, the addition of maps can let you combine many materials into one. Try to avoid large empty spaces in the resulting atlas, as they can waste VRAM.

Places where atlasing doesn't help:

If you need a different material altogether, e.g. a Fur part of a mostly PBS avatar.
If you need part of your avatar to have Alpha blend, but the majority is fine with Opaque or Cutout.

Procedural vs Static Assets

It's more efficient to bake procedural meshes once you no longer need to change their parameters. Procedural meshes and textures are per-world. This is because the procedural asset is duplicated with the item. Static meshes and textures are automatically instanced across worlds so there's only a single copy in memory at all times, and do not need to be saved on the item itself.

GPU Mesh Instancing

If there are multiple instances of the same static mesh/material combination, they will be instanced (on most shaders). This can significantly improve performance when rendering multiple instances of the same object, e.g. having lots of trees in the environment.
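The grouping idea can be sketched as follows. This is a simplified model, not how the renderer is actually implemented: it assumes every unique mesh/material pair collapses into a single instanced batch, and the scene contents and the `count_batches` helper are made up for illustration.

```python
from collections import Counter


def count_batches(objects):
    """Group (mesh, material) pairs; each unique pair is modelled as one instanced batch."""
    return Counter(objects)


# Hypothetical scene: many trees and rocks sharing the same static mesh and material.
scene = [("tree", "bark")] * 200 + [("rock", "stone")] * 50 + [("sign", "wood")]

batches = count_batches(scene)
print(len(scene), "objects ->", len(batches), "instanced batches")
```

In practice shaders that do not support instancing, and per-batch limits, mean the real reduction is smaller than this idealized count.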

Mirrors and Cameras

Mirrors and cameras can be quite expensive, especially at higher resolutions, as they require additional rendering passes. Mirrors are generally more expensive than cameras, as they require two additional passes (one per eye).

The performance of cameras can be improved by using appropriate near/far clip values and using the selective/exhaustive render lists. Basically, avoid rendering what you don't need to.

It's good practice to localize mirrors and cameras with a ValueUserOverride so users can opt in if they're willing to sacrifice performance to see them.

Reflection Probes

Baked reflection probes are quite cheap, especially at the default resolution of 128x128. The only real cost is the VRAM used to store the cube map.
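To get a feel for that VRAM cost, the following sketch estimates the size of a cube map as six square faces. The 4-bytes-per-pixel figure is an assumption (HDR formats use more), mipmaps are ignored, and `cubemap_bytes` is a hypothetical helper.

```python
def cubemap_bytes(face_size, bytes_per_pixel=4):
    """Approximate memory for a cube map: six square faces, no mipmaps (assumed format)."""
    return 6 * face_size * face_size * bytes_per_pixel


default_probe = cubemap_bytes(128)   # default resolution mentioned above
large_probe = cubemap_bytes(1024)    # hypothetical high-resolution probe

print(f"128x128:   {default_probe / 1024} KiB")
print(f"1024x1024: {large_probe / (1024 * 1024)} MiB")
```

Because the cost grows with the square of the face size, raising the resolution from 128 to 1024 multiplies the memory by 64 under these assumptions.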

Real-time reflection probes are extremely expensive, and are comparable to six cameras.

Lighting

Light impact is proportional to how many pixels a light shines on. This is determined by the size of the visible light volume in the world, regardless of how much geometry it affects. Short range or partially occluded lights are therefore cheaper.

Lights with shadows are much more expensive than lights without. In deferred shading, shadow-casting GameObjects still need to be rendered once or more for each shadow-casting light. Furthermore, the lighting shader that applies shadows has a higher rendering overhead than the one used when shadows are disabled.

Logix

Less Logix is not always better!

Writes and Drivers

Changing the value of a Sync will result in network traffic, as that change to the data model needs to be sent to the other users in the session. ValueUserOverride does not remove this network activity, as the overrides themselves are Syncs.

Exceptions:

Drivers compute things locally for every user, and do not cause network traffic.
"Self Driven values" (A ValueCopy with Writeback and the same source and target) are also locally calculated, even if you use the Write node to change the value.
If multiple writes to a value occur in the same update, only the last value will be replicated over the network.
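The last exception can be pictured with a toy model. This is only a sketch of the described behaviour, assuming writes are buffered and flushed once per update; the `SyncedValue` class and its methods are hypothetical and are not part of Neos.

```python
class SyncedValue:
    """Toy model of a synced field: writes are buffered and flushed once per update."""

    def __init__(self, value):
        self.value = value
        self._dirty = False
        self.sent = []  # values that would be replicated over the network

    def write(self, value):
        # Every write changes the local value immediately.
        self.value = value
        self._dirty = True

    def end_update(self):
        # Only the value present at the end of the update is sent, and only if it changed.
        if self._dirty:
            self.sent.append(self.value)
            self._dirty = False


field = SyncedValue(0)
for v in (1, 2, 3):
    field.write(v)      # three writes in the same update
field.end_update()
field.end_update()      # no writes in this update, so nothing is sent
print(field.sent)
```

Under this model three writes in one update produce a single network message carrying the final value.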

Generally it's cheaper to perform computations locally and avoid network activity, but for more expensive computations it's better to have one user do it and sync the result.

Dynamic Variables and Impulses

Dynamic Variables are extremely efficient and can be used without concern for performance. However, creating and destroying Dynamic Variable Spaces can be costly and should be done infrequently.

Dynamic Impulses are also extremely efficient, especially if you target them at a slot close to their receiver.

Frequent Impulses

High frequency updates from the Update node, Fire While True, etc. should be avoided if possible. Consider replacing them with Drivers.

Updating Relay Node

Updating Relays can be expensive, as they bypass the normal event-driven nature of Logix and force an evaluation every frame. Note that while it may appear you need an Updating Relay because a display is not updating, that problem is often specific to the display, and the relay is not needed for the finalized Logix. Use of this node should be avoided wherever possible, but sometimes there's no way around it.

Sequence Node

The Sequence node is not bad for performance, but its overuse can be a sign of poor coding practices. Chaining nodes prevents unexpected errors from propagating, whereas Sequence continues execution even after an error, so naive use can put your Logix into a bad state.

Cache Node

Avoid using the Cache node, as it is a manual optimization that Neos already performs automatically.

Profiling

If you're working on a new item that might be expensive, consider profiling it:

The Debug menu, found in the Home tab of the Dash, has many helpful timings.
SteamVR has a "Display Performance Graph" that can show GPU frametimes. This can also be shown in-headset from a toggle in the developer settings (toggle "Advanced Settings" on in the settings menu).

@@ Line 2: / Line 2: @@
 === Blendshapes ===
+Every time a blendshape changes, the vertices have to be retransformed on the entire mesh. If the majority of the mesh is not part of any blendshape, then that performance is wasted. Neos can automatically optimize this with the "Separate parts of mesh unaffected by blendshapes" found under the [[SkinnedMeshRenderer (Component)|SkinnedMeshRenderer component]]. Whether this is worth it or not varies on a case-by-case basis, so you'll have to test your before/after performance while driving blendshapes to be sure.
+=== Materials ===
+Some materials (notably the [[FurMaterial|Fur material]]) are much more expensive than others.
+Alpha blend mode uses the more expensive forward rendering pipeline, and should be avoided where unnecessary. Opaque and Cutout use the less expensive deferred rendering pipeline.
+=== Texture Dimensions ===
+Textures are not required to be square. To avoid wasting VRAM it's better to have a rectangular texture than it is to have a square texture with empty space as padding.
+=== Texture Atlasing ===
+If you have a number of different materials of the same type, consider atlasing (combining multiple textures into one larger texture). Even if different parts of your mesh use different settings, the addition of maps can let you combine many materials into one. Try to avoid large empty spaces in the resulting atlas, as they can waste VRAM.
+Places where atlasing doesn't help:
+* If you need a different material altogether, e.g. a [[FurMaterial|Fur]] part of a mostly PBS avatar.
+* If you need part of your avatar to have Alpha blend, but the majority is fine with Opaque or Cutout.
+=== Procedural vs Static Assets ===
+It's more efficient to bake procedural meshes once you no longer need to change their parameters. Procedural meshes and textures are per-world. This is because the procedural asset is duplicated with the item. Static meshes and textures are automatically instanced across worlds so there's only a single copy in memory at all times, and do not need to be saved on the item itself.
+=== GPU Mesh Instancing ===
+If there are multiple instances of the same static mesh/material combination, they will be instanced (on most shaders). This can significantly improve performance when rendering multiple instances of the same object, e.g. having lots of trees in the environment.
+=== Mirrors and Cameras ===
+Mirrors and cameras can be quite expensive, especially at higher resolutions, as they require additional rendering passes. Mirrors are generally more expensive than cameras, as they require two additional passes (one per eye).
+The performance of cameras can be improved by using appropriate near/far clip values and using the selective/exhaustive render lists. Basically, avoid rendering what you don't need to.
+It's good practice to localize mirrors and cameras with a [[ValueUserOverride`1 (Component)|ValueUserOverride]] so users can opt in if they're willing to sacrifice performance to see them.
+=== Reflection Probes ===
+Baked reflection probes are quite cheap, especially at the default resolution of 128x128. The only real cost is the VRAM used to store the cube map.
+Real-time reflection probes are extremely expensive, and are comparable to six cameras.
+=== Lighting ===
+Light impact is proportional to how many pixels a light shines on. This is determined by the size of the visible light volume in the world, regardless of how much geometry it affects. Short range or partially occluded lights are therefore cheaper.
+Lights with shadows are much more expensive than lights without. In deferred shading, shadow-casting GameObjects still need to be rendered once or more for each shadow-casting light. Furthermore, the lighting shader that applies shadows has a higher rendering overhead than the one used when shadows are disabled.
 == Logix ==
+Less Logix is not always better!
+=== Writes and Drivers ===
+Changing the value of a Sync will result in network traffic, as that change to the data model needs to be sent to the other users in the session. [[ValueUserOverride`1 (Component)|ValueUserOverride]] does not remove this network activity, as the overrides themselves are Syncs.
+Exceptions:
+* [[Drive|Drivers]] compute things locally for every user, and do not cause network traffic.
+* "Self Driven values" (A ValueCopy with Writeback and the same source and target) are also locally calculated, even if you use the [[Write (LogiX node)|Write node]] to change the value.
+* If multiple writes to a value occur in the same update, only the last value will be replicated over the network.
+Generally it's cheaper to perform computations locally and avoid network activity, but for more expensive computations it's better to have one user do it and sync the result.
+=== Dynamic Variables and Impulses ===
+[[Dynamic Variables]] are extremely efficient and can be used without concern for performance. However, creating and destroying Dynamic Variable Spaces can be costly and should be done infrequently.
+[[Dynamic Impulses]] are also extremely efficient, especially if you target them at a slot close to their receiver.
+=== Frequent Impulses ===
+High frequency updates from the [[Update (LogiX node)|Update]] node, [[Fire While True (LogiX node)|Fire While True]], etc. should be avoided if possible. Consider replacing them with [[Drive|Drivers]].
+=== Updating Relay Node ===
+[[Updating Relay (LogiX node)|Updating Relays]] can be expensive, as they bypass the normal event-driven nature of Logix and force an evaluation every frame. '''Note that while it may appear you need an Updating Relay because a display is not updating, that problem is often specific to the display''', and the relay is not needed for the finalized Logix. Use of this node should be avoided wherever possible, but sometimes there's no way around it.
+=== Sequence Node ===
+The [[Sequence (LogiX node)|Sequence node]] is not bad for performance, but its overuse can be a sign of poor coding practices. Chaining nodes prevents unexpected errors from propagating, whereas Sequence continues execution even after an error, so naive use can put your Logix into a bad state.
+=== Cache Node ===
+Avoid using the [[Cache (LogiX node)|Cache node]], as it is a manual optimization that Neos already performs automatically.
-<references />
+== Profiling ==
+If you're working on a new item that might be expensive, consider profiling it:
+* The Debug menu, found in the Home tab of the Dash, has many helpful timings.
+* SteamVR has a "Display Performance Graph" that can show GPU frametimes. This can also be shown in-headset from a toggle in the developer settings (toggle "Advanced Settings" on in the settings menu).