Difference between revisions of "Optimization Guidelines"

From Neos Wiki
m (Add a citation to the "six cameras" thing)
(Add a note on point lights with shadows)
Revision as of 22:04, 3 August 2021

Rendering

Blendshapes

Every time a blendshape changes, the vertices have to be retransformed on the entire mesh. [1] If the majority of the mesh is not part of any blendshape, then that performance is wasted. Neos can automatically optimize this with the "Separate parts of mesh unaffected by blendshapes" found under the SkinnedMeshRenderer component. Whether this is worth it or not varies on a case-by-case basis, so you'll have to test your before/after performance while driving blendshapes to be sure.

Materials

Some materials (notably the Fur material) are much more expensive than others. [2]

Alpha/Transparent/Additive/Multiply blend modes count as transparent materials and are slightly more expensive because things behind them have to be rendered and filtered through them. Transparent materials use the forward rendering pipeline, so they don't handle dynamic lights as consistently. Opaque and Cutout blend modes use the deferred rendering pipeline, and handle dynamic lights better.

Texture Dimensions

Square textures with pixel dimensions of a power of two (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, etc.) are more efficiently handled in VRAM. [3]
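
As a quick way to check texture dimensions before importing, the rule above can be sketched in Python. This is only an illustration; the function names are hypothetical and nothing here is part of Neos.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 and (n & (n - 1)) == 0


def is_square_power_of_two(width: int, height: int) -> bool:
    """Return True for square textures whose side length is a power of two."""
    return width == height and is_power_of_two(width)


if __name__ == "__main__":
    for size in [(1024, 1024), (1000, 1000), (2048, 1024)]:
        print(size, is_square_power_of_two(*size))
```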

Texture Atlasing

If you have a number of different materials of the same type, consider atlasing (combining multiple textures into one larger texture). [4] Even if different parts of your mesh use different settings, the addition of maps can let you combine many materials into one. Try to avoid large empty spaces in the resulting atlas, as they can waste VRAM. [5]

Places where atlasing doesn't help:

  • If you need a different material altogether, e.g. a Fur part of a mostly PBS avatar.
  • If you need part of your avatar to have Alpha blend, but the majority is fine with Opaque or Cutout.

Procedural vs Static Assets

It's more efficient to bake procedural meshes once you no longer need to change their parameters. Procedural meshes and textures are per-world. This is because the procedural asset is duplicated with the item. Static meshes and textures are automatically instanced across worlds so there's only a single copy in memory at all times, and do not need to be saved on the item itself. [6]

GPU Mesh Instancing

If there are multiple instances of the same static mesh/material combination, they will be instanced (on most shaders). This can significantly improve performance when rendering multiple instances of the same object, e.g. having lots of trees in the environment. [7]
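
To get a rough feel for how much a scene could benefit, you can count how many distinct mesh/material combinations it contains. The sketch below is a simplified model, not how the renderer actually batches: the scene list and names are made up, and real instancing also depends on the shader and other per-object state.

```python
from collections import Counter


def count_combinations(objects):
    """Count how many objects share each (mesh, material) pair."""
    return Counter(objects)


# Hypothetical scene: many trees and rocks, one unique statue.
scene = [("tree", "bark")] * 200 + [("rock", "stone")] * 50 + [("statue", "marble")]
combinations = count_combinations(scene)

print(len(scene), "objects,", len(combinations), "distinct mesh/material combinations")
for (mesh, material), count in combinations.most_common():
    print(f"{mesh}/{material}: {count} instance(s)")
```

The fewer distinct combinations relative to the object count, the more there is for instancing to gain.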

Mirrors and Cameras

Mirrors and cameras can be quite expensive, especially at higher resolutions, as they require additional rendering passes. Mirrors are generally more expensive than cameras, as they require two additional passes (one per eye).

The performance of cameras can be improved by using appropriate near/far clip values and using the selective/exclusive render lists. Basically, avoid rendering what you don't need to.

It's good practice to localize mirrors and cameras with a ValueUserOverride so users can opt in if they're willing to sacrifice performance to see them.

Reflection Probes

Baked reflection probes are quite cheap, especially at the default resolution of 128x128. The only real cost is the VRAM used to store the cube map. [8]
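
For a sense of scale, the storage for a cube map can be estimated as six faces times the texels per face times the bytes per texel. The sketch below assumes an uncompressed format at 4 bytes per texel, which is an assumption for illustration only; the actual format Neos uses is not stated here and may differ.

```python
def cube_map_bytes(resolution: int, bytes_per_pixel: int = 4, mipmaps: bool = False) -> int:
    """Estimate the memory used by a cube map with square faces.

    bytes_per_pixel=4 is an assumed uncompressed format, not a Neos fact.
    """
    texels_per_face = 0
    size = resolution
    while size >= 1:
        texels_per_face += size * size
        if not mipmaps:
            break  # only count the full-resolution level
        size //= 2  # each mip level halves the side length
    return 6 * texels_per_face * bytes_per_pixel


if __name__ == "__main__":
    for res in (128, 512, 1024):
        print(res, cube_map_bytes(res) / 1024, "KiB")
```

Under these assumptions a 128x128 probe comes to 384 KiB, while a 1024x1024 probe comes to 24 MiB, which is why the default resolution is so cheap.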

Real-time reflection probes are extremely expensive, and are comparable to six cameras. [9]

Lighting

Light impact is proportional to how many pixels a light shines on. This is determined by the size of the visible light volume in the world, regardless of how much geometry it affects. Short range or partially occluded lights are therefore cheaper.

Lights with shadows are much more expensive than lights without. In deferred shading, shadow-casting meshes still need to be rendered once or more for each shadow-casting light. Furthermore, the lighting shader that applies shadows has a higher rendering overhead than the one used when shadows are disabled. [10] [11]

Point lights with shadows are very expensive, as they render the surrounding scene six times. If you need shadows, try to restrict them to a spot or directional light. [12]
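
The difference adds up quickly when several lights cast shadows. The following sketch tallies extra scene renders per light type. The figure of six for point lights comes from the text above; the value of one for spot and directional lights is an assumed minimum for illustration, and the real count may be higher.

```python
# Extra scene renders per shadow-casting light.
# "point": 6 is from the guideline above; the other values are assumed minimums.
SHADOW_PASSES = {"point": 6, "spot": 1, "directional": 1}


def shadow_passes(lights):
    """Sum the shadow passes for a list of (light_type, casts_shadows) pairs."""
    return sum(SHADOW_PASSES[kind] for kind, casts_shadows in lights if casts_shadows)


if __name__ == "__main__":
    three_points = [("point", True)] * 3
    three_spots = [("spot", True)] * 3
    print("3 shadowed point lights:", shadow_passes(three_points), "passes")
    print("3 shadowed spot lights:", shadow_passes(three_spots), "passes")
```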

Culling

Neos already performs frustum culling, meaning objects outside of your view will not render. [13]

A ColliderUserTracker component can be used to set up manual culling zones, but it is incompatible with NoClip locomotion.

Logix

Less Logix is not always better! It's more important to make your calculations do less work than it is to do them in a smaller space.

Writes and Drivers

Changing the value of a Sync will result in network traffic, as that change to the data model needs to be sent to the other users in the session. ValueUserOverride does not remove this network activity, as the overrides themselves are Syncs. [14]

Exceptions:

  • Drivers compute things locally for every user, and do not cause network traffic
  • "Self Driven values" (A ValueCopy with Writeback and the same source and target) are also locally calculated, even if you use the Write node to change the value.
  • If multiple writes to a value occur in the same update, only the last value will be replicated over the network.
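
The last exception can be pictured as a field that only remembers its latest value until the update ends. The class below is a purely hypothetical model written for illustration; it is not Neos's implementation, and names such as SyncField and flush are invented.

```python
class SyncField:
    """Toy model of a synced value that replicates at most once per update."""

    def __init__(self, value):
        self.value = value
        self._dirty = False
        self.outbox = []  # values that would be sent over the network

    def write(self, value):
        # Writes within an update only overwrite the pending value.
        self.value = value
        self._dirty = True

    def flush(self):
        # Called once at the end of an update: send only the final value.
        if self._dirty:
            self.outbox.append(self.value)
            self._dirty = False


if __name__ == "__main__":
    field = SyncField(0)
    for v in (1, 2, 3):
        field.write(v)
    field.flush()
    print(field.outbox)
```

In this model three writes in one update produce a single replicated value, and an update with no writes produces none.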

Generally it's cheaper to perform computations locally and avoid network activity, but for more expensive computations it's better to have one user do it and sync the result.

Dynamic Variables and Impulses

Dynamic Variables are extremely efficient and can be used without concern for performance. However, creating and destroying Dynamic Variable Spaces can be costly and should be done infrequently. [15]

Dynamic Impulses are also extremely efficient, especially if you target them at a slot close to their receiver. [16]

Frequent Impulses

High-frequency updates from the Update node, Fire While True, etc. should be avoided where possible if the action results in network replication. Consider replacing them with Drivers.

Updating Relay Node

Updating Relays can be expensive, as they bypass the normal event-driven nature of Logix and force an evaluation every frame. Note that while it may appear you need an Updating Relay because a display is not updating, that problem is often specific to the display, and the relay is not needed in the finalized Logix. Avoid this node wherever possible, though sometimes there's no way around it.

Sequence Node

The Sequence node is not bad for performance, but overusing it can lead to poor coding practices. Chaining nodes prevents unexpected errors from propagating, whereas Sequence continues execution even on error, so naive use can put your Logix into a bad state.

Cache Node

Don't worry about using the Cache node, as it is a very specific optimization that Neos will perform automatically in the future. [17]

Slot Count

Slot count and packed Logix nodes don't matter much performance-wise. Loading and saving do take a hit for complex setups, but placing the Logix nodes on one slot does not reduce it. Neos still has to load and save the exact same number of components. [18]

Profiling

If you're working on a new item that might be expensive, consider profiling it:

  • The Debug menu, found in the Home tab of the Dash, has many helpful timings
  • SteamVR has a "Display Performance Graph" that can show GPU frametimes. This can also be shown in-headset from a toggle in the developer settings (toggle "Advanced Settings" on in the settings menu)