Performance Improvements in .NET 8 -- Exceptions & Reflection & Primit

人生三境界 · Posted 2023-11-19 00:16:30

Exceptions

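In .NET 6, ArgumentNullException gained a ThrowIfNull method, as we started experimenting with providing "throw helpers." The intent of the method is to concisely express the constraint being verified, letting the system throw a consistent exception when the constraint is not met, while also optimizing the success case, the 99.999% case in which no exception needs to be thrown. The method is structured so that the fast path performing the check gets inlined, with as little work as possible on that path, and everything else is delegated to a method that performs the actual throwing (the JIT won't inline that throwing method, because it sees that the method's implementation always throws).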

public static void ThrowIfNull(
[NotNull] object? argument,
[CallerArgumentExpression(nameof(argument))] string? paramName = null)
{
if (argument is null)
Throw(paramName);
}
[DoesNotReturn]
internal static void Throw(string? paramName) => throw new ArgumentNullException(paramName);

In .NET 7, ArgumentNullException.ThrowIfNull gained another overload, this time for pointers, and two new methods were introduced: ArgumentException.ThrowIfNullOrEmpty for strings, and ObjectDisposedException.ThrowIf.
Now in .NET 8, a slew of new helpers have been added. Thanks to dotnet/runtime#86007, ArgumentExc

public static void ThrowIfNullOrWhiteSpace([NotNull] string? argument, [CallerArgumentExpression(nameof(argument))] string? paramName = null);

Thanks to dotnet/runtime#78222 and dotnet/runtime#83853 from @hrrrrustic, ArgumentOutOfRangeException gained 9 new methods:

public static void ThrowIfEqual<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : System.IEquatable<T>?;
public static void ThrowIfNotEqual<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : System.IEquatable<T>?;
public static void ThrowIfLessThan<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : IComparable<T>;
public static void ThrowIfLessThanOrEqual<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : IComparable<T>;
public static void ThrowIfGreaterThan<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : IComparable<T>;
public static void ThrowIfGreaterThanOrEqual<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : IComparable<T>;
public static void ThrowIfNegative<T>(T value, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : INumberBase<T>;
public static void ThrowIfZero<T>(T value, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : INumberBase<T>;
public static void ThrowIfNegativeOrZero<T>(T value, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : INumberBase<T>;

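As a rough illustration of how these helpers read at a call site, here is a minimal sketch. CreateBuffer and its parameters are hypothetical names invented for this example; only the ThrowIf* methods come from the lists above.

```csharp
using System;

// Hypothetical method used only for illustration; the validation lines are the point.
static byte[] CreateBuffer(string name, int length, int maxLength)
{
    ArgumentException.ThrowIfNullOrWhiteSpace(name);                    // null, empty, or whitespace
    ArgumentOutOfRangeException.ThrowIfNegativeOrZero(length);          // length must be > 0
    ArgumentOutOfRangeException.ThrowIfGreaterThan(length, maxLength);  // length must be <= maxLength
    return new byte[length];
}

Console.WriteLine(CreateBuffer("scratch", 16, 1024).Length);

try
{
    CreateBuffer("scratch", 2048, 1024);
}
catch (ArgumentOutOfRangeException e)
{
    // ParamName is filled in from the caller's argument expression.
    Console.WriteLine(e.ParamName);
}
```

Each helper captures the argument expression through CallerArgumentExpression, so the call sites stay short without losing the parameter name in the exception.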

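These PRs used the new methods in a few places, and then dotnet/runtime#79460, dotnet/runtime#80355, dotnet/runtime#82357, dotnet/runtime#82533, and dotnet/runtime#85858 rolled out their use more broadly throughout the core libraries. To get a sense of how useful these methods are, here is the number of times each one is called from within the src of the core libraries in dotnet/runtime, as of the time I'm writing this:

| Method | Count |
| --- | --- |
| ANE.ThrowIfNull(object) | 4795 |
| AOORE.ThrowIfNegative | 873 |
| AE.ThrowIfNullOrEmpty | 311 |
| ODE.ThrowIf | 237 |
| AOORE.ThrowIfGreaterThan | 223 |
| AOORE.ThrowIfNegativeOrZero | 100 |
| AOORE.ThrowIfLessThan | 89 |
| ANE.ThrowIfNull(void*) | 55 |
| AOORE.ThrowIfGreaterThanOrEqual | 39 |
| AE.ThrowIfNullOrWhiteSpace | 32 |
| AOORE.ThrowIfLessThanOrEqual | 20 |
| AOORE.ThrowIfNotEqual | 13 |
| AOORE.ThrowIfZero | 5 |
| AOORE.ThrowIfEqual | 3 |

These new methods also do more work in the throwing portion (e.g. formatting the exception message with the invalid arguments), which helps to better illustrate the benefit of moving all of that work out into a separate method. For example, here is ThrowIfGreaterThan, copied straight from System.Private.CoreLib: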

public static void ThrowIfGreaterThan<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : IComparable<T>
{
if (value.CompareTo(other) > 0)
ThrowGreater(value, other, paramName);
}
private static void ThrowGreater<T>(T value, T other, string? paramName) =>
throw new ArgumentOutOfRangeException(paramName, value, SR.Format(SR.ArgumentOutOfRange_Generic_MustBeLessOrEqual, paramName, value, other));

Here is a benchmark showing what the cost would look like if the throw expression were directly part of ThrowIfGreaterThan:

// dotnet run -c Release -f net8.0 --filter "*"
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD", "value1", "value2")]
[DisassemblyDiagnoser]
public class Tests
{
[Benchmark(Baseline = true)]
[Arguments(1, 2)]
public void WithOutline(int value1, int value2)
{
ArgumentOutOfRangeException.ThrowIfGreaterThan(value1, 100);
ArgumentOutOfRangeException.ThrowIfGreaterThan(value2, 200);
}
[Benchmark]
[Arguments(1, 2)]
public void WithInline(int value1, int value2)
{
ThrowIfGreaterThan(value1, 100);
ThrowIfGreaterThan(value2, 200);
}
public static void ThrowIfGreaterThan<T>(T value, T other, [CallerArgumentExpression(nameof(value))] string? paramName = null) where T : IComparable<T>
{
if (value.CompareTo(other) > 0)
throw new ArgumentOutOfRangeException(paramName, value, SR.Format(SR.ArgumentOutOfRange_Generic_MustBeLessOrEqual, paramName, value, other));
}
internal static class SR
{
public static string Format(string format, object arg0, object arg1, object arg2) => string.Format(format, arg0, arg1, arg2);
internal static string ArgumentOutOfRange_Generic_MustBeLessOrEqual => GetResourceString("ArgumentOutOfRange_Generic_MustBeLessOrEqual");
[MethodImpl(MethodImplOptions.NoInlining)]
static string GetResourceString(string resourceKey) => "{0} ('{1}') must be less than or equal to '{2}'.";
}
}

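| Method | Mean | Ratio | Code Size |
| --- | --- | --- | --- |
| WithOutline | 0.4839 ns | 1.00 | 118 B |
| WithInline | 2.4976 ns | 5.16 | 235 B |

The most relevant highlight from the generated assembly comes from the WithInline case: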

; Tests.WithInline(Int32, Int32)
push rbx
sub rsp,20
mov ebx,r8d
mov ecx,edx
mov edx,64
mov r8,1F5815EA8F8
call qword ptr [7FF99C03DEA8]; Tests.ThrowIfGreaterThan[[System.Int32, System.Private.CoreLib]](Int32, Int32, System.String)
mov ecx,ebx
mov edx,0C8
mov r8,1F5815EA920
add rsp,20
pop rbx
jmp qword ptr [7FF99C03DEA8]; Tests.ThrowIfGreaterThan[[System.Int32, System.Private.CoreLib]](Int32, Int32, System.String)
; Total bytes of code 59

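Because the ThrowIfGreaterThan method has more cruft in it, the system decides not to inline it, and so we end up with two method invocations even when the value is within range (the first is a call; the second is a jmp, since there is no follow-up work in this method that would require control flow to return).
To make it easier to roll out usage of these helpers, dotnet/roslyn-analyzers#6293 added new analyzers that look for argument validation that can be replaced by one of the throw helper methods on ArgumentNullException, ArgumentException, ArgumentOutOfRangeException, or ObjectDisposedException. dotnet/runtime#80149 enabled the analyzers for dotnet/runtime and fixed up many call sites.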

Reflection

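There have been a variety of improvements across the reflection stack in .NET 8, mostly around reducing allocation and caching information so that subsequent access is faster. For example, dotnet/runtime#87902 tweaks some code in GetCustomAttributes to avoid allocating an object[1] array in order to set a property's value on an attribute.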

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
[Benchmark]
public object[] GetCustomAttributes() => typeof(C).GetCustomAttributes(typeof(MyAttribute), inherit: true);
[My(Value1 = 1, Value2 = 2)]
class C { }
[AttributeUsage(AttributeTargets.All)]
public class MyAttribute : Attribute
{
public int Value1 { get; set; }
public int Value2 { get; set; }
}
}

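| Method | Runtime | Mean | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- |
| GetCustomAttributes | .NET 7.0 | 1,287.1 ns | 1.00 | 296 B | 1.00 |
| GetCustomAttributes | .NET 8.0 | 994.0 ns | 0.77 | 232 B | 0.78 |

Other changes like dotnet/runtime#76574, dotnet/runtime#81059, and dotnet/runtime#86657 also reduced allocation in the reflection stack, in particular by more liberal use of spans. And dotnet/runtime#78288 from @lateapexearlyspeed improves the handling of generics information on a Type, leading to a boost for various generics-related members, in particular for GetGenericTypeDefinition, for which the result is now cached on the Type object.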

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
private readonly Type _type = typeof(List<int>);
[Benchmark] public Type GetGenericTypeDefinition() => _type.GetGenericTypeDefinition();
}

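| Method | Runtime | Mean | Ratio |
| --- | --- | --- | --- |
| GetGenericTypeDefinition | .NET 7.0 | 47.426 ns | 1.00 |
| GetGenericTypeDefinition | .NET 8.0 | 3.289 ns | 0.07 |

However, the largest impact on reflection performance in .NET 8 comes from dotnet/runtime#88415. This is a continuation of work done in .NET 7 to improve the performance of MethodBase.Invoke. When you know at compile time the signature of the target method you want to invoke via reflection, you can achieve the best performance by using CreateDelegate to get and cache a delegate for the method, and then performing all invocations via that delegate. However, if you don't know the signature at compile time, you need to rely on more dynamic means, like MethodBase.Invoke, which historically has been much more costly. Some advanced developers turned to reflection emit to avoid that overhead, and that is also one of the optimization approaches taken in .NET 7. Now in .NET 8, the code generated for many of these cases has improved; previously the emitter always generated code that could accommodate ref/out arguments, but many methods have no such arguments, and the generated code can be more efficient when it doesn't need to factor them in.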

// If you have .NET 6 installed, you can update the csproj to include a net6.0 in the target frameworks, and then run:
// dotnet run -c Release -f net6.0 --filter "*" --runtimes net6.0 net7.0 net8.0
// Otherwise, you can run:
// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Reflection;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
private MethodInfo _method0, _method1, _method2, _method3;
private readonly object[] _args1 = new object[] { 1 };
private readonly object[] _args2 = new object[] { 2, 3 };
private readonly object[] _args3 = new object[] { 4, 5, 6 };
[GlobalSetup]
public void Setup()
{
_method0 = typeof(Tests).GetMethod("MyMethod0", BindingFlags.NonPublic | BindingFlags.Static);
_method1 = typeof(Tests).GetMethod("MyMethod1", BindingFlags.NonPublic | BindingFlags.Static);
_method2 = typeof(Tests).GetMethod("MyMethod2", BindingFlags.NonPublic | BindingFlags.Static);
_method3 = typeof(Tests).GetMethod("MyMethod3", BindingFlags.NonPublic | BindingFlags.Static);
}
[Benchmark] public void Method0() => _method0.Invoke(null, null);
[Benchmark] public void Method1() => _method1.Invoke(null, _args1);
[Benchmark] public void Method2() => _method2.Invoke(null, _args2);
[Benchmark] public void Method3() => _method3.Invoke(null, _args3);
private static void MyMethod0() { }
private static void MyMethod1(int arg1) { }
private static void MyMethod2(int arg1, int arg2) { }
private static void MyMethod3(int arg1, int arg2, int arg3) { }
}

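| Method | Runtime | Mean | Ratio |
| --- | --- | --- | --- |
| Method0 | .NET 6.0 | 91.457 ns | 1.00 |
| Method0 | .NET 7.0 | 7.205 ns | 0.08 |
| Method0 | .NET 8.0 | 5.719 ns | 0.06 |
| Method1 | .NET 6.0 | 132.832 ns | 1.00 |
| Method1 | .NET 7.0 | 26.151 ns | 0.20 |
| Method1 | .NET 8.0 | 21.602 ns | 0.16 |
| Method2 | .NET 6.0 | 172.224 ns | 1.00 |
| Method2 | .NET 7.0 | 37.937 ns | 0.22 |
| Method2 | .NET 8.0 | 26.951 ns | 0.16 |
| Method3 | .NET 6.0 | 211.247 ns | 1.00 |
| Method3 | .NET 7.0 | 42.988 ns | 0.20 |
| Method3 | .NET 8.0 | 34.112 ns | 0.16 |

However, there is still some overhead involved here on each call, and it is repeated on every call. If we could extract that work up front, do it once, and cache it, we could achieve even better performance. That is exactly what the new MethodInvoker and ConstructorInvoker types implemented in dotnet/runtime#88415 provide. These don't handle all of the uncommon corner cases that MethodBase.Invoke handles (such as specially recognizing and handling Type.Missing), but for everything else, they provide a great solution for optimizing the repeated invocation of methods whose signatures are unknown at build time.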

// dotnet run -c Release -f net8.0 --filter "*"
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Reflection;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
private readonly object _arg0 = 4, _arg1 = 5, _arg2 = 6;
private readonly object[] _args3 = new object[] { 4, 5, 6 };
private MethodInfo _method3;
private MethodInvoker _method3Invoker;
[GlobalSetup]
public void Setup()
{
_method3 = typeof(Tests).GetMethod("MyMethod3", BindingFlags.NonPublic | BindingFlags.Static);
_method3Invoker = MethodInvoker.Create(_method3);
}
[Benchmark(Baseline = true)]
public void MethodBaseInvoke() => _method3.Invoke(null, _args3);
[Benchmark]
public void MethodInvokerInvoke() => _method3Invoker.Invoke(null, _arg0, _arg1, _arg2);
private static void MyMethod3(int arg1, int arg2, int arg3) { }
}

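| Method | Mean | Ratio |
| --- | --- | --- |
| MethodBaseInvoke | 32.42 ns | 1.00 |
| MethodInvokerInvoke | 11.47 ns | 0.35 |

As of dotnet/runtime#90119, these types are then used by the ActivatorUtilities.CreateFactory method in Microsoft.Extensions.DependencyInjection.Abstractions to further improve DI service construction performance. dotnet/runtime#91881 improves it further by adding an additional caching layer that avoids reflection on each construction.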
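
The two options described in this section, a cached delegate when the signature is known at compile time and a cached MethodInvoker when it is not, can be put side by side in a small sketch. Math.Max is used here only as a convenient, well-known target; it is not taken from the benchmarks above.

```csharp
using System;
using System.Reflection;

// Look up the target once; Math.Max(int, int) stands in for any static method.
MethodInfo max = typeof(Math).GetMethod(nameof(Math.Max), new[] { typeof(int), typeof(int) })!;

// Signature known at compile time: create the delegate once, cache it, and call through it.
Func<int, int, int> cachedDelegate = max.CreateDelegate<Func<int, int, int>>();
Console.WriteLine(cachedDelegate(3, 7));

// Signature not known at compile time: create the invoker once and reuse it.
// Arguments and the result are passed as object, so value types are boxed.
MethodInvoker cachedInvoker = MethodInvoker.Create(max);
object? result = cachedInvoker.Invoke(null, 3, 7);
Console.WriteLine(result);
```

In both cases the expensive setup happens once, which is why the delegate or invoker should be stored in a field rather than recreated per call.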

Primitives

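It's mind-boggling that after two decades we're still finding opportunities to improve the core primitive types in .NET, and yet here we are. Some of this comes from new scenarios that drive optimization into different places; some comes from new opportunity based on new support that enables different approaches to the same problem; some comes from new research highlighting new ways of approaching a problem; and some simply comes from many new eyes looking at a well-worn space (yay open source!). Regardless of the reason, there's a lot to be excited about here in .NET 8.

Enums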

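Let's start with Enum. Enum has obviously been around since the earliest days of .NET and is used heavily. Although Enum's functionality and implementation have evolved, and it has gained new APIs, at its core the way the data is stored in an enum has remained fundamentally the same for many years. In the .NET Framework implementation, there's an internal ValuesAndNames class that stores a ulong[] and a string[], and in .NET 7, there's an EnumInfo that serves the same purpose. That string[] contains the names of all of the enum's values, and the ulong[] stores their numeric counterparts. It's a ulong[] in order to accommodate every possible underlying type an Enum can have, including those supported by C# (sbyte, byte, short, ushort, int, uint, long, ulong) and those additionally supported by the runtime (nint, nuint, char, float, double), even though effectively no one uses the latter (partial bool support used to be on this list as well, but was removed in .NET 8 in dotnet/runtime#79962 by @pedrobsaila).
As an aside, as part of all of this work, we examined a broad range of appropriately licensed NuGet packages, looking for the most common underlying types of the enums they use. Of the roughly 163 million enums found, this is the distribution of their underlying types. The result is probably not surprising, given the default underlying type of Enum, but it's still interesting: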

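Figure: chart of how common each enum underlying type is
There are several issues with the design for how Enum stores its data. Every operation translates between these ulong[] values and the actual type used by the particular Enum, and the array is often twice as large as it needs to be (int is the default underlying type of an enum and, as seen in the figure above, by far the most commonly used). The approach also leads to significant assembly code bloat when dealing with all the new generic methods that have been added to Enum in recent years. Enums are structs, and when a struct is used as a generic type argument, the JIT specializes the code for that value type (whereas for reference types it emits a single shared implementation used by all of them). That specialization is great for throughput, but it means you get a copy of the code for every value type it's used with; if you have a lot of code (e.g. Enum formatting) and a lot of possible types being substituted (e.g. every declared enum type), the code size can increase substantially.
To address all of this, to modernize the implementation, and to make various operations faster, dotnet/runtime#78580 rewrites Enum. Rather than having a non-generic EnumInfo that stores a ulong[] array of all values, it introduces a generic EnumInfo<TUnderlyingValue> that stores a TUnderlyingValue[]. Then, based on the enum's type, every generic and non-generic Enum method looks up the underlying TUnderlyingValue and invokes a generic method with that TUnderlyingValue but without a generic type parameter for the enum type, e.g. Enum.IsDefined<TEnum>(...) and Enum.IsDefined(typeof(TEnum), ...) both look up the TUnderlyingValue for the TEnum and invoke the internal Enum.IsDefinedPrimitive<TUnderlyingValue>(typeof(TEnum), ...). This way, the implementation stores a strongly typed TUnderlyingValue[] rather than the worst-case ulong[], and all of the implementations are shared between the generic and non-generic entry points without full generic specialization for every TEnum: worst case, we end up with one generic specialization per underlying type, of which only the previously cited 8 are expressible in C#. The generic entry points are able to do the mapping very efficiently, thanks to dotnet/runtime#71685 from @MichalPetryka, which makes typeof(TEnum).IsEnum a JIT intrinsic (such that it effectively becomes a constant), and the non-generic entry points use switches on TypeCode/CorElementType, as was already being done in a variety of methods.
Other improvements were made to Enum as well. dotnet/runtime#76162 improves the performance of various methods, like ToString and IsDefined, in cases where all of the enum's defined values are sequential starting from 0. In that common case, the internal function that looks up the value in the EnumInfo can do so with a simple array access, rather than needing to search for the target.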
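
To make the "sequential from 0" fast path concrete, here is a minimal sketch of the idea. It is not the runtime's code: FindName and IsContiguousFromZero are hypothetical helpers, and the arrays are built from DayOfWeek purely for illustration.

```csharp
using System;

// Names and values of DayOfWeek, both ordered by value (Sunday = 0 ... Saturday = 6).
string[] names = Enum.GetNames<DayOfWeek>();
int[] values = Array.ConvertAll(Enum.GetValues<DayOfWeek>(), d => (int)d);

// Computed once and cached alongside the arrays.
bool contiguous = IsContiguousFromZero(values);

Console.WriteLine(FindName(names, values, contiguous, 6));

// True when sortedValues is exactly 0, 1, 2, ..., n-1.
static bool IsContiguousFromZero(int[] sortedValues)
{
    for (int i = 0; i < sortedValues.Length; i++)
    {
        if (sortedValues[i] != i) return false;
    }
    return true;
}

// Hypothetical lookup: the value doubles as the index when values are contiguous from 0;
// otherwise fall back to searching the sorted values.
static string? FindName(string[] names, int[] sortedValues, bool contiguousFromZero, int value)
{
    if (contiguousFromZero)
    {
        return (uint)value < (uint)names.Length ? names[value] : null;
    }

    int index = Array.BinarySearch(sortedValues, value);
    return index >= 0 ? names[index] : null;
}
```

The single unsigned comparison covers both negative and too-large values, which is what lets the common case skip the search entirely.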
The net result of all of these changes is some very nice performance improvements:

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
private readonly DayOfWeek _dow = DayOfWeek.Saturday;
[Benchmark] public bool IsDefined() => Enum.IsDefined(_dow);
[Benchmark] public string GetName() => Enum.GetName(_dow);
[Benchmark] public string[] GetNames() => Enum.GetNames<DayOfWeek>();
[Benchmark] public DayOfWeek[] GetValues() => Enum.GetValues<DayOfWeek>();
[Benchmark] public Array GetUnderlyingValues() => Enum.GetValuesAsUnderlyingType<DayOfWeek>();
[Benchmark] public string EnumToString() => _dow.ToString();
[Benchmark] public bool TryParse() => Enum.TryParse<DayOfWeek>("Saturday", out _);
}

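| Method | Runtime | Mean | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- |
| IsDefined | .NET 7.0 | 20.021 ns | 1.00 | - | NA |
| IsDefined | .NET 8.0 | 2.502 ns | 0.12 | - | NA |
| GetName | .NET 7.0 | 24.563 ns | 1.00 | - | NA |
| GetName | .NET 8.0 | 3.648 ns | 0.15 | - | NA |
| GetNames | .NET 7.0 | 37.138 ns | 1.00 | 80 B | 1.00 |
| GetNames | .NET 8.0 | 22.688 ns | 0.61 | 80 B | 1.00 |
| GetValues | .NET 7.0 | 694.356 ns | 1.00 | 224 B | 1.00 |
| GetValues | .NET 8.0 | 39.406 ns | 0.06 | 56 B | 0.25 |
| GetUnderlyingValues | .NET 7.0 | 41.012 ns | 1.00 | 56 B | 1.00 |
| GetUnderlyingValues | .NET 8.0 | 17.249 ns | 0.42 | 56 B | 1.00 |
| EnumToString | .NET 7.0 | 32.842 ns | 1.00 | 24 B | 1.00 |
| EnumToString | .NET 8.0 | 14.620 ns | 0.44 | 24 B | 1.00 |
| TryParse | .NET 7.0 | 49.121 ns | 1.00 | - | NA |
| TryParse | .NET 8.0 | 30.394 ns | 0.62 | - | NA |

These changes, however, also made enums play more nicely with string interpolation.
First, Enum now sports a new static TryFormat method, which enables formatting an enum's string representation directly into a Span<char>: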

public static bool TryFormat<TEnum>(TEnum value, Span<char> destination,
out int charsWritten,
[StringSyntax(StringSyntaxAttribute.EnumFormat)] ReadOnlySpan<char> format = default)
where TEnum : struct, Enum

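Second, Enum now implements ISpanFormattable, such that any code written to use a value's ISpanFormattable.TryFormat method now works with enums, too. However, even though enums are value types, they derive from the reference type Enum, which means that calling instance methods like ToString or ISpanFormattable.TryFormat ends up boxing the enum value.
So, third, the various interpolated string handlers in System.Private.CoreLib were updated to special-case typeof(T).IsEnum, which, as noted earlier, is now effectively free thanks to JIT optimizations, and to use Enum.TryFormat directly in order to avoid the boxing. We can see the impact this has by running the following benchmark: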

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
private readonly char[] _dest = new char[100];
private readonly FileAttributes _attr = FileAttributes.Hidden | FileAttributes.ReadOnly;
[Benchmark]
public bool Interpolate() => _dest.AsSpan().TryWrite($"Attrs: {_attr}", out int charsWritten);
}

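| Method | Runtime | Mean | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- |
| Interpolate | .NET 7.0 | 81.58 ns | 1.00 | 80 B | 1.00 |
| Interpolate | .NET 8.0 | 34.41 ns | 0.42 | - | 0.00 |

Numbers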

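Such formatting improvements weren't reserved for enums alone. The performance of number formatting also sees a nice set of improvements in .NET 8. Daniel Lemire has a blog post from 2021 discussing various approaches to counting the number of digits in an integer. Digit counting is closely tied to number formatting, as we need to know how many characters the number will occupy, either to allocate a string of the right length to format into or to ensure that a destination buffer is long enough. dotnet/runtime#76519 implements this inside of .NET's number formatting, providing a branch-free, table-based lookup solution for computing the number of digits in a formatted value.
dotnet/runtime#76726 improves performance further by using a trick other formatting libraries use. One of the more expensive parts of formatting a number in decimal is dividing by 10 to pull off each digit; if we can reduce the number of divisions, we can reduce the overall expense of the whole formatting operation. The trick here is that, rather than dividing by 10 for each digit in the number, we divide by 100 for each pair of digits in the number, and then use a precomputed lookup table holding the char-based representation of every value from 0 to 99. This lets us cut the number of divisions in half.
dotnet/runtime#79061 also expands on a previous optimization already present in .NET. The formatting code contained a table of precomputed strings for single-digit numbers, so if you asked for the equivalent of 0.ToString(), the implementation wouldn't need to allocate a new string; it would just fetch "0" from the table and return it. This PR expands that cache from single-digit numbers to all numbers 0 through 299 (it also makes the cache lazy, so that we don't pay for the strings of values that are never used). The choice of 299 is somewhat arbitrary and could be raised in the future if needed, but in examining data from various services, this addresses a significant chunk of the allocations that come from number formatting. Coincidentally, it also includes all success status codes from the HTTP protocol.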
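
The pair-of-digits trick can be sketched in a few lines. This is a simplified illustration rather than the runtime's implementation: FormatUInt32, CountDigits, and BuildTwoDigitTable are hypothetical names, and the digit count here uses a plain loop instead of the branch-free table lookup described above.

```csharp
using System;

// Lookup table "000102...9899": the two chars for n live at indices 2n and 2n + 1.
string twoDigits = BuildTwoDigitTable();

Console.WriteLine(FormatUInt32(1234567890));

static string BuildTwoDigitTable()
{
    char[] table = new char[200];
    for (int n = 0; n < 100; n++)
    {
        table[n * 2] = (char)('0' + n / 10);
        table[n * 2 + 1] = (char)('0' + n % 10);
    }
    return new string(table);
}

// Simplified digit count using a loop (an assumption made for brevity).
static int CountDigits(uint value)
{
    int digits = 1;
    while (value >= 10)
    {
        value /= 10;
        digits++;
    }
    return digits;
}

string FormatUInt32(uint value)
{
    int digits = CountDigits(value);
    Span<char> buffer = stackalloc char[10]; // uint.MaxValue has 10 digits
    int pos = digits;

    // One division by 100 yields two digits at a time, written right to left.
    while (value >= 100)
    {
        uint pair = value % 100;
        value /= 100;
        pos -= 2;
        buffer[pos] = twoDigits[(int)pair * 2];
        buffer[pos + 1] = twoDigits[(int)pair * 2 + 1];
    }

    // At most two digits remain.
    if (value >= 10)
    {
        pos -= 2;
        buffer[pos] = twoDigits[(int)value * 2];
        buffer[pos + 1] = twoDigits[(int)value * 2 + 1];
    }
    else
    {
        buffer[--pos] = (char)('0' + value);
    }

    return new string(buffer[..digits]);
}
```

Knowing the digit count up front is what allows the digits to be written from the end of the buffer without a reversal pass afterward.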

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
[Benchmark]
[Arguments(12)]
[Arguments(123)]
[Arguments(1_234_567_890)]
public string Int32ToString(int i) => i.ToString();
}

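| Method | Runtime | i | Mean | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- |
| Int32ToString | .NET 7.0 | 12 | 16.253 ns | 1.00 | 32 B | 1.00 |
| Int32ToString | .NET 8.0 | 12 | 1.985 ns | 0.12 | - | 0.00 |
| Int32ToString | .NET 7.0 | 123 | 18.056 ns | 1.00 | 32 B | 1.00 |
| Int32ToString | .NET 8.0 | 123 | 1.971 ns | 0.11 | - | 0.00 |
| Int32ToString | .NET 7.0 | 1234567890 | 26.964 ns | 1.00 | 48 B | 1.00 |
| Int32ToString | .NET 8.0 | 1234567890 | 17.082 ns | 0.63 | 48 B | 1.00 |

Numbers in .NET 8 also gain the ability to format as binary (via dotnet/runtime#84889) and to parse from binary (via dotnet/runtime#84998), via the new "b" specifier. For example: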

// dotnet run -f net8.0
int i = 12345;
Console.WriteLine(i.ToString("x16")); // 16 hex digits
Console.WriteLine(i.ToString("b16")); // 16 binary digits


outputs:

0000000000003039
0011000000111001

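The example above only shows the formatting direction. As a sketch of the parsing direction, the following round-trips the same value; it assumes the NumberStyles.BinaryNumber value that accompanies binary parsing in .NET 8.

```csharp
using System;
using System.Globalization;

int value = 12345;

// Format as 16 binary digits, then parse the text back as binary.
string binary = value.ToString("b16");
int roundTripped = int.Parse(binary, NumberStyles.BinaryNumber, CultureInfo.InvariantCulture);

Console.WriteLine(binary);
Console.WriteLine(roundTripped == value);
```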

That implementation is then also used to reimplement the existing Convert.ToString(int value, int toBase) method, such that it too is now optimized:

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
private readonly int _value = 12345;
[Benchmark]
public string ConvertBinary() => Convert.ToString(_value, 2);
}

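| Method | Runtime | Mean | Ratio |
| --- | --- | --- | --- |
| ConvertBinary | .NET 7.0 | 104.73 ns | 1.00 |
| ConvertBinary | .NET 8.0 | 23.76 ns | 0.23 |

In a significant addition to the primitive types (numerical and beyond), .NET 8 also sees the introduction of the new IUtf8SpanFormattable interface. ISpanFormattable was introduced in .NET 6, and with it TryFormat methods on many types that enable those types to format directly into a Span<char>: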

public interface ISpanFormattable : IFormattable
{
bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
}

Now in .NET 8, we also have the IUtf8SpanFormattable interface:

public interface IUtf8SpanFormattable
{
bool TryFormat(Span<byte> utf8Destination, out int bytesWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
}

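This enables types to format directly into a Span<byte>. The two interfaces are almost identical by design, the key difference being whether an implementation writes out UTF16 chars or UTF8 bytes. With dotnet/runtime#84587 and dotnet/runtime#84841, all of the numerical primitives in System.Private.CoreLib both implement the new interface and expose a public TryFormat method. So, for example, ulong exposes these: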

public bool TryFormat(Span<char> destination, out int charsWritten, [StringSyntax(StringSyntaxAttribute.NumericFormat)] ReadOnlySpan<char> format = default, IFormatProvider? provider = null);
public bool TryFormat(Span<byte> utf8Destination, out int bytesWritten, [StringSyntax(StringSyntaxAttribute.NumericFormat)] ReadOnlySpan<char> format = default, IFormatProvider? provider = null);

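As a small usage sketch of the two ulong overloads just listed, the same value can be formatted into a char buffer and a byte buffer and then compared. The "N0" format and the buffer sizes are arbitrary choices for the example.

```csharp
using System;
using System.Globalization;
using System.Text;

ulong value = 12345678901234567890;
char[] utf16 = new char[32];
byte[] utf8 = new byte[32];

// Same value, same format string; only the destination element type differs.
bool ok16 = value.TryFormat(utf16, out int charsWritten, "N0", CultureInfo.InvariantCulture);
bool ok8 = value.TryFormat(utf8, out int bytesWritten, "N0", CultureInfo.InvariantCulture);

string fromUtf16 = new string(utf16, 0, charsWritten);
string fromUtf8 = Encoding.UTF8.GetString(utf8, 0, bytesWritten);

Console.WriteLine(fromUtf16);
Console.WriteLine(fromUtf16 == fromUtf8);
```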

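The two overloads have the exact same functionality, support the exact same format strings, have the same general performance characteristics, and so on, differing only in whether they write out UTF16 or UTF8. How can I be so sure they're so similar? Because, drumroll, they share the same implementation. Thanks to generics, both of the methods above delegate to the exact same helper: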

public static bool TryFormatUInt64<TChar>(ulong value, ReadOnlySpan<char> format, IFormatProvider? provider, Span<TChar> destination, out int charsWritten)

just with TChar being char for one and byte for the other. So, when we run a benchmark like this:

// dotnet run -c Release -f net8.0 --filter "*"
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
private readonly ulong _value = 12345678901234567890;
private readonly char[] _chars = new char[20];
private readonly byte[] _bytes = new byte[20];
[Benchmark] public void FormatUTF16() => _value.TryFormat(_chars, out _);
[Benchmark] public void FormatUTF8() => _value.TryFormat(_bytes, out _);
}

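we get practically identical results:

| Method | Mean |
| --- | --- |
| FormatUTF16 | 12.10 ns |
| FormatUTF8 | 12.96 ns |

Now that the primitive types themselves are able to format with full fidelity as UTF8, the Utf8Formatter class largely becomes legacy. In fact, the previously mentioned PR also rips out Utf8Formatter's implementation and re-parents it on top of the same formatting logic used by the primitive types. All of the previously cited performance improvements to number formatting therefore apply not only to ToString and TryFormat for UTF16, and not only to TryFormat for UTF8, but also to Utf8Formatter (plus, removing duplicated code and reducing the maintenance burden makes me giddy).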

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Buffers.Text;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
private readonly byte[] _bytes = new byte[10];
[Benchmark]
[Arguments(123)]
[Arguments(1234567890)]
public bool Utf8FormatterTryFormat(int i) => Utf8Formatter.TryFormat(i, _bytes, out int bytesWritten);
}

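| Method | Runtime | i | Mean | Ratio |
| --- | --- | --- | --- | --- |
| Utf8FormatterTryFormat | .NET 7.0 | 123 | 8.849 ns | 1.00 |
| Utf8FormatterTryFormat | .NET 8.0 | 123 | 4.645 ns | 0.53 |
| Utf8FormatterTryFormat | .NET 7.0 | 1234567890 | 15.844 ns | 1.00 |
| Utf8FormatterTryFormat | .NET 8.0 | 1234567890 | 7.174 ns | 0.45 |

Not only is UTF8 formatting directly supported by all of these types, so too is parsing. dotnet/runtime#86875 added the new IUtf8SpanParsable interface and implemented it on the primitive numeric types. Just as with its formatting counterpart, this provides the same behavior as IParsable, just for UTF8 instead of UTF16. And just as with its formatting counterpart, all of the parsing logic is shared in generic routines between the two modes. In fact, not only does this share logic between UTF16 and UTF8 parsing, it follows closely on the heels of dotnet/runtime#84582, which uses the same generic tricks to deduplicate the parsing logic across all of the primitive types, such that the same generic routines end up being used for all of the types and for both UTF8 and UTF16. That PR removed almost 2,000 lines of code from System.Private.CoreLib.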
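
As a brief sketch of what UTF8 parsing looks like from the caller's side, the following parses an int directly from UTF8 bytes without first decoding them to a string. The input values are arbitrary.

```csharp
using System;
using System.Globalization;
using System.Text;

// UTF8-encoded input, e.g. as it might arrive from a socket or a file.
byte[] utf8Text = Encoding.UTF8.GetBytes("12345");

// Parse straight from the bytes via the UTF8 overloads on int.
int parsed = int.Parse(utf8Text, CultureInfo.InvariantCulture);
Console.WriteLine(parsed);

// The Try pattern is available as well and reports failure instead of throwing.
byte[] invalidText = Encoding.UTF8.GetBytes("12x");
bool succeeded = int.TryParse(invalidText, CultureInfo.InvariantCulture, out int ignored);
Console.WriteLine(succeeded);
```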

DateTime

Parsing and formatting are improved on other types as well. Take DateTime and DateTimeOffset. dotnet/runtime#84963 improved a variety of aspects of DateTime{Offset} formatting:

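The formatting logic has general support that's used as a fallback and that supports any custom format, but there are also dedicated routines for the most popular formats, allowing them to be optimized and tuned. Dedicated routines already existed for the very popular "r" (RFC1123 pattern) and "o" (round-trip date/time pattern) formats; this PR adds dedicated routines for the default format ("G") when used with the invariant culture, the "s" format (sortable date/time pattern), and the "u" format (universal sortable date/time pattern), all of which are used frequently in a variety of domains.
For the "U" format (universal full date/time pattern), the implementation would always end up allocating new DateTimeFormatInfo and GregorianCalendar instances, resulting in a significant amount of allocation even though they were only needed in a rare fallback case. This PR fixes that so the allocation happens only when truly required.
When there's no dedicated formatting routine, formatting is done into an internal ref struct called ValueListBuilder that starts with a provided span buffer (typically seeded from a stackalloc) and then grows with ArrayPool memory as needed. After the formatting has completed, that builder is copied either into a destination span or into a new string, depending on the method that triggered the formatting. However, if we seed the builder with the destination span itself, we can avoid the copy to the destination span: if the builder still contains the initial span when formatting has completed (having not grown beyond it), we know all the data fit, and we can skip the copy, as all the data is already there.
Here is some example impact: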

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
private readonly DateTime _dt = new DateTime(2023, 9, 1, 12, 34, 56);
private readonly char[] _chars = new char[100];
[Params(null, "s", "u", "U", "G")]
public string Format { get; set; }
[Benchmark] public string DT_ToString() => _dt.ToString(Format);
[Benchmark] public string DT_ToStringInvariant() => _dt.ToString(Format, CultureInfo.InvariantCulture);
[Benchmark] public bool DT_TryFormat() => _dt.TryFormat(_chars, out _, Format);
[Benchmark] public bool DT_TryFormatInvariant() => _dt.TryFormat(_chars, out _, Format, CultureInfo.InvariantCulture);
}

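| Method | Runtime | Format | Mean | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- |
| DT_ToString | .NET 7.0 | ? | 166.64 ns | 1.00 | 64 B | 1.00 |
| DT_ToString | .NET 8.0 | ? | 102.45 ns | 0.62 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 7.0 | ? | 161.94 ns | 1.00 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 8.0 | ? | 28.74 ns | 0.18 | 64 B | 1.00 |
| DT_TryFormat | .NET 7.0 | ? | 151.52 ns | 1.00 | – | NA |
| DT_TryFormat | .NET 8.0 | ? | 78.57 ns | 0.52 | – | NA |
| DT_TryFormatInvariant | .NET 7.0 | ? | 140.35 ns | 1.00 | – | NA |
| DT_TryFormatInvariant | .NET 8.0 | ? | 18.26 ns | 0.13 | – | NA |
| DT_ToString | .NET 7.0 | G | 162.86 ns | 1.00 | 64 B | 1.00 |
| DT_ToString | .NET 8.0 | G | 109.49 ns | 0.68 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 7.0 | G | 162.20 ns | 1.00 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 8.0 | G | 102.71 ns | 0.63 | 64 B | 1.00 |
| DT_TryFormat | .NET 7.0 | G | 148.32 ns | 1.00 | – | NA |
| DT_TryFormat | .NET 8.0 | G | 83.60 ns | 0.57 | – | NA |
| DT_TryFormatInvariant | .NET 7.0 | G | 145.05 ns | 1.00 | – | NA |
| DT_TryFormatInvariant | .NET 8.0 | G | 79.77 ns | 0.55 | – | NA |
| DT_ToString | .NET 7.0 | s | 186.44 ns | 1.00 | 64 B | 1.00 |
| DT_ToString | .NET 8.0 | s | 29.35 ns | 0.17 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 7.0 | s | 182.15 ns | 1.00 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 8.0 | s | 27.67 ns | 0.16 | 64 B | 1.00 |
| DT_TryFormat | .NET 7.0 | s | 165.08 ns | 1.00 | – | NA |
| DT_TryFormat | .NET 8.0 | s | 15.53 ns | 0.09 | – | NA |
| DT_TryFormatInvariant | .NET 7.0 | s | 155.24 ns | 1.00 | – | NA |
| DT_TryFormatInvariant | .NET 8.0 | s | 15.50 ns | 0.10 | – | NA |
| DT_ToString | .NET 7.0 | u | 184.71 ns | 1.00 | 64 B | 1.00 |
| DT_ToString | .NET 8.0 | u | 29.62 ns | 0.16 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 7.0 | u | 184.01 ns | 1.00 | 64 B | 1.00 |
| DT_ToStringInvariant | .NET 8.0 | u | 26.98 ns | 0.15 | 64 B | 1.00 |
| DT_TryFormat | .NET 7.0 | u | 171.73 ns | 1.00 | – | NA |
| DT_TryFormat | .NET 8.0 | u | 16.08 ns | 0.09 | – | NA |
| DT_TryFormatInvariant | .NET 7.0 | u | 158.42 ns | 1.00 | – | NA |
| DT_TryFormatInvariant | .NET 8.0 | u | 15.58 ns | 0.10 | – | NA |
| DT_ToString | .NET 7.0 | U | 1,622.28 ns | 1.00 | 1240 B | 1.00 |
| DT_ToString | .NET 8.0 | U | 206.08 ns | 0.13 | 96 B | 0.08 |
| DT_ToStringInvariant | .NET 7.0 | U | 1,567.92 ns | 1.00 | 1240 B | 1.00 |
| DT_ToStringInvariant | .NET 8.0 | U | 207.60 ns | 0.13 | 96 B | 0.08 |
| DT_TryFormat | .NET 7.0 | U | 1,590.27 ns | 1.00 | 1144 B | 1.00 |
| DT_TryFormat | .NET 8.0 | U | 190.98 ns | 0.12 | – | 0.00 |
| DT_TryFormatInvariant | .NET 7.0 | U | 1,560.00 ns | 1.00 | 1144 B | 1.00 |
| DT_TryFormatInvariant | .NET 8.0 | U | 184.11 ns | 0.12 | – | 0.00 |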
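
Parsing has also improved meaningfully. For example, dotnet/runtime#82877 improves the handling of "ddd" (abbreviated name of the day of the week), "dddd" (full name of the day of the week), "MMM" (abbreviated name of the month), and "MMMM" (full name of the month) in a custom format string; these show up in a variety of commonly used format strings, such as in the expanded definition of the RFC1123 format: ddd, dd MMM yyyy HH':'mm':'ss 'GMT'. When the general parsing routine encounters these in a format string, it needs to consult the supplied CultureInfo / DateTimeFormatInfo for that culture's associated month and day names, e.g. DateTimeFormatInfo.GetAbbreviatedMonthName, and then needs to do a linguistic ignore-case comparison of each name against the input text; that is expensive. If, however, we're given an invariant culture, we can do the comparison much, much faster. Take "MMM", for abbreviated month name, as an example. We can read the next three characters (uint m0 = span[0], m1 = span[1], m2 = span[2]), ensure they're all ASCII (a check on m0 | m1 | m2), and then merge them into a single value, OR'ing in 0x202020 to lowercase the ASCII letters, so that the whole lookup becomes a single numeric switch:

switch ((m0 << 16) | (m1 << 8) | m2 | 0x202020)
{
case 0x6a616e: /* 'jan' */ result = 1; break;
case 0x666562: /* 'feb' */ result = 2; break;
case 0x6d6172: /* 'mar' */ result = 3; break;
case 0x617072: /* 'apr' */ result = 4; break;
case 0x6d6179: /* 'may' */ result = 5; break;
case 0x6a756e: /* 'jun' */ result = 6; break;
case 0x6a756c: /* 'jul' */ result = 7; break;
case 0x617567: /* 'aug' */ result = 8; break;
case 0x736570: /* 'sep' */ result = 9; break;
case 0x6f6374: /* 'oct' */ result = 10; break;
case 0x6e6f76: /* 'nov' */ result = 11; break;
case 0x646563: /* 'dec' */ result = 12; break;
default: maxMatchStrLen = 0; break; // undo match assumption
}

A benchmark that exercises this path by parsing the RFC1123-style format with the invariant culture:

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
private const string Format = "ddd, dd MMM yyyy HH':'mm':'ss 'GMT'";
private readonly string _s = new DateTime(1955, 11, 5, 6, 0, 0, DateTimeKind.Utc).ToString(Format, CultureInfo.InvariantCulture);
[Benchmark]
public void ParseExact() => DateTimeOffset.ParseExact(_s, Format, CultureInfo.InvariantCulture, DateTimeStyles.AllowInnerWhite | DateTimeStyles.AssumeUniversal);
}

A benchmark that parses full day and month names with the ru-RU culture:

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
private readonly CultureInfo _ci = new CultureInfo("ru-RU");
[Benchmark] public DateTime Parse() => DateTime.ParseExact("вторник, 18 апреля 2023 04:31:26", "dddd, dd MMMM yyyy HH:mm:ss", _ci);
}

A benchmark that formats a DateTime with Utf8Formatter:

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Buffers.Text;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
private readonly DateTime _dt = new DateTime(2023, 9, 1, 12, 34, 56);
private readonly byte[] _bytes = new byte[100];
[Benchmark] public bool TryFormatUtf8Formatter() => Utf8Formatter.TryFormat(_dt, _bytes, out _);
}

A benchmark of TimeZoneInfo.FindSystemTimeZoneById:

// dotnet run -c Release -f net7.0 --filter "*" --runtimes net7.0 net8.0
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Tests
{
[Benchmark]
[Arguments("America/Los_Angeles")]
[Arguments("Pacific Standard Time")]
public TimeZoneInfo FindSystemTimeZoneById(string id) => TimeZoneInfo.FindSystemTimeZoneById(id);
}

For Guid, a benchmark calling _guid.TryFormat(_dest, out _, format) with each format yields:

| Method | Runtime | format | Mean | Ratio |
| --- | --- | --- | --- | --- |
| TryFormat | .NET 7.0 | B | 23.622 ns | 1.00 |
| TryFormat | .NET 8.0 | B | 7.341 ns | 0.31 |
| TryFormat | .NET 7.0 | D | 22.134 ns | 1.00 |
| TryFormat | .NET 8.0 | D | 5.485 ns | 0.25 |
| TryFormat | .NET 7.0 | N | 20.891 ns | 1.00 |
| TryFormat | .NET 8.0 | N | 4.852 ns | 0.23 |
| TryFormat | .NET 7.0 | P | 24.139 ns | 1.00 |
| TryFormat | .NET 8.0 | P | 6.101 ns | 0.25 |

Before moving on from primitives and numerics to other topics, let's take a quick look at System.Random, which has methods for producing pseudo-random numerical values.

Random

dotnet/runtime#79790 from @mla-alm provides an implementation in Random based on @lemire's unbiased range functions. When a method like Next(int min, int max) is invoked, it needs to provide a value in the range [min, max). In order to provide an unbiased answer, the .NET 7 implementation generates a 32-bit value, narrows the range down to the smallest power of 2 that contains the max (by taking the log2 of the max and shifting to throw away bits), and then checks whether the result is less than the max: if it is, it returns the result as the answer. But if it isn't, it rejects the value (a process referred to as "rejection sampling") and loops around to start the whole process over. While the cost to produce each sample in that approach isn't high, the nature of the approach makes it reasonably likely that a sample will be rejected, which means looping and retrying. The new approach effectively implements modulo reduction (e.g. Next() % max), except that it replaces the expensive modulo operation with a cheaper multiplication and shift; rejection sampling is still employed, but the bias it corrects for occurs much more rarely, and so the more expensive path is taken much more rarely. The net result is a nice boost on average in the throughput of Random's methods (Random can also get a boost from dynamic PGO, as the internal abstraction Random uses can be devirtualized, so the results here show the impact with and without PGO enabled).

| Method | Runtime | Mean | Ratio |
| --- | --- | --- | --- |
| NextMax | .NET 7.0 | 5.793 ns | 1.00 |
| NextMax | .NET 8.0 w/o PGO | 1.840 ns | 0.32 |
| NextMax | .NET 8.0 | 1.598 ns | 0.28 |

dotnet/runtime#87219 from @MichalPetryka then further improves this for long values. The core part of the algorithm involves multiplying the random value by the max value and then taking the low part of the product. This can be made more efficient by not using UInt128's multiplication implementation and instead using Math.BigMul, which is implemented to use the Bmi2.X64.MultiplyNoFlags or ArmBase.Arm64.MultiplyHigh intrinsics when one of them is available.

| Method | Runtime | Mean | Ratio |
| --- | --- | --- | --- |
| NextMinMax | .NET 7.0 | 9.839 ns | 1.00 |
| NextMinMax | .NET 8.0 | 1.927 ns | 0.20 |

Finally, I'll mention dotnet/runtime#81627. Random is both a commonly used type in its own right and an abstraction; many of the APIs on Random are virtual, such that a derived type can completely swap out the algorithm employed. So, for example, if you wanted to implement a MersenneTwisterRandom that derived from Random and completely replaced the base algorithm by overriding every virtual method, you could do so, pass your instance around as a Random, and everyone is happy... unless you create instances of your derived type frequently and care about allocation. Random actually includes multiple pseudo-random generators. .NET 6 imbued it with an implementation of the xoshiro128**/xoshiro256** algorithm, which is used when you just do new Random(). However, if you instead instantiate a derived type, the implementation falls back to the same algorithm it has used since the dawn of Random (a variant of Knuth's subtractive random number generator algorithm), because it doesn't know what the derived type will be doing nor what dependencies it may have taken on the nature of the algorithm employed. That algorithm carries with it a 56-element int[] array, which means that derived classes end up instantiating and initializing that array even if they never use it. With this PR, the creation of that array is made lazy, such that it's only initialized if and when it's used. With that, a derived implementation that wants to avoid that cost can do so.

| Method | Runtime | Mean | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- |
| NewDerived | .NET 7.0 | 1,237.73 ns | 1.00 | 312 B | 1.00 |
| NewDerived | .NET 8.0 | 20.49 ns | 0.02 | 72 B | 0.23 |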
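
The multiply-and-shift range reduction with rare rejection that the Random changes are based on can be sketched as follows. This is an illustration of the general technique for 32-bit values, not the code in Random itself: NextBounded and the next32 delegate are hypothetical names, and seeding a Random to supply the 32-bit inputs is just a convenient stand-in for a real generator.

```csharp
using System;

// Returns a value in [0, range) by multiplying a uniformly distributed 32-bit sample
// by range and keeping the high 32 bits of the 64-bit product. The low 32 bits
// identify the rare samples that would introduce bias, and those are rejected.
static uint NextBounded(Func<uint> next32, uint range)
{
    ArgumentOutOfRangeException.ThrowIfZero(range);

    ulong product = (ulong)next32() * range;
    uint low = (uint)product;
    if (low < range)
    {
        // 2^32 mod range, computed without leaving 32-bit arithmetic.
        uint threshold = unchecked(0u - range) % range;
        while (low < threshold)
        {
            product = (ulong)next32() * range;
            low = (uint)product;
        }
    }

    return (uint)(product >> 32);
}

// Hypothetical 32-bit source built on Random for demonstration purposes only.
var rng = new Random(12345);
uint Sample() => (uint)rng.NextInt64(0, 1L << 32);

Console.WriteLine(NextBounded(Sample, 10));
```

The expensive modulo only runs when the low half of the product is already below range, which for small ranges is a tiny fraction of samples, and a retry is needed only for the even smaller fraction below the threshold.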
Source: https://www.cnblogs.com/yahle/archive/2023/11/18/Performance_Improvements_in_NET_8_Exceptions_Reflection_Primitives.html