Unfortunately, while there are alternatives to this behavior, they all have other downsides. The biggest constraint was that the schedule didn't support a new version of the .NET IL format (and revving the IL format is an expensive change for compat purposes, as well). There were two strong lowering contenders, each with its own problems.
The first is to use a `With` method and rely on "optional" parameters in some sense. When you write `with { x = 3 }` you're basically writing a `.With(x: 3)` call, and `With` presumably calls the constructor with the appropriate values. The problem here is that optional parameters are also kind of fake. The .NET IL format doesn't have a notion of optional parameters -- the C# compiler just fills in the parameters when lowering the call. So that means that adding a new field to a record would require adding a new parameter. But adding a new parameter means that you've broken binary backwards compatibility. One of the goals of records was to make these kinds of "simple" data updates possible, instead of the current situation with classes where they can be very challenging.
The second option is a `With` method for every field. A single `with { }` expression turns into a chain of N calls, `WithX(3).WithY(5)` and so on, one per field being set. The problem with that is that it produces a lot of dead assignments that need to be unwound by the JIT. We didn't see that happening reliably, which was pretty concerning because it would also result in a lot of allocation garbage.
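For concreteness, here is a hedged sketch of what those two lowerings might have looked like (the `Point` record, the `With`/`WithX`/`WithY` names, and the rewrites shown are illustrative assumptions, not actual compiler output):

    public record Point(int X, int Y)
    {
        // Option 1: a single With method relying on "optional" parameters.
        // `p with { X = 3 }` would lower to roughly `p.With(x: 3)`; adding a new
        // member later changes this signature and breaks already-compiled callers.
        public Point With(int? x = null, int? y = null) => new Point(x ?? X, y ?? Y);

        // Option 2: one With method per field.
        // `p with { X = 3, Y = 5 }` would lower to `p.WithX(3).WithY(5)`; the
        // intermediate Point from WithX(3) is immediately dead, which is the
        // allocation garbage the JIT would have to recognise and eliminate.
        public Point WithX(int x) => new Point(x, Y);
        public Point WithY(int y) => new Point(X, y);
    }

    class Demo
    {
        static void Main()
        {
            var p = new Point(1, 2);
            System.Console.WriteLine(p.With(x: 3));        // Point { X = 3, Y = 2 }
            System.Console.WriteLine(p.WithX(3).WithY(5)); // Point { X = 3, Y = 5 }
        }
    }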
So basically, this was a narrow decision that fit into the space we had. If I had the chance, I would completely rework dotnet/C# initialization for a reboot of the language.
One thing I proposed, which was not accepted, was to make records much simpler across the board: by forbidding a lot of the complex constructs, the footguns are avoided as well. But that was seen as too limiting. Reading between the lines, I bet Jon wouldn't have liked this either, as some of the fancy things he's doing may not have been possible.
> The biggest constraint was that the schedule didn't support a new version of the .NET IL format (and revving the IL format is an expensive change for compat purposes, as well).
My biggest sadness reading this is that what MS have done is to outsource the issue to all C# devs. We will all hit this problem at some point (I have a couple of times) and I suspect we will all lose hours of time trying to work out WTF is going on. It may not quite be the Billion Dollar Mistake, but it's an ongoing cost to us all.
A possible approach I mentioned elsewhere in the thread is this (for the generation of the `with`):
    var n2 = n1.<Clone>$();
    n2.Value = 3;                   // 'with' field setters
    n2.<OnPostCloneInitialise>();   // run the initialisers
Then the <OnPostCloneInitialise>:
    public virtual void <OnPostCloneInitialise>()
    {
        base.<OnPostCloneInitialise>();
        Even = (Value & 1) == 0;
    }
If the compiler could generate the <OnPostCloneInitialise> based on the initialisation code in the record/class, could that work?
That would just force the new object to initialise after the cloning without any additional IL or modifications.
As I read the post, I thought of relational data models. The behavior is expected. I believe the root issue is your records should not have computed fields that depend on mutable fields. Change your record schemas to eliminate that and you should have no further problems using "with".
If changing the schema isn't reasonable, use a copy constructor instead.
> The behavior is expected
It isn't, that's why there's a blog article documenting how unexpected it is.
> that depend on mutable fields
The fields are not mutable. The `with` expression creates a whole new record, clones the fields, and then sets the field you're changing (the field is read-only, so this is the compiler going 'behind the scenes' to update the new record before it sets the reference). The reason for all this is performance: the new structure is allocated on the heap, a memcopy happens (old structure copied onto the new), and then the `with` changes are applied. It's just that, at this point, the 'init time' properties aren't run on the new object.
In the language the fields are immutable. So, the argument is that a whole new record initialised with the fields of the old record (with some changes) should run the 'init time' properties so that they get set too, otherwise the data-structure can become inconsistent/poorly-defined.
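For reference, here is a self-contained sketch (not the article's exact code; `Value`/`Even` follow the examples used in this thread) showing the sequence described above. The `<Clone>$` method and copy constructor are real compiler-synthesised members, described here only in comments since they can't be written in source:

    public record Number
    {
        public int Value { get; init; }
        public bool Even { get; }   // derived value, set only by the constructor below

        public Number(int value)
        {
            Value = value;
            Even = (value & 1) == 0;
        }
    }

    class Demo
    {
        static void Main()
        {
            var n1 = new Number(2);
            // `with` calls the compiler-synthesised <Clone>$ method (which invokes the
            // copy constructor for a memberwise copy) and then runs the init accessor
            // for Value on the copy. No constructor body or property initialiser
            // re-runs, so Even keeps the value copied from n1.
            var n2 = n1 with { Value = 3 };
            System.Console.WriteLine($"{n2.Value} Even={n2.Even}");   // prints: 3 Even=True
        }
    }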
> use a copy constructor instead
It's probably worth reading the article:
"Note that because Value is set after the cloning operation, we couldn’t write a copy constructor to do the right thing here anyway."
> It isn't, that's why there's a blog article documenting how unexpected it is.
The behaviour is expected for the language design. Whether developers using the language expect it is a separate matter.
The `with` operator clearly allows someone to break encapsulation, and as such it should only be used in cases where you aren't expecting encapsulation of the underlying record.
> It's probably worth reading the article:
> "Note that because Value is set after the cloning operation, we couldn’t write a copy constructor to do the right thing here anyway."
It's probably worth reading the entire article, as that quote is followed by: "(At least, not in any sort of straightforward way – I’ll mention a convoluted approach later.)", which presumably was what was being referred to there.
In general, there's a whole ton of gotchas around encapsulation of precomputed values. That's just life outside of a purely functional programming context.
> The behaviour is expected for the language design.
So, Microsoft meant it. Ok...
> Whether developers using the language expect it is a separate matter.
Really? Perhaps read the 'Principle of Least Astonishment' [1] to see why this is a problem. If I create a new object I would expect the 'init time' properties to be initialised.
> It's probably worth reading the entire article, as that quote is followed by: "(At least, not in any sort of straightforward way – I’ll mention a convoluted approach later.)", which presumably was what was being referred to there.
It's probably worth continuing to read the article, because the attempt to deal with it required manually writing Lazy properties:
That's not practical. Might as well use computed properties.
> In general, there's a whole ton of gotchas around encapsulation of precomputed values. That's just life outside of a purely functional programming context.
Great insight. Let's not run the 'init time' properties for a newly initialised object, just in case it works as expected. This 'feature' can't even be manually resolved by doing post-`with` updates (because often the properties are init/read-only). It makes the whole init-property feature brittle as fuck.
[1] https://en.wikipedia.org/wiki/Principle_of_least_astonishment
> Really? Perhaps read the 'Principle of Least Astonishment' [1] to see why this is a problem. If I create a new object I would expect the 'init time' properties to be initialised.
The Principle of Least Astonishment definitely applies. It's a deliberate design choice that unfortunately violates the principle.
> Great insight. Let's not run the 'init time' properties for a newly initialised object, just in case it works as expected. This 'feature' can't even be manually resolved by doing post-`with` updates (because often the properties are init/read-only). It makes the whole init-property feature brittle as fuck.
? I'm not sure I follow where you're going with this, but yeah, in general you'd have to carefully limit all uses of "with" for objects with precomputed values to inside the encapsulation of said objects. Alternatively, as you mentioned, you could just not have precomputed properties.
> in general you'd have to carefully limit all uses of "with" for objects with precomputed values
What you’re describing is incidental complexity. It is not a good thing. You can’t enforce that limit in the language; you have to rely on the programmer following it and never making a mistake.
Ultimately the incidental complexity for the average C# developer has increased, whereas a better direction of travel is toward correctness and declarative features. I would prefer it if the csharplang team worked toward that.
If Jon Skeet of all people is confused about something in C#, that probably means it's objectively confusing behavior.
I think this stems from having properties, which are syntactic sugar for a backing field and getter & setter functions.
This muddies the water between just setting a field vs. executing a function that does work and then sets the field.
If I write a record with an explicit `setFoo(foo: Foo)` I wouldn't expect a clone & subsequent direct field assignment to execute the `setFoo(foo: Foo)` code.
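Roughly what that sugar expands to, as an illustrative sketch (the real compiler emits accessor methods named `get_Value`/`set_Value` plus a hidden backing field; the code below just spells that out by hand):

    public class Example
    {
        // Hand-written equivalent of `public int Value { get; set; }`:
        private int _value;                       // backing field
        public int get_Value() => _value;         // getter function
        public void set_Value(int value)          // setter function: runs code,
        {                                         // then stores into the field
            _value = value;
        }
    }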
> wouldn't expect a clone
If I create a new object I would expect the 'init time' properties to be initialised, regardless of how it was initialised. The current approach just leads to inconsistent data structures, with significant issues for debugging how a data structure got into an inconsistent state. Modern language features should not be trying to save nanoseconds like they did in the past, or should at least default to 'correct' with performance opt-outs.
You don't even need to use the `with` operator to get into an inconsistent state:
    using System;

    // same behavior with
    // public sealed class Inner
    public struct Inner
    {
        public int Value { get; set; }
    }

    // same behavior with
    // public sealed record class Outer(Inner Inner)
    public record struct Outer(Inner Inner)
    {
        public bool Even { get; } = (Inner.Value & 1) == 0;
    }

    class Program
    {
        public static void Main(string[] args)
        {
            var inner = new Inner { Value = 42 };
            var outer = new Outer(inner);
            inner.Value = 43;
            Console.WriteLine("{0} is {1}", inner.Value, outer.Even);
        }
    }
Would you expect Even to be updated here?
Sure, but you’re talking about a 25-year-old feature (structs). One that was implemented like that for performance reasons in C# 1.0 – because we were all using single or dual core machines back then.
Records were a relatively recent feature addition, computers are significantly more powerful, and our programs are significantly more complex. And so it’s hella frustrating that MS didn’t opt for program correctness over performance. They even missed the opportunity to close the uninitialised struct issue with record structs (they could have picked different semantics).
I find their choices utterly insane when we’re in a world where bad data could mean a data breach and/or massive fine. The opportunity to make records robust and bulletproof was completely missed.
I was hoping that we’d get to a set of data-types in C# that were robust and correct: product-types and sum-types. With proper pattern matching (exhaustiveness checking).
Then either a Roslyn analyser or a C# ‘modern mode’ which only used the more modern ‘correct’ features, so that we could actually move away from the compromises of the past.
Unfortunately many modern features of C# are being built with these ugly artefacts seeping in. It’s the opposite of declarative programming, so it just keeps increasing incidental complexity: it exports the complexity to the users, something a type-safe compiler should be reducing.
For the record (sorry), I believe C# uses the clone operation because records support inheritance.
For me, this is where the design flaw lies: trying to support both inheritance and immutability at the same time.
The inheritance + immutability combination forces the compiler to use field-by-field copying rather than constructor chaining, which bypasses the property initialization logic that would maintain consistency between related fields.
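A small sketch of why the copy has to go through a virtual clone rather than chaining a known constructor (the `Shape`/`Circle` names are illustrative assumptions):

    using System;

    public record Shape(int X);
    public record Circle(int X, int Radius) : Shape(X);

    class Demo
    {
        static void Main()
        {
            Shape s = new Circle(1, 5);
            // The static type is Shape, but `with` must copy the runtime type,
            // including derived-only state, so it goes through the virtual
            // <Clone>$ method (a memberwise copy of the Circle) rather than a
            // constructor call, and property initialisers never re-run.
            Shape s2 = s with { X = 2 };
            Console.WriteLine(s2);   // prints: Circle { X = 2, Radius = 5 }
        }
    }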
Cloning anything creates a new object of a known type (well, the runtime knows at least) and so if the object re-runs the init-properties of the known type then it will be the same as constructing that type afresh.
You could even imagine a compiler generated virtual method: `OnCloneReinitialiseFields()`, or something, that just re-ran the init-property setters (post clone operation).
Is there some other inheritance issue that is problematic here? Immutability isn't a concern; it's purely about what happens after cloning an object, and whether the fields are immutable or not doesn't change the behaviour of the `with` operation.
Seems like enough of a gotcha that other people have stumbled upon the issue as well: https://blog.codingmilitia.com/2022/09/01/beware-of-records-...
> SomeCalculatedValue in the second line still has the same value as in the first. This makes sense, as what happens when we use the with expression, is that the record is cloned and then the properties provided within the brackets are overwritten.
Shouldn't SomeCalculatedValue be "This is another some value *calculated*" when using with ?
Edit: Actually that is the problem, that it isn't "recalculating" due to the way that the initialization of read-only properties works in a C# record.
Did you go through the same journey as the author of the article, in compressed form? Further evidence it is a confusing feature, if so!
Yes! It didn’t make sense at first. It’s not intuitive unless you understand the internals of how it works, and then it makes sense. I don’t think there is anything wrong; it just needs to be documented better.
I've hit this before and facepalmed when I realised what they had done. Records were supposed to work more like immutable product-types in F# and other functional languages, but with this approach it can't be seen as anything other than broken IMHO.
Sometimes the csharplang team make some utterly insane decisions. This is the kind of thing that would happen back-in-the-day™ in the name of performance, but in the modern world just adds to the endless list of quirks (like uninitialised structs). I suspect there are elements of the design team that are still stuck in C#1.0/Java mode and so holes like this don't even seem that bad in their minds. But it literally leads to inconsistent data structures which is where bugs live (and potential security issues in the worst cases).
Ehh, I think the real footgun here is using a property with backing storage to store what is clearly a derived value. Using a computed property is what we really should be doing here, if we think our code should line up with our intentions.
I feel like what's happened here is that the author actually needed a system to cache their derived values, but didn't think to build that out explicitly.
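For example, a minimal sketch of the computed-property version suggested above (reusing the `Value`/`Even` names from earlier in the thread):

    public record Number(int Value)
    {
        // Recomputed on every read, so it can never go stale after a `with` copy.
        public bool Even => (Value & 1) == 0;
    }

    class Demo
    {
        static void Main()
        {
            var n1 = new Number(2);
            var n2 = n1 with { Value = 3 };
            System.Console.WriteLine($"{n1.Even} {n2.Even}");   // prints: True False
        }
    }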
It's been a while since I followed Jon Skeet, but his books from Manning were always worthwhile. Plus the Jon Skeet facts [0] are fun.
[0] https://meta.stackexchange.com/questions/9134/jon-skeet-fact...
To be fair, I don't think this behaviour is unreasonable.
I tend to agree. It’s always been my understanding that record types were strictly data holders and shouldn’t be embellished with behavior.
I wonder how much of coding agents/AIs is now just Jon Skeet.
The "– Jon Skeet's coding blog" in the title is not necessary as the URL shows in parentheses after the title. Adding C# however might be helpful.