You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
instances of sealed case class hierarchies with no mutable fields should be represented by singleton objects in memory, have their hashCode memoized, and equality implemented as reference equality
#3375
Closed
scabug opened this issue
Apr 30, 2010
· 6 comments
to model abstract syntax trees and do some rewriting on them.
I've written my rewriting engine in a purely functional style, using trampolines (instead of mutually tail recursive functions because of the incomplete TCO support) to avoid overflowing the stack when doing various actions in pre/in/post order during traversal of deep trees.
Because of the purely functional coding style, I use Stacks, Maps and HashMaps to bind and unbind data with these AST nodes as the rewriting progresses top down, bottom up, breadth first or depth first.
As nothing can distinguish Expr instances having an identical structure, it would make sense to have all such instances represented by a single object in memory, and to have hashCode memoized at creation time, and equals redefined to be reference equality. This would provide a dramatic speedup, reduce memory footprint, would avoid stack overflow problems when inserting such things in HashTables, etc...
Because of the "bad" behaviour of :
memory allocation : instance with identical structure are not shared
hashCode : recursive but not tail recursive and not memoized
equality test : recursive but not tail recursive
for such case class hierarchies, the code is slow, eats up a lot of memory, and triggers stack overflows when inserting deep terms in Maps or HashMaps (whereas these things cause no problems in CAML for instance).
The only solution I've found to sidestep this is to manually maintain a pool unique instances (doing structural hashing and instance numbering), and to manually overload the hashCode and equals methods, which is a bit boring and non scalable. I'm under the impression that this structural hashing could be automated by the run time, or that the code could be transformed by the compiler to achieve this in the case of sealed hierarchies of case classes with no mutable fields.
This would make scala more efficient for language processing applications of this kind.
The remark would also apply to all generic containers instanciated for sealed case class hierarchies, like lists (lists suffixes could be globally unique in an application)
best regards.
The text was updated successfully, but these errors were encountered:
@paulp said:
I am sympathetic to these issues. It should be addressed in the larger context of making case classes less of an all-or-nothing endeavor; in particular I think we should have a mechanism for autogenerated equals/hashCode without the other baggage. And such a mechanism could be able to address at least a subset of the above.
The direction I'd propose is compiler-recognized marker interfaces.
@dubochet said:
One thing to note: there is nothing that prevents a case class from having mutable state.�In the case of your Expr example, there isn't, but that is because none was defined. Furthermore, in some cases, identity remain important even is structure is identical — even with a case class, you can test for identity using method eq. As such, you cannot say, in general, that case classes with the same structure can be shared.
As Paul mentioned, this problem probably needs to be considered in terms of making case classes more configurable and not an all-or-nothing construct like it is now.
@dubochet said:
If this is a topic you want to pursue, we would very much like to have a more detailed proposal from you, probably in the form of a SID. If you want to go in that direction, I would recommend you discuss this topic on the mailing list to collect feedback from the user community.
thsoft said:
Hmm, I was kind of surprised when I saw this issue: I used to think that immutable case classes were implemented the proposed way. What's the obstacle of this?
Hello,
this remark applies to Scala 2.7.7.
I am using sealed, purely functional, case class hierarchies like this one :
to model abstract syntax trees and do some rewriting on them.
I've written my rewriting engine in a purely functional style, using trampolines (instead of mutually tail recursive functions because of the incomplete TCO support) to avoid overflowing the stack when doing various actions in pre/in/post order during traversal of deep trees.
Because of the purely functional coding style, I use Stacks, Maps and HashMaps to bind and unbind data with these AST nodes as the rewriting progresses top down, bottom up, breadth first or depth first.
As nothing can distinguish Expr instances having an identical structure, it would make sense to have all such instances represented by a single object in memory, and to have hashCode memoized at creation time, and equals redefined to be reference equality. This would provide a dramatic speedup, reduce memory footprint, would avoid stack overflow problems when inserting such things in HashTables, etc...
Because of the "bad" behaviour of :
memory allocation : instance with identical structure are not shared
hashCode : recursive but not tail recursive and not memoized
equality test : recursive but not tail recursive
for such case class hierarchies, the code is slow, eats up a lot of memory, and triggers stack overflows when inserting deep terms in Maps or HashMaps (whereas these things cause no problems in CAML for instance).
The only solution I've found to sidestep this is to manually maintain a pool unique instances (doing structural hashing and instance numbering), and to manually overload the hashCode and equals methods, which is a bit boring and non scalable. I'm under the impression that this structural hashing could be automated by the run time, or that the code could be transformed by the compiler to achieve this in the case of sealed hierarchies of case classes with no mutable fields.
This would make scala more efficient for language processing applications of this kind.
The remark would also apply to all generic containers instanciated for sealed case class hierarchies, like lists (lists suffixes could be globally unique in an application)
best regards.
The text was updated successfully, but these errors were encountered: