I need to parse files that have millions of lines. I noticed that my combinator parser gets slower and slower as it parses more lines. The problem seems to be in Scala's `rep` or regex parsers, because this behaviour occurs even for the simple example parser shown below:
def file: Parser[Int] = rep(line) ^^ { _ => 1 } // a file is a repetition of lines
def line: Parser[Int] = """(?m)^.*$""".r ^^ { _ => 0 } // reads a line and returns 0
When I parse a file with millions of equal-length lines using this simple parser, it initially parses 46 lines/ms. After 370,000 lines the speed drops to 20 lines/ms, after 840,000 lines to 10 lines/ms, and after 1,790,000 lines to 5 lines/ms...
I know that, in principle, combinator parsers were not made for parsing inputs that are so large. But wouldn't it be possible to improve "rep" and the regex parsers so that this performance degradation does not occur?
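For comparison, here is a minimal sketch of one possible workaround. It is not a fix to `rep` itself and makes no claim about the cause of the slowdown: it simply hands each line to the parser as its own small input, so no single parse ever runs over the whole file. It assumes the format is strictly line-oriented and that `scala.util.parsing.combinator` is on the classpath. The names `LineByLine` and `countParsedLines`, and the file path taken from `args(0)`, are hypothetical.

```scala
import scala.io.Source
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object; only `line` mirrors the parser from the report.
object LineByLine extends RegexParsers {
  // Treat whitespace as part of the line instead of skipping it.
  override val skipWhitespace = false

  // Per-line parser: matches the whole line and returns 0.
  // No (?m) anchors are needed because each input is exactly one line.
  def line: Parser[Int] = """.*""".r ^^ { _ => 0 }

  // Parse every line as a separate input and count the successful parses.
  def countParsedLines(lines: Iterator[String]): Int =
    lines.count(l => parseAll(line, l).successful)

  def main(args: Array[String]): Unit = {
    val src = Source.fromFile(args(0)) // path supplied by the caller (assumption)
    try println(countParsedLines(src.getLines()))
    finally src.close()
  }
}
```

This only applies when no grammar rule spans several lines. Timing it against the `rep(line)` version on the same file would show whether the degradation is tied to the size of the single input.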
I asked a related question on Stack Overflow: http://stackoverflow.com/questions/23117635/why-is-scalas-combinator-parsing-slow-when-parsing-large-files-what-can-i-do
Thanks!