Scala Programming Language
SI-4603

Unable to inherit from Hadoop Mapper (new or old API) in 2.9.0 with code that functions in 2.8.1

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Scala 2.10.0
    • Component/s: Misc Compiler
    • Labels:
      None

      Description

      === What steps will reproduce the problem (please be specific and use wikiformatting)? ===

      The following code works in 2.8.1:

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.io.{LongWritable, Text}
      import org.apache.hadoop.mapreduce.{Mapper, Job}

      object MyJob {
        def main(args: Array[String]) {
          val job = new Job(new Configuration())
          job.setMapperClass(classOf[MyMapper])
        }
      }

      class MyMapper extends Mapper[LongWritable, Text, Text, Text] {
        override def map(key: LongWritable, value: Text, context: Mapper[LongWritable, Text, Text, Text]#Context) {
        }
      }
      

      === What do you see instead? ===
      I get this compiler error:

      error: type mismatch;
      found   : java.lang.Class[MyMapper](classOf[MyMapper])
      required: java.lang.Class[_ <: org.apache.hadoop.mapreduce.Mapper]
      job.setMapperClass(classOf[MyMapper])
      

      === Additional information ===
      Daniel Sobral very helpfully directed me here from my Stack Overflow question at http://stackoverflow.com/questions/6028221/how-does-one-implement-a-hadoop-mapper-in-scala-2-9-0.

      The method definition of the failing call (Job.setMapperClass) is this:

      public void setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapreduce.Mapper> cls) throws java.lang.IllegalStateException { /* compiled code */ }
      

      The method definition of Mapper itself is this:

      public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

      === What versions of the following are you using? ===

      • Scala: 2.8.1 and 2.9.0
      • Java: latest
      • Operating system: OSX / Linux

        Activity

        Lukas Rytz added a comment -

        (comment by extempore):

        This is induced by the use of a type constructor as a bound. Much simplified:

        
        // J.java
        public class J<T> {
          public static void f(java.lang.Class<? extends J> cls) { }
          // correctly it should be like this, and then it would work.
          // unfortunately that doesn't mean we don't have to deal with it.
          // public static void f(java.lang.Class<? extends J<?>> cls) { }
        }
        // S.scala
        class S extends J[AnyRef]
        
        object Test {    
          def main(args:Array[String]) {
            J.f(classOf[S])
          }
        }
        

        This results in:

        
        S.scala:5: error: type mismatch;
         found   : java.lang.Class[S](classOf[S])
         required: java.lang.Class[_ <: J]
            J.f(classOf[S])
                       ^
        one error found
        

        I don't immediately see how to work around it from the Scala side alone, because Scala is picky about how we express these things.

        
        // nope
        J.f(classOf[S]: Class[_ <: J[_]])
        
        S.scala:5: error: type mismatch;
         found   : Class[_$1(in method main)] where type _$1(in method main) <: J[_]
         required: java.lang.Class[_ <: J]
            J.f(classOf[S]: Class[_ <: J[_]])
                          ^
        one error found
        
        // nope
        J.f(classOf[S]: Class[J[_]])
        
        S.scala:5: error: type mismatch;
         found   : java.lang.Class[S](classOf[S])
         required: Class[J[_]]
        Note: S <: J[_] (and java.lang.Class[S](classOf[S]) <: java.lang.Class[S]), but Java-defined class Class is invariant in type T.
        You may wish to investigate a wildcard type such as `_ <: J[_]`. (SLS 3.2.10)
            J.f(classOf[S]: Class[J[_]])
                       ^
        one error found
        
        // naturally, nope
        J.f(classOf[S]: Class[J])
        
        S.scala:5: error: class J takes type parameters
            J.f(classOf[S]: Class[J])
                                  ^
        one error found
        
        Martin Odersky added a comment -

        Paul, can you bisect to see what made the regression happen? Then assign to the one who did the commit (might well be myself).

        Paul Phillips added a comment -

        It is indeed yourself, in r24275.

        commit d832118727bc6ffc1b87519086bae47e06090bb6
        Author: odersky <odersky@5e8d7ff9-d8ef-0310-90f0-a4852d11357a>
        Date: Sun Feb 13 15:04:53 2011 +0000

        More tweaks to rawToExistential to avoid pile-up of transformations.

        git-svn-id: http://lampsvn.epfl.ch/svn-repos/scala/scala/trunk@24275 5e8d7ff9-d8ef-0310-90f0-a4852d11357a

        Ismael Juma added a comment -

        I have verified that this is fixed in 2.9.0-1. Could not find the exact commit that fixed it though.

        Leif Wickland added a comment -

        I was unable to duplicate Ismael Juma's finding that this is fixed in 2.9.0-1. I created an sbt project with build.properties like:

        project.scratch=true
        project.name=test
        sbt.version=0.7.7
        project.version=1.0
        build.scala.versions=2.9.0-1
        project.initialize=false
        

        The build/Project.scala looked like:

        import sbt._
        
        class Project(info: ProjectInfo) extends DefaultProject(info) {
          val clouderaRepo = "cloudera release" at "https://repository.cloudera.com/content/repositories/releases"
          val cdhVer = "0.20.2-cdh3u0"
          val hadoopCore = "org.apache.hadoop" % "hadoop-core" % cdhVer % "provided"
        }
        

        The project contained exactly one source file:

        import org.apache.hadoop._
        import org.apache.hadoop.io._
        import org.apache.hadoop.conf._
        import org.apache.hadoop.mapreduce._
        
        object MyJob {
          def main(args:Array[String]) {
            val job = new Job(new Configuration())
            job.setMapperClass(classOf[MyMapper])
          }
        }
        
        class MyMapper extends Mapper[LongWritable,Text,Text,Text] {
          override def map(key: LongWritable, value: Text, context: Mapper[LongWritable,Text,Text,Text]#Context) {
          }
        }
        

        Compiling resulted in the following error message

        [error] ....MyJob.scala:9: type mismatch;
        [error]  found   : java.lang.Class[MyMapper](classOf[MyMapper])
        [error]  required: java.lang.Class[_ <: org.apache.hadoop.mapreduce.Mapper]
        [error]     job.setMapperClass(classOf[MyMapper])
        [error]                               ^
        [error] one error found
        
        Leif Wickland added a comment - edited

        I also saw the same compile error on 2.9.1-SNAPSHOT.

        I should've mentioned explicitly that I tested with Cloudera's CDH3, which is based on Hadoop 0.20.2.

        Ismael Juma added a comment -

        Hi Leif. Interesting. The difference between your example and the one I have is that my mapper is an inner class of the object. I tried doing the same for your example and then it works.

        So, you're right that this is not actually fixed. It's just that you can work around it by making the mapper an inner class of the object. Coincidentally, I made that change before I tested with 2.9.0-1, and that made me think it had been fixed in 2.9.0-1.
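        For reference, the inner-class workaround described above would look roughly like this. This is only a sketch against the Hadoop 0.20 mapreduce API from the original report, not compiled here:

        ```scala
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.io.{LongWritable, Text}
        import org.apache.hadoop.mapreduce.{Mapper, Job}

        object MyJob {
          // Workaround: the mapper is an inner class of the object rather than a
          // top-level class. (Per later comments in this thread, it also needs to
          // be declared before main().)
          class MyMapper extends Mapper[LongWritable, Text, Text, Text] {
            override def map(key: LongWritable, value: Text,
                             context: Mapper[LongWritable, Text, Text, Text]#Context) {
            }
          }

          def main(args: Array[String]) {
            val job = new Job(new Configuration())
            job.setMapperClass(classOf[MyMapper])  // reported to compile in 2.9.0-1 with this layout
          }
        }
        ```

        Why moving the class changes the outcome is not explained in the thread; it apparently affects the order in which the compiler encounters the raw-typed Mapper bound.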

        Leif Wickland added a comment - edited

        Ismael,

        Thanks for taking another look at this.

        I had actually tried the inner class trick before I posted and it failed for me. I discovered experimentally that that's because I put my inner class after the main(). If the inner class MyMapper is declared before the main, it works fine.

        It turns out, having MyMapper as a top level class declared in the same file and before MyJob also works around the problem.
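        Concretely, that file-ordering workaround would be (again a sketch against the same Hadoop API, not compiled here):

        ```scala
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.io.{LongWritable, Text}
        import org.apache.hadoop.mapreduce.{Mapper, Job}

        // Workaround: MyMapper is a top-level class in the same source file,
        // declared *before* the object that references classOf[MyMapper].
        class MyMapper extends Mapper[LongWritable, Text, Text, Text] {
          override def map(key: LongWritable, value: Text,
                           context: Mapper[LongWritable, Text, Text, Text]#Context) {
          }
        }

        object MyJob {
          def main(args: Array[String]) {
            val job = new Job(new Configuration())
            job.setMapperClass(classOf[MyMapper])  // reported to compile when MyMapper precedes MyJob
          }
        }
        ```

        As a later comment notes, this ordering trick stopped working for Leif once the project grew, so it should be treated as fragile rather than a real fix.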

        Leif Wickland added a comment - edited

        For an unknown reason, the workaround of defining the mapper and reducer before the job stopped working when I added enough files to my project. I didn't notice the problem until I made a change that caused sbt (v0.10) to have to recompile a few files. Now I see an error like:

        [error]  found   : java.lang.Class[MyReducer](classOf[MyReducer])
        [error]  required: java.lang.Class[_ <: org.apache.hadoop.mapreduce.Reducer]
        [error]     job.setCombinerClass(classOf[MyReducer])
        
        Leif Wickland added a comment -

        Going back to Scala 2.8.1 allowed me to avoid this problem.

        Paul Phillips added a comment -

        See also duplicate SI-4769. I can see exactly what line is causing this and removing the line does fix this. Unfortunately there is no test case to give me an idea of what the line is supposed to be preventing.

        Thomas Themel added a comment -

        Here's what I ran into, which might make a nicer test case than Hadoop because all the required classes come with the JDK:

        package something.weird
        
        import javax.xml.bind.annotation.adapters.{XmlAdapter,XmlJavaTypeAdapter}
        import scala.reflect.BeanInfo
        
        @BeanInfo class Intermediate {
            var foo : String = _
            var bar: String = _
        }
        
        @XmlJavaTypeAdapter(classOf[Adapter]) class Original(val foo: String, val bar: String) { 
        }
        
        class Adapter extends XmlAdapter[Intermediate, Original] { 
            def marshal(o: Original) : Intermediate = null
            def unmarshal(i: Intermediate) : Original = null
        }
        

        Compiles in 2.8.1, fails in 2.9.0 with

        [error] jaxb.scala:11: type mismatch;
        [error]  found   : java.lang.Class[something.weird.Adapter](classOf[something.weird.Adapter])
        [error]  required: java.lang.Class[_ <: javax.xml.bind.annotation.adapters.XmlAdapter]
        [error] @XmlJavaTypeAdapter(classOf[Adapter]) class Original(val foo: String, val bar: String) {
        
        Paul Phillips added a comment -

        See also duplicate SI-4822.

        Shai Yallin added a comment -

        We ran into a problem similar to Thomas Themel's (a colleague of mine opened SI-4822). I think it's safe to assume that the raw-types issue exists across the JDK and 3rd-party libraries, and since we can't get anyone else to fix it, I reckon a fix in the Scala compiler is needed, even though this is not strictly a bug in the compiler...

        Commit Message Bot added a comment -

        (odersky in r25394) Closes #4603. Review by extempore.

        Martin Odersky added a comment -

        My commit r25394 should fix this.

        Commit Message Bot added a comment -

        (extempore in r25403) Test case for SI-4603, no review.

        Paul Phillips added a comment -

        This will be in the next RC for 2.9.1 as well.

        Leif Wickland added a comment -

        I verified that my problem is resolved in 2.9.1 RC2. Thanks for the fix!

        Chris Bissell added a comment -

        Verified that the problem is fixed in 2.9.1.RC3 for our code. Many thanks! And the compiler is snappier too!

        Jiamin Zhao added a comment - edited

        Will this fix be backported to 2.8.1 or 2.9.0? I'm using Lift and GridGain, and I see the same problem even with 2.8.1.

        Federico Ragona added a comment - edited

        Hello,
        I'm facing a similar issue in Scala 2.10.3, but only when extending a Java class from a Scala class. The issue does not reproduce if AMapper (see code below) is implemented as a Scala class.

        Code

        Here is my project layout:

        src
          main
            java
              myhadoop
                AMapper.java
            scala
              myhadoop
                MyMapper.scala
        
        

        And these are my classes' sources:

        AMapper.java source code
        package myhadoop;
        
        import org.apache.hadoop.io.BytesWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;
        
        public class AMapper extends Mapper<Text, BytesWritable, Text, Text> {
          public void handleThis(Text key, Mapper<Text, BytesWritable, Text, Text>.Context context, Text text) {
            // not relevant to this test
          }
        }
        
        MyMapper.scala source code
        package myhadoop
        
        import org.apache.hadoop.io.{BytesWritable, Text}
        import org.apache.hadoop.mapreduce.Mapper
        
        class MyMapper extends AMapper {
         
          override def map(key: Text, value: BytesWritable, context: Mapper[Text, BytesWritable, Text, Text]#Context) {
            handleThis(key, context, new Text(value.getBytes))
          }
        }
        

        Error

        Compiling this code will raise the following error:

        error: type mismatch;
        found: org.apache.hadoop.mapreduce.Mapper[org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text]#Context
        required: org.apache.hadoop.mapreduce.Mapper[KEYIN,VALUEIN,KEYOUT,VALUEOUT]#Context
        

        pointing to the handleThis(key, context, new Text(value.getBytes)) line.

        Environment

        Scala 2.10.3
        Java 1.7.0u21
        Ubuntu Linux 12.04

        Simon Ochsenreither added a comment -

        Hi Federico, could you provide a self-contained piece of code? That would make it easier and faster to investigate this issue. Thanks!

        Federico Ragona added a comment -

        Simon, I have to apologize to you.
        I thought that the snippet I pasted in my original comment was enough to reproduce the issue, but it's definitely not. I've just edited my original comment to provide all the information needed to reproduce it. As I mentioned in the comment, the compiler raises this issue only when extending a Java class.

        I hope this helps you.


          People

          • Assignee:
            Martin Odersky
            Reporter:
            Chris Bissell
            TracCC:
            Ismael Juma, Paul Phillips, Robbie Coleman
          • Votes:
            8
            Watchers:
            17
