У меня есть несколько вопросов с этим кодом ниже:
val input1 = rawinput.map(_.split("\t")).map(x=>(x(6).trim(),x)).sortByKey()
val input2 = input1.map(x=> x._2.mkString("\t"))
val x0 = input2.map(_.split("\t")).map(x => (x(6),x(0))
val x1 = input2.map(_.split("\t")).map(x => (x(6),x(1))
val x2 = input2.map(_.split("\t")).map(x => (x(6),x(2))
val x3 = input2.map(_.split("\t")).map(x => (x(6),x(3))
val x4 = input2.map(_.split("\t")).map(x => (x(6),x(4))
val x5 = input2.map(_.split("\t")).map(x => (x(6),x(5))
val x6 = input2.map(_.split("\t")).map(x => (x(6),x(6))
val x = x0 union x1 union x2 union x3 union x4 union x5 union x6
<pre>
**Lineage Graph:**
(7) UnionRDD[25] at union at rddCustUtil.scala:78 []
| UnionRDD[24] at union at rddCustUtil.scala:78 []
| UnionRDD[23] at union at rddCustUtil.scala:78 []
| UnionRDD[22] at union at rddCustUtil.scala:78 []
| UnionRDD[21] at union at rddCustUtil.scala:78 []
| UnionRDD[20] at union at rddCustUtil.scala:78 []
| MapPartitionsRDD[7] at map at rddCustUtil.scala:43 []
| MapPartitionsRDD[6] at map at rddCustUtil.scala:43 []
| MapPartitionsRDD[5] at map at rddCustUtil.scala:40 []
| ShuffledRDD[4] at sortByKey at rddCustUtil.scala:38 []
+-(1) MapPartitionsRDD[3] at map at rddCustUtil.scala:38 []
| MapPartitionsRDD[2] at map at rddCustUtil.scala:38 []
| /Data/ MapPartitionsRDD[1] at textFile at rddCustUtil.scala:35 []
| /Data/ HadoopRDD[0] at textFile at rddCustUtil.scala:35 []
| MapPartitionsRDD[9] at map at rddCustUtil.scala:48 []
| MapPartitionsRDD[8] at map at rddCustUtil.scala:48 []
| MapPartitionsRDD[5] at map at rddCustUtil.scala:40 []
| ShuffledRDD[4] at sortByKey at rddCustUtil.scala:38 []
+-(1) MapPartitionsRDD[3] at map at rddCustUtil.scala:38 []
| MapPartitionsRDD[2] at map at rddCustUtil.scala:38 []
| /Data/ MapPartitionsRDD[1] at textFile at rddCustUtil.scala:35 []
| /Data/ HadoopRDD[0] at textFile at rddCustUtil.scala:35 []
| MapPartitionsRDD[11] at map at rddCustUtil.scala:53 []
| MapPartitionsRDD[10] at map at rddCustUtil.scala:53 []
| MapPartitionsRDD[5] at map at rddCustUtil.scala:40 []
| ShuffledRDD[4] at sortByKey at rddCustUtil.scala:38 []
+-(1) MapPartitionsRDD[3] at map at rddCustUtil.scala:38 []
| MapPartitionsRDD[2] at map at rddCustUtil.scala:38 []
| /Data/ MapPartitionsRDD[1] at textFile at rddCustUtil.scala:35 []
| /Data/ HadoopRDD[0] at textFile at rddCustUtil.scala:35 []
| MapPartitionsRDD[13] at map at rddCustUtil.scala:58 []
| MapPartitionsRDD[12] at map at rddCustUtil.scala:58 []
| MapPartitionsRDD[5] at map at rddCustUtil.scala:40 []
| ShuffledRDD[4] at sortByKey at rddCustUtil.scala:38 []
+-(1) MapPartitionsRDD[3] at map at rddCustUtil.scala:38 []
| MapPartitionsRDD[2] at map at rddCustUtil.scala:38 []
| /Data/ MapPartitionsRDD[1] at textFile at rddCustUtil.scala:35 []
| /Data/ HadoopRDD[0] at textFile at rddCustUtil.scala:35 []
| MapPartitionsRDD[15] at map at rddCustUtil.scala:63 []
| MapPartitionsRDD[14] at map at rddCustUtil.scala:63 []
| MapPartitionsRDD[5] at map at rddCustUtil.scala:40 []
| ShuffledRDD[4] at sortByKey at rddCustUtil.scala:38 []
+-(1) MapPartitionsRDD[3] at map at rddCustUtil.scala:38 []
| MapPartitionsRDD[2] at map at rddCustUtil.scala:38 []
| /Data/ MapPartitionsRDD[1] at textFile at rddCustUtil.scala:35 []
| /Data/ HadoopRDD[0] at textFile at rddCustUtil.scala:35 []
| MapPartitionsRDD[17] at map at rddCustUtil.scala:68 []
| MapPartitionsRDD[16] at map at rddCustUtil.scala:68 []
| MapPartitionsRDD[5] at map at rddCustUtil.scala:40 []
| ShuffledRDD[4] at sortByKey at rddCustUtil.scala:38 []
+-(1) MapPartitionsRDD[3] at map at rddCustUtil.scala:38 []
| MapPartitionsRDD[2] at map at rddCustUtil.scala:38 []
| /Data/ MapPartitionsRDD[1] at textFile at rddCustUtil.scala:35 []
| /Data/ HadoopRDD[0] at textFile at rddCustUtil.scala:35 []
| MapPartitionsRDD[19] at map at rddCustUtil.scala:73 []
| MapPartitionsRDD[18] at map at rddCustUtil.scala:73 []
| MapPartitionsRDD[5] at map at rddCustUtil.scala:40 []
| ShuffledRDD[4] at sortByKey at rddCustUtil.scala:38 []
+-(1) MapPartitionsRDD[3] at map at rddCustUtil.scala:38 []
| MapPartitionsRDD[2] at map at rddCustUtil.scala:38 []
| /Data/ MapPartitionsRDD[1] at textFile at rddCustUtil.scala:35 []
| /Data/ HadoopRDD[0] at textFile at rddCustUtil.scala:35 []
</pre>
- Не могли бы вы объяснить мне, сколько этапов перетасовки будет выполнено, поскольку он показывает 7 ShuffledRDD[4]?
- Не могли бы вы дать мне подробное объяснение ниже потока DAG?
- Эта операция дорогая?
.flatMap{ case (k, arr) => arr.zipWithIndex.filter(t => colIndices.contains(t._2)).map((k, _)) }
не сработает, предполагаяcolIndices = List(3,2,6,2,8)
? 05.01.2017