Thursday, March 19, 2015

Apache Spark checkpoint issue on Windows

"To keep track of the log statistics for all of time, state must be maintained between processing RDD's in a DStream.

To maintain state for key-pair values, the data may be too big to fit in memory on one machine - Spark Streaming can maintain the state for you. To do that, call the updateStateByKey function of the Spark Streaming library.

First, in order to use updateStateByKey, checkpointing must be enabled on the streaming context. To do that, just call checkpoint on the streaming context with a directory to write the checkpoint data." (from http://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/total.html)
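In the Java API, updateStateByKey takes an update function that receives, for one key, the list of new values seen in the current batch plus the previous state (if any), and returns the new state. The sketch below is a running-count update function written as plain Java so it can be tried without a cluster. It uses java.util.Optional as a stand-in: Spark's own signature uses a different Optional class depending on the version (Guava's in 1.x, org.apache.spark.api.java.Optional in 2.x). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper with the same shape as an updateStateByKey update function:
// (new values for one key in this batch, previous state) -> new state.
// java.util.Optional is only a stand-in for Spark's own Optional type.
final class RunningCount {

    private RunningCount() {}

    static Optional<Long> update(List<Long> newValues, Optional<Long> previousState) {
        // Start from the previous total, or 0 if this key has no state yet.
        long total = previousState.orElse(0L);
        for (Long v : newValues) {
            total += v;
        }
        return Optional.of(total);
    }
}
```

For a key that first appears with values [3, 4], update returns Optional[7]; in the next batch, with new values [1] and previous state 7, it returns Optional[8]. Spark keeps this per-key state across batches and periodically checkpoints it, which is why the checkpoint directory has to be writable.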

On Windows, when you enable checkpointing for your streaming context with ssc.checkpoint(<PATH_TO_DIRECTORY>); you may get the following error messages:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
...
Exception in thread "pool-8-thread-1" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
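The "null" in that path comes from how Hadoop 2.x locates its home directory: it reads the hadoop.home.dir system property, falls back to the HADOOP_HOME environment variable, and appends bin\winutils.exe to whatever it found. When neither is set, Java string concatenation turns the missing value into the literal text "null". The following is a simplified re-creation of that lookup for illustration, not Hadoop's actual code; the class and method names are hypothetical.

```java
import java.io.File;

// Simplified, hypothetical re-creation of how Hadoop 2.x builds the winutils path.
// This is NOT the real org.apache.hadoop.util.Shell implementation.
final class WinutilsPath {

    private WinutilsPath() {}

    // System property first, then environment variable; may return null.
    static String resolveHome(String homeProperty, String homeEnv) {
        return homeProperty != null ? homeProperty : homeEnv;
    }

    static String winutilsPath(String home) {
        // String concatenation renders a null home as the text "null".
        return home + File.separator + "bin" + File.separator + "winutils.exe";
    }

    static String fromCurrentProcess() {
        return winutilsPath(resolveHome(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME")));
    }
}
```

On Windows, with neither setting present, WinutilsPath.fromCurrentProcess() returns null\bin\winutils.exe, the same path reported in the error message.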


Also, if you look inside the checkpoint directory, you will find files that were created while the stream was being processed, but these files are empty. They are supposed to store the state information (they act as the checkpoint), and without this data updateStateByKey cannot work properly.
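A quick way to confirm this symptom is to walk the checkpoint directory and list the zero-length files. Below is a minimal sketch using only java.nio.file; the class name is hypothetical, and the directory to pass in is whatever path you gave to ssc.checkpoint(...).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic: lists zero-length files under a checkpoint directory.
final class CheckpointInspector {

    private CheckpointInspector() {}

    static List<Path> emptyFiles(Path checkpointDir) throws IOException {
        // The stream from Files.walk holds directory handles, so close it.
        try (Stream<Path> paths = Files.walk(checkpointDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        try {
                            return Files.size(p) == 0L;
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

A non-empty result after a few batches is consistent with the winutils problem described here.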

To solve this issue, you need to:



1) Download winutils.exe and save it to local storage:

  • for Win32 (x86) you can find it using the links below:

https://repo.rrd-hadoop-win32.googlecode.com/archive/f54eb586ddb66d3a938033bf3d9272a832b8e201.zip
https://code.google.com/p/rrd-hadoop-win32/source/checkout
Download the zip file, extract it, and then extract the jar files it contains as well.

  • for Win64 (x64):

http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
There are also discussions about winutils on Stack Overflow and MSDN.

It's very important to use the version of winutils.exe that matches your OS (x86 or x64). Don't forget about this in production code: include a check that resolves the right version of winutils for the target platform.
If the winutils version isn't compatible with the target platform, you may get the following message:

CreateProcess error=216, This version of %1 is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher
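The platform check can be as small as reading the processor architecture that Windows exposes through environment variables. One subtlety: a 32-bit JVM running on 64-bit Windows sees PROCESSOR_ARCHITECTURE=x86, and the real OS architecture is reported in PROCESSOR_ARCHITEW6432, so that variable is checked first. The folder layout (one winutils copy per architecture under a common root) and all class and method names below are assumptions for illustration.

```java
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: picks the winutils build that matches the Windows architecture.
// Assumed layout: <root>\x64\bin\winutils.exe and <root>\x86\bin\winutils.exe
final class WinutilsSelector {

    private WinutilsSelector() {}

    static String windowsArch(String processorArchitecture, String processorArchiteW6432) {
        // Under WOW64 (32-bit process on 64-bit Windows) the OS architecture
        // is reported in PROCESSOR_ARCHITEW6432.
        String arch = processorArchiteW6432 != null ? processorArchiteW6432 : processorArchitecture;
        if (arch == null) {
            throw new IllegalStateException("Cannot determine Windows architecture");
        }
        switch (arch.toUpperCase(Locale.ROOT)) {
            case "AMD64":
                return "x64";
            case "X86":
                return "x86";
            default:
                throw new IllegalStateException("No winutils build for architecture " + arch);
        }
    }

    // Folder to use as HADOOP_HOME, i.e. the one that contains bin\winutils.exe.
    static Path hadoopHomeFor(Path root, String arch) {
        return root.resolve(arch);
    }

    static Path hadoopHomeForCurrentMachine(Path root) {
        return hadoopHomeFor(root, windowsArch(
                System.getenv("PROCESSOR_ARCHITECTURE"),
                System.getenv("PROCESSOR_ARCHITEW6432")));
    }
}
```

With this check, a 32-bit JVM on 64-bit Windows (windowsArch("x86", "AMD64")) still selects the x64 build, since it is the OS, not the JVM, that has to be able to run winutils.exe.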

2) Set the environment variable "HADOOP_HOME" to the folder that contains bin\winutils.exe:
- option A: use global environment variables: My Computer -> Properties -> Advanced system settings -> Environment variables
- option B: from your source code, call System.setProperty("hadoop.home.dir", <PATH_TO_WINUTILS>), where <PATH_TO_WINUTILS> is that same folder (the one that contains bin), not the path to winutils.exe itself
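For option B, timing matters: in Hadoop 2.x the home directory is read once, when org.apache.hadoop.util.Shell is first loaded, so the property should be set at the very start of main, before the SparkConf or streaming context is created. A minimal sketch that also validates the folder first; the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: points hadoop.home.dir at a folder containing bin\winutils.exe.
// Call it before creating the SparkConf / JavaStreamingContext.
final class HadoopHomeConfig {

    private HadoopHomeConfig() {}

    static void useHadoopHome(Path hadoopHome) {
        Path winutils = hadoopHome.resolve("bin").resolve("winutils.exe");
        if (!Files.isRegularFile(winutils)) {
            throw new IllegalArgumentException("winutils.exe not found at " + winutils);
        }
        // Same value you would otherwise put in the HADOOP_HOME environment variable.
        System.setProperty("hadoop.home.dir", hadoopHome.toAbsolutePath().toString());
    }
}
```

In a driver this would be the first statement, for example HadoopHomeConfig.useHadoopHome(Paths.get("C:\\hadoop")) with a hypothetical C:\hadoop folder, followed by the usual context setup and ssc.checkpoint(...).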

I hope this post saves somebody some time.

Comments:

  1. Hi,

    I could not find Win32 (x86) version of Winutils.exe on links published. Please help if you have them. I need them for Spark installation on Windows 7 32 bit machine.

    Thanks,
    Vineet

  2. Thanks Victor, yes, you saved me some time! :)

  3. Thanks a lot, It really saved my time.