"To maintain state for key-pair values, the data may be too big to fit in memory on one machine - Spark Streaming can maintain the state for you. To do that, call the updateStateByKey function of the Spark Streaming library.
First, in order to use updateStateByKey, checkpointing must be enabled on the streaming context. To do that, just call checkpoint on the streaming context with a directory to write the checkpoint data." (from http://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/total.html)
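To make the quoted description concrete, here is a minimal sketch of the kind of per-key update function that updateStateByKey expects: it receives the values that arrived for a key in the current batch plus the previously saved state, and returns the new state. The class and method names are hypothetical, and java.util.Optional is used only as a stand-in so the sketch runs without Spark on the classpath; Spark's Java API uses its own Optional type, which depends on the Spark version.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/**
 * Sketch of a running-count update function in the shape updateStateByKey expects.
 * Assumption: java.util.Optional stands in for the Optional type of Spark's Java API.
 */
final class RunningCountSketch {

    // Combines the values received for one key in the current batch
    // with the state saved for that key by earlier batches.
    static Optional<Long> updateCount(List<Long> newValues, Optional<Long> previousState) {
        long sum = previousState.orElse(0L); // no saved state yet means start from zero
        for (long value : newValues) {
            sum += value;
        }
        return Optional.of(sum);
    }

    public static void main(String[] args) {
        Optional<Long> state = Optional.empty();
        state = updateCount(Arrays.asList(1L, 1L, 1L), state);     // first batch: three events
        state = updateCount(Collections.singletonList(1L), state); // second batch: one event
        System.out.println(state.get()); // prints 4
    }
}
```

In a real job the saved state lives in the checkpoint directory between batches, which is why checkpointing has to work before this kind of function is of any use.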
When you enable checkpointing for your streaming context on Windows with ssc.checkpoint(<PATH_TO_DIRECTORY>), you may get the following error messages:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
...
Exception in thread "pool-8-thread-1" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
Also, if you look inside the checkpoint directory, you will find some files that were created during stream processing, but these files are empty. They are supposed to store the state information (that is, act as the checkpoint); without this data, updateStateByKey cannot work properly.
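The empty-file symptom can be confirmed with a quick check. The following is only a sketch: CheckpointInspector and findEmptyFiles are hypothetical names, and it assumes the checkpoint directory is on the local file system, as it typically is when developing on a Windows machine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that lists zero-length files under a local checkpoint directory. */
final class CheckpointInspector {

    // Walks the directory tree and returns every regular file whose size is 0 bytes.
    static List<Path> findEmptyFiles(Path checkpointDir) {
        try (Stream<Path> paths = Files.walk(checkpointDir)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.toFile().length() == 0L)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If this returns files right after the exceptions above appear in the log, the checkpoint data was not written and the winutils problem is the likely cause.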
To solve this issue, you need to:
1) Download winutils.exe and save it to local storage:
- for Win32 (x86) you can find it using the links below:
https://repo.rrd-hadoop-win32.googlecode.com/archive/f54eb586ddb66d3a938033bf3d9272a832b8e201.zip
https://code.google.com/p/rrd-hadoop-win32/source/checkout
Download and extract the zip file, then extract the jar files as well.
- for Win64 (x64):
http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
See also the discussions about winutils on Stack Overflow and MSDN.
It is very important to use the build of winutils that matches your OS (32-bit or 64-bit). Don't forget about this in production code: include a condition that selects the right version of winutils for the target platform.
If the version of winutils isn't compatible with the target platform, you can get the following message:
CreateProcess error=216, This version of %1 is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher
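2) Set the environment variable "HADOOP_HOME" to the folder that contains bin\winutils.exe (that is, the parent of the bin folder, not the bin folder itself):
- option A: use global environment variables: My Computer -> Properties -> Advanced system settings -> Environment variables
- option B: set it from your source code with System.setProperty("hadoop.home.dir", <PATH_TO_WINUTILS>), where <PATH_TO_WINUTILS> is that same folder; do this before the streaming context is created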
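Option B, combined with the platform check recommended in step 1, could look roughly like the sketch below. Everything named here (HadoopHomeSetup, configure, the two folder parameters) is hypothetical. One assumption to be aware of: the os.arch system property reports the architecture of the running JVM rather than of Windows itself, so a 32-bit JVM on 64-bit Windows selects the 32-bit build.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: selects a winutils folder by architecture and sets hadoop.home.dir. */
final class HadoopHomeSetup {

    // Assumption: 64-bit JVMs report an os.arch value containing "64" (e.g. amd64, x86_64).
    static boolean is64BitJvm(String osArch) {
        return osArch != null && osArch.contains("64");
    }

    // Picks the folder for the given architecture, verifies that bin/winutils.exe
    // exists inside it, and only then sets the hadoop.home.dir system property.
    static Path configure(String osArch, Path home32, Path home64) {
        Path home = is64BitJvm(osArch) ? home64 : home32;
        Path winutils = home.resolve("bin").resolve("winutils.exe");
        if (!Files.isRegularFile(winutils)) {
            throw new IllegalStateException("winutils.exe not found at " + winutils);
        }
        System.setProperty("hadoop.home.dir", home.toAbsolutePath().toString());
        return home;
    }
}
```

A call such as HadoopHomeSetup.configure(System.getProperty("os.arch"), home32, home64), where home32 and home64 are the folders you saved the two builds to, would go at the very start of the application, before the streaming context is created and before ssc.checkpoint(...) is called. Failing fast when winutils.exe is missing gives a clearer message than the NullPointerException shown earlier.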
I hope this post saves somebody some time.
Hi,
I could not find Win32 (x86) version of Winutils.exe on links published. Please help if you have them. I need them for Spark installation on Windows 7 32 bit machine.
Thanks,
Vineet
Thanks Victor, yes, you saved me some time! :)
Thanks a lot, it really saved my time.