GSoC PROJECT ``Make OpenJDK LSB-compliant''
Pavel Shved
2009, Moscow.

SHALLOW REPORT 2

I ran the Linux appchecker, analyzed its report, and googled for information about the issues encountered; the LSB mailing lists and bugzilla were also consulted. I did NOT try to compile OpenJDK with lsbcc.

The general impression is that the issues are the very same as in the Sun JRE 6 thread a year ago [1]. Nothing has changed.

The following report covers JRE and JDK issues. Whenever an issue is JDK-specific, it is marked with a ``JDK'' note at the beginning of the line.

The report is divided into problem sections as on the ``LSB compliancy'' appchecker page.

1. ELF HEADER CHECK FAILED

The problems with ELF headers haven't been taken into account. I hope they'll disappear magically once the proper LSB compiler is run.

2. INCORRECT PROGRAM LOADER

Everything's ok: it's just the ld-linux.so.2 issue.

3. NON-LSB ELF SECTION

Just [[ignoring .debug_abbrev: Not in the LSB but not loaded]]. Doesn't matter.

4. NON-LSB LIBRARY USED

1) libnsl.so.1 -- just drop -lnsl from cc's command line [2];

libsaproc 2) libthread_db.so -- thread debugging library. Documented only by Sun's Solaris manpage [3]. However, thread_db.h states that:

    /* This is the debugger interface for the NPTL library. It is
       modelled closely after the interface with same names in Solaris
       with the goal to share the same code in the debugger. */

I doubt it will be included in the standard. Appchecker reports nothing on it. So, it can be shipped as a static library (but a more careful review is needed).

5. NON-LSB INTERFACE USED

Plenty of them.

- FT_GlyphSlot_Embolden
  FT_GlyphSlot_Oblique
    [[This interface is marked alpha/due to change. Do not use yet.]]
    Still true. No documentation in official packages.

- _Unwind_GetIPInfo
    from the libgcc library. That's a new interface in the C++ support library, introduced between 4.2.1 and 4.3.3 (checked my Gentoo's files), in Feb 2006. See appendix A for a quote from gcc's changelog.
- ___tls_get_addr
    supports the TLS mechanism, which originates from IA64 and has also gained support on x86+64. It is a non-portable C extension described in [5], supported by all LSB archs except ppc32/64. The funny thing is that the ELF sections for TLS support have been in the standard since 3.0. Even funnier, these functions should be provided by the dynamic linker, which is not ld-linux.so on LSB. Bugs 992 and 993 are related.

- __open_2
    new interface in glibc 2.7. Yet another internal interface with a wrapper (I think). Should be solved by lsbcc.

- __pthread_register_cancel
  __pthread_unregister_cancel
    take part in the implementation of LSB's pthread_cleanup_push and pthread_cleanup_pop. Candidate; solved by lsbcc.

- backtrace
  backtrace_symbols
    bug 2187 [6] claimed they don't have any documentation, but now they do (as of the linux man-pages 3.21 release). LSB 4.1?

- dl_iterate_phdr
    bug 1739 [7] and the conf call notes for June 11, 2008 suggest adding the interface. LSB 4.1.

- dlvsym
    I really don't understand why we have symbol versioning support in the LSB database but don't allow apps to check a symbol's version through glibc routines. Propose for LSB 4.1. It is specified well enough to use, but one line in a manpage is insufficient for the standard; that is easily supplemented. Of course, it may be hacked around (as in libchk), but not having it is nonsense IMO.
    Jeff: a discussion is to be initiated.

- endmntent
  getmntent_r
  setmntent
    mntent = ``mount entry''. Routines to scan the /etc/fstab file. Probably the reason for them is portability with other OSes, I don't really know. Anyway, they do have a man page. 60+ apps use them. 4.1/inline. Most likely 4.1. LSB exists for such interfaces to be taken into account. ``Don't parse /etc/fstab directly, use Interfaces!'' says LSB to ISVs.

- gnu_get_libc_release
  gnu_get_libc_version
    LSB specifies their return values :-)

- madvise
    can be replaced with posix_madvise unless 2.6.16+ kernel behavior is used, which is unlikely.

- memalign
    can be replaced with posix_memalign.

- mincore
    needed by libnio.so.
    Source: jdk/src/solaris/native/java/nio/MappedByteBuffer.c. Used in the function Java_java_nio_MappedByteBuffer_isLoaded0, which returns whether all pages are loaded. From the mailing lists (the reference manual confirms this):

        The return value of is_load is just a hint, rather than a guarantee. In windows, MappedByteBuffer.isLoaded() will always return false since it does not support this operation. While in linux, there exists corresponding api to support this functionality.

    So, a possible solution is to load it dynamically.

- pthread_getattr_np
    non-portable. Source files: hotspot/src/os_cpu/linux_sparc/vm/os_linux_sparc.cpp and hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp.
    The results it yields may be calculated without the function, though that's not very easy. As the code stands, this function is absolutely required: Java explicitly fails immediately after pthread_getattr_np returns a non-zero code.
    getstackaddr/getstacksize functions? Pthread has them, but they are merely accessors to the `attr' parameter, which is portably available via pthread calls only for the current thread.

libsaproc - ptrace
    read "size" bytes of data from "addr" within the target process. Like wine -- see ptrace's LSB bug.

- snd_*
    Lots of ALSA interfaces.

- snd_rawmidi_info
  snd_rawmidi_get_card
    not even in trial use. They looked easily inlineable, but no, they can't be inlined: they are accessors to an opaque type. Use dlopen.

- syscall
    syscall(__NR_fstatat64) - fstatat (POSIX.1-2008). OK/inline -- __fxstatat64 is in LSB !!!

    syscall(SYS_gettid) - used to get the `kernel thread id' to access proc
        1) hotspot/src/os/linux/vm/os_linux.cpp
            a) access proc to determine whether we're in an `unsafe chroot' -- we can avoid it by just checking for proc
            b) print /proc/pid/maps just to dump them when an error occurred
            c) NOTHING more, only debug dumps of sorts.
        So, it can easily be replaced with another unique-number-generation mechanism: pthread_self();

    syscalls to fork/execve - old issue
        Appears in [hotspot/src/os/linux/vm/os_linux.cpp], subroutine os::fork_and_exec.
        More info on it -- see appendix D.

    syscall(SYS_clock_getres, x, y) - clock_getres, in LSB already.

    VVVV the issues below are in solaris files (this code is not compiled on Linux, so it causes no problems there)
    syscall(SYS_nanosleep) - nanosleep()
    syscall(SYS_ioctl)
    syscall(SYS_sigprocmask)
    ^^^^

- sysinfo
    Used in only ONE place! It provides the implementation for one of the com.sun.management classes (the one that gets the system's swap information). See ./solaris/native/com/sun/management/UnixOperatingSystem_md.c
    IMHO - add. Because, well, this is exactly the kind of interface that exists so that you don't have to dig in /proc.

JDK - td_*
    libthread_db interfaces. See section 4 of this report.

6. UNEXPECTED INTERFACE VERSION

- pread
    Current glibc is too new. Will be fixed by lsbcc.

7. DEPRECATED INTERFACE USED

- gethostbyaddr
  gethostbyaddr_r
  gethostbyname
  gethostbyname_r
    no problems. The thing is that I am not a pro in these socket mechanics; I postponed a closer look and didn't get to it in time.

- strerror_r
    Just use the underlying implementation instead of this convenience function. No problem then.

- tempnam
    used in the unpacker for logging. Hm, a symlink security hole. Minor issue, easy fix.

So, deprecated interfaces are not an issue.

8. IOCTL() ENCOUNTERED

lsbcc compiles all ioctls, so I think the flags are ok.

9. DLOPEN() ENCOUNTERED

The dlopens were reviewed and they do no harm. Some system components are dlopened, but if that fails Java throws an exception and doesn't fail in a hard way.

Most dlopens load components of Java itself. If such a dlopen fails, Java crashes. For that not to happen, all such components must be present (that's ok), must dlopen correctly, and must yield no runtime errors after a lazy dlopen succeeds. I had to fix the dlopening of the sound subsystem, as it could fail and crash Java when ALSA is absent from the system. The other components are LSB-compliant, and successful dlopening of them is guaranteed.

10. NON-LSB COMMAND USED

- unzip
    Used only in bootstrap. My mistake that I included it.

11. INCLUDE ERROR

No issues.

12. OTHER PROBLEMS

Dynamic tag and section sizes. Should try lsbcc.

13. PROC ISSUES

/proc/self/fd - get the list of open file descriptors and close them
    [./test/java/nio/channels/spi/SelectorProvider/inheritedChannel/Launcher.c]

    Implementation of com.sun.management.UnixOperatingSystem [8], getOpenFileDescriptorCount.
    [./src/solaris/native/com/sun/management/UnixOperatingSystem_md.c]
    Fix: run i from 0 to sysconf(_SC_OPEN_MAX) and count every i for which fstat(i, &st) does not fail with errno == EBADF.

    Something that can't be dropped: close all descriptors. (Uses magic inside :-D )
    [./src/solaris/native/java/lang/UNIXProcess_md.c]. Has a fallback like the one described above.

/proc/asound/version - get the ALSA version description string
    We may replace it with something like `0.9', if we want it that much.
    [src/solaris/native/com/sun/media/sound/PLATFORM_API_LinuxOS_ALSA_PCMUtils.c]
    [src/solaris/native/com/sun/media/sound/PLATFORM_API_LinuxOS_ALSA_MidiUtils.c]

/proc/self/stat - to get the current process's virtual memory size in bytes
    Implementation of [8], getCommittedVirtualMemorySize(). May be dropped (?).
    [src/os/linux/vm/os_linux.cpp] - use as pointer. Has fallback.

/proc - iterate through all processes, invoking a callback for each
    Replace with the ps command.
    [./src/solaris/native/sun/tools/attach/LinuxVirtualMachine.c]

/proc/net/if_inet6 - to determine whether IPv6 is supported
        /* Since we have initialized and loaded the Socket library we will
           check now to whether we have IPv6 on this platform and if the
           supporting socket APIs are available */
    [./src/solaris/native/java/net/net_util_md.c] and [./src/solaris/native/java/net/net_util.c]
    The function IPv6_supported() is used in the same files, but further on it's called from many, many places, so it is needed. IPv6 may be checked via a `socket' call and handling the error there. But well... LSB DOES support IPv6. So this function should just return true.

/proc/net/if_inet6 - to get the loopback interface's `internal' ID
    WHY??? -- Bug workaround. Hm, need to check.
    I can't figure out what this is for.
    [./src/solaris/native/java/net/net_util_md.c]

/proc/net/ipv6_route - get routes and interface names
    Use netstat -r -6 instead (ok; maybe easier?)
    Ok, LSB says getting routes is evil, so I must get rid of this. These routes are fetched for the IPv6 mess with scope_id. Suggest leaving them alone.
    [./src/solaris/native/java/net/net_util_md.c]
    EHM, wait. The Java machine really does perform the routing for link-local addresses??? Did I miss something????

/proc/version - kernel version, if uname failed
    [./src/solaris/native/java/net/PlainDatagramSocketImpl.c]

/proc/self/exe - symlink to the current executable
    Has a fallback implemented right there (reading argv[0]).
    [./src/solaris/bin/java_md.c]

/proc/self/maps
    [hotspot/src/os/linux/vm/os_linux.cpp] ``Find the virtual memory area that contains addr'' -- concerns getting a pointer to the top of the current thread's stack; a bug workaround for getattr_np on the initial thread. Has fallback.
    [same] print_dll_info - just prints it.

/proc/meminfo
    [hotspot/src/os/linux/vm/os_linux.cpp] get page size, has fallback.

SOURCE CODE REPORT

NAME_MAX
    pathconf(_PC_NAME_MAX)

getsockopt(-, -, SO_PEERCRED...)
    [hotspot/src/os/linux/vm/attachListener_linux.cpp]
        struct ucred cred_info;
        getsockopt(s, SOL_SOCKET, SO_PEERCRED, (void*)&cred_info, &optlen)
        if (cred_info.uid != euid || cred_info.gid != egid) { ... }
    see http://www.welz.org.za/notes/on-peer-cred.html
    This stuff is unspecified. It also relies on the non-LSB `struct ucred' type. Appeared before 2004 (according to the man-pages changelog). Add?

/home/pavel/openjdk/lsbcc-jdk/hotspot/src/os/linux/vm/os_linux.cpp
    1) alloca
        a) grow the stack of a thread
        b) `randomize' the cache when starting a new thread -- pointless in Java, the kernel will do it.

MAP_NORESERVE flag for mmap
    Probably LSB 4.1. With this flag no swap space is reserved for the mapping.

if (sig > 0 || sig < _NSIG)
    Well... is it a bug? %-)
    _NSIG -> NSIG and || -> &&, and it will be OK.
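To see why the quoted signal check looks like a bug, here is a minimal sketch in C. The function names are made up for illustration, and the code assumes glibc, where _GNU_SOURCE exposes NSIG; it is not the HotSpot source.

```c
#define _GNU_SOURCE   /* assumption: glibc, where this makes NSIG visible */
#include <signal.h>
#include <stdbool.h>

/* The check as quoted in the report: with `||' it holds for every int,
   because any sig <= 0 is still less than _NSIG. */
static bool sig_check_as_written(int sig) {
    return sig > 0 || sig < _NSIG;
}

/* The form proposed in the report: valid signal numbers are 1 .. NSIG-1. */
static bool sig_check_fixed(int sig) {
    return sig > 0 && sig < NSIG;
}
```

With the `||' form, out-of-range values such as -5 or 100000 pass the check; the `&&' form rejects them.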
uc->uc_mcontext.gregs[REG_TRAPNO]
REG_SP
REG_PC
REG_FP
    these registers are undefined in LSB.

IPV6_CHECKSUM flag for setsockopt
    File: jdk/src/solaris/native/java/net/Inet6AddressImpl.c
    It seems that in new kernels this call will fail with EINVAL, because it somehow violates an RFC. See [12].

MAXHOSTNAMELEN constant
    Something like NAME_MAX? -- no. gethostname returns an error when the length is insufficient, and then we double the size of the buffer.

RTLD_NOLOAD flag for dlopen
    Non-POSIX. File: jdk/src/solaris/native/sun/security/pkcs11/j2secmod_md.c
    IMHO add, as well as another dlopen flag. Useful!

IFHWADDRLEN
    in memcpy(buf, &ifr.ifr_hwaddr.sa_data, IFHWADDRLEN);
    in [src/solaris/native/java/net/NetworkInterface.c]
    The length of ifr_hwaddr.sa_data in bytes. LSB supports fetching this address (the SIOCGIFHWADDR ioctl option), but doesn't specify this constant, omg! Fix it.

mess with epoll events
    jdk/src/solaris/native/sun/nio/ch/EPoll*

MSG_FIN
    /* Initiate graceful shutdown process. */
    [jdk/src/solaris/native/sun/nio/ch/Sctp.h]
    Close thread in JDK's SCTP implementation. Some MSG_ constants are specified in POSIX. Their values are arch-independent. This constant comes from a mere copy-and-paste of the third-party library libsctp. There, SCTP_EOF is equal to MSG_FIN. I don't know why, but the numeric value of MSG_FIN is passed directly to the kernel, which obviously keeps it compatible. So, I'd replace MSG_FIN with 0x200: if this value ever changes, Java would not merely need to be recompiled, OpenJDK's code would have to be modified anyway! Therefore, it's sane to inline the value. Not in IcedTea6.

CONCLUSION

Some of the less major issues concern debugging interfaces. I think LSB should have them, as DEs should provide debugging facilities. However, that's not the main goal.

The severity of the FT_* issues is not high, although building freetype is a tricky process involving tricky mechanisms (ask Mats ;)), so I still haven't succeeded in it.

The other issues are not tricky and require mere labor.
That said, some solutions (such as replacing pthread_getattr_np with an internally managed structure of threads' parameters) are never going to be accepted by JDK developers and, TBH, make little sense compared with LSBing these interfaces.

The debugger question is still under consideration. What is called the JRE contains a debugging back-end (see appendix B): libsaproc.

So, the original estimate of the project's complexity still holds. However, due to the more extensive use of proc, ptrace and dlopen, the initial investigation stage may be extended.

FUTURE

Add notes about the interfaces required for 4.1 to the LSB bugzilla.

APPENDIX A

A quote from GCC's source ChangeLog files.

    2006-02-27  Jakub Jelinek

        PR other/26208
        <...>
        (_Unwind_GetIPInfo): New function.
        * unwind-dw2.h (_Unwind_FrameState): Add signal_frame field.
        * unwind-c.c (PERSONALITY_FUNCTION): Use _Unwind_GetIPInfo instead
        of _Unwind_GetIP.
        * unwind-sjlj.c (_Unwind_GetIPInfo): New function.
        * unwind-generic.h (_Unwind_GetIPInfo): New prototype.
        * unwind-compat.c (_Unwind_GetIPInfo): New function.
        * libgcc-std.ver (_Unwind_GetIPInfo): Export @@GCC_4.2.0.
        * config/ia64/unwind-ia64.c (_Unwind_GetIPInfo): New function.
        * config/arm/unwind-arm.h (_Unwind_GetIPInfo): Define.

APPENDIX B - Debugger

In the makefile, libsaproc is built under
    echo "Making SA debugger back-end..."

libsaproc is solely responsible for the use of
    libthread_db
    ptrace

Check this out for LinuxDebuggerLocal:
http://www.docjar.org/docs/api/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.html

But the serviceability agent is also tied into the implementation of the sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal class. That may be an obstacle for JCK certification. But anyway, it will be loaded dynamically.

APPENDIX C - Operating system info and OperatingSystemMXBean interfaces

There are a couple of interfaces (their implementation is available via a singleton object) that aim to get properties of the OS that runs the Java machine.
Interface OperatingSystemMXBean [8]:

    long getCommittedVirtualMemorySize()   ( !!! )
        Returns the amount of virtual memory that is guaranteed to be available to the running process in bytes, or -1 if this operation is not supported.
        < /proc/self/stat >  Always return -1 ???

    long getFreePhysicalMemorySize()
        Returns the amount of free physical memory in bytes.
        < sysconf > - non-POSIX, but LSB-ok.

    long getFreeSwapSpaceSize()   ( !!! )
        Returns the amount of free swap space in bytes.
        < sysinfo > -- NOT in LSB!

    long getProcessCpuTime()
        Returns the CPU time used by the process on which the Java virtual machine is running in nanoseconds.
        < times > OK

    long getTotalPhysicalMemorySize()
        Returns the total amount of physical memory in bytes.
        < sysconf > - non-POSIX, but LSB-ok.

    long getTotalSwapSpaceSize()   ( !!! )
        Returns the total amount of swap space in bytes.
        Same as getFreeSwapSpaceSize().

Interface UnixOperatingSystemMXBean [9]:

    long getMaxFileDescriptorCount()
        Returns the maximum number of file descriptors.
        < getrlimit > - OK

    long getOpenFileDescriptorCount()
        Returns the number of open file descriptors.
        < reads /proc/self/fd >  Worked around.

APPENDIX D. Where do we use the notorious fork_and_exec calls
(they're claimed to be available even in signal-handling code)

[hotspot/src/os/linux/vm/vmError_linux.cpp]
    The VM pleasantly runs the debugger for us! How cute =). Unfortunately LSB does not support cute apps. :-(

[hotspot/src/share/vm/runtime/os.hpp]
    Just the declaration of this function.

[hotspot/src/share/vm/runtime/thread.cpp]
    Only takes effect when the `KERNEL' configuration variable is defined. It seems not to be defined on Linux [grepping makefiles], but additional checking is required. This is what it calls:
        %s/bin/java %s -Dkernel.background.download=false sun.jkernel.DownloadManager -download client_jvm

[hotspot/src/share/vm/utilities/vmError.cpp]
    The app is crashing and dying; the commands are executed at the time of the crash.
    They may be useful, for example, for mailing logs to the sysadmin.

That's ... ALL??? Why so much noise then? Of course, in the future there may be more, etc. etc., but still... damn. Ah, got it. In emergency circumstances, calling atfork handlers may crash our app even more.

Okay, now as to replacing them with the usual fork()-exec(). As I understand it, the problem with fork is (from the pthread_atfork rationale) that (a) some non-async-safe external routines may be called in the child process, or (b) the child hangs waiting for a mutex owned by another thread, which is not forked because only the forking thread is duplicated. All these problems are really irrelevant when doing an exec() immediately after fork() and an _exit() immediately after a failed exec(). What is relevant is the possible misbehavior of pthread_atfork handlers. However, OpenJDK does NOT use this routine at all! (verified by grep) So, we replace these cumbersome syscalls with a mere fork/exec.

Just to be sure: see man pthread_atfork, and [10].

But the problem remains in the emergency case. Java devs mailed.
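The fork/exec/_exit pattern discussed above can be sketched as follows. This is only an illustration under assumptions, not the os::fork_and_exec source: the helper name run_shell_command, the use of /bin/sh -c, and the 127 exit code for a failed exec are all choices made for the sketch.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `cmd' through /bin/sh using plain fork()/execve().
   Returns the child's exit code, 0x80 + signal number if it was killed by a
   signal, or -1 if fork() or waitpid() failed. */
static int run_shell_command(const char *cmd) {
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;                      /* fork failed */
    }
    if (pid == 0) {
        /* Child: exec immediately, touching nothing else. */
        execve("/bin/sh", argv, environ);
        _exit(127);                     /* reached only if execve failed */
    }
    /* Parent: wait for the child, retrying when interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        return 0x80 + WTERMSIG(status);
    }
    return -1;
}
```

The child calls nothing between fork() and execve(), and uses _exit() rather than exit() so that no atexit handlers or stdio flushing run in the forked copy. Whether this is acceptable in the crash-time path is exactly the open question raised above.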
LINKS

[1] Thread where the report on Sun JRE 6 compliancy was posted and discussed
    https://lists.linux-foundation.org/pipermail/lsb-discuss/2008-April/004936.html
[2] Drop the -lnsl flag to compile without libnsl.so.1 (in Russian)
    http://www.opennet.ru/openforum/vsluhforumID9/6929.html
[3] Solaris manpage for libthread_db.so
    http://docs.sun.com/app/docs/doc/817-0680/6mgfc6njs?a=view
[4] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27880
[5] GCC manual for Thread-Local Storage (TLS)
    http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Thread_002dLocal.html#Thread_002dLocal
[6] bug 2187
    http://bugs.linuxbase.org/show_bug.cgi?id=2187
[7] bug 1739
    http://bugs.linuxbase.org/show_bug.cgi?id=1739
[8] JRE 6 docs
    http://java.sun.com/javase/6/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html
[9] JRE 6 docs
    http://java.sun.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html
[10] comp.programming.threads FAQ
    http://www.lambdacs.com/cpt/FAQ.html
[11]
[12] Thread of kernel devs
    http://marc.info/?l=linux-netdev&m=120851487828101&w=2