Sunday, April 04, 2010

Troubleshooting: process scheduler and distribution

When debugging the process scheduler (or the AppServer), the first thing to check is the certification matrix. It may sound like basic advice, but it is very often forgotten. The certification matrix guarantees that a given combination of components must work.
Note that I said "must work", and the reverse does not hold: if a combination is not certified, it does not necessarily mean that it does not work, but if it is certified, it must work.

The first two components to check are the OS wordsize and the wordsize of the database client (32-bit or 64-bit). Depending on your PeopleTools version, the requirements can be very different, both from a certification point of view and from a practical one.
These two points are not specific to the process scheduler; they also apply to the AppServer, which is why the tests below include examples with both the AppServer and the process scheduler (prcs).

1. The OS wordsize
I don't have any troubleshooting test cases for this point, but this constraint should be easy to respect.

With PeopleTools 8.49, there are two situations:
=> Before PeopleTools 8.49.14: only Linux 32-bit is certified
=> From PeopleTools 8.49.14 onwards: Linux 64-bit is also certified

With PeopleTools 8.50, there is only one certification path:
=> PeopleTools 8.50.xx: only 64-bit is certified (regardless of the OS: Unix, Linux or Windows)
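The OS wordsize rules above are simple enough to encode as a pre-install check. The sketch below is only an illustration, not an Oracle tool: the function names `parse_tools_version` and `certified_os_wordsizes` are hypothetical, only the combinations listed above are covered, and the certification matrix on My Oracle Support remains the reference.

```python
# Hypothetical pre-install check encoding the OS wordsize rules of this post.
# Anything not listed above returns an empty set: check the matrix yourself.


def parse_tools_version(version: str) -> tuple[int, ...]:
    """Turn a PeopleTools release such as '8.49.14' into (8, 49, 14)."""
    return tuple(int(part) for part in version.split("."))


def certified_os_wordsizes(tools_version: str, os_name: str) -> set[int]:
    """Return the OS wordsizes (32/64) certified for a PeopleTools release."""
    release = parse_tools_version(tools_version)
    os_name = os_name.lower()
    if release[:2] == (8, 50):
        # 8.50.xx: 64-bit only, whatever the OS (Unix, Linux, Windows).
        return {64}
    if release[:2] == (8, 49) and os_name == "linux":
        # Linux 64-bit is certified from 8.49.14 onwards, 32-bit before that.
        patch = release[2] if len(release) > 2 else 0
        return {32, 64} if patch >= 14 else {32}
    return set()
```

For example, `certified_os_wordsizes("8.49.20", "Linux")` gives `{32, 64}`; an empty set only means the case is not covered by this post.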

2. The database client libraries
2.1 PeopleTools 8.49 (and below)
Whatever the wordsize of the OS PeopleTools runs on, it must use the Oracle 32-bit client libraries.
Here we can distinguish two cases: before Oracle 11gR2, and from 11gR2 onwards.
2.1.1 Before 11gR2
Here is a test case with PeopleTools 8.49.20 and Oracle 11.1.0.7 64-bit on Linux OEL 5.3 64-bit. The test below is done with the AppServer, but it is exactly the same with the Process Scheduler.
[oracle@orion2:/apps/oracle/product/11.1.0/bin]$ file oracle
oracle: setuid setgid ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
[oracle@orion2:/apps/oracle/product/11.1.0/bin]$
----------------------------------------------
Quick-configure menu -- domain: DMOHRMS9
----------------------------------------------
Features Settings
========== ==========
1) Pub/Sub Servers : Yes 15) DBNAME :[DMOHRMS9]
2) Quick Server : No 16) DBTYPE :[ORACLE]
3) Query Servers : No 17) UserId :[PS]
4) Jolt : Yes 18) UserPswd :[PS]
5) Jolt Relay : No 19) DomainID :[DMOHRMS9]
6) WSL : Yes 20) AddToPATH :[/apps/oracle/product/11.1.0]
7) PC Debugger : No 21) ConnectID :[people]
8) Event Notification: No 22) ConnectPswd:[peop1e]
9) MCF Servers : No 23) ServerName :[]
10) Perf Collator : No 24) WSL Port :[7000]
11) Analytic Servers : No 25) JSL Port :[9000]
12) Domains Gateway : No 26) JRAD Port :[9100]
...
Booting server processes ...

exec PSWATCHSRV -A -- -ID 114790 -C psappsrv.cfg -D DMOHRMS9 -S PSWATCHSRV :
process id=8362 ... Started.
exec PSAPPSRV -s@../psappsrv.lst -s@../psqcksrv.lst -sICQuery -sSqlQuery:SqlRequest -- -C psappsrv.cfg -D DMOHRMS9 -S PSAPPSRV :
CMDTUX_CAT:1685: ERROR: Application initialization failure

tmboot: CMDTUX_CAT:827: ERROR: Fatal error encountered; initiating user error handler

exec tmshutdown -qy

==============ERROR!================
Boot attempt encountered errors!. Check the TUXEDO log for details.
==============ERROR!================

Do you wish to see the error messages in the APPSRV.LOG file? (y/n) [n] :y

PSADMIN.8321 (0) [04/04/10 16:12:14](0) Begin boot attempt on domain DMOHRMS9
PSWATCHSRV.8362 (0) [04/04/10 16:12:22] Checking process status every 120 seconds
PSWATCHSRV.8362 (0) [04/04/10 16:12:22] Server started
PSAPPSRV.8363 (0) [04/04/10 16:12:22](0) PeopleTools Release 8.49.20 (Linux) starting
PSAPPSRV.8363 (0) [04/04/10 16:12:23](0) Cache Directory being used: /apps/psoft/hrms9/appserv/DMOHRMS9/CACHE/PSAPPSRV_1/
PSAPPSRV.8363 (0) [04/04/10 16:12:23](1) GenMessageBox(200, 0, M): PS General SQL Routines: Missing or invalid version of SQL library libpsora (200,0)
PSAPPSRV.8363 (0) [04/04/10 16:12:23](1) GenMessageBox(0, 0, M): Database Signon: Could not sign on to database DMOHRMS9 with user PS.
PSAPPSRV.8363 (0) [04/04/10 16:12:23](0) Server failed to start
PSWATCHSRV.8362 (0) [04/04/10 16:12:24] Shutting down
PSADMIN.8321 (0) [04/04/10 16:12:30](0) End boot attempt on domain DMOHRMS9

Do you wish to see the error messages in the TUXLOG.040410 file? (y/n) [n] :
We can check the stderr file of the domain we are trying to start:
[hrms9@orion2:/apps/psoft/hrms9/appserv/DMOHRMS9]$ more stderr
dlopen in libpscompat failed for 'libpsora.so': libclntsh.so.9.0: cannot open shared object file: No such file or directory
As suggested in the certification matrix on My Oracle Support, a symbolic link must be created to work around this. The Oracle installation is 64-bit, but we can use its lib32 libraries:
[oracle@orion2:/apps/oracle/product/11.1.0/lib]$ ln -s /apps/oracle/product/11.1.0/lib32/libclntsh.so.11.1 libclntsh.so.9.0
After that, restarting the AppServer works fine:
Booting server processes ...

exec PSWATCHSRV -A -- -ID 114790 -C psappsrv.cfg -D DMOHRMS9 -S PSWATCHSRV :
process id=8460 ... Started.
exec PSAPPSRV -s@../psappsrv.lst -s@../psqcksrv.lst -sICQuery -sSqlQuery:SqlRequest -- -C psappsrv.cfg -D DMOHRMS9 -S PSAPPSRV :
process id=8462 ... Started
....
15 processes started.
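The symbolic-link workaround used above can also be scripted, for instance when several servers need the same fix. This is a sketch under stated assumptions: `link_legacy_clntsh` and its `dry_run` switch are hypothetical names, the paths follow the 11.1.0 layout shown above (lib and lib32 under the Oracle home), and it should be run as the Oracle software owner, as the `ln -s` was.

```python
# Hypothetical helper reproducing the ln -s workaround for PeopleTools 8.49
# on an Oracle 11gR1 64-bit home: expose the 32-bit libclntsh.so.11.1 (lib32)
# under the libclntsh.so.9.0 name that libpsora.so tries to dlopen().
import os


def link_legacy_clntsh(oracle_home: str, dry_run: bool = True) -> str:
    """Create lib/libclntsh.so.9.0 -> lib32/libclntsh.so.11.1; return the link path."""
    target = os.path.join(oracle_home, "lib32", "libclntsh.so.11.1")
    link = os.path.join(oracle_home, "lib", "libclntsh.so.9.0")
    if not os.path.exists(target):
        # An 11gR2 64-bit home has no lib32: a real 32-bit client is needed instead.
        raise FileNotFoundError(f"no 32-bit client library at {target}")
    if os.path.lexists(link):
        raise FileExistsError(f"{link} already exists")
    if not dry_run:
        os.symlink(target, link)
    return link
```

`link_legacy_clntsh("/apps/oracle/product/11.1.0", dry_run=False)` is equivalent to the `ln -s` command above. It refuses to run when lib32 is missing, which is the 11gR2 situation.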
2.1.2 With 11gR2
Here is a test case with PeopleTools 8.49.20 and Oracle 11.2.0.1 64-bit on Linux OEL 5.3 64-bit.
Booting admin processes ...

exec BBL -A :
process id=8992 ... Started.

Booting server processes ...

exec PSMSTPRC -A -- -C psprcs.cfg -CD DMOHRMS9 -PS PSUNX -A start -S PSMSTPRC :
CMDTUX_CAT:1685: ERROR: Application initialization failure

tmboot: CMDTUX_CAT:827: ERROR: Fatal error encountered; initiating user error handler

exec tmshutdown -y

Shutting down all admin and server processes in /apps/psoft/hrms9/appserv/prcs/DMOHRMS9/PSTUXCFG

Shutting down server processes ...


Shutting down admin processes ...

Server Id = 0 Group Id = orion2.phoenix-nga Machine = orion2.phoenix-nga: shutdown succeeded
1 process stopped.
And the stderr file:
[hrms9@orion2:/apps/psoft/hrms9/appserv/prcs/DMOHRMS9]$ more stderr
dlopen in libpscompat failed for 'libpsora.so': libclntsh.so.9.0: cannot open shared object file: No such file or directory
With Oracle 11gR2 64-bit, there are no 32-bit libraries anymore (as I described here), so we can try the same symbolic link as above, but it will not work:
[oracle@orion2:/apps/oracle/product/11.2.0/lib]$ ln -s libclntsh.so.11.1 libclntsh.so.9.0
...
Booting server processes ...

exec PSWATCHSRV -A -- -ID 111990 -C psappsrv.cfg -D DMOHRMS9 -S PSWATCHSRV :
process id=7625 ... Started.
exec PSAPPSRV -s@../psappsrv.lst -s@../psqcksrv.lst -sICQuery -sSqlQuery:SqlRequest -- -C psappsrv.cfg -D DMOHRMS9 -S PSAPPSRV :
CMDTUX_CAT:1685: ERROR: Application initialization failure

tmboot: CMDTUX_CAT:827: ERROR: Fatal error encountered; initiating user error handler

exec tmshutdown -qy

==============ERROR!================
Boot attempt encountered errors!. Check the TUXEDO log for details.
==============ERROR!================

Do you wish to see the error messages in the APPSRV.LOG file? (y/n) [n] :y

PSADMIN.7615 (0) [04/04/10 16:00:30](0) Begin boot attempt on domain DMOHRMS9
PSWATCHSRV.7625 (0) [04/04/10 16:00:38] Checking process status every 120 seconds
PSWATCHSRV.7625 (0) [04/04/10 16:00:38] Server started
PSAPPSRV.7626 (0) [04/04/10 16:00:39](0) PeopleTools Release 8.49.20 (Linux) starting
PSAPPSRV.7626 (0) [04/04/10 16:00:39](0) Cache Directory being used: /apps/psoft/hrms9/appserv/DMOHRMS9/CACHE/PSAPPSRV_1/
PSAPPSRV.7626 (0) [04/04/10 16:00:39](1) GenMessageBox(200, 0, M): PS General SQL Routines: Missing or invalid version of SQL library libpsora (200,0)
PSAPPSRV.7626 (0) [04/04/10 16:00:39](1) GenMessageBox(0, 0, M): Database Signon: Could not sign on to database DMOHRMS9 with user PS.
PSAPPSRV.7626 (0) [04/04/10 16:00:39](0) Server failed to start
PSWATCHSRV.7625 (0) [04/04/10 16:00:40] Shutting down
PSADMIN.7615 (0) [04/04/10 16:00:46](0) End boot attempt on domain DMOHRMS9

Do you wish to see the error messages in the TUXLOG.040410 file? (y/n) [n] :
...
[hrms9@orion2:/apps/psoft/hrms9/appserv/DMOHRMS9]$ more stderr
dlopen in libpscompat failed for 'libpsora.so': libclntsh.so.9.0: wrong ELF class: ELFCLASS64
As expected, it does not work: the 64-bit libraries are not accepted by PeopleTools 8.49.xx.
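The "wrong ELF class: ELFCLASS64" message comes from the dynamic loader: byte 4 of every ELF file (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit one, and a 32-bit process cannot load a 64-bit library. The `file` command used earlier reports the same information; the sketch below, with the hypothetical function name `elf_wordsize`, reads that byte directly so the check can be scripted against a client library before booting a domain.

```python
# Read EI_CLASS from an ELF header: 1 = ELFCLASS32, 2 = ELFCLASS64.
ELF_MAGIC = b"\x7fELF"
EI_CLASS_TO_WORDSIZE = {1: 32, 2: 64}


def elf_wordsize(path: str) -> int:
    """Return 32 or 64 for an ELF file; raise ValueError for anything else."""
    with open(path, "rb") as handle:  # open() follows symbolic links
        header = handle.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError(f"{path} is not an ELF file")
    try:
        return EI_CLASS_TO_WORDSIZE[header[4]]
    except KeyError:
        raise ValueError(f"{path} has an unknown ELF class {header[4]}") from None
```

Because `open()` follows symbolic links, calling it on the libclntsh.so.9.0 link reports the class of the library it points to; given the error above, for the 11gR2 64-bit home it would be expected to return 64.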
A real 32-bit client must be installed, and the AppServer and Process Scheduler domains must point to it.
After installing it and reconfiguring the domain, everything starts:
----------------------------------------------
Quick-configure menu -- domain: DMOHRMS9
----------------------------------------------
Features Settings
========== ==========
1) Pub/Sub Servers : Yes 15) DBNAME :[DMOHRMS9]
2) Quick Server : No 16) DBTYPE :[ORACLE]
3) Query Servers : No 17) UserId :[PS]
4) Jolt : Yes 18) UserPswd :[PS]
5) Jolt Relay : No 19) DomainID :[DMOHRMS9]
6) WSL : Yes 20) AddToPATH :[/apps/oracle/product/11.2.0_client_32bit]
7) PC Debugger : No 21) ConnectID :[people]
8) Event Notification: No 22) ConnectPswd:[peop1e]
9) MCF Servers : No 23) ServerName :[]
10) Perf Collator : No 24) WSL Port :[7000]
11) Analytic Servers : No 25) JSL Port :[9000]
12) Domains Gateway : No 26) JRAD Port :[9100]

Actions
=========
13) Load config as shown
14) Custom configuration
h) Help for this menu
q) Return to previous menu

HINT: Enter 15 to edit DBNAME, then 13 to load

Enter selection (1-26, h, or q):
Booting server processes ...

exec PSWATCHSRV -A -- -ID 184396 -C psappsrv.cfg -D DMOHRMS9 -S PSWATCHSRV :
process id=7907 ... Started.
exec PSAPPSRV -s@../psappsrv.lst -s@../psqcksrv.lst -sICQuery -sSqlQuery:SqlRequest -- -C psappsrv.cfg -D DMOHRMS9 -S PSAPPSRV :
process id=7908 ... Started.
...
15 processes started.
2.2 PeopleTools 8.50
2.2.1 Linux

On Linux, PeopleTools 8.50 is fully certified with the Oracle 64-bit client. At least on Linux, there is no issue anymore: a 32-bit or 64-bit client should be fine. The symbolic link workaround described above for previous PeopleTools versions is no longer needed, and no extra client is required.

2.2.2 Windows
On Windows, however, it is a bit confusing. Once again, PeopleTools 8.50 (App/Batch servers) is certified only on a 64-bit OS, but the database client must be 32-bit.
Let's see.
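Before looking at the tests, the client-library rules from sections 2.1 and 2.2 can be summarized in the same way as the OS rules. Again a hypothetical sketch (`client_wordsizes` is not a PeopleSoft API) that only knows the cases described in this post; note that for 8.50 on Linux a 32-bit client is described as "should be fine", which is not the same as certified.

```python
# Hypothetical summary of the client-library rules of sections 2.1 and 2.2.
# Unknown combinations return an empty set: check the certification matrix.


def client_wordsizes(tools_version: str, os_name: str) -> set[int]:
    """Return the Oracle client wordsizes expected to work for a release and OS."""
    major_minor = tuple(int(part) for part in tools_version.split(".")[:2])
    os_name = os_name.lower()
    if major_minor <= (8, 49):
        # 8.49 and below: 32-bit client libraries, whatever the OS wordsize.
        return {32}
    if major_minor == (8, 50):
        if os_name == "windows":
            # 64-bit OS only, but the database client must be 32-bit.
            return {32}
        if os_name == "linux":
            # 64-bit client certified; a 32-bit one should also be fine.
            return {32, 64}
    return set()
```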
=> Oracle client 64-bit libraries
Here is a test with PeopleTools 8.50.02, starting the process scheduler on W2k8 64-bit with Oracle 10.2.0.4 64-bit.
Booting admin processes ...

exec BBL -A :
process id=2448 ... Started.

Booting server processes ...

exec PSMSTPRC -o ".\LOGS\stdout" -e ".\LOGS\stderr" -A -- -CD h92tmplt -PS PSNT -A start -S PSMSTPRC :
CMDTUX_CAT:1685: ERROR: Application initialization failure
tmboot: CMDTUX_CAT:827: ERROR: Fatal error encountered; initiating user error handler

tmshutdown -y
Shutting down all admin and server processes in F:\appl\pt85002\appserv\prcs\h92tmplt\PSTUXCFG
Shutting down server processes ...

Shutting down admin processes ...
Server Id = 0 Group Id = ANTLIA Machine = ANTLIA: shutdown succeeded
1 process stopped.
As expected, it does not start. As usual, the first file to check is stderr:

LoadLibraryA() in pscompat.dll failed for 'PSORA.dll': reason=126
Not a very helpful message, but looking more closely at the configuration, the DBBIN value did not include the \bin directory. It should, so let's change it:
Do you want to change any values (y/n)? [n]:y
PrcsServerName [PSNT]:
DBBIN [F:\appl\oracle\10.2.0.4\server_64bit]:F:\appl\oracle\10.2.0.4\server_64bit\bin
Max Reconnect Attempt [12]:
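Since a DBBIN without its bin directory is easy to miss in psadmin, a tiny check can catch it. The sketch below assumes a Windows-style path as typed in the prcs configuration; `dbbin_points_to_bin` is a hypothetical helper, and it only checks the last path component, not whether the Oracle DLLs are really there.

```python
# Flag a DBBIN value that does not end with the Oracle client's bin directory.
# ntpath parses Windows paths on any OS, so the check can run anywhere.
import ntpath


def dbbin_points_to_bin(dbbin: str) -> bool:
    """Return True if the last component of DBBIN is 'bin' (case-insensitive)."""
    last_component = ntpath.basename(ntpath.normpath(dbbin))
    return last_component.lower() == "bin"
```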

There is a new error on prcs start; the stderr file now contains:
LoadLibraryA() in pscompat.dll failed for 'PSORA.dll': reason=193
Not helpful either in identifying that it is a client wordsize error...
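The reason= values can be decoded, though. They match standard Win32 system error codes, which is presumably what LoadLibrary reported to pscompat.dll (an assumption, the log does not say so): 126 is ERROR_MOD_NOT_FOUND (a module or one of its dependencies could not be found) and 193 is ERROR_BAD_EXE_FORMAT (not a valid Win32 application, the typical symptom of a 32-bit/64-bit mismatch). A small hypothetical lookup:

```python
# Decode the reason=NNN value printed by pscompat.dll, assuming it is a Win32
# system error code from LoadLibrary. Only the two codes met here are listed.
import re

WIN32_LOAD_ERRORS = {
    126: ("ERROR_MOD_NOT_FOUND",
          "a DLL or one of its dependencies was not found; "
          "check that DBBIN points to the client bin directory"),
    193: ("ERROR_BAD_EXE_FORMAT",
          "the DLL has the wrong format; "
          "most likely a 64-bit client where a 32-bit one is required"),
}


def explain_loadlibrary_reason(stderr_line: str) -> str:
    """Turn the reason code of a pscompat.dll stderr line into a short hint."""
    match = re.search(r"reason=(\d+)", stderr_line)
    if match is None:
        return "no reason code found"
    code = int(match.group(1))
    name, hint = WIN32_LOAD_ERRORS.get(
        code, ("UNKNOWN", "look it up in the Win32 system error codes"))
    return f"{code} {name}: {hint}"
```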

=> Oracle client 32-bit libraries
Now let's switch to a 32-bit client installed on the same server:

...
Do you want to change any values (y/n)? [n]:y
PrcsServerName [PSNT]:
DBBIN [F:\appl\oracle\10.2.0.4\server_64bit\BIN]:F:\appl\oracle\10.2.0.4\client_32bit\BIN
Max Reconnect Attempt [12]:
Reconnection Interval [300]:
...
And boot the process scheduler:
Booting admin processes ...

exec BBL -A :
process id=1152 ... Started.

Booting server processes ...

exec PSMSTPRC -o ".\LOGS\stdout" -e ".\LOGS\stderr" -A -- -CD h91tmplt -PS PSNT -A start -S PSMSTPRC :
process id=2476 ... Started.
exec PSAESRV -o ".\LOGS\stdout" -e ".\LOGS\stderr" -- -CD h91tmplt -S PSAESRV :
process id=2244 ... Started.
...
8 processes started.
OK, it is started now.

This first part can sometimes be very frustrating and time-consuming, so be very careful with the certification matrix before starting a PeopleSoft installation.

Note: all the tests below were done from a W2k8 64-bit workstation with PeopleTools 8.50.02, a PeopleSoft OVM database server (PeopleTools 8.50.02/HCM9.1), a PeopleSoft OVM App/Batch server (PeopleTools 8.50.02) and a PeopleSoft OVM PIA server (PeopleTools 8.50.02).

3. The user to be used
A very basic user has been created in the database (through the front-end application, with no role and no permission).
A new process scheduler has been created and configured to start with that user.
The prcs fails to start:
------------------------------------------------------------
Quick-configure menu -- Scheduler for Database: h91tmplt
------------------------------------------------------------
Features Settings
========== ==========
1) Master Schdlr : Yes 5) DBNAME :[h91tmplt]
2) App Eng Server : Yes 6) DBTYPE :[ORACLE]
7) PrcsServer :[PSUNX]
8) UserId :[user_prcs]
9) UserPswd :[user_prcs]
10) ConnectID :[people]
11) ConnectPswd:[peop1e]
12) ServerName :[]
Actions 13) Log/Output Dir:[%PS_SERVDIR%/log_output]
========= 14) SQRBIN :[%PS_HOME%/bin/sqr/%PS_DB%/bin]
3) Load config as shown 15) AddToPATH :[%PS_HOME%/cblbin]
4) Custom configuration
h) Help for this menu
q) Return to previous menu
...
Starting Process Scheduler Server PSUNX for Database h91tmplt ...

Booting all admin and server processes in /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/PSTUXCFG
INFO: Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level (none)

Booting admin processes ...

exec BBL -A :
process id=1970 ... Started.

Booting server processes ...

exec PSMSTPRC -o ./LOGS/stdout -e ./LOGS/stderr -A -- -CD h91tmplt -PS PSUNX -A start -S PSMSTPRC :
CMDTUX_CAT:1685: ERROR: Application initialization failure

tmboot: CMDTUX_CAT:827: ERROR: Fatal error encountered; initiating user error handler

exec tmshutdown -y

Shutting down all admin and server processes in /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/PSTUXCFG

Shutting down server processes ...

Shutting down admin processes ...

Server Id = 0 Group Id = psovmab.phoenix.nga Machine = psovmab.phoenix.nga: shutdown succeeded
1 process stopped.
Unfortunately, in this case too, there is nothing helpful in the log files:
[psadm2@psovmab ps]$ cd pt/8.50/appserv/prcs/h91tmplt/
[psadm2@psovmab h91tmplt]$ ls
Archive CACHE LOGS PSTUXCFG files log_output psprcs.cfg psprcsrv.cfx psprcsrv.env psprcsrv.ubb psprcsrv.ubx psprcsrv.val psstat.in
[psadm2@psovmab h91tmplt]$ cd LOGS
[psadm2@psovmab LOGS]$ ls -lrt
total 24
-rw-r--r-- 1 psadm2 oracle 0 Apr 4 15:44 stdout
-rw-r--r-- 1 psadm2 oracle 0 Apr 4 15:44 stderr
-rw-r--r-- 1 psadm2 oracle 315 Apr 4 15:44 MSTRSCHDLR_0404.LOG
-rw-r--r-- 1 psadm2 oracle 1369 Apr 4 15:44 TUXLOG.040410
[psadm2@psovmab LOGS]$ more TUXLOG.040410
154435.psovmab.phoenix.nga!tmloadcf.1966.3461187040.-2: 04-04-2010: client high water (0), total client (0)
154435.psovmab.phoenix.nga!tmloadcf.1966.3461187040.-2: 04-04-2010: Tuxedo Version 10.3.0.0, 64-bit
154435.psovmab.phoenix.nga!tmloadcf.1966.3461187040.-2: CMDTUX_CAT:879: INFO: A new file system has been created. (size = 980 512-byte blocks)
154435.psovmab.phoenix.nga!tmloadcf.1966.3461187040.-2: CMDTUX_CAT:871: INFO: TUXCONFIG file /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/PSTUXCFG has been created
154443.psovmab.phoenix.nga!BBL.1970.2525654112.0: 04-04-2010: Tuxedo Version 10.3.0.0, 64-bit, Patch Level (none)
154443.psovmab.phoenix.nga!BBL.1970.2525654112.0: LIBTUX_CAT:262: INFO: Standard main starting
154443.psovmab.phoenix.nga!PSMSTPRC.1971.4055057680.-2: 04-04-2010: Tuxedo Version 10.3.0.0, 64-bit
154443.psovmab.phoenix.nga!PSMSTPRC.1971.4055057680.-2: LIBTUX_CAT:262: INFO: Standard main starting
154444.psovmab.phoenix.nga!PSMSTPRC.1971.4055057680.-2: LIBTUX_CAT:250: ERROR: tpsvrinit() failed
154444.psovmab.phoenix.nga!tmboot.1969.236119584.-2: 04-04-2010: Tuxedo Version 10.3.0.0, 64-bit
154444.psovmab.phoenix.nga!tmboot.1969.236119584.-2: tmboot: CMDTUX_CAT:827: ERROR: Fatal error encountered; initiating user error handler
154447.psovmab.phoenix.nga!BBL.1970.2525654112.0: CMDTUX_CAT:26: INFO: The BBL is exiting system
[psadm2@psovmab LOGS]$ more MSTRSCHDLR_0404.LOG
PSMSTPRC.1971 (0) [04/04/10 15:44:43](0) PeopleTools Release 8.50.02 (Linux) starting. Tuxedo server is BASE(1)/102
PSMSTPRC.1971 (0) [04/04/10 15:44:43](0) Cache Directory being used: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/CACHE/PSMSTPRC_102/
PSMSTPRC.1971 (0) [04/04/10 15:44:44](0) Server failed to start
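When several log files are involved, scanning the whole LOGS directory at once saves time. In this particular case it would only surface the `tpsvrinit() failed` and `Server failed to start` lines, not the real cause, but it is a quick first pass. The function name `scan_logs` and the marker regular expression are assumptions to adapt to your own logs.

```python
# Scan every file of a domain's LOGS directory for error markers.
import os
import re

MARKERS = re.compile(r"ERROR|failed", re.IGNORECASE)


def scan_logs(log_dir: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for each matching line, oldest file first."""
    with os.scandir(log_dir) as it:
        entries = [entry for entry in it if entry.is_file()]
    entries.sort(key=lambda entry: entry.stat().st_mtime)
    hits = []
    for entry in entries:
        with open(entry.path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if MARKERS.search(line):
                    hits.append((entry.name, number, line.rstrip("\n")))
    return hits
```

Run from the domain directory, `scan_logs("LOGS")` returns the matching lines of stdout, stderr, TUXLOG and the MSTRSCHDLR log in one pass.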
Setting some trace levels does not help either. The cause is a missing role for user_prcs, the user defined to manage the prcs: AppServer Administrator.
Once this role is added to user_prcs, the process scheduler is able to boot.

From this point on, we assume the process scheduler is started. What follows is about possible distribution issues.

4. The report node definition
Now that the process scheduler is booted, let's run a process (AEMINITEST is the simplest one).
It won't post. Here are the message logs:
And the prcs log files:
[psadm2@psovmab LOGS]$ tail -20 stdout
Process Type: Application Engine
=====================================================================
OprId = user_prcs
PS_TOKEN=qQAAAAQDAgEBAAAAvAIAAAAAAAAsAAAABABTaGRyAk4Acwg4AC4AMQAwABTz9QlIS4gf8bE/nzl4yBgXufOPHWkAAAAFAFNkYXRhXXicTYo9DkBQEIS/hygV7uGFRziBn0oEvUL0QtzO4QyVTb6Z2Z09gcD3jJHfHt/EFycbBwu7dNUW1vS0RAMTDbOajpHCkZKJRBQ/LbHkyhan3lLxfla6OHgAOKYNfw==

NULL HTTP response - check Report Repository web server. (63,70)
OprId = user_prcs
PS_TOKEN=pwAAAAQDAgEBAAAAvAIAAAAAAAAsAAAABABTaGRyAk4Acwg4AC4AMQAwABShoTHXH8OI+PFN7JZBHEtlDZDrAmcAAAAFAFNkYXRhW3icTYo7DkBQFETPQ5QK+yA8vxX4VCLoFaIXYncWZ6jcZM7M5M4JeK5jjPx2+C68ONk4WNjFVc2v6WkJBiYaZn06RnJLQipFUv5jSUym/LIQK6x29tvyADjODXs=

NULL HTTP response - check Report Repository web server. (63,70)
OprId = user_prcs
PS_TOKEN=qAAAAAQDAgEBAAAAvAIAAAAAAAAsAAAABABTaGRyAk4Acwg4AC4AMQAwABQP+W2bMOEOBikcFsPqiXpcc8FbrGgAAAAFAFNkYXRhXHicTYo7DkBQEEWPT5QK+/Di87ACn0oEvUL0QuzO4lwqk5z7mZkT8D3XceS3yzfRxcnGwcIuXdWCmp6WcGCiYdalY8RmJKQiFvanJYZc2VCoGSoy/b0bCw84wQ2A

NULL HTTP response - check Report Repository web server. (63,70)
=================================Error===============================
Message: Unable to post report/log file for Process Instance: 891, Report Id: 737
Process Name: AEMINITEST, Type: Application Engine
Description: Simple AE test program
Directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
=====================================================================
[psadm2@psovmab LOGS]$ tail -20 DSTAGNT_0404.LOG
PSDSTSRV.2600 (1) [04/04/10 16:34:22 PostReport](3) HTTP transfer error.
PSDSTSRV.2600 (1) [04/04/10 16:34:22 PostReport](3) Post Report Elapsed Time: 1.6900
PSDSTSRV.2600 (2) [04/04/10 16:34:35 PostReport](3) Number of new entries to process: 1
PSDSTSRV.2600 (2) [04/04/10 16:34:35 PostReport](3) 1. Process Instance: 891/Report Id: 737/Descr: Simple AE test program
PSDSTSRV.2600 (2) [04/04/10 16:34:35 PostReport](3) from directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
PSDSTSRV.2600 (2) [04/04/10 16:34:35 PostReport](1) PSJNI: Java exception thrown: java.io.IOException: Stream closed.
PSDSTSRV.2600 (2) [04/04/10 16:34:35 PostReport](3) HTTP transfer error.
PSDSTSRV.2600 (2) [04/04/10 16:34:35 PostReport](3) Post Report Elapsed Time: 0.0100
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](3) Number of new entries to process: 1
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](3) 1. Process Instance: 891/Report Id: 737/Descr: Simple AE test program
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](3) from directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](1) PSJNI: Java exception thrown: java.io.IOException: Stream closed.
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](3) HTTP transfer error.
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](3) Post Report Elapsed Time: 0.0200
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](1) =================================Error===============================
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](1) Unable to post report/log file for Process Instance: 891, Report Id: 737
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](2) Process Name: AEMINITEST, Type: Application Engine
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](2) Description: Simple AE test program
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](2) Directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
PSDSTSRV.2600 (3) [04/04/10 16:34:50 PostReport](2) =====================================================================
[psadm2@psovmab LOGS]$
OK, the report node is missing, so let's add one.
But even after bouncing the prcs, re-sending the content does not help. Next step.
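"NULL HTTP response - check Report Repository web server" means nothing usable came back from the report node URL. Before digging into the node definition, it is worth checking from the batch server that this URL answers at all. A minimal sketch using only the Python standard library; `probe` is a hypothetical helper, and the URL in the usage line is a placeholder to replace with the one from your own report node definition.

```python
# Check whether a report repository URL answers at all (standard library only).
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code sent back by url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status: it is at least reachable.
        return err.code
    except OSError:
        # Refused connection, unknown host or timeout (URLError is an OSError
        # subclass): comparable to the "NULL HTTP response" case.
        return None
```

With a placeholder such as `probe("http://pia-host:8000/psreports/ps")`, a result of None points at the web server or the network, while any status code means the server is reachable and the problem lies elsewhere.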

5. The local node definition
Now the error is a little bit different:
[psadm2@psovmab LOGS]$ tail -20 stdout

The XML file returned by the web server is invalid. (63,94)

XML document object creation failed. (63,102)

Unable to process HTTP reply from Report Repository. (63,73)
OprId = user_prcs
PS_TOKEN=pwAAAAQDAgEBAAAAvAIAAAAAAAAsAAAABABTaGRyAk4Acwg4AC4AMQAwABRNcCgBNFm41dbNd6td8xOUAsInWGcAAAAFAFNkYXRhW3icTYk7DkBAAESfT5QK9yAsNnEAn0oEvUL0YuN2DmeiMsm8ycw4IAx8z1M+Pp+SG8fBxcYp7mpRy0hPPLHQseoZmKkMOYWcytWPlowa87EWDaXWRmnhBTkBDYY=

The XML file returned by the web server is invalid. (63,94)

XML document object creation failed. (63,102)

Unable to process HTTP reply from Report Repository. (63,73)
=================================Error===============================
Message: Unable to post report/log file for Process Instance: 891, Report Id: 737
Process Name: AEMINITEST, Type: Application Engine
Description: Simple AE test program
Directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
=====================================================================
[psadm2@psovmab LOGS]$ tail -20 DSTAGNT_0404.LOG
PSDSTSRV.2832 (1) [04/04/10 16:52:24 PostReport](3) PSJNI: Created a Java VM instance
PSDSTSRV.2832 (1) [04/04/10 16:52:24 PostReport](3) PSJNI: Set Context Class Loader
PSDSTSRV.2832 (1) [04/04/10 16:52:25 PostReport](3) HTTP transfer error.
PSDSTSRV.2832 (1) [04/04/10 16:52:25 PostReport](3) Post Report Elapsed Time: 0.8700
PSDSTSRV.2832 (2) [04/04/10 16:52:39 PostReport](3) Number of new entries to process: 1
PSDSTSRV.2832 (2) [04/04/10 16:52:39 PostReport](3) 1. Process Instance: 891/Report Id: 737/Descr: Simple AE test program
PSDSTSRV.2832 (2) [04/04/10 16:52:39 PostReport](3) from directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
PSDSTSRV.2832 (2) [04/04/10 16:52:39 PostReport](3) HTTP transfer error.
PSDSTSRV.2832 (2) [04/04/10 16:52:39 PostReport](3) Post Report Elapsed Time: 0.1000
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](3) Number of new entries to process: 1
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](3) 1. Process Instance: 891/Report Id: 737/Descr: Simple AE test program
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](3) from directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](3) HTTP transfer error.
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](3) Post Report Elapsed Time: 0.0900
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](1) =================================Error===============================
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](1) Unable to post report/log file for Process Instance: 891, Report Id: 737
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](2) Process Name: AEMINITEST, Type: Application Engine
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](2) Description: Simple AE test program
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](2) Directory: /home/psadm2/ps/pt/8.50/appserv/prcs/h91tmplt/log_output/AE_AEMINITEST_891
PSDSTSRV.2832 (3) [04/04/10 16:52:54 PostReport](2) =====================================================================
[psadm2@psovmab LOGS]$
The local node must be trusted,
and password authentication must be set.
After this change and a bounce of the prcs, we get another error, which leads to the next step.
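The (63,94) and (63,102) messages of this step say that the reply from the web server could not be turned into an XML document. The raw reply is not shown here, so the following is only an assumption about the mechanism: until the local node was trusted with password authentication, the servlet presumably answered with something other than the XML the distribution agent expects. Checking well-formedness takes a few lines with the standard library; `is_well_formed_xml` is a hypothetical name.

```python
# Tell whether a reply body is well-formed XML, which is what the
# distribution agent failed to get in the (63,94)/(63,102) errors.
import xml.etree.ElementTree as ET


def is_well_formed_xml(body: str) -> bool:
    """Return True if body parses as an XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True
```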

6. The missing role - ProcessSchedulerAdmin
Now the error is the following:
OprId = user_prcs
PS_TOKEN=pgAAAAQDAgEBAAAAvAIAAAAAAAAsAAAABABTaGRyAk4Acwg4AC4AMQAwABQ69uNPo1aucPuKHZaK64OrsLUWgmYAAAAFAFNkYXRhWnicTYk7DkBQFETPQ5QK+yD+nwX4VCLoFaIXYncWZ6Jyk3NmJvcCHNsyRvlYfOffXOycrBzypuU2DHR4IzMtiz49E1lCRCwCkf1cEsqRnH+9IKWiFjG8OSANjw==

Access Denied, unable to post file(s) to repository. (63,124)

SchedulerTransfer Servlet error. (63,74)
It looks like we are close to the goal, but a permission is missing: the role needed to post the report.
Let's add the ProcessSchedulerAdmin role to user_prcs, the user that manages the process scheduler,
and bounce the prcs.
Now it is posting.
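Sections 3 and 6 together give a small checklist of roles for the user running the process scheduler. The sketch below encodes only the two roles found in this post (the comments below mention others, such as ReportDistAdmin, that you may want to add); `missing_roles` and `REQUIRED_ROLES` are hypothetical names. The user's current roles can be listed from the front end, or on the database side usually from the PSROLEUSER table (ROLEUSER and ROLENAME columns), which is worth verifying on your release.

```python
# Hypothetical role checklist built from sections 3 and 6 of this post.
REQUIRED_ROLES = {
    # Section 3: without it, PSMSTPRC fails in tpsvrinit() and the prcs does not boot.
    "boot": {"AppServer Administrator"},
    # Section 6: without it, posting fails with "Access Denied" (63,124).
    "post": {"ProcessSchedulerAdmin"},
}


def missing_roles(user_roles: set[str]) -> dict[str, set[str]]:
    """Return, for each stage, the required roles the user does not have."""
    missing = {}
    for stage, roles in REQUIRED_ROLES.items():
        absent = roles - user_roles
        if absent:
            missing[stage] = absent
    return missing
```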


Of course, this testing is not exhaustive, and there are many other possible cases. But hopefully it covers a large part of them and is enough to debug most process scheduler issues.


Nicolas.

5 comments:

Anonymous said...

hi nicolas,
Thanks for this information.
But after checking all the details needed, I still encounter the "Not Posted" error on the distribution status. I get "HTTP Status 902". Could you please advise me on this issue? Thanks a lot. :)

Anonymous said...

Hi Nicolas,

Thanks for the info.

I would just like to add one more piece of information: if you are creating a new user to post reports through the process scheduler, it should also have the "PeopleTools Web Server" role, in addition to the "ProcessSchedulerAdmin" and "ReportDistAdmin" roles.

indrasena said...

Hi,
This is Indrasena Reddy, and thanks for the information about the process scheduler.

Anonymous said...

Hi, I need some help with the Process Scheduler.

The issue is that a couple of days ago, the Process Scheduler of one of our environments didn't start automatically through the Windows services (the Windows services themselves started automatically) when the machine was restarted.

I checked the logs but could not find any reason for it.

Can anybody suggest something?

Anonymous said...

Hi Nicolas - Thanks a bunch for the . My process scheduler is up and running
AB Krishna