couch-doc-update fails randomly with "Request Timeout" during CI tests
Today we got these mails from rewdevcouch1 and rewdevcouch2:
Output of error log below:

Apr 12 06:08:12 - [rewdevcouch2.rewire.org] err: /Stage[main]/Site_couchdb::Designs/Site_couchdb::Upload_design[shared_transactions]/Exec[upload_design_shared_transactions]/returns: change from notrun to 0 failed: /usr/local/bin/couch-doc-update --host 127.0.0.1:5984 --db 'shared' --id '_design/transactions' --data '{}' --file '/srv/leap/couchdb/designs/shared/transactions.json' returned 1 instead of one of [0] at /srv/leap/puppet/modules/site_couchdb/manifests/upload_design.pp:12
Apr 12 06:08:18 = warning: puppet did not finish successfully.
-------------------------------------------------------------------
error log: /var/log/leap/rewire/develop/deploy-rewdevcouch2-2015-04-12-060019-error.log
comlete log: /var/log/leap/rewire/develop/deploy-rewdevcouch2-2015-04-12-060019.log

Tested on Sun Apr 12 06:08:19 UTC 2015 on "rewdevcouch2" with following versions/git commit IDs:

Provider (/home/testbot/platform-test/rewire/develop/rewire): not under version control
= leap command v1.7.1 (develop ea5be4ea7b6f0b269ac54655f01c7cd6dc28ece7)
= leap platform v0.7 (develop 051b80d25bd0fe400769976c3210718689f832b4)
Output of error log below:

Apr 12 06:07:53 - [rewdevcouch1.rewire.org] err: /Stage[main]/Site_couchdb::Add_users/Couchdb::Add_user[webapp]/Couchdb::Document[update_user_webapp]/Exec[couch-doc-update --netrc-file /etc/couchdb/couchdb.netrc --host 127.0.0.1:5986 --db _users --id org.couchdb.user:webapp --data '{"type": "user", "name": "webapp", "roles": ["tokens","identities","users"], "password_sha": "e766b8dafbfe871bfa0f9d679f6335a995c26f98", "salt": "edc4c9386cc3ebb8b1c9e1b044dae4ad"}']/returns: change from notrun to 0 failed: couch-doc-update --netrc-file /etc/couchdb/couchdb.netrc --host 127.0.0.1:5986 --db _users --id org.couchdb.user:webapp --data '{"type": "user", "name": "webapp", "roles": ["tokens","identities","users"], "password_sha": "e766b8dafbfe871bfa0f9d679f6335a995c26f98", "salt": "edc4c9386cc3ebb8b1c9e1b044dae4ad"}' returned 1 instead of one of [0] at /srv/leap/puppet/modules/couchdb/manifests/document.pp:24
Apr 12 06:08:19 = warning: puppet did not finish successfully.
-------------------------------------------------------------------
error log: /var/log/leap/rewire/develop/deploy-rewdevcouch1-2015-04-12-060011-error.log
comlete log: /var/log/leap/rewire/develop/deploy-rewdevcouch1-2015-04-12-060011.log

Tested on Sun Apr 12 06:08:19 UTC 2015 on "rewdevcouch1" with following versions/git commit IDs:

Provider (/home/testbot/platform-test/rewire/develop/rewire): not under version control
= leap command v1.7.1 (develop ea5be4ea7b6f0b269ac54655f01c7cd6dc28ece7)
= leap platform v0.7 (develop 051b80d25bd0fe400769976c3210718689f832b4)
But leap test is all green (I'll open a different issue for this):
Tested on Sun Apr 12 06:12:42 UTC 2015 on these nodes: "rewdevcouch1 rewdevcouch2 rewdevmx1 rewdevvpn1 rewdevweb1 rewdevplain1" with following versions/git commit IDs:

Provider (/home/testbot/platform-test/rewire/develop/rewire): not under version control
= leap command v1.7.1 (develop ea5be4ea7b6f0b269ac54655f01c7cd6dc28ece7)
= leap platform v0.7 (develop 051b80d25bd0fe400769976c3210718689f832b4)

test-2015-04-12-060001.log

Running leap test on 2015-04-12-061234
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: Network > Can connect to internet?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: Network > Is stunnel running?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: Network > Is shorewall running?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: CouchDB > Are daemons running?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: CouchDB > Is CouchDB running?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: CouchDB > Is cluster membership ok?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: CouchDB > Are configured nodes online?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: CouchDB > Do ACL users exist?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: CouchDB > Do required databases exist?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] PASS: CouchDB > Can records be created?
Apr 12 06:12:41 = [rewdevcouch2.rewire.org] 10 tests: 10 passes, 0 skips, 0 warnings, 0 failures, 0 errors
Apr 12 06:12:41 = [rewdevplain1.rewire.org] PASS: Network > Can connect to internet?
Apr 12 06:12:41 = [rewdevplain1.rewire.org] PASS: Network > Is stunnel running?
Apr 12 06:12:41 = [rewdevplain1.rewire.org] PASS: Network > Is shorewall running?
Apr 12 06:12:41 = [rewdevplain1.rewire.org] 3 tests: 3 passes, 0 skips, 0 warnings, 0 failures, 0 errors
Apr 12 06:12:41 = [rewdevvpn1.rewire.org] PASS: Network > Can connect to internet?
Apr 12 06:12:41 = [rewdevvpn1.rewire.org] PASS: Network > Is stunnel running?
Apr 12 06:12:41 = [rewdevvpn1.rewire.org] PASS: Network > Is shorewall running?
Apr 12 06:12:41 = [rewdevvpn1.rewire.org] PASS: OpenVPN > Are daemons running?
Apr 12 06:12:41 = [rewdevvpn1.rewire.org] 4 tests: 4 passes, 0 skips, 0 warnings, 0 failures, 0 errors
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: Network > Can connect to internet?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: Network > Is stunnel running?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: Network > Is shorewall running?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: CouchDB > Are daemons running?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: CouchDB > Is CouchDB running?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: CouchDB > Is cluster membership ok?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: CouchDB > Are configured nodes online?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: CouchDB > Do ACL users exist?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: CouchDB > Do required databases exist?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: CouchDB > Can records be created?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] PASS: Soledad > Is Soledad running?
Apr 12 06:12:41 = [rewdevcouch1.rewire.org] 11 tests: 11 passes, 0 skips, 0 warnings, 0 failures, 0 errors
Apr 12 06:12:41 = [rewdevmx1.rewire.org] PASS: Network > Can connect to internet?
Apr 12 06:12:41 = [rewdevmx1.rewire.org] PASS: Network > Is stunnel running?
Apr 12 06:12:41 = [rewdevmx1.rewire.org] PASS: Network > Is shorewall running?
Apr 12 06:12:41 = [rewdevmx1.rewire.org] PASS: Mx > Can contact couchdb?
Apr 12 06:12:41 = [rewdevmx1.rewire.org] PASS: Mx > Can contact couchdb via haproxy?
Apr 12 06:12:41 = [rewdevmx1.rewire.org] PASS: Mx > Are MX daemons running?
Apr 12 06:12:41 = [rewdevmx1.rewire.org] 6 tests: 6 passes, 0 skips, 0 warnings, 0 failures, 0 errors
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Network > Can connect to internet?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Network > Is stunnel running?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Network > Is shorewall running?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Webapp > Can contact couchdb?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Webapp > Can contact couchdb via haproxy?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Webapp > Are daemons running?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Webapp > Can access webapp?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Webapp > Can create and authenticate and delete user via API?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] PASS: Webapp > Can sync Soledad?
Apr 12 06:12:41 = [rewdevweb1.rewire.org] 9 tests: 9 passes, 0 skips, 0 warnings, 0 failures, 0 errors

OK - "leap test" is all green !
Btw, I added the "Exec<||> { logoutput => on_failure }" statement to site_config::default, hoping to get more details in the logs, but this doesn't seem to work. I have an idea why, and am opening another issue.
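
Since the failures are intermittent, one possible mitigation (not a fix for the root cause) would be to let Puppet retry the exec and print its output when it fails. The following is only a sketch: the resource title and the values for tries and try_sleep are assumptions, and this is not the actual content of upload_design.pp or document.pp. The command is copied from the rewdevcouch2 error above. Whether a retry helps depends on the timeout really being transient, which these logs don't confirm.

```puppet
# Hypothetical resource for illustration; title and retry values are assumptions.
exec { 'upload_design_shared_transactions_with_retry':
  command   => "/usr/local/bin/couch-doc-update --host 127.0.0.1:5984 --db 'shared' --id '_design/transactions' --data '{}' --file '/srv/leap/couchdb/designs/shared/transactions.json'",
  tries     => 3,           # run the command up to three times before failing
  try_sleep => 10,          # seconds to wait between attempts
  logoutput => on_failure,  # set per resource, print command output on failure
}
```

Setting logoutput directly on the resource avoids relying on the collector override, at the cost of repeating it on every exec that needs it.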
(from redmine: created on 2015-04-12, closed on 2015-05-15, relates #6851 (closed), relates #6852)