gluster-devel

[Gluster-devel] Help needed to implement Pause and Resume feature in Geo-replication


From: Aravinda
Subject: [Gluster-devel] Help needed to implement Pause and Resume feature in Geo-replication
Date: Wed, 02 Apr 2014 14:42:21 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

Hi All,

We are trying to implement a pause/resume feature for GlusterFS geo-replication, to be used before taking a GlusterFS snapshot (pause geo-rep, take the snapshot, resume geo-rep).

Geo-replication involves:
1. Crawling (xtime-based and changelog-based) to identify changes.
2. Processing the changes and queueing them for:
        a) Entry operations on the slave, so that replicated files keep the same GFID.
        b) Rsync or tarssh, to sync files to the slave.

As of now, the idea is to stop processing on receiving the pause signal (entry ops and rsync will stop eventually, since processing has stopped), while crawling and identifying changes continue. I have sent an initial patch (http://review.gluster.org/#/c/7322/) for this.

Plan:
The gluster CLI will send SIGUSR1 to the geo-rep monitor process, and the monitor will then send SIGUSR1 to all the worker processes.
The worker processes use os.pipe() and select to handle the signal received from the monitor.
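
A minimal sketch of the worker-side handling described above, using the self-pipe pattern: the signal handler only writes a byte to a pipe, and the main loop notices it through select. The class name PauseControl and the use of SIGUSR2 for resume are assumptions made for illustration; this mail only names SIGUSR1, and none of this is taken from the actual patch.

```python
import os
import select
import signal


class PauseControl:
    """Self-pipe helper: handlers write one byte, the main loop select()s on it.

    Hypothetical class for illustration, not part of the geo-rep code.
    """

    def __init__(self, pause_sig=signal.SIGUSR1, resume_sig=signal.SIGUSR2):
        self.paused = False
        self.rfd, self.wfd = os.pipe()
        # Non-blocking write end, so a handler can never hang on a full pipe.
        os.set_blocking(self.wfd, False)
        self._marks = {pause_sig: b"p", resume_sig: b"r"}
        # signal.signal() may only be called from the main thread.
        for signum in self._marks:
            signal.signal(signum, self._on_signal)

    def _on_signal(self, signum, frame):
        try:
            os.write(self.wfd, self._marks[signum])
        except BlockingIOError:
            pass  # pipe is full, so wakeups are already pending

    def poll(self, timeout=0.0):
        """Wait up to `timeout` seconds for a signal; return the paused state."""
        ready, _, _ = select.select([self.rfd], [], [], timeout)
        if ready:
            data = os.read(self.rfd, 4096)
            # The most recent request wins if several signals arrived.
            self.paused = data[-1:] == b"p"
        return self.paused
```

A worker loop could call poll() between batches and skip processing while it returns True, leaving the crawl itself untouched.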

Problem:
Signal handling is not working in the monitor (no error or traceback). It looks like a Python limitation (http://bugs.python.org/issue5315).

Alternate solution (involves a lot of changes to the existing geo-rep code):
Move crawling into a separate process (outside the monitor process group). The gluster CLI then sends SIGSTOP to the monitor process group to pause, and SIGCONT to the monitor process group to resume.
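
A rough sketch of the process-group variant, assuming the monitor and its workers share one process group and the crawler runs outside it. The function names pause_group and resume_group are hypothetical. SIGSTOP cannot be caught or ignored, so this path needs no handler in the monitor at all.

```python
import os
import signal


def pause_group(pgid):
    """Suspend every process in the group (the crawler is assumed to be outside it)."""
    os.killpg(pgid, signal.SIGSTOP)


def resume_group(pgid):
    """Let every process in the group continue where it was stopped."""
    os.killpg(pgid, signal.SIGCONT)
```

The group id could be obtained with os.getpgid(monitor_pid), where monitor_pid stands for whatever pid the CLI already tracks for the monitor.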

Please suggest what can be done to handle the signal, or pause/resume in general, effectively.

--
regards
Aravinda


