INTERNAL DOCUMENTATION
This file gives sample "workflows" for presumed common activities of
users of the zrep replication program.
However, it is mainly used as a coding aid, to help me figure out the
workflow INTERNAL to the PROGRAM.
high level operation types:
init sync clear reconfig status failover
*** Workflow for initial setup of filesystem "pool/fs" on host1,
to be replicated to "host2pool" on host2
0. Check if the filesystem is already known to zrep. Quit if it is.
Quit if the local filesystem does not exist.
1. Set zrep:xyz properties on the local master filesystem.
2. Create the new remote fs, and set the readonly flag.
(allowing an overwrite lets admins hand-create the fs with special options)
3. Sync up.
Sample usage with comments below:
host1# zrep -i pool/fs host2 host2pool
### an alternative might allow pool/fs@snap
## important if the machines have a very slow link that requires
## the initial copy to be synced offline.
# note that unlike zfs send -I inc1 inc2,
# zfs send fs@snap0 DOES NOT carry over all snapshots!
# just the one specified in the zfs send command!
## need to get global lock here?
## Naw, creation of snap _0 can serve for that.
# if it fails, we don't have the lock, so quit
#creates snap pool/fs@zrep_xxxx_xxx_0
#old way: manually create the remote, empty filesystem
...(ssh host2 zfs create -o readonly=on host2pool/fs )
( zfs send pool/fs@zrep_xxx_xxx_0 |
ssh host2 zfs recv -F host2pool/fs )
#new way: autocreate in one shot with a full zfs send,
# and set readonly after the create
(rename pool/fs@zrep_xxx to pool/fs@zrep_xxx_sent)
# Q: "Is renaming the snap okay?"
# A: Note that future incrementals do not copy over/rename
# those snapshots on the other side!
# Think of it like renaming a file: that doesn't change its access time.
# Plus, we CAN do an incremental from a common base even if the names differ.
# So it's okay to rename the snap.
oooo Q: Do we create the remote side read-only?
oooo A: Can we sync further if it remains read-only?
oooo Yes, we can. In that case, a good safety measure is to make
oooo it read-only at creation time.
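A minimal sketch of the setup steps as a shell function that only prints the commands it would run, so they can be reviewed before anything touches a pool. The property names zrep:src-fs, zrep:dest-fs and zrep:dest-host, the SRCHOST variable, the zrep_<srchost>_<desthost>_0 snapshot name and the pool/fs -> host2pool/fs mapping are assumptions for illustration; the step 0 checks are left out.

```shell
# Hypothetical sketch: print (not run) the commands for
#   zrep -i pool/fs host2 host2pool
# Assumptions: property names, SRCHOST, snapshot naming, and no whitespace
# in dataset names. Step 0 (already-known / missing fs checks) is omitted.
zrep_init_plan() (
    fs=$1 desthost=$2 destpool=$3
    srchost=${SRCHOST:-$(hostname)}
    destfs="$destpool/${fs##*/}"              # pool/fs -> host2pool/fs
    snap="$fs@zrep_${srchost}_${desthost}_0"

    # Step 1: record the replication config on the local master fs.
    echo "zfs set zrep:src-fs=$fs $fs"
    echo "zfs set zrep:dest-host=$desthost $fs"
    echo "zfs set zrep:dest-fs=$destfs $fs"
    # Creating snap _0 doubles as the "lock": it fails if it already exists.
    echo "zfs snapshot $snap"
    # Steps 2-3, "new way": a full send creates the remote fs in one shot,
    # which is then made read-only.
    echo "zfs send $snap | ssh $desthost zfs recv $destfs"
    echo "ssh $desthost zfs set readonly=on $destfs"
    # Mark the snapshot as successfully sent.
    echo "zfs rename $snap ${snap}_sent"
)

# Usage sketch: review the plan, then run it with
#   zrep_init_plan pool/fs host2 host2pool | sh -e
```

Note that `sh -e` only sees the exit status of the last command in the send | recv pipeline, so a failing `zfs send` alone would not stop the script.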
*** Workflow for zrep to do zrep -S fs (sync of some fs)
- Get the list of latest snapshots for fs@zrep_xxxxx.
Check for multiple hosts? Ugh.
For now, only allow a single host per zrep fs. Get it from
the zrep:dest-host property.
- Verify that our machine is the zrep master for this fs
- lock somewhere in here
+ Check for a command-line option for
"QUIETLY quit if zrep is already running"?
But maybe it should also have some kind of time limit:
"only if the lockfile is newer than X amount of time"?
- identify most recent sent zrep snapshot. Use as incremental base.
- create new snapshot
- prepare for incremental send, then send
Make sure to send -Ipb
(use -F for recv? no, to avoid split-brain? see 'status')
+ optimize for "remote host is us" case, (skip ssh)
- Rename the new snapshot to xxx_sent, if successful.
If NOT successful, leave the snapshot around, in case it
is useful to the user for some reason.
- do expires (see sub-function expire)
- release lock
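The snapshot bookkeeping in these steps can be sketched with a few string helpers. This assumes snapshot names of the form zrep_<srchost>_<desthost>_<seq>[_sent] with a plain decimal <seq>, which is a guess based on the names used elsewhere in this file. zrep_sync_plan only prints the commands. Locking, the "remote host is us" shortcut and the other send flags are omitted, and only `-I` is shown.

```shell
# Hypothetical helpers for "zrep -S fs". Assumed naming:
#   zrep_<srchost>_<desthost>_<seq>[_sent], <seq> decimal, no leading zeros.

# stdin: full snapshot names, oldest first, e.g. from
#   zfs list -H -o name -t snapshot -s creation -d 1 pool/fs
# Prints the part after "@" of the newest snapshot whose short name matches
# the awk regex $1; fails if none does.
zrep_newest_matching() {
    awk -F@ -v re="$1" '
        $2 ~ re { last = $2 }
        END { if (last == "") exit 1; print last }
    '
}

# zrep_h1_h2_6 or zrep_h1_h2_6_sent -> zrep_h1_h2_7
zrep_next_snap() (
    name=${1%_sent}
    seq=${name##*_}
    echo "${name%_*}_$((seq + 1))"
)

# Print the commands for one incremental sync:
#   $4 base:   newest _sent snapshot (the incremental base)
#   $5 latest: newest zrep snapshot of any kind; a failed sync may have left
#              an unsent one behind, so the next number is derived from it
zrep_sync_plan() (
    fs=$1 desthost=$2 destfs=$3 base=$4 latest=$5
    new=$(zrep_next_snap "$latest")
    echo "zfs snapshot $fs@$new"
    echo "zfs send -I $fs@$base $fs@$new | ssh $desthost zfs recv $destfs"
    echo "zfs rename $fs@$new $fs@${new}_sent"
)

# Usage sketch:
#   snaps=$(zfs list -H -o name -t snapshot -s creation -d 1 pool/fs)
#   base=$(printf '%s\n' "$snaps" | zrep_newest_matching '^zrep_.*_sent$')
#   latest=$(printf '%s\n' "$snaps" | zrep_newest_matching '^zrep_')
#   zrep_sync_plan pool/fs host2 host2pool/fs "$base" "$latest"
```

Basing the new sequence number on the newest zrep snapshot, rather than the newest sent one, avoids colliding with a snapshot left behind by a failed sync.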
THINGS TO WATCH OUT FOR:
x When constructing the incremental, we need to use the
x *most recent snapshot* (on the remote side) as the base;
x recv will fail otherwise.
x EVEN IF the most recent remote snapshot is not related to zrep!!
x It is technically possible to force this, by rolling back the remote
x to the base you wish to use, presuming it exists on the remote.
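The following hypothetical check covers the point above. Given the remote snapshot names, oldest first, it does one of three things. If the intended base is already the newest remote snapshot, it prints nothing. If the base exists but is not the newest, it prints the rollback that would make it the newest. If the base is missing on the remote, it fails. It assumes the base is given by its remote name, which per the renaming Q/A in the setup section lacks the local _sent suffix. Keep in mind that `zfs rollback -r` destroys every newer remote snapshot.

```shell
# Hypothetical helper. stdin: remote snapshot short names, oldest first, e.g.
#   ssh host2 zfs list -H -o name -t snapshot -s creation -d 1 host2pool/fs |
#       cut -d@ -f2
# $1: remote fs, $2: remote short name of the intended incremental base
zrep_remote_base_fixup() (
    destfs=$1 base=$2
    snaps=$(cat)
    if ! printf '%s\n' "$snaps" | grep -Fqx -- "$base"; then
        echo "zrep: base $base does not exist on remote" >&2
        exit 1
    fi
    latest=$(printf '%s\n' "$snaps" | tail -n 1)
    # Nothing to do when the base is already the newest remote snapshot.
    if [ "$latest" != "$base" ]; then
        echo "zfs rollback -r $destfs@$base"
    fi
)
```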
*** Workflow for zrep to do zrep -S fs@snap_sent
- Special case for zrep sync! This is a shortcut for
ssh $remhost zfs rollback -r fs@snap_sent
with added local cleanup duties.
It is useful for "hey, let's test out something on our remote host...",
and also for debugging zrep :-}
+ Print a warning about deleting all newer remote snapshots...
sleep 10 seconds, then do it.
+ grab lock
+ ssh the command
+ .... Then we have to do something about LOCAL "newer" snapshots.
Delete or rename?
But... what if they WANT it to be temporary?
So, do not delete local snapshots.
They must be renamed to drop _sent, though, so as not to mess things up.
_somethingelse ??
- do expires (see sub-function expire)
+ release lock
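A minimal sketch of this special case, printing the remote rollback followed by the local renames. It assumes the remote copy of the target still carries its pre-rename name (per the renaming Q/A in the setup section), and it uses a hypothetical `_old` suffix in place of the undecided "_somethingelse". The warning, sleep, lock and expire steps are omitted.

```shell
# Hypothetical sketch of "zrep -S fs@snap_sent".
# stdin: local snapshot short names, oldest first.
# $1 local fs, $2 remote host, $3 remote fs, $4 target (e.g. zrep_h1_h2_5_sent),
# $5 replacement suffix for newer _sent snapshots (hypothetical default _old).
zrep_rollback_plan() (
    fs=$1 remhost=$2 destfs=$3 target=$4 suffix=${5:-_old}
    snaps=$(cat)
    if ! printf '%s\n' "$snaps" | grep -Fqx -- "$target"; then
        echo "zrep: $target not found locally" >&2
        exit 1
    fi
    # The remote snapshot was never renamed, so drop the _sent suffix.
    echo "ssh $remhost zfs rollback -r $destfs@${target%_sent}"
    # Newer local snapshots are kept, but must no longer look "sent".
    printf '%s\n' "$snaps" | awk -v t="$target" -v fs="$fs" -v suf="$suffix" '
        seen && /_sent$/ {
            n = $0; sub(/_sent$/, "", n)
            print "zfs rename " fs "@" $0 " " fs "@" n suf
        }
        $0 == t { seen = 1 }
    '
)
```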
*** Workflow for zrep to do zrep -S all (sync of ALL fs)
(SEE ALSO zrep -P -S, below)
(otherwise...)
- Iterate through all zrep-configured filesystems:
for z in $(zfs get -H -o name -s local -t filesystem zrep:dest-fs) ; do
    zrep -S $z
done
*** Workflow for zrep -P x -S all
- Parallelize the "sync all" operation
+ First, grab the global lock.
+ Make a list of the needed filesystems.
+ Make use of a secondary lockfile. But what...
/var/run/zrep.lock.batch$$ ($$ == PARENT/controller pid)
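A minimal sketch of the parallel loop. It assumes a caller-supplied command, for example a hypothetical wrapper around `zrep -S`, and a simple batch scheme: start up to x jobs, wait for all of them, then start the next batch. A sliding window would keep more jobs busy, but batches are easy to write in plain POSIX sh. The global and batch lockfiles are not shown.

```shell
# Hypothetical sketch of "zrep -P x -S all".
#   $1: x (max concurrent jobs)   $2: command run as "$2 <fs>"   $3...: filesystems
# Runs the command in batches of at most x background jobs and exits
# non-zero if any job failed.
zrep_parallel() (
    max=$1 cmd=$2
    shift 2
    rc=0 n=0 pids=
    for fs in "$@"; do
        "$cmd" "$fs" &
        pids="$pids $!"
        n=$((n + 1))
        if [ "$n" -ge "$max" ]; then
            # Batch is full: collect each job's exit status before going on.
            for p in $pids; do wait "$p" || rc=1; done
            n=0 pids=
        fi
    done
    for p in $pids; do wait "$p" || rc=1; done
    exit "$rc"
)

# Usage sketch (zrep_sync_one is a hypothetical wrapper):
#   zrep_sync_one() { zrep -S "$1"; }
#   zrep_parallel 4 zrep_sync_one $(zfs get -H -o name -s local -t filesystem zrep:dest-fs)
```

Waiting on each PID individually, rather than with a bare `wait`, is what lets a failed sync show up in the overall exit status.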
*** Workflow for zrep status (fs)
(For each fs)
- IF and only if zrep:dest-fs is present (i.e., a zrep-configured fs)
(easy way: go through "zfs get -s local all" !!)
+ get list of all relevant snapshots
+ print out if "paused" ....
+ print out most recent synced snapshot, and creation time
+ print out what savecount will be
+ Need some kind of "error flag" if the remote sync fails?
It should be distinct from the case where we chose to force the remote
side to a specific snapshot, and so have leftover snapshots.
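The "easy way" above could look like the following sketch. It groups the locally set zrep:* properties per filesystem and keeps only filesystems that have zrep:dest-fs. A pause flag stored as a zrep:* property (a hypothetical zrep:paused, say) would show up on the same line. Snapshot and savecount reporting are left out.

```shell
# Hypothetical helper. stdin: output of
#   zfs get -H -s local -t filesystem -o name,property,value all
# Prints one line per zrep-configured fs (one with a local zrep:dest-fs),
# followed by all of its locally set zrep:* properties.
zrep_status_filter() {
    awk -F '\t' '
        $2 ~ /^zrep:/        { props[$1] = props[$1] " " $2 "=" $3 }
        $2 == "zrep:dest-fs" { configured[$1] = 1 }
        END { for (fs in configured) print fs props[fs] }
    ' | sort
}
```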
************************************************
*** Workflow for zrep to do zrep failover pool/fs
General failover/takeover comments:
The failover and takeover commands are paired.
Normally, the automated process uses ssh to the remote host to call
the "other" of the pair.
'failover' actions need to come before takeover, for safety.
Both of them just reset the various properties of the filesystem(s),
and rename snapshots if needed.
They don't actually make data flow anywhere.
-L means "local only": it will not attempt to contact the other side.
In normal conditions we do need to identify the last common snapshot,
and make sure it is the same on both sides.
Identify sequence number?
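Finding the last common snapshot by name could look like this sketch. It assumes linear histories and that the only naming difference between the two sides is the local _sent suffix. Comparing the snapshots' guid property (`zfs get -H -o value guid <snapshot>`) would be a more robust way to confirm that both sides hold the same snapshot.

```shell
# Hypothetical helper: print the newest snapshot present on both sides.
# $1: file with local short snapshot names, oldest first
# $2: file with remote short snapshot names, oldest first
# Local names may carry the _sent suffix; remote names do not.
zrep_last_common() {
    awk '
        NR == FNR { sub(/_sent$/, ""); have[$0] = 1; next }
        ($0 in have) { last = $0 }
        END { if (last == "") exit 1; print last }
    ' "$1" "$2"
}
```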
*** zrep failover
(this is run from the 'master' side, to put it into slave mode)
+ grab fs lock
+ set readonly on main fs
+ reverse zrep:dest-fs, zrep:src-fs props on fs
+ if -L is NOT used, attempt to sync first, with a new sync/snap.
(with -F ? maybe not)
+ if -L is used, roll back to the most recent 'sent' snapshot
+ release fs lock
+ ssh to the other side, and trigger zrep takeover -L
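A sketch of the master-side failover as a command printer. It reads "sync first" literally, so the final sync (shown here as a plain `zrep -S`) runs before the zrep:src-fs / zrep:dest-fs properties are reversed, while data still flows in the old direction. The peer host is passed in as a parameter, and lock handling is omitted; both are assumptions.

```shell
# Hypothetical sketch: print the commands "zrep failover" would run on the
# current master. Lock handling is omitted.
#   $1 fs (local)   $2 current zrep:src-fs   $3 peer host
#   $4 current zrep:dest-fs   $5 "1" for -L   $6 newest _sent snapshot (-L only)
zrep_failover_plan() (
    fs=$1 srcfs=$2 desthost=$3 destfs=$4 local_only=$5 lastsent=$6
    echo "zfs set readonly=on $fs"
    if [ "$local_only" = 1 ]; then
        echo "zfs rollback -r $fs@$lastsent"
    else
        # Final sync while the properties still point the old way.
        echo "zrep -S $fs"
    fi
    # Reverse the direction recorded on the fs.
    echo "zfs set zrep:src-fs=$destfs $fs"
    echo "zfs set zrep:dest-fs=$srcfs $fs"
    # -L means "local only", so the other side is contacted only without it.
    if [ "$local_only" != 1 ]; then
        echo "ssh $desthost zrep takeover -L $destfs"
    fi
)
```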
*** zrep takeover
'takeover' is the matching pair to zrep failover.
- Handles making the current system the active master.
In emergency situations, equivalent to
"Just gimme disk access NOW...
worry about resync later"
However, it also gets used as part of a "normal" failover.
- Need to try REALLY REALLY REALLY hard to avoid the
split-brain problem if the "force" option is used.
Because of this, take OUT the -F option from our normal zfs recv.
In that case, as soon as we make any change
(such as creating a first snapshot on OUR side),
any attempted sync from the other side will fail.
+ if -L is not used,
- ssh to the other side, and call 'zrep failover', WITHOUT -L.
That will call back to us for takeover again, this time
with -L.
(It will also probably push one more sync over.)
So, EXIT when the ssh is over!
- if -L IS used:
+ set filesystem lock
+ remove readonly.
(side effect: this will break any attempted sync from the other side as
soon as any change, or even an 'ls', is done.)
+ If a specific snap is named, roll back to that snap.
+ switch around dest/src properties on fs
+ set "zrep:master" property
+ undo filesystem lock
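A matching sketch for the -L branch of takeover, printing the steps in the order listed above. The use of `readonly=off` and the value `yes` for zrep:master are assumptions, and lock handling is omitted.

```shell
# Hypothetical sketch of "zrep takeover -L" on the old slave. Lock handling
# is omitted; readonly=off and zrep:master=yes are assumptions.
#   $1 fs (local)   $2 current zrep:src-fs   $3 current zrep:dest-fs
#   $4 optional snapshot to roll back to
zrep_takeover_plan() (
    fs=$1 srcfs=$2 destfs=$3 snap=$4
    echo "zfs set readonly=off $fs"
    if [ -n "$snap" ]; then
        echo "zfs rollback -r $fs@$snap"
    fi
    # This side becomes the source from now on.
    echo "zfs set zrep:src-fs=$destfs $fs"
    echo "zfs set zrep:dest-fs=$srcfs $fs"
    echo "zfs set zrep:master=yes $fs"
)
```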
*** add feature for "pause/skip replication" on a fs, for "sync all"
+ make sure "status" shows the pause
### sub-function: expire
- Any time an expire is triggered on one side, run it on the other side as well.
- Pay attention to the REMOTE value of zrep:savecount; it may be different!
Q: What if there is a mix of zrep_xxxx and zrep_xxxx_sent? Treat them as a whole,
or only 'count' the sent ones?
A: expire is meant to preserve free disk space. The snaps exist,
and take up the same amount of disk space, whether or not they are _sent.
Therefore, we must treat sent and unsent equally.
That said, we MUST NOT delete the most recent sent snapshot!!
- Steps
+ get global lock
+ figure out which ones to save
+ don't miss expiring zrep_host1_host2_###_batch## snapshots too, if need be.
+ do the destroys
+ release global lock
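The "figure out which ones to save" step can be sketched as a small awk filter. Sent and unsent snapshots count the same toward savecount, and the newest _sent snapshot is never listed for destruction, even when it falls outside the kept window. The input is assumed to be the zrep snapshot names for one filesystem (including any _batch ones), oldest first.

```shell
# Hypothetical helper. stdin: zrep snapshot names for one fs, oldest first.
# $1: savecount. Prints the snapshots to destroy, oldest first.
zrep_expire_list() {
    awk -v keep="$1" '
        { snap[NR] = $0; if ($0 ~ /_sent$/) lastsent = NR }
        END {
            # Candidates: all but the newest "keep" entries, minus the
            # most recent sent snapshot (the next incremental base).
            for (i = 1; i <= NR - keep; i++) {
                if (i != lastsent) print snap[i]
            }
        }
    '
}

# Usage sketch (run on each side with that side's own zrep:savecount):
#   zfs list -H -o name -t snapshot -s creation -d 1 pool/fs |
#       grep '@zrep_' | zrep_expire_list 5 |
#       while read -r s; do zfs destroy "$s"; done
```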
NEED TO TEST: incremental across early and late snaps, with other snaps in the middle.
*** Workflow for LOCK/UNLOCK (global zrep lock)
- Originally, I wanted to use zfs 'hold' as locks.
Problem: one fs can be synced to multiple dest hosts.
Cannot grab a 'hold' on the parent filesystem; holds are only for
snapshots.
- Instead, use a global lock:
ln -s /proc/$$ /var/run/zrep.lock
Hold it for a short time only.
Validate the lock with "ls -F /var/run/zrep.lock/."
(a dangling link means the owning process is gone).
Remove if invalid?
Only allow one active (non-"status") instance to run at once?
Or just "discourage" multiple instances, and hold the global lock only for a very short time. Yea.
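A minimal sketch of the symlink lock, assuming Linux /proc. ZREP_LOCKFILE is a hypothetical override that makes it testable outside /var/run. The stale-lock cleanup is not race-free: two processes that both see the same dead owner can both remove it, so treat this as an illustration of the idea rather than a finished lock.

```shell
# Minimal sketch of the global lock (Linux /proc assumed).
# ZREP_LOCKFILE is a hypothetical override, handy for testing.
: "${ZREP_LOCKFILE:=/var/run/zrep.lock}"

zrep_get_lock() {
    # Creating the symlink fails if the lock already exists (if it points at
    # a live /proc/<pid> directory, ln tries to create a link inside /proc,
    # which also fails), so this is the atomic step.
    if ln -s "/proc/$$" "$ZREP_LOCKFILE" 2>/dev/null; then
        return 0
    fi
    # If the owner is gone, /proc/<pid> no longer exists and "lock/." fails.
    if ! ls -F "$ZREP_LOCKFILE/." >/dev/null 2>&1; then
        rm -f "$ZREP_LOCKFILE"
        ln -s "/proc/$$" "$ZREP_LOCKFILE" 2>/dev/null && return 0
    fi
    return 1
}

zrep_release_lock() {
    # Remove the lock only if this shell owns it.
    if [ "$(readlink "$ZREP_LOCKFILE")" = "/proc/$$" ]; then
        rm -f "$ZREP_LOCKFILE"
    fi
}
```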
General lock flow:
0. The global lock is only a mutex for acquiring a snapshot-specific hold.
0.1 Create the snapshot if needed.
1. Grab the global lock.
2. Check for a pre-existing hold on the snapshot. Quit if there is one.
3. Create a "hold" on the snapshot, with a specific ID (tag).
4. Release the global lock.
5. (continue to do operations)
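Steps 2 and 3 of the lock flow can be sketched as follows. This assumes the OpenZFS form of `zfs holds -H` (tab-separated name, tag, timestamp) and a hypothetical hold tag scheme "zrep_<pid>". The caller is assumed to take and release the global lock (steps 1 and 4) around zrep_take_hold.

```shell
# Sketch of steps 2-3 of the lock flow. Assumes OpenZFS "zfs holds -H"
# output (tab-separated: name, tag, timestamp) and a hypothetical tag
# naming scheme "zrep_<pid>". The caller holds the global lock around this.

# Succeed if "zfs holds -H" output on stdin already shows a zrep hold.
zrep_hold_present() {
    awk -F '\t' '$2 ~ /^zrep/ { found = 1 } END { exit !found }'
}

zrep_take_hold() (
    snap=$1 tag=${2:-zrep_$$}
    if zfs holds -H "$snap" | zrep_hold_present; then
        echo "zrep: $snap is already held by another zrep run" >&2
        exit 1
    fi
    zfs hold "$tag" "$snap"
)

# When the operations in step 5 are done:
#   zfs release "$tag" "$snap"
```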