Advanced Topic with Python Channel Access
This chapter contains a variety of “usage notes” and implementation details that may help in getting the best performance from the pyepics module.
The wait and timeout options for get(), ca.get_complete()
The get functions, epics.caget(), pv.get() and epics.ca.get() all ask for data to be transferred over the network. For large data arrays or slow networks, this can take a noticeable amount of time. For PVs that have been disconnected, the get call will fail to return a value at all. For this reason, these functions all take a timeout keyword option. The lowest level epics.ca.get() also has a wait option, and a companion function epics.ca.get_complete(). This section describes the details of these.
If you’re using epics.caget() or pv.get() you can supply a timeout value. If the value returned is None, then either the PV has truly disconnected or the timeout passed before receiving the value. If the get is incomplete, in that the PV is connected but the data has simply not been received yet, a subsequent epics.caget() or pv.get() will eventually complete and receive the value. That is, if a PV for a large waveform record reports that it is connected, but a pv.get() returns None, simply trying again later will probably work:
>>> p = epics.PV('LargeWaveform')
>>> val = p.get()
>>> val
>>> time.sleep(10)
>>> val = p.get()
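The retry pattern above can be wrapped in a small helper. This is only a sketch: get_with_retry and its parameters are hypothetical names, not part of pyepics, and the helper assumes nothing more than an object whose get(timeout=...) method returns None when no value has arrived yet, as described for pv.get() above.

```python
import time


def get_with_retry(pv, attempts=3, timeout=1.0, delay=0.5):
    """Call pv.get() until it returns a value or the attempts run out.

    `pv` is any object with a get(timeout=...) method that returns None
    when no value has arrived yet. Returns the value, or None if every
    attempt came back empty.
    """
    for attempt in range(attempts):
        value = pv.get(timeout=timeout)
        if value is not None:
            return value
        # Give the pending transfer time to finish before asking again,
        # but do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

With pyepics this might be used as get_with_retry(epics.PV('LargeWaveform'), attempts=5, delay=2.0). A None result still cannot distinguish a disconnected PV from one that is merely slow, so checking the connection state separately remains worthwhile.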
At the lowest level (which pv.get() and epics.caget() use), epics.ca.get() issues a get-request with an internal callback function. That is, it calls the CA library function libca.ca_array_get_callback() with a pre-defined callback function. With wait=True (the default), epics.ca.get() then waits up to the timeout or until the CA library calls the specified callback function. If the callback has been called, the value can then be converted and returned.
If the callback is not called in time or if wait=False is used but the PV is connected, the callback will be called eventually, and simply waiting (or using epics.ca.pend_event() if epics.ca.PREEMPTIVE_CALLBACK is False) may be sufficient for the data to arrive. Under this condition, you can call epics.ca.get_complete(), which will NOT issue a new request for data to be sent, but wait (for up to a timeout time) for the previous get request to complete.
epics.ca.get_complete() will return None if the timeout is exceeded or if there is not an “incomplete get” that it can wait to complete. Thus, you should use the return value from epics.ca.get_complete() with care.
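The relationship between the two calls can be sketched as follows. The helper name get_or_complete is hypothetical, and the ca module is passed in as an argument purely so the sketch can be exercised with a stand-in object; in real use it would be epics.ca, and chid a channel ID from epics.ca.create_channel().

```python
def get_or_complete(ca, chid, timeout=1.0):
    """Request a value, then wait once more on the same request if needed.

    `ca` is expected to behave like the epics.ca module: it must provide
    get(chid, timeout=...) and get_complete(chid, timeout=...).
    """
    # Issues the get request and waits up to `timeout` for the callback.
    value = ca.get(chid, timeout=timeout)
    if value is None:
        # Does not issue a new request; waits for the pending one to finish.
        value = ca.get_complete(chid, timeout=timeout)
    # May still be None: the timeout was exceeded again, or there was no
    # incomplete get to wait for (for example, the PV is disconnected).
    return value
```

Because None can mean either "timed out" or "nothing pending", callers that need to tell these apart should check the channel's connection state rather than rely on the return value alone.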
Note that pv.get() (and so epics.caget()) will normally rely on the PV value to be filled in automatically by monitor callbacks. If monitor callbacks are disabled (as is done for large arrays and can be turned off) or if the monitor hasn’t been called yet, pv.get() will check whether it should call epics.ca.get() or epics.ca.get_complete().
If not specified, the timeout for epics.ca.get_complete() (and all other get functions) will be set to:
timeout = 0.5 + log10(count)
Again, that’s the maximum time that will be waited, and if the data is received faster than that, the get will return as soon as it can.
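To get a feel for how the default grows with array size, the formula can be evaluated directly. The function name default_get_timeout is hypothetical, and clamping the count to at least 1 is an assumption of this sketch rather than documented pyepics behavior.

```python
import math


def default_get_timeout(count):
    """Default timeout in seconds for `count` elements: 0.5 + log10(count)."""
    # Treat a count below 1 as a single element so that log10 is defined
    # (an assumption of this sketch, not documented pyepics behavior).
    return 0.5 + math.log10(max(1, count))
```

A scalar PV (count of 1) gets 0.5 seconds, while a waveform of one million elements gets about 6.5 seconds, so the maximum wait grows only slowly with array size.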
Strategies for connecting to a large number of PVs
Occasionally, you may find that you need to quickly connect to a large number of PVs, say to write values to disk. The most straightforward way to do this, say:
import epics
pvnamelist = read_list_pvs()
pv_vals = {}
for name in pvnamelist:
    pv = epics.PV(name)
    pv_vals[name] = pv.get()
or even just:
values = [epics.caget(name) for name in pvnamelist]
does incur some performance penalty. To minimize the penalty, we need to understand its cause.
Creating a PV object (using any of pv.PV, or pv.get_pv(), or epics.caget()) will automatically use connection and event callbacks in an attempt to keep the PV alive and up-to-date during the session. Normally, this is an advantage, as you don’t need to explicitly deal with many aspects of Channel Access. But creating a PV does request some network traffic, and the PV will not be “fully connected” and ready to do a PV.get() until all the connection and event callbacks are established. In fact, PV.get() will not run until those connections are all established. This takes very close to 30 milliseconds for each PV. That is, for 1000 PVs, the above approach will take about 30 seconds.
The simplest remedy is to allow all those connections to happen in parallel and in the background by first creating all the PVs and then getting their values. That would look like:
# improve time to get multiple PVs: Method 1
import epics
pvnamelist = read_list_pvs()
pvs = [epics.PV(name) for name in pvnamelist]
values = [p.get() for p in pvs]
Though it doesn’t look that different, this improves performance by close to a factor of 100, so that getting 1000 PV values will take around 0.4 seconds instead of about 30 seconds.
Can it be improved further? The answer is Yes, but at a price. For the discussion here, we’ll call the original version “Method 0” and the method of creating all the PVs then getting their values “Method 1”. With both of these approaches, the script has fully connected PV objects for all PVs named, so that subsequent use of these PVs will be very efficient.
But this can be made even faster by turning off any connection or event callbacks, avoiding PV objects altogether, and using the epics.ca interface. This has been encapsulated into epics.caget_many() which can be used as:
# get multiple PVs as fast as possible: Method 2
import epics
pvnamelist = read_list_pvs()
values = epics.caget_many(pvnamelist)
In tests using 1000 PVs that were all really connected, Method 2 will take about 0.25 seconds, compared to 0.4 seconds for Method 1 and 30 seconds for Method 0. To understand what epics.caget_many() is doing, a more complete version of this looks like this:
# epics.caget_many made explicit: Method 3
from epics import ca
pvnamelist = read_list_pvs()
pvdata = {}
pvchids = []
# create, don't connect or create callbacks
for name in pvnamelist:
    chid = ca.create_channel(name, connect=False, auto_cb=False) # note 1
    pvchids.append(chid)
# connect
for chid in pvchids:
    ca.connect_channel(chid)
# request get, but do not wait for result
ca.poll()
for chid in pvchids:
    ca.get(chid, wait=False) # note 2
# now wait for get() to complete
ca.poll()
for chid in pvchids:
    val = ca.get_complete(chid)
    pvdata[ca.name(chid)] = val
The code here probably needs detailed explanation. First, as mentioned above, it uses the ca level, not PV objects. Second, the call to epics.ca.create_channel() (Note 1) uses connect=False and auto_cb=False which mean to not wait for a connection before returning, and to not automatically assign a connection callback. Normally, these are not what you want, as you want a connected channel and to be informed if the connection state changes, but we’re aiming for maximum speed here. We then use epics.ca.connect_channel() to connect all the channels. Next (Note 2), we tell the CA library to request the data for the channel without waiting around to receive it. The main point of not having epics.ca.get() wait for the data for each channel as we go is that each data transfer takes time. Instead we request data to be sent in a separate thread for all channels without waiting. Then we do wait by calling epics.ca.poll() once and only once (not len(pvnamelist) times!). Finally, we use the epics.ca.get_complete() method to convert the data that has now been received by the companion thread to a python value.
Methods 2 and 3 have essentially the same runtime, which is somewhat faster than Method 1, and much faster than Method 0. Which method you should use depends on use case. In fact, the test shown here only gets the PV values once. If you’re writing a script to get 1000 PVs, write them to disk, and exit, then Method 2 (epics.caget_many()) may be exactly what you want. But if your script will get 1000 PVs and stay alive doing other work, or even if it runs a loop to get 1000 PVs and write them to disk once a minute, then Method 1 will actually be faster. That is, doing epics.caget_many() in a loop, as with:
# caget_many() 10 times
import epics
import time
pvnamelist = read_list_pvs()
for i in range(10):
    values = epics.caget_many(pvnamelist)
    time.sleep(0.01)
will take considerably longer than creating the PVs once and getting their values in a loop with:
# pv.get() 10 times
import epics
import time
pvnamelist = read_list_pvs()
pvs = [epics.PV(name) for name in pvnamelist]
for i in range(10):
    values = [p.get() for p in pvs]
    time.sleep(0.01)
In tests with 1000 PVs, looping with epics.caget_many() took about 1.5 seconds, while the version looping over PV.get() took about 0.5 seconds.
To be clear, it is connecting to Epics PVs that is expensive, not the retrieving of data from connected PVs. You can lower the connection expense by not retaining the connection or creating monitors on the PVs, but if you are going to re-use the PVs, that savings will be lost quickly. In short, use Method 1 over epics.caget_many() unless you’ve benchmarked your use-case and have demonstrated that epics.caget_many() is better for your needs.
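For such a benchmark, a small timing helper is usually enough. The sketch below uses only the standard library; the name time_call is hypothetical and nothing in it is specific to pyepics.

```python
import time


def time_call(func, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of func()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best
```

One might compare time_call(lambda: epics.caget_many(pvnamelist)) against time_call(lambda: [p.get() for p in pvs]). Keep in mind that repeated calls favor already-connected PV objects, which is exactly the effect described above. To measure a one-shot script fairly, use repeats=1 in a fresh process and include the time spent creating the PVs.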
time.sleep() or epics.poll()?
In order for a program to communicate with Epics devices, it needs to allow some time for this communication to happen. With epics.ca.PREEMPTIVE_CALLBACK set to True, this communication will be handled in a thread separate from the main Python thread. This means that CA events can happen at any time, and epics.ca.pend_event() does not need to be called to explicitly allow for event processing.
Still, some time must be released from the main Python thread on occasion in order for events to be processed. The simplest way to do this is with time.sleep(), so that an event loop can simply be:
>>> while True:
>>>     time.sleep(0.001)
Unfortunately, the time.sleep() method is not a very high-resolution clock, with typical resolutions of 1 to 10 ms, depending on the system. Thus, even though events will be asynchronously generated and epics with pre-emptive callbacks does not require epics.ca.pend_event() or epics.ca.poll() to be run, better performance may be achieved with an event loop of:
>>> while True:
>>>     epics.poll(evt=1.e-5, iot=0.1)
as the loop will be run more often than using time.sleep().
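The effective resolution of time.sleep() varies between systems, so it can be worth measuring on the machine in question. The following sketch uses only the standard library; the name measure_sleep is hypothetical.

```python
import time


def measure_sleep(requested=0.001, samples=20):
    """Return the average actual duration of time.sleep(requested), in seconds."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested)
        total += time.perf_counter() - start
    return total / samples
```

If measure_sleep(0.001) reports a value well above the requested 1 ms, a loop built on time.sleep() runs correspondingly less often than intended, which is the situation in which the epics.poll() loop above may help.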
Using Python Threads
An important feature of the PyEpics package is that it can be used with Python threads, as Epics 3.14 supports threads for client code. Even in the best of cases, working with threads can be somewhat tricky and lead to unexpected behavior, and the Channel Access library adds a small level of complication for using CA with Python threads. The result is that some precautions may be in order when using PyEpics and threads. This section discusses the strategies for using threads with PyEpics.
First, to use threads with Channel Access, you must have epics.ca.PREEMPTIVE_CALLBACK = True. This is the default value, but if epics.ca.PREEMPTIVE_CALLBACK has been set to False, threading will not work.
Second, if you are using PV objects and not making heavy use of the epics.ca module (that is, not making and passing around chids), then the complications below are mostly hidden from you. If you’re writing threaded code, it’s probably a good idea to read this just to understand what the issues are.
Channel Access Contexts
The Channel Access library uses a concept of contexts for its own thread model, with contexts holding sets of threads as well as Channels and Process Variables. For non-threaded work, a process will use a single context that is initialized prior to doing any real CA work (done in epics.ca.initialize_libca()). In a threaded application, each new thread begins with a new, uninitialized context that must be initialized or replaced. Thus each new python thread that will interact with CA must either explicitly create its own context with epics.ca.create_context() (and then, being a good citizen, destroy this context as the thread ends with epics.ca.destroy_context()) or attach to an existing context.
The generally recommended approach is to use a single CA context throughout an entire process and have each thread attach to the first context created (probably from the main thread). This avoids many potential pitfalls (and crashes), and can be done fairly simply. It is the default mode when using PV objects.
The most explicit use of contexts is to put epics.ca.create_context() at the start of each function call as a thread target, and epics.ca.destroy_context() at the end of each thread. This will cause all the activity in that thread to be done in its own context. This works, but means more care is needed, and so is not the recommended approach.
The best way to attach to the initially created context is to call epics.ca.use_initial_context() before any other CA calls in each function that will be called by Thread.run(). Equivalently, you can add a withInitialContext() decorator to the function. Creating a PV object will implicitly do this for you, as long as it is your first CA action in the function. Each time you do a PV.get() or PV.put() (or a few other methods), it will also check that the initial context is being used.
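To picture what such a decorator does, here is a generic sketch of a "call this setup function first" decorator. It is not the pyepics source: with_setup is a hypothetical name, and the setup callable is a parameter so that the example stays independent of any library.

```python
import functools


def with_setup(setup):
    """Build a decorator that calls setup() before each call of the function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            setup()  # for CA work this could be epics.ca.use_initial_context
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

A thread target decorated with @with_setup(epics.ca.use_initial_context) would then attach to the initial context on every call without doing so explicitly in its body.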
Of course, this approach requires CA to be initialized already. Doing that in the main thread is highly recommended. If it happens in a child thread, that thread must exist for all CA work, so either for the life of the process or, with great care, for processes that do only some CA calls. If you are writing a threaded application in which the first real CA calls are inside a child thread, it is recommended that you initialize CA in the main thread.
As a convenience, the CAThread class in the epics.ca module is a very thin wrapper around the standard threading.Thread which adds a call of epics.ca.use_initial_context() just before your threaded function is run. This allows your target functions to not explicitly set the context, but still ensures that the initial context is used in all functions.
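The idea of such a thin wrapper can be illustrated with the standard library alone. The class below is a hypothetical sketch, not the actual CAThread implementation: it accepts any callable as before_run and calls it inside the new thread just before the target runs.

```python
import threading


class SetupThread(threading.Thread):
    """Thread that calls a setup function before running its target."""

    def __init__(self, *args, before_run=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._before_run = before_run

    def run(self):
        if self._before_run is not None:
            # Executed in the new thread, before the target function.
            self._before_run()
        super().run()
```

Passing before_run=epics.ca.use_initial_context would give behavior similar in spirit to what is described for CAThread above. In practice, using CAThread itself is the simpler choice.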
How to work with CA and Threads
Summarizing the discussion above, to use threads you must run in PREEMPTIVE_CALLBACK mode. Furthermore, it is recommended that you use a single context, and that you initialize CA in the main program thread so that your single CA context belongs to the main thread. Using PV objects exclusively makes this easy, but it can also be accomplished relatively easily using the lower-level ca interface. The options for using threads (in approximate order of reliability) are then:
1. use PV objects for threading work. This ensures you’re working in a single CA context.
2. use CAThread instead of Thread for threads that will use CA calls.
3. put epics.ca.use_initial_context() at the top of all functions that might be a Thread target function, or decorate them with the withInitialContext() decorator, @withInitialContext.
4. use epics.ca.create_context() at the top of all functions that are inside a new thread, and be sure to put epics.ca.destroy_context() at the end of the function.
5. ignore this advice and hope for the best. If you’re not creating new PVs and only reading values of PVs created in the main thread inside a child thread, you may not see any problems, at least not until you try to do something fancier.
Thread Examples
This is a simplified version of test code using Python threads. It is based on code originally from Friedrich Schotte, NIH, and included as thread_test.py in the tests directory of the source distribution.
In this example, we define a run_test procedure which will create PVs from a supplied list, and monitor these PVs, printing out the values when they change. Two threads are created and run concurrently, with overlapping PV lists, though one thread is run for a shorter time than the other.
import epics
import threading
import pvnames

def test_basic_thread():
    result = []
    def thread():
        epics.ca.use_initial_context()
        pv = epics.get_pv(pvnames.double_pv)
        result.append(pv.get())

    epics.ca.use_initial_context()
    t = threading.Thread(target=thread)
    t.start()
    t.join()
    assert len(result) and result[0] is not None

def test_basic_cathread():
    result = []
    def thread():
        pv = epics.get_pv(pvnames.double_pv)
        result.append(pv.get())

    epics.ca.use_initial_context()
    t = epics.ca.CAThread(target=thread)
    t.start()
    t.join()
    assert len(result) and result[0] is not None

def test_attach_context():
    result = []
    def thread():
        epics.ca.create_context()
        pv = epics.get_pv(pvnames.double_pv2)
        assert pv.wait_for_connection()
        result.append(pv.get())
        epics.ca.detach_context()

        epics.ca.attach_context(ctx)
        pv = epics.get_pv(pvnames.double_pv)
        assert pv.wait_for_connection()
        result.append(pv.get())

    epics.ca.use_initial_context()
    ctx = epics.ca.current_context()
    t = threading.Thread(target=thread)
    t.start()
    t.join()
    assert len(result) == 2 and result[0] is not None
    print(result)

def test_pv_from_main():
    result = []
    def thread():
        result.append(pv.get())

    epics.ca.use_initial_context()
    pv = epics.get_pv(pvnames.double_pv2)
    t = epics.ca.CAThread(target=thread)
    t.start()
    t.join()
    assert len(result) and result[0] is not None
In light of the long discussion above, a few remarks are in order: This code uses the standard Thread library and explicitly calls epics.ca.use_initial_context() prior to any CA calls in the target function. Also note that the run_test() function is first called from the main thread, so that the initial CA context does belong to the main thread. Finally, the epics.ca.use_initial_context() call in run_test() above could be replaced with epics.ca.create_context(), and run OK.
The output from this will look like:
First, create a PV in the main thread:
Run 2 Background Threads simultaneously:
-> thread "A" will run for 3.000 sec, monitoring ['Py:ao1', 'Py:ai1', 'Py:long1']
-> thread "B" will run for 6.000 sec, monitoring ['Py:ai1', 'Py:long1', 'Py:ao2']
Py:ao1 = 8.3948 (A)
Py:ai1 = 3.14 (B)
Py:ai1 = 3.14 (A)
Py:ao1 = 0.7404 (A)
Py:ai1 = 4.07 (B)
Py:ai1 = 4.07 (A)
Py:long1 = 3 (B)
Py:long1 = 3 (A)
Py:ao1 = 13.0861 (A)
Py:ai1 = 8.49 (B)
Py:ai1 = 8.49 (A)
Py:ao2 = 30 (B)
Completed Thread A
Py:ai1 = 9.42 (B)
Py:ao2 = 30 (B)
Py:long1 = 4 (B)
Py:ai1 = 3.35 (B)
Py:ao2 = 31 (B)
Py:ai1 = 4.27 (B)
Py:ao2 = 31 (B)
Py:long1 = 5 (B)
Py:ai1 = 8.20 (B)
Py:ao2 = 31 (B)
Completed Thread B
Done
Note that while both threads A and B are running, a callback for the PV Py:ai1 is generated in each thread.
Note also that the callbacks for the PVs created in each thread are explicitly cleared with:
[p.clear_callbacks() for p in pvs]
Without this, the callbacks for thread A will persist even after the thread has completed!
Using Multiprocessing with PyEpics
An alternative to Python threads that has some very interesting and important features is to use multiple processes, as with the standard Python multiprocessing module. While using multiple processes has some advantages over threads, it also has important implications for use with PyEpics. The basic issue is that multiple processes need to be fully separate, and do not share global state. For epics Channel Access, this means that all those things like established communication channels, callbacks, and Channel Access context cannot easily be shared between processes.
The solution is to use a CAProcess, which acts just like multiprocessing.Process, but knows how to separate contexts between processes. This means that you will have to create PV objects for each process (even if they point to the same PV).
class CAProcess(group=None, target=None, name=None, args=(), kwargs={})
    a subclass of multiprocessing.Process that clears the global Channel Access context before running your target function in its own process.

class CAPool(processes=None, initializer=None, initargs=(), maxtasksperchild=None)
    a subclass of multiprocessing.pool.Pool, creating a Pool of CAProcess instances.
A simple example of using multiprocessing successfully is given:
from __future__ import print_function
import epics
import time
import multiprocessing as mp
import threading
import pvnames

PVN1 = pvnames.double_pv # 'Py:ao2'
PVN2 = pvnames.double_pv2 # 'Py:ao3'

def subprocess(*args):
    print('==subprocess==', args)
    mypvs = [epics.get_pv(pvname) for pvname in args]
    for i in range(10):
        time.sleep(0.750)
        out = [(p.pvname, p.get(as_string=True)) for p in mypvs]
        out = ', '.join(["%s=%s" % o for o in out])
        print('==sub (%d): %s' % (i, out))

def test_mpprocess():
    def monitor(pvname=None, char_value=None, **kwargs):
        print('--main:monitor %s=%s' % (pvname, char_value))

    print('--main:')
    pv1 = epics.get_pv(PVN1)
    print('--main:init %s=%s' % (PVN1, pv1.get()))
    pv1.add_callback(callback=monitor)

    try:
        proc1 = epics.CAProcess(target=subprocess,
                                args=(PVN1, PVN2))
        proc1.start()
        proc1.join()
    except KeyboardInterrupt:
        print('--main: killing subprocess')
        proc1.terminate()

    print('--main: subprocess complete')
    time.sleep(0.5)
    print('--main:final %s=%s' % (PVN1, pv1.get()))
Here, the main process and the subprocess can each interact with the same PV, though they need to create a separate connection (here, using PV) in each process.
Note that different CAProcess instances can communicate via standard multiprocessing.Queue. At this writing, no testing has been done on using multiprocessing Managers.
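Passing results back through a queue might look like the sketch below. All function names here are hypothetical. The reader is passed in as a callable (with pyepics it could be epics.caget, called inside the child so that the child makes its own connections), and the process class defaults to multiprocessing.Process as a stand-in; passing epics.CAProcess instead is assumed to work because it takes the same constructor arguments.

```python
import multiprocessing as mp


def report_values(read_value, names, out_queue):
    """Read each named value and put (name, value) pairs on out_queue."""
    for name in names:
        out_queue.put((name, read_value(name)))
    out_queue.put(None)  # sentinel: no more results


def collect(out_queue):
    """Drain out_queue until the sentinel and return the results as a dict."""
    results = {}
    for name, value in iter(out_queue.get, None):
        results[name] = value
    return results


def run_in_child(read_value, names, process_class=mp.Process):
    """Run report_values in a child process and return what it reported."""
    out_queue = mp.Queue()
    proc = process_class(target=report_values,
                         args=(read_value, names, out_queue))
    proc.start()
    results = collect(out_queue)  # drain before join so the child can exit
    proc.join()
    return results
```

A call such as run_in_child(epics.caget, ['Py:ao2', 'Py:ao3'], process_class=epics.CAProcess) should be placed under an if __name__ == '__main__': guard, as is usual with multiprocessing. The reader callable and the values it returns must be picklable.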