For a driver I am developing, I also need DMA. I followed the white paper on developing a PCIe/PXIe driver with VISA (http://www.ni.com/white-paper/3251/en/).
An important note is missing from the white paper and also from the description of the VI "VISA Memory Allocation". The memory allocation aligns at a 4096-byte boundary, but it first uses 8 bytes for metadata, so the offset you receive is not aligned to 4 KB.
The hardware description of my DMA controller states:
Note that the bus logical address must be 32-bit double word aligned, and all DMA transfer must not cross the naturally aligned 4096 byte boundary as specified by the PCI Express based specification 1.1 (i.e. start address plus transfer length cannot cross whole multiples of 4096).
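To make the rule in that note concrete, here is a minimal sketch in Python (not LabVIEW) of the two checks it implies. The function names are made up for illustration, and I am assuming that a transfer which ends exactly on a 4096-byte boundary does not count as crossing it.

```python
PAGE_SIZE = 4096   # boundary named in the hardware note
DWORD_SIZE = 4     # 32-bit double word


def is_dword_aligned(addr: int) -> bool:
    """True if the bus address is a multiple of 4 bytes."""
    return addr % DWORD_SIZE == 0


def crosses_page_boundary(addr: int, length: int) -> bool:
    """True if the bytes [addr, addr + length) lie in more than one 4096-byte page.

    Assumption: a transfer whose last byte is the last byte of a page
    (so it ends exactly on the boundary) is still allowed.
    """
    if length <= 0:
        return False
    first_page = addr // PAGE_SIZE
    last_page = (addr + length - 1) // PAGE_SIZE
    return first_page != last_page
```

With a made-up offset that sits 8 bytes past a page boundary, such as 0x10008, a 4096-byte transfer is double word aligned but still crosses the boundary at 0x11000, whereas the same transfer starting at 0x10000 stays inside one page.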
Because the transfer size of my card is always 4096 bytes and I used the offset returned by the memory allocation VI directly, every transfer crossed a 4096-byte boundary and the DMA did not work. It cost me many hours of searching, because the memory allocation appears to be transparent.
So I think this nasty undocumented behavior should be documented, and perhaps a Boolean input could be added to choose whether the returned offset is aligned to the page size. The problem can be avoided by allocating 4096 bytes more than you need and finding the next 4096-byte boundary inside the buffer yourself. LabVIEW could implement it the same way internally, using the first page for the metadata.
But it should at least be documented.
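For anyone who runs into the same thing, the workaround of over-allocating by one page and rounding up can be sketched as follows. This is plain Python arithmetic rather than LabVIEW; `align_up` and `aligned_window` are hypothetical helper names, and the raw offset in the example is invented, not a value VISA actually returned.

```python
PAGE_SIZE = 4096  # required alignment; must be a power of two for the mask below


def align_up(addr: int, alignment: int = PAGE_SIZE) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def aligned_window(raw_offset: int, needed: int) -> tuple[int, int]:
    """Find a page-aligned start inside an over-allocated buffer.

    Assumes the buffer was allocated with needed + PAGE_SIZE bytes and that
    raw_offset is the offset the allocation returned. Returns the aligned
    offset to hand to the DMA controller and the number of bytes skipped.
    """
    aligned = align_up(raw_offset)
    skipped = aligned - raw_offset
    # The skipped bytes are always less than one page, so the aligned
    # window of `needed` bytes still fits inside the allocation.
    assert skipped < PAGE_SIZE
    assert aligned + needed <= raw_offset + needed + PAGE_SIZE
    return aligned, skipped
```

For example, with a raw offset of 0x10008 the aligned start is 0x11000 and 4088 bytes at the front are skipped. The raw offset should presumably be kept as well, since that is the value the allocation handed out and the one to use when the memory is released.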