Dynamic DMA mapping using the generic device
============================================

James E.J. Bottomley <James.Bottomley@HansenPartnership.com>

This document describes the DMA API. For a more gentle introduction
to the API (and actual examples), see Documentation/DMA-API-HOWTO.txt.

This API is split into two pieces. Part I describes the basic API.
Part II describes extensions for supporting non-consistent memory
machines. Unless you know that your driver absolutely has to support
non-consistent platforms (these are usually only legacy platforms), you
should use only the API described in Part I.

Part I - dma_ API
-----------------

To get the dma_ API, you must #include <linux/dma-mapping.h>. This
provides dma_addr_t and the interfaces described below.

A dma_addr_t can hold any valid DMA or bus address for the platform. It
can be given to a device to use as a DMA source or target. A CPU cannot
reference a dma_addr_t directly because there may be translation between
its physical address space and the bus address space.

Part Ia - Using large DMA-coherent buffers
------------------------------------------

void *
dma_alloc_coherent(struct device *dev, size_t size,
		   dma_addr_t *dma_handle, gfp_t flag)

Consistent memory is memory for which a write by either the device or
the processor can immediately be read by the processor or device
without having to worry about caching effects. (You may, however, need
to make sure to flush the processor's write buffers before telling
devices to read that memory.)

This routine allocates a region of <size> bytes of consistent memory.

It returns a pointer to the allocated region (in the processor's virtual
address space) or NULL if the allocation failed.

It also returns a <dma_handle> which may be cast to an unsigned integer the
same width as the bus and given to the device as the bus address base of
the region.

Note: consistent memory can be expensive on some platforms, and the
minimum allocation length may be as big as a page, so you should
consolidate your requests for consistent memory as much as possible.
The simplest way to do that is to use the dma_pool calls (see below).

The flag parameter (dma_alloc_coherent() only) allows the caller to
specify the GFP_ flags (see kmalloc()) for the allocation (the
implementation may choose to ignore flags that affect the location of
the returned memory, like GFP_DMA).

void *
dma_zalloc_coherent(struct device *dev, size_t size,
		    dma_addr_t *dma_handle, gfp_t flag)

Wraps dma_alloc_coherent() and also zeroes the returned memory if the
allocation attempt succeeded.

void
dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
		  dma_addr_t dma_handle)

Free a region of consistent memory you previously allocated. dev,
size and dma_handle must all be the same as those passed into
dma_alloc_coherent(). cpu_addr must be the virtual address returned by
dma_alloc_coherent().

Note that unlike their sibling allocation calls, these routines
may only be called with IRQs enabled.

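As a sketch of the usual allocate/free pairing (the struct mydev driver
state, the mydrv_* names and the descriptor-ring use below are all
hypothetical, not part of the API):

	/* Hypothetical driver state; only dev, ring and ring_dma matter here. */
	struct mydev {
		struct device *dev;
		void *ring;		/* CPU virtual address of the ring */
		dma_addr_t ring_dma;	/* bus address to program into the device */
	};

	static int mydrv_alloc_ring(struct mydev *md)
	{
		md->ring = dma_alloc_coherent(md->dev, PAGE_SIZE,
					      &md->ring_dma, GFP_KERNEL);
		if (!md->ring)
			return -ENOMEM;
		/* program md->ring_dma into the device's ring-base register */
		return 0;
	}

	static void mydrv_free_ring(struct mydev *md)
	{
		/* same dev, size and dma_handle as the allocation */
		dma_free_coherent(md->dev, PAGE_SIZE, md->ring, md->ring_dma);
	}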

Part Ib - Using small DMA-coherent buffers
------------------------------------------

To get this part of the dma_ API, you must #include <linux/dmapool.h>.

Many drivers need lots of small DMA-coherent memory regions for DMA
descriptors or I/O buffers. Rather than allocating in units of a page
or more using dma_alloc_coherent(), you can use DMA pools. These work
much like a struct kmem_cache, except that they use the DMA-coherent allocator,
not __get_free_pages(). Also, they understand common hardware constraints
for alignment, like queue heads needing to be aligned on N-byte boundaries.

struct dma_pool *
dma_pool_create(const char *name, struct device *dev,
		size_t size, size_t align, size_t alloc);

dma_pool_create() initializes a pool of DMA-coherent buffers
for use with a given device. It must be called in a context which
can sleep.

The "name" is for diagnostics (like a struct kmem_cache name); dev and size
are like what you'd pass to dma_alloc_coherent(). The device's hardware
alignment requirement for this type of data is "align" (which is expressed
in bytes and must be a power of two). If your device has no boundary
crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
from this pool must not cross 4KByte boundaries.

void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
		     dma_addr_t *dma_handle);

This allocates memory from the pool; the returned memory will meet the
size and alignment requirements specified at creation time. Pass
GFP_ATOMIC to prevent blocking, or, if it's permitted (not
in_interrupt, not holding SMP locks), pass GFP_KERNEL to allow
blocking. Like dma_alloc_coherent(), this returns two values: an
address usable by the CPU, and the DMA address usable by the pool's
device.

void dma_pool_free(struct dma_pool *pool, void *vaddr,
		   dma_addr_t addr);

This puts memory back into the pool. The pool is what was passed to
dma_pool_alloc(); the CPU (vaddr) and DMA addresses are what
were returned when that routine allocated the memory being freed.

void dma_pool_destroy(struct dma_pool *pool);

dma_pool_destroy() frees the resources of the pool. It must be
called in a context which can sleep. Make sure you've freed all allocated
memory back to the pool before you destroy it.
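
A typical pool lifecycle, sketched with illustrative names (pool, desc
and desc_dma are placeholders; the 64-byte block size and 16-byte
alignment are just example values):

	pool = dma_pool_create("mydrv_desc", dev, 64, 16, 0);
	if (!pool)
		return -ENOMEM;

	desc = dma_pool_alloc(pool, GFP_KERNEL, &desc_dma);
	if (!desc)
		goto err;	/* handle allocation failure */
	/* hand desc_dma to the device; use desc from the CPU */

	dma_pool_free(pool, desc, desc_dma);
	dma_pool_destroy(pool);	/* only after every block has been freed */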

Part Ic - DMA addressing limitations
------------------------------------

int
dma_supported(struct device *dev, u64 mask)

Checks to see if the device can support DMA to the memory described by
mask.

Returns: 1 if it can and 0 if it can't.

Notes: This routine merely tests to see if the mask is possible. It
won't change the current mask settings. It is intended more as an
internal API for use by the platform than an external API for use by
driver writers.

int
dma_set_mask_and_coherent(struct device *dev, u64 mask)

Checks to see if the mask is possible and updates the device's
streaming and coherent DMA mask parameters if it is.

Returns: 0 if successful and a negative error if not.

int
dma_set_mask(struct device *dev, u64 mask)

Checks to see if the mask is possible and updates the device's
streaming DMA mask parameter if it is.

Returns: 0 if successful and a negative error if not.

int
dma_set_coherent_mask(struct device *dev, u64 mask)

Checks to see if the mask is possible and updates the device's
coherent DMA mask parameter if it is.

Returns: 0 if successful and a negative error if not.

u64
dma_get_required_mask(struct device *dev)

This API returns the mask that the platform requires to
operate efficiently. Usually this means the returned mask
is the minimum required to cover all of memory. Examining the
required mask gives drivers with variable descriptor sizes the
opportunity to use smaller descriptors as necessary.

Requesting the required mask does not alter the current mask. If you
wish to take advantage of it, you should issue a dma_set_mask()
call to set the mask to the value returned.


Part Id - Streaming DMA mappings
--------------------------------

dma_addr_t
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
	       enum dma_data_direction direction)

Maps a piece of processor virtual memory so it can be accessed by the
device and returns the bus address of the memory.

The dma_ API uses a strongly typed enumerator for its direction:

DMA_NONE		no direction (used for debugging)
DMA_TO_DEVICE		data is going from the memory to the device
DMA_FROM_DEVICE		data is coming from the device to the memory
DMA_BIDIRECTIONAL	direction isn't known

Notes: Not all memory regions in a machine can be mapped by this API.
Further, contiguous kernel virtual space may not be contiguous as
physical memory. Since this API does not provide any scatter/gather
capability, it will fail if the user tries to map a non-physically
contiguous piece of memory. For this reason, memory to be mapped by
this API should be obtained from sources which guarantee it to be
physically contiguous (like kmalloc).

Further, the bus address of the memory must be within the
dma_mask of the device (the dma_mask is a bit mask of the
addressable region for the device, i.e., if the bus address of
the memory ANDed with the dma_mask is still equal to the bus
address, then the device can perform DMA to the memory). To
ensure that the memory allocated by kmalloc is within the dma_mask,
the driver may specify various platform-dependent flags to restrict
the bus address range of the allocation (e.g., on x86, GFP_DMA
guarantees that the allocation lies within the first 16MB of available
bus addresses, as required by ISA devices).

Note also that the above constraints on physical contiguity and
dma_mask may not apply if the platform has an IOMMU (a device which
maps an I/O bus address to a physical memory address). However, to be
portable, device driver writers may *not* assume that such an IOMMU
exists.

Warnings: Memory coherency operates at a granularity called the cache
line width. In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache line
boundary and end exactly on one (to prevent two separately mapped
regions from sharing a single cache line). Since the cache line size
may not be known at compile time, the API will not enforce this
requirement. Therefore, it is recommended that driver writers who
don't take special care to determine the cache line size at run time
only map virtual regions that begin and end on page boundaries (which
are guaranteed also to be cache line boundaries).

DMA_TO_DEVICE synchronisation must be done after the last modification
of the memory region by the software and before it is handed off to
the device. Once this primitive is used, memory covered by this
primitive should be treated as read-only by the device. If the device
may write to it at any point, it should be DMA_BIDIRECTIONAL (see
below).

DMA_FROM_DEVICE synchronisation must be done before the driver
accesses data that may be changed by the device. This memory should
be treated as read-only by the driver. If the driver needs to write
to it at any point, it should be DMA_BIDIRECTIONAL (see below).

DMA_BIDIRECTIONAL requires special handling: it means that the driver
isn't sure if the memory was modified before being handed off to the
device and also isn't sure if the device will also modify it. Thus,
you must always sync bidirectional memory twice: once before the
memory is handed off to the device (to make sure all memory changes
are flushed from the processor) and once before the data may be
accessed after being used by the device (to make sure any processor
cache lines are updated with data that the device may have changed).

void
dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
		 enum dma_data_direction direction)

Unmaps the region previously mapped. All the parameters passed in
must be identical to those passed in to (and returned by) the mapping
API.

dma_addr_t
dma_map_page(struct device *dev, struct page *page,
	     unsigned long offset, size_t size,
	     enum dma_data_direction direction)
void
dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
	       enum dma_data_direction direction)

API for mapping and unmapping pages. All the notes and warnings
for the other mapping APIs apply here. Also, although the <offset>
and <size> parameters are provided to do partial page mapping, it is
recommended that you never use these unless you really know what the
cache width is.

int
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)

In some circumstances dma_map_single() and dma_map_page() will fail to create
a mapping. A driver can check for these errors by testing the returned
DMA address with dma_mapping_error(). A non-zero return value means the mapping
could not be created and the driver should take appropriate action (e.g.
reduce current DMA mapping usage or delay and try again later).
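
Putting map, error check and unmap together, a sketch of a single
streaming transfer (buf, len and the device hand-off are placeholders):

	dma_addr_t handle;

	handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, handle))
		goto err;	/* never hand a failed mapping to hardware */

	/* give `handle` to the device and start the transfer ... */

	dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);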

int
dma_map_sg(struct device *dev, struct scatterlist *sg,
	   int nents, enum dma_data_direction direction)

Returns: the number of bus address segments mapped (this may be shorter
than <nents> passed in if some elements of the scatter/gather list are
physically or virtually adjacent and an IOMMU maps them with a single
entry).

Please note that the sg cannot be mapped again once it has been mapped.
The mapping process is allowed to destroy information in the sg.

As with the other mapping interfaces, dma_map_sg() can fail. When it
does, 0 is returned and the driver must take appropriate action. It is
critical that the driver do something; in the case of a block driver,
aborting the request or even oopsing is better than doing nothing and
corrupting the filesystem.

With scatterlists, you use the resulting mapping like this:

	int i, count = dma_map_sg(dev, sglist, nents, direction);
	struct scatterlist *sg;

	for_each_sg(sglist, sg, count, i) {
		hw_address[i] = sg_dma_address(sg);
		hw_len[i] = sg_dma_len(sg);
	}

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. with an IOMMU, or if several pages just happen to be
physically contiguous) and returns the actual number of sg entries it
mapped them to. On failure, 0 is returned.

Then you should loop count times (note: this can be less than nents times)
and use the sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length, as shown above.

void
dma_unmap_sg(struct device *dev, struct scatterlist *sg,
	     int nhwentries, enum dma_data_direction direction)

Unmap the previously mapped scatter/gather list. All the parameters
must be the same as those passed in to the scatter/gather mapping
API.

Note: <nents> must be the number you passed in, *not* the number of
bus address entries returned.

void
dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size,
			enum dma_data_direction direction)
void
dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size,
			   enum dma_data_direction direction)
void
dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
		    enum dma_data_direction direction)
void
dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems,
		       enum dma_data_direction direction)

Synchronise a single contiguous or scatter/gather mapping for the CPU
and device. With the sync_sg API, all the parameters must be the same
as those passed into the scatter/gather mapping API. With the sync_single
API, you can use dma_handle and size parameters that aren't identical to
those passed into the single mapping API to do a partial sync.

Notes: You must do this:

- Before reading values that have been written by DMA from the device
  (use the DMA_FROM_DEVICE direction)
- After writing values that will be written to the device using DMA
  (use the DMA_TO_DEVICE direction)
- Before *and* after handing memory to the device if the memory is
  DMA_BIDIRECTIONAL

See also dma_map_single().

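For a long-lived DMA_FROM_DEVICE mapping that the CPU inspects between
transfers, the ownership hand-offs described above look roughly like
this sketch (buf, len and the loop condition are illustrative):

	handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	while (more_transfers) {
		/* ... wait for the device's DMA to complete ... */
		dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);
		/* the CPU may now safely read buf */
		dma_sync_single_for_device(dev, handle, len, DMA_FROM_DEVICE);
		/* the device may now write buf again */
	}

	dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);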
dma_addr_t
dma_map_single_attrs(struct device *dev, void *cpu_addr, size_t size,
		     enum dma_data_direction dir,
		     struct dma_attrs *attrs)

void
dma_unmap_single_attrs(struct device *dev, dma_addr_t dma_addr,
		       size_t size, enum dma_data_direction dir,
		       struct dma_attrs *attrs)

int
dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
		 int nents, enum dma_data_direction dir,
		 struct dma_attrs *attrs)

void
dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sgl,
		   int nents, enum dma_data_direction dir,
		   struct dma_attrs *attrs)

The four functions above are just like the counterpart functions
without the _attrs suffixes, except that they pass an optional
struct dma_attrs*.

struct dma_attrs encapsulates a set of "DMA attributes". For the
definition of struct dma_attrs see linux/dma-attrs.h.

The interpretation of DMA attributes is architecture-specific, and
each attribute should be documented in Documentation/DMA-attributes.txt.

If struct dma_attrs* is NULL, the semantics of each of these
functions is identical to those of the corresponding function
without the _attrs suffix. As a result dma_map_single_attrs()
can generally replace dma_map_single(), etc.

As an example of the use of the *_attrs functions, here's how
you could pass an attribute DMA_ATTR_FOO when mapping memory
for DMA:

#include <linux/dma-attrs.h>
/* DMA_ATTR_FOO should be defined in linux/dma-attrs.h and
 * documented in Documentation/DMA-attributes.txt */
...

	DEFINE_DMA_ATTRS(attrs);
	dma_set_attr(DMA_ATTR_FOO, &attrs);
	....
	n = dma_map_sg_attrs(dev, sg, nents, DMA_TO_DEVICE, &attrs);
	....

Architectures that care about DMA_ATTR_FOO would check for its
presence in their implementations of the mapping and unmapping
routines, e.g.:

void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr,
			     size_t size, enum dma_data_direction dir,
			     struct dma_attrs *attrs)
{
	....
	int foo = dma_get_attr(DMA_ATTR_FOO, attrs);
	....
	if (foo)
		/* twizzle the frobnozzle */
	....
}

Part II - Advanced dma_ usage
-----------------------------

Warning: These pieces of the DMA API should not be used in the
majority of cases, since they cater for unlikely corner cases that
don't belong in usual drivers.

If you don't understand how cache line coherency works between a
processor and an I/O device, you should not be using this part of the
API at all.

void *
dma_alloc_noncoherent(struct device *dev, size_t size,
		      dma_addr_t *dma_handle, gfp_t flag)

Identical to dma_alloc_coherent() except that the platform will
choose to return either consistent or non-consistent memory as it sees
fit. By using this API, you are guaranteeing to the platform that you
have all the correct and necessary sync points for this memory in the
driver should it choose to return non-consistent memory.

Note: where the platform can return consistent memory, it will
guarantee that the sync points become nops.

Warning: Handling non-consistent memory is a real pain. You should
only use this API if you positively know your driver will be
required to work on one of the rare (usually non-PCI) architectures
that simply cannot make consistent memory.

void
dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
		     dma_addr_t dma_handle)

Free memory allocated by the nonconsistent API. All parameters must
be identical to those passed in to (and returned by)
dma_alloc_noncoherent().

int
dma_get_cache_alignment(void)

Returns the processor cache alignment. This is the absolute minimum
alignment *and* width that you must observe when either mapping
memory or doing partial flushes.

Notes: This API may return a number *larger* than the actual cache
line, but it will guarantee that one or more cache lines fit exactly
into the width returned by this call. It will also always be a power
of two for easy alignment.

void
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
	       enum dma_data_direction direction)

Do a partial sync of memory that was allocated by
dma_alloc_noncoherent(), starting at virtual address vaddr and
continuing for size bytes. Again, you *must* observe the cache line
boundaries when doing this.

int
dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
			    dma_addr_t device_addr, size_t size,
			    int flags)

Declare a region of memory to be handed out by dma_alloc_coherent() when
it's asked for coherent memory for this device.

phys_addr is the CPU physical address to which the memory is currently
assigned (this will be ioremapped so the CPU can access the region).

device_addr is the bus address the device needs to be programmed
with to actually address this memory (this will be handed out as the
dma_addr_t in dma_alloc_coherent()).

size is the size of the area (must be a multiple of PAGE_SIZE).

flags can be ORed together and are:

DMA_MEMORY_MAP - request that the memory returned from
dma_alloc_coherent() be directly writable.

DMA_MEMORY_IO - request that the memory returned from
dma_alloc_coherent() be addressable using read()/write()/memcpy_toio() etc.

One or both of these flags must be present.

DMA_MEMORY_INCLUDES_CHILDREN - make the declared memory be allocated by
dma_alloc_coherent() of any child devices of this one (for memory residing
on a bridge).

DMA_MEMORY_EXCLUSIVE - only allocate memory from the declared regions.
Do not allow dma_alloc_coherent() to fall back to system memory when
it's out of memory in the declared region.

The return value will be either DMA_MEMORY_MAP or DMA_MEMORY_IO and
must correspond to a passed-in flag (i.e. no returning DMA_MEMORY_IO
if only DMA_MEMORY_MAP was passed in) for success, or zero for
failure.

Note, for DMA_MEMORY_IO returns, all subsequent memory returned by
dma_alloc_coherent() may no longer be accessed directly, but instead
must be accessed using the correct bus functions. If your driver
isn't prepared to handle this contingency, it should not specify
DMA_MEMORY_IO in the input flags.

As a simplification for the platforms, only *one* such region of
memory may be declared per device.

For reasons of efficiency, most platforms choose to track the declared
region only at the granularity of a page. For smaller allocations,
you should use the dma_pool() API.

void
dma_release_declared_memory(struct device *dev)

Remove the memory region previously declared from the system. This
API performs *no* in-use checking for this region and will return
unconditionally having removed all the required structures. It is the
driver's job to ensure that no parts of this memory region are
currently in use.

void *
dma_mark_declared_memory_occupied(struct device *dev,
				  dma_addr_t device_addr, size_t size)

This is used to occupy specific regions of the declared space
(dma_alloc_coherent() will hand out the first free region it finds).

device_addr is the *device* address of the region requested.

size is the size (and should be a page-sized multiple).

The return value will be either a pointer to the processor virtual
address of the memory, or an error (via PTR_ERR()) if any part of the
region is occupied.

Part III - Debugging driver use of the DMA-API
----------------------------------------------

The DMA-API as described above has some constraints; for example, DMA
addresses must be released with the corresponding function and with the same
size. With the advent of hardware IOMMUs it becomes more and more important
that drivers do not violate those constraints. In the worst case such a
violation can result in data corruption, up to and including destroyed
filesystems.

To debug drivers and find bugs in the usage of the DMA-API, checking code can
be compiled into the kernel which will tell the developer about those
violations. If your architecture supports it, you can select the "Enable
debugging of DMA-API usage" option in your kernel configuration. Enabling this
option has a performance impact. Do not enable it in production kernels.

The resulting kernel will contain code which does some bookkeeping
about what DMA memory was allocated for which device. If this code detects an
error it prints a warning message with some details into your kernel log. An
example warning message may look like this:

------------[ cut here ]------------
WARNING: at /data2/repos/linux-2.6-iommu/lib/dma-debug.c:448
	check_unmap+0x203/0x490()
Hardware name:
forcedeth 0000:00:08.0: DMA-API: device driver frees DMA memory with wrong
	function [device address=0x00000000640444be] [size=66 bytes] [mapped as
	single] [unmapped as page]
Modules linked in: nfsd exportfs bridge stp llc r8169
Pid: 0, comm: swapper Tainted: G        W  2.6.28-dmatest-09289-g8bb99c0 #1
Call Trace:
<IRQ>  [<ffffffff80240b22>] warn_slowpath+0xf2/0x130
[<ffffffff80647b70>] _spin_unlock+0x10/0x30
[<ffffffff80537e75>] usb_hcd_link_urb_to_ep+0x75/0xc0
[<ffffffff80647c22>] _spin_unlock_irqrestore+0x12/0x40
[<ffffffff8055347f>] ohci_urb_enqueue+0x19f/0x7c0
[<ffffffff80252f96>] queue_work+0x56/0x60
[<ffffffff80237e10>] enqueue_task_fair+0x20/0x50
[<ffffffff80539279>] usb_hcd_submit_urb+0x379/0xbc0
[<ffffffff803b78c3>] cpumask_next_and+0x23/0x40
[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
[<ffffffff803c7ea3>] check_unmap+0x203/0x490
[<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
[<ffffffff8026ffe9>] handle_edge_irq+0xc9/0x150
[<ffffffff8020e3ab>] do_IRQ+0xcb/0x1c0
[<ffffffff8020c093>] ret_from_intr+0x0/0xa
<EOI> ---[ end trace f6435a98e2a38c0e ]---

The driver developer can find the driver and the device, including a
stacktrace of the DMA-API call which caused this warning.

By default only the first error will result in a warning message; all other
errors will only be silently counted. This limitation exists to prevent the
code from flooding your kernel log. To support debugging a device driver,
this can be disabled via debugfs. See the debugfs interface documentation
below for details.

The debugfs directory for the DMA-API debugging code is called dma-api/. In
this directory the following files can currently be found:

	dma-api/all_errors	This file contains a numeric value. If this
				value is not equal to zero the debugging code
				will print a warning for every error it finds
				into the kernel log. Be careful with this
				option, as it can easily flood your logs.

	dma-api/disabled	This read-only file contains the character 'Y'
				if the debugging code is disabled. This can
				happen when it runs out of memory or if it was
				disabled at boot time.

	dma-api/error_count	This file is read-only and shows the total
				number of errors found.

	dma-api/num_errors	The number in this file shows how many
				warnings will be printed to the kernel log
				before it stops. This number is initialized to
				one at system boot and can be set by writing
				into this file.

	dma-api/min_free_entries
				This read-only file can be read to get the
				minimum number of free dma_debug_entries the
				allocator has ever seen. If this value goes
				down to zero the code will disable itself
				because it is no longer reliable.

	dma-api/num_free_entries
				The current number of free dma_debug_entries
				in the allocator.

	dma-api/driver-filter
				You can write the name of a driver into this
				file to limit the debug output to requests from
				that particular driver. Write an empty string
				to that file to disable the filter and see
				all errors again.

If you have this code compiled into your kernel it will be enabled by default.
If you want to boot without the bookkeeping anyway you can provide
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
Notice that you cannot enable it again at runtime. You have to reboot to do
so.

If you want to see debug messages only for a specific device driver you can
specify the dma_debug_driver=<drivername> parameter. This will enable the
driver filter at boot time. The debug code will only print errors for that
driver afterwards. This filter can be disabled or changed later using debugfs.

When the code disables itself at runtime, this is most likely because it ran
out of dma_debug_entries. These entries are preallocated at boot; the number
of preallocated entries is defined per architecture. If it is too low for
you, boot with 'dma_debug_entries=<your_desired_number>' to overwrite the
architectural default.

void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);

The dma-debug interface debug_dma_mapping_error() is used to debug drivers
that fail to check for DMA mapping errors on addresses returned by the
dma_map_single() and dma_map_page() interfaces. This interface clears a flag
set by debug_dma_map_page() to indicate that dma_mapping_error() has been
called by the driver. When the driver does the unmap, debug_dma_unmap()
checks the flag and, if it is still set, prints a warning message that
includes the call trace that leads up to the unmap. This interface can be
called from dma_mapping_error() routines to enable DMA mapping error check
debugging.

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org>
CC := gcc
CFLAGS := -g

all: dma-example dma-gpio

dma-example: dma-example.c
	$(CC) $(CFLAGS) -o dma-example dma-example.c
dma-gpio: dma-gpio.c
	$(CC) $(CFLAGS) -O2 -std=gnu99 -o dma-gpio dma-gpio.c -lrt

clean:
	rm -rf dma-example dma-gpio

.PHONY: clean
@@ -0,0 +1,22
1 Raspberry-Pi-DMA-Example
2 ========================
3
4 Simplest example of copying memory from one region to another using DMA ("Direct Memory Access") in userland
5
6 Just type `make`, and then `sudo ./dma-example` (must use sudo to get permissions for writing to DMA peripheral)
7
8 The example simply copies the string "hello world" from one place in memory to another through the use of the Raspberry Pi's DMA peripheral.
9
10 Run `sudo ./dma-gpio` for an example which toggles a GPIO output pin at 500Hz using DMA. This code (dma-gpio.c) creates an 8ms circular buffer of future output states for all 64 IOs and uses DMA to sequentially copy this buffer into the memory-mapped GPIO registers at a rate of 250,000 frames per second. This allows one to output precise waveforms to any GPIO pin without worrying about Linux task scheduling. The PWM peripheral is used for pacing the DMA transaction, so simultaneous audio output will likely cause errors. Heavy network or USB usage will decrease the timing accuracy for frame rates of 500,000+ fps, due to bus-contention, but even downloading a file at 1MB/sec only has a *very* small impact at 250,000 fps.
11
12 Some code, namely for translating virtual addresses to physical ones within dma-example.c, was based on that found in the Raspberry Pi FM Transmitter which I *think* is by either Oliver Mattos or Oskar Weigl, but their website has been down for a while now. Some of the code can still be found here: http://www.raspians.com/turning-the-raspberry-pi-into-an-fm-transmitter/
13
14 Problems
15 ======
16
17 The virtual->physical mapping function in `dma-example.c` is not cache-coherent. That means that the DMA engine might see different data than the CPU. The equivalent functions in `dma-gpio.c` behave correctly, so it is only a matter of porting that code over once I have time.
18
19 License
20 ======
21
22 I'm putting this code in the public domain. However, the two functions in dma-example.c - `makeVirtPhysPage` and `freeVirtPhysPage` - were based on code found in the FM Transmitter, which was GPL-licensed. If you want to use this code under a non-GPL license, I would recommend replacing those functions with your own code, just to be extra safe. **Disclaimer**: I am not a lawyer.
@@ -0,0 +1,261
1 /*
2 * https://github.com/Wallacoloo/Raspberry-Pi-DMA-Example : DMA Raspberry Pi Examples
3 * Author: Colin Wallace
4
5 This is free and unencumbered software released into the public domain.
6
7 Anyone is free to copy, modify, publish, use, compile, sell, or
8 distribute this software, either in source code form or as a compiled
9 binary, for any purpose, commercial or non-commercial, and by any
10 means.
11
12 In jurisdictions that recognize copyright laws, the author or authors
13 of this software dedicate any and all copyright interest in the
14 software to the public domain. We make this dedication for the benefit
15 of the public at large and to the detriment of our heirs and
16 successors. We intend this dedication to be an overt act of
17 relinquishment in perpetuity of all present and future rights to this
18 software under copyright law.
19
20 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
21 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
22 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
23 IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
24 OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
25 ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
26 OTHER DEALINGS IN THE SOFTWARE.
27
28 For more information, please refer to <http://unlicense.org/>
29 */
30
31 /*
32 * processor documentation is at: http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
33 * pg 38 for DMA
34 */
35
36 #include <sys/mman.h> //for mmap
37 #include <unistd.h> //for NULL
38 #include <stdio.h> //for printf
39 #include <stdlib.h> //for exit
40 #include <fcntl.h> //for file opening
41 #include <stdint.h> //for uint32_t
42 #include <string.h> //for memset
43
44 #define PAGE_SIZE 4096 //mmap maps pages of memory, so we must give it multiples of this size
45
46 //physical addresses for the DMA peripherals, as found in the processor documentation:
47 #define DMA_BASE 0x20007000
48 //DMA Channel register sets (format of these registers is found in DmaChannelHeader struct):
49 #define DMACH0 0x20007000
50 #define DMACH1 0x20007100
51 #define DMACH2 0x20007200
52 #define DMACH3 0x20007300
53 //...
54 #define DMACH(n) (DMACH0 + (n)*0x100)
55 //Each DMA channel has some associated registers, but only CS (control and status), CONBLK_AD (control block address), and DEBUG are writeable
56 //DMA is started by writing address of the first Control Block to the DMA channel's CONBLK_AD register and then setting the ACTIVE bit inside the CS register (bit 0)
57 //Note: DMA channels are connected directly to peripherals, so physical addresses should be used (affects control block's SOURCE, DEST and NEXTCONBK addresses).
58 #define DMAENABLE 0x20007ff0 //bit 0 should be set to 1 to enable channel 0. bit 1 enables channel 1, etc.
59
60 //flags used in the DmaChannelHeader struct:
61 #define DMA_CS_RESET (1<<31)
62 #define DMA_CS_ACTIVE (1<<0)
63
64 #define DMA_DEBUG_READ_ERROR (1<<2)
65 #define DMA_DEBUG_FIFO_ERROR (1<<1)
66 #define DMA_DEBUG_READ_LAST_NOT_SET_ERROR (1<<0)
67
68 //flags used in the DmaControlBlock struct:
69 #define DMA_CB_TI_DEST_INC (1<<4)
70 #define DMA_CB_TI_SRC_INC (1<<8)
71
72 //set bits designated by (mask) at the address (dest) to (value), without affecting the other bits
73 //eg if x = 0b11001100
74 // writeBitmasked(&x, 0b00000110, 0b11110011),
75 // then x now = 0b11001010
76 void writeBitmasked(volatile uint32_t *dest, uint32_t mask, uint32_t value) {
77 uint32_t cur = *dest;
78 uint32_t new = (cur & (~mask)) | (value & mask);
79 *dest = new;
80 *dest = new; //added safety for when crossing memory barriers.
81 }
82
83 struct DmaChannelHeader {
84 uint32_t CS; //Control and Status
85 //31 RESET; set to 1 to reset DMA
86 //30 ABORT; set to 1 to abort current DMA control block (next one will be loaded & continue)
87 //29 DISDEBUG; set to 1 and DMA won't be paused when debug signal is sent
88 //28 WAIT_FOR_OUTSTANDING_WRITES; set to 1 and DMA will wait until peripheral says all writes have gone through before loading next CB
89 //24-27 reserved
90 //20-23 PANIC_PRIORITY; 0 is lowest priority
91 //16-19 PRIORITY; bus scheduling priority. 0 is lowest
92 //9-15 reserved
93 //8 ERROR; read as 1 when error is encountered. error can be found in DEBUG register.
94 //7 reserved
95 //6 WAITING_FOR_OUTSTANDING_WRITES; read as 1 when waiting for outstanding writes
96 //5 DREQ_STOPS_DMA; read as 1 if DREQ is currently preventing DMA
97 //4 PAUSED; read as 1 if DMA is paused
98 //3 DREQ; copy of the data request signal from the peripheral, if DREQ is enabled. reads as 1 if data is being requested, else 0
99 //2 INT; set when current CB ends and its INTEN=1. Write a 1 to this register to clear it
100 //1 END; set when the transfer defined by current CB is complete. Write 1 to clear.
101 //0 ACTIVE; write 1 to activate DMA (load the CB before hand)
102 uint32_t CONBLK_AD; //Control Block Address
103 uint32_t TI; //transfer information; see DmaControlBlock.TI for description
104 uint32_t SOURCE_AD; //Source address
105 uint32_t DEST_AD; //Destination address
106 uint32_t TXFR_LEN; //transfer length.
107 uint32_t STRIDE; //2D Mode Stride. Only used if TI.TDMODE = 1
108 uint32_t NEXTCONBK; //Next control block. Must be 256-bit aligned (32 bytes; 8 words)
109 uint32_t DEBUG; //controls debug settings
110 };
111
112 struct DmaControlBlock {
113 uint32_t TI; //transfer information
114 //31:27 unused
115 //26 NO_WIDE_BURSTS
116 //21:25 WAITS; number of cycles to wait between each DMA read/write operation
117 //16:20 PERMAP; peripheral number to be used for DREQ signal (pacing). set to 0 for unpaced DMA.
118 //12:15 BURST_LENGTH
119 //11 SRC_IGNORE; set to 1 to not perform reads. Used to manually fill caches
120 //10 SRC_DREQ; set to 1 to have the DREQ from PERMAP gate requests.
121 //9 SRC_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves
122 //8 SRC_INC; set to 1 to automatically increment the source address after each read (you'll want this if you're copying a range of memory)
123 //7 DEST_IGNORE; set to 1 to not perform writes.
124 //6 DEST_DREQ; set to 1 to have the DREQ from PERMAP gate *writes*
125 //5 DEST_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves
126 //4 DEST_INC; set to 1 to automatically increment the destination address after each write (you'll want this if you're copying a range of memory)
127 //3 WAIT_RESP; make DMA wait for a response from the peripheral during each write. Ensures multiple writes don't get stacked in the pipeline
128 //2 unused (0)
129 //1 TDMODE; set to 1 to enable 2D mode
130 //0 INTEN; set to 1 to generate an interrupt upon completion
131 uint32_t SOURCE_AD; //Source address
132 uint32_t DEST_AD; //Destination address
133 uint32_t TXFR_LEN; //transfer length.
134 uint32_t STRIDE; //2D Mode Stride. Only used if TI.TDMODE = 1
135 uint32_t NEXTCONBK; //Next control block. Must be 256-bit aligned (32 bytes; 8 words)
136 uint32_t _reserved[2];
137 };
138
139 //allocate a page & simultaneously determine its physical address.
140 //virtAddr and physAddr are essentially passed by-reference.
141 //this allows for:
142 //void *virt, *phys;
143 //makeVirtPhysPage(&virt, &phys)
144 //now, virt[N] exists for 0 <= N < PAGE_SIZE,
145 // and phys+N is the physical address for virt[N]
146 //based on http://www.raspians.com/turning-the-raspberry-pi-into-an-fm-transmitter/
147 void makeVirtPhysPage(void** virtAddr, void** physAddr) {
148 *virtAddr = valloc(PAGE_SIZE); //allocate one page of RAM
149
150 //force page into RAM and then lock it there:
151 ((int*)*virtAddr)[0] = 1;
152 mlock(*virtAddr, PAGE_SIZE);
153 memset(*virtAddr, 0, PAGE_SIZE); //zero-fill the page for convenience
154
155 //Magic to determine the physical address for this page:
156 uint64_t pageInfo;
157 int file = open("/proc/self/pagemap", O_RDONLY);
158 lseek(file, ((uint32_t)*virtAddr)/PAGE_SIZE*8, SEEK_SET);
159 read(file, &pageInfo, 8); close(file);
160
161 *physAddr = (void*)(uint32_t)((pageInfo & 0x7FFFFFFFFFFFFFull)*PAGE_SIZE); //bits 0-54 of a pagemap entry hold the PFN
162 printf("makeVirtPhysPage virtual to phys: %p -> %p\n", *virtAddr, *physAddr);
163 }
164
165 //call with virtual address to deallocate a page allocated with makeVirtPhysPage
166 void freeVirtPhysPage(void* virtAddr) {
167 munlock(virtAddr, PAGE_SIZE);
168 free(virtAddr);
169 }
170
171 //map a physical address into our virtual address space. memfd is the file descriptor for /dev/mem
172 volatile uint32_t* mapPeripheral(int memfd, int addr) {
173 ///dev/mem behaves as a file. We need to map that file into memory:
174 void *mapped = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, memfd, addr);
175 //now, *mapped = memory at physical address of addr.
176 if (mapped == MAP_FAILED) {
177 printf("failed to map memory (did you remember to run as root?)\n");
178 exit(1);
179 } else {
180 printf("mapped: %p\n", mapped);
181 }
182 return (volatile uint32_t*)mapped;
183 }
184
185 int main() {
186 //cat /sys/module/dma/parameters/dmachans gives a bitmask of DMA channels that are not used by GPU. Results: ch 1, 3, 6, 7 are reserved.
187 //dmesg | grep "DMA"; results: Ch 2 is used by SDHC host
188 //ch 0 is known to be used for graphics acceleration
189 //Thus, applications can use ch 4, 5, or the LITE channels @ 8 and beyond.
190 int dmaChNum = 5;
191 //First, open the linux device, /dev/mem
192 //dev/mem provides access to the physical memory of the entire processor+ram
193 //This is needed because Linux uses virtual memory, thus the process's memory at 0x00000000 will NOT have the same contents as the physical memory at 0x00000000
194 int memfd = open("/dev/mem", O_RDWR | O_SYNC);
195 if (memfd < 0) {
196 printf("Failed to open /dev/mem (did you remember to run as root?)\n");
197 exit(1);
198 }
199 //now map /dev/mem into memory, but only map specific peripheral sections:
200 volatile uint32_t *dmaBaseMem = mapPeripheral(memfd, DMA_BASE);
201
202 //configure DMA:
203 //allocate 1 page for the source and 1 page for the destination:
204 void *virtSrcPage, *physSrcPage;
205 makeVirtPhysPage(&virtSrcPage, &physSrcPage);
206 void *virtDestPage, *physDestPage;
207 makeVirtPhysPage(&virtDestPage, &physDestPage);
208
209 //write a few bytes to the source page:
210 char *srcArray = (char*)virtSrcPage;
211 srcArray[0] = 'h';
212 srcArray[1] = 'e';
213 srcArray[2] = 'l';
214 srcArray[3] = 'l';
215 srcArray[4] = 'o';
216 srcArray[5] = ' ';
217 srcArray[6] = 'w';
218 srcArray[7] = 'o';
219 srcArray[8] = 'r';
220 srcArray[9] = 'l';
221 srcArray[10] = 'd';
222 srcArray[11] = 0; //null terminator used for printf call.
223
224 //allocate 1 page for the control blocks
225 void *virtCbPage, *physCbPage;
226 makeVirtPhysPage(&virtCbPage, &physCbPage);
227
228 //dedicate the first 8 words of this page to holding the cb.
229 struct DmaControlBlock *cb1 = (struct DmaControlBlock*)virtCbPage;
230
231 //fill the control block:
232 cb1->TI = DMA_CB_TI_SRC_INC | DMA_CB_TI_DEST_INC; //after each byte copied, we want to increment the source and destination address of the copy, otherwise we'll be copying to the same address.
233 cb1->SOURCE_AD = (uint32_t)physSrcPage; //set source and destination DMA address
234 cb1->DEST_AD = (uint32_t)physDestPage;
235 cb1->TXFR_LEN = 12; //transfer 12 bytes
236 cb1->STRIDE = 0; //no 2D stride
237 cb1->NEXTCONBK = 0; //no next control block
238
239 printf("destination was initially: '%s'\n", (char*)virtDestPage);
240
241 //enable DMA channel (it's probably already enabled, but we want to be sure):
242 writeBitmasked(dmaBaseMem + (DMAENABLE - DMA_BASE)/4, 1 << dmaChNum, 1 << dmaChNum); //dmaBaseMem is a uint32_t ptr, so divide the byte offset by 4
243
244 //configure the DMA header to point to our control block:
245 volatile struct DmaChannelHeader *dmaHeader = (volatile struct DmaChannelHeader*)(dmaBaseMem + (DMACH(dmaChNum) - DMA_BASE)/4); //dmaBaseMem is a uint32_t ptr, so divide by 4 before adding byte offset
246 dmaHeader->CS = DMA_CS_RESET; //make sure to disable dma first.
247 sleep(1); //give time for the reset command to be handled.
248 dmaHeader->DEBUG = DMA_DEBUG_READ_ERROR | DMA_DEBUG_FIFO_ERROR | DMA_DEBUG_READ_LAST_NOT_SET_ERROR; // clear debug error flags
249 dmaHeader->CONBLK_AD = (uint32_t)physCbPage; //we have to point it to the PHYSICAL address of the control block (cb1)
250 dmaHeader->CS = DMA_CS_ACTIVE; //set active bit, but everything else is 0.
251
252 sleep(1); //give time for copy to happen
253
254 printf("destination reads: '%s'\n", (char*)virtDestPage);
255
256 //cleanup
257 freeVirtPhysPage(virtCbPage);
258 freeVirtPhysPage(virtDestPage);
259 freeVirtPhysPage(virtSrcPage);
260 return 0;
261 }
@@ -0,0 +1,829
1 /*
2 * https://github.com/Wallacoloo/Raspberry-Pi-DMA-Example : DMA Raspberry Pi Examples
3 * Author: Colin Wallace
4
5 This is free and unencumbered software released into the public domain.
6
7 Anyone is free to copy, modify, publish, use, compile, sell, or
8 distribute this software, either in source code form or as a compiled
9 binary, for any purpose, commercial or non-commercial, and by any
10 means.
11
12 In jurisdictions that recognize copyright laws, the author or authors
13 of this software dedicate any and all copyright interest in the
14 software to the public domain. We make this dedication for the benefit
15 of the public at large and to the detriment of our heirs and
16 successors. We intend this dedication to be an overt act of
17 relinquishment in perpetuity of all present and future rights to this
18 software under copyright law.
19
20 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
21 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
22 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
23 IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
24 OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
25 ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
26 OTHER DEALINGS IN THE SOFTWARE.
27
28 For more information, please refer to <http://unlicense.org/>
29 */
30 /*
31 * processor documentation is at: http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
32 * pg 38 for DMA
33 * pg 61 for DMA DREQ PERMAP
34 * pg 89 for gpio
35 * pg 119 for PCM
36 * pg 138 for PWM
37 * pg 172 for timer info
38 * Addendum is http://www.scribd.com/doc/127599939/BCM2835-Audio-clocks
39 *
40 * A few annotations for GPIO/DMA/PWM are available here: https://github.com/626Pilot/RaspberryPi-NeoPixel-WS2812/blob/master/ws2812-RPi.c
41 * https://github.com/metachris/raspberrypi-pwm/blob/master/rpio-pwm/rpio_pwm.c
42 * https://github.com/richardghirst/PiBits/blob/master/ServoBlaster/user/servod.c
43 *
44 * Cache info can be found here: http://www.freelists.org/post/raspi-internals/caches,18
45 * 0x00000000 - L1 & L2 cache
46 * 0x40000000 - L2 cache coherent (ie L1 writes are propagated to L2?)
47 * 0x80000000 - L2 cache only
48 * 0xc0000000 - direct uncached
49 *
50 * Useful DMA timings, etc: http://www.raspberrypi.org/forums/viewtopic.php?f=37&t=7696&start=50
51 *
52 * The general idea is to have a buffer of N blocks, where each block is the same size as the gpio registers,
53 * and have the DMA module continually copying the data in this buffer into those registers.
54 * In this way, we can have (say) 32 blocks, and then be able to buffer the next 32 IO frames.
55 *
56 * How is DMA transfer rate controlled?
57 * We can use the DREQ (data request) feature.
58 * PWM supports a configurable data consumption clock (defaults to 100MHz)
59 * PWM (and SPI, PCM) can fire a DREQ signal any time its fifo falls below a certain point.
60 * But we are never filling the FIFO, so DREQ would be permanently high.
61 * Could feed PWM with dummy data, and use 2 DMA channels (one to PWM, one to GPIO, both gated), but the write-time to GPIOs may vary from the PWM, so gating may be improper
62 * Or we can use the WAITS portion of the CB header. This allows up to 31 cycle delay -> ~25MHz?
63 * Will have to manually determine timing characteristics though.
64 * Or use 2 dma channels:
65 * Have one sending the data into PWM, which is DREQ limited
66 * Have another copying from PWM Fifo to GPIOs at a non-limited rate. This is peripheral -> peripheral, so I think it will have its own data bus.
67 * Unfortunately, the destination can only be one word. Luckily, we have 2 PWM channels - one for setting & one for clearing GPIOs. All gpios that are broken out into the header are in the first register (verified)
68 * Sadly, it appears that the PWM FIFO cannot be read from. One can read the current PWM output, but only if the FIFO is disabled, in which case the DREQ is too.
69 *
70 **Or use 1 dma channel, but additionally write to a dreq-able peripheral (PWM):
71 * By using control-blocks, one can copy a word to the GPIOs, then have the next CB copy a word to the PWM fifo, and repeat
72 * By having BOTH control-blocks be dreq-limited by the PWM's dreq, they can BOTH be rate-limited.
73 * PWM clock works as so: 500MHz / clock_div = PWM_BITRATE (note: bitrate!)
74 * PWM_BITRATE / PWM_RNG1 = #of FIFO writes/sec
75 * Max PWM_BITRATE = 25MHz
76 * Also, dest_addr = 0x7e20b000 // the testbus interface, which is a dummy peripheral that goes nowhere (http://www.raspberrypi.org/forums/viewtopic.php?f=37&t=7696&start=25 )
77 *
78 * DMA Control Block layout:
79 * repeat #srcBlock times:
80 * 1.copy srcBlock to gpios
81 * 2.zero srcBlock
82 * 3.move byte to PWM (paced via DREQ)
83 * These are largely redundant; it may be possible to use less memory (each cb uses 32 bytes of memory)
84 *
85 * Problem: each "frame" is currently 6 words (but the last word is padding), and 1 PAGE_SIZE is not an integer multiple of 6*4
86 * Thus, the very last frame on each page cannot be used with DMA. Because of this, too, the virtual addressing of each frame is messed up - we must skip one frame per page.
87 * One solution is to append 2 more pad words to each frame (so that it is 8 words in length). This fixes all issues, but increases ram usage and potentially cache problems (L2 is 128KB). However, since data reads are sequential, even if all data doesn't fit in cache, it will be prefetched.
88 * Another solution is to decrease frame size to 4 words, and use 2 control blocks for each frame (thus eliminating the 1-byte padding in the center). This will have an even LARGER impact on ram usage - effectively using 20 words/frame vs current 14 words/frame & alternative 16words/frame
89 * Another solution is to directly mix src data with CB data. Each CB has 2 words of padding, and a data frame is 5 words, and each CB must be aligned to 8 words. Therefore, the following is possible, assuming each frame requires 3 CBs:
90 * CB1.1(padded) | CB1.2(padded) | CB1.3(padded) | CB2.1(padded) | CB2.2(padded) | CB2.3(unpadded) | SRC(5) | SRC(5) <- uses 56 words per 2 frames
91 * HOWEVER, PAGE_SIZE is not an integral multiple of 56 words
92 * Although, 1 of those CBs (the one which zeros the previous source) could be shared amongst multiple frames - that is, only zero every, say, 4 frames. The effect is:
93 * *32 words for 1 frame grouped (5 src words - 2 means pad to 8 words for src)
94 * 48 words for 2 frames grouped (10 src words - 2 means pad to 8 words for src)
95 * 72 words for 3 frames grouped (15 src words - 2 means pad to 16 words for src)
96 * 96 words for 4 frames grouped (20 src words - 2 means pad to 24 words for src)
97 * 112 words for 5 frames grouped(25 src words - 2 means pad to 24 words for src)
98 * 136 words for 6 frames grouped(30 src words - 2 means pad to 32 words for src)
99 * 160 words for 7 frames grouped(35 src words - 2 means pad to 40 words for src)
100 * 176 words for 8 frames grouped(40 src words - 2 means pad to 40 words for src)
101 * 200 words for 9 frames grouped(45 src words - 2 means pad to 48 words for src)
102 * 216 words for 10frames grouped(50 src words - 2 means pad to 48 words for src)
103 * 240 words for 11frames grouped(55 src words - 2 means pad to 56 words for src)
104 * 264 words for 12frames grouped(60 src words - 2 means pad to 64 words for src)
105 * ...432 words for 20frames grouped(100src words - 2 means pad to 104 words for src)
106 * ...*512 words for 24frames grouped(120src words - 2 means pad to 120 words for src)
107 * As can be seen, this still requires extra padding. Could do 128 words for 5 frames, or 256 words for 11 frames (23.3 words/frame), and that requires funky math.
108 * The 24 frame option would work OK. 24 is a relatively easy number to work with, and 21.3 words/frame (limit is 21 words/frame)
109 * Another solution is to use the 2D stride functionality. The source frame is really 4 words and the destination is really 2 words, a 1 word gap, and then the other 2 words. Thus 2d stride can be used to skip over that one word gap.
110 *
111 * How to determine the current source word being processed?
112 * dma header points to the physical CONBLOCK_AD. This can be linked to the virtual source address via a map.
113 * OR: STRIDE register is unused in 1D mode. Could write the src index that this block is linked to in that register. But then we can't use stride feature.
114 * Rather, we can't use the stride feature on ONE cb per frame. So, use stride on the buffer->GPIO cb, and use the stride register to indicate index on the zeros-copy and the PWM cb. Can tell which CB we're looking at based on the 2DEN flag. If we're looking at the buffer->GPIO cb, then instead look at NEXTCON_BK
115 * NOTE: if 2d stride is disabled, it appears that the DMA engine doesn't even load the STRIDE register (it's read as garbage). It may PERHAPS display the last loaded word.
116 * Note: unused fields are read as "Don't care", meaning we can't use them to store user-data.
117 *
118 * http://www.raspberrypi.org/forums/viewtopic.php?f=44&t=26907
119 * Says gpu halts all DMA for 16us every 500ms. To bypass, add 'disable_pvt=1' to /boot/cmdline.txt
120 * http://www.raspberrypi.org/forums/viewtopic.php?f=37&t=7696&start=25
121 * Says it's possible to get access to a 250MHz clock.
122 * How to make DMA more consistent (ie reduce bus contention?):
123 * disable interrupts 1 uS before any 'real' transaction, enable them afterwards
124 * Make sure dummy writes DON'T READ FROM RAM (ie, use src_ignore = 1)
125 * boot with disable_pvt=1 (prevents gpu from halting everything to adjust ram refresh rate twice per second) in /boot/cmdline.txt. Does this affect system stability?
126 */
127
128 #include <sys/mman.h> //for mmap
129 #include <sys/time.h> //for timespec
130 #include <time.h> //for timespec / nanosleep (need -std=gnu99)
131 #include <signal.h> //for sigaction
132 #include <unistd.h> //for NULL
133 #include <stdio.h> //for printf
134 #include <stdlib.h> //for exit, valloc
135 //#include <malloc.h> //some implementations declare valloc inside malloc.h
136 #include <fcntl.h> //for file opening
137 #include <stdint.h> //for uint32_t
138 #include <string.h> //for memset
139 #include <errno.h> //for errno
140 #include <pthread.h> //for pthread_setschedparam
141
142 //config settings:
143 #define PWM_FIFO_SIZE 1 //The DMA transaction is paced through the PWM FIFO. The PWM FIFO consumes 1 word every N uS (set in clock settings). Once the fifo has fewer than PWM_FIFO_SIZE words available, it will request more data from DMA. Thus, a high buffer length will be more resistant to clock drift, but may occasionally request multiple frames in a short succession (faster than FRAME_PER_SEC) in the presence of bus contention, whereas a low buffer length will always space frames AT LEAST 1/FRAMES_PER_SEC seconds apart, but may experience clock drift.
144 #define SOURCE_BUFFER_FRAMES 8192 //number of gpio timeslices to buffer. These are processed at ~1 million/sec. So 1000 frames is 1 ms. Using a power-of-two is a good idea as it simplifies some of the arithmetic (modulus operations)
145 #define SCHED_PRIORITY 30 //Linux scheduler priority. Higher = more realtime
146
147 #define NOMINAL_CLOCK_FREQ 500000000 //PWM Clock runs at 500 MHz, unless overclocking
148 #define BITS_PER_CLOCK 10 //# of bits to be used in each PWM cycle. Effectively acts as a clock divisor for us, since the PWM clock is in bits/second
149 #define CLOCK_DIV 200 //# to divide the NOMINAL_CLOCK_FREQ by before passing it to the PWM peripheral.
150 //gpio frames per second is a product of the nominal clock frequency divided by BITS_PER_CLOCK and divided again by CLOCK_DIV
151 //At 500,000 frames/sec, memory bandwidth does not appear to be an issue (jitter of -1 to +2 uS)
152 //attempting 1,000,000 frames/sec results in an actual 800,000 frames/sec, though with a lot of jitter.
153 //Note that these numbers might vary with heavy network or usb usage.
154 // eg at 500,000 fps, with 1MB/sec network download, jitter is -1 to +30 uS
155 // at 250,000 fps, with 1MB/sec network download, jitter is only -3 to +3 uS
156 #define FRAMES_PER_SEC (NOMINAL_CLOCK_FREQ/BITS_PER_CLOCK/CLOCK_DIV)
157 #define SEC_TO_FRAME(s) ((int64_t)(s)*FRAMES_PER_SEC)
158 #define USEC_TO_FRAME(u) (SEC_TO_FRAME(u)/1000000)
159 #define FRAME_TO_SEC(f) ((int64_t)(f)*BITS_PER_CLOCK*CLOCK_DIV/NOMINAL_CLOCK_FREQ)
160 #define FRAME_TO_USEC(f) FRAME_TO_SEC((int64_t)(f)*1000000)
161
162 #define TIMER_BASE 0x20003000
163 #define TIMER_CLO 0x00000004 //lower 32-bits of 1 MHz timer
164 #define TIMER_CHI 0x00000008 //upper 32-bits
165
166
167 #define GPIO_BASE 0x20200000 //base address of the GPIO control registers.
168 #define GPIO_BASE_BUS 0x7E200000 //this is the physical bus address of the GPIO module. This is only used when other peripherals directly connected to the bus (like DMA) need to read/write the GPIOs
169 #define PAGE_SIZE 4096 //mmap maps pages of memory, so we must give it multiples of this size
170 #define GPFSEL0 0x00000000 //gpio function select. There are 6 of these (32 bit registers)
171 #define GPFSEL1 0x00000004
172 #define GPFSEL2 0x00000008
173 #define GPFSEL3 0x0000000c
174 #define GPFSEL4 0x00000010
175 #define GPFSEL5 0x00000014
176 //bits 2-0 of GPFSEL0: set to 000 to make Pin 0 an output. 001 is an input. Other combinations represent alternate functions
177 //bits 3-5 are for pin 1.
178 //...
179 //bits 27-29 are for pin 9.
180 //GPFSEL1 repeats, but bits 2-0 are Pin 10, 27-29 are pin 19.
181 //...
182 #define GPSET0 0x0000001C //GPIO Pin Output Set. There are 2 of these (32 bit registers)
183 #define GPSET1 0x00000020
184 //writing a '1' to bit N of GPSET0 makes that pin HIGH.
185 //writing a '0' has no effect.
186 //GPSET0[0-31] maps to pins 0-31
187 //GPSET1[0-21] maps to pins 32-53
188 #define GPCLR0 0x00000028 //GPIO Pin Output Clear. There are 2 of these (32 bits each)
189 #define GPCLR1 0x0000002C
190 //GPCLR acts the same way as GPSET, but clears the pin instead.
191 #define GPLEV0 0x00000034 //GPIO Pin Level. There are 2 of these (32 bits each)
192
193 //physical addresses for the DMA peripherals, as found in the processor documentation:
194 #define DMA_BASE 0x20007000
195 #define DMACH(n) (0x100*(n))
196 //DMA Channel register sets (format of these registers is found in DmaChannelHeader struct):
197 //#define DMACH0 0x00000000
198 //#define DMACH1 0x00000100
199 //#define DMACH2 0x00000200
200 //#define DMACH3 0x00000300
201 //...
202 //Each DMA channel has some associated registers, but only CS (control and status), CONBLK_AD (control block address), and DEBUG are writeable
203 //DMA is started by writing address of the first Control Block to the DMA channel's CONBLK_AD register and then setting the ACTIVE bit inside the CS register (bit 0)
204 //Note: DMA channels are connected directly to peripherals, so physical addresses should be used (affects control block's SOURCE, DEST and NEXTCONBK addresses).
205 #define DMAENABLE 0x00000ff0 //bit 0 should be set to 1 to enable channel 0. bit 1 enables channel 1, etc.
206
207 //flags used in the DmaChannelHeader struct:
208 #define DMA_CS_RESET (1<<31)
209 #define DMA_CS_ABORT (1<<30)
210 #define DMA_CS_DISDEBUG (1<<28) //DMA will not stop when debug signal is asserted
211 #define DMA_CS_PRIORITY(x) (((x)&0xf) << 16) //higher priority DMA transfers are serviced first, it would appear
212 #define DMA_CS_PRIORITY_MAX DMA_CS_PRIORITY(7)
213 #define DMA_CS_PANIC_PRIORITY(x) (((x)&0xf) << 20)
214 #define DMA_CS_PANIC_PRIORITY_MAX DMA_CS_PANIC_PRIORITY(7)
215 #define DMA_CS_END (1<<1)
216 #define DMA_CS_ACTIVE (1<<0)
217
218 #define DMA_DEBUG_READ_ERROR (1<<2)
219 #define DMA_DEBUG_FIFO_ERROR (1<<1)
220 #define DMA_DEBUG_READ_LAST_NOT_SET_ERROR (1<<0)
221
222 //flags used in the DmaControlBlock struct:
223 #define DMA_CB_TI_NO_WIDE_BURSTS (1<<26)
224 #define DMA_CB_TI_PERMAP_NONE (0<<16)
225 #define DMA_CB_TI_PERMAP_DSI (1<<16)
226 //... (more found on page 61 of BCM2835 pdf)
227 #define DMA_CB_TI_PERMAP_PWM (5<<16)
228 //...
229 #define DMA_CB_TI_SRC_DREQ (1<<10)
230 #define DMA_CB_TI_SRC_INC (1<<8)
231 #define DMA_CB_TI_DEST_DREQ (1<<6)
232 #define DMA_CB_TI_DEST_INC (1<<4)
233 #define DMA_CB_TI_TDMODE (1<<1)
234
235
236 //https://dev.openwrt.org/browser/trunk/target/linux/brcm2708/patches-3.10/0070-bcm2708_fb-DMA-acceleration-for-fb_copyarea.patch?rev=39770 says that YLENGTH should actually be written as # of copies *MINUS ONE*
237 #define DMA_CB_TXFR_LEN_YLENGTH(y) ((((y)-1)&0x3fff) << 16)
238 #define DMA_CB_TXFR_LEN_XLENGTH(x) ((x)&0xffff)
239 #define DMA_CB_TXFR_YLENGTH_MASK (0x3fff << 16)
240 #define DMA_CB_STRIDE_D_STRIDE(x) (((x)&0xffff) << 16)
241 #define DMA_CB_STRIDE_S_STRIDE(x) ((x)&0xffff)
242
243
244 //Dma Control Blocks must be located at addresses that are multiples of 32 bytes
245 #define DMA_CONTROL_BLOCK_ALIGNMENT 32
246
247 #define PWM_BASE 0x2020C000
248 #define PWM_BASE_BUS 0x7E20C000
249 #define PWM_CTL 0x00000000 //control register
250 #define PWM_STA 0x00000004 //status register
251 #define PWM_DMAC 0x00000008 //DMA control register
252 #define PWM_RNG1 0x00000010 //channel 1 range register (# output bits to use per sample)
253 #define PWM_DAT1 0x00000014 //channel 1 data
254 #define PWM_FIF1 0x00000018 //channel 1 fifo (write to this register to queue an output)
255 #define PWM_RNG2 0x00000020 //channel 2 range register
256 #define PWM_DAT2 0x00000024 //channel 2 data
257
258 #define PWM_CTL_USEFIFO2 (1<<13)
259 #define PWM_CTL_REPEATEMPTY2 (1<<10)
260 #define PWM_CTL_ENABLE2 (1<<8)
261 #define PWM_CTL_CLRFIFO (1<<6)
262 #define PWM_CTL_USEFIFO1 (1<<5)
263 #define PWM_CTL_REPEATEMPTY1 (1<<2)
264 #define PWM_CTL_ENABLE1 (1<<0)
265
266 #define PWM_STA_BUSERR (1<<8)
267 #define PWM_STA_GAPERRS (0xf << 4)
268 #define PWM_STA_FIFOREADERR (1<<3)
269 #define PWM_STA_FIFOWRITEERR (1<<2)
270 #define PWM_STA_ERRS (PWM_STA_BUSERR | PWM_STA_GAPERRS | PWM_STA_FIFOREADERR | PWM_STA_FIFOWRITEERR)
271
272 #define PWM_DMAC_EN (1<<31)
273 #define PWM_DMAC_PANIC(P) (((P)&0xff)<<8)
274 #define PWM_DMAC_DREQ(D) (((D)&0xff)<<0)
275
276 //The following is undocumented :( Taken from http://www.scribd.com/doc/127599939/BCM2835-Audio-clocks
277 #define CLOCK_BASE 0x20101000
278 #define CM_PWMCTL 0xa0
279 #define CM_PWMDIV 0xa4
280 //each write to CM_PWMCTL and CM_PWMDIV requires the password to be written:
281 #define CM_PWMCTL_PASSWD 0x5a000000
282 #define CM_PWMDIV_PASSWD 0x5a000000
283 //MASH is used to achieve fractional clock dividers by introducing artificial jitter.
284 //if you want constant frequency (even if it may not be at 100% CORRECT frequency), use MASH0
285 //if clock divisor is integral, then there's no need to use MASH, and anything above MASH1 can introduce jitter.
286 #define CM_PWMCTL_MASH(x) (((x)&0x3) << 9)
287 #define CM_PWMCTL_MASH0 CM_PWMCTL_MASH(0)
288 #define CM_PWMCTL_MASH1 CM_PWMCTL_MASH(1)
289 #define CM_PWMCTL_MASH2 CM_PWMCTL_MASH(2)
290 #define CM_PWMCTL_MASH3 CM_PWMCTL_MASH(3)
291 #define CM_PWMCTL_FLIP (1<<8) //use to inverse clock polarity
292 #define CM_PWMCTL_BUSY (1<<7) //read-only flag that indicates clock generator is running.
293 #define CM_PWMCTL_KILL (1<<5) //write a 1 to stop & reset clock generator. USED FOR DEBUG ONLY
294 #define CM_PWMCTL_ENAB (1<<4) //gracefully stop/start clock generator. BUSY flag will go low once clock is off.
295 #define CM_PWMCTL_SRC(x) ((x)&0xf) //clock source. 0=gnd. 1=oscillator. 2-3=debug. 4=PLLA per. 5=PLLC per. 6=PLLD per. 7=HDMI aux. 8-15=GND
296 #define CM_PWMCTL_SRC_OSC CM_PWMCTL_SRC(1)
297 #define CM_PWMCTL_SRC_PLLA CM_PWMCTL_SRC(4)
298 #define CM_PWMCTL_SRC_PLLC CM_PWMCTL_SRC(5)
299 #define CM_PWMCTL_SRC_PLLD CM_PWMCTL_SRC(6)
300
301 //max clock divisor is 4095
302 #define CM_PWMDIV_DIVI(x) (((x)&0xfff) << 12)
303 #define CM_PWMDIV_DIVF(x) ((x)&0xfff)
304
305 struct DmaChannelHeader {
306 //Note: dma channels 7-15 are 'LITE' dma engines (or is it 8-15?), with reduced performance & functionality.
307 //Note: only CS, CONBLK_AD and DEBUG are directly writeable
308 volatile uint32_t CS; //Control and Status
309 //31 RESET; set to 1 to reset DMA
310 //30 ABORT; set to 1 to abort current DMA control block (next one will be loaded & continue)
311 //29 DISDEBUG; set to 1 and DMA won't be paused when debug signal is sent
312 //28 WAIT_FOR_OUTSTANDING_WRITES(0x10000000); set to 1 and DMA will wait until peripheral says all writes have gone through before loading next CB
313 //24-27 reserved
314 //20-23 PANIC_PRIORITY; 0 is lowest priority
315 //16-19 PRIORITY; bus scheduling priority. 0 is lowest
316 //9-15 reserved
317 //8 ERROR; read as 1 when error is encountered. error can be found in DEBUG register.
318 //7 reserved
319 //6 WAITING_FOR_OUTSTANDING_WRITES; read as 1 when waiting for outstanding writes
320 //5 DREQ_STOPS_DMA(0x20); read as 1 if DREQ is currently preventing DMA
321 //4 PAUSED(0x10); read as 1 if DMA is paused
322 //3 DREQ; copy of the data request signal from the peripheral, if DREQ is enabled. reads as 1 if data is being requested (or PERMAP=0), else 0
323 //2 INT; set when current CB ends and its INTEN=1. Write a 1 to this register to clear it
324 //1 END; set when the transfer defined by current CB is complete. Write 1 to clear.
325 //0 ACTIVE(0x01); write 1 to activate DMA (load the CB before hand)
326 volatile uint32_t CONBLK_AD; //Control Block Address
327 volatile uint32_t TI; //transfer information; see DmaControlBlock.TI for description
328 volatile uint32_t SOURCE_AD; //Source address
329 volatile uint32_t DEST_AD; //Destination address
330 volatile uint32_t TXFR_LEN; //transfer length. ONLY THE LOWER 16 BITS ARE USED IN LITE DMA ENGINES
331 volatile uint32_t STRIDE; //2D Mode Stride. Only used if TI.TDMODE = 1. NOT AVAILABLE IN LITE DMA ENGINES
332 volatile uint32_t NEXTCONBK; //Next control block. Must be 256-bit aligned (32 bytes; 8 words)
333 volatile uint32_t DEBUG; //controls debug settings
334 //29-31 unused
335 //28 LITE (0x10000000)
336 //25-27 VERSION
337 //16-24 DMA_STATE (dma engine state machine)
338 //8-15 DMA_ID (AXI bus id)
339 //4-7 OUTSTANDING_WRITES
340 //3 unused
341 //2 READ_ERROR
342 //1 WRITE_ERROR
343 //0 READ_LAST_NOT_SET_ERROR
344 };
345 void logDmaChannelHeader(struct DmaChannelHeader *h) {
346 printf("Dma Ch Header:\n CS: 0x%08x\n CONBLK_AD: 0x%08x\n TI: 0x%08x\n SOURCE_AD: 0x%08x\n DEST_AD: 0x%08x\n TXFR_LEN: %u\n STRIDE: 0x%08x\n NEXTCONBK: 0x%08x\n DEBUG: 0x%08x\n", h->CS, h->CONBLK_AD, h->TI, h->SOURCE_AD, h->DEST_AD, h->TXFR_LEN, h->STRIDE, h->NEXTCONBK, h->DEBUG);
347 }
348
349 struct DmaControlBlock {
350 volatile uint32_t TI; //transfer information
351 //31:27 unused
352 //26 NO_WIDE_BURSTS
353 //21:25 WAITS; number of cycles to wait between each DMA read/write operation
354 //16:20 PERMAP(0x000Y0000); peripheral number to be used for DREQ signal (pacing). set to 0 for unpaced DMA.
355 //12:15 BURST_LENGTH
356 //11 SRC_IGNORE; set to 1 to not perform reads. Used to manually fill caches
357 //10 SRC_DREQ; set to 1 to have the DREQ from PERMAP gate requests.
358 //9 SRC_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves
359 //8 SRC_INC; set to 1 to automatically increment the source address after each read (you'll want this if you're copying a range of memory)
360 //7 DEST_IGNORE; set to 1 to not perform writes.
361 //6 DEST_DREQ; set to 1 to have the DREQ from PERMAP gate *writes*
362 //5 DEST_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves
363 //4 DEST_INC; set to 1 to automatically increment the destination address after each write (you'll want this if you're copying a range of memory)
364 //3 WAIT_RESP; make DMA wait for a response from the peripheral during each write. Ensures multiple writes don't get stacked in the pipeline
365 //2 unused (0)
366 //1 TDMODE; set to 1 to enable 2D mode
367 //0 INTEN; set to 1 to generate an interrupt upon completion
368 volatile uint32_t SOURCE_AD; //Source address
369 volatile uint32_t DEST_AD; //Destination address
370 volatile uint32_t TXFR_LEN; //transfer length.
371 //in 2D mode, TXFR_LEN is separated into two half-words to indicate Y transfers of length X, and STRIDE is added to the src/dest address after each transfer of length X.
372 //30:31 unused
373 //16-29 YLENGTH
374 //0-15 XLENGTH
375 volatile uint32_t STRIDE; //2D Mode Stride (amount to increment/decrement src/dest after each 1d copy when in 2d mode). Only used if TI.TDMODE = 1
376 //16-31 D_STRIDE; signed (2's complement) byte increment/decrement to apply to destination addr after each XLENGTH transfer
377 //0-15 S_STRIDE; signed (2's complement) byte increment/decrement to apply to source addr after each XLENGTH transfer
378 volatile uint32_t NEXTCONBK; //Next control block. Must be 256-bit aligned (32 bytes; 8 words)
379 uint32_t _reserved[2];
380 };
381
382 void logDmaControlBlock(struct DmaControlBlock *b) {
383 printf("Dma Control Block:\n TI: 0x%08x\n SOURCE_AD: 0x%08x\n DEST_AD: 0x%08x\n TXFR_LEN: 0x%08x\n STRIDE: 0x%08x\n NEXTCONBK: 0x%08x\n unused: 0x%08x %08x\n", b->TI, b->SOURCE_AD, b->DEST_AD, b->TXFR_LEN, b->STRIDE, b->NEXTCONBK, b->_reserved[0], b->_reserved[1]);
384 }
385
386 struct PwmHeader {
387 volatile uint32_t CTL; // 0x00000000 //control register
388 //16-31 reserved
389 //15 MSEN2 (0: PWM algorithm, 1:M/S transmission used)
390 //14 reserved
391 //13 USEF2 (0: data register is used for transmission, 1: FIFO is used for transmission)
392 //12 POLA2 (0: 0=low, 1=high. 1: 0=high, 1=low (inversion))
393 //11 SBIT2; defines the state of the output when no transmission is in place
394 //10 RPTL2; 0: transmission interrupts when FIFO is empty. 1: last data in FIFO is retransmitted when FIFO is empty
395 //9 MODE2; 0: PWM mode. 1: serializer mode
396 //8 PWMEN2; 0: channel is disabled. 1: channel is enabled
397 //7 MSEN1;
398 //6 CLRF1; writing a 1 to this bit clears the channel 1 (and channel 2?) fifo
399 //5 USEF1;
400 //4 POLA1;
401 //3 SBIT1;
402 //2 RPTL1;
403 //1 MODE1;
404 //0 PWMEN1;
405 volatile uint32_t STA; // 0x00000004 //status register
406 //13-31 reserved
407 //9-12 STA1-4; indicates whether each channel is transmitting
408 //8 BERR; Bus Error Flag. Write 1 to clear
409 //4-7 GAPO1-4; Gap Occured Flag. Write 1 to clear
410 //3 RERR1; Fifo Read Error Flag (attempt to read empty fifo). Write 1 to clear
411 //2 WERR1; Fifo Write Error Flag (attempt to write to full fifo). Write 1 to clear
412 //1 EMPT1; Reads as 1 if fifo is empty
413 //0 FULL1; Reads as 1 if fifo is full
414 volatile uint32_t DMAC; // 0x00000008 //DMA control register
415 //31 ENAB; set to 1 to enable DMA
416 //16-30 reserved
417 //8-15 PANIC; DMA threshold for panic signal
418 //0-7 DREQ; DMA threshold for DREQ signal
419 uint32_t _padding1;
420 volatile uint32_t RNG1; // 0x00000010 //channel 1 range register (# output bits to use per sample)
421 //0-31 PWM_RNGi; #of bits to modulate PWM. (eg if PWM_RNGi=1024, then each 32-bit sample sent through the FIFO will be modulated into 1024 bits.)
422 volatile uint32_t DAT1; // 0x00000014 //channel 1 data
423 //0-31 PWM_DATi; Stores the 32-bit data to be sent to the PWM controller ONLY WHEN USEFi=0 (FIFO is disabled)
424 volatile uint32_t FIF1; // 0x00000018 //channel 1 fifo (write to this register to queue an output)
425 //writing to this register will queue a sample into the fifo. If 2 channels are enabled, then each even sample (0-indexed) is sent to channel 1, and odd samples are sent to channel 2. WRITE-ONLY
426 uint32_t _padding2;
427 volatile uint32_t RNG2; // 0x00000020 //channel 2 range register
428 volatile uint32_t DAT2; // 0x00000024 //channel 2 data
429 //0-31 PWM_DATi; Stores the 32-bit data to be sent to the PWM controller ONLY WHEN USEFi=1 (FIFO is enabled). TODO: Typo???
430 };
431
432 struct GpioBufferFrame {
433 //custom structure used for storing the GPIO buffer.
434 //These BufferFrame's are DMA'd into the GPIO memory, potentially using the DmaEngine's Stride facility
435 uint32_t gpset[2];
436 uint32_t gpclr[2];
437 };
438
439 struct DmaChannelHeader *dmaHeader; //must be global for cleanup()
440
441 void setSchedPriority(int priority) {
442 //In order to get the best timing at a decent queue size, we want the kernel to avoid interrupting us for long durations.
443 //This is done by giving our process a high priority. Note, must run as super-user for this to work.
444 struct sched_param sp;
445 sp.sched_priority=priority;
446 int ret;
447 if ((ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp))) {
448 printf("Warning: pthread_setschedparam (increase thread priority) returned non-zero: %i\n", ret);
449 }
450 }
451
452 void writeBitmasked(volatile uint32_t *dest, uint32_t mask, uint32_t value) {
453 //set bits designated by (mask) at the address (dest) to (value), without affecting the other bits
454 //eg if x = 0b11001100
455 // writeBitmasked(&x, 0b00000110, 0b11110011),
456 // then x now = 0b11001010
457 uint32_t cur = *dest;
458 uint32_t new = (cur & (~mask)) | (value & mask);
459 *dest = new;
460 *dest = new; //best to be safe when crossing memory boundaries
461 }
462
463
464 uint64_t readSysTime(volatile uint32_t *timerBaseMem) {
465 return ((uint64_t)*(timerBaseMem + TIMER_CHI/4) << 32) + (uint64_t)(*(timerBaseMem + TIMER_CLO/4));
466 }
467
468 size_t ceilToPage(size_t size) {
469 //round up to nearest page-size multiple
470 if (size & (PAGE_SIZE-1)) {
471 size += PAGE_SIZE - (size & (PAGE_SIZE-1));
472 }
473 return size;
474 }
475
476 uintptr_t virtToPhys(void* virt, int pagemapfd) {
477 uintptr_t pgNum = (uintptr_t)(virt)/PAGE_SIZE;
478 int byteOffsetFromPage = (uintptr_t)(virt)%PAGE_SIZE;
479 uint64_t physPage;
480 ///proc/self/pagemap is a uint64_t array where the index represents the virtual page number and the value at that index represents the physical page number.
481 //So if virtual address is 0x1000000, read the value at *array* index 0x1000000/PAGE_SIZE and multiply that by PAGE_SIZE to get the physical address.
482 //because files are bytestreams, one must explicitly multiply each byte index by 8 to treat it as a uint64_t array.
483 off_t err = lseek(pagemapfd, pgNum*8, SEEK_SET);
484 if (err != (off_t)(pgNum*8)) {
485 printf("WARNING: virtToPhys %p failed to seek (expected %llu got %lld. errno: %i)\n", virt, (unsigned long long)(pgNum*8), (long long)err, errno);
486 }
487 read(pagemapfd, &physPage, 8);
488 if (!(physPage & (1ull<<63))) { //bit 63 is set to 1 if the page is present in ram
489 printf("WARNING: virtToPhys %p has no physical address\n", virt);
490 }
491 physPage = physPage & ~(0x1ffull << 55); //bits 55-63 are flags.
492 uintptr_t mapped = (uintptr_t)(physPage*PAGE_SIZE + byteOffsetFromPage);
493 return mapped;
494 }
495
496 uintptr_t virtToUncachedPhys(void *virt, int pagemapfd) {
497 return virtToPhys(virt, pagemapfd) | 0x40000000; //bus address of the ram is 0x40000000. With this binary-or, writes to the returned address will bypass the CPU (L1) cache, but not the L2 cache. 0xc0000000 should be the base address if L2 must also be bypassed. However, the DMA engine is aware of L2 cache - just not the L1 cache (source: http://en.wikibooks.org/wiki/Aros/Platforms/Arm_Raspberry_Pi_support#Framebuffer )
498 }
499
500
501 //allocate some memory and lock it so that its physical address will never change
502 void* makeLockedMem(size_t size) {
503 //void* mem = valloc(size); //memory returned by valloc is not zero'd
504 size = ceilToPage(size);
505 void *mem = mmap(
506 NULL, //let kernel place memory where it wants
507 size, //length
508 PROT_WRITE | PROT_READ, //ask for read and write permissions to memory
509 MAP_SHARED |
510 MAP_ANONYMOUS | //no underlying file; initialize to 0
511 MAP_NORESERVE | //don't reserve swap space
512 MAP_LOCKED, //lock into *virtual* ram. Physical ram may still change!
513 -1, // File descriptor
514 0); //no offset into file (file doesn't exist).
515 if (mem == MAP_FAILED) {
516 printf("makeLockedMem failed\n");
517 exit(1);
518 }
519 memset(mem, 0, size); //simultaneously zero the pages and force them into memory
520 mlock(mem, size);
521 return mem;
522 }
523
524 //free memory allocated with makeLockedMem
525 void freeLockedMem(void* mem, size_t size) {
526 size = ceilToPage(size);
527 munlock(mem, size);
528 munmap(mem, size);
529 }
530
531 void* makeUncachedMemView(void* virtaddr, size_t bytes, int memfd, int pagemapfd) {
532 //by default, writing to any virtual address will go through the CPU cache.
533 //this function will return a pointer that behaves the same as virtaddr, but bypasses the CPU L1 cache (note that because of this, the returned pointer and original pointer should not be used in conjunction, else cache-related inconsistencies will arise)
534 //Note: The original memory should not be unmapped during the lifetime of the uncached version, as then the OS won't know that our process still owns the physical memory.
535 bytes = ceilToPage(bytes);
536 //first, just allocate enough *virtual* memory for the operation. This is done so that we can do the later mapping to a contiguous range of virtual memory:
537 void *mem = mmap(
538 NULL, //let kernel place memory where it wants
539 bytes, //length
540 PROT_WRITE | PROT_READ, //ask for read and write permissions to memory
541 MAP_SHARED |
542 MAP_ANONYMOUS | //no underlying file; initialize to 0
543 MAP_NORESERVE | //don't reserve swap space
544 MAP_LOCKED, //lock into *virtual* ram. Physical ram may still change!
545 -1, // File descriptor
546 0); //no offset into file (file doesn't exist).
547 //now, free the virtual memory and immediately remap it to the physical addresses used in virtaddr
548 munmap(mem, bytes); //Might not be necessary; MAP_FIXED indicates it can map an already-used page
549 for (int offset=0; offset<bytes; offset += PAGE_SIZE) {
550 void *mappedPage = mmap(mem+offset, PAGE_SIZE, PROT_WRITE|PROT_READ, MAP_SHARED|MAP_FIXED|MAP_NORESERVE|MAP_LOCKED, memfd, virtToUncachedPhys(virtaddr+offset, pagemapfd));
551 if (mappedPage != mem+offset) { //We need these mappings to be contiguous over virtual memory (in order to replicate the virtaddr array), so we must ensure that the address we requested from mmap was actually used.
552 printf("Failed to create an uncached view of memory at addr %p+0x%08x\n", virtaddr, offset);
553 exit(1);
554 }
555 }
556 memset(mem, 0, bytes); //Although the cached version might have been reset, those writes might not have made it through.
557 return mem;
558 }
559
560 //free memory allocated with makeLockedMem
561 void freeUncachedMemView(void* mem, size_t size) {
562 size = ceilToPage(size);
563 munmap(mem, size);
564 }
565
566 //map a physical address into our virtual address space. memfd is the file descriptor for /dev/mem
567 volatile uint32_t* mapPeripheral(int memfd, int addr) {
568 ///dev/mem behaves as a file. We need to map that file into memory:
569 //NULL = virtual address of mapping is chosen by kernel.
570 //PAGE_SIZE = map 1 page.
571 //PROT_READ|PROT_WRITE means give us read and write privileges to the memory
572 //MAP_SHARED means updates to the mapped memory should be written back to the file & shared with other processes
573 //memfd = /dev/mem file descriptor
574 //addr = offset in file to map
575 void *mapped = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, memfd, addr);
576 //now, *mapped = memory at physical address of addr.
577 if (mapped == MAP_FAILED) {
578 printf("failed to map memory (did you remember to run as root?)\n");
579 exit(1);
580 } else {
581 printf("mapped: %p\n", mapped);
582 }
583 return (volatile uint32_t*)mapped;
584 }
585
586
587 void cleanup() {
588 printf("Cleanup\n");
589 //disable DMA. Otherwise, it will continue to run in the background, potentially overwriting future user data.
590 if(dmaHeader) {
591 writeBitmasked(&dmaHeader->CS, DMA_CS_ACTIVE, 0);
592 usleep(100);
593 writeBitmasked(&dmaHeader->CS, DMA_CS_RESET, DMA_CS_RESET);
594 }
595 //could also disable PWM, but that's not imperative.
596 }
597
598 void cleanupAndExit(int sig) {
599 cleanup();
600 printf("Exiting with error; caught signal: %i\n", sig);
601 exit(1);
602 }
603
604 void sleepUntilMicros(uint64_t micros, volatile uint32_t* timerBaseMem) {
605 //Note: cannot use clock_nanosleep with an absolute time, as the process clock may differ from the RPi clock.
606 //this function doesn't need to be super precise, so we can tolerate interrupts.
607 //Therefore, we can use a relative sleep:
608 uint64_t cur = readSysTime(timerBaseMem);
609 if (micros > cur) { //avoid overflow caused by unsigned arithmetic
610 uint64_t dur = micros - cur;
611 //usleep(dur); //nope, causes problems!
612 struct timespec t;
613 t.tv_sec = dur/1000000;
614 t.tv_nsec = (dur - t.tv_sec*1000000)*1000;
615 nanosleep(&t, NULL);
616 }
617 }
618
619
620 //int64_t _lastTimeAtFrame0;
621
622 void queue(int pin, int mode, uint64_t micros, struct GpioBufferFrame* srcArray, volatile uint32_t* timerBaseMem, struct DmaChannelHeader* dmaHeader) {
623 //This function takes a pin, a mode (0=off, 1=on) and a time. It then manipulates the GpioBufferFrame array in order to ensure that the pin switches to the desired level at the desired time. It will sleep if necessary.
624 //Sleep until we are on the right iteration of the circular buffer (otherwise we cannot queue the command)
625 uint64_t callTime = readSysTime(timerBaseMem); //only used for debugging
626 uint64_t desiredTime = micros - FRAME_TO_USEC(SOURCE_BUFFER_FRAMES);
627 sleepUntilMicros(desiredTime, timerBaseMem);
628 uint64_t awakeTime = readSysTime(timerBaseMem); //only used for debugging
629
630 //get the current source index at the current time:
631 //must ensure we aren't interrupted during this calculation, hence the two timers instead of 1.
632 //Note: getting the curTime & srcIdx don't have to be done for every call to queue - it could be done eg just once per buffer.
633 // It should be calculated regularly though, to counter clock drift & PWM FIFO underflows
634 // It is done in this function only for simplicity
635 int srcIdx;
636 uint64_t curTime1, curTime2;
637 int tries=0;
638 do {
639 curTime1 = readSysTime(timerBaseMem);
640 srcIdx = dmaHeader->STRIDE; //the source index is stored in the otherwise-unused STRIDE register, for efficiency
641 curTime2 = readSysTime(timerBaseMem);
642 ++tries;
643 } while (curTime2-curTime1 > 1 || (srcIdx & DMA_CB_TXFR_YLENGTH_MASK)); //allow 1 uS variability.
644 //Uncomment the following lines and the above declaration of _lastTimeAtFrame0 to log jitter information:
645 //int64_t curTimeAtFrame0 = curTime2 - FRAME_TO_USEC(srcIdx);
646 //printf("Timing diff: %lli\n", (curTimeAtFrame0-_lastTimeAtFrame0)%FRAME_TO_USEC(SOURCE_BUFFER_FRAMES));
647 //_lastTimeAtFrame0 = curTimeAtFrame0;
648 //if timing diff is positive, then curTimeAtFrame0 > _lastTimeAtFrame0
649 //curTime2 - srcIdx2 > curTime1 - srcIdx1
650 //curTime2 - curTime1 > srcIdx2 - srcIdx1
651 //more uS have elapsed than frames; DMA cannot keep up
652
653 //calculate the frame# at which to place the event:
654 int usecFromNow = micros - curTime2;
655 int framesFromNow = USEC_TO_FRAME(usecFromNow);
656 if (framesFromNow < 10) { //Not safe to schedule less than ~10uS into the future (note: should be operating on usecFromNow, not framesFromNow)
657 printf("Warning: behind schedule: %i (%i uSec) (tries: %i) (sleep %llu -> %llu (wanted %llu))\n", framesFromNow, usecFromNow, tries, callTime, awakeTime, desiredTime);
658 framesFromNow = 10;
659 }
660 int newIdx = (srcIdx + framesFromNow)%SOURCE_BUFFER_FRAMES;
661 //Now queue the command:
662 if (mode == 0) { //turn output off
663 srcArray[newIdx].gpclr[pin>31] |= 1 << (pin%32);
664 } else { //turn output on
665 srcArray[newIdx].gpset[pin>31] |= 1 << (pin%32);
666 }
667 }
668
669 int main() {
670 volatile uint32_t *gpioBaseMem, *dmaBaseMem, *pwmBaseMem, *timerBaseMem, *clockBaseMem;
671 //emergency clean-up:
672 for (int i = 0; i < 64; i++) { //catch all signals (like ctrl+c, ctrl+z, ...) to ensure DMA is disabled
673 struct sigaction sa;
674 memset(&sa, 0, sizeof(sa));
675 sa.sa_handler = cleanupAndExit;
676 sigaction(i, &sa, NULL);
677 }
678 setSchedPriority(SCHED_PRIORITY);
679 //First, open the linux device, /dev/mem
680 //dev/mem provides access to the physical memory of the entire processor+ram
681 //This is needed because Linux uses virtual memory, thus the process's memory at 0x00000000 will NOT have the same contents as the physical memory at 0x00000000
682 int memfd = open("/dev/mem", O_RDWR | O_SYNC);
683 if (memfd < 0) {
684 printf("Failed to open /dev/mem (did you remember to run as root?)\n");
685 exit(1);
686 }
687 int pagemapfd = open("/proc/self/pagemap", O_RDONLY);
688 //now map /dev/mem into memory, but only map specific peripheral sections:
689 gpioBaseMem = mapPeripheral(memfd, GPIO_BASE);
690 dmaBaseMem = mapPeripheral(memfd, DMA_BASE);
691 pwmBaseMem = mapPeripheral(memfd, PWM_BASE);
692 timerBaseMem = mapPeripheral(memfd, TIMER_BASE);
693 clockBaseMem = mapPeripheral(memfd, CLOCK_BASE);
694
695 int outPin = 10;
696 //now set our pin as an output:
697 volatile uint32_t *fselAddr = (volatile uint32_t*)(gpioBaseMem + GPFSEL0/4 + outPin/10);
698 writeBitmasked(fselAddr, 0x7 << (3*(outPin%10)), 0x1 << (3*(outPin%10)));
699 //Note: PWM pacing still works, even with no physical outputs, so we don't need to set gpio pin 18 to its alternate function.
700
701 //Often need to copy zeros with DMA. This array can be the source. Needs to all lie on one page
702 void *zerosPageCached = makeLockedMem(PAGE_SIZE);
703 void *zerosPage = makeUncachedMemView(zerosPageCached, PAGE_SIZE, memfd, pagemapfd);
704
705 //configure DMA...
706 //First, allocate memory for the source:
707 size_t numSrcBlocks = SOURCE_BUFFER_FRAMES; //We want approximately 1M blocks/sec.
708 size_t srcPageBytes = numSrcBlocks*sizeof(struct GpioBufferFrame);
709 void *virtSrcPageCached = makeLockedMem(srcPageBytes);
710 void *virtSrcPage = makeUncachedMemView(virtSrcPageCached, srcPageBytes, memfd, pagemapfd);
711 printf("mappedPhysSrcPage: %p\n", (void*)virtToPhys(virtSrcPageCached, pagemapfd)); //must use the cached pointer; the uncached view isn't listed in pagemap
712
713 //cast virtSrcPage to a GpioBufferFrame array:
714 struct GpioBufferFrame *srcArray = (struct GpioBufferFrame*)virtSrcPage; //Note: calling virtToPhys on srcArray will return NULL. Use srcArrayCached for that.
715 struct GpioBufferFrame *srcArrayCached = (struct GpioBufferFrame*)virtSrcPageCached;
716 //srcArray[0].gpset[0] = (1 << outPin); //set pin ON
717 //srcArray[numSrcBlocks/2].gpclr[0] = (1 << outPin); //set pin OFF;
718
719 //configure PWM clock:
720 *(clockBaseMem + CM_PWMCTL/4) = CM_PWMCTL_PASSWD | ((*(clockBaseMem + CM_PWMCTL/4))&(~CM_PWMCTL_ENAB)); //disable clock
721 do {} while (*(clockBaseMem + CM_PWMCTL/4) & CM_PWMCTL_BUSY); //wait for clock to deactivate
722 *(clockBaseMem + CM_PWMDIV/4) = CM_PWMDIV_PASSWD | CM_PWMDIV_DIVI(CLOCK_DIV); //configure clock divider (running at 500MHz undivided)
723 *(clockBaseMem + CM_PWMCTL/4) = CM_PWMCTL_PASSWD | CM_PWMCTL_SRC_PLLD; //source 500MHz base clock, no MASH.
724 *(clockBaseMem + CM_PWMCTL/4) = CM_PWMCTL_PASSWD | CM_PWMCTL_SRC_PLLD | CM_PWMCTL_ENAB; //enable clock
725 do {} while ((*(clockBaseMem + CM_PWMCTL/4) & CM_PWMCTL_BUSY) == 0); //wait for clock to activate
726
727 //configure rest of PWM:
728 struct PwmHeader *pwmHeader = (struct PwmHeader*)(pwmBaseMem);
729
730 pwmHeader->DMAC = 0; //disable DMA
731 pwmHeader->CTL |= PWM_CTL_CLRFIFO; //clear pwm
732 usleep(100);
733
734 pwmHeader->STA = PWM_STA_ERRS; //clear PWM errors
735 usleep(100);
736
737 pwmHeader->DMAC = PWM_DMAC_EN | PWM_DMAC_DREQ(PWM_FIFO_SIZE) | PWM_DMAC_PANIC(PWM_FIFO_SIZE); //DREQ is activated at queue < PWM_FIFO_SIZE
738 pwmHeader->RNG1 = BITS_PER_CLOCK; //used only for timing purposes; #writes to PWM FIFO/sec = PWM CLOCK / RNG1
739 pwmHeader->CTL = PWM_CTL_REPEATEMPTY1 | PWM_CTL_ENABLE1 | PWM_CTL_USEFIFO1;
740
741 //allocate memory for the control blocks
742 size_t cbPageBytes = numSrcBlocks * sizeof(struct DmaControlBlock) * 3; //3 cbs for each source block
743 void *virtCbPageCached = makeLockedMem(cbPageBytes);
744 void *virtCbPage = makeUncachedMemView(virtCbPageCached, cbPageBytes, memfd, pagemapfd);
745 //fill the control blocks:
746 struct DmaControlBlock *cbArrCached = (struct DmaControlBlock*)virtCbPageCached;
747 struct DmaControlBlock *cbArr = (struct DmaControlBlock*)virtCbPage;
748 printf("#dma blocks: %zu, #src blocks: %zu\n", numSrcBlocks*3, numSrcBlocks);
749 for (int i=0; i<numSrcBlocks*3; i += 3) {
750 //pace DMA through PWM
751 cbArr[i].TI = DMA_CB_TI_PERMAP_PWM | DMA_CB_TI_DEST_DREQ | DMA_CB_TI_NO_WIDE_BURSTS | DMA_CB_TI_TDMODE;
752 cbArr[i].SOURCE_AD = virtToUncachedPhys(srcArrayCached + i/3, pagemapfd); //The data written doesn't matter, but using the GPIO source will hopefully bring it into L2 for more deterministic timing of the next control block.
753 cbArr[i].DEST_AD = PWM_BASE_BUS + PWM_FIF1; //write to the FIFO
754 cbArr[i].TXFR_LEN = DMA_CB_TXFR_LEN_YLENGTH(1) | DMA_CB_TXFR_LEN_XLENGTH(4);
755 cbArr[i].STRIDE = i/3;
756 cbArr[i].NEXTCONBK = virtToUncachedPhys(cbArrCached+i+1, pagemapfd); //have to use the cached version because the uncached version isn't listed in pagemap(?)
757 //copy buffer to GPIOs
758 cbArr[i+1].TI = DMA_CB_TI_SRC_INC | DMA_CB_TI_DEST_INC | DMA_CB_TI_NO_WIDE_BURSTS | DMA_CB_TI_TDMODE;
759 cbArr[i+1].SOURCE_AD = virtToUncachedPhys(srcArrayCached + i/3, pagemapfd);
760 cbArr[i+1].DEST_AD = GPIO_BASE_BUS + GPSET0;
761 cbArr[i+1].TXFR_LEN = DMA_CB_TXFR_LEN_YLENGTH(2) | DMA_CB_TXFR_LEN_XLENGTH(8);
762 cbArr[i+1].STRIDE = DMA_CB_STRIDE_D_STRIDE(4) | DMA_CB_STRIDE_S_STRIDE(0);
763 cbArr[i+1].NEXTCONBK = virtToUncachedPhys(cbArrCached+i+2, pagemapfd);
764 //clear buffer (TODO: investigate using a 4-word copy ("burst") )
765 cbArr[i+2].TI = DMA_CB_TI_DEST_INC | DMA_CB_TI_NO_WIDE_BURSTS | DMA_CB_TI_TDMODE;
766 cbArr[i+2].SOURCE_AD = virtToUncachedPhys(zerosPageCached, pagemapfd);
767 cbArr[i+2].DEST_AD = virtToUncachedPhys(srcArrayCached + i/3, pagemapfd);
768 cbArr[i+2].TXFR_LEN = DMA_CB_TXFR_LEN_YLENGTH(1) | DMA_CB_TXFR_LEN_XLENGTH(sizeof(struct GpioBufferFrame));
769 cbArr[i+2].STRIDE = i/3; //might be better to use the NEXT index
770 int nextIdx = i+3 < numSrcBlocks*3 ? i+3 : 0; //last block should loop back to the first block
771 cbArr[i+2].NEXTCONBK = virtToUncachedPhys(cbArrCached + nextIdx, pagemapfd); //(uint32_t)physCbPage + ((void*)&cbArr[(i+2)%maxIdx] - virtCbPage);
772 }
773 for (int i=0; i<cbPageBytes; i+=PAGE_SIZE) {
774 printf("virt cb[%i] -> phys: 0x%08x (0x%08x)\n", i, (uint32_t)virtToPhys(i+(void*)cbArrCached, pagemapfd), (uint32_t)virtToUncachedPhys(i+(void*)cbArrCached, pagemapfd));
775 }
776 //source: http://virtualfloppy.blogspot.com/2014/01/dma-support-at-last.html
777 //cat /sys/module/dma/parameters/dmachans gives a bitmask of DMA channels that are not used by GPU. Results: ch 1, 3, 6, 7 are reserved.
778 //dmesg | grep "DMA"; results: Ch 2 is used by SDHC host
779 //ch 0 is known to be used for graphics acceleration
780 //Thus, applications can use ch 4, 5, or the LITE channels @ 8 and beyond.
781 //If using LITE channels, then we can't use the STRIDE feature, so that narrows it down to ch 4 and ch 5.
782 int dmaCh = 5;
783 //enable DMA channel (it's probably already enabled, but we want to be sure):
784 writeBitmasked(dmaBaseMem + DMAENABLE, 1 << dmaCh, 1 << dmaCh);
785
786 //configure the DMA header to point to our control block:
787 dmaHeader = (struct DmaChannelHeader*)(dmaBaseMem + DMACH(dmaCh)/4); //must divide by 4, as dmaBaseMem is uint32_t*
788 printf("Previous DMA header:\n");
789 logDmaChannelHeader(dmaHeader);
790 //abort any previous DMA:
791 //dmaHeader->NEXTCONBK = 0; //NEXTCONBK is read-only.
792 dmaHeader->CS |= DMA_CS_ABORT; //make sure to disable dma first.
793 usleep(100); //give time for the abort command to be handled.
794
795 dmaHeader->CS = DMA_CS_RESET;
796 usleep(100);
797
798 writeBitmasked(&dmaHeader->CS, DMA_CS_END, DMA_CS_END); //clear the end flag
799 dmaHeader->DEBUG = DMA_DEBUG_READ_ERROR | DMA_DEBUG_FIFO_ERROR | DMA_DEBUG_READ_LAST_NOT_SET_ERROR; // clear debug error flags
800 uint32_t firstAddr = virtToUncachedPhys(cbArrCached, pagemapfd);
801 printf("starting DMA @ CONBLK_AD=0x%08x\n", firstAddr);
802 dmaHeader->CONBLK_AD = firstAddr; //(uint32_t)physCbPage + ((void*)cbArr - virtCbPage); //we have to point it to the PHYSICAL address of the control block (cb1)
803 dmaHeader->CS = DMA_CS_PRIORITY(7) | DMA_CS_PANIC_PRIORITY(7) | DMA_CS_DISDEBUG; //high priority (max is 7)
804 dmaHeader->CS = DMA_CS_PRIORITY(7) | DMA_CS_PANIC_PRIORITY(7) | DMA_CS_DISDEBUG | DMA_CS_ACTIVE; //activate DMA.
805
806 uint64_t startTime = readSysTime(timerBaseMem);
807 printf("DMA Active @ %llu uSec\n", startTime);
808 /*while (dmaHeader->CS & DMA_CS_ACTIVE) {
809 logDmaChannelHeader(dmaHeader);
810 } //wait for DMA transfer to complete.*/
811 for (int i=1; ; ++i) { //generate the output sequence:
812 //logDmaChannelHeader(dmaHeader);
813 //this just toggles outPin every few us:
814 queue(outPin, i%2, startTime + 1000*i, srcArray, timerBaseMem, dmaHeader);
815 }
816 //Exit routine:
817 cleanup();
818 printf("Exiting cleanly:\n");
819 freeUncachedMemView(virtCbPage, cbPageBytes);
820 freeLockedMem(virtCbPageCached, cbPageBytes);
821 freeUncachedMemView(virtSrcPage, srcPageBytes);
822 freeLockedMem(virtSrcPageCached, srcPageBytes);
823 freeUncachedMemView(zerosPage, PAGE_SIZE);
824 freeLockedMem(zerosPageCached, PAGE_SIZE);
825 close(pagemapfd);
826 close(memfd);
827 return 0;
828 }
829